Commit Graph

1118 Commits

Author SHA1 Message Date
Tom Lane da3df47c84 lmgr.c:DescribeLockTag was never taught about virtual xids, per Greg Stark.
Also a couple of minor tweaks to try to future-proof the code a bit better
against future locktag additions.
2008-01-08 23:18:51 +00:00
Bruce Momjian 9098ab9e32 Update copyrights in source tree to 2008. 2008-01-01 19:46:01 +00:00
Peter Eisentraut 5ca3d50db7 Clarify log messages 2007-12-13 11:55:44 +00:00
Tom Lane 895a94de6d Avoid incrementing the CommandCounter when CommandCounterIncrement is called
but no database changes have been made since the last CommandCounterIncrement.
This should result in a significant improvement in the number of "commands"
that can typically be performed within a transaction before hitting the 2^32
CommandId size limit.  In particular this buys back (and more) the possible
adverse consequences of my previous patch to fix plan caching behavior.

The implementation requires tracking whether the current CommandCounter
value has been "used" to mark any tuples.  CommandCounter values stored into
snapshots are presumed not to be used for this purpose.  This requires some
small executor changes, since the executor used to conflate the curcid of
the snapshot it was using with the command ID to mark output tuples with.
Separating these concepts allows some small simplifications in executor APIs.

Something for the TODO list: look into having CommandCounterIncrement not do
AcceptInvalidationMessages.  It seems fairly bogus to be doing it there,
but exactly where to do it instead isn't clear, and I'm disinclined to mess
with asynchronous behavior during late beta.
2007-11-30 21:22:54 +00:00
Tom Lane eae7e00f1f Fix stupid typo in recently-added code :-( 2007-11-16 00:57:55 +00:00
Bruce Momjian f6e8730d11 Re-run pgindent with updated list of typedefs. (Updated README should
avoid this problem in the future.)
2007-11-15 22:25:18 +00:00
Tom Lane 591b9b091c Use ftruncate() not truncate() in mdunlink. Seems Windows doesn't
support the latter.
2007-11-15 21:49:47 +00:00
Bruce Momjian fdf5a5efb7 pgindent run for 8.3. 2007-11-15 21:14:46 +00:00
Tom Lane 6cc4451b5c Prevent re-use of a deleted relation's relfilenode until after the next
checkpoint.  This guards against an unlikely data-loss scenario in which
we re-use the relfilenode, then crash, then replay the deletion and
recreation of the file.  Even then we'd be OK if all insertions into the
new relation had been WAL-logged ... but that's not guaranteed given all
the no-WAL-logging optimizations that have recently been added.

Patch by Heikki Linnakangas, per a discussion last month.
2007-11-15 20:36:40 +00:00
Tom Lane 69500b05d6 Prevent continuing disk-space bloat when profiling (with PROFILE_PID_DIR
enabled) and autovacuum is on.  Since there will be a steady stream of autovac
worker processes exiting and dropping gmon.out files, allowing them to make
separate subdirectories results in serious bloat; and it seems unlikely that
anyone will care about those profiles anyway.  Limit the damage by forcing all
autovac workers to dump in one subdirectory, PGDATA/gprof/avworker/.

Per report from Jšrg Beyer and subsequent discussion.
2007-11-04 17:55:15 +00:00
Alvaro Herrera acac68b2bc Allow an autovacuum worker to be interrupted automatically when it is found
to be locking another process (except when it's working to prevent Xid
wraparound problems).
2007-10-26 20:45:10 +00:00
Alvaro Herrera 745c1b2c2a Rearrange vacuum-related bits in PGPROC as a bitmask, to better support
having several of them.  Add two more flags: whether the process is
executing an ANALYZE, and whether a vacuum is for Xid wraparound (which
is obviously only set by autovacuum).

Sneakily move the worker's recently-acquired PostAuthDelay to a more useful
place.
2007-10-24 20:55:36 +00:00
Tom Lane 7a315a09dc Dept. of second thoughts: fix loop in BgBufferSync so that the exit when
bgwriter_lru_maxpages is exceeded leaves the loop variables in the
expected state.  In the original coding, we'd fail to advance
next_to_clean, causing that buffer to be probably-uselessly rechecked next
time, and also have an off-by-one idea of the number of buffers scanned.
2007-09-25 22:11:48 +00:00
Tom Lane 6f5c38dcd0 Just-in-time background writing strategy. This code avoids re-scanning
buffers that cannot possibly need to be cleaned, and estimates how many
buffers it should try to clean based on moving averages of recent allocation
requests and density of reusable buffers.  The patch also adds a couple
more columns to pg_stat_bgwriter to help measure the effectiveness of the
bgwriter.

Greg Smith, building on his own work and ideas from several other people,
in particular a much older patch from Itagaki Takahiro.
2007-09-25 20:03:38 +00:00
Tom Lane 1b3d400cac TransactionIdIsInProgress can skip scanning the ProcArray if the target XID is
later than latestCompletedXid, per Florian Pflug.  Also some minor
improvements in the XIDCACHE_DEBUG code --- make sure each call of
TransactionIdIsInProgress is counted one way or another.
2007-09-23 18:50:38 +00:00
Tom Lane cc59049daf Improve handling of prune/no-prune decisions by storing a page's oldest
unpruned XMAX in its header.  At the cost of 4 bytes per page, this keeps us
from performing heap_page_prune when there's no chance of pruning anything.
Seems to be necessary per Heikki's preliminary performance testing.
2007-09-21 21:25:42 +00:00
Tom Lane da072ab2ab Make some simple performance improvements in TransactionIdIsInProgress().
For XIDs of our own transaction and subtransactions, it's cheaper to ask
TransactionIdIsCurrentTransactionId() than to look in shared memory.
Also, the xids[] work array is always the same size within any given
process, so malloc it just once instead of doing a palloc/pfree on every
call; aside from being faster this lets us get rid of some goto's, since
we no longer have any end-of-function pfree to do.  Both ideas by Heikki.
2007-09-21 17:36:53 +00:00
Tom Lane 282d2a03dd HOT updates. When we update a tuple without changing any of its indexed
columns, and the new version can be stored on the same heap page, we no longer
generate extra index entries for the new version.  Instead, index searches
follow the HOT-chain links to ensure they find the correct tuple version.

In addition, this patch introduces the ability to "prune" dead tuples on a
per-page basis, without having to do a complete VACUUM pass to recover space.
VACUUM is still needed to clean up dead index entries, however.

Pavan Deolasee, with help from a bunch of other people.
2007-09-20 17:56:33 +00:00
Tom Lane 6889303531 Redefine the lp_flags field of item pointers as having four states, rather
than two independent bits (one of which was never used in heap pages anyway,
or at least hadn't been in a very long time).  This gives us flexibility to
add the HOT notions of redirected and dead item pointers without requiring
anything so klugy as magic values of lp_off and lp_len.  The state values
are chosen so that for the states currently in use (pre-HOT) there is no
change in the physical representation.
2007-09-12 22:10:26 +00:00
Tom Lane 6bd4f401b0 Replace the former method of determining snapshot xmax --- to wit, calling
ReadNewTransactionId from GetSnapshotData --- with a "latestCompletedXid"
variable that is updated during transaction commit or abort.  Since
latestCompletedXid is written only in places that had to lock ProcArrayLock
exclusively anyway, and is read only in places that had to lock ProcArrayLock
shared anyway, it adds no new locking requirements to the system despite being
cluster-wide.  Moreover, removing ReadNewTransactionId from snapshot
acquisition eliminates the need to take both XidGenLock and ProcArrayLock at
the same time.  Since XidGenLock is sometimes held across I/O this can be a
significant win.  Some preliminary benchmarking suggested that this patch has
no effect on average throughput but can significantly improve the worst-case
transaction times seen in pgbench.  Concept by Florian Pflug, implementation
by Tom Lane.
2007-09-08 20:31:15 +00:00
Tom Lane 0a51e7073c Don't take ProcArrayLock while exiting a transaction that has no XID; there is
no need for serialization against snapshot-taking because the xact doesn't
affect anyone else's snapshot anyway.  Per discussion.  Also, move various
info about the interlocking of transactions and snapshots out of code comments
and into a hopefully-more-cohesive discussion in access/transam/README.

Also, remove a couple of now-obsolete comments about having to force some WAL
to be written to persuade RecordTransactionCommit to do its thing.
2007-09-07 20:59:26 +00:00
Tom Lane cd1aae5864 Allow CREATE INDEX CONCURRENTLY to disregard transactions in other
databases, per gripe from hubert depesz lubaczewski.  Patch from
Simon Riggs.
2007-09-07 00:58:57 +00:00
Tom Lane 0ecb4ea773 Volatile-qualify the ProcArray PGPROC pointer in a bunch of routines
that examine fields that could change under them.  This is just to make
really sure that when we are fetching a value 'only once', that's what
actually happens.  Possibly this is a bug that should be back-patched,
but in the absence of solid evidence that it's needed, I won't bother.
2007-09-05 21:11:19 +00:00
Tom Lane 295e63983d Implement lazy XID allocation: transactions that do not modify any database
rows will normally never obtain an XID at all.  We already did things this way
for subtransactions, but this patch extends the concept to top-level
transactions.  In applications where there are lots of short read-only
transactions, this should improve performance noticeably; not so much from
removal of the actual XID-assignments, as from reduction of overhead that's
driven by the rate of XID consumption.  We add a concept of a "virtual
transaction ID" so that active transactions can be uniquely identified even
if they don't have a regular XID.  This is a much lighter-weight concept:
uniqueness of VXIDs is only guaranteed over the short term, and no on-disk
record is made about them.

Florian Pflug, with some editorialization by Tom.
2007-09-05 18:10:48 +00:00
Tom Lane 24d4517b3b Improve behavior of log_lock_waits patch. Ensure that something gets logged
even if the "deadlock detected" ERROR message is suppressed by an exception
catcher.  Be clearer about the event sequence when a soft deadlock is fixed:
the fixing process might or might not still have to wait, so log that
separately.  Fix race condition when someone releases us from the lock partway
through printing all this junk --- we'd not get confused about our state, but
the log message sequence could have been misleading, ie, a "still waiting"
message with no subsequent "acquired" message.  Greg Stark and Tom Lane.
2007-08-28 03:23:44 +00:00
Tom Lane e4f4a7f5a4 Remove FileUnlink(), which wasn't being used anywhere and interacted poorly
with the recent patch to log temp file sizes at removal time.  Doesn't seem
worth fixing since it's unused.
In passing, make a few elog messages conform to the message style guide.
2007-07-26 15:15:18 +00:00
Tom Lane 82eed4dba2 Arrange to put TOAST tables belonging to temporary tables into special schemas
named pg_toast_temp_nnn, alongside the pg_temp_nnn schemas used for the temp
tables themselves.  This allows low-level code such as the relcache to
recognize that these tables are indeed temporary, which enables various
optimizations such as not WAL-logging changes and using local rather than
shared buffers for access.  Aside from obvious performance benefits, this
provides a solution to bug #3483, in which other backends unexpectedly held
open file references to temporary tables.  The scheme preserves the property
that TOAST tables are not in any schema that's normally in the search path,
so they don't conflict with user table names.

initdb forced because of changes in system view definitions.
2007-07-25 22:16:18 +00:00
Tom Lane fdb5b69e9c Suppress warning when compiling with -DPROFILE_PID_DIR: sys/stat.h is
supposed to be included when using mkdir().
2007-07-25 19:58:56 +00:00
Tom Lane 04fbe29a83 Fix WAL replay of truncate operations to cope with the possibility that the
truncated relation was deleted later in the WAL sequence.  Since replay
normally auto-creates a relation upon its first reference by a WAL log entry,
failure is seen only if the truncate entry happens to be the first reference
after the checkpoint we're restarting from; which is a pretty unusual case but
of course not impossible.  Fix by making truncate entries auto-create like
the other ones do.  Per report and test case from Dharmendra Goyal.
2007-07-20 16:29:53 +00:00
Tom Lane 82b3684672 Add comments spelling out why it's a good idea to release multiple
partition locks in reverse order.
2007-07-16 21:09:50 +00:00
Tom Lane b09cb0cf12 Remove the pgstat_drop_relation() call from smgr_internal_unlink(), because
we don't know at that point which relation OID to tell pgstat to forget.
The code was passing the relfilenode, which is incorrect, and could possibly
cause some other relation's stats to be zeroed out.  While we could try to
clean this up, it seems much simpler and more reliable to let the next
invocation of pgstat_vacuum_tabstat() fix things; which indeed is how it
worked before I introduced the buggy code into 8.1.3 and later :-(.
Problem noticed by Itagaki Takahiro, fix is per subsequent discussion.
2007-07-08 22:23:16 +00:00
Tom Lane 83aaebba63 Fix incorrect comment about the timing of AbsorbFsyncRequests() during
checkpoint.  The comment claimed that we could do this anytime after
setting the checkpoint REDO point, but actually BufferSync is relying
on the assumption that buffers dumped by other backends will be fsync'd
too.  So we really could not do it any sooner than we are doing it.
2007-07-03 14:51:24 +00:00
Tom Lane beba73763b Fix comments not updated in recent patch. 2007-07-01 02:22:23 +00:00
Tom Lane 9fc25c0511 Improve logging of checkpoints. Patch by Greg Smith, worked over
by Heikki and a little bit by me.
2007-06-30 19:12:02 +00:00
Alvaro Herrera 10af02b912 Arrange for SIGINT in autovacuum workers to cancel the current table and
continue with the schedule.  Change current uses of SIGINT to abort a worker
into SIGTERM, which keeps the old behaviour of terminating the process.

Patch from ITAGAKI Takahiro, with some editorializing of my own.
2007-06-29 17:07:39 +00:00
Tom Lane 867e2c91a0 Implement "distributed" checkpoints in which the checkpoint I/O is spread
over a fairly long period of time, rather than being spat out in a burst.
This happens only for background checkpoints carried out by the bgwriter;
other cases, such as a shutdown checkpoint, are still done at full speed.

Remove the "all buffers" scan in the bgwriter, and associated stats
infrastructure, since this seems no longer very useful when the checkpoint
itself is properly throttled.

Original patch by Itagaki Takahiro, reworked by Heikki Linnakangas,
and some minor API editorialization by me.
2007-06-28 00:02:40 +00:00
Tom Lane 9cce91dba0 Only log 'process acquired lock' if we actually did get the lock. This
test seems inessential right now since the only control path for not
getting the lock is via CHECK_FOR_INTERRUPTS which won't return control
to ProcSleep, but it would be important if we ever allow the deadlock
code to kill someone else's transaction instead of our own.
2007-06-19 22:01:15 +00:00
Tom Lane 6e07228728 Code review for log_lock_waits patch. Don't try to issue log messages from
within a signal handler (this might be safe given the relatively narrow code
range in which the interrupt is enabled, but it seems awfully risky); do issue
more informative log messages that tell what is being waited for and the exact
length of the wait; minor other code cleanup.  Greg Stark and Tom Lane
2007-06-19 20:13:22 +00:00
Tom Lane de6a6383a7 Update obsolete comment: it's no longer the case that mdread() will allow
reads beyond EOF, except by special coercion.
2007-06-18 00:47:20 +00:00
Tom Lane e976fd43c6 Add some simple defenses against null fields in pg_largeobject, and add
comments noting that there's an alignment assumption now that the data
field could be in 1-byte-header format.  Per discussion with Greg Stark.
2007-06-12 19:46:24 +00:00
Tom Lane a04a423599 Arrange for large sequential scans to synchronize with each other, so that
when multiple backends are scanning the same relation concurrently, each page
is (ideally) read only once.

Jeff Davis, with review by Heikki and Tom.
2007-06-08 18:23:53 +00:00
Tom Lane 6d6d14b6d5 Redefine IsTransactionState() to only return true for TRANS_INPROGRESS state,
which is the only state in which it's safe to initiate database queries.
It turns out that all but two of the callers thought that's what it meant;
and the other two were using it as a proxy for "will GetTopTransactionId()
return a nonzero XID"?  Since it was in fact an unreliable guide to that,
make those two just invoke GetTopTransactionId() always, then deal with a
zero result if they get one.
2007-06-07 21:45:59 +00:00
Tom Lane 24ee8af573 Rework temp_tablespaces patch so that temp tablespaces are assigned separately
for each temp file, rather than once per sort or hashjoin; this allows
spreading the data of a large sort or join across multiple tablespaces.
(I remain dubious that this will make any difference in practice, but certain
people insisted.)  Arrange to cache the results of parsing the GUC variable
instead of recomputing from scratch on every demand, and push usage of the
cache down to the bottommost fd.c level.
2007-06-07 19:19:57 +00:00
Tom Lane acfce502ba Create a GUC parameter temp_tablespaces that allows selection of the
tablespace(s) in which to store temp tables and temporary files.  This is a
list to allow spreading the load across multiple tablespaces (a random list
element is chosen each time a temp object is to be created).  Temp files are
not stored in per-database pgsql_tmp/ directories anymore, but per-tablespace
directories.

Jaime Casanova and Albert Cervera, with review by Bernd Helmle and Tom Lane.
2007-06-03 17:08:34 +00:00
Tom Lane 964ec46cfe Fix aboriginal bug in BufFileDumpBuffer that would cause it to write the
wrong data when dumping a bufferload that crosses a component-file boundary.
This probably has not been seen in the wild because (a) component files are
normally 1GB apiece and (b) non-block-aligned buffer usage is relatively
rare.  But it's fairly easy to reproduce a problem if one reduces RELSEG_SIZE
in a test build.  Kudos to Kurt Harriman for spotting the bug.
2007-06-01 23:43:11 +00:00
Tom Lane bd0a260928 Make CREATE/DROP/RENAME DATABASE wait a little bit to see if other backends
will exit before failing because of conflicting DB usage.  Per discussion,
this seems a good idea to help mask the fact that backend exit takes nonzero
time.  Remove a couple of thereby-obsoleted sleeps in contrib and PL
regression test sequences.
2007-06-01 19:38:07 +00:00
Tom Lane d526575f89 Make large sequential scans and VACUUMs work in a limited-size "ring" of
buffers, rather than blowing out the whole shared-buffer arena.  Aside from
avoiding cache spoliation, this fixes the problem that VACUUM formerly tended
to cause a WAL flush for every page it modified, because we had it hacked to
use only a single buffer.  Those flushes will now occur only once per
ring-ful.  The exact ring size, and the threshold for seqscans to switch into
the ring usage pattern, remain under debate; but the infrastructure seems
done.  The key bit of infrastructure is a new optional BufferAccessStrategy
object that can be passed to ReadBuffer operations; this replaces the former
StrategyHintVacuum API.

This patch also changes the buffer usage-count methodology a bit: we now
advance usage_count when first pinning a buffer, rather than when last
unpinning it.  To preserve the behavior that a buffer's lifetime starts to
decrease when it's released, the clock sweep code is modified to not decrement
usage_count of pinned buffers.

Work not done in this commit: teach GiST and GIN indexes to use the vacuum
BufferAccessStrategy for vacuum-driven fetches.

Original patch by Simon, reworked by Heikki and again by Tom.
2007-05-30 20:12:03 +00:00
Tom Lane 77947c51c0 Fix up pgstats counting of live and dead tuples to recognize that committed
and aborted transactions have different effects; also teach it not to assume
that prepared transactions are always committed.

Along the way, simplify the pgstats API by tying counting directly to
Relations; I cannot detect any redeeming social value in having stats
pointers in HeapScanDesc and IndexScanDesc structures.  And fix a few
corner cases in which counts might be missed because the relation's
pgstat_info pointer hadn't been set.
2007-05-27 03:50:39 +00:00
Tom Lane 63735ca815 Dept. of second thoughts: add comments cautioning against using
ReadOrZeroBuffer to fetch pages from beyond physical EOF.  This would
usually work, but would cause problems for md.c if writes occurred
beyond a segment boundary when the previous segment file hadn't been
fully extended.
2007-05-02 23:34:48 +00:00
Tom Lane 8c3cc86e7b During WAL recovery, when reading a page that we intend to overwrite completely
from the WAL data, don't bother to physically read it; just have bufmgr.c
return a zeroed-out buffer instead.  This speeds recovery significantly,
and also avoids unnecessary failures when a page-to-be-overwritten has corrupt
page headers on disk.  This replaces a former kluge that accomplished the
latter by pretending zero_damaged_pages was always ON during WAL recovery;
which was OK when the kluge was put in, but is unsafe when restoring a WAL
log that was written with full_page_writes off.

Heikki Linnakangas
2007-05-02 23:18:03 +00:00
Bruce Momjian 1c8302cab3 Add comment on why deadlock detection error messages only prints numbers. 2007-04-20 20:15:52 +00:00
Alvaro Herrera e2a186b03c Add a multi-worker capability to autovacuum. This allows multiple worker
processes to be running simultaneously.  Also, now autovacuum processes do not
count towards the max_connections limit; they are counted separately from
regular processes, and are limited by the new GUC variable
autovacuum_max_workers.

The launcher now has intelligence to launch workers on each database every
autovacuum_naptime seconds, limited only on the max amount of worker slots
available.

Also, the global worker I/O utilization is limited by the vacuum cost-based
delay feature.  Workers are "balanced" so that the total I/O consumption does
not exceed the established limit.  This part of the patch was contributed by
ITAGAKI Takahiro.

Per discussion.
2007-04-16 18:30:04 +00:00
Tom Lane 995ba280c1 Rearrange mdsync() looping logic to avoid the problem that a sufficiently
fast flow of new fsync requests can prevent mdsync() from ever completing.
This was an unforeseen consequence of a patch added in Mar 2006 to prevent
the fsync request queue from overflowing.  Problem identified by Heikki
Linnakangas and independently by ITAGAKI Takahiro; fix based on ideas from
Takahiro-san, Heikki, and Tom.

Back-patch as far as 8.1 because a previous back-patch introduced the problem
into 8.1 ...
2007-04-12 17:10:55 +00:00
Tom Lane 3e23b68dac Support varlena fields with single-byte headers and unaligned storage.
This commit breaks any code that assumes that the mere act of forming a tuple
(without writing it to disk) does not "toast" any fields.  While all available
regression tests pass, I'm not totally sure that we've fixed every nook and
cranny, especially in contrib.

Greg Stark with some help from Tom Lane
2007-04-06 04:21:44 +00:00
Tom Lane 9c9b619473 Remove the CheckpointStartLock in favor of having backends show whether they
are in their commit critical sections via flags in the ProcArray.  Checkpoint
can watch the ProcArray to determine when it's safe to proceed.  This is
a considerably better solution to the original problem of race conditions
between checkpoint and transaction commit: it speeds up commit, since there's
one less lock to fool with, and it prevents the problem of checkpoint being
delayed indefinitely when there's a constant flow of commits.  Heikki, with
some kibitzing from Tom.
2007-04-03 16:34:36 +00:00
Magnus Hagander 335feca441 Add some instrumentation to the bgwriter, through the stats collector.
New view pg_stat_bgwriter, and the functions required to build it.
2007-03-30 18:34:56 +00:00
Tom Lane e85a01df67 Clean up the representation of special snapshots by including a "method
pointer" in every Snapshot struct.  This allows removal of the case-by-case
tests in HeapTupleSatisfiesVisibility, which should make it a bit faster
(I didn't try any performance tests though).  More importantly, we are no
longer violating portable C practices by assuming that small integers are
distinct from all pointer values, and HeapTupleSatisfiesDirty no longer
has a non-reentrant API involving side-effects on a global variable.

There were a couple of places calling HeapTupleSatisfiesXXX routines
directly rather than through the HeapTupleSatisfiesVisibility macro.
Since these places had to be changed anyway, I chose to make them go
through the macro for uniformity.

Along the way I renamed HeapTupleSatisfiesSnapshot to HeapTupleSatisfiesMVCC
to emphasize that it's only used with MVCC-type snapshots.  I was sorely
tempted to rename HeapTupleSatisfiesVisibility to HeapTupleSatisfiesSnapshot,
but forebore for the moment to avoid confusion and reduce the likelihood that
this patch breaks some of the pending patches.  Might want to reconsider
doing that later.
2007-03-25 19:45:14 +00:00
Bruce Momjian 1e2bfb5811 Cleanup for procarray.c. 2007-03-23 03:16:39 +00:00
Alvaro Herrera 626eb02198 Cleanup the bootstrap code a little, and rename "dummy procs" in the code
comments and variables to "auxiliary proc", per Heikki's request.
2007-03-07 13:35:03 +00:00
Bruce Momjian a535cdf130 Revert temp_tablespaces because of coding problems, per Tom. 2007-03-06 02:06:15 +00:00
Bruce Momjian 0763a56501 Add lo_truncate() to backend and libpq for large object truncation.
Kris Jurka
2007-03-03 19:52:47 +00:00
Neil Conway 90d76525c5 Add resetStringInfo(), which clears the content of a StringInfo, and
fixup various places in the tree that were clearing a StringInfo by hand.
Making this function a part of the API simplifies client code slightly,
and avoids needlessly peeking inside the StringInfo interface.
2007-03-03 19:32:55 +00:00
Bruce Momjian e52c4a6e26 Add GUC log_lock_waits to log long wait times.
Simon Riggs
2007-03-03 18:46:40 +00:00
Tom Lane fb276438b6 Suppress useless searches for unused line pointers in PageAddItem. To do
this, add a 16-bit "flags" field to page headers by stealing some bits from
pd_tli.  We use one flag bit as a hint to indicate whether there are any
unused line pointers; the remaining 15 are available for future use.

This is a cut-down form of an idea proposed by Hiroki Kataoka in July 2005.
At the time it was rejected because the original patch increased the size of
page headers and it wasn't clear that the benefit outweighed the distributed
cost.  The flag-bit approach gets most of the benefit without requiring an
increase in the page header size.

Heikki Linnakangas and Tom Lane
2007-03-02 00:48:44 +00:00
Magnus Hagander 2c6feff5e7 Remove temporary Windows-specific debugging code. 2007-02-28 15:59:30 +00:00
Tom Lane 234a02b2a8 Replace direct assignments to VARATT_SIZEP(x) with SET_VARSIZE(x, len).
Get rid of VARATT_SIZE and VARATT_DATA, which were simply redundant with
VARSIZE and VARDATA, and as a consequence almost no code was using the
longer names.  Rename the length fields of struct varlena and various
derived structures to catch anyplace that was accessing them directly;
and clean up various places so caught.  In itself this patch doesn't
change any behavior at all, but it is necessary infrastructure if we hope
to play any games with the representation of varlena headers.
Greg Stark and Tom Lane
2007-02-27 23:48:10 +00:00
Bruce Momjian 6f519ad01c btree source code cleanups:
I refactored findsplitloc and checksplitloc so that the division of
labor is more clear IMO. I pushed all the space calculation inside the
loop to checksplitloc.

I also fixed the off by 4 in free space calculation caused by
PageGetFreeSpace subtracting sizeof(ItemIdData), even though it was
harmless, because it was distracting and I felt it might come back to
bite us in the future if we change the page layout or alignments.
There's now a new function PageGetExactFreeSpace that doesn't do the
subtraction.

findsplitloc now tries the "just the new item to right page" split as
well. If people don't like the refactoring, I can write a patch to just
add that.

Heikki Linnakangas
2007-02-21 20:02:17 +00:00
Bruce Momjian 6765df9174 Add configure --enable-profiling to enable GCC profiling. Patches from
Korry Douglas and Nikhil S
2007-02-21 15:12:39 +00:00
Alvaro Herrera 1820650934 Restructure autovacuum in two processes: a dummy process, which runs
continuously, and requests vacuum runs of "autovacuum workers" to postmaster.
The workers do the actual vacuum work.  This allows for future improvements,
like allowing multiple autovacuum jobs running in parallel.

For now, the code keeps the original behavior of having a single autovac
process at any time by sleeping until the previous worker has finished.
2007-02-15 23:23:23 +00:00
Peter Eisentraut c138b966d4 Replace useless uses of := by = in makefiles. 2007-02-09 15:56:00 +00:00
Bruce Momjian 8b4ff8b6a1 Wording cleanup for error messages. Also change can't -> cannot.
Standard English uses "may", "can", and "might" in different ways:

        may - permission, "You may borrow my rake."

        can - ability, "I can lift that log."

        might - possibility, "It might rain today."

Unfortunately, in conversational English, their use is often mixed, as
in, "You may use this variable to do X", when in fact, "can" is a better
choice.  Similarly, "It may crash" is better stated, "It might crash".
2007-02-01 19:10:30 +00:00
Bruce Momjian 148ea5cbea Add GUC temp_tablespaces to provide a default location for temporary
objects.

Jaime Casanova
2007-01-25 04:35:11 +00:00
Peter Eisentraut 2cc01004c6 Remove remains of old depend target. 2007-01-20 17:16:17 +00:00
Tom Lane eddbf39756 Extend yesterday's patch so that the bgwriter is also told to forget
pending fsyncs during DROP DATABASE.  Obviously necessary in hindsight :-(
2007-01-17 16:25:01 +00:00
Tom Lane 6d660587f6 Revise bgwriter fsync-request mechanism to improve robustness when a table
is deleted.  A backend about to unlink a file now sends a "revoke fsync"
request to the bgwriter to make it clean out pending fsync requests.  There
is still a race condition where the bgwriter may try to fsync after the unlink
has happened, but we can resolve that by rechecking the fsync request queue
to see if a revoke request arrived meanwhile.  This eliminates the former
kluge of "just assuming" that an ENOENT failure is okay, and lets us handle
the fact that on Windows it might be EACCES too without introducing any
questionable assumptions.  After an idea of mine improved by Magnus.

The HEAD patch doesn't apply cleanly to 8.2, but I'll see about a back-port
later.  In the meantime this could do with some testing on Windows; I've been
able to force it through the code path via ENOENT, but that doesn't prove that
it actually fixes the Windows problem ...
2007-01-17 00:17:21 +00:00
Alvaro Herrera eb63cc3da8 Arrange for autovacuum to be killed when another operation wants to be alone
accessing it, like DROP DATABASE.  This allows the regression tests to pass
with autovacuum enabled, which open the gates for finally enabling autovacuum
by default.
2007-01-16 13:28:57 +00:00
Bruce Momjian d64995aa89 Remove trace macro call from new log_temp_files, until it gets more
research.
2007-01-09 22:03:51 +00:00
Bruce Momjian be8a431881 Add GUC log_temp_files to log the use of temporary files.
Bill Moran
2007-01-09 21:31:17 +00:00
Bruce Momjian 29dccf5fe0 Update CVS HEAD for 2007 copyright. Back branches are typically not
back-stamped for this.
2007-01-05 22:20:05 +00:00
Tom Lane ef07221997 Clean up smgr.c/md.c APIs as per discussion a couple months ago. Instead of
having md.c return a success/failure boolean to smgr.c, which was just going
to elog anyway, let md.c issue the elog messages itself.  This allows better
error reporting, particularly in cases such as "short read" or "short write"
which Peter was complaining of.  Also, remove the kluge of allowing mdread()
to return zeroes from a read-beyond-EOF: this is now an error condition
except when InRecovery or zero_damaged_pages = true.  (Hash indexes used to
require that behavior, but no more.)  Also, enforce that mdwrite() is to be
used for rewriting existing blocks while mdextend() is to be used for
extending the relation EOF.  This restriction lets us get rid of the old
ad-hoc defense against creating huge files by an accidental reference to
a bogus block number: we'll only create new segments in mdextend() not
mdwrite() or mdread().  (Again, when InRecovery we allow it anyway, since
we need to allow updates of blocks that were later truncated away.)
Also, clean up the original makeshift patch for bug #2737: move the
responsibility for padding relation segments to full length into md.c.
2007-01-03 18:11:01 +00:00
Tom Lane 72619f8191 Modify local buffer management to request memory for local buffers in blocks
of increasing size, instead of one at a time.  This reduces the memory
management overhead when num_temp_buffers is large: in the previous coding
we would actually waste 50% of the space used for temp buffers, because aset.c
would round the individual requests up to 16K.  Problem noted while studying
a performance issue reported by Steven Flatt.

Back-patch as far as 8.1 --- older versions used few enough local buffers
that the issue isn't significant for them.
2006-12-27 22:31:54 +00:00
Peter Eisentraut 409600942b KB -> kB 2006-11-24 09:20:12 +00:00
Tom Lane 3ad0728c81 On systems that have setsid(2) (which should be just about everything except
Windows), arrange for each postmaster child process to be its own process
group leader, and deliver signals SIGINT, SIGTERM, SIGQUIT to the whole
process group not only the direct child process.  This provides saner behavior
for archive and recovery scripts; in particular, it's possible to shut down a
warm-standby recovery server using "pg_ctl stop -m immediate", since delivery
of SIGQUIT to the startup subprocess will result in killing the waiting
recovery_command.  Also, this makes Query Cancel and statement_timeout apply
to scripts being run from backends via system().  (There is no support in the
core backend for that, but it's widely done using untrusted PLs.)  Per gripe
from Stephen Harris and subsequent discussion.
2006-11-21 20:59:53 +00:00
Tom Lane 1a5c450f30 When truncating a relation in-place (eg during VACUUM), do not try to unlink
any no-longer-needed segments; just truncate them to zero bytes and leave
the files in place for possible future re-use.  This avoids problems when
the segments are re-used due to relation growth shortly after truncation.
Before, the bgwriter, and possibly other backends, could still be holding
open file references to the old segment files, and would write dirty blocks
into those files where they'd disappear from the view of other processes.

Back-patch as far as 8.0.  I believe the 7.x branches are not vulnerable,
because they had no bgwriter, and "blind" writes by other backends would
always be done via freshly-opened file references.
2006-11-20 01:07:56 +00:00
Tom Lane 36e012e727 Remove temporary Windows-specific debugging code; it seems the problem
with fopen() not using FILE_SHARE_DELETE was indeed the bug we were after,
given lack of recent reports.
2006-11-06 17:10:22 +00:00
Tom Lane 48188e1621 Fix recently-understood problems with handling of XID freezing, particularly
in PITR scenarios.  We now WAL-log the replacement of old XIDs with
FrozenTransactionId, so that such replacement is guaranteed to propagate to
PITR slave databases.  Also, rather than relying on hint-bit updates to be
preserved, pg_clog is not truncated until all instances of an XID are known to
have been replaced by FrozenTransactionId.  Add new GUC variables and
pg_autovacuum columns to allow management of the freezing policy, so that
users can trade off the size of pg_clog against the amount of freezing work
done.  Revise the already-existing code that forces autovacuum of tables
approaching the wraparound point to make it more bulletproof; also, revise the
autovacuum logic so that anti-wraparound vacuuming is done per-table rather
than per-database.  initdb forced because of changes in pg_class, pg_database,
and pg_autovacuum catalogs.  Heikki Linnakangas, Simon Riggs, and Tom Lane.
2006-11-05 22:42:10 +00:00
Tom Lane 954c1813ac Remove an unnecessary HOLD_INTERRUPTS/RESUME_INTERRUPTS pair.
This was required back when RESUME_INTERRUPTS could actually
execute ProcessInterrupts, but that hasn't been true since 2001...
2006-10-22 20:34:54 +00:00
Tom Lane e0dece127d Redesign the patch for allocation of shmem space and LWLocks for add-on
modules; the first try was not usable in EXEC_BACKEND builds (e.g.,
Windows).  Instead, just provide some entry points to increase the
allocation requests during postmaster start, and provide a dedicated
LWLock that can be used to synchronize allocation operations performed
by backends.  Per discussion with Marc Munro.
2006-10-15 22:04:08 +00:00
Bruce Momjian f99a569a2e pgindent run for 8.2. 2006-10-04 00:30:14 +00:00
Tom Lane c92f7e258e Replace strncpy with strlcpy in selected places that seem possibly relevant
to performance.  (A wholesale effort to get rid of strncpy should be
undertaken sometime, but not during beta.)  This commit also fixes dynahash.c
to correctly truncate overlength string keys for hashtables, so that its
callers don't have to anymore.
2006-09-27 18:40:10 +00:00
Tom Lane ffae5cc5a6 Add a check to prevent overwriting valid data if smgrnblocks() gives a
wrong answer, as has been seen to occur with a buggy Linux kernel.  Not
really our bug, but it's a simple test in a seldom-used control path,
so might as well have a defense.
2006-09-25 22:01:10 +00:00
Tom Lane d40d34863e Fix pg_locks view to call advisory locks advisory locks, while preserving
backward compatibility for anyone using the old userlock code that's now
on pgfoundry --- locks from that code still show as 'userlock'.
2006-09-22 23:20:14 +00:00
Tom Lane 9e936693a9 Fix free space map to correctly track the total amount of FSM space needed
even when a single relation requires more than max_fsm_pages pages.  Also,
make VACUUM emit a warning in this case, since it likely means that VACUUM
FULL or other drastic corrective measure is needed.  Per reports from Jeff
Frost and others of unexpected changes in the claimed max_fsm_pages need.
2006-09-21 20:31:22 +00:00
Tom Lane 9b4cda0df6 Add built-in userlock manipulation functions to replace the former
contrib functionality.  Along the way, remove the USER_LOCKS configuration
symbol, since it no longer makes any sense to try to compile that out.
No user documentation yet ... mmoncure has promised to write some.
Thanks to Abhijit Menon-Sen for creating a first draft to work from.
2006-09-18 22:40:40 +00:00
Tom Lane 2e5e856f6b Marginal cleanup in arrangements for ensuring StrategyHintVacuum is cleared
after an error during VACUUM.  We have a PG_TRY block anyway around the only
call sites, so just reset it in the CATCH clause instead of having
AtEOXact_Buffers blindly do it during xact end.  I think the old code was
actively wrong for the case of a failure during ANALYZE inside a
subtransaction --- the flag wouldn't get cleared until main transaction end.
Probably not worth back-patching though.
2006-09-17 22:16:22 +00:00
Bruce Momjian a0e87ad7a5 Specify lo_write() to take a _const_ buffer, to match documentation. 2006-09-07 15:37:25 +00:00
Tom Lane 8fad2e3ff4 Arrange for GetSnapshotData to copy live-subtransaction XIDs from the
PGPROC array into snapshots, and use this information to avoid visits
to pg_subtrans in HeapTupleSatisfiesSnapshot.  This appears to solve
the pg_subtrans-related context swap storm problem that's been reported
by several people for 8.1.  While at it, modify GetSnapshotData to not
take an exclusive lock on ProcArrayLock, as closer analysis shows that
shared lock is always sufficient.
Itagaki Takahiro and Tom Lane
2006-09-03 15:59:39 +00:00
Tom Lane e06fda0a8b Add a function GetLockConflicts() to lock.c to report xacts holding
locks that would conflict with a specified lock request, without
actually trying to get that lock.  Use this instead of the former ad hoc
method of doing the first wait step in CREATE INDEX CONCURRENTLY.
Fixes problem with undetected deadlock and in many cases will allow the
index creation to proceed sooner than it otherwise could've.  Per
discussion with Greg Stark.
2006-08-27 19:14:34 +00:00
Tom Lane e093dcdd28 Add the ability to create indexes 'concurrently', that is, without
blocking concurrent writes to the table.  Greg Stark, with a little help
from Tom Lane.
2006-08-25 04:06:58 +00:00
Tom Lane f836c2e37e Add some debug logging code to AllocateFile's failure path to log the
specific Windows error code (GetLastError).  This is a hopefully temporary
hack to try to diagnose rare failures.  Magnus Hagander
2006-08-24 03:15:43 +00:00
Tom Lane 9bf760f7de Add a 'waiting' column to pg_stat_activity to carry the same information
that ps_status provides by appending 'waiting' to the PS display.  This
completes the project of making it feasible to turn off process title
updates and instead rely on pg_stat_activity.  Per my suggestion a few
weeks ago.
2006-08-19 01:36:34 +00:00
Tom Lane 7aa772f03e Now that we've rearranged relation open to get a lock before touching
the rel, it's easy to get rid of the narrow race-condition window that
used to exist in VACUUM and CLUSTER.  Did some minor code-beautification
work in the same area, too.
2006-08-18 16:09:13 +00:00
Tom Lane 2dd7ab0627 Put back another improperly-removed #include. 2006-08-07 21:56:25 +00:00
Tom Lane 3467758809 Fix missing 'static' keywords --- some compilers gripe about this. 2006-08-04 16:42:56 +00:00
Bruce Momjian 2c6d96cef6 Add support for loadable modules to allocated shared memory and
lightweight locks.

Marc Munro
2006-08-01 19:03:11 +00:00
Tom Lane 09d3670df3 Change the relation_open protocol so that we obtain lock on a relation
(table or index) before trying to open its relcache entry.  This fixes
race conditions in which someone else commits a change to the relation's
catalog entries while we are in process of doing relcache load.  Problems
of that ilk have been reported sporadically for years, but it was not
really practical to fix until recently --- for instance, the recent
addition of WAL-log support for in-place updates helped.

Along the way, remove pg_am.amconcurrent: all AMs are now expected to support
concurrent update.
2006-07-31 20:09:10 +00:00
Tom Lane 8822263635 Fix a couple of comments. 2006-07-30 20:17:11 +00:00
Alvaro Herrera 92c2ecc130 Modify snapshot definition so that lazy vacuums are ignored by other
vacuums.  This allows a OLTP-like system with big tables to continue
regular vacuuming on small-but-frequently-updated tables while the
big tables are being vacuumed.

Original patch from Hannu Krossing, rewritten by Tom Lane and updated
by me.
2006-07-30 02:07:18 +00:00
Peter Eisentraut e9b4969062 DTrace support, with a small initial set of probes
by Robert Lor
2006-07-24 16:32:45 +00:00
Tom Lane a794fb0681 Convert the lock manager to use the new dynahash.c support for partitioned
hash tables, instead of the previous kluge involving multiple hash tables.
This partially undoes my patch of last December.
2006-07-23 23:08:46 +00:00
Tom Lane b25dc481c8 Fix oversight in sizing of shared buffer lookup hashtable. Because
BufferAlloc tries to insert a new mapping entry before deleting the old one
for a buffer, we have a transient need for more than NBuffers entries ---
one more in 8.1, and as many as NUM_BUFFER_PARTITIONS more in CVS HEAD.
In theory this could lead to an "out of shared memory" failure if shmem
had already been completely claimed by the time the extra entries were
needed.
2006-07-23 18:34:45 +00:00
Tom Lane 10b9ca3d05 Split the buffer mapping table into multiple separately lockable
partitions, as per discussion.  Passes functionality checks, but
I don't have any performance data yet.
2006-07-23 03:07:58 +00:00
Tom Lane 51ee9fa157 Add support to dynahash.c for partitioning shared hashtables according
to the low-order bits of the entry hash value.  Also make some incidental
cleanups in the dynahash API, such as not exporting the hash header
structs to the world.
2006-07-22 23:04:39 +00:00
Tom Lane c0e9b3139f Hmm, seems --disable-spinlocks has been broken for awhile and nobody
noticed.  Fix SpinlockSemas() to report the correct count considering
that PG 8.1 adds a spinlock to each shared-buffer header.
2006-07-22 21:04:40 +00:00
Tom Lane 3ff58b48c9 Put back another not-so-unnecessary #include, per report from Hiroshi Saito. 2006-07-16 01:05:23 +00:00
Tom Lane daecd97617 Put back some more not-so-unused-as-all-that #includes. This un-breaks
the EXEC_BACKEND code on my machines, so hopefully it will fix the
Windows buildfarm members.
2006-07-15 15:47:17 +00:00
Tom Lane cd24163f6d Fix another passel of include-file breakage. Kris Jurka, Tom Lane 2006-07-14 16:59:19 +00:00
Bruce Momjian e0522505bd Remove 576 references of include files that were not needed. 2006-07-14 14:52:27 +00:00
Tom Lane ae643747b1 Fix a passel of recently-committed violations of the rule 'thou shalt
have no other gods before c.h'.  Also remove some demonstrably redundant
#include lines, mostly of <errno.h> which was added to c.h years ago.
2006-07-14 05:28:29 +00:00
Bruce Momjian a22d76d96a Allow include files to compile own their own.
Strip unused include files out unused include files, and add needed
includes to C files.

The next step is to remove unused include files in C files.
2006-07-13 16:49:20 +00:00
Bruce Momjian 370a709c75 Add GUC update_process_title to control whether 'ps' display is updated
for every command, default to on.
2006-06-27 22:16:44 +00:00
Tom Lane 27c3e3de09 Remove redundant gettimeofday() calls to the extent practical without
changing semantics too much.  statement_timestamp is now set immediately
upon receipt of a client command message, and the various places that used
to do their own gettimeofday() calls to mark command startup are referenced
to that instead.  I have also made stats_command_string use that same
value for pg_stat_activity.query_start for both the command itself and
its eventual replacement by <IDLE> or <idle in transaction>.  There was
some debate about that, but no argument that seemed convincing enough to
justify an extra gettimeofday() call.
2006-06-20 22:52:00 +00:00
Tom Lane b13c9686d0 Take the statistics collector out of the loop for monitoring backends'
current commands; instead, store current-status information in shared
memory.  This substantially reduces the overhead of stats_command_string
and also ensures that pg_stat_activity is fully up to date at all times.
Per my recent proposal.
2006-06-19 01:51:22 +00:00
Tom Lane 8ff80c1bd3 Remove obsolete comment about VACUUM FULL: it takes buffer content locks
now, and must do so to ensure bgwriter doesn't write a page that is in
process of being compacted.
2006-06-08 14:58:33 +00:00
Bruce Momjian 26cfefabad Fix printf mask for SizeVfdCache
Qingqing Zhou
2006-05-30 13:04:59 +00:00
Tom Lane 2246e31775 Upon closer inspection, the sparc code in s_lock.c is dead code, and
always has been, because it's not got any .globl declaration!  We've
been relying on the solaris_sparc.s code instead.  Rip it out.
(Not back-patched, since this is just cosmetic cleanup.)
2006-05-12 16:50:52 +00:00
Tom Lane ab1ad7a653 Remove unnecessary .seg/.section directives, per Alan Stange. 2006-05-11 21:58:22 +00:00
Tom Lane 5749f6ef0c Rewrite btree vacuuming to fold the former bulkdelete and cleanup operations
into a single mostly-physical-order scan of the index.  This requires some
ticklish interlocking considerations, but should create no material
performance impact on normal index operations (at least given the
already-committed changes to make scans work a page at a time).  VACUUM
itself should get significantly faster in any index that's degenerated to a
very nonlinear page order.  Also, we save one pass over the index entirely,
except in the case where there were no deletions to do and so only one pass
happened anyway.

Original patch by Heikki Linnakangas, rework by Tom Lane.
2006-05-08 00:00:17 +00:00
Tom Lane 52667d56a3 Rethink the locking mechanisms used for CREATE/DROP/RENAME DATABASE.
The former approach used ExclusiveLock on pg_database, which being a
cluster-wide lock meant only one of these operations could proceed at
a time; worse, it also blocked all incoming connections in ReverifyMyDatabase.
Now that we have LockSharedObject(), we can use locks of different types
applied to databases considered as objects.  This allows much more
flexible management of the interlocking: two CREATE DATABASEs need not
block each other, and need not block connections except to the template
database being used.  Similarly DROP DATABASE doesn't block unrelated
operations.  The locking used in flatfiles.c is also much narrower in
scope than before.  Per recent proposal.
2006-05-04 16:07:29 +00:00
Bruce Momjian a1ee621589 Fix s_lock_test to use tas.o file, if needed. 2006-04-28 22:54:31 +00:00
Tom Lane 486f994be7 Revise large-object access routines to avoid running with CurrentMemoryContext
set to the large object context ("fscxt"), as this is inevitably a source of
transaction-duration memory leaks.  Not sure why we'd not noticed it before;
maybe people weren't touching a whole lot of LOs in the same transaction
before the 8.1 pg_dump changes.  Per report from Wayne Conrad.

Backpatched as far as 8.1, but the problem doubtless goes all the way back.
I'm disinclined to spend the time to try to verify that the older branches
would still work if patched, seeing that this code was significantly modified
for 8.0 and again for 8.1, and that we don't have any trouble reports before
8.1.  (Maybe the leaks were smaller before?)
2006-04-26 00:34:57 +00:00
Tom Lane b5498a26de Add some optional code (conditionally compiled under #ifdef LWLOCK_STATS)
to track the number of LWLock acquisitions and the number of times we
block waiting for an LWLock, on a per-process basis.  After having needed
this twice in the past few months, seems like it should go into CVS.
2006-04-21 16:45:12 +00:00
Tom Lane defe93463c Make the world safe for full_page_writes. Allow XLOG records that try to
update no-longer-existing pages to fall through as no-ops, but make a note
of each page number referenced by such records.  If we don't see a later
XLOG entry dropping the table or truncating away the page, complain at
the end of XLOG replay.  Since this fixes the known failure mode for
full_page_writes = off, revert my previous band-aid patch that disabled
that GUC variable.
2006-04-14 20:27:24 +00:00
Tom Lane 0fcc3c2f1d Repair a low-probability race condition identified by Qingqing Zhou.
If a process abandons a wait in LockBufferForCleanup (in practice,
only happens if someone cancels a VACUUM) just before someone else
sends it a signal indicating the buffer is available, it was possible
for the wakeup to remain in the process' semaphore, causing misbehavior
next time the process waited for an lmgr lock.  Rather than try to
prevent the race condition directly, it seems best to make the lock
manager robust against leftover wakeups, by having it repeat waiting
on the semaphore if the lock has not actually been granted or denied
yet.
2006-04-14 03:38:56 +00:00
Tom Lane a8b8f4db23 Clean up WAL/buffer interactions as per my recent proposal. Get rid of the
misleadingly-named WriteBuffer routine, and instead require routines that
change buffer pages to call MarkBufferDirty (which does exactly what it says).
We also require that they do so before calling XLogInsert; this takes care of
the synchronization requirement documented in SyncOneBuffer.  Note that
because bufmgr takes the buffer content lock (in shared mode) while writing
out any buffer, it doesn't matter whether MarkBufferDirty is executed before
the buffer content change is complete, so long as the content change is
completed before releasing exclusive lock on the buffer.  So it's OK to set
the dirtybit before we fill in the LSN.
This eliminates the former kluge of needing to set the dirtybit in LockBuffer.
Aside from making the code more transparent, we can also add some new
debugging assertions, in particular that the caller of MarkBufferDirty must
hold the buffer content lock, not merely a pin.
2006-03-31 23:32:07 +00:00
Tom Lane 4243f2387a Suppress attempts to report dropped tables to the stats collector from a
startup or recovery process.  Since such a process isn't a real backend,
pgstat.c gets confused.  This accounts for recent reports of strange
"invalid server process ID -1" log messages during crash recovery.
There isn't any point in attempting to make the report, since we'll discard
stats in such scenarios anyhow.
2006-03-30 22:11:55 +00:00
Tom Lane 6d61cdec07 Clean up and document the API for XLogOpenRelation and XLogReadBuffer.
This commit doesn't make much functional change, but it does eliminate some
duplicated code --- for instance, PageIsNew tests are now done inside
XLogReadBuffer rather than by each caller.
The GIST xlog code still needs a lot of love, but I'll worry about that
separately.
2006-03-29 21:17:39 +00:00
Tom Lane 0a20207060 Arrange to emit a description of the current XLOG record as error context
when an error occurs during xlog replay.  Also, replace the former risky
'write into a fixed-size buffer with no overflow detection' API for XLOG
record description routines; use an expansible StringInfo instead.  (The
latter accounts for most of the patch bulk.)

Qingqing Zhou
2006-03-24 04:32:13 +00:00
Bruce Momjian f2f5b05655 Update copyright for 2006. Update scripts. 2006-03-05 15:59:11 +00:00
Tom Lane 60d3c9fdf4 Declare the arguments of AllocateFile() as const char *, not char *.
This is consistent with the standard definition of fopen().
2006-03-04 21:32:47 +00:00
Tom Lane 9a506a6257 Arrange to call AbsorbFsyncRequests every so often while performing a
checkpoint in the bgwriter.  This forestalls overflow of the fsync request
queue, which is not fatal but causes considerable performance degradation
when it occurs (because backends then have to do their own fsyncs).  Per
patch from Itagaki Takahiro, modified a little bit by me.
2006-03-03 00:02:02 +00:00
Bruce Momjian d5dd3d451e Add contrib/pg_freespacemap to display free space map information.
Mark Kirkwood
2006-02-12 03:55:53 +00:00
Bruce Momjian 59bb147353 Update random() usage so ranges are inclusive/exclusive as required. 2006-02-03 12:45:47 +00:00
Tom Lane d5db3abfb6 Modify pgstats code to reduce performance penalties from oversized stats data
files: avoid creating stats hashtable entries for tables that aren't being
touched except by vacuum/analyze, ensure that entries for dropped tables are
removed promptly, and tweak the data layout to avoid storing useless struct
padding.  Also improve the performance of pgstat_vacuum_tabstat(), and make
sure that autovacuum invokes it exactly once per autovac cycle rather than
multiple times or not at all.  This should cure recent complaints about 8.1
showing much higher stats I/O volume than was seen in 8.0.  It'd still be a
good idea to revisit the design with an eye to not re-writing the entire
stats dataset every half second ... but that would be too much to backpatch,
I fear.
2006-01-18 20:35:06 +00:00
Tom Lane 558bc2584d Fix fsync code to test whether F_FULLFSYNC is available, instead of
assuming it always is on Darwin.  Per report from Neil Brandt.
2006-01-17 23:52:31 +00:00
Tom Lane 39fc1fb07a Remove logic in XactLockTableWait() that attempted to mark a crashed
transaction as aborted.  Since we only call XactLockTableWait on XIDs
that we believe to be currently running, the odds of this code ever
actually firing are minimal.  It's certainly unnecessary, since a
transaction that's not either running or committed will be presumed
aborted anyway.  What's more, it's not hard to imagine scenarios where
this could result in corrupting pg_clog: for instance, if a bogus XID
somehow got passed to XactLockTableWait.  I think the code probably
dates from the ancient era when we didn't have TransactionIdIsInProgress;
back then it may have been necessary, but now I think it's a waste of
cycles and potentially dangerous.  Per discussion with Qingqing Zhou
and Karsten Hilbert.
2006-01-13 21:32:12 +00:00
Tom Lane 304160c3e2 Fix ReadBuffer() to correctly handle the case where it's trying to extend
the relation but it finds a pre-existing valid buffer.  The buffer does not
correspond to any page known to the kernel, so we *must* do smgrextend to
ensure that the space becomes allocated.  The 7.x branches all do this
correctly, but the corner case got lost somewhere during 8.0 bufmgr rewrites.
(My fault no doubt :-( ... I think I assumed that such a buffer must be
not-BM_VALID, which is not so.)
2006-01-06 00:04:20 +00:00
Bruce Momjian 44f9021223 Remove BEOS port. 2006-01-05 03:01:38 +00:00
Tom Lane 349f40b2c2 Rearrange backend startup sequence so that ShmemIndexLock can become
an LWLock instead of a spinlock.  This hardly matters on Unix machines
but should improve startup performance on Windows (or any port using
EXEC_BACKEND).  Per previous discussion.
2006-01-04 21:06:32 +00:00
Tom Lane 195f164228 Get rid of the SpinLockAcquire/SpinLockAcquire_NoHoldoff distinction
in favor of having just one set of macros that don't do HOLD/RESUME_INTERRUPTS
(hence, these correspond to the old SpinLockAcquire_NoHoldoff case).
Given our coding rules for spinlock use, there is no reason to allow
CHECK_FOR_INTERRUPTS to be done while holding a spinlock, and also there
is no situation where ImmediateInterruptOK will be true while holding a
spinlock.  Therefore doing HOLD/RESUME_INTERRUPTS while taking/releasing a
spinlock is just a waste of cycles.  Qingqing Zhou and Tom Lane.
2005-12-29 18:08:05 +00:00