Commit Graph

1476 Commits

Author SHA1 Message Date
Tom Lane
24ee8af573 Rework temp_tablespaces patch so that temp tablespaces are assigned separately
for each temp file, rather than once per sort or hashjoin; this allows
spreading the data of a large sort or join across multiple tablespaces.
(I remain dubious that this will make any difference in practice, but certain
people insisted.)  Arrange to cache the results of parsing the GUC variable
instead of recomputing from scratch on every demand, and push usage of the
cache down to the bottommost fd.c level.
2007-06-07 19:19:57 +00:00
Tom Lane
acfce502ba Create a GUC parameter temp_tablespaces that allows selection of the
tablespace(s) in which to store temp tables and temporary files.  This is a
list to allow spreading the load across multiple tablespaces (a random list
element is chosen each time a temp object is to be created).  Temp files are
not stored in per-database pgsql_tmp/ directories anymore, but per-tablespace
directories.

Jaime Casanova and Albert Cervera, with review by Bernd Helmle and Tom Lane.
2007-06-03 17:08:34 +00:00
Tom Lane
964ec46cfe Fix aboriginal bug in BufFileDumpBuffer that would cause it to write the
wrong data when dumping a bufferload that crosses a component-file boundary.
This probably has not been seen in the wild because (a) component files are
normally 1GB apiece and (b) non-block-aligned buffer usage is relatively
rare.  But it's fairly easy to reproduce a problem if one reduces RELSEG_SIZE
in a test build.  Kudos to Kurt Harriman for spotting the bug.
2007-06-01 23:43:11 +00:00
Tom Lane
bd0a260928 Make CREATE/DROP/RENAME DATABASE wait a little bit to see if other backends
will exit before failing because of conflicting DB usage.  Per discussion,
this seems a good idea to help mask the fact that backend exit takes nonzero
time.  Remove a couple of thereby-obsoleted sleeps in contrib and PL
regression test sequences.
2007-06-01 19:38:07 +00:00
Tom Lane
d526575f89 Make large sequential scans and VACUUMs work in a limited-size "ring" of
buffers, rather than blowing out the whole shared-buffer arena.  Aside from
avoiding cache spoliation, this fixes the problem that VACUUM formerly tended
to cause a WAL flush for every page it modified, because we had it hacked to
use only a single buffer.  Those flushes will now occur only once per
ring-ful.  The exact ring size, and the threshold for seqscans to switch into
the ring usage pattern, remain under debate; but the infrastructure seems
done.  The key bit of infrastructure is a new optional BufferAccessStrategy
object that can be passed to ReadBuffer operations; this replaces the former
StrategyHintVacuum API.

This patch also changes the buffer usage-count methodology a bit: we now
advance usage_count when first pinning a buffer, rather than when last
unpinning it.  To preserve the behavior that a buffer's lifetime starts to
decrease when it's released, the clock sweep code is modified to not decrement
usage_count of pinned buffers.

Work not done in this commit: teach GiST and GIN indexes to use the vacuum
BufferAccessStrategy for vacuum-driven fetches.

Original patch by Simon, reworked by Heikki and again by Tom.
2007-05-30 20:12:03 +00:00
Tom Lane
77947c51c0 Fix up pgstats counting of live and dead tuples to recognize that committed
and aborted transactions have different effects; also teach it not to assume
that prepared transactions are always committed.

Along the way, simplify the pgstats API by tying counting directly to
Relations; I cannot detect any redeeming social value in having stats
pointers in HeapScanDesc and IndexScanDesc structures.  And fix a few
corner cases in which counts might be missed because the relation's
pgstat_info pointer hadn't been set.
2007-05-27 03:50:39 +00:00
Tom Lane
63735ca815 Dept. of second thoughts: add comments cautioning against using
ReadOrZeroBuffer to fetch pages from beyond physical EOF.  This would
usually work, but would cause problems for md.c if writes occurred
beyond a segment boundary when the previous segment file hadn't been
fully extended.
2007-05-02 23:34:48 +00:00
Tom Lane
8c3cc86e7b During WAL recovery, when reading a page that we intend to overwrite completely
from the WAL data, don't bother to physically read it; just have bufmgr.c
return a zeroed-out buffer instead.  This speeds recovery significantly,
and also avoids unnecessary failures when a page-to-be-overwritten has corrupt
page headers on disk.  This replaces a former kluge that accomplished the
latter by pretending zero_damaged_pages was always ON during WAL recovery;
which was OK when the kluge was put in, but is unsafe when restoring a WAL
log that was written with full_page_writes off.

Heikki Linnakangas
2007-05-02 23:18:03 +00:00
Bruce Momjian
1c8302cab3 Add comment on why deadlock detection error messages only prints numbers. 2007-04-20 20:15:52 +00:00
Alvaro Herrera
e2a186b03c Add a multi-worker capability to autovacuum. This allows multiple worker
processes to be running simultaneously.  Also, now autovacuum processes do not
count towards the max_connections limit; they are counted separately from
regular processes, and are limited by the new GUC variable
autovacuum_max_workers.

The launcher now has intelligence to launch workers on each database every
autovacuum_naptime seconds, limited only on the max amount of worker slots
available.

Also, the global worker I/O utilization is limited by the vacuum cost-based
delay feature.  Workers are "balanced" so that the total I/O consumption does
not exceed the established limit.  This part of the patch was contributed by
ITAGAKI Takahiro.

Per discussion.
2007-04-16 18:30:04 +00:00
Tom Lane
995ba280c1 Rearrange mdsync() looping logic to avoid the problem that a sufficiently
fast flow of new fsync requests can prevent mdsync() from ever completing.
This was an unforeseen consequence of a patch added in Mar 2006 to prevent
the fsync request queue from overflowing.  Problem identified by Heikki
Linnakangas and independently by ITAGAKI Takahiro; fix based on ideas from
Takahiro-san, Heikki, and Tom.

Back-patch as far as 8.1 because a previous back-patch introduced the problem
into 8.1 ...
2007-04-12 17:10:55 +00:00
Tom Lane
3e23b68dac Support varlena fields with single-byte headers and unaligned storage.
This commit breaks any code that assumes that the mere act of forming a tuple
(without writing it to disk) does not "toast" any fields.  While all available
regression tests pass, I'm not totally sure that we've fixed every nook and
cranny, especially in contrib.

Greg Stark with some help from Tom Lane
2007-04-06 04:21:44 +00:00
Tom Lane
9c9b619473 Remove the CheckpointStartLock in favor of having backends show whether they
are in their commit critical sections via flags in the ProcArray.  Checkpoint
can watch the ProcArray to determine when it's safe to proceed.  This is
a considerably better solution to the original problem of race conditions
between checkpoint and transaction commit: it speeds up commit, since there's
one less lock to fool with, and it prevents the problem of checkpoint being
delayed indefinitely when there's a constant flow of commits.  Heikki, with
some kibitzing from Tom.
2007-04-03 16:34:36 +00:00
Magnus Hagander
335feca441 Add some instrumentation to the bgwriter, through the stats collector.
New view pg_stat_bgwriter, and the functions required to build it.
2007-03-30 18:34:56 +00:00
Tom Lane
e85a01df67 Clean up the representation of special snapshots by including a "method
pointer" in every Snapshot struct.  This allows removal of the case-by-case
tests in HeapTupleSatisfiesVisibility, which should make it a bit faster
(I didn't try any performance tests though).  More importantly, we are no
longer violating portable C practices by assuming that small integers are
distinct from all pointer values, and HeapTupleSatisfiesDirty no longer
has a non-reentrant API involving side-effects on a global variable.

There were a couple of places calling HeapTupleSatisfiesXXX routines
directly rather than through the HeapTupleSatisfiesVisibility macro.
Since these places had to be changed anyway, I chose to make them go
through the macro for uniformity.

Along the way I renamed HeapTupleSatisfiesSnapshot to HeapTupleSatisfiesMVCC
to emphasize that it's only used with MVCC-type snapshots.  I was sorely
tempted to rename HeapTupleSatisfiesVisibility to HeapTupleSatisfiesSnapshot,
but forebore for the moment to avoid confusion and reduce the likelihood that
this patch breaks some of the pending patches.  Might want to reconsider
doing that later.
2007-03-25 19:45:14 +00:00
Bruce Momjian
1e2bfb5811 Cleanup for procarray.c. 2007-03-23 03:16:39 +00:00
Alvaro Herrera
626eb02198 Cleanup the bootstrap code a little, and rename "dummy procs" in the code
comments and variables to "auxiliary proc", per Heikki's request.
2007-03-07 13:35:03 +00:00
Bruce Momjian
a535cdf130 Revert temp_tablespaces because of coding problems, per Tom. 2007-03-06 02:06:15 +00:00
Bruce Momjian
0763a56501 Add lo_truncate() to backend and libpq for large object truncation.
Kris Jurka
2007-03-03 19:52:47 +00:00
Neil Conway
90d76525c5 Add resetStringInfo(), which clears the content of a StringInfo, and
fixup various places in the tree that were clearing a StringInfo by hand.
Making this function a part of the API simplifies client code slightly,
and avoids needlessly peeking inside the StringInfo interface.
2007-03-03 19:32:55 +00:00
Bruce Momjian
e52c4a6e26 Add GUC log_lock_waits to log long wait times.
Simon Riggs
2007-03-03 18:46:40 +00:00
Tom Lane
fb276438b6 Suppress useless searches for unused line pointers in PageAddItem. To do
this, add a 16-bit "flags" field to page headers by stealing some bits from
pd_tli.  We use one flag bit as a hint to indicate whether there are any
unused line pointers; the remaining 15 are available for future use.

This is a cut-down form of an idea proposed by Hiroki Kataoka in July 2005.
At the time it was rejected because the original patch increased the size of
page headers and it wasn't clear that the benefit outweighed the distributed
cost.  The flag-bit approach gets most of the benefit without requiring an
increase in the page header size.

Heikki Linnakangas and Tom Lane
2007-03-02 00:48:44 +00:00
Magnus Hagander
2c6feff5e7 Remove temporary Windows-specific debugging code. 2007-02-28 15:59:30 +00:00
Tom Lane
234a02b2a8 Replace direct assignments to VARATT_SIZEP(x) with SET_VARSIZE(x, len).
Get rid of VARATT_SIZE and VARATT_DATA, which were simply redundant with
VARSIZE and VARDATA, and as a consequence almost no code was using the
longer names.  Rename the length fields of struct varlena and various
derived structures to catch anyplace that was accessing them directly;
and clean up various places so caught.  In itself this patch doesn't
change any behavior at all, but it is necessary infrastructure if we hope
to play any games with the representation of varlena headers.
Greg Stark and Tom Lane
2007-02-27 23:48:10 +00:00
Bruce Momjian
6f519ad01c btree source code cleanups:
I refactored findsplitloc and checksplitloc so that the division of
labor is more clear IMO. I pushed all the space calculation inside the
loop to checksplitloc.

I also fixed the off by 4 in free space calculation caused by
PageGetFreeSpace subtracting sizeof(ItemIdData), even though it was
harmless, because it was distracting and I felt it might come back to
bite us in the future if we change the page layout or alignments.
There's now a new function PageGetExactFreeSpace that doesn't do the
subtraction.

findsplitloc now tries the "just the new item to right page" split as
well. If people don't like the refactoring, I can write a patch to just
add that.

Heikki Linnakangas
2007-02-21 20:02:17 +00:00
Bruce Momjian
6765df9174 Add configure --enable-profiling to enable GCC profiling. Patches from
Korry Douglas and Nikhil S
2007-02-21 15:12:39 +00:00
Alvaro Herrera
1820650934 Restructure autovacuum in two processes: a dummy process, which runs
continuously, and requests vacuum runs of "autovacuum workers" to postmaster.
The workers do the actual vacuum work.  This allows for future improvements,
like allowing multiple autovacuum jobs running in parallel.

For now, the code keeps the original behavior of having a single autovac
process at any time by sleeping until the previous worker has finished.
2007-02-15 23:23:23 +00:00
Peter Eisentraut
c138b966d4 Replace useless uses of := by = in makefiles. 2007-02-09 15:56:00 +00:00
Bruce Momjian
8b4ff8b6a1 Wording cleanup for error messages. Also change can't -> cannot.
Standard English uses "may", "can", and "might" in different ways:

        may - permission, "You may borrow my rake."

        can - ability, "I can lift that log."

        might - possibility, "It might rain today."

Unfortunately, in conversational English, their use is often mixed, as
in, "You may use this variable to do X", when in fact, "can" is a better
choice.  Similarly, "It may crash" is better stated, "It might crash".
2007-02-01 19:10:30 +00:00
Bruce Momjian
148ea5cbea Add GUC temp_tablespaces to provide a default location for temporary
objects.

Jaime Casanova
2007-01-25 04:35:11 +00:00
Peter Eisentraut
2cc01004c6 Remove remains of old depend target. 2007-01-20 17:16:17 +00:00
Tom Lane
eddbf39756 Extend yesterday's patch so that the bgwriter is also told to forget
pending fsyncs during DROP DATABASE.  Obviously necessary in hindsight :-(
2007-01-17 16:25:01 +00:00
Tom Lane
6d660587f6 Revise bgwriter fsync-request mechanism to improve robustness when a table
is deleted.  A backend about to unlink a file now sends a "revoke fsync"
request to the bgwriter to make it clean out pending fsync requests.  There
is still a race condition where the bgwriter may try to fsync after the unlink
has happened, but we can resolve that by rechecking the fsync request queue
to see if a revoke request arrived meanwhile.  This eliminates the former
kluge of "just assuming" that an ENOENT failure is okay, and lets us handle
the fact that on Windows it might be EACCES too without introducing any
questionable assumptions.  After an idea of mine improved by Magnus.

The HEAD patch doesn't apply cleanly to 8.2, but I'll see about a back-port
later.  In the meantime this could do with some testing on Windows; I've been
able to force it through the code path via ENOENT, but that doesn't prove that
it actually fixes the Windows problem ...
2007-01-17 00:17:21 +00:00
Alvaro Herrera
eb63cc3da8 Arrange for autovacuum to be killed when another operation wants to be alone
accessing it, like DROP DATABASE.  This allows the regression tests to pass
with autovacuum enabled, which open the gates for finally enabling autovacuum
by default.
2007-01-16 13:28:57 +00:00
Bruce Momjian
d64995aa89 Remove trace macro call from new log_temp_files, until it gets more
research.
2007-01-09 22:03:51 +00:00
Bruce Momjian
be8a431881 Add GUC log_temp_files to log the use of temporary files.
Bill Moran
2007-01-09 21:31:17 +00:00
Bruce Momjian
29dccf5fe0 Update CVS HEAD for 2007 copyright. Back branches are typically not
back-stamped for this.
2007-01-05 22:20:05 +00:00
Tom Lane
ef07221997 Clean up smgr.c/md.c APIs as per discussion a couple months ago. Instead of
having md.c return a success/failure boolean to smgr.c, which was just going
to elog anyway, let md.c issue the elog messages itself.  This allows better
error reporting, particularly in cases such as "short read" or "short write"
which Peter was complaining of.  Also, remove the kluge of allowing mdread()
to return zeroes from a read-beyond-EOF: this is now an error condition
except when InRecovery or zero_damaged_pages = true.  (Hash indexes used to
require that behavior, but no more.)  Also, enforce that mdwrite() is to be
used for rewriting existing blocks while mdextend() is to be used for
extending the relation EOF.  This restriction lets us get rid of the old
ad-hoc defense against creating huge files by an accidental reference to
a bogus block number: we'll only create new segments in mdextend() not
mdwrite() or mdread().  (Again, when InRecovery we allow it anyway, since
we need to allow updates of blocks that were later truncated away.)
Also, clean up the original makeshift patch for bug #2737: move the
responsibility for padding relation segments to full length into md.c.
2007-01-03 18:11:01 +00:00
Tom Lane
72619f8191 Modify local buffer management to request memory for local buffers in blocks
of increasing size, instead of one at a time.  This reduces the memory
management overhead when num_temp_buffers is large: in the previous coding
we would actually waste 50% of the space used for temp buffers, because aset.c
would round the individual requests up to 16K.  Problem noted while studying
a performance issue reported by Steven Flatt.

Back-patch as far as 8.1 --- older versions used few enough local buffers
that the issue isn't significant for them.
2006-12-27 22:31:54 +00:00
Peter Eisentraut
409600942b KB -> kB 2006-11-24 09:20:12 +00:00
Tom Lane
3ad0728c81 On systems that have setsid(2) (which should be just about everything except
Windows), arrange for each postmaster child process to be its own process
group leader, and deliver signals SIGINT, SIGTERM, SIGQUIT to the whole
process group not only the direct child process.  This provides saner behavior
for archive and recovery scripts; in particular, it's possible to shut down a
warm-standby recovery server using "pg_ctl stop -m immediate", since delivery
of SIGQUIT to the startup subprocess will result in killing the waiting
recovery_command.  Also, this makes Query Cancel and statement_timeout apply
to scripts being run from backends via system().  (There is no support in the
core backend for that, but it's widely done using untrusted PLs.)  Per gripe
from Stephen Harris and subsequent discussion.
2006-11-21 20:59:53 +00:00
Tom Lane
1a5c450f30 When truncating a relation in-place (eg during VACUUM), do not try to unlink
any no-longer-needed segments; just truncate them to zero bytes and leave
the files in place for possible future re-use.  This avoids problems when
the segments are re-used due to relation growth shortly after truncation.
Before, the bgwriter, and possibly other backends, could still be holding
open file references to the old segment files, and would write dirty blocks
into those files where they'd disappear from the view of other processes.

Back-patch as far as 8.0.  I believe the 7.x branches are not vulnerable,
because they had no bgwriter, and "blind" writes by other backends would
always be done via freshly-opened file references.
2006-11-20 01:07:56 +00:00
Tom Lane
36e012e727 Remove temporary Windows-specific debugging code; it seems the problem
with fopen() not using FILE_SHARE_DELETE was indeed the bug we were after,
given lack of recent reports.
2006-11-06 17:10:22 +00:00
Tom Lane
48188e1621 Fix recently-understood problems with handling of XID freezing, particularly
in PITR scenarios.  We now WAL-log the replacement of old XIDs with
FrozenTransactionId, so that such replacement is guaranteed to propagate to
PITR slave databases.  Also, rather than relying on hint-bit updates to be
preserved, pg_clog is not truncated until all instances of an XID are known to
have been replaced by FrozenTransactionId.  Add new GUC variables and
pg_autovacuum columns to allow management of the freezing policy, so that
users can trade off the size of pg_clog against the amount of freezing work
done.  Revise the already-existing code that forces autovacuum of tables
approaching the wraparound point to make it more bulletproof; also, revise the
autovacuum logic so that anti-wraparound vacuuming is done per-table rather
than per-database.  initdb forced because of changes in pg_class, pg_database,
and pg_autovacuum catalogs.  Heikki Linnakangas, Simon Riggs, and Tom Lane.
2006-11-05 22:42:10 +00:00
Tom Lane
954c1813ac Remove an unnecessary HOLD_INTERRUPTS/RESUME_INTERRUPTS pair.
This was required back when RESUME_INTERRUPTS could actually
execute ProcessInterrupts, but that hasn't been true since 2001...
2006-10-22 20:34:54 +00:00
Tom Lane
e0dece127d Redesign the patch for allocation of shmem space and LWLocks for add-on
modules; the first try was not usable in EXEC_BACKEND builds (e.g.,
Windows).  Instead, just provide some entry points to increase the
allocation requests during postmaster start, and provide a dedicated
LWLock that can be used to synchronize allocation operations performed
by backends.  Per discussion with Marc Munro.
2006-10-15 22:04:08 +00:00
Bruce Momjian
f99a569a2e pgindent run for 8.2. 2006-10-04 00:30:14 +00:00
Tom Lane
c92f7e258e Replace strncpy with strlcpy in selected places that seem possibly relevant
to performance.  (A wholesale effort to get rid of strncpy should be
undertaken sometime, but not during beta.)  This commit also fixes dynahash.c
to correctly truncate overlength string keys for hashtables, so that its
callers don't have to anymore.
2006-09-27 18:40:10 +00:00
Tom Lane
ffae5cc5a6 Add a check to prevent overwriting valid data if smgrnblocks() gives a
wrong answer, as has been seen to occur with a buggy Linux kernel.  Not
really our bug, but it's a simple test in a seldom-used control path,
so might as well have a defense.
2006-09-25 22:01:10 +00:00
Tom Lane
d40d34863e Fix pg_locks view to call advisory locks advisory locks, while preserving
backward compatibility for anyone using the old userlock code that's now
on pgfoundry --- locks from that code still show as 'userlock'.
2006-09-22 23:20:14 +00:00
Tom Lane
9e936693a9 Fix free space map to correctly track the total amount of FSM space needed
even when a single relation requires more than max_fsm_pages pages.  Also,
make VACUUM emit a warning in this case, since it likely means that VACUUM
FULL or other drastic corrective measure is needed.  Per reports from Jeff
Frost and others of unexpected changes in the claimed max_fsm_pages need.
2006-09-21 20:31:22 +00:00
Tom Lane
9b4cda0df6 Add built-in userlock manipulation functions to replace the former
contrib functionality.  Along the way, remove the USER_LOCKS configuration
symbol, since it no longer makes any sense to try to compile that out.
No user documentation yet ... mmoncure has promised to write some.
Thanks to Abhijit Menon-Sen for creating a first draft to work from.
2006-09-18 22:40:40 +00:00
Tom Lane
2e5e856f6b Marginal cleanup in arrangements for ensuring StrategyHintVacuum is cleared
after an error during VACUUM.  We have a PG_TRY block anyway around the only
call sites, so just reset it in the CATCH clause instead of having
AtEOXact_Buffers blindly do it during xact end.  I think the old code was
actively wrong for the case of a failure during ANALYZE inside a
subtransaction --- the flag wouldn't get cleared until main transaction end.
Probably not worth back-patching though.
2006-09-17 22:16:22 +00:00
Bruce Momjian
a0e87ad7a5 Specify lo_write() to take a _const_ buffer, to match documentation. 2006-09-07 15:37:25 +00:00
Tom Lane
8fad2e3ff4 Arrange for GetSnapshotData to copy live-subtransaction XIDs from the
PGPROC array into snapshots, and use this information to avoid visits
to pg_subtrans in HeapTupleSatisfiesSnapshot.  This appears to solve
the pg_subtrans-related context swap storm problem that's been reported
by several people for 8.1.  While at it, modify GetSnapshotData to not
take an exclusive lock on ProcArrayLock, as closer analysis shows that
shared lock is always sufficient.
Itagaki Takahiro and Tom Lane
2006-09-03 15:59:39 +00:00
Tom Lane
e06fda0a8b Add a function GetLockConflicts() to lock.c to report xacts holding
locks that would conflict with a specified lock request, without
actually trying to get that lock.  Use this instead of the former ad hoc
method of doing the first wait step in CREATE INDEX CONCURRENTLY.
Fixes problem with undetected deadlock and in many cases will allow the
index creation to proceed sooner than it otherwise could've.  Per
discussion with Greg Stark.
2006-08-27 19:14:34 +00:00
Tom Lane
e093dcdd28 Add the ability to create indexes 'concurrently', that is, without
blocking concurrent writes to the table.  Greg Stark, with a little help
from Tom Lane.
2006-08-25 04:06:58 +00:00
Tom Lane
f836c2e37e Add some debug logging code to AllocateFile's failure path to log the
specific Windows error code (GetLastError).  This is a hopefully temporary
hack to try to diagnose rare failures.  Magnus Hagander
2006-08-24 03:15:43 +00:00
Tom Lane
9bf760f7de Add a 'waiting' column to pg_stat_activity to carry the same information
that ps_status provides by appending 'waiting' to the PS display.  This
completes the project of making it feasible to turn off process title
updates and instead rely on pg_stat_activity.  Per my suggestion a few
weeks ago.
2006-08-19 01:36:34 +00:00
Tom Lane
7aa772f03e Now that we've rearranged relation open to get a lock before touching
the rel, it's easy to get rid of the narrow race-condition window that
used to exist in VACUUM and CLUSTER.  Did some minor code-beautification
work in the same area, too.
2006-08-18 16:09:13 +00:00
Tom Lane
2dd7ab0627 Put back another improperly-removed #include. 2006-08-07 21:56:25 +00:00
Tom Lane
3467758809 Fix missing 'static' keywords --- some compilers gripe about this. 2006-08-04 16:42:56 +00:00
Bruce Momjian
2c6d96cef6 Add support for loadable modules to allocated shared memory and
lightweight locks.

Marc Munro
2006-08-01 19:03:11 +00:00
Tom Lane
09d3670df3 Change the relation_open protocol so that we obtain lock on a relation
(table or index) before trying to open its relcache entry.  This fixes
race conditions in which someone else commits a change to the relation's
catalog entries while we are in process of doing relcache load.  Problems
of that ilk have been reported sporadically for years, but it was not
really practical to fix until recently --- for instance, the recent
addition of WAL-log support for in-place updates helped.

Along the way, remove pg_am.amconcurrent: all AMs are now expected to support
concurrent update.
2006-07-31 20:09:10 +00:00
Tom Lane
8822263635 Fix a couple of comments. 2006-07-30 20:17:11 +00:00
Alvaro Herrera
92c2ecc130 Modify snapshot definition so that lazy vacuums are ignored by other
vacuums.  This allows a OLTP-like system with big tables to continue
regular vacuuming on small-but-frequently-updated tables while the
big tables are being vacuumed.

Original patch from Hannu Krossing, rewritten by Tom Lane and updated
by me.
2006-07-30 02:07:18 +00:00
Peter Eisentraut
e9b4969062 DTrace support, with a small initial set of probes
by Robert Lor
2006-07-24 16:32:45 +00:00
Tom Lane
a794fb0681 Convert the lock manager to use the new dynahash.c support for partitioned
hash tables, instead of the previous kluge involving multiple hash tables.
This partially undoes my patch of last December.
2006-07-23 23:08:46 +00:00
Tom Lane
b25dc481c8 Fix oversight in sizing of shared buffer lookup hashtable. Because
BufferAlloc tries to insert a new mapping entry before deleting the old one
for a buffer, we have a transient need for more than NBuffers entries ---
one more in 8.1, and as many as NUM_BUFFER_PARTITIONS more in CVS HEAD.
In theory this could lead to an "out of shared memory" failure if shmem
had already been completely claimed by the time the extra entries were
needed.
2006-07-23 18:34:45 +00:00
Tom Lane
10b9ca3d05 Split the buffer mapping table into multiple separately lockable
partitions, as per discussion.  Passes functionality checks, but
I don't have any performance data yet.
2006-07-23 03:07:58 +00:00
Tom Lane
51ee9fa157 Add support to dynahash.c for partitioning shared hashtables according
to the low-order bits of the entry hash value.  Also make some incidental
cleanups in the dynahash API, such as not exporting the hash header
structs to the world.
2006-07-22 23:04:39 +00:00
Tom Lane
c0e9b3139f Hmm, seems --disable-spinlocks has been broken for awhile and nobody
noticed.  Fix SpinlockSemas() to report the correct count considering
that PG 8.1 adds a spinlock to each shared-buffer header.
2006-07-22 21:04:40 +00:00
Tom Lane
3ff58b48c9 Put back another not-so-unnecessary #include, per report from Hiroshi Saito. 2006-07-16 01:05:23 +00:00
Tom Lane
daecd97617 Put back some more not-so-unused-as-all-that #includes. This un-breaks
the EXEC_BACKEND code on my machines, so hopefully it will fix the
Windows buildfarm members.
2006-07-15 15:47:17 +00:00
Tom Lane
cd24163f6d Fix another passel of include-file breakage. Kris Jurka, Tom Lane 2006-07-14 16:59:19 +00:00
Bruce Momjian
e0522505bd Remove 576 references of include files that were not needed. 2006-07-14 14:52:27 +00:00
Tom Lane
ae643747b1 Fix a passel of recently-committed violations of the rule 'thou shalt
have no other gods before c.h'.  Also remove some demonstrably redundant
#include lines, mostly of <errno.h> which was added to c.h years ago.
2006-07-14 05:28:29 +00:00
Bruce Momjian
a22d76d96a Allow include files to compile own their own.
Strip unused include files out unused include files, and add needed
includes to C files.

The next step is to remove unused include files in C files.
2006-07-13 16:49:20 +00:00
Bruce Momjian
370a709c75 Add GUC update_process_title to control whether 'ps' display is updated
for every command, default to on.
2006-06-27 22:16:44 +00:00
Tom Lane
27c3e3de09 Remove redundant gettimeofday() calls to the extent practical without
changing semantics too much.  statement_timestamp is now set immediately
upon receipt of a client command message, and the various places that used
to do their own gettimeofday() calls to mark command startup are referenced
to that instead.  I have also made stats_command_string use that same
value for pg_stat_activity.query_start for both the command itself and
its eventual replacement by <IDLE> or <idle in transaction>.  There was
some debate about that, but no argument that seemed convincing enough to
justify an extra gettimeofday() call.
2006-06-20 22:52:00 +00:00
Tom Lane
b13c9686d0 Take the statistics collector out of the loop for monitoring backends'
current commands; instead, store current-status information in shared
memory.  This substantially reduces the overhead of stats_command_string
and also ensures that pg_stat_activity is fully up to date at all times.
Per my recent proposal.
2006-06-19 01:51:22 +00:00
Tom Lane
8ff80c1bd3 Remove obsolete comment about VACUUM FULL: it takes buffer content locks
now, and must do so to ensure bgwriter doesn't write a page that is in
process of being compacted.
2006-06-08 14:58:33 +00:00
Bruce Momjian
26cfefabad Fix printf mask for SizeVfdCache
Qingqing Zhou
2006-05-30 13:04:59 +00:00
Tom Lane
2246e31775 Upon closer inspection, the sparc code in s_lock.c is dead code, and
always has been, because it's not got any .globl declaration!  We've
been relying on the solaris_sparc.s code instead.  Rip it out.
(Not back-patched, since this is just cosmetic cleanup.)
2006-05-12 16:50:52 +00:00
Tom Lane
ab1ad7a653 Remove unnecessary .seg/.section directives, per Alan Stange. 2006-05-11 21:58:22 +00:00
Tom Lane
5749f6ef0c Rewrite btree vacuuming to fold the former bulkdelete and cleanup operations
into a single mostly-physical-order scan of the index.  This requires some
ticklish interlocking considerations, but should create no material
performance impact on normal index operations (at least given the
already-committed changes to make scans work a page at a time).  VACUUM
itself should get significantly faster in any index that's degenerated to a
very nonlinear page order.  Also, we save one pass over the index entirely,
except in the case where there were no deletions to do and so only one pass
happened anyway.

Original patch by Heikki Linnakangas, rework by Tom Lane.
2006-05-08 00:00:17 +00:00
Tom Lane
52667d56a3 Rethink the locking mechanisms used for CREATE/DROP/RENAME DATABASE.
The former approach used ExclusiveLock on pg_database, which being a
cluster-wide lock meant only one of these operations could proceed at
a time; worse, it also blocked all incoming connections in ReverifyMyDatabase.
Now that we have LockSharedObject(), we can use locks of different types
applied to databases considered as objects.  This allows much more
flexible management of the interlocking: two CREATE DATABASEs need not
block each other, and need not block connections except to the template
database being used.  Similarly DROP DATABASE doesn't block unrelated
operations.  The locking used in flatfiles.c is also much narrower in
scope than before.  Per recent proposal.
2006-05-04 16:07:29 +00:00
Bruce Momjian
a1ee621589 Fix s_lock_test to use tas.o file, if needed. 2006-04-28 22:54:31 +00:00
Tom Lane
486f994be7 Revise large-object access routines to avoid running with CurrentMemoryContext
set to the large object context ("fscxt"), as this is inevitably a source of
transaction-duration memory leaks.  Not sure why we'd not noticed it before;
maybe people weren't touching a whole lot of LOs in the same transaction
before the 8.1 pg_dump changes.  Per report from Wayne Conrad.

Backpatched as far as 8.1, but the problem doubtless goes all the way back.
I'm disinclined to spend the time to try to verify that the older branches
would still work if patched, seeing that this code was significantly modified
for 8.0 and again for 8.1, and that we don't have any trouble reports before
8.1.  (Maybe the leaks were smaller before?)
2006-04-26 00:34:57 +00:00
Tom Lane
b5498a26de Add some optional code (conditionally compiled under #ifdef LWLOCK_STATS)
to track the number of LWLock acquisitions and the number of times we
block waiting for an LWLock, on a per-process basis.  After having needed
this twice in the past few months, seems like it should go into CVS.
2006-04-21 16:45:12 +00:00
Tom Lane
defe93463c Make the world safe for full_page_writes. Allow XLOG records that try to
update no-longer-existing pages to fall through as no-ops, but make a note
of each page number referenced by such records.  If we don't see a later
XLOG entry dropping the table or truncating away the page, complain at
the end of XLOG replay.  Since this fixes the known failure mode for
full_page_writes = off, revert my previous band-aid patch that disabled
that GUC variable.
2006-04-14 20:27:24 +00:00
Tom Lane
0fcc3c2f1d Repair a low-probability race condition identified by Qingqing Zhou.
If a process abandons a wait in LockBufferForCleanup (in practice,
only happens if someone cancels a VACUUM) just before someone else
sends it a signal indicating the buffer is available, it was possible
for the wakeup to remain in the process' semaphore, causing misbehavior
next time the process waited for an lmgr lock.  Rather than try to
prevent the race condition directly, it seems best to make the lock
manager robust against leftover wakeups, by having it repeat waiting
on the semaphore if the lock has not actually been granted or denied
yet.
2006-04-14 03:38:56 +00:00
Tom Lane
a8b8f4db23 Clean up WAL/buffer interactions as per my recent proposal. Get rid of the
misleadingly-named WriteBuffer routine, and instead require routines that
change buffer pages to call MarkBufferDirty (which does exactly what it says).
We also require that they do so before calling XLogInsert; this takes care of
the synchronization requirement documented in SyncOneBuffer.  Note that
because bufmgr takes the buffer content lock (in shared mode) while writing
out any buffer, it doesn't matter whether MarkBufferDirty is executed before
the buffer content change is complete, so long as the content change is
completed before releasing exclusive lock on the buffer.  So it's OK to set
the dirtybit before we fill in the LSN.
This eliminates the former kluge of needing to set the dirtybit in LockBuffer.
Aside from making the code more transparent, we can also add some new
debugging assertions, in particular that the caller of MarkBufferDirty must
hold the buffer content lock, not merely a pin.
2006-03-31 23:32:07 +00:00
Tom Lane
4243f2387a Suppress attempts to report dropped tables to the stats collector from a
startup or recovery process.  Since such a process isn't a real backend,
pgstat.c gets confused.  This accounts for recent reports of strange
"invalid server process ID -1" log messages during crash recovery.
There isn't any point in attempting to make the report, since we'll discard
stats in such scenarios anyhow.
2006-03-30 22:11:55 +00:00
Tom Lane
6d61cdec07 Clean up and document the API for XLogOpenRelation and XLogReadBuffer.
This commit doesn't make much functional change, but it does eliminate some
duplicated code --- for instance, PageIsNew tests are now done inside
XLogReadBuffer rather than by each caller.
The GIST xlog code still needs a lot of love, but I'll worry about that
separately.
2006-03-29 21:17:39 +00:00
Tom Lane
0a20207060 Arrange to emit a description of the current XLOG record as error context
when an error occurs during xlog replay.  Also, replace the former risky
'write into a fixed-size buffer with no overflow detection' API for XLOG
record description routines; use an expansible StringInfo instead.  (The
latter accounts for most of the patch bulk.)

Qingqing Zhou
2006-03-24 04:32:13 +00:00
Bruce Momjian
f2f5b05655 Update copyright for 2006. Update scripts. 2006-03-05 15:59:11 +00:00
Tom Lane
60d3c9fdf4 Declare the arguments of AllocateFile() as const char *, not char *.
This is consistent with the standard definition of fopen().
2006-03-04 21:32:47 +00:00
Tom Lane
9a506a6257 Arrange to call AbsorbFsyncRequests every so often while performing a
checkpoint in the bgwriter.  This forestalls overflow of the fsync request
queue, which is not fatal but causes considerable performance degradation
when it occurs (because backends then have to do their own fsyncs).  Per
patch from Itagaki Takahiro, modified a little bit by me.
2006-03-03 00:02:02 +00:00
Bruce Momjian
d5dd3d451e Add contrib/pg_freespacemap to display free space map information.
Mark Kirkwood
2006-02-12 03:55:53 +00:00
Bruce Momjian
59bb147353 Update random() usage so ranges are inclusive/exclusive as required. 2006-02-03 12:45:47 +00:00
Tom Lane
d5db3abfb6 Modify pgstats code to reduce performance penalties from oversized stats data
files: avoid creating stats hashtable entries for tables that aren't being
touched except by vacuum/analyze, ensure that entries for dropped tables are
removed promptly, and tweak the data layout to avoid storing useless struct
padding.  Also improve the performance of pgstat_vacuum_tabstat(), and make
sure that autovacuum invokes it exactly once per autovac cycle rather than
multiple times or not at all.  This should cure recent complaints about 8.1
showing much higher stats I/O volume than was seen in 8.0.  It'd still be a
good idea to revisit the design with an eye to not re-writing the entire
stats dataset every half second ... but that would be too much to backpatch,
I fear.
2006-01-18 20:35:06 +00:00
Tom Lane
558bc2584d Fix fsync code to test whether F_FULLFSYNC is available, instead of
assuming it always is on Darwin.  Per report from Neil Brandt.
2006-01-17 23:52:31 +00:00
Tom Lane
39fc1fb07a Remove logic in XactLockTableWait() that attempted to mark a crashed
transaction as aborted.  Since we only call XactLockTableWait on XIDs
that we believe to be currently running, the odds of this code ever
actually firing are minimal.  It's certainly unnecessary, since a
transaction that's not either running or committed will be presumed
aborted anyway.  What's more, it's not hard to imagine scenarios where
this could result in corrupting pg_clog: for instance, if a bogus XID
somehow got passed to XactLockTableWait.  I think the code probably
dates from the ancient era when we didn't have TransactionIdIsInProgress;
back then it may have been necessary, but now I think it's a waste of
cycles and potentially dangerous.  Per discussion with Qingqing Zhou
and Karsten Hilbert.
2006-01-13 21:32:12 +00:00
Tom Lane
304160c3e2 Fix ReadBuffer() to correctly handle the case where it's trying to extend
the relation but it finds a pre-existing valid buffer.  The buffer does not
correspond to any page known to the kernel, so we *must* do smgrextend to
ensure that the space becomes allocated.  The 7.x branches all do this
correctly, but the corner case got lost somewhere during 8.0 bufmgr rewrites.
(My fault no doubt :-( ... I think I assumed that such a buffer must be
not-BM_VALID, which is not so.)
2006-01-06 00:04:20 +00:00
Bruce Momjian
44f9021223 Remove BEOS port. 2006-01-05 03:01:38 +00:00
Tom Lane
349f40b2c2 Rearrange backend startup sequence so that ShmemIndexLock can become
an LWLock instead of a spinlock.  This hardly matters on Unix machines
but should improve startup performance on Windows (or any port using
EXEC_BACKEND).  Per previous discussion.
2006-01-04 21:06:32 +00:00
Tom Lane
195f164228 Get rid of the SpinLockAcquire/SpinLockAcquire_NoHoldoff distinction
in favor of having just one set of macros that don't do HOLD/RESUME_INTERRUPTS
(hence, these correspond to the old SpinLockAcquire_NoHoldoff case).
Given our coding rules for spinlock use, there is no reason to allow
CHECK_FOR_INTERRUPTS to be done while holding a spinlock, and also there
is no situation where ImmediateInterruptOK will be true while holding a
spinlock.  Therefore doing HOLD/RESUME_INTERRUPTS while taking/releasing a
spinlock is just a waste of cycles.  Qingqing Zhou and Tom Lane.
2005-12-29 18:08:05 +00:00
Tom Lane
fb3dbdf986 Rethink prior patch to filter out dead backend entries from the pgstats
file.  The original code probed the PGPROC array separately for each PID,
which was not good for large numbers of backends: not only is the runtime
O(N^2) but most of it is spent holding ProcArrayLock.  Instead, take the
lock just once and copy the active PIDs into an array, then use qsort
and bsearch so that the lookup time is more like O(N log N).
2005-12-16 04:03:40 +00:00
Tom Lane
ec0baf949e Divide the lock manager's shared state into 'partitions', so as to
reduce contention for the former single LockMgrLock.  Per my recent
proposal.  I set it up for 16 partitions, but on a pgbench test this
gives only a marginal further improvement over 4 partitions --- we need
to test more scenarios to choose the number of partitions.
2005-12-11 21:02:18 +00:00
Tom Lane
c599a247bb Simplify lock manager data structures by making a clear separation between
the data defining the semantics of a lock method (ie, conflict resolution
table and ancillary data, which is all constant) and the hash tables
storing the current state.  The only thing we give up by this is the
ability to use separate hashtables for different lock methods, but there
is no need for that anyway.  Put some extra fields into the LockMethod
definition structs to clean up some other uglinesses, like hard-wired
tests for DEFAULT_LOCKMETHOD and USER_LOCKMETHOD.  This commit doesn't
do anything about the performance issues we were discussing, but it clears
away some of the underbrush that's in the way of fixing that.
2005-12-09 01:22:04 +00:00
Tom Lane
f38c3e778a Fix thinko in comment. 2005-12-08 15:38:29 +00:00
Tom Lane
887a7c61f6 Get rid of slru.c's hardwired insistence on a fixed number of slots per
SLRU area.  The number of slots is still a compile-time constant (someday
we might want to change that), but at least it's a different constant for
each SLRU area.  Increase number of subtrans buffers to 32 based on
experimentation with a heavily subtrans-bashing test case, and increase
number of multixact member buffers to 16, since it's obviously silly for
it not to be at least twice the number of multixact offset buffers.
2005-12-06 23:08:34 +00:00
Tom Lane
a98871b7ac Tweak indexscan machinery to avoid taking an AccessShareLock on an index
if we already have a stronger lock due to the index's table being the
update target table of the query.  Same optimization I applied earlier
at the table level.  There doesn't seem to be much interest in the more
radical idea of not locking indexes at all, so do what we can ...
2005-12-03 05:51:03 +00:00
Tom Lane
ace17c1d82 Retry in FileRead and FileWrite if Windows returns ERROR_NO_SYSTEM_RESOURCES.
Also add a retry for Unixen returning EINTR, which hasn't been reported
as an issue but at least theoretically could be.  Patch by Qingqing Zhou,
some minor adjustments by me.
2005-12-01 20:24:18 +00:00
Bruce Momjian
436a2956d8 Re-run pgindent, fixing a problem where comment lines after a blank
comment line where output as too long, and update typedefs for /lib
directory.  Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).

Backpatch to 8.1.X.
2005-11-22 18:17:34 +00:00
Tom Lane
c859308aba DropRelFileNodeBuffers failed to fix the state of the lookup hash table
that was added to localbuf.c in 8.1; therefore, applying it to a temp table
left corrupt lookup state in memory.  The only case where this had a
significant chance of causing problems was an ON COMMIT DELETE ROWS temp
table; the other possible paths left bogus state that was unlikely to
be used again.  Per report from Csaba Nagy.
2005-11-17 17:42:02 +00:00
Tom Lane
48052de722 Repair an error introduced by log_line_prefix patch: it is not acceptable
to assume that the string pointer passed to set_ps_display is good forever.
There's no need to anyway since ps_status.c itself saves the string, and
we already had an API (get_ps_display) to return it.
I believe this explains Jim Nasby's report of intermittent crashes in
elog.c when %i format code is in use in log_line_prefix.
While at it, repair a previously unnoticed problem: on some platforms such as
Darwin, the string returned by get_ps_display was blank-padded to the maximum
length, meaning that lock.c's attempt to append " waiting" to it never worked.
2005-11-05 03:04:53 +00:00
Peter Eisentraut
07bb9f086b Message corrections 2005-10-29 00:31:52 +00:00
Tom Lane
fbbe00242d Tweak buffer manager so that 'internal' accesses to a buffer do not
advance its usage_count.  This includes writes of dirty buffers triggered
by bgwriter, checkpoint, or FlushRelationBuffers, as well as various
corner cases that really ought not count as accesses to the page.
Should make for some marginal improvement in the quality of our decisions
about when to recycle buffers.  Per suggestion from ITAGAKI Takahiro.
2005-10-27 17:07:58 +00:00
Bruce Momjian
1dc3498251 Standard pgindent run for 8.1. 2005-10-15 02:49:52 +00:00
Neil Conway
c10dba2fe3 Remove an antiquated comment. 2005-10-13 06:24:05 +00:00
Tom Lane
fa72121594 Fix another recently-changed place that was messing with spinlock-
protected data structures and not using a volatile pointer for same.
2005-10-12 16:55:59 +00:00
Tom Lane
07eeb9d109 Do all accesses to shared buffer headers through volatile-qualified
pointers, to ensure that compilers won't rearrange accesses to occur
while we're not holding the buffer header spinlock.  It's probably
not necessary to mark volatile in every single place in bufmgr.c,
but better safe than sorry.  Per trouble report from Kevin Grittner.
2005-10-12 16:45:14 +00:00
Tom Lane
a72ee09090 Add infrastructure for making spins_per_delay variable depending on
whether we seem to be running in a uniprocessor or multiprocessor.
The adjustment rules could probably still use further tweaking, but
I'm convinced this should be a win overall.
2005-10-11 20:41:32 +00:00
Tom Lane
82e861fbe1 Fix LWLockAssign() so that it can safely be executed after postmaster
initialization.  Add spinlocking, fix EXEC_BACKEND unsafeness.
2005-10-07 21:42:38 +00:00
Tom Lane
bb55e583f6 Allocate a few extra LWLocks for possible use by add-on modules.
Per request from Marc Munro.
2005-10-07 20:11:03 +00:00
Bruce Momjian
4f915cd377 This patch cleans up the access to members of ItemIdData.
It uses existing macros instead of touching directly.

ITAGAKI Takahiro
2005-09-22 16:46:00 +00:00
Bruce Momjian
658657177e Print proper cause of statement cancel, user interaction or timeout. 2005-09-19 17:21:49 +00:00
Tom Lane
dc06734a72 Force the size and alignment of LWLock array entries to be either 16 or 32
bytes.  This shouldn't make any difference on x86 machines, where the size
happened to be 16 bytes anyway, but on 64-bit machines and machines with
slock_t int or wider, it will speed array indexing and hopefully reduce
SMP cache contention effects.  Per recent experimentation.
2005-09-16 00:30:05 +00:00
Tom Lane
396526d8c3 Adjust m68k spinlock code to avoid duplicate in-line and not-in-line
definitions on recent Linux systems, per Martin Pitt.
2005-08-26 14:47:35 +00:00
Tom Lane
1a33436224 Replace out-of-line tas() assembly code for MIPS with a properly
constrained GCC inline version.  Thiemo Seufer, by way of Martin Pitt.
2005-08-25 17:17:10 +00:00
Tom Lane
0007490e09 Convert the arithmetic for shared memory size calculation from 'int'
to 'Size' (that is, size_t), and install overflow detection checks in it.
This allows us to remove the former arbitrary restrictions on NBuffers
etc.  It won't make any difference in a 32-bit machine, but in a 64-bit
machine you could theoretically have terabytes of shared buffers.
(How efficiently we could manage 'em remains to be seen.)  Similarly,
num_temp_buffers, work_mem, and maintenance_work_mem can be set above
2Gb on a 64-bit machine.  Original patch from Koichi Suzuki, additional
work by moi.
2005-08-20 23:26:37 +00:00
Tatsuo Ishii
bc3991c185 Add BackendXidGetPid(). 2005-08-20 01:26:36 +00:00
Bruce Momjian
28d0515d18 Fix FSM warning to mention increasing max_fsm_pages. Was incorrectly
max_fsm_relations.
2005-08-17 03:50:59 +00:00
Bruce Momjian
27639809d2 Reverse out Assert addition. 2005-08-12 23:13:54 +00:00
Bruce Momjian
fab177e64f Improve documention on loading large data sets into plperl.
David Fetter
2005-08-12 21:42:53 +00:00
Tom Lane
3ae7e4a33b Remove BufferBlockPointers array in favor of a base + (bufnum) * BLCKSZ
computation.  On modern machines this is as fast if not faster, and we
don't have to clog the CPU's L2 cache with a tens-of-KB pointer array.
If we ever decide to adopt a more dynamic allocation method for shared
buffers, we'll probably have to revert this patch, but in the meantime
we might as well save a few bytes and nanoseconds.  Per Qingqing Zhou.
2005-08-12 05:05:51 +00:00
Tom Lane
721e53785d Solve the problem of OID collisions by probing for duplicate OIDs
whenever we generate a new OID.  This prevents occasional duplicate-OID
errors that can otherwise occur once the OID counter has wrapped around.
Duplicate relfilenode values are also checked for when creating new
physical files.  Per my recent proposal.
2005-08-12 01:36:05 +00:00
Tom Lane
15269b5955 Avoid useless loop overhead in AtEOXact routines when the backend is
compiled with USE_ASSERT_CHECKING but is running with assert_enabled false.
2005-08-08 19:44:22 +00:00
Tom Lane
7117cd3a77 Cause ShutdownPostgres to do a normal transaction abort during backend
exit, instead of trying to take shortcuts.  Introduce some additional
shutdown callback routines to eliminate kluges like having ProcKill
be responsible for shutting down the buffer manager.  Ensure that the
order of operations during shutdown is predictable and what you would
expect given the module layering.
2005-08-08 03:12:16 +00:00
Tom Lane
5337ad464e Fix count_usable_fds() to stop trying to open files once it reaches
max_files_per_process.  Going further than that is just a waste of
cycles, and it seems that current Cygwin does not cope gracefully
with deliberately running the system out of FDs.  Per Andrew Dunstan.
2005-08-07 18:47:19 +00:00
Tom Lane
6eac4e69cf Tweak BgBufferSync() so that a persistent write error on a dirty buffer
doesn't block the bgwriter from making progress writing out other buffers.
This was a hard problem in the context of the ARC/2Q design, but it's
trivial in the context of clock sweep ... just advance the sweep counter
before we try to write not after.
2005-08-02 20:52:08 +00:00
Tom Lane
2a4fad1a0e Add NOWAIT option to SELECT FOR UPDATE/SHARE.
Original patch by Hans-Juergen Schoenig, revisions by Karel Zak
and Tom Lane.
2005-08-01 20:31:16 +00:00
Tom Lane
d42cf5a42a Add per-user and per-database connection limit options.
This patch also includes preliminary update of pg_dumpall for roles.
Petr Jelinek, with review by Bruce Momjian and Tom Lane.
2005-07-31 17:19:22 +00:00
Bruce Momjian
1521aef1db SUNOS4_CC -> SUNOS_CC. 2005-07-30 03:07:42 +00:00
Tom Lane
eb5949d190 Arrange for the postmaster (and standalone backends, initdb, etc) to
chdir into PGDATA and subsequently use relative paths instead of absolute
paths to access all files under PGDATA.  This seems to give a small
performance improvement, and it should make the system more robust
against naive DBAs doing things like moving a database directory that
has a live postmaster in it.  Per recent discussion.
2005-07-04 04:51:52 +00:00
Tom Lane
b95ae32b41 Avoid WAL-logging individual tuple insertions during CREATE TABLE AS
(a/k/a SELECT INTO).  Instead, flush and fsync the whole relation before
committing.  We do still need the WAL log when PITR is active, however.
Simon Riggs and Tom Lane.
2005-06-20 18:37:02 +00:00
Tom Lane
3f749924f8 Simplify uses of readdir() by creating a function ReadDir() that
includes error checking and an appropriate ereport(ERROR) message.
This gets rid of rather tedious and error-prone manipulation of errno,
as well as a Windows-specific bug workaround, at more than a dozen
call sites.  After an idea in a recent patch by Heikki Linnakangas.
2005-06-19 21:34:03 +00:00
Tom Lane
d0a89683a3 Two-phase commit. Original patch by Heikki Linnakangas, with additional
hacking by Alvaro Herrera and Tom Lane.
2005-06-17 22:32:51 +00:00
Tom Lane
8563ccae2c Simplify shared-memory lock data structures as per recent discussion:
it is sufficient to track whether a backend holds a lock or not, and
store information about transaction vs. session locks only in the
inside-the-backend LocalLockTable.  Since there can now be but one
PROCLOCK per lock per backend, LockCountMyLocks() is no longer needed,
thus eliminating some O(N^2) behavior when a backend holds many locks.
Also simplify the LockAcquire/LockRelease API by passing just a
'sessionLock' boolean instead of a transaction ID.  The previous API
was designed with the idea that per-transaction lock holding would be
important for subtransactions, but now that we have subtransactions we
know that this is unwanted.  While at it, add an 'isTempObject' parameter
to LockAcquire to indicate whether the lock is being taken on a temp
table.  This is not used just yet, but will be needed shortly for
two-phase commit.
2005-06-14 22:15:33 +00:00
Tom Lane
a2fb7b8a1f Adjust lo_open() so that specifying INV_READ without INV_WRITE creates
a descriptor that uses the current transaction snapshot, rather than
SnapshotNow as it did before (and still does if INV_WRITE is set).
This means pg_dump will now dump a consistent snapshot of large object
contents, as it never could do before.  Also, add a lo_create() function
that is similar to lo_creat() but allows the desired OID of the large
object to be specified.  This will simplify pg_restore considerably
(but I'll fix that in a separate commit).
2005-06-13 02:26:53 +00:00
Tom Lane
ee7ac7b11e Modify XLogInsert API to make callers specify whether pages to be backed
up have the standard layout with unused space between pd_lower and pd_upper.
When this is set, XLogInsert will omit the unused space without bothering
to scan it to see if it's zero.  That saves time in XLogInsert, and also
allows reversion of my earlier patch to make PageRepairFragmentation et al
explicitly re-zero freed space.  Per suggestion by Heikki Linnakangas.
2005-06-06 20:22:58 +00:00
Tom Lane
4c8495a1f2 Remove the mostly-stubbed-out-anyway support routines for WAL UNDO.
That code is never going to be used in the foreseeable future, and
where it's more than a stub it's making the redo routines harder to
read.
2005-06-06 17:01:25 +00:00
Tom Lane
21fda22ec4 Change CRCs in WAL records from 64bit to 32bit for performance reasons.
Instead of a separate CRC on each backup block, include backup blocks
in their parent WAL record's CRC; this is important to ensure that the
backup block really goes with the WAL record, ie there was not a page
tear right at the start of the backup block.  Implement a simple form
of compression of backup blocks: drop any run of zeroes starting at
pd_lower, so as not to store the unused 'hole' that commonly exists in
PG heap and index pages.  Tweak PageRepairFragmentation and related
routines to ensure they keep the unused space zeroed, so that the above
compression method remains effective.  All per recent discussions.
2005-06-02 05:55:29 +00:00
Tom Lane
140b078d2a Improve LockAcquire API per my recent proposal. All error conditions
are now reported via elog, eliminating the need to test the result code
at most call sites.  Make it possible for the caller to distinguish a
freshly acquired lock from one already held in the current transaction.
Use that capability to avoid redundant AcceptInvalidationMessages() calls
in LockRelation().
2005-05-29 22:45:02 +00:00
Tom Lane
e92a88272e Modify hash_search() API to prevent future occurrences of the error
spotted by Qingqing Zhou.  The HASH_ENTER action now automatically
fails with elog(ERROR) on out-of-memory --- which incidentally lets
us eliminate duplicate error checks in quite a bunch of places.  If
you really need the old return-NULL-on-out-of-memory behavior, you
can ask for HASH_ENTER_NULL.  But there is now an Assert in that path
checking that you aren't hoping to get that behavior in a palloc-based
hash table.
Along the way, remove the old HASH_FIND_SAVE/HASH_REMOVE_SAVED actions,
which were not being used anywhere anymore, and were surely too ugly
and unsafe to want to see revived again.
2005-05-29 04:23:07 +00:00
Bruce Momjian
6dc7760ac3 Add support for wal_fsync_writethrough for Darwin, and restructure the
code to better handle writethrough.

Chris Campbell
2005-05-20 14:53:26 +00:00
Tom Lane
f519d04a43 Update comment that I missed the first time around. 2005-05-19 23:57:11 +00:00
Tom Lane
191b13aaca Factor out lock cleanup code that is needed in several places in lock.c.
Also, remove the rather useless return value of LockReleaseAll.  Change
response to detection of corruption in the shared lock tables to PANIC,
since that is the only way of cleaning up fully.
Originally an idea of Heikki Linnakangas, variously hacked on by
Alvaro Herrera and Tom Lane.
2005-05-19 23:30:18 +00:00
Tom Lane
ee3b71f6bc Split the shared-memory array of PGPROC pointers out of the sinval
communication structure, and make it its own module with its own lock.
This should reduce contention at least a little, and it definitely makes
the code seem cleaner.  Per my recent proposal.
2005-05-19 21:35:48 +00:00
Neil Conway
f38e413b20 Code cleanup: in C89, there is no point casting the first argument to
memset() or MemSet() to a char *. For one, memset()'s first argument is
a void *, and further void * can be implicitly coerced to/from any other
pointer type.
2005-05-11 01:26:02 +00:00
Tom Lane
93b2477278 Use the standard lock manager to establish priority order when there
is contention for a tuple-level lock.  This solves the problem of a
would-be exclusive locker being starved out by an indefinite succession
of share-lockers.  Per recent discussion with Alvaro.
2005-04-30 19:03:33 +00:00
Tom Lane
3a694bb0a1 Restructure LOCKTAG as per discussions of a couple months ago.
Essentially, we shoehorn in a lockable-object-type field by taking
a byte away from the lockmethodid, which can surely fit in one byte
instead of two.  This allows less artificial definitions of all the
other fields of LOCKTAG; we can get rid of the special pg_xactlock
pseudo-relation, and also support locks on individual tuples and
general database objects (including shared objects).  None of those
possibilities are actually exploited just yet, however.

I removed pg_xactlock from pg_class, but did not force initdb for
that change.  At this point, relkind 's' (SPECIAL) is unused and
could be removed entirely.
2005-04-29 22:28:24 +00:00
Tom Lane
bedb78d386 Implement sharable row-level locks, and use them for foreign key references
to eliminate unnecessary deadlocks.  This commit adds SELECT ... FOR SHARE
paralleling SELECT ... FOR UPDATE.  The implementation uses a new SLRU
data structure (managed much like pg_subtrans) to represent multiple-
transaction-ID sets.  When more than one transaction is holding a shared
lock on a particular row, we create a MultiXactId representing that set
of transactions and store its ID in the row's XMAX.  This scheme allows
an effectively unlimited number of row locks, just as we did before,
while not costing any extra overhead except when a shared lock actually
has to be shared.   Still TODO: use the regular lock manager to control
the grant order when multiple backends are waiting for a row lock.

Alvaro Herrera and Tom Lane.
2005-04-28 21:47:18 +00:00
Bruce Momjian
3b0a5e50d7 Update VACUUM VERBOSE FSM message, per Tom. 2005-04-24 03:51:49 +00:00
Bruce Momjian
714d5a4c37 Update VACUUM VERBOSE update, per Alvaro. 2005-04-23 21:16:34 +00:00
Bruce Momjian
9ba6587f8b Update working of VACUUM VERBOSE. 2005-04-23 21:10:20 +00:00
Bruce Momjian
52e08c35f7 Make VACUUM VERBOSE FSM output all output in a single INFO output
statement.
2005-04-23 20:56:01 +00:00
Bruce Momjian
e947e1153a Modify output of VACUUM VERBOSE to be clearer. 2005-04-23 15:20:39 +00:00
Neil Conway
ea208aca00 Remove an unused variable "waitingForSignal". From Qingqing Zhou. 2005-04-15 04:18:10 +00:00
Tom Lane
162bd08b3f Completion of project to use fixed OIDs for all system catalogs and
indexes.  Replace all heap_openr and index_openr calls by heap_open
and index_open.  Remove runtime lookups of catalog OID numbers in
various places.  Remove relcache's support for looking up system
catalogs by name.  Bulky but mostly very boring patch ...
2005-04-14 20:03:27 +00:00
Tom Lane
2193a856a2 Simplify initdb-time assignment of OIDs as I proposed yesterday, and
avoid encroaching on the 'user' range of OIDs by allowing automatic
OID assignment to use values below 16k until we reach normal operation.

initdb not forced since this doesn't make any incompatible change;
however a lot of stuff will have different OIDs after your next initdb.
2005-04-13 18:54:57 +00:00
Tom Lane
badb83f9ec If we're going to have a non-panic check for held_lwlocks[] overrun,
it must occur *before* we get into the critical state of holding a
lock we have no place to record.  Per discussion with Qingqing Zhou.
2005-04-08 14:18:35 +00:00
Tom Lane
e794dfa511 Use an always-there test, not an Assert, to check for overrun of
the held_lwlocks[] array.  Per Qingqing Zhou.
2005-04-08 03:43:54 +00:00
Neil Conway
5b1c607abe Remove an unused variable `ShmemBootstrap', and remove an obsolete
comment. Patch from Alvaro.
2005-04-04 04:34:41 +00:00
Tom Lane
94e03330cb Create a routine PageIndexMultiDelete() that replaces a loop around
PageIndexTupleDelete() with a single pass of compactification ---
logic mostly lifted from PageRepairFragmentation.  I noticed while
profiling that a VACUUM that's cleaning up a whole lot of deleted
tuples would spend as much as a third of its CPU time in
PageIndexTupleDelete; not too surprising considering the loop method
was roughly O(N^2) in the number of tuples involved.
2005-03-22 06:17:03 +00:00
Tom Lane
354049c709 Remove unnecessary calls of FlushRelationBuffers: there is no need
to write out data that we are about to tell the filesystem to drop.
smgr_internal_unlink already had a DropRelFileNodeBuffers call to
get rid of dead buffers without a write after it's no longer possible
to roll back the deleting transaction.  Adding a similar call in
smgrtruncate simplifies callers and makes the overall division of
labor clearer.  This patch removes the former behavior that VACUUM
would write all dirty buffers of a relation unconditionally.
2005-03-20 22:00:54 +00:00
Tom Lane
91728fa26c Add temp_buffers GUC variable to allow users to determine the size
of the local buffer arena for temporary table access.
2005-03-19 23:27:11 +00:00
Tom Lane
d65522aeb6 Upgrade localbuf.c to use a hash table instead of linear search to
find already-allocated local buffers.  This is the last obstacle
in the way of setting NLocBuffer to something reasonably large.
2005-03-19 17:39:43 +00:00
Tom Lane
88164799ce Need to reset local buffer pin counts, not only shared buffer pins,
before we attempt any file deletions in ShutdownPostgres.  Per Tatsuo.
2005-03-18 16:16:09 +00:00
Tom Lane
cef01c3355 Avoid infinite loop in InvalidateBuffer if we ourselves are holding
a pin on the victim buffer.
2005-03-18 05:25:23 +00:00
Bruce Momjian
2c4dea126a Issue free space notices to both the user and the server log file. 2005-03-14 20:15:09 +00:00
Bruce Momjian
45905425a0 Add warning about the need to increase "max_fsm_relations" and
"max_fsm_relations" for vacuums.  Also improve VACUUM VERBOSE final
message text.

Ron Mayer
2005-03-12 05:21:52 +00:00
Neil Conway
c129c16492 Slight refactoring and optimization of some code in WaitOnLock(). 2005-03-11 03:52:06 +00:00
Tom Lane
5d5087363d Replace the BufMgrLock with separate locks on the lookup hashtable and
the freelist, plus per-buffer spinlocks that protect access to individual
shared buffer headers.  This requires abandoning a global freelist (since
the freelist is a global contention point), which shoots down ARC and 2Q
as well as plain LRU management.  Adopt a clock sweep algorithm instead.
Preliminary results show substantial improvement in multi-backend situations.
2005-03-04 20:21:07 +00:00
Tom Lane
a2ad04f4b0 Release proclock immediately in RemoveFromWaitQueue() if it represents
no held locks.  This maintains the invariant that proclocks are present
only for procs that are holding or awaiting a lock; when this is not
true, LockRelease will fail.  Per report from Stephen Clouse.
2005-03-01 21:14:59 +00:00
Bruce Momjian
0542b1e2fe Use _() macro consistently rather than gettext(). Add translation
macros around strings that were missing them.
2005-02-22 04:43:23 +00:00
Neil Conway
11635c3f6f Refactor some duplicated code in lock.c: create UnGrantLock(), move code
from LockRelease() and LockReleaseAll() into it. From Heikki Linnakangas.
2005-02-04 02:04:53 +00:00
Tom Lane
cc4f58f4cd Ensure that all details of the ARC algorithm are hidden within freelist.c.
This refactoring does not change any algorithms or data structures, just
remove visibility of the ARC datastructures from other source files.
2005-02-03 23:29:19 +00:00
Neil Conway
a885ecd6ef Change heap_modifytuple() to require a TupleDesc rather than a
Relation. Patch from Alvaro Herrera, minor editorializing by
Neil Conway.
2005-01-27 23:24:11 +00:00
Tom Lane
0ce4d56924 Phase 1 of fix for 'SMgrRelation hashtable corrupted' problem. This
is the minimum required fix.  I want to look next at taking advantage of
it by simplifying the message semantics in the shared inval message queue,
but that part can be held over for 8.1 if it turns out too ugly.
2005-01-10 20:02:24 +00:00
Tom Lane
c9d8edc906 Repair bufmgr deadlock problem reported by Michael Wildpaner. Must take
share lock on a buffer being written out before releasing BufMgrLock in
the BufferAlloc code path; if we do it later we might block on someone
who's re-pinned the buffer.  I believe this is only an issue for BufferAlloc
and not the other places that call FlushBuffer.  BufferSync must continue
to do it the old way since it may well be trying to write buffers that
other backends have pinned; but it should not be holding any conflicting
locks.  FlushRelationBuffers is okay since it's got exclusive lock at the
relation level.
2005-01-03 18:49:41 +00:00
PostgreSQL Daemon
2ff501590b Tag appropriate files for rc3
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
2004-12-31 22:04:05 +00:00
Tom Lane
96ecf9d5aa Support Sun's compiler on SunOS4 (a/k/a Solaris 9). Per ayan@ayan.net 2004-12-29 23:47:40 +00:00
Tom Lane
eee5abce46 Refactor EXEC_BACKEND code so that postmaster child processes reattach
to shared memory as soon as possible, ie, right after read_backend_variables.
The effective difference from the original code is that this happens
before instead of after read_nondefault_variables(), which loads GUC
information and is apparently capable of expanding the backend's memory
allocation more than you'd think it should.  This should fix the
failure-to-attach-to-shared-memory reports we've been seeing on Windows.
Also clean up a few bits of unnecessarily grotty EXEC_BACKEND code.
2004-12-29 21:36:09 +00:00
Bruce Momjian
08690d0688 Allow NetBSD, m64k to compile the ASM spinlock code.
R?mi Zara
2004-12-18 22:12:52 +00:00
Neil Conway
4acc97d7e4 Assert that BufferIsPinned() in IncrBufferRefCount(), rather than using
a home-brewed combination of assertions that boiled down to the same
thing.
2004-11-24 02:56:17 +00:00
Tom Lane
8ecbc46bdb Reduce the default size of the local lock hash table. There's usually
no need for it to be nearly as big as the global hash table, and since
it's not in shared memory it can grow if it does need to be bigger.
By reducing the size, we speed up hash_seq_search(), which saves a
significant fraction of subtransaction entry/exit overhead.
2004-11-20 20:16:54 +00:00
Peter Eisentraut
0ed3c7665e Small message clarifications 2004-11-05 17:11:34 +00:00
Neil Conway
8ec05b28b7 Modify hash_create() to elog(ERROR) if an error occurs, rather than
returning a NULL pointer (some callers remembered to check the return
value, but some did not -- it is safer to just bail out).

Also, cleanup pgstat.c to use elog(ERROR) rather than elog(LOG) followed
by exit().
2004-10-25 00:46:43 +00:00
Tom Lane
4347cc2392 Allow background writing to be shut down by setting limit values to zero.
This does not disable the bgwriter process: it still has to wake up often
enough to collect fsync requests from backends in a timely fashion.  But
it responds to the recent gripe about not being able to prevent the disk
from being spun up constantly.
2004-10-17 22:01:51 +00:00
Tom Lane
fdd13f1568 Give the ResourceOwner mechanism full responsibility for releasing buffer
pins at end of transaction, and reduce AtEOXact_Buffers to an Assert
cross-check that this was done correctly.  When not USE_ASSERT_CHECKING,
AtEOXact_Buffers is a complete no-op.  This gets rid of an O(NBuffers)
bottleneck during transaction commit/abort, which recent testing has shown
becomes significant above a few tens of thousands of shared buffers.
2004-10-16 18:57:26 +00:00
Tom Lane
1c2de47746 Remove BufferLocks[] array in favor of a single pointer to the buffer
(if any) currently waited for by LockBufferForCleanup(), which is all
that we were using it for anymore.  Saves some space and eliminates
proportional-to-NBuffers slowdown in UnlockBuffers().
2004-10-16 18:05:07 +00:00
Tom Lane
9ffc8ed58b Repair possible failure to update hint bits back to disk, per
http://archives.postgresql.org/pgsql-hackers/2004-10/msg00464.php.
This fix is intended to be permanent: it moves the responsibility for
calling SetBufferCommitInfoNeedsSave() into the tqual.c routines,
eliminating the requirement for callers to test whether t_infomask changed.
Also, tighten validity checking on buffer IDs in bufmgr.c --- several
routines were paranoid about out-of-range shared buffer numbers but not
about out-of-range local ones, which seems a tad pointless.
2004-10-15 22:40:29 +00:00
Neil Conway
0683a47556 Allow the spinlock test to be compiled successfully in a vpath build. 2004-10-07 00:08:04 +00:00
Tom Lane
0fb3152ea9 Minor adjustments to improve the accuracy of our computation of required
shared memory size.
2004-09-29 15:15:56 +00:00
Tom Lane
3a246cc285 Arrange to preallocate all required space for the buffer and FSM hash
tables in shared memory.  This ensures that overflow of the lock table
creates no long-lasting problems.  Per discussion with Merlin Moncure.
2004-09-28 20:46:37 +00:00
Tom Lane
86fff990b2 RecentXmin is too recent to use as the cutoff point for accessing
pg_subtrans --- what we need is the oldest xmin of any snapshot in use
in the current top transaction.  Introduce a new variable TransactionXmin
to play this role.  Fixes intermittent regression failure reported by
Neil Conway.
2004-09-16 18:35:23 +00:00
Tom Lane
8f9f198603 Restructure subtransaction handling to reduce resource consumption,
as per recent discussions.  Invent SubTransactionIds that are managed like
CommandIds (ie, counter is reset at start of each top transaction), and
use these instead of TransactionIds to keep track of subtransaction status
in those modules that need it.  This means that a subtransaction does not
need an XID unless it actually inserts/modifies rows in the database.
Accordingly, don't assign it an XID nor take a lock on the XID until it
tries to do that.  This saves a lot of overhead for subtransactions that
are only used for error recovery (eg plpgsql exceptions).  Also, arrange
to release a subtransaction's XID lock as soon as the subtransaction
exits, in both the commit and abort cases.  This avoids holding many
unique locks after a long series of subtransactions.  The price is some
additional overhead in XactLockTableWait, but that seems acceptable.
Finally, restructure the state machine in xact.c to have a more orthogonal
set of states for subtransactions.
2004-09-16 16:58:44 +00:00
Tom Lane
abc98dcc15 When LockAcquire fails at the stage of creating a proclock object, be
sure to clean up the already-created lock object, if it has no other
references.  Avoids possibly-permanent leak of shared memory.
2004-09-12 18:30:50 +00:00
Tom Lane
083258e535 Fix a number of places where brittle data structures or overly strong
Asserts would lead to a server core dump if an error occurred while
trying to abort a failed subtransaction (thereby leading to re-execution
of whatever parts of AbortSubTransaction had already run).  This of course
does not prevent such an error from creating an infinite loop, but at
least we don't make the situation worse.  Responds to an open item on
the subtransactions to-do list.
2004-09-06 23:33:48 +00:00
Tom Lane
23645f0582 Fix incorrect ordering of smgr cleanup relative to buffer pin cleanup
during transaction abort.  Add a regression test case to catch related
mistakes in future.  Alvaro Herrera and Tom Lane.
2004-09-06 17:56:33 +00:00
Tom Lane
eb917c1a21 I can't see any good reason for DropRelFileNodeBuffers to be issuing
FATAL when it detects a nonzero reference count.  Reduce to ERROR.
2004-09-06 17:31:32 +00:00
Tom Lane
a421b4e850 FlushRelationBuffers was also being a bit cavalier about whether the
relation is already opened by smgr.
2004-08-31 16:13:06 +00:00
Tom Lane
332ee2dc41 Improve spinlock selftest to make it able to detect misdeclaration of
the slock_t datatype (ie, declared type smaller than what the hardware
TAS instruction needs).
2004-08-30 23:47:20 +00:00
Tom Lane
303e46ea93 Tweak md.c logic to cope with the situation where WAL replay tries to
write into a high-numbered segment of a relation that was later deleted.
We need to temporarily recreate missing segment files, instead of
failing.
2004-08-30 03:52:43 +00:00
Bruce Momjian
15d3f9f6b7 Another pgindent run with lib typedefs added. 2004-08-30 02:54:42 +00:00
Bruce Momjian
b6b71b85bc Pgindent run for 8.0. 2004-08-29 05:07:03 +00:00
Bruce Momjian
da9a8649d8 Update copyright to 2004. 2004-08-29 04:13:13 +00:00
Tom Lane
1785acebf2 Introduce local hash table for lock state, as per recent proposal.
PROCLOCK structs in shared memory now have only a bitmask for held
locks, rather than counts (making them 40 bytes smaller, which is a
good thing).  Multiple locks within a transaction are counted in the
local hash table instead, and we have provision for tracking which
ResourceOwner each count belongs to.  Solves recently reported problem
with memory leakage within long transactions.
2004-08-27 17:07:42 +00:00
Tom Lane
337b513e07 Fix user locks. Broken some time ago for all platforms by Windows-related
changes.
2004-08-26 17:23:30 +00:00
Tom Lane
4dbb880d3c Rearrange pg_subtrans handling as per recent discussion. pg_subtrans
updates are no longer WAL-logged nor even fsync'd; we do not need to,
since after a crash no old pg_subtrans data is needed again.  We truncate
pg_subtrans to RecentGlobalXmin at each checkpoint.  slru.c's API is
refactored a little bit to separate out the necessary decisions.
2004-08-23 23:22:45 +00:00
Tom Lane
f009c316ba Tweak code so that pg_subtrans is never consulted for XIDs older than
RecentXmin (== MyProc->xmin).  This ensures that it will be safe to
truncate pg_subtrans at RecentGlobalXmin, which should largely eliminate
any fear of bloat.  Along the way, eliminate SubTransXidsHaveCommonAncestor,
which isn't really needed and could not give a trustworthy result anyway
under the lookback restriction.
In an unrelated but nearby change, #ifdef out GetUndoRecPtr, which has
been dead code since 2001 and seems unlikely to ever be resurrected.
2004-08-22 02:41:58 +00:00
Tom Lane
1a3de15a3a Dept. of further reflection: I looked around to see if any other callers
of XLogInsert had the same sort of checkpoint interlock problem as
RecordTransactionCommit, and indeed I found some.  Btree index build
and ALTER TABLE SET TABLESPACE write data outside the friendly confines
of the buffer manager, and therefore they have to take their own
responsibility for checkpoint interlock.  The easiest solution seems to
be to force smgrimmedsync at the end of the index build or table copy,
even when the operation is being WAL-logged.  This is sufficient since
the new index or table will be of interest to no one if we don't get
as far as committing the current transaction.
2004-08-15 23:44:46 +00:00
Tom Lane
057ea3471f Xmin calculations should consider only top transaction IDs, and
therefore starting with GetCurrentTransactionId is wrong.  Fixes
miscomputation of RecentGlobalXmin leading to bizarre behavior
reported by Gavin Sherry.
2004-08-15 17:03:36 +00:00
Tom Lane
efcaf1e868 Some mop-up work for savepoints (nested transactions). Store a small
number of active subtransaction XIDs in each backend's PGPROC entry,
and use this to avoid expensive probes into pg_subtrans during
TransactionIdIsInProgress.  Extend EOXactCallback API to allow add-on
modules to get control at subxact start/end.  (This is deliberately
not compatible with the former API, since any uses of that API probably
need manual review anyway.)  Add basic reference documentation for
SAVEPOINT and related commands.  Minor other cleanups to check off some
of the open issues for subtransactions.
Alvaro Herrera and Tom Lane.
2004-08-01 17:32:22 +00:00
Tom Lane
a393fbf937 Restructure error handling as recently discussed. It is now really
possible to trap an error inside a function rather than letting it
propagate out to PostgresMain.  You still have to use AbortCurrentTransaction
to clean up, but at least the error handling itself will cooperate.
2004-07-31 00:45:57 +00:00
Tom Lane
1bf3d61504 Fix subtransaction behavior for large objects, temp namespace, files,
password/group files.  Also allow read-only subtransactions of a read-write
parent, but not vice versa.  These are the reasonably noncontroversial
parts of Alvaro's recent mop-up patch, plus further work on large objects
to minimize use of the TopTransactionResourceOwner.
2004-07-28 14:23:31 +00:00
Tom Lane
cc813fc2b8 Replace nested-BEGIN syntax for subtransactions with spec-compliant
SAVEPOINT/RELEASE/ROLLBACK-TO syntax.  (Alvaro)
Cause COMMIT of a failed transaction to report ROLLBACK instead of
COMMIT in its command tag.  (Tom)
Fix a few loose ends in the nested-transactions stuff.
2004-07-27 05:11:48 +00:00
Tom Lane
2042b3428d Invent WAL timelines, as per recent discussion, to make point-in-time
recovery more manageable.  Also, undo recent change to add FILE_HEADER
and WASTED_SPACE records to XLOG; instead make the XLOG page header
variable-size with extra fields in the first page of an XLOG file.
This should fix the boundary-case bugs observed by Mark Kirkwood.
initdb forced due to change of XLOG representation.
2004-07-21 22:31:26 +00:00
Tom Lane
fe548629c5 Invent ResourceOwner mechanism as per my recent proposal, and use it to
keep track of portal-related resources separately from transaction-related
resources.  This allows cursors to work in a somewhat sane fashion with
nested transactions.  For now, cursor behavior is non-subtransactional,
that is a cursor's state does not roll back if you abort a subtransaction
that fetched from the cursor.  We might want to change that later.
2004-07-17 03:32:14 +00:00
Tom Lane
8801110b20 Move TablespaceCreateDbspace() call into smgrcreate(), which is where it
probably should have been to begin with; this is to cover cases like
needing to recreate the per-db directory during WAL replay.
Also, fix heap_create to force pg_class.reltablespace to be zero instead
of the database's default tablespace; this makes the world safe for
CREATE DATABASE to handle all tables in the default tablespace alike,
as per previous discussion.  And force pg_class.reltablespace to zero
when creating a relation without physical storage (eg, a view); this
avoids possibly having dangling references in this column after a
subsequent DROP TABLESPACE.
2004-07-11 19:52:52 +00:00
Tom Lane
77a436ba55 Fix seriously nasty memory leak in new TransactionIdIsInProgress code. 2004-07-01 03:13:05 +00:00
Tom Lane
573a71a5da Nested transactions. There is still much left to do, especially on the
performance front, but with feature freeze upon us I think it's time to
drive a stake in the ground and say that this will be in 7.5.

Alvaro Herrera, with some help from Tom Lane.
2004-07-01 00:52:04 +00:00
Tom Lane
c1d9dec3e3 Looks like s_lock_test needs <time.h> on some platforms. 2004-06-19 20:31:55 +00:00
Tom Lane
1232878159 s_lock_test requires libpgport to build now. 2004-06-19 19:43:11 +00:00
Tom Lane
2467394ee1 Tablespaces. Alternate database locations are dead, long live tablespaces.
There are various things left to do: contrib dbsize and oid2name modules
need work, and so does the documentation.  Also someone should think about
COMMENT ON TABLESPACE and maybe RENAME TABLESPACE.  Also initlocation is
dead, it just doesn't know it yet.

Gavin Sherry and Tom Lane.
2004-06-18 06:14:31 +00:00
Tom Lane
bbf0ebadaf StrategyDirtyBufferList wasn't being careful to honor max_buffers limit.
Bug is only latent given that sole caller is passing NBuffers, but it
could bite someone in the rear someday.
2004-06-11 17:20:39 +00:00
Tom Lane
e6cba71503 Add some code to Assert that when we release pin on a buffer, we are
not holding the buffer's cntx_lock or io_in_progress_lock.  A recent
report from Litao Wu makes me wonder whether it is ever possible for
us to drop a buffer and forget to release its cntx_lock.  The Assert
does not fire in the regression tests, but that proves little ...
2004-06-11 16:43:24 +00:00
Bruce Momjian
a1ccbb9019 Previous code cleanup was for bufpage.c, not bufmgr.c.
This cleanup just cleans up a comment.
2004-06-09 13:11:34 +00:00
Bruce Momjian
ce04221a1e Stylistic changes in bufmgr.c
Basically replaces (*a).b with a->b as it is everywhere else in
	Postgres.

Manfred Koizar
2004-06-08 14:00:35 +00:00
Tom Lane
c3a153afed Tweak palloc/repalloc to allow zero bytes to be requested, as per recent
proposal.  Eliminate several dozen now-unnecessary hacks to avoid palloc(0).
(It's likely there are more that I didn't find.)
2004-06-05 19:48:09 +00:00
Tom Lane
921d749bd4 Adjust our timezone library to use pg_time_t (typedef'd as int64) in
place of time_t, as per prior discussion.  The behavior does not change
on machines without a 64-bit-int type, but on machines with one, which
is most, we are rid of the bizarre boundary behavior at the edges of
the 32-bit-time_t range (1901 and 2038).  The system will now treat
times over the full supported timestamp range as being in your local
time zone.  It may seem a little bizarre to consider that times in
4000 BC are PST or EST, but this is surely at least as reasonable as
propagating Gregorian calendar rules back that far.

I did not modify the format of the zic timezone database files, which
means that for the moment the system will not know about daylight-savings
periods outside the range 1901-2038.  Given the way the files are set up,
it's not a simple decision like 'widen to 64 bits'; we have to actually
think about the range of years that need to be supported.  We should
probably inquire what the plans of the upstream zic people are before
making any decisions of our own.
2004-06-03 02:08:07 +00:00
Bruce Momjian
e8d9d68ca4 Per previous discussions, here are two functions to send INT and TERM
(cancel and terminate) signals to other backends.   They permit only INT
and TERM, and permits sending only to postgresql backends.

Magnus Hagander
2004-06-02 21:29:29 +00:00
Tom Lane
2095206de1 Adjust btree index build to not use shared buffers, thereby avoiding the
locking conflict against concurrent CHECKPOINT that was discussed a few
weeks ago.  Also, if not using WAL archiving (which is always true ATM
but won't be if PITR makes it into this release), there's no need to
WAL-log the index build process; it's sufficient to force-fsync the
completed index before commit.  This seems to gain about a factor of 2
in my tests, which is consistent with writing half as much data.  I did
not try it with WAL on a separate drive though --- probably the gain would
be a lot less in that scenario.
2004-06-02 17:28:18 +00:00
Tom Lane
91d20ff7aa Additional mop-up for sync-to-fsync changes: avoid issuing fsyncs for
temp tables, and avoid WAL-logging truncations of temp tables.  Do issue
fsync on truncated files (not sure this is necessary but it seems like
a good idea).
2004-05-31 20:31:33 +00:00
Tom Lane
e674707968 Minor code rationalization: FlushRelationBuffers just returns void,
rather than an error code, and does elog(ERROR) not elog(WARNING)
when it detects a problem.  All callers were simply elog(ERROR)'ing on
failure return anyway, and I find it hard to envision a caller that would
not, so we may as well simplify the callers and produce the more useful
error message directly.
2004-05-31 19:24:05 +00:00
Tom Lane
9b178555fc Per previous discussions, get rid of use of sync(2) in favor of
explicitly fsync'ing every (non-temp) file we have written since the
last checkpoint.  In the vast majority of cases, the burden of the
fsyncs should fall on the bgwriter process not on backends.  (To this
end, we assume that an fsync issued by the bgwriter will force out
blocks written to the same file by other processes using other file
descriptors.  Anyone have a problem with that?)  This makes the world
safe for WIN32, which ain't even got sync(2), and really makes the world
safe for Unixen as well, because sync(2) never had the semantics we need:
it offers no way to wait for the requested I/O to finish.

Along the way, fix a bug I recently introduced in xlog recovery:
file truncation replay failed to clear bufmgr buffers for the dropped
blocks, which could result in 'PANIC:  heap_delete_redo: no block'
later on in xlog replay.
2004-05-31 03:48:10 +00:00
Tom Lane
c6719a2784 Implement new PostmasterIsAlive() check for WIN32, per Claudio Natoli.
In passing, align a few error messages with the style guide.
2004-05-30 03:50:15 +00:00
Tom Lane
076a055acf Separate out bgwriter code into a logically separate module, rather
than being random pieces of other files.  Give bgwriter responsibility
for all checkpoint activity (other than a post-recovery checkpoint);
so this child process absorbs the functionality of the former transient
checkpoint and shutdown subprocesses.  While at it, create an actual
include file for postmaster.c, which for some reason never had its own
file before.
2004-05-29 22:48:23 +00:00
Tom Lane
1a321f26d8 Code review for EXEC_BACKEND changes. Reduce the number of #ifdefs by
about a third, make it work on non-Windows platforms again.  (But perhaps
I broke the WIN32 code, since I have no way to test that.)  Fold all the
paths that fork postmaster child processes to go through the single
routine SubPostmasterMain, which takes care of resurrecting the state that
would normally be inherited from the postmaster (including GUC variables).
Clean up some places where there's no particularly good reason for the
EXEC and non-EXEC cases to work differently.  Take care of one or two
FIXMEs that remained in the code.
2004-05-28 05:13:32 +00:00
Tom Lane
ebfc56d3fb Handle impending sinval queue overflow by means of a separate signal
(SIGUSR1, which we have not been using recently) instead of piggybacking
on SIGUSR2-driven NOTIFY processing.  This has several good results:
the processing needed to drain the sinval queue is a lot less than the
processing needed to answer a NOTIFY; there's less contention since we
don't have a bunch of backends all trying to acquire exclusive lock on
pg_listener; backends that are sitting inside a transaction block can
still drain the queue, whereas NOTIFY processing can't run if there's
an open transaction block.  (This last is a fairly serious issue that
I don't think we ever recognized before --- with clients like JDBC that
tend to sit with open transaction blocks, the sinval queue draining
mechanism never really worked as intended, probably resulting in a lot
of useless cache-reset overhead.)  This is the last of several proposed
changes in response to Philip Warner's recent report of sinval-induced
performance problems.
2004-05-23 03:50:45 +00:00
Tom Lane
4af3421161 Get rid of rd_nblocks field in relcache entries. Turns out this was
costing us lots more to maintain than it was worth.  On shared tables
it was of exactly zero benefit because we couldn't trust it to be
up to date.  On temp tables it sometimes saved an lseek, but not often
enough to be worth getting excited about.  And the real problem was that
we forced an lseek on every relcache flush in order to update the field.
So all in all it seems best to lose the complexity.
2004-05-08 19:09:25 +00:00
Neil Conway
0370951347 Tiny assorted fixes: correct a typo in a comment in vacuumlazy.c, remove
some unused #include directives from bufmgr.c, and clarify comments in
bufmgr.h and buf.h
2004-04-25 23:50:58 +00:00
Neil Conway
139abc2896 Make LocalRefCount and PrivateRefCount arrays of int32, rather than long.
This saves a small amount of per-backend memory for LP64 machines.
2004-04-22 07:21:55 +00:00
Tom Lane
95a03e9cdf Another round of code cleanup on bufmgr. Use BM_VALID flag to keep track
of whether we have successfully read data into a buffer; this makes the
error behavior a bit more transparent (IMHO anyway), and also makes it
work correctly for local buffers which don't use Start/TerminateBufferIO.
Collapse three separate functions for writing a shared buffer into one.
This overlaps a bit with cleanups that Neil proposed awhile back, but
seems not to have committed yet.
2004-04-21 18:06:30 +00:00
Tom Lane
011c3e62e7 Code review for ARC patch. Eliminate static variables, improve handling
of VACUUM cases so that VACUUM requests don't affect the ARC state at all,
avoid corner case where BufferSync would uselessly rewrite a buffer that
no longer contains the page that was to be flushed.  Make some minor
other cleanups in and around the bufmgr as well, such as moving PinBuffer
and UnpinBuffer into bufmgr.c where they really belong.
2004-04-19 23:27:17 +00:00
Bruce Momjian
31338352bd * Most changes are to fix warnings issued when compiling win32
* removed a few redundant defines
* get_user_name safe under win32
* rationalized pipe read EOF for win32 (UPDATED PATCH USED)
* changed all backend instances of sleep() to pg_usleep

    - except for the SLEEP_ON_ASSERT in assert.c, as it would exceed a
32-bit long [Note to patcher: If a SLEEP_ON_ASSERT of 2000 seconds is
acceptable, please replace with pg_usleep(2000000000L)]

I added a comment to that part of the code:

    /*
     *  It would be nice to use pg_usleep() here, but only does 2000 sec
     *  or 33 minutes, which seems too short.
     */
    sleep(1000000);

Claudio Natoli
2004-04-19 17:42:59 +00:00
Bruce Momjian
48b2802eee When changing select() calls for delays into pg_usleep(), two comments
in s_lock.c were not updated, and still refers to select. Made my grep
hit the wrong files, so I figured a simple patch was in order.. (other
refs in the same comment block was changed..)

Magnus Hagander
2004-03-23 21:39:46 +00:00
Bruce Momjian
3947f653f9 * postmaster.c: cleanup pmdaemonize under win32; missed failure message
in CreateOptsFile
* s_lock.c: minor comment fix
* findbe.c: variables not used under win32 moved within #ifndef WIN32
case

Claudio Natoli
2004-03-15 16:18:43 +00:00
Bruce Momjian
c672aa823b For application to HEAD, following community review.
* Changes incorrect CYGWIN defines to __CYGWIN__

* Some localtime returns NULL checks (when unchecked cause SEGVs under
Win32
regression tests)

* Rationalized CreateSharedMemoryAndSemaphores and
AttachSharedMemoryAndSemaphores (Bruce, I finally remembered to do it);
requires attention.

Claudio Natoli
2004-02-25 19:41:23 +00:00
Tom Lane
7a57a67278 Replace opendir/closedir calls throughout the backend with AllocateDir
and FreeDir routines modeled on the existing AllocateFile/FreeFile.
Like the latter, these routines will avoid failing on EMFILE/ENFILE
conditions whenever possible, and will prevent leakage of directory
descriptors if an elog() occurs while one is open.
Also, reduce PANIC to ERROR in MoveOfflineLogs() --- this is not
critical code and there is no reason to force a DB restart on failure.
All per recent trouble report from Olivier Hubaut.
2004-02-23 23:03:10 +00:00
Tom Lane
f83356c7f5 Do a direct probe during postmaster startup to determine the maximum
number of openable files and the number already opened.  This eliminates
depending on sysconf(_SC_OPEN_MAX), and allows much saner behavior on
platforms where open-file slots are used up by semaphores.
2004-02-23 20:45:59 +00:00
Bruce Momjian
af3b182a57 Here is a patch that implements setitimer() on win32. With this patch
applied, deadlock detection and statement_timeout now works.

The file timer.c goes into src/backend/port/win32/.

The patch also removes two lines of "printf debugging" accidentally left
in pqsignal.h, in the console control handler.

Magnus Hagander
2004-02-18 16:25:12 +00:00
Tom Lane
da99cce7cd Avoid delaying postmaster shutdown by up to 10 seconds on platforms
where signals do not terminate sleep() delays.
2004-02-12 20:07:26 +00:00
Jan Wieck
fc65a3e1fd Fixed bug where FlushRelationBuffers() did call StrategyInvalidateBuffer()
for already empty buffers because their buffer tag was not cleard out
when the buffers have been invalidated before.

Also removed the misnamed BM_FREE bufhdr flag and replaced the checks,
which effectively ask if the buffer is unpinned, with checks against the
refcount field.

Jan
2004-02-12 15:06:56 +00:00
Tom Lane
c3c09be34b Commit the reasonably uncontroversial parts of J.R. Nield's PITR patch, to
wit: Add a header record to each WAL segment file so that it can be reliably
identified.  Avoid splitting WAL records across segment files (this is not
strictly necessary, but makes it simpler to incorporate the header records).
Make WAL entries for file creation, deletion, and truncation (as foreseen but
never implemented by Vadim).  Also, add support for making XLOG_SEG_SIZE
configurable at compile time, similarly to BLCKSZ.  Fix a couple bugs I
introduced in WAL replay during recent smgr API changes.  initdb is forced
due to changes in pg_control contents.
2004-02-11 22:55:26 +00:00
Tom Lane
58f337a343 Centralize implementation of delay code by creating a pg_usleep()
subroutine in src/port/pgsleep.c.  Remove platform dependencies from
miscadmin.h and put them in port.h where they belong.  Extend recent
vacuum cost-based-delay patch to apply to VACUUM FULL, ANALYZE, and
non-btree index vacuuming.

By the way, where is the documentation for the cost-based-delay patch?
2004-02-10 03:42:45 +00:00
Tom Lane
87bd956385 Restructure smgr API as per recent proposal. smgr no longer depends on
the relcache, and so the notion of 'blind write' is gone.  This should
improve efficiency in bgwriter and background checkpoint processes.
Internal restructuring in md.c to remove the not-very-useful array of
MdfdVec objects --- might as well just use pointers.
Also remove the long-dead 'persistent main memory' storage manager (mm.c),
since it seems quite unlikely to ever get resurrected.
2004-02-10 01:55:27 +00:00
Neil Conway
f06e79525a Win32 signals cleanup. Patch by Magnus Hagander, with input from Claudio
Natoli and Bruce Momjian (and some cosmetic fixes from Neil Conway).
Changes:

    - remove duplicate signal definitions from pqsignal.h

    - replace pqkill() with kill() and redefine kill() in Win32

    - use ereport() in place of fprintf() in some error handling in
      pqsignal.c

    - export pg_queue_signal() and make use of it where necessary

    - add a console control handler for Ctrl-C and similar handling
      on Win32

    - do WaitForSingleObjectEx() in CHECK_FOR_INTERRUPTS() on Win32;
      query cancelling should now work on Win32

    - various other fixes and cleanups
2004-02-08 22:28:57 +00:00
Jan Wieck
f425b605f4 Cost based vacuum delay feature.
Jan
2004-02-06 19:36:18 +00:00
Jan Wieck
8d09e25693 Backing out the background writer sync() option.
Jan
2004-02-04 01:24:53 +00:00
Bruce Momjian
5ee2ae2049 Remove sleep() and use single PG_SLEEP call for Win32 signal handling
and consistency.

Change PG_USLEEP to use SleepEx() for signal interuptability.
2004-01-30 15:57:04 +00:00
Bruce Momjian
50491963cb Here's the latest win32 signals code, this time in the form of a patch
against the latest shapshot. It also includes the replacement of kill()
with pqkill() and sigsetmask() with pqsigsetmask().

Passes all tests fine on my linux machine once applied. Still doesn't
link completely on Win32 - there are a few things still required. But
much closer than before.

At Bruce's request, I'm goint to write up a README file about the method
of signals delivery chosen and why the others were rejected (basically a
summary of the mailinglist discussions). I'll finish that up once/if the
patch is accepted.


Magnus Hagander
2004-01-27 00:45:26 +00:00
Bruce Momjian
eec08b95e7 [all] Removed call to getppid in SendPostmasterSignal, replacing with a
PostmasterPid variable, which gets set (early) in PostmasterMain
getppid would not be the postmaster?

[fork/exec] Implements processCancelRequest by keeping an array of

pid/cancel_key structs in shared mem

[fork/exec] Moves AttachSharedMemoryAndSemaphores call for backends into
SubPostmasterMain

[win32] Implements reaper/waitpid by keeping an arrays of children
pids,handles in postmaster local mem
      - this item is largely untested, for reasons which should be
obvious, but appears sound

[win32/all] Added extern for pgpipe in Win32 case, and changed the second
pipe call (which seems to have been missed earlier) to pgpipe

[win32] #define'd ftruncate to chsize in the Win32 case

[win32] PG_USLEEP for Win32 has a misplaced paren. Fixed.

[win32] DLLIMPORT handling for MingW case


Claudio Natoli
2004-01-26 22:59:54 +00:00
Bruce Momjian
ede3b762a3 Back out win32 patch so we can apply it separately. 2004-01-26 22:54:58 +00:00
Bruce Momjian
f4921e5ca3 Attached is a patch that fixes some trivial typos and alignment. Please
apply.

Alvaro Herrera
2004-01-26 22:51:56 +00:00
Tom Lane
c77f363384 Ensure that close() and fclose() are checked for errors, at least in
cases involving writes.  Per recent discussion about the possibility
of close-time failures on some filesystems.  There is a TODO item for
this, too.
2004-01-26 22:35:32 +00:00
Jan Wieck
d77b63b17c Added GUC variable bgwriter_flush_method controlling the action
done by the background writer between writing dirty blocks and
napping.

    none (default)   no action
	sync             bgwriter calls smgrsync() causing a sync(2)

A global sync() is only good on dedicated database servers, so
more flush methods should be added in the future.

Jan
2004-01-24 20:00:46 +00:00
Jan Wieck
dfdd59e918 Adjusted calculation of shared memory requirements to new
ARC buffer replacement strategy.

Jan
2004-01-15 16:14:26 +00:00
Bruce Momjian
4cdf51e646 Drops in the CreateProcess calls for Win32 (essentially wrapping up the
fork/exec portion of the port), and fixes a handful of whitespace issues

Claudio Natoli
2004-01-11 03:49:31 +00:00
Bruce Momjian
38081fd000 Change PG_DELAY from msec to usec and use it consistenly rather than
select().   Add Win32 Sleep() for delay.
2004-01-09 21:08:50 +00:00
Neil Conway
192ad63bd7 More janitorial work: remove the explicit casting of NULL literals to a
pointer type when it is not necessary to do so.

For future reference, casting NULL to a pointer type is only necessary
when (a) invoking a function AND either (b) the function has no prototype
OR (c) the function is a varargs function.
2004-01-07 18:56:30 +00:00
Neil Conway
dfc7e7b71d Code cleanup, mostly in the smgr:
- Update comment in IsReservedName() to the present day

     - Improve some variable & function names in commands/vacuum.c. I
       was planning to rewrite this to avoid lappend(), but since I
       still intend to do the list rewrite, there's no need for that.

     - Update some smgr comments which seemed to imply that we still
       forced all dirty pages to disk at commit-time.

     - Replace some #ifdef DIAGNOSTIC code with assertions.

     - Make the distinction between OS-level file descriptors and
       virtual file descriptors a little clearer in a few comments

     - Other minor comment improvements in the smgr code
2004-01-06 18:07:32 +00:00
Tom Lane
e8aa10ee47 ShmemInitHash forgot to specify HASH_ALLOC flag bit in its hash_create
call.  You'd think this would cause some problems, but because of the
way hash_create is coded, the only side-effect was creation of a useless
memory context for the hashtable.
2003-12-30 00:03:03 +00:00
Tom Lane
f8eed65dfb Improve spinlock code for recent x86 processors: insert a PAUSE
instruction in the s_lock() wait loop, and use test before test-and-set
in TAS() macro to avoid unnecessary bus traffic.  Patch from Manfred
Spraul, reworked a bit by Tom.
2003-12-27 20:58:58 +00:00
Bruce Momjian
aeddc2a60d Continued rearrangement to permit pgstat + BootstrapMain processes to be
fork/exec'd, in the same mode as the previous patch for backends.

Claudio Natoli
2003-12-25 03:52:51 +00:00
Tom Lane
afb09b5a31 Use inlined TAS() on PA-RISC, if we are compiling with gcc.
Patch inspired by original submission from ViSolve.
2003-12-23 22:15:07 +00:00
Tom Lane
9adaf64da3 Mop-up for HAS_TEST_AND_SET refactoring. Un-break two or three platforms
that were broken, try to make layout of s_lock.h entries consistent,
use HAVE_SPINLOCKS in preference to HAS_TEST_AND_SET everywhere outside
s_lock.h itself.
2003-12-23 18:13:17 +00:00
Bruce Momjian
69f2e9b0fc Move slock_t typdefs into s_lock.h from include/port files for
centralization and easier maintanence.
2003-12-23 03:31:30 +00:00
Bruce Momjian
887b5a7be0 Remove NEED_I386_TAS_ASM and just test for compiler defines. 2003-12-23 00:32:06 +00:00
Bruce Momjian
9114cb1c5f This applied patch remove NEED_SPARC_TAS_ASM and instead uses __sparc ||
__sparc__.
2003-12-22 23:39:53 +00:00
Bruce Momjian
ced30eb857 [ This description should have been on the earlier fork/exec
commit, but I am adding it now so it is in CVS.]

The patch basically is a slight rearrangement of the code to allow
fork/exec on Unix, with the ultimate goal of doing CreateProcess on
Win32.  The changes are:

        o  Write out postmaster global variables and per-backend
variables to be read by the exec'ed backend

        o  Mark some static variables as global when exec is used so
then can be dumped from postmaster.c, marked NON_EXEC_STATIC

        o  Remove value passing with -p now that we have per-backend
file

        o  Move some pointer storage out of shared memory for easier
dumping.

        o  Modified pgsql_temp directory cleanup to handle per-database
directories and the backend exec directory under datadir.


Claudio Natoli
2003-12-21 04:30:10 +00:00
Tom Lane
772d0f9345 The recent DUMMY_PROCS patch broke accounting for the number of semaphores
needed.  This caused us to fail all the time on Darwin, and we'd fail for
some values of maxBackends on SysV-sema platforms, too.
2003-12-21 00:33:33 +00:00
Tom Lane
16cc9dff4f bufmgr.c failed to compile on Darwin, because it didn't include
<sys/time.h> where struct timeval is defined.
2003-12-20 22:18:02 +00:00
Bruce Momjian
d75b2ec4eb This patch is the next step towards (re)allowing fork/exec.
Claudio Natoli
2003-12-20 17:31:21 +00:00
Neil Conway
fef0c8345a I posted some bufmgr cleanup a few weeks ago, but it conflicted with
some concurrent changes Jan was making to the bufmgr. Here's an
updated version of the patch -- it should apply cleanly to CVS
HEAD and passes the regression tests.

This patch makes the following changes:

     - remove the UnlockAndReleaseBuffer() and UnlockAndWriteBuffer()
       macros, and replace uses of them with calls to the appropriate
       functions.

     - remove a bunch of #ifdef BMTRACE code: it is ugly & broken
       (i.e. it doesn't compile)

     - make BufferReplace() return a bool, not an int

     - cleanup some logic in bufmgr.c; should be functionality
       equivalent to the previous code, just cleaner now

     - remove the BM_PRIVATE flag as it is unused

     - improve a few comments, etc.
2003-12-14 00:34:47 +00:00
Peter Eisentraut
2afacfc403 This patch properly sets the prototype for the on_shmem_exit and
on_proc_exit functions, and adjust all other related code to use
the proper types too.

by Kurt Roeckx
2003-12-12 18:45:10 +00:00
Bruce Momjian
e7ca867485 Try to reduce confusion about what is a lock method identifier, a lock
method control structure, or a table of control structures.

. Use type LOCKMASK where an int is not a counter.

. Get rid of INVALID_TABLEID, use INVALID_LOCKMETHOD instead.

. Use INVALID_LOCKMETHOD instead of (LOCKMETHOD) NULL, because
  LOCKMETHOD is not a pointer.

. Define and use macro LockMethodIsValid.

. Rename LOCKMETHOD to LOCKMETHODID.

. Remove global variable LongTermTableId in lmgr.c, because it is
  never used.

. Make LockTableId static in lmgr.c, because it is used nowhere else.
  Why not remove it and use DEFAULT_LOCKMETHOD?

. Rename the lock method control structure from LOCKMETHODTABLE to
  LockMethodData.  Introduce a pointer type named LockMethod.

. Remove elog(FATAL) after InitLockTable() call in
  CreateSharedMemoryAndSemaphores(), because if something goes wrong,
  there is elog(FATAL) in LockMethodTableInit(), and if this doesn't
  help, an elog(ERROR) in InitLockTable() is promoted to FATAL.

. Make InitLockTable() void, because its only caller does not use its
  return value any more.

. Rename variables in lock.c to avoid statements like
        LockMethodTable[NumLockMethods] = lockMethodTable;
        lockMethodTable = LockMethodTable[lockmethod];

. Change LOCKMETHODID type to uint16 to fit into struct LOCKTAG.

. Remove static variables BITS_OFF and BITS_ON from lock.c, because
  I agree to this doubt:
 * XXX is a fetch from a static array really faster than a shift?

. Define and use macros LOCKBIT_ON/OFF.


Manfred Koizar
2003-12-01 21:59:25 +00:00
Tom Lane
0902ece5b9 Force zero_damaged_pages to be effectively ON during recovery from WAL,
since there is no need to worry about damaged pages when we are going to
overwrite them anyway from the WAL.  Per recent discussion.
2003-12-01 16:53:19 +00:00
PostgreSQL Daemon
969685ad44 $Header: -> $PostgreSQL Changes ... 2003-11-29 19:52:15 +00:00
Peter Eisentraut
c9190ef074 Conditionalize variable that is only used conditionally, to avoid warning. 2003-11-27 18:12:50 +00:00
Tom Lane
9ea738827c Second try at fixing no-room-to-move-down PANIC in compact_fsm_storage.
Ward's report that it can still happen in RC2 forces me to realize that
this is not a can't-happen condition after all, and that the compaction
code had better cope rather than panicking.
2003-11-26 20:50:11 +00:00
Tom Lane
42ce74bf17 COMMENT ON casts, conversions, languages, operator classes, and
large objects.  Dump all these in pg_dump; also add code to pg_dump
user-defined conversions.  Make psql's large object code rely on
the backend for inserting/deleting LOB comments, instead of trying to
hack pg_description directly.  Documentation and regression tests added.

Christopher Kings-Lynne, code reviewed by Tom
2003-11-21 22:32:49 +00:00
Tom Lane
0a97cb37fc Remove unused variable. 2003-11-21 17:41:31 +00:00
Jan Wieck
cfeca62148 Background writer process
This first part of the background writer does no syncing at all.
It's only purpose is to keep the LRU heads clean so that regular
backends seldom to never have to call write().

Jan
2003-11-19 15:55:08 +00:00
Jan Wieck
1f45555892 Changed parameter name for shared cache status report interval to
debug_shared_buffers = <seconds>

as per previous discussion.


Jan
2003-11-16 16:41:01 +00:00
Jan Wieck
7c360d65a8 Added documentation for the new interface between the buffer manager
and the cache replacement strategy as well as a description of the
ARC algorithm and the special tailoring of that done for PostgreSQL.

Jan
2003-11-14 04:32:11 +00:00
Jan Wieck
6b86d62b00 2nd try for the ARC strategy.
I added a couple more Assertions while tracking down the exact
cause of the former bug.

All 93 regression tests pass now.

Jan
2003-11-13 14:57:15 +00:00
Jan Wieck
923e994d79 ARC strategy backed out ... sorry
Jan
2003-11-13 05:34:58 +00:00
Jan Wieck
48adc0b34b Replacement of the buffer replacement strategy with an ARC
algorithm adopted for PostgreSQL.

Jan
2003-11-13 00:40:02 +00:00
Tom Lane
fa5c8a055a Cross-data-type comparisons are now indexable by btrees, pursuant to my
pghackers proposal of 8-Nov.  All the existing cross-type comparison
operators (int2/int4/int8 and float4/float8) have appropriate support.
The original proposal of storing the right-hand-side datatype as part of
the primary key for pg_amop and pg_amproc got modified a bit in the event;
it is easier to store zero as the 'default' case and only store a nonzero
when the operator is actually cross-type.  Along the way, remove the
long-since-defunct bigbox_ops operator class.
2003-11-12 21:15:59 +00:00
Tom Lane
c1d62bfd00 Add operator strategy and comparison-value datatype fields to ScanKey.
Remove the 'strategy map' code, which was a large amount of mechanism
that no longer had any use except reverse-mapping from procedure OID to
strategy number.  Passing the strategy number to the index AM in the
first place is simpler and faster.
This is a preliminary step in planned support for cross-datatype index
operations.  I'm committing it now since the ScanKeyEntryInitialize()
API change touches quite a lot of files, and I want to commit those
changes before the tree drifts under me.
2003-11-09 21:30:38 +00:00
Tom Lane
4240d2bffd Update future-tense comments in README to present tense. Noted by
Neil Conway.
2003-10-31 22:48:08 +00:00
Tom Lane
abec4cbf1f compact_fsm_storage() does need to handle the case where a relation's
FSM data has to be both moved down and compressed.  Per report from
Dror Matalon.
2003-10-29 17:36:57 +00:00
Tom Lane
624292aa35 Ensure that all places that are complaining about exhaustion of shared
memory say 'out of shared memory'; some were doing that and some just
said 'out of memory'.  Also add a HINT about increasing max_locks_per_transaction
where relevant, per suggestion from Sean Chittenden.  (The former change
does not break the strings freeze; the latter does, but I think it's
worth doing anyway.)
2003-10-16 20:59:35 +00:00
Bruce Momjian
7fb9893f42 Back out -fstrict-aliasing void* casting. 2003-10-11 18:04:26 +00:00
Bruce Momjian
d51368dbbd This patch will stop gcc from issuing warnings about type-punned objects
when -fstrict-aliasing is turned on, as it is in the latest gcc when you
use -O2

Andrew Dunstan
2003-10-11 16:30:55 +00:00
Tom Lane
55d85f42a8 Repair RI trigger visibility problems (this time for sure ;-)) per recent
discussion on pgsql-hackers: in READ COMMITTED mode we just have to force
a QuerySnapshot update in the trigger, but in SERIALIZABLE mode we have
to run the scan under a current snapshot and then complain if any rows
would be updated/deleted that are not visible in the transaction snapshot.
2003-10-01 21:30:53 +00:00
Peter Eisentraut
7438af96fa More message editing, some suggested by Alvaro Herrera 2003-09-29 00:05:25 +00:00
Peter Eisentraut
feb4f44d29 Message editing: remove gratuitous variations in message wording, standardize
terms, add some clarifications, fix some untranslatable attempts at dynamic
message building.
2003-09-25 06:58:07 +00:00
Tom Lane
a56a016ceb Repair some REINDEX problems per recent discussions. The relcache is
now able to cope with assigning new relfilenode values to nailed-in-cache
indexes, so they can be reindexed using the fully crash-safe method.  This
leaves only shared system indexes as special cases.  Remove the 'index
deactivation' code, since it provides no useful protection in the shared-
index case.  Require reindexing of shared indexes to be done in standalone
mode, but remove other restrictions on REINDEX.  -P (IgnoreSystemIndexes)
now prevents using indexes for lookups, but does not disable index updates.
It is therefore safe to allow from PGOPTIONS.  Upshot: reindexing system catalogs
can be done without a standalone backend for all cases except
shared catalogs.
2003-09-24 18:54:02 +00:00
Tom Lane
5aa29e88e9 Arrange to align shared disk buffers on at least 32-byte boundaries,
not just MAXALIGN boundaries.  This makes a noticeable difference in
the speed of transfers to and from kernel space, at least on recent
Pentiums, and might help other CPUs too.  We should look at making
this happen for local buffers and buffile.c too.  Patch from Manfred Spraul.
2003-09-21 17:57:21 +00:00
Bruce Momjian
aaafbdcfd3 Fix old mention of exec() in AttachSharedMemoryAndSemaphores comment. 2003-09-12 02:13:23 +00:00
Tom Lane
7a3693716d Reimplement hash index locking algorithms, per my recent proposal to
pghackers.  This fixes the problem recently reported by Markus KrÌutner
(hash bucket split corrupts the state of scans being done concurrently),
and I believe it also fixes all the known problems with deadlocks in
hash index operations.  Hash indexes are still not really ready for prime
time (since they aren't WAL-logged), but this is a step forward.
2003-09-04 22:06:27 +00:00
Tom Lane
de9c553f6b Clean up locktable init code per recent gripe from Kurt Roeckx.
No change in behavior, but old code would have failed to detect
overrun of MAX_LOCKMODES.
2003-08-17 22:41:12 +00:00
Tom Lane
ffafacc1f6 Repair potential deadlock created by recent changes to recycle btree
index pages: when _bt_getbuf asks the FSM for a free index page, it is
possible (and, in some cases, even moderately likely) that the answer
will be the same page that _bt_split is trying to split.  _bt_getbuf
already knew that the returned page might not be free, but it wasn't
prepared for the possibility that even trying to lock the page could
be problematic.  Fix by doing a conditional rather than unconditional
grab of the page lock.
2003-08-10 19:48:08 +00:00
Bruce Momjian
46785776c4 Another pgindent run with updated typedefs. 2003-08-08 21:42:59 +00:00
Tom Lane
d5f7d2c682 Adopt a random backoff algorithm for sleep delays when waiting for a
spinlock.  Per recent pghackers discussion.
2003-08-06 16:43:43 +00:00
Tom Lane
963c1fa9d3 Minor cleanups in S_LOCK_TEST code. 2003-08-04 15:28:33 +00:00
Bruce Momjian
f3c3deb7d0 Update copyrights to 2003. 2003-08-04 02:40:20 +00:00
Bruce Momjian
089003fb46 pgindent run. 2003-08-04 00:43:34 +00:00
Tom Lane
81b5c8a136 A visit from the message-style police ... 2003-07-28 00:09:16 +00:00
Tom Lane
b556e8200e elog mop-up: bring some straggling fprintf(stderr)'s into the elog world. 2003-07-27 21:49:55 +00:00
Tom Lane
cfa191f3b8 Error message editing in backend/storage. 2003-07-24 22:04:15 +00:00
Bruce Momjian
acd1536d9f Up to now, SerializableSnapshot and QuerySnapshot are malloc'ed and
free'd for every transaction or statement, respectively.  This patch
puts these data structures into static memory, thus saving a few CPU
cycles and two malloc calls per transaction or (in isolation level
READ COMMITTED) per query.

Manfred Koizar
2003-06-12 01:42:21 +00:00
Bruce Momjian
0abe7431c6 This patch extracts page buffer pooling and the simple
least-recently-used strategy from clog.c into slru.c.  It doesn't
change any visible behaviour and passes all regression tests plus a
TruncateCLOG test done manually.

Apart from refactoring I made a little change to SlruRecentlyUsed,
formerly ClogRecentlyUsed:  It now skips incrementing lru_counts, if
slotno is already the LRU slot, thus saving a few CPU cycles.  To make
this work, lru_counts are initialised to 1 in SimpleLruInit.

SimpleLru will be used by pg_subtrans (part of the nested transactions
project), so the main purpose of this patch is to avoid future code
duplication.

Manfred Koizar
2003-06-11 22:37:46 +00:00
Bruce Momjian
98b6f37e47 Make debug_ GUC varables output DEBUG1 rather than LOG, and mention in
docs that CLIENT/LOG_MIN_MESSAGES now controls debug_* output location.
Doc changes included.
2003-05-27 17:49:47 +00:00
Bruce Momjian
12c9423832 Allow Win32 to compile under MinGW. Major changes are:
Win32 port is now called 'win32' rather than 'win'
        add -lwsock32 on Win32
        make gethostname() be only used when kerberos4 is enabled
        use /port/getopt.c
        new /port/opendir.c routines
        disable GUC unix_socket_group on Win32
        convert some keywords.c symbols to KEYWORD_P to prevent conflict
        create new FCNTL_NONBLOCK macro to turn off socket blocking
        create new /include/port.h file that has /port prototypes, move
          out of c.h
        new /include/port/win32_include dir to hold missing include files
        work around ERROR being defined in Win32 includes
2003-05-15 16:35:30 +00:00
Tom Lane
a4e775a263 Make use of new error context stack mechanism to allow random errors
detected during buffer dump to be labeled with the buffer location.
For example, if a page LSN is clobbered, we now produce something like
ERROR:  XLogFlush: request 2C000000/8468EC8 is not satisfied --- flushed only
to 0/8468EF0
CONTEXT:  writing block 0 of relation 428946/566240
whereas before there was no convenient way to find out which page had
been trashed.
2003-05-10 19:04:30 +00:00
Bruce Momjian
d9fd7d12f6 Pass shared memory id and socket descriptor number on command line for
fork/exec.
2003-05-06 23:34:56 +00:00
Bruce Momjian
a7fd03e1de Handle clog structure in shared memory in exec() case, for Win32. 2003-05-03 03:52:07 +00:00
Bruce Momjian
a2e038fbee Back out last commit --- wrong patch. 2003-05-02 21:59:31 +00:00
Bruce Momjian
fb1f7ccec5 Dump/read non-default GUC values for use by exec'ed backends, for Win32. 2003-05-02 21:52:42 +00:00
Tom Lane
4a5f38c4e6 Code review for holdable-cursors patch. Fix error recovery, memory
context sloppiness, some other things.  Includes Neil's mopup patch
of 22-Apr.
2003-04-29 03:21:30 +00:00
Tom Lane
f9ba0a7fe5 Apple's assembler likes the inlined TAS syntax too, so no reason to
maintain a separate out-of-line version of PPC tas() anymore.
Also fix S_UNLOCK for __powerpc64__ platforms.
2003-04-20 21:54:34 +00:00
Bruce Momjian
d46e643822 Add Win32 path handling for / vs. \ and drive letters. 2003-04-04 20:42:13 +00:00
Tom Lane
7cd30e1590 TestConfiguration returns int, not bool. This mistake is relatively
harmless on signed-char machines but would lead to core dump in the
deadlock detection code if char is unsigned.  Amazingly, this bug has
been here since 7.1 and yet wasn't reported till now.  Thanks to Robert
Bruccoleri for providing the opportunity to track it down.
2003-03-31 20:32:29 +00:00
Tom Lane
fd42262836 Add code to apply some simple sanity checks to the header fields of a
page when it's read in, per pghackers discussion around 17-Feb.  Add a
GUC variable zero_damaged_pages that causes the response to be a WARNING
followed by zeroing the page, rather than the normal ERROR; this is per
Hiroshi's suggestion that there needs to be a way to get at the data
in the rest of the table.
2003-03-28 20:17:13 +00:00
Bruce Momjian
54f7338fa1 This patch implements holdable cursors, following the proposal
(materialization into a tuple store) discussed on pgsql-hackers earlier.
I've updated the documentation and the regression tests.

Notes on the implementation:

- I needed to change the tuple store API slightly -- it assumes that it
won't be used to hold data across transaction boundaries, so the temp
files that it uses for on-disk storage are automatically reclaimed at
end-of-transaction. I added a flag to tuplestore_begin_heap() to control
this behavior. Is changing the tuple store API in this fashion OK?

- in order to store executor results in a tuple store, I added a new
CommandDest. This works well for the most part, with one exception: the
current DestFunction API doesn't provide enough information to allow the
Executor to store results into an arbitrary tuple store (where the
particular tuple store to use is chosen by the call site of
ExecutorRun). To workaround this, I've temporarily hacked up a solution
that works, but is not ideal: since the receiveTuple DestFunction is
passed the portal name, we can use that to lookup the Portal data
structure for the cursor and then use that to get at the tuple store the
Portal is using. This unnecessarily ties the Portal code with the
tupleReceiver code, but it works...

The proper fix for this is probably to change the DestFunction API --
Tom suggested passing the full QueryDesc to the receiveTuple function.
In that case, callers of ExecutorRun could "subclass" QueryDesc to add
any additional fields that their particular CommandDest needed to get
access to. This approach would work, but I'd like to think about it for
a little bit longer before deciding which route to go. In the mean time,
the code works fine, so I don't think a fix is urgent.

- (semi-related) I added a NO SCROLL keyword to DECLARE CURSOR, and
adjusted the behavior of SCROLL in accordance with the discussion on
-hackers.

- (unrelated) Cleaned up some SGML markup in sql.sgml, copy.sgml

Neil Conway
2003-03-27 16:51:29 +00:00
Tom Lane
4b6c198a6a Add code to dump contents of free space map into $PGDATA/global/pg_fsm.cache
at database shutdown, and then load it again at database startup.  This
preserves our hard-won knowledge of free space across restarts (given
an orderly shutdown, that is).
2003-03-06 00:04:27 +00:00
Tom Lane
391eb5e5b6 Reimplement free-space-map management as per recent discussions.
Adjustable threshold is gone in favor of keeping track of total requested
page storage and doling out proportional fractions to each relation
(with a minimum amount per relation, and some quantization of the results
to avoid thrashing with small changes in page counts).  Provide special-
case code for indexes so as not to waste space storing useless page
free space counts.  Restructure internal data storage to be a flat array
instead of list-of-chunks; this may cost a little more work in data
copying when reorganizing, but allows binary search to be used during
lookup_fsm_page_entry().
2003-03-04 21:51:22 +00:00
Tom Lane
61b22d3aab btree page recycling can be done as soon as page's next-xact label is
older than current Xmin; we don't have to wait till it's older than
GlobalXmin.
2003-02-23 23:20:52 +00:00
Tom Lane
88dc31e3f2 First cut at recycling space in btree indexes. Still some rough edges
to fix, but it seems to basically work...
2003-02-23 06:17:13 +00:00
Bruce Momjian
69c049cef4 Back out LOCKTAG changes by Rod Taylor, pending code review. Sorry. 2003-02-19 23:41:15 +00:00
Bruce Momjian
d0f3a7e9c4 - Modifies LOCKTAG to include a 'classId'. Relation receive a classId of
RelOid_pg_class, and transaction locks XactLockTableId. RelId is renamed
to objId.

- LockObject() and UnlockObject() functions created, and their use
sprinkled throughout the code to do descent locking for domains and
types. They accept lock modes AccessShare and AccessExclusive, as we
only really need a 'read' and 'write' lock at the moment.  Most locking
cases are held until the end of the transaction.

This fixes the cases Tom mentioned earlier in regards to locking with
Domains.  If the patch is good, I'll work on cleaning up issues with
other database objects that have this problem (most of them).

Rod Taylor
2003-02-19 04:02:54 +00:00
Bruce Momjian
9ee8e7a39e Update README. 2003-02-18 03:33:50 +00:00
Bruce Momjian
32cc6cbe23 Rename 'holder' references to 'proclock' for PROCLOCK references, for
consistency.
2003-02-18 02:13:24 +00:00
Bruce Momjian
48ee6f4916 This trivial patch removes the usage of some old statistics code that no
longer works -- IncrHeapAccessStat() didn't actually *do* anything
anymore, so no reason to keep it around AFAICS. I also fixed a
grammatical error in a comment.


Neil Conway
2003-02-13 05:35:11 +00:00
Tom Lane
227a404cf4 Add code to print information about a detected deadlock cycle. The
printed data is comparable to what you could read in the pg_locks view,
were you fortunate enough to have been looking at it at the right time.
2003-01-16 21:01:45 +00:00
Bruce Momjian
ef581f0552 Rewrite for-loop, because this is not the Obfuscated C Code Contest.
Manfred Koizar
2003-01-11 05:01:03 +00:00
Tom Lane
9f1f2bfb66 Fix various places where global s/NOTICE/WARNING/ was applied with too
much enthusiasm.
2003-01-07 22:23:17 +00:00
Tom Lane
973a210cce Tweak mdnblocks() to avoid doing lseek() on segments that it has
previously determined not to be the last segment of a relation.
This reduces the expected cost to one seek, rather than one seek per
segment.  We can get away with this because truncation of a relation
will cause a relcache flush and so the md.c file descriptor will be
closed; when it is re-opened we will re-determine the last segment.
2003-01-07 01:19:12 +00:00
Tom Lane
a2e8e15dd4 localbuf.c must be able to do blind writes. 2002-12-05 22:48:03 +00:00
Tom Lane
8362be35e8 Code review for superuser_reserved_connections patch. Don't try to do
database access outside a transaction; revert bogus performance improvement
in SIBackendInit(); improve comments; add documentation (this part courtesy
Neil Conway).
2002-11-21 06:36:08 +00:00
Tom Lane
6929a1e6ad Improve comment: add note that grotty special case in mdread() is
required by hash index implementation.
2002-11-12 15:26:30 +00:00
Bruce Momjian
ceb4f5ea9c > > I'll re-check that with the ppc architecture guy here.
>
> ... he is now about to write an inlined version that can go into
> s_lock.h . I'll send the new patch later on...

OK, here it comes:

An inlined version of tas(), that works for both, powerpc and
powerpc64. The patch is against 7.3b5 and passes the test suite on
both architectures.

Reinhard Max
2002-11-10 00:33:43 +00:00
Tom Lane
643dfb783d Fix some bogus comments. 2002-11-01 00:40:23 +00:00
Tom Lane
55e4ef138c Code review for statement_timeout patch. Fix some race conditions
between signal handler and enable/disable code, avoid accumulation of
timing error due to trying to maintain remaining-time instead of
absolute-end-time, disable timeout before commit not after.
2002-10-31 21:34:17 +00:00
Tom Lane
edf497dec9 Avoid palloc(0) when MaxBackends = 1. 2002-10-03 19:17:55 +00:00
Bruce Momjian
5ad4faf13a This patch removes a use of uninitialized memory in lmgr/lock.c, by
adding a missing sprintf().

Neil Conway
2002-09-26 05:18:30 +00:00
Tom Lane
8a6fab412e Remove ShutdownBufferPoolAccess exit callback, and do the work in
ProcKill instead, where we still have a PGPROC with which to wait on
LWLocks.  This fixes 'can't wait without a PROC structure' failures
occasionally seen during backend shutdown (I'm surprised they weren't
more frequent, actually).  Add an Assert() to LWLockAcquire to help
catch any similar mistakes in future.  Fix failure to update MyProcPid
for standalone backends and pgstat processes.
2002-09-25 20:31:40 +00:00
Tom Lane
7233aae50b Fix PPC s_lock operations to work correctly on multi-CPU machines.
Need 'isync' during TAS and 'sync' during S_UNLOCK.
2002-09-21 00:14:05 +00:00
Tom Lane
b2735fcd52 Performance improvement for MultiRecordFreeSpace on large relations ---
avoid O(N^2) behavior.  Problem noted and fixed by Stephen Marshall <smarshall@wsicorp.com>,
with some help from Tom Lane.
2002-09-20 19:56:01 +00:00
Bruce Momjian
229eebd559 This patch fixes two typos in src/backend/storage/ipc/README.
Neil Conway
2002-09-20 03:53:55 +00:00
Tom Lane
c91b8bc537 Cosmetic fixes from Neil Conway. 2002-09-14 19:59:20 +00:00
Tom Lane
52c9d25933 Be careful to include postgres.h *before* any system headers, to ensure
that the right flavors of largefile-related definitions are seen.
Most of these changes are probably unnecessary, but better safe than
sorry.
2002-09-05 00:43:07 +00:00
Bruce Momjian
e50f52a074 pgindent run. 2002-09-04 20:31:48 +00:00
Bruce Momjian
a12b4e279b I checked all the previous string handling errors and most of them were
already fixed by You. However there were a few left and attached patch
should fix the rest of them.

I used StringInfo only in 2 places and both of them are inside debug
ifdefs. Only performance penalty will come from using strlen() like all
the other code does.

I also modified some of the already patched parts by changing
snprintf(buf, 2 * BUFSIZE, ... style lines to
snprintf(buf, sizeof(buf), ... where buf is an array.

Jukka Holappa
2002-09-02 06:11:43 +00:00
Bruce Momjian
97ac103289 Remove sys/types.h in files that include postgres.h, and hence c.h,
because c.h has sys/types.h.
2002-09-02 02:47:07 +00:00
Tom Lane
c7a165adc6 Code review for HeapTupleHeader changes. Add version number to page headers
(overlaying low byte of page size) and add HEAP_HASOID bit to t_infomask,
per earlier discussion.  Simplify scheme for overlaying fields in tuple
header (no need for cmax to live in more than one place).  Don't try to
clear infomask status bits in tqual.c --- not safe to do it there.  Don't
try to force output table of a SELECT INTO to have OIDs, either.  Get rid
of unnecessarily complex three-state scheme for TupleDesc.tdhasoids, which
has already caused one recent failure.  Improve documentation.
2002-09-02 01:05:06 +00:00
Tom Lane
1bab464eb4 Code review for pg_locks feature. Make shmemoffset of PROCLOCK structs
available (else there's no way to interpret the list links).  Change
pg_locks view to show transaction ID locks separately from ordinary
relation locks.  Avoid showing N duplicate rows when the same lock is
held multiple times (seems unlikely that users care about exact hold
count).  Improve documentation.
2002-08-31 17:14:28 +00:00
Bruce Momjian
626eca697c This patch reserves the last superuser_reserved_connections slots for
connections by the superuser only.

This patch replaces the last patch I sent a couple of days ago.

It closes a connection that has not been authorised by a superuser if it would
leave less than the GUC variable ReservedBackends
(superuser_reserved_connections in postgres.conf) backend process slots free
in the SISeg. This differs to the first patch which only reserved the last
ReservedBackends slots in the procState array. This has made the free slot
test more expensive due to the use of a lock.

After thinking about a comment on the first patch I've also made it a fatal
error if the number of reserved slots is not less than the maximum number of
connections.

Nigel J. Andrews
2002-08-29 21:02:12 +00:00
Bruce Momjian
dd912c6977 This patches replaces a few more usages of strcpy() and sprintf() when
copying into a fixed-size buffer (in this case, a buffer of
NAMEDATALEN bytes). AFAICT nothing to worry about here, but worth
fixing anyway...

Neil Conway
2002-08-27 03:56:35 +00:00
Tom Lane
58de480999 Clean up comments to be careful about the distinction between variable-
width types and varlena types, since with the introduction of CSTRING as
a more-or-less-real type, these concepts aren't identical.  I've tried to
use varlena consistently to denote datatypes with typlen = -1, ie, they
have a length word and are potentially TOASTable; while the term variable
width covers both varlena and cstring (and, perhaps, someday other types
with other rules for computing the actual width).  No code changes in this
commit except for renaming a couple macros.
2002-08-25 17:20:01 +00:00
Bruce Momjian
82119a696e [ Newest version of patch applied.]
This patch is an updated version of the lock listing patch. I've made
the following changes:

    - write documentation
    - wrap the SRF in a view called 'pg_locks': all user-level
      access should be done through this view
    - re-diff against latest CVS

One thing I chose not to do is adapt the SRF to use the anonymous
composite type code from Joe Conway. I'll probably do that eventually,
but I'm not really convinced it's a significantly cleaner way to
bootstrap SRF builtins than the method this patch uses (of course, it
has other uses...)

Neil Conway
2002-08-17 13:04:19 +00:00
Bruce Momjian
b1a5f87209 Tom Lane wrote:
> There's no longer a separate call to heap_storage_create in that routine
> --- the right place to make the test is now in the storage_create
> boolean parameter being passed to heap_create.  A simple change, but
> it passeth patch's understanding ...

Thanks.

Attached is a patch against cvs tip as of 8:30 PM PST or so. Turned out
that even after fixing the failed hunks, there was a new spot in
bufmgr.c which needed to be fixed (related to temp relations;
RelationUpdateNumberOfBlocks). But thankfully the regression test code
caught it :-)

Joe Conway
2002-08-15 16:36:08 +00:00
Tom Lane
e44beef712 Code review of CLUSTER patch. Clean up problems with relcache getting
confused, toasted data getting lost, etc.
2002-08-11 21:17:35 +00:00
Peter Eisentraut
f1d820494c Fix failure to relink postmaster executable in the first make run if only a
single source file a few directories deep in the backend tree has changed.
2002-08-10 17:59:28 +00:00
Tom Lane
ba053de197 Still more paranoia in PageAddItem: disallow specification of an item
offset past the last-used-item-plus-one, since that would result in
leaving uninitialized holes in the item pointer array.  AFAICT the only
place that was depending on this was btree index build, which was being
cavalier about when to fill in the P_HIKEY pointer; easily fixed.
Also a small performance improvement: shuffle itemid's by means of
memmove, not a one-at-a-time loop.
2002-08-06 19:41:23 +00:00
Tom Lane
5df307c778 Restructure local-buffer handling per recent pghackers discussion.
The local buffer manager is no longer used for newly-created relations
(unless they are TEMP); a new non-TEMP relation goes through the shared
bufmgr and thus will participate normally in checkpoints.  But TEMP relations
use the local buffer manager throughout their lifespan.  Also, operations
in TEMP relations are not logged in WAL, thus improving performance.
Since it's no longer necessary to fsync relations as they move out of the
local buffers into shared buffers, quite a lot of smgr.c/md.c/fd.c code
is no longer needed and has been removed: there's no concept of a dirty
relation anymore in md.c/fd.c, and we never fsync anything but WAL.
Still TODO: improve local buffer management algorithms so that it would
be reasonable to increase NLocBuffer.
2002-08-06 02:36:35 +00:00
Tom Lane
15fe086fba Restructure system-catalog index updating logic. Instead of having
hardwired lists of index names for each catalog, use the relcache's
mechanism for caching lists of OIDs of indexes of any table.  This
reduces the common case of updating system catalog indexes to a single
line, makes it much easier to add a new system index (in fact, you
can now do so on-the-fly if you want to), and as a nice side benefit
improves performance a little.  Per recent pghackers discussion.
2002-08-05 03:29:17 +00:00
Bruce Momjian
5e6528adf7 * -Remove LockMethodTable.prio field, not used (Bruce) 2002-08-01 05:18:34 +00:00
Bruce Momjian
b75fcf9326 Complete TODO item:
* -HOLDER/HOLDERTAB rename to PROCLOCK/PROCLOCKTAG
2002-07-19 00:17:40 +00:00
Bruce Momjian
981d045e88 Complete TODO item:
* Merge LockMethodCtl and LockMethodTable into one shared structure (Bruce)
2002-07-18 23:06:20 +00:00
Bruce Momjian
4db8718e84 Add SET statement_timeout capability. Timeout is in ms. A value of
zero turns off the timer.
2002-07-13 01:02:14 +00:00
Bruce Momjian
33f1687879 There already was a macro PageGetItemId; this is now used in (almost)
all places, where pd_linp is accessed.  Also introduce new macros
SizeOfPageHeaderData and BTMaxItemSize. This is just source code
cosmetic, no behaviour changed.

Manfred Koizar
2002-07-02 05:48:44 +00:00
Bruce Momjian
8864603f3c Minor code cleanup in bufmgr.c and bufmgr.h, mainly by moving repeated
lines of code into internal routines (drop_relfilenode_buffers,
release_buffer) and by hiding unused routines (PrintBufferDescs,
PrintPinnedBufs) behind #ifdef NOT_USED. Remove AbortBufferIO()
declaration from bufmgr.c (already declared in bufmgr.h)

Manfred Koizar
2002-07-02 05:47:37 +00:00
Bruce Momjian
d84fe82230 Update copyright to 2002. 2002-06-20 20:29:54 +00:00
Bruce Momjian
6e8a1a6717 WriteBuffer return value:
>I'd vote for changing WriteBuffer to
>return void, and have it elog() on bad argument.

Manfred Koizar
2002-06-15 19:59:59 +00:00
Bruce Momjian
918e864f14 Remove some pre-WAL relics:
SharedBufferChanged
  BufferRelidLastDirtied
  BufferTagLastDirtied
  BufferDirtiedByMe

Manfred Koizar
2002-06-15 19:55:38 +00:00
Jan Wieck
469cb65aca Katherine Ward wrote:
> Changes to avoid collisions with WIN32 & MFC names...
> 1.  Renamed:
>       a.  PROC => PGPROC
>       b.  GetUserName() => GetUserNameFromId()
>       c.  GetCurrentTime() => GetCurrentDateTime()
>       d.  IGNORE => IGNORE_DTF in include/utils/datetime.h & utils/adt/datetim
>
> 2.  Added _P to some lex/yacc tokens:
>       CONST, CHAR, DELETE, FLOAT, GROUP, IN, OUT

Jan
2002-06-11 13:40:53 +00:00
Tom Lane
3f4d488022 Mark index entries "killed" when they are no longer visible to any
transaction, so as to avoid returning them out of the index AM.  Saves
repeated heap_fetch operations on frequently-updated rows.  Also detect
queries on unique keys (equality to all columns of a unique index), and
don't bother continuing scan once we have found first match.

Killing is implemented in the btree and hash AMs, but not yet in rtree
or gist, because there isn't an equally convenient place to do it in
those AMs (the outer amgetnext routine can't do it without re-pinning
the index page).

Did some small cleanup on APIs of HeapTupleSatisfies, heap_fetch, and
index_insert to make this a little easier.
2002-05-24 18:57:57 +00:00
Tom Lane
959e61e917 Remove global variable scanCommandId in favor of storing a command ID
in snapshots, per my proposal of a few days ago.  Also, tweak heapam.c
routines (heap_insert, heap_update, heap_delete, heap_mark4update) to
be passed the command ID to use, instead of doing GetCurrentCommandID.
For catalog updates they'll still get passed current command ID, but
for updates generated from the main executor they'll get passed the
command ID saved in the snapshot the query is using.  This should fix
some corner cases associated with functions and triggers that advance
current command ID while an outer query is still in progress.
2002-05-21 22:05:55 +00:00
Tom Lane
44fbe20d62 Restructure indexscan API (index_beginscan, index_getnext) per
yesterday's proposal to pghackers.  Also remove unnecessary parameters
to heap_beginscan, heap_rescan.  I modified pg_proc.h to reflect the
new numbers of parameters for the AM interface routines, but did not
force an initdb because nothing actually looks at those fields.
2002-05-20 23:51:44 +00:00
Tom Lane
72a3902a66 Create an internal semaphore API that is not tied to SysV semaphores.
As proof of concept, provide an alternate implementation based on POSIX
semaphores.  Also push the SysV shared-memory implementation into a
separate file so that it can be replaced conveniently.
2002-05-05 00:03:29 +00:00
Tom Lane
1a69a37d5b Fix obsolete comments. 2002-05-03 17:42:11 +00:00
Tom Lane
c2def1b128 Fix backslash-n typo, per Joe Conway. 2002-05-02 21:44:43 +00:00
Bruce Momjian
171824087c The patch I sent to -patches a little while ago wasn't applied: it
was in the thread "make BufferGetBlockNumber() a macro". Tom
objected to the original patch, so I prepared a new one which
doesn't change BufferGetBlockNumber() into a macro, it just
cleans up some comments and fixes an assertion. The patch
is attached.

Neil Conway
2002-04-15 23:47:12 +00:00
Bruce Momjian
33d1bb76c6 The attached patch corrects an inaccuracy in src/backend/catalog/README
and fixes a few spelling mistakes in src/bakckend/lmgr/README.

Neil Conway
2002-04-15 23:46:13 +00:00
Bruce Momjian
b73859db8c Patch against 7.2.1 sources. Uses Solaris Intimate Shared Memory
for Solaris on SPARC.  Scott Brunza (sbrunza@sonalysts.com) gets
credit for identifying the issue, making the change, and doing
the regression tests.

Earlier testing on 7.2rc2 and 7.2 showed performance gains of
1% to 10% on pgbench, osdb-pg, and some locally developed apps.

Solaris Intimate Shared Memory is described in "SOLARIS INTERNALS
Core Kernel Components" by Jim Mauro and Richard McDougall,
Copyright 2001 Sun Microsystem, Inc.  ISBN 0-13-022496-0

P.J. "Josh" Rovero
2002-04-13 19:52:51 +00:00
Bruce Momjian
3cbe6b2478 Looks like a small patch is needed as well to do the right thing on Linux.
The patch enables the mips2 ISA for the ll/sc operations, and then restores
it when done.  The kernel/libc emulation code will take over on CPUs without
ll/sc, and on CPUs with it, it'll use the operations provided by the CPU.

Combined with the earlier fix (removing -mips2), postgresql builds again on
mips and mipsel.  The patch is against 7.2-7.

Oliver Elphick
2002-04-05 11:38:13 +00:00
Bruce Momjian
92288a1cf9 Change made to elog:
o  Change all current CVS messages of NOTICE to WARNING.  We were going
to do this just before 7.3 beta but it has to be done now, as you will
see below.

o Change current INFO messages that should be controlled by
client_min_messages to NOTICE.

o Force remaining INFO messages, like from EXPLAIN, VACUUM VERBOSE, etc.
to always go to the client.

o Remove INFO from the client_min_messages options and add NOTICE.

Seems we do need three non-ERROR elog levels to handle the various
behaviors we need for these messages.

Regression passed.
2002-03-06 06:10:59 +00:00
Tom Lane
cfae62c476 Some kibitzing about appropriate elog levels for sinval messages. 2002-03-02 23:35:57 +00:00
Bruce Momjian
a033daf566 Commit to match discussed elog() changes. Only update is that LOG is
now just below FATAL in server_min_messages.  Added more text to
highlight ordering difference between it and client_min_messages.

---------------------------------------------------------------------------

REALLYFATAL => PANIC
STOP => PANIC
New INFO level the prints to client by default
New LOG level the prints to server log by default
Cause VACUUM information to print only to the client
NOTICE => INFO where purely information messages are sent
DEBUG => LOG for purely server status messages
DEBUG removed, kept as backward compatible
DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1 added
DebugLvl removed in favor of new DEBUG[1-5] symbols
New server_min_messages GUC parameter with values:
        DEBUG[5-1], INFO, NOTICE, ERROR, LOG, FATAL, PANIC
New client_min_messages GUC parameter with values:
        DEBUG[5-1], LOG, INFO, NOTICE, ERROR, FATAL, PANIC
Server startup now logged with LOG instead of DEBUG
Remove debug_level GUC parameter
elog() numbers now start at 10
Add test to print error message if older elog() values are passed to elog()
Bootstrap mode now has a -d that requires an argument, like postmaster
2002-03-02 21:39:36 +00:00
Tom Lane
d99fb0d909 Don't Assert() that fsync() and close() never fail; I have seen this
crash on Solaris when over disk quota.  Instead, report such failures
via elog(DEBUG).
2002-02-10 22:56:31 +00:00
Tom Lane
bef0c8dc29 Add cast to suppress gcc warning on Darwin platform. 2002-01-30 19:34:55 +00:00
Tom Lane
386f1809a7 Fix logic error in insert_fsm_page_entry: because compact_fsm_page_list
removes any empty chunks, the chunk previously added won't be there
anymore, so it's possible there is zero free space in the rel's page list
afterwards.  Must loop back and rerun the part that adds a chunk to
the list.
2002-01-24 15:31:43 +00:00
Tom Lane
aa00e6134e Add more sanity-checking to PageAddItem and PageIndexTupleDelete,
to prevent spreading of corruption when page header pointers are bad.
Merge PageZero into PageInit, since it was never used separately, and
remove separate memset calls used at most other PageInit call points.
Remove IndexPageCleanup, which wasn't used at all.
2002-01-15 22:14:17 +00:00
Tom Lane
5b9a058384 Tweak LWLock algorithms so that an awakened waiter for a lock is not
granted the lock when awakened; the signal now only means that the lock
is potentially available.  The waiting process must retry its attempt
to get the lock when it gets to run.  This allows the lock releasing
process to re-acquire the lock later in its timeslice.  Since LWLocks
are usually held for short periods, it is possible for a process to
acquire and release the same lock many times in a timeslice.  The old
spinlock-based implementation of these locks allowed for that; but the
original coding of LWLock would force a process swap for each acquisition
if there was any contention.  Although this approach reopens the door to
process starvation (a waiter might repeatedly fail to get the lock),
the odds of that being a big problem seem low, and the performance cost
of the previous approach is considerable.
2002-01-07 16:33:00 +00:00
Bruce Momjian
6f901b6f5a Oops, only wanted datetime.c changes in there. lock stuff reversed out. 2001-12-29 21:30:32 +00:00
Bruce Momjian
9e7b9c6f54 Fix newly introduced datetime.c compile failure; not enough parens. 2001-12-29 21:28:18 +00:00
Tom Lane
198152730b Improve LOCK_DEBUG logging code for LWLocks. 2001-12-28 23:26:04 +00:00
Tom Lane
d3fc362ec2 Ensure that all direct uses of spinlock-protected data structures use
'volatile' pointers to access those structures, so that optimizing
compilers will not decide to move the structure accesses outside of the
spinlock-acquire-to-spinlock-release sequence.  There are no known bugs
in these uses at present, but based on bad experience with lwlock.c,
it seems prudent to ensure that we protect these other uses too.
Per pghackers discussion around 12-Dec.  (Note: it should not be
necessary to worry about structures protected by LWLocks, since the
LWLock acquire and release operations are not inline macros.)
2001-12-28 18:16:43 +00:00
Tom Lane
584f818bef Declare LWLock pointers as volatile to prevent AIX compiler from
reordering operations at its whim.  Releasing TAS lock before we've
finished updating proc structure is uncool.
2001-12-10 21:13:50 +00:00
Tom Lane
f6ee99a062 Clean up usage-statistics display code (ShowUsage and friends). StatFp
is gone, usage messages now go through elog(DEBUG).
2001-11-10 23:51:14 +00:00
Bruce Momjian
77e4fd889c Fix indenting for 'extern "C"' cases. 2001-11-08 20:37:52 +00:00
Tom Lane
64af43a15f Add casts to suppress compiler warnings observed on Darwin platform
(surprised no one has reported these yet...)
2001-11-08 04:05:13 +00:00
Tom Lane
ca7578d454 The extra semaphore that proc.c now allocates for checkpoint processes
should be accounted for in the PROC_SEM_MAP_ENTRIES() macro.  Otherwise
the ports that rely on this macro to size data structures are broken.
Mea culpa.
2001-11-06 00:38:26 +00:00
Bruce Momjian
ea08e6cd55 New pgindent run with fixes suggested by Tom. Patch manually reviewed,
initdb/regression tests pass.
2001-11-05 17:46:40 +00:00
Tom Lane
d556920a98 Remove ill-considered Assert. 2001-11-05 01:34:37 +00:00
Tom Lane
fb5f1b2c13 Merge three existing ways of signaling postmaster from child processes,
so that only one signal number is used not three.  Flags in shared
memory tell the reason(s) for the current signal.  This method is
extensible to handle more signal reasons without chewing up even more
signal numbers, but the immediate reason is to keep pg_pwd reloads
separate from SIGHUP processing in the postmaster.
Also clean up some problems in the postmaster with delayed response to
checkpoint status changes --- basically, it wouldn't schedule a checkpoint
if it wasn't getting connection requests on a regular basis.
2001-11-04 19:55:31 +00:00
Bruce Momjian
c41b6b1b9c Fix small problem Tom Lane found with pgindent run. 2001-10-30 05:38:56 +00:00
Bruce Momjian
6783b2372e Another pgindent run. Fixes enum indenting, and improves #endif
spacing.  Also adds space for one-line comments.
2001-10-28 06:26:15 +00:00
Bruce Momjian
b81844b173 pgindent run on all C files. Java run to follow. initdb/regression
tests pass.
2001-10-25 05:50:21 +00:00
Tom Lane
087771ae40 Add error checking to PageRepairFragmentation to ensure that it can
never overwrite adjacent pages with copied data, even if page header
and/or item pointers are already corrupt.  Change inspired by trouble
report from Alvaro Herrera.
2001-10-23 02:20:15 +00:00
Tom Lane
8a52b893b3 Further cleanup of dynahash.c API, in pursuit of portability and
readability.  Bizarre '(long *) TRUE' return convention is gone,
in favor of just raising an error internally in dynahash.c when
we detect hashtable corruption.  HashTableWalk is gone, in favor
of using hash_seq_search directly, since it had no hope of working
with non-LONGALIGNable datatypes.  Simplify some other code that was
made undesirably grotty by promixity to HashTableWalk.
2001-10-05 17:28:13 +00:00
Tom Lane
c7a7107f41 Revise shmget() and semget() failure messages to mention the possibility
of coping by reducing shared_buffers/max_connections settings.
2001-10-01 23:26:55 +00:00
Tom Lane
0648d78ac4 Make inclusion logic for sys/sem.h and sys/ipc.h consistent across all
the files that need them.  Per trouble report from Teodor.
2001-10-01 18:16:35 +00:00
Bruce Momjian
77d2622498 Add sys/types.h for FreeBSD compile.
Teodor Sigaev
2001-10-01 17:52:34 +00:00
Tom Lane
5999e78fc4 Another round of cleanups for dynahash.c (maybe it's finally clean of
portability issues).  Caller-visible data structures are now allocated
on MAXALIGN boundaries, allowing safe use of datatypes wider than 'long'.
Rejigger hash_create API so that caller specifies size of key and
total size of entry, not size of key and size of rest of entry.
This simplifies life considerably since each number is just a sizeof(),
and padding issues etc. are taken care of automatically.
2001-10-01 05:36:17 +00:00
Tom Lane
f9f258281e Create a GUC parameter max_files_per_process that is a configurable
upper limit on what we will believe from sysconf(_SC_OPEN_MAX).  The
default value is 1000, so that under ordinary conditions it won't
affect the behavior.  But on platforms where the kernel promises far
more than it can deliver, this can be used to prevent running out of
file descriptors.  See numerous past discussions, eg, pgsql-hackers
around 23-Dec-2000.
2001-09-30 18:57:45 +00:00
Bruce Momjian
0386ccfed1 Back out change. Too many place to change too close to beta:
* HOLDER/HOLDERTAB rename to PROCLOCKLINK/PROCLOCKLINKTAG (Bruce)

Will return later.
2001-09-30 00:45:48 +00:00
Bruce Momjian
f738747494 Do this TODO item:
* HOLDER/HOLDERTAB rename to PROCLOCK/PROCLOCKTAG (Tom)

Didn't use PROCLOCKLINK because it made PROCLOCKLINKTAG too long.
2001-09-29 21:35:14 +00:00
Tom Lane
2a314add00 Whoops, I was a tad too enthusiastic about using shared lock mode for
SInvalLock.  GetSnapshotData(true) has to use exclusive lock, since
it sets MyProc->xmin.
2001-09-29 15:29:48 +00:00
Tom Lane
499abb0c0f Implement new 'lightweight lock manager' that's intermediate between
existing lock manager and spinlocks: it understands exclusive vs shared
lock but has few other fancy features.  Replace most uses of spinlocks
with lightweight locks.  All remaining uses of spinlocks have very short
lock hold times (a few dozen instructions), so tweak spinlock backoff
code to work efficiently given this assumption.  All per my proposal on
pghackers 26-Sep-01.
2001-09-29 04:02:27 +00:00
Tom Lane
90aebf7f52 Move s_lock.c and spin.c into lmgr subdirectory, which seems a much
more reasonable location for them.
2001-09-27 19:10:02 +00:00
Tom Lane
3d59ad00e8 Remove useless LockDisable() function and associated overhead, per my
proposal of 26-Aug.
2001-09-27 16:29:13 +00:00
Tom Lane
35b7601b04 Add an overall timeout on the client authentication cycle, so that
a hung client or lost connection can't indefinitely block a postmaster
child (not to mention the possibility of deliberate DoS attacks).
Timeout is controlled by new authentication_timeout GUC variable,
which I set to 60 seconds by default ... does that seem reasonable?
2001-09-21 17:06:12 +00:00
Tom Lane
863aceb54f Get rid of PID entries in shmem hash table; there is no longer any need
for them, and making them just wastes time during backend startup/shutdown.
Also, remove compile-time MAXBACKENDS limit per long-ago proposal.
You can now set MaxBackends as high as your kernel can stand without
any reconfiguration/recompilation.
2001-09-07 00:27:30 +00:00
Tom Lane
763554393a Fix code so that we recover cleanly if there are no free semaphores
available in freeSemMap.  As noted by Tatsuo, this is now a likely
scenario for detecting MaxBackends-exceeded; if MaxBackends is a multiple
of PROC_NSEMS_PER_SET then we will fail here and not in sinval.c.  The
cleanup path did not work correctly before, anyway.
2001-09-04 21:42:17 +00:00
Tom Lane
b553cba15a Clean up the lock state properly when aborting because of early deadlock
detection in ProcSleep().  Bug noted by Tomasz Zielonka --- how did this
escape detection for this long??
2001-09-04 02:26:57 +00:00
Peter Eisentraut
3c59a9e3b7 Bring references to ipcclean in sync with reality. 2001-09-04 00:22:34 +00:00
Peter Eisentraut
b1a38a4380 Install the SQL command man pages into a section appropriate for each
system.  Some systems did not understand the 'l' section, and in general
it wasn't entirely appropriate.

On SCO OpenServer, the man pages won't be installed at all until someone
figures out their man system.
2001-08-29 19:14:40 +00:00
Peter Eisentraut
f45b7270b6 Whoops, wrong logic. 2001-08-29 11:54:12 +00:00
Peter Eisentraut
dd225655b9 Change the conditionals so the mips + gcc code here doesn't apply for Irix.
The code in s_lock.h should get used.

report from Bruno Mattarollo <bruno@web1.greenpeace.org>
2001-08-28 15:04:27 +00:00
Tom Lane
bc7d37a525 Transaction IDs wrap around, per my proposal of 13-Aug-01. More
documentation to come, but the code is all here.  initdb forced.
2001-08-26 16:56:03 +00:00
Tom Lane
2589735da0 Replace implementation of pg_log as a relation accessed through the
buffer manager with 'pg_clog', a specialized access method modeled
on pg_xlog.  This simplifies startup (don't need to play games to
open pg_log; among other things, OverrideTransactionSystem goes away),
should improve performance a little, and opens the door to recycling
commit log space by removing no-longer-needed segments of the commit
log.  Actual recycling is not there yet, but I felt I should commit
this part separately since it'd still be useful if we chose not to
do transaction ID wraparound.
2001-08-25 18:52:43 +00:00
Peter Eisentraut
968d7733a1 Rename config.h to pg_config.h and os.h to pg_config_os.h, fix a number of
places that were including the wrong files.
2001-08-24 14:07:50 +00:00
Tom Lane
7326e78c42 Ensure that all TransactionId comparisons are encapsulated in macros
(TransactionIdPrecedes, TransactionIdFollows, etc).  First step on the
way to transaction ID wrap solution ...
2001-08-23 23:06:38 +00:00
Tom Lane
ef6ccb0bcc Cleanup some minor oversights in optional-OIDs stuff. 2001-08-10 20:52:25 +00:00
Bruce Momjian
3e51868226 This patch is because Hurd does not support NOFILE. It is against current
cvs.

The Debian bug report says, "The upstream source makes use of NOFILE
unconditionalized.  As the Hurd doesn't have an arbitrary limit on the
number of open files, this is not defined.  But _SC_OPEN_MAX works fine
and returns 1024 (applications can increase this as they want), so I
suggest the below diff.  Please forward this upstream, too."

Oliver Elphick
2001-08-04 19:42:34 +00:00
Tom Lane
8a59f336bb Minor performance improvement in MultiRecordFreeSpace. 2001-07-19 21:25:37 +00:00
Tom Lane
ed5c4e4a14 Improve documentation about reasoning behind the order of operations
in GetSnapshotData, GetNewTransactionId, CommitTransaction, AbortTransaction,
etc.  Correct race condition in transaction status testing in
HeapTupleSatisfiesVacuum --- this wasn't important for old VACUUM with
exclusive lock on its table, but it sure is important now.  All per
pghackers discussion 7/11/01 and 7/12/01.
2001-07-16 22:43:34 +00:00
Tom Lane
b9f3a929ee Create a new HeapTupleSatisfiesVacuum() routine in tqual.c that embodies the
validity checking rules for VACUUM.  Make some other rearrangements of the
VACUUM code to allow more code to be shared between full and lazy VACUUM.
Minor code cleanups and added comments for TransactionId manipulations.
2001-07-12 04:11:13 +00:00
Tom Lane
4fe42dfbc3 Add SHARE UPDATE EXCLUSIVE lock mode, coming soon to a VACUUM near you.
Name chosen per pghackers discussion around 6/22/01.
2001-07-09 22:18:34 +00:00
Tom Lane
55432fedd2 Implement LockBufferForCleanup(), which will allow concurrent VACUUM
to wait until it's safe to remove tuples and compact free space in a
shared buffer page.  Miscellaneous small code cleanups in bufmgr, too.
2001-07-06 21:04:26 +00:00
Tom Lane
42748087c1 First non-stub implementation of shared free space map. It's not super
useful as yet, since its primary source of information is (full) VACUUM,
which makes a concerted effort to get rid of free space before telling
the map about it ... next stop is concurrent VACUUM ...
2001-07-02 20:50:46 +00:00
Tom Lane
a29f6c095c Make the found-a-buffer-when-we-were-expecting-to-extend-the-rel path
actually work.  It had been throwing an Assert as of my recent changes
to bufmgr.c, but was not really right even before that AFAICT.
2001-07-02 18:47:18 +00:00
Tom Lane
af5ced9cfd Further work on connecting the free space map (which is still just a
stub) into the rest of the system.  Adopt a cleaner approach to preventing
deadlock in concurrent heap_updates: allow RelationGetBufferForTuple to
select any page of the rel, and put the onus on it to lock both buffers
in a consistent order.  Remove no-longer-needed isExtend hack from
API of ReleaseAndReadBuffer.
2001-06-29 21:08:25 +00:00
Tom Lane
e0c9301c87 Install infrastructure for shared-memory free space map. Doesn't actually
do anything yet, but it has the necessary connections to initialization
and so forth.  Make some gestures towards allowing number of blocks in
a relation to be BlockNumber, ie, unsigned int, rather than signed int.
(I doubt I got all the places that are sloppy about it, yet.)  On the
way, replace the hardwired NLOCKS_PER_XACT fudge factor with a GUC
variable.
2001-06-27 23:31:40 +00:00
Jan Wieck
8d80b0d980 Statistical system views (yet without the config stuff, but
it's hard to keep such massive changes in sync with the tree
so I need to get it in and work from there now).

Jan
2001-06-22 19:16:24 +00:00
Tom Lane
d8d9ed931e Add support to lock manager for conditionally locking a lock (ie,
return without waiting if we can't get the lock immediately).
Not used yet, but will be needed for concurrent VACUUM.
2001-06-22 00:04:59 +00:00
Tom Lane
bbbc00af88 Clean up some longstanding problems in shared-cache invalidation.
SI messages now include the relevant database OID, so that operations
in one database do not cause useless cache flushes in backends attached
to other databases.  Declare SI messages properly using a union, to
eliminate the former assumption that Oid is the same size as int or Index.
Rewrite the nearly-unreadable code in inval.c, and document it better.
Arrange for catcache flushes at end of command/transaction to happen before
relcache flushes do --- this avoids loading a new tuple into the catcache
while setting up new relcache entry, only to have it be flushed again
immediately.
2001-06-19 19:42:16 +00:00
Bruce Momjian
49ce6fff1d Allow removal of system-named pg_* temp tables. Rename temp file/dir as
pgsql_tmp.
2001-06-18 16:13:21 +00:00
Tom Lane
2917f0a5dd Tweak startup sequence so that running out of PROC array slots is
detected sooner in backend startup, and is treated as an expected error
(it gives 'Sorry, too many clients already' now).  This allows us not
to have to enforce the MaxBackends limit exactly in the postmaster.
Also, remove ProcRemove() and fold its functionality into ProcKill().
There's no good reason for a backend not to be responsible for removing
its PROC entry, and there are lots of good reasons for the postmaster
not to be touching shared-memory data structures.
2001-06-16 22:58:17 +00:00
Tom Lane
1d584f97b9 Clean up various to-do items associated with system indexes:
pg_database now has unique indexes on oid and on datname.
pg_shadow now has unique indexes on usename and on usesysid.
pg_am now has unique index on oid.
pg_opclass now has unique index on oid.
pg_amproc now has unique index on amid+amopclaid+amprocnum.
Remove pg_rewrite's unnecessary index on oid, delete unused RULEOID syscache.
Remove index on pg_listener and associated syscache for performance reasons
(caching rows that are certain to change before you need 'em again is
rather pointless).
Change pg_attrdef's nonunique index on adrelid into a unique index on
adrelid+adnum.

Fix various incorrect settings of pg_class.relisshared, make that the
primary reference point for whether a relation is shared or not.
IsSharedSystemRelationName() is now only consulted to initialize relisshared
during initial creation of tables and indexes.  In theory we might now
support shared user relations, though it's not clear how one would get
entries for them into pg_class &etc of multiple databases.

Fix recently reported bug that pg_attribute rows created for an index all have
the same OID.  (Proof that non-unique OID doesn't matter unless it's
actually used to do lookups ;-))

There's no need to treat pg_trigger, pg_attrdef, pg_relcheck as bootstrap
relations.  Convert them into plain system catalogs without hardwired
entries in pg_class and friends.

Unify global.bki and template1.bki into a single init script postgres.bki,
since the alleged distinction between them was misleading and pointless.
Not to mention that it didn't work for setting up indexes on shared
system relations.

Rationalize locking of pg_shadow, pg_group, pg_attrdef (no need to use
AccessExclusiveLock where ExclusiveLock or even RowExclusiveLock will do).
Also, hold locks until transaction commit where necessary.
2001-06-12 05:55:50 +00:00
Tom Lane
2a6f7ac456 Move temporary files into 'pg_tempfiles' subdirectory of each database
directory (which can be made a symlink to put temp files on another disk).
Add code to delete leftover temp files during postmaster startup.
Bruce, with some kibitzing from Tom.
2001-06-11 04:12:29 +00:00
Tom Lane
bdadc9bf1c Remove RelationGetBufferWithBuffer(), which is horribly confused about
appropriate pin-count manipulation, and instead use ReleaseAndReadBuffer.
Make use of the fact that the passed-in buffer (if there is one) must
be pinned to avoid grabbing the bufmgr spinlock when we are able to
return this same buffer.  Eliminate unnecessary 'previous tuple' and
'next tuple' fields of HeapScanDesc and IndexScanDesc, thereby removing
a whole lot of bookkeeping from heap_getnext() and related routines.
2001-06-09 18:16:59 +00:00
Tom Lane
1173344e74 Adjust WAL code so that checkpoints truncate the xlog at the previous
checkpoint's redo pointer, not its undo pointer, per discussion in
pghackers a few days ago.  No point in hanging onto undo information
until we have the ability to do something with it --- and this solves
a rather large problem with log space for long-running transactions.
Also, change all calls of write() to detect the case where write
returned a count less than requested, but failed to set errno.
Presume that this situation indicates ENOSPC, and give the appropriate
error message, rather than a random message associated with the previous
value of errno.
2001-06-06 17:07:46 +00:00
Tom Lane
ddd96e1f21 Guard against malloc failure. Also, don't examine segP->lastBackend
until we hold the spinlock.
2001-06-01 20:07:16 +00:00
Bruce Momjian
33f2614aa1 Remove SEP_CHAR, replace with / or '/' as appropriate. 2001-05-30 14:15:27 +00:00
Bruce Momjian
f6923ff3ac Oops, only wanted python change in the last commit. Backing out. 2001-05-25 15:45:34 +00:00
Bruce Momjian
dffb673692 While changing Cygwin Python to build its core as a DLL (like Win32
Python) to support shared extension modules, I have learned that Guido
prefers the style of the attached patch to solve the above problem.
I feel that this solution is particularly appropriate in this case
because the following:

    PglargeType
    PgType
    PgQueryType

are already being handled in the way that I am proposing for PgSourceType.

Jason Tishler
2001-05-25 15:34:50 +00:00
Bruce Momjian
dc0ff5c67a Small code cleanups,formatting. 2001-05-18 21:24:20 +00:00
Tom Lane
eedb7d18fa Modify RelationGetBufferForTuple() so that we only do lseek and lock
when we need to move to a new page; as long as we can insert the new
tuple on the same page as before, we only need LockBuffer and not the
expensive stuff.  Also, twiddle bufmgr interfaces to avoid redundant
lseeks in RelationGetBufferForTuple and BufferAlloc.  Successive inserts
now require one lseek per page added, rather than one per tuple with
several additional ones at each page boundary as happened before.
Lock contention when multiple backends are inserting in same table
is also greatly reduced.
2001-05-12 19:58:28 +00:00
Tom Lane
642107d5ba Avoid unnecessary lseek() calls by cleanups in md.c. mdfd_lstbcnt was
not being consulted anywhere, so remove it and remove the _mdnblocks()
calls that were used to set it.  Change smgrextend interface to pass in
the target block number (ie, current file length) --- the caller always
knows this already, having already done smgrnblocks(), so it's silly to
do it over again inside mdextend.  Net result: extension of a file now
takes one lseek(SEEK_END) and a write(), not three lseeks and a write.
2001-05-10 20:38:49 +00:00
Bruce Momjian
82c9ce2c40 Small cleanup. 2001-05-08 19:00:26 +00:00
Bruce Momjian
415263b2d2 > Occasionally and without warning I get this from my daily vacuum
> cronjob:
> NOTICE:  RegisterSharedInvalid: SI buffer overflow
> NOTICE:  InvalidateSharedInvalid: cache state reset
> I don't understand what these mean. Should I be concerned about them
> and what do they signify?

No real need to worry.  Those should've been downgraded to DEBUG-level
messages a release or two back, but nobody bothered...

Tom Lane
2001-05-07 17:20:19 +00:00
Tom Lane
08bf4d797b Check for failure of malloc() and realloc() when allocating space for
VFD entries.  On platforms where dereferencing a null pointer doesn't
lead to coredump, it's possible that this omission could have led to
unpleasant behavior like deleting the wrong file.
2001-04-03 04:07:02 +00:00
Tom Lane
6cc6f18d15 open(2) flags saved for re-opening a virtual file should probably not
include O_CREAT.
2001-04-03 02:31:52 +00:00
Tom Lane
244fd47124 _mdfd_getrelnfd() should include kernel error code in failure message. 2001-04-02 23:20:24 +00:00
Tom Lane
ff71301806 Spell __volatile__ correctly. 2001-03-27 01:16:24 +00:00
Tom Lane
ccd415c63f Fix unportable assumptions about alignment of local char[n] variables. 2001-03-25 23:23:59 +00:00
Bruce Momjian
7cf952e7b4 Fix comments that were mis-wrapped, for Tom Lane. 2001-03-23 04:49:58 +00:00
Bruce Momjian
0686d49da0 Remove dashes in comments that don't need them, rewrap with pgindent. 2001-03-22 06:16:21 +00:00
Bruce Momjian
9e1552607a pgindent run. Make it all clean. 2001-03-22 04:01:46 +00:00
Vadim B. Mikheev
ab36582a19 Check bufHdr->cntxDirty and call StartBufferIO in BufferSync()
*before* acquiring shlock on buffer context. This way we should be
protected against conflicts with FlushRelationBuffers.
(Seems we never do excl lock and then StartBufferIO for the same
buffer, so there should be no deadlock here, - but we'd better
check this very soon).
2001-03-21 10:13:29 +00:00
Tom Lane
af6e88a9cf Remove NEXTXID xlog record type to avoid three-way deadlock risk.
NEXTXID isn't really necessary, per previous discussion in pghackers,
but I mulishy insisted we should put it in anyway.  Mea culpa.
2001-03-18 20:18:59 +00:00
Tom Lane
ddc5bc958a When we add 'waiting' to the ps_status display, there should be a
space in front of it.  Improve comments a little.
2001-03-18 20:13:13 +00:00
Bruce Momjian
9de4b77cee 'waiting' status display had extra space, removed.
Change the administrator to 'an' administrator.
2001-03-14 18:24:34 +00:00
Tom Lane
4d14fe0048 XLOG (and related) changes:
* Store two past checkpoint locations, not just one, in pg_control.
  On startup, we fall back to the older checkpoint if the newer one
  is unreadable.  Also, a physical copy of the newest checkpoint record
  is kept in pg_control for possible use in disaster recovery (ie,
  complete loss of pg_xlog).  Also add a version number for pg_control
  itself.  Remove archdir from pg_control; it ought to be a GUC
  parameter, not a special case (not that it's implemented yet anyway).

* Suppress successive checkpoint records when nothing has been entered
  in the WAL log since the last one.  This is not so much to avoid I/O
  as to make it actually useful to keep track of the last two
  checkpoints.  If the things are right next to each other then there's
  not a lot of redundancy gained...

* Change CRC scheme to a true 64-bit CRC, not a pair of 32-bit CRCs
  on alternate bytes.  Polynomial borrowed from ECMA DLT1 standard.

* Fix XLOG record length handling so that it will work at BLCKSZ = 32k.

* Change XID allocation to work more like OID allocation.  (This is of
  dubious necessity, but I think it's a good idea anyway.)

* Fix a number of minor bugs, such as off-by-one logic for XLOG file
  wraparound at the 4 gig mark.

* Add documentation and clean up some coding infelicities; move file
  format declarations out to include files where planned contrib
  utilities can get at them.

* Checkpoint will now occur every CHECKPOINT_SEGMENTS log segments or
  every CHECKPOINT_TIMEOUT seconds, whichever comes first.  It is also
  possible to force a checkpoint by sending SIGUSR1 to the postmaster
  (undocumented feature...)

* Defend against kill -9 postmaster by storing shmem block's key and ID
  in postmaster.pid lockfile, and checking at startup to ensure that no
  processes are still connected to old shmem block (if it still exists).

* Switch backends to accept SIGQUIT rather than SIGUSR1 for emergency
  stop, for symmetry with postmaster and xlog utilities.  Clean up signal
  handling in bootstrap.c so that xlog utilities launched by postmaster
  will react to signals better.

* Standalone bootstrap now grabs lockfile in target directory, as added
  insurance against running it in parallel with live postmaster.
2001-03-13 01:17:06 +00:00
Tom Lane
9c9936587c Implement COMMIT_SIBLINGS parameter to allow pre-commit delay to occur
only if at least N other backends currently have open transactions.  This
is not a great deal of intelligence about whether a delay might be
profitable ... but it beats no intelligence at all.  Note that the default
COMMIT_DELAY is still zero --- this new code does nothing unless that
setting is changed.
Also, mark ENABLEFSYNC as a system-wide setting.  It's no longer safe to
allow that to be set per-backend, since we may be relying on some other
backend's fsync to have synced the WAL log.
2001-02-26 00:50:08 +00:00
Tom Lane
496ea7a876 At least on HPUX, select with delay.tv_sec = 0 and delay.tv_usec = 1000000
does not lead to a one-second delay, but to an immediate EINVAL failure.
This causes CHECKPOINT to crash with s_lock_stuck much too quickly :-(.
Fix by breaking down the requested wait div/mod 1e6.
2001-02-24 22:42:45 +00:00
Tom Lane
e74ce0a566 As long as we're fixing this space calculation, let's actually do it
right.  We should MAXALIGN the individual items because we'll
allocate them individually, not as an array.
2001-02-23 20:12:37 +00:00
Bruce Momjian
81b48493aa Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Is there one LOCKMETHODCTL for every backend?  I thought there was only
> one of them.
>>
>> You're right, that line is erroneous; it should read
>>
>> size += MAX_LOCK_METHODS * MAXALIGN(sizeof(LOCKMETHODCTL));
>>
>> Not a significant error but it should be changed for clarity ...
2001-02-23 18:28:46 +00:00
Bruce Momjian
a95ac415f7 More comment cleanups. 2001-02-22 23:20:06 +00:00
Bruce Momjian
82fc51e0b3 More comment improvements. 2001-02-22 23:02:33 +00:00
Tom Lane
33cc5d8a4d Change s_lock to not use any zero-delay select() calls; these are just a
waste of cycles on single-CPU machines, and of dubious utility on multi-CPU
machines too.
Tweak s_lock_stuck so that caller can specify timeout interval, and
increase interval before declaring stuck spinlock for buffer locks and XLOG
locks.
On systems that have fdatasync(), use that rather than fsync() to sync WAL
log writes.  Ensure that WAL file is entirely allocated during XLogFileInit.
2001-02-18 04:39:42 +00:00
Tom Lane
b634118af9 Add current seek position to FDDEBUG output for FileRead,
FileWrite, FileSeek.
2001-02-17 01:00:04 +00:00
Tom Lane
d08741eab5 Restructure the key include files per recent pghackers discussion: there
are now separate files "postgres.h" and "postgres_fe.h", which are meant
to be the primary include files for backend .c files and frontend .c files
respectively.  By default, only include files meant for frontend use are
installed into the installation include directory.  There is a new make
target 'make install-all-headers' that adds the whole content of the
src/include tree to the installed fileset, for use by people who want to
develop server-side code without keeping the complete source tree on hand.
Cleaned up a whole lot of crufty and inconsistent header inclusions.
2001-02-10 02:31:31 +00:00
Vadim B. Mikheev
21d08bc1f6 PageAddItem in overwrite mode: must *NOT* check itemid' flag if
OffsetNumber == MaxOffsetNumber + 1 - there may be garbage there!
2001-02-06 06:24:00 +00:00
Tom Lane
f433d0d3cd Special case in ProcSleep() wasn't sufficiently general: must check to
see if we shouldn't block whenever we insert ourselves anywhere before
the end of the queue, not only at the front.
2001-01-26 18:23:12 +00:00
Tom Lane
211f5afd40 Whoops, forgot to do ProcLockWakeup() after deadlock checker
rearranges wait queues.
2001-01-25 03:45:50 +00:00
Tom Lane
a05eae029a Re-implement deadlock detection and resolution, per design notes posted
to pghackers on 18-Jan-01.
2001-01-25 03:31:16 +00:00
Bruce Momjian
623bf843d2 Change Copyright from PostgreSQL, Inc to PostgreSQL Global Development Group. 2001-01-24 19:43:33 +00:00
Tom Lane
786f1a59cd Fix all the places that called heap_update() and heap_delete() without
bothering to check the return value --- which meant that in case the
update or delete failed because of a concurrent update, you'd not find
out about it, except by observing later that the transaction produced
the wrong outcome.  There are now subroutines simple_heap_update and
simple_heap_delete that should be used anyplace that you're not prepared
to do the full nine yards of coping with concurrent updates.  In
practice, that seems to mean absolutely everywhere but the executor,
because *noplace* else was checking.
2001-01-23 04:32:23 +00:00
Tom Lane
e84c429062 Clean up lockmanager data structures some more, in preparation for planned
rewrite of deadlock checking.  Lock holder objects are now reachable from
the associated LOCK as well as from the owning PROC.  This makes it
practical to find all the processes holding a lock, as well as all those
waiting on the lock.  Also, clean up some of the grottier aspects of the
SHMQueue API, and cause the waitProcs list to be stored in the intuitive
direction instead of the nonintuitive one.  (Bet you didn't know that
the code followed the 'prev' link to get to the next waiting process,
instead of the 'next' link.  It doesn't do that anymore.)
2001-01-22 22:30:06 +00:00
Bruce Momjian
b8f23aff82 Back out patch for BLOB operations until approval. 2001-01-21 03:50:25 +00:00
Bruce Momjian
c655935217 Hello,
here is the patch attached which do check in each BLOB operation, if we are
in transaction, and raise an error otherwise. This will prevent such mistakes.

--
Sincerely Yours,
Denis Perchine
2001-01-21 03:49:14 +00:00
Tom Lane
6ce0ed2813 Make critical sections (elog->crash) and interrupt holdoff sections
into distinct concepts, per recent discussion on pghackers.
2001-01-19 22:08:47 +00:00
Bruce Momjian
75815c3100 cleanup. 2001-01-19 21:09:57 +00:00
Bruce Momjian
27aaf9df7e Remove ; and add \n to ASM code. 2001-01-19 20:39:16 +00:00
Tom Lane
dae52bf3ec Oops, I had managed to break query-cancel-while-waiting-for-lock. 2001-01-16 20:59:34 +00:00
Tom Lane
64e6c60897 Rename fields of lock and lockholder structures to something a tad less
confusing, and clean up documentation.
2001-01-16 06:11:34 +00:00
Tom Lane
36839c1927 Restructure backend SIGINT/SIGTERM handling so that 'die' interrupts
are treated more like 'cancel' interrupts: the signal handler sets a
flag that is examined at well-defined spots, rather than trying to cope
with an interrupt that might happen anywhere.  See pghackers discussion
of 1/12/01.
2001-01-14 05:08:17 +00:00
Tom Lane
6162432de9 Add more critical-section calls: all code sections that hold spinlocks
are now critical sections, so as to ensure die() won't interrupt us while
we are munging shared-memory data structures.  Avoid insecure intermediate
states in some code that proc_exit will call, like palloc/pfree.  Rename
START/END_CRIT_CODE to START/END_CRIT_SECTION, since that seems to be
what people tend to call them anyway, and make them be called with () like
a function call, in hopes of not confusing pg_indent.
I doubt that this is sufficient to make SIGTERM safe anywhere; there's
just too much code that could get invoked during proc_exit().
2001-01-12 21:54:01 +00:00
Hiroshi Inoue
09a160d579 Removed a no longer needed SetWaitingForLock() call in
DeadLockCheck().
2001-01-10 01:24:19 +00:00
Hiroshi Inoue
7edff1618e Disable query cancel during HandleDeadLock(). 2001-01-09 09:38:57 +00:00
Tom Lane
e2586c3c62 LockBuffer should not elog while holding buffer's cntx_lock. 2001-01-08 18:31:49 +00:00
Tom Lane
542b7c6445 Clear QueryCancel and ProcDiePending at start of proc_exit, to ensure
that leftover cancel/die requests cannot interfere with exit activities.
2001-01-07 04:30:41 +00:00
Tom Lane
1b8a219eef Clean up non-reentrant interface for hash_seq/HashTableWalk, so that
starting a new hashtable search no longer clobbers any other search
active anywhere in the system.  Fix RelationCacheInvalidate() so that
it will not crash or go into an infinite loop if invoked recursively,
as for example by a second SI Reset message arriving while we are still
processing a prior one.
2001-01-02 04:33:24 +00:00
Vadim B. Mikheev
3e059b3802 1. WAL needs in zero-ed content of newly initialized page.
2. Log record for PageRepaireFragmentation now keeps array
   of !LP_USED offnums to redo cleanup properly.
2000-12-30 15:19:57 +00:00
Tom Lane
c23851bbe0 Paranoia about possible values of errno after a shmget/semget failure.
In theory we should always get EEXIST if there's a key collision, but
if the kernel code tests error conditions in a weird order, perhaps
EACCES or EIDRM could occur too.
2000-12-30 01:20:55 +00:00
Tom Lane
7f60b81e1a Fix failure in CreateCheckPoint on some Alpha boxes --- it's not OK to
assume that TAS() will always succeed the first time, even if the lock
is known to be free.  Also, make sure that code will eventually time out
and report a stuck spinlock, rather than looping forever.  Small cleanups
in s_lock.h, too.
2000-12-29 21:31:21 +00:00
Vadim B. Mikheev
7ceeeb662f New WAL version - CRC and data blocks backup. 2000-12-28 13:00:29 +00:00
Vadim B. Mikheev
369aace5f3 Avoid XLogFlush for clean buffers in BufferSync. 2000-12-22 20:04:43 +00:00
Tom Lane
6cc842abd3 Revise lock manager to support "session level" locks as well as "transaction
level" locks.  A session lock is not released at transaction commit (but it
is released on transaction abort, to ensure recovery after an elog(ERROR)).
In VACUUM, use a session lock to protect the master table while vacuuming a
TOAST table, so that the TOAST table can be done in an independent
transaction.

I also took this opportunity to do some cleanup and renaming in the lock
code.  The previously noted bug in ProcLockWakeup, that it couldn't wake up
any waiters beyond the first non-wakeable waiter, is now fixed.  Also found
a previously unknown bug of the same kind (failure to scan all members of
a lock queue in some cases) in DeadLockCheck.  This might have led to failure
to detect a deadlock condition, resulting in indefinite waits, but it's
difficult to characterize the conditions required to trigger a failure.
2000-12-22 00:51:54 +00:00
Tom Lane
e6e9e18e9e Remove multi.c and single.c, which have been dead code for
over two years.
2000-12-20 22:54:02 +00:00
Tom Lane
5491233f52 Ensure that 'errno' is saved and restored by all signal handlers that
might change it.  Experimentation shows that the signal handler call
mechanism does not save/restore errno for you, at least not on Linux
or HPUX, so this is definitely a real risk.
2000-12-18 17:33:42 +00:00
Tom Lane
a626b78c89 Clean up backend-exit-time cleanup behavior. Use on_shmem_exit callbacks
to ensure that we have released buffer refcounts and so forth, rather than
putting ad-hoc operations before (some of the calls to) proc_exit.  Add
commentary to discourage future hackers from repeating that mistake.
2000-12-18 00:44:50 +00:00
Tom Lane
2cf8064af8 Tweak Darwin patch to get right include order. 2000-12-11 16:35:59 +00:00
Tom Lane
41fe2a2a03 Darwin porting patches from Peter Bierman <bierman@apple.com> 2000-12-11 00:49:54 +00:00
Tom Lane
fb47385fc8 Resurrect -F switch: it controls fsyncs again, though the fsyncs are
mostly just on the WAL logfile nowadays.  But if people want to disable
fsync for performance, why should we say no?
2000-12-08 22:21:33 +00:00
Tom Lane
68ed296301 Don't use 'private' as a parameter name in visible headers ... makes C++
very unhappy ...
2000-12-03 17:18:10 +00:00
Vadim B. Mikheev
309112267f misc 2000-11-30 19:06:37 +00:00
Vadim B. Mikheev
8247f47fc7 Hope that this is valid localbuf.c version 2000-11-30 19:03:26 +00:00
Vadim B. Mikheev
81c8c244b2 No more #ifdef XLOG. 2000-11-30 08:46:26 +00:00
Tom Lane
b16516b887 It seems some platforms declare kill(2) in signal.h not unistd.h. 2000-11-30 03:11:24 +00:00