Commit Graph

534 Commits

Author SHA1 Message Date
Neil Conway 886a02d1cb Add a txn_start column to pg_stat_activity. This makes it easier to
identify long-running transactions. Since we already need to record
the transaction-start time (e.g. for now()), we don't need any
additional system calls to report this information.

Catversion bumped, initdb required.
2006-12-06 18:06:48 +00:00
Tom Lane 5f60086e10 Minor adjustments to make failures in startup/shutdown behave more cleanly.
StartupXLOG and ShutdownXLOG no longer need to be critical sections, because
in all contexts where they are invoked, elog(ERROR) would be translated to
elog(FATAL) anyway.  (One change in bgwriter.c is needed to make this true:
set ExitOnAnyError before trying to exit.  This is a good fix anyway since
the existing code would have gone into an infinite loop on elog(ERROR) during
shutdown.)  That avoids a misleading report of PANIC during semi-orderly
failures.  Modify the postmaster to include the startup process in the set of
processes that get SIGTERM when a fast shutdown is requested, and also fix it
to not try to restart the bgwriter if the bgwriter fails while trying to write
the shutdown checkpoint.  Net result is that "pg_ctl stop -m fast" does
something reasonable for a system in warm standby mode, and so should Unix
system shutdown (ie, universal SIGTERM).  Per gripe from Stephen Harris and
some corner-case testing of my own.
2006-11-30 18:29:12 +00:00
Tom Lane 395249ecbe Several changes to reduce the probability of running out of memory during
AbortTransaction, which would lead to recursion and eventual PANIC exit
as illustrated in recent report from Jeff Davis.  First, in xact.c create
a special dedicated memory context for AbortTransaction to run in.  This
solves the problem as long as AbortTransaction doesn't need more than 32K
(or whatever other size we create the context with).  But in corner cases
it might.  Second, in trigger.c arrange to keep pending after-trigger event
records in separate contexts that can be freed near the beginning of
AbortTransaction, rather than having them persist until CleanupTransaction
as before.  Third, in portalmem.c arrange to free executor state data
earlier as well.  These two changes should result in backing off the
out-of-memory condition before AbortTransaction needs any significant
amount of memory, at least in typical cases such as memory overrun due
to too many trigger events or too big an executor hash table.  And all
the same for subtransaction abort too, of course.
2006-11-23 01:14:59 +00:00
Tom Lane 3ad0728c81 On systems that have setsid(2) (which should be just about everything except
Windows), arrange for each postmaster child process to be its own process
group leader, and deliver signals SIGINT, SIGTERM, SIGQUIT to the whole
process group not only the direct child process.  This provides saner behavior
for archive and recovery scripts; in particular, it's possible to shut down a
warm-standby recovery server using "pg_ctl stop -m immediate", since delivery
of SIGQUIT to the startup subprocess will result in killing the waiting
recovery_command.  Also, this makes Query Cancel and statement_timeout apply
to scripts being run from backends via system().  (There is no support in the
core backend for that, but it's widely done using untrusted PLs.)  Per gripe
from Stephen Harris and subsequent discussion.
2006-11-21 20:59:53 +00:00
Tom Lane 4f335a3d7f Repair two related errors in heap_lock_tuple: it was failing to recognize
cases where we already hold the desired lock "indirectly", either via
membership in a MultiXact or because the lock was originally taken by a
different subtransaction of the current transaction.  These cases must be
accounted for to avoid needless deadlocks and/or inappropriate replacement of
an exclusive lock with a shared lock.  Per report from Clarence Gardner and
subsequent investigation.
2006-11-17 18:00:15 +00:00
Peter Eisentraut e138b80996 String fix 2006-11-16 14:28:41 +00:00
Tom Lane 792d6edd5b Clean up some misleading references to %p being a full path, per Simon. 2006-11-10 22:32:20 +00:00
Tom Lane dcbdf9b1d4 Change Windows rename and unlink substitutes so that they time out after
30 seconds instead of retrying forever.  Also modify xlog.c so that if
it fails to rename an old xlog segment up to a future slot, it will
unlink the segment instead.  Per discussion of bug #2712, in which it
became apparent that Windows can handle unlinking a file that's being
held open, but not renaming it.
2006-11-08 20:12:05 +00:00
Tom Lane 48188e1621 Fix recently-understood problems with handling of XID freezing, particularly
in PITR scenarios.  We now WAL-log the replacement of old XIDs with
FrozenTransactionId, so that such replacement is guaranteed to propagate to
PITR slave databases.  Also, rather than relying on hint-bit updates to be
preserved, pg_clog is not truncated until all instances of an XID are known to
have been replaced by FrozenTransactionId.  Add new GUC variables and
pg_autovacuum columns to allow management of the freezing policy, so that
users can trade off the size of pg_clog against the amount of freezing work
done.  Revise the already-existing code that forces autovacuum of tables
approaching the wraparound point to make it more bulletproof; also, revise the
autovacuum logic so that anti-wraparound vacuuming is done per-table rather
than per-database.  initdb forced because of changes in pg_class, pg_database,
and pg_autovacuum catalogs.  Heikki Linnakangas, Simon Riggs, and Tom Lane.
2006-11-05 22:42:10 +00:00
Tom Lane 1e758d5263 Add some code to CREATE DATABASE to check for pre-existing subdirectories
that conflict with the OID that we want to use for the new database.
This avoids the risk of trying to remove files that maybe we shouldn't
remove.  Per gripe from Jon Lapham and subsequent discussion of 27-Sep.
2006-10-18 22:44:12 +00:00
Peter Eisentraut b9b4f10b5b Message style improvements 2006-10-06 17:14:01 +00:00
Bruce Momjian f99a569a2e pgindent run for 8.2. 2006-10-04 00:30:14 +00:00
Bruce Momjian 45c8ed96b9 Make some sentences consistent with similar ones.
Euler Taveira de Oliveira
2006-10-03 21:21:36 +00:00
Alvaro Herrera 4650c4fdb9 Degrade the transaction-id wraparound point message from LOG to DEBUG1, per
discussion.

Patch from Simon Riggs.
2006-09-26 17:21:39 +00:00
Tom Lane 8fad2e3ff4 Arrange for GetSnapshotData to copy live-subtransaction XIDs from the
PGPROC array into snapshots, and use this information to avoid visits
to pg_subtrans in HeapTupleSatisfiesSnapshot.  This appears to solve
the pg_subtrans-related context swap storm problem that's been reported
by several people for 8.1.  While at it, modify GetSnapshotData to not
take an exclusive lock on ProcArrayLock, as closer analysis shows that
shared lock is always sufficient.
Itagaki Takahiro and Tom Lane
2006-09-03 15:59:39 +00:00
Tom Lane ca1fd0ea5b Move xact.c's partial support for Lists of TransactionIds into pg_list.h.
Needed because lock.c is now going to use the same type of list.
2006-08-27 19:11:46 +00:00
Tom Lane 35af5422f6 Make the server track an 'XID epoch', that is, maintain higher-order bits
of the transaction ID counter.  Nothing is done with the epoch except to
store it in checkpoint records, but this provides a foundation with which
add-on code can pretend that XIDs never wrap around.  This is a severely
trimmed and rewritten version of the xxid patch submitted by Marko Kreen.
Per discussion, the epoch counter seems the only part of xxid that really
needs to be in the core server.
2006-08-21 16:16:31 +00:00
Tom Lane e8ea9e9587 Implement archive_timeout feature to force xlog file switches to occur no more
than N seconds apart.  This allows a simple, if not very high performance,
means of guaranteeing that a PITR archive is no more than N seconds behind
real time.  Also make pg_current_xlog_location return the WAL Write pointer,
add pg_current_xlog_insert_location to return the Insert pointer, and fix
pg_xlogfile_name_offset to return its results as a two-element record instead
of a smashed-together string, as per recent discussion.

Simon Riggs
2006-08-17 23:04:10 +00:00
Tom Lane e002836913 Make recovery from WAL be restartable, by executing a checkpoint-like
operation every so often.  This improves the usefulness of PITR log
shipping for hot standby: formerly, if the standby server crashed, it
was necessary to restart it from the last base backup and replay all
the WAL since then.  Now it will only need to reread about the same
amount of WAL as the master server would.  The behavior might also
come in handy during a long PITR replay sequence.  Simon Riggs,
with some editorialization by Tom Lane.
2006-08-07 16:57:57 +00:00
Tom Lane 704ddaaa09 Add support for forcing a switch to a new xlog file; cause such a switch
to happen automatically during pg_stop_backup().  Add some functions for
interrogating the current xlog insertion point and for easily extracting
WAL filenames from the hex WAL locations displayed by pg_stop_backup
and friends.  Simon Riggs with some editorialization by Tom Lane.
2006-08-06 03:53:44 +00:00
Alvaro Herrera 92c2ecc130 Modify snapshot definition so that lazy vacuums are ignored by other
vacuums.  This allows a OLTP-like system with big tables to continue
regular vacuuming on small-but-frequently-updated tables while the
big tables are being vacuumed.

Original patch from Hannu Krossing, rewritten by Tom Lane and updated
by me.
2006-07-30 02:07:18 +00:00
Peter Eisentraut e9b4969062 DTrace support, with a small initial set of probes
by Robert Lor
2006-07-24 16:32:45 +00:00
Tom Lane 9dc842f083 Don't try to truncate multixact SLRU files in checkpoints done during xlog
recovery.  In the first place, it doesn't work because slru's
latest_page_number isn't set up yet (this is why we've been hearing reports
of strange "apparent wraparound" log messages during crash recovery, but
only from people who'd managed to advance their next-mxact counters some
considerable distance from 0).  In the second place, it seems a bit unwise
to be throwing away data during crash recovery anwyway.  This latter
consideration convinces me to just disable truncation during recovery,
rather than computing latest_page_number and pushing ahead.
2006-07-20 00:46:42 +00:00
Bruce Momjian e0522505bd Remove 576 references of include files that were not needed. 2006-07-14 14:52:27 +00:00
Bruce Momjian a22d76d96a Allow include files to compile own their own.
Strip unused include files out unused include files, and add needed
includes to C files.

The next step is to remove unused include files in C files.
2006-07-13 16:49:20 +00:00
Bruce Momjian 0ff3461bcc Alphabetically order reference to include files, "N" - "S". 2006-07-11 17:26:59 +00:00
Bruce Momjian 3a534ade39 Alphabetically order reference to include files, "G" - "M". 2006-07-11 17:04:13 +00:00
Alvaro Herrera d4cef0aa2a Improve vacuum code to track minimum Xids per table instead of per database.
To this end, add a couple of columns to pg_class, relminxid and relvacuumxid,
based on which we calculate the pg_database columns after each vacuum.

We now force all databases to be vacuumed, even template ones.  A backend
noticing too old a database (meaning pg_database.datminxid is in danger of
falling behind Xid wraparound) will signal the postmaster, which in turn will
start an autovacuum iteration to process the offending database.  In principle
this is only there to cope with frozen (non-connectable) databases without
forcing users to set them to connectable, but it could force regular user
database to go through a database-wide vacuum at any time.  Maybe we should
warn users about this somehow.  Of course the real solution will be to use
autovacuum all the time ;-)

There are some additional improvements we could have in this area: for example
the vacuum code could be smarter about not updating pg_database for each table
when called by autovacuum, and do it only once the whole autovacuum iteration
is done.

I updated the system catalogs documentation, but I didn't modify the
maintenance section.  Also having some regression tests for this would be nice
but it's not really a very straightforward thing to do.

Catalog version bumped due to system catalog changes.
2006-07-10 16:20:52 +00:00
Tom Lane b7b78d24f7 Code review for FILLFACTOR patch. Change WITH grammar as per earlier
discussion (including making def_arg allow reserved words), add missed
opt_definition for UNIQUE case.  Put the reloptions support code in a less
random place (I chose to make a new file access/common/reloptions.c).
Eliminate header inclusion creep.  Make the index options functions safely
user-callable (seems like client apps might like to be able to test validity
of options before trying to make an index).  Reduce overhead for normal case
with no options by allowing rd_options to be NULL.  Fix some unmaintainably
klugy code, including getting rid of Natts_pg_class_fixed at long last.
Some stylistic cleanup too, and pay attention to keeping comments in sync
with code.

Documentation still needs work, though I did fix the omissions in
catalogs.sgml and indexam.sgml.
2006-07-03 22:45:41 +00:00
Bruce Momjian 277807bd9e Add FILLFACTOR to CREATE INDEX.
ITAGAKI Takahiro
2006-07-02 02:23:23 +00:00
Tom Lane 3c71244b74 Put #ifdef NOT_USED around posix_fadvise call. We may want to resurrect
this someday, but right now it seems that posix_fadvise is immature to
the point of being broken on many platforms ... and we don't have any
benchmark evidence proving it's worth spending time on.
2006-06-27 18:59:17 +00:00
Tom Lane 3a04f53e7f pg_stop_backup was calling XLogArchiveNotify() twice for the newly created
backup history file.  Bug introduced by the 8.1 change to make pg_stop_backup
delete older history files.  Per report from Masao Fujii.
2006-06-22 20:42:57 +00:00
Tom Lane 27c3e3de09 Remove redundant gettimeofday() calls to the extent practical without
changing semantics too much.  statement_timestamp is now set immediately
upon receipt of a client command message, and the various places that used
to do their own gettimeofday() calls to mark command startup are referenced
to that instead.  I have also made stats_command_string use that same
value for pg_stat_activity.query_start for both the command itself and
its eventual replacement by <IDLE> or <idle in transaction>.  There was
some debate about that, but no argument that seemed convincing enough to
justify an extra gettimeofday() call.
2006-06-20 22:52:00 +00:00
Tom Lane 1e8ae13640 Don't try to call posix_fadvise() unless <fcntl.h> supplies a declaration
for it.  Hopefully will fix core dump evidenced by some buildfarm members
since fadvise patch went in.  The actual definition of the function is not
ABI-compatible with compiler's default assumption in the absence of any
declaration, so it's clearly unsafe to try to call it without seeing a
declaration.
2006-06-18 18:30:21 +00:00
Bruce Momjian 40bc06fa16 Test for POSIX_FADV_DONTNEED to use posix_fadvise(). 2006-06-16 04:11:48 +00:00
Bruce Momjian 94a5c4a01b Use posix_fadvise() to avoid kernel caching of WAL contents on WAL file
close.

ITAGAKI Takahiro
2006-06-15 19:15:00 +00:00
Teodor Sigaev 8a3631f8d8 GIN: Generalized Inverted iNdex.
text[], int4[], Tsearch2 support for GIN.
2006-05-02 11:28:56 +00:00
Bruce Momjian e6004f0151 Add statement_timestamp(), clock_timestamp(), and
transaction_timestamp() (just like now()).

Also update statement_timeout() to mention it is statement arrival time
that is measured.

Catalog version updated.
2006-04-25 00:25:22 +00:00
Tom Lane eac825aa68 Ensure that we validate the page header of the first page of a WAL file
whenever we start to read within that file.  The first page carries
extra identification information that really ought to be checked, but
as the code stood, this was only checked when we switched sequentially
into a new WAL file, or if by chance the starting checkpoint record was
within the first page.  This patch ensures that we will detect bogus
'long header' information before we start replaying the WAL sequence.
2006-04-20 04:07:38 +00:00
Tom Lane 0a87394956 Fix the torn-page hazard for PITR base backups by forcing full page writes
to occur between pg_start_backup() and pg_stop_backup(), even if the GUC
setting full_page_writes is OFF.  Per discussion, doing this in combination
with the already-existing checkpoint during pg_start_backup() should ensure
safety against partial page updates being included in the backup.  We do
not have to force full page writes to occur during normal PITR operation,
as I had first feared.
2006-04-17 18:55:05 +00:00
Tom Lane defe93463c Make the world safe for full_page_writes. Allow XLOG records that try to
update no-longer-existing pages to fall through as no-ops, but make a note
of each page number referenced by such records.  If we don't see a later
XLOG entry dropping the table or truncating away the page, complain at
the end of XLOG replay.  Since this fixes the known failure mode for
full_page_writes = off, revert my previous band-aid patch that disabled
that GUC variable.
2006-04-14 20:27:24 +00:00
Tom Lane 09b5271ebd Add a field to the first page of each WAL file to indicate the
XLOG_BLCKSZ.  This ought to help in preventing configuration mismatch
problems if anyone tries to ship PITR files between servers compiled
with different XLOG_BLCKSZ settings.  Simon Riggs
2006-04-05 03:34:05 +00:00
Tom Lane e6140d9052 Don't use BLCKSZ for the physical length of the pg_control file, but
instead a dedicated symbol.  This probably makes no functional difference
for likely values of BLCKSZ, but it makes the intent clearer.
Simon Riggs, minor editorialization by Tom Lane.
2006-04-04 22:39:59 +00:00
Tom Lane eaef111396 Define a separately configurable XLOG_BLCKSZ symbol for the page size
used within WAL files.  Historically this was the same as the data file
BLCKSZ, but there's no necessary connection, and it's possible that
performance gains might ensue from reducing XLOG_BLCKSZ.  In any case
distinguishing two symbols should improve code clarity.  This commit
does not actually change the page size, only provide the infrastructure
to make it possible to do so.  initdb forced because of addition of a
field to pg_control.
Mark Wong, with some help from Simon Riggs and Tom Lane.
2006-04-03 23:35:05 +00:00
Tom Lane a8b8f4db23 Clean up WAL/buffer interactions as per my recent proposal. Get rid of the
misleadingly-named WriteBuffer routine, and instead require routines that
change buffer pages to call MarkBufferDirty (which does exactly what it says).
We also require that they do so before calling XLogInsert; this takes care of
the synchronization requirement documented in SyncOneBuffer.  Note that
because bufmgr takes the buffer content lock (in shared mode) while writing
out any buffer, it doesn't matter whether MarkBufferDirty is executed before
the buffer content change is complete, so long as the content change is
completed before releasing exclusive lock on the buffer.  So it's OK to set
the dirtybit before we fill in the LSN.
This eliminates the former kluge of needing to set the dirtybit in LockBuffer.
Aside from making the code more transparent, we can also add some new
debugging assertions, in particular that the caller of MarkBufferDirty must
hold the buffer content lock, not merely a pin.
2006-03-31 23:32:07 +00:00
Tom Lane 6d61cdec07 Clean up and document the API for XLogOpenRelation and XLogReadBuffer.
This commit doesn't make much functional change, but it does eliminate some
duplicated code --- for instance, PageIsNew tests are now done inside
XLogReadBuffer rather than by each caller.
The GIST xlog code still needs a lot of love, but I'll worry about that
separately.
2006-03-29 21:17:39 +00:00
Tom Lane 0a971e2f20 Disable full_page_writes, because turning it off risks causing crash-recovery
failures even when the hardware and OS did nothing wrong.  Per recent analysis
of a problem report from Alex Bahdushka.

For the moment I've just diked out the test of the parameter, rather than
removing the GUC infrastructure and documentation, in case we conclude that
there's something salvageable there.  There seems no chance of it being
resurrected in the 8.1 branch though.
2006-03-28 22:01:16 +00:00
Tom Lane 0a20207060 Arrange to emit a description of the current XLOG record as error context
when an error occurs during xlog replay.  Also, replace the former risky
'write into a fixed-size buffer with no overflow detection' API for XLOG
record description routines; use an expansible StringInfo instead.  (The
latter accounts for most of the patch bulk.)

Qingqing Zhou
2006-03-24 04:32:13 +00:00
Bruce Momjian f2f5b05655 Update copyright for 2006. Update scripts. 2006-03-05 15:59:11 +00:00
Neil Conway 8e5a10d46c This patch makes the error message strings throughout the backend
more compliant with the error message style guide. In particular,
errdetail should begin with a capital letter and end with a period,
whereas errmsg should not. I also fixed a few related issues in
passing, such as fixing the repeated misspelling of "lexeme" in
contrib/tsearch2 (per Tom's suggestion).
2006-03-01 06:30:32 +00:00