Commit Graph

326 Commits

Author SHA1 Message Date
Bruce Momjian 511db38ace Update copyright for 2009. 2009-01-01 17:24:05 +00:00
Bruce Momjian 4ee79fd20d Change the name of dtrace wal tracepoints:
TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY

Robert Lor
2008-12-24 20:41:29 +00:00
Bruce Momjian 5a90bc1fbe The attached patch contains a couple of fixes in the existing probes and
includes a few new ones.

- Fixed compilation errors on OS X for probes that use typedefs
- Fixed a number of probes to pass ForkNumber per the relation forks
patch
- The new probes are those that were taken out from the previous
submitted patch and required simple fixes. Will submit the other probes
that may require more discussion in a separate patch.

Robert Lor
2008-12-17 01:39:04 +00:00
Heikki Linnakangas b457b2a24e If pg_stop_backup() is called just after switching to a new xlog file,
wait for the previous instead of the new file to be archived.

Based on patch by Simon Riggs.
2008-12-03 08:20:11 +00:00
Tom Lane 1d577f5e49 Add a startup check that pg_xlog and pg_xlog/archive_status exist.
If the latter doesn't exist, automatically recreate it.  (We don't do
this for pg_xlog, though, per discussion.)

Jonah Harris
2008-11-09 17:51:15 +00:00
Heikki Linnakangas 19c8dc839b Unite ReadBufferWithFork, ReadBufferWithStrategy, and ZeroOrReadBuffer
functions into one ReadBufferExtended function, that takes the strategy
and mode as argument. There's three modes, RBM_NORMAL which is the default
used by plain ReadBuffer(), RBM_ZERO, which replaces ZeroOrReadBuffer, and
a new mode RBM_ZERO_ON_ERROR, which allows callers to read corrupt pages
without throwing an error. The FSM needs the new mode to recover from
corrupt pages, which could happend if we crash after extending an FSM file,
and the new page is "torn".

Add fork number to some error messages in bufmgr.c, that still lacked it.
2008-10-31 15:05:00 +00:00
Tom Lane 2314baef38 Fix recoveryLastXTime logic so that it actually does what one would expect.
Per gripe from Kevin Grittner.  Backpatch to 8.3, where the bug was introduced.
2008-10-30 04:06:16 +00:00
Heikki Linnakangas 61d9674988 Make LC_COLLATE and LC_CTYPE database-level settings. Collation and
ctype are now more like encoding, stored in new datcollate and datctype
columns in pg_database.

This is a stripped-down version of Radek Strnad's patch, with further
changes by me.
2008-09-23 09:20:39 +00:00
Tom Lane ead21631e8 Fix a couple of problems pointed out by Fujii Masao in the 2008-Apr-05 patch
for pg_stop_backup.  First, it is possible that the history file name is not
alphabetically later than the last WAL file name, so we should explicitly
check that both have been archived.  Second, the previous coding would wait
forever if a checkpoint had managed to remove the WAL file before we look for
it.

Simon Riggs, plus some code cleanup by me.
2008-09-08 16:42:15 +00:00
Heikki Linnakangas 3f0e808c4a Introduce the concept of relation forks. An smgr relation can now consist
of multiple forks, and each fork can be created and grown separately.

The bulk of this patch is about changing the smgr API to include an extra
ForkNumber argument in every smgr function. Also, smgrscheduleunlink and
smgrdounlink no longer implicitly call smgrclose, because other forks might
still exist after unlinking one. The callers of those functions have been
modified to call smgrclose instead.

This patch in itself doesn't have any user-visible effect, but provides the
infrastructure needed for upcoming patches. The additional forks envisioned
are a rewritten FSM implementation that doesn't rely on a fixed-size shared
memory block, and a visibility map to allow skipping portions of a table in
VACUUM that have no dead tuples.
2008-08-11 11:05:11 +00:00
Tom Lane 9d035f4254 Clean up the use of some page-header-access macros: principally, use
SizeOfPageHeaderData instead of sizeof(PageHeaderData) in places where that
makes the code clearer, and avoid casting between Page and PageHeader where
possible.  Zdenek Kotala, with some additional cleanup by Heikki Linnakangas.

I did not apply the parts of the proposed patch that would have resulted in
slightly changing the on-disk format of hash indexes; it seems to me that's
not a win as long as there's any chance of having in-place upgrade for 8.4.
2008-07-13 20:45:47 +00:00
Bruce Momjian 6b797c852b Fix recovery.conf boolean variables to take the same range of string
values as postgresql.conf.
2008-06-30 22:10:43 +00:00
Heikki Linnakangas a213f1ee6c Refactor XLogOpenRelation() and XLogReadBuffer() in preparation for relation
forks. XLogOpenRelation() and the associated light-weight relation cache in
xlogutils.c is gone, and XLogReadBuffer() now takes a RelFileNode as argument,
instead of Relation.

For functions that still need a Relation struct during WAL replay, there's a
new function called CreateFakeRelcacheEntry() that returns a fake entry like
XLogOpenRelation() used to.
2008-06-12 09:12:31 +00:00
Alvaro Herrera cc87402d6e Move BufferGetPageSize and BufferGetPage from bufpage.h to bufmgr.h. It is
more logical that way, and also it reduces the amount of unnecessary includes
in bufpage.h, which is widely used.

Zdenek Kotala.

My previous patch to bufpage.h should also have credited him as author, but I
forgot (sorry about that).
2008-06-08 22:00:48 +00:00
Magnus Hagander 8eee526c19 Set hidden field for guc enum missed in previous commit. 2008-05-28 15:22:05 +00:00
Tom Lane 1a604b4e31 Fix a subtle bug exposed by recent wal_sync_method rearrangements.
Formerly, the default value of wal_sync_method was determined inside xlog.c,
but now it is determined inside guc.c.  guc.c was reading xlogdefs.h
without having read <fcntl.h>, leading to wrong determination of
DEFAULT_SYNC_METHOD.  Obviously xlogdefs.h needs to include <fcntl.h>
for itself to ensure stable results.
2008-05-17 17:24:57 +00:00
Tom Lane 8a2f5d221b Reduce unnecessary PANIC to ERROR, improve a couple of comments. 2008-05-16 19:15:05 +00:00
Magnus Hagander 9bf1db04c0 Remove the special variable for open_sync_bit used in O_SYNC and O_DSYNC
modes, replacing it with a call to a function that derives it from the
sync_method variable, now that it has distinct values for these two cases.

This means that assign_xlog_sync_method() no longer changes any settings,
thus fixing the bug introduced in the change to use a guc enum for
wal_sync_method.
2008-05-14 14:02:57 +00:00
Magnus Hagander 72e2db86b9 Don't try to close negative file descriptors, since this can cause
crashes on certain platforms. In particular, the MSVC runtime is known
to do this.

Fixes bug #4162, reported and diagnosed by Javier Pimas
2008-05-13 20:53:52 +00:00
Magnus Hagander aa82790fca Fix breakage by the wal_sync_method patch in installations that use
O_DSYNC (specifically this broke all the Windows buildfarm members)
2008-05-12 19:45:23 +00:00
Alvaro Herrera 9084399782 Put back bufmgr.h in bufpage.h -- it is needed by some macros.
Remove #include bufmgr.h from (most?) source files which already include
bufpage.h.
2008-05-12 16:06:10 +00:00
Magnus Hagander 2739a4e1d2 Report which WAL sync method we are trying to change *to* when it fails,
not which one we had before (that worked, and thus is completley irrelevant)
2008-05-12 14:27:47 +00:00
Magnus Hagander f99760c19f Convert wal_sync_method to guc enum. 2008-05-12 08:35:05 +00:00
Alvaro Herrera f8c4d7db60 Restructure some header files a bit, in particular heapam.h, by removing some
unnecessary #include lines in it.  Also, move some tuple routine prototypes and
macros to htup.h, which allows removal of heapam.h inclusion from some .c
files.

For this to work, a new header file access/sysattr.h needed to be created,
initially containing attribute numbers of system columns, for pg_dump usage.

While at it, make contrib ltree, intarray and hstore header files more
consistent with our header style.
2008-05-12 00:00:54 +00:00
Heikki Linnakangas c5f42ce8d5 Fix Assert introduced in previous patch. 2008-05-09 15:27:17 +00:00
Heikki Linnakangas f0eb3e5e58 Fix incorrect archive truncation point calculation in the %r recovery_command
parameter. This fixes bug 4137 reported by Wojciech Strzalka, where a WAL
file is deleted too early when starting the recovery of a warm standby server.

Also add a sanity check in pg_standby so that it will refuse to delete anything
earlier than the file being restored, and improve the debug message in case
nothing is deleted.

Simon Riggs. Backpatch to 8.3, which is where %r was introduced.
2008-05-09 14:27:47 +00:00
Magnus Hagander 380d1ee69e Update error messages, per notes from Tom.
Laurenz Albe
2008-04-24 14:23:43 +00:00
Magnus Hagander c979a1fefa Prevent shutdown in normal mode if online backup is running, and
have pg_ctl warn about this.

Cancel running online backups (by renaming the backup_label file,
thus rendering the backup useless) when shutting down in fast mode.

Laurenz Albe
2008-04-23 13:44:59 +00:00
Tom Lane 8472bf7a73 Allow float8, int8, and related datatypes to be passed by value on machines
where Datum is 8 bytes wide.  Since this will break old-style C functions
(those still using version 0 calling convention) that have arguments or
results of these types, provide a configure option to disable it and retain
the old pass-by-reference behavior.  Likewise, provide a configure option
to disable the recently-committed float4 pass-by-value change.

Zoltan Boszormenyi, plus configurability stuff by me.
2008-04-21 00:26:47 +00:00
Tom Lane d1cbd26ded Repair two places where SIGTERM exit could leave shared memory state
corrupted.  (Neither is very important if SIGTERM is used to shut down the
whole database cluster together, but there's a problem if someone tries to
SIGTERM individual backends.)  To do this, introduce new infrastructure
macros PG_ENSURE_ERROR_CLEANUP/PG_END_ENSURE_ERROR_CLEANUP that take care
of transiently pushing an on_shmem_exit cleanup hook.  Also use this method
for createdb cleanup --- that wasn't a shared-memory-corruption problem,
but SIGTERM abort of createdb could leave orphaned files lying around.

Backpatch as far as 8.2.  The shmem corruption cases don't exist in 8.1,
and the createdb usage doesn't seem important enough to risk backpatching
further.
2008-04-16 23:59:40 +00:00
Bruce Momjian 2a1cf97c22 Have pg_stop_backup() wait for all archive files to be sent, rather than
returing right away.  This guarantees that when pg_stop_backup()
returns, you have a valid backup.

Simon Riggs
2008-04-05 01:34:06 +00:00
Tom Lane 220db7ccd8 Simplify and standardize conversions between TEXT datums and ordinary C
strings.  This patch introduces four support functions cstring_to_text,
cstring_to_text_with_len, text_to_cstring, and text_to_cstring_buffer, and
two macros CStringGetTextDatum and TextDatumGetCString.  A number of
existing macros that provided variants on these themes were removed.

Most of the places that need to make such conversions now require just one
function or macro call, in place of the multiple notational layers that used
to be needed.  There are no longer any direct calls of textout or textin,
and we got most of the places that were using handmade conversions via
memcpy (there may be a few still lurking, though).

This commit doesn't make any serious effort to eliminate transient memory
leaks caused by detoasting toasted text objects before they reach
text_to_cstring.  We changed PG_GETARG_TEXT_P to PG_GETARG_TEXT_PP in a few
places where it was easy, but much more could be done.

Brendan Jurd and Tom Lane
2008-03-25 22:42:46 +00:00
Tom Lane 2fc2795456 Remove no-longer-used XLogCacheByte field of XLogCtl.
Itagaki Takahiro
2008-03-10 02:13:22 +00:00
Tom Lane cd00406774 Replace time_t with pg_time_t (same values, but always int64) in on-disk
data structures and backend internal APIs.  This solves problems we've seen
recently with inconsistent layout of pg_control between machines that have
32-bit time_t and those that have already migrated to 64-bit time_t.  Also,
we can get out from under the problem that Windows' Unix-API emulation is not
consistent about the width of time_t.

There are a few remaining places where local time_t variables are used to hold
the current or recent result of time(NULL).  I didn't bother changing these
since they do not affect any cross-module APIs and surely all platforms will
have 64-bit time_t before overflow becomes an actual risk.  time_t should
be avoided for anything visible to extension modules, however.
2008-02-17 02:09:32 +00:00
Peter Eisentraut 6f8f8d2daa Provide a clearer error message if the pg_control version number looks
wrong because of mismatched byte ordering.
2008-01-21 11:17:46 +00:00
Bruce Momjian 9098ab9e32 Update copyrights in source tree to 2008. 2008-01-01 19:46:01 +00:00
Bruce Momjian f6e8730d11 Re-run pgindent with updated list of typedefs. (Updated README should
avoid this problem in the future.)
2007-11-15 22:25:18 +00:00
Peter Eisentraut b30769ee54 When logging the recovery.conf parameters, show them quoted as they would
appear in the configuration file.
2007-11-15 22:02:12 +00:00
Bruce Momjian fdf5a5efb7 pgindent run for 8.3. 2007-11-15 21:14:46 +00:00
Tom Lane 6cc4451b5c Prevent re-use of a deleted relation's relfilenode until after the next
checkpoint.  This guards against an unlikely data-loss scenario in which
we re-use the relfilenode, then crash, then replay the deletion and
recreation of the file.  Even then we'd be OK if all insertions into the
new relation had been WAL-logged ... but that's not guaranteed given all
the no-WAL-logging optimizations that have recently been added.

Patch by Heikki Linnakangas, per a discussion last month.
2007-11-15 20:36:40 +00:00
Tom Lane 5c8eb929e6 When telling the bgwriter that we need a checkpoint because too much xlog
has been consumed, recheck against the latest value of RedoRecPtr before
really sending the signal.  This avoids useless checkpoint activity if
XLogWrite is executed when we have a very stale local copy of RedoRecPtr.
The potential for useless checkpoint is very much worse in 8.3 because of
the walwriter process (which never does XLogInsert), so while this behavior
was intentional, it needs to be changed.  Per report from Itagaki Takahiro.
2007-10-12 19:39:59 +00:00
Tom Lane ab051bd293 Adjust recovery PS display as agreed with Simon: 'waiting for XXX'
while the restore_command does its thing, then 'recovering XXX' while
processing the segment file.  These operations are heavyweight enough
that an extra PS display set shouldn't bother anyone.
2007-09-30 17:28:56 +00:00
Tom Lane 77ccbe64dd Make recovery show the current input WAL segment name in the startup
process' PS display.  After a suggestion by Simon (not exactly his
patch though).
2007-09-29 18:32:56 +00:00
Tom Lane b46bd55a6c Make archive recovery always start a new timeline, rather than only when a
recovery stop time was used.  This avoids a corner-case risk of trying to
overwrite an existing archived copy of the last WAL segment, and seems
simpler and cleaner all around than the original definition.  Per example
from Jon Colverson and subsequent analysis by Simon.
2007-09-29 01:36:10 +00:00
Tom Lane f18dfc4835 Minor improvements in backup and recovery:
- create a separate archive_mode GUC, on which archive_command is dependent

- %r option in recovery.conf sends last restartpoint to recovery command

- %r used in pg_standby, updated README

- minor other code cleanup in pg_standby

- doc on Warm Standby now mentions pg_standby and %r

- log_restartpoints recovery option emits LOG message at each restartpoint

- end of recovery now displays last transaction end time, as requested
  by Warren Little; also shown at each restartpoint

- restart archiver if needed to carry away WAL files at shutdown

Simon Riggs
2007-09-26 22:36:30 +00:00
Tom Lane 6bd4f401b0 Replace the former method of determining snapshot xmax --- to wit, calling
ReadNewTransactionId from GetSnapshotData --- with a "latestCompletedXid"
variable that is updated during transaction commit or abort.  Since
latestCompletedXid is written only in places that had to lock ProcArrayLock
exclusively anyway, and is read only in places that had to lock ProcArrayLock
shared anyway, it adds no new locking requirements to the system despite being
cluster-wide.  Moreover, removing ReadNewTransactionId from snapshot
acquisition eliminates the need to take both XidGenLock and ProcArrayLock at
the same time.  Since XidGenLock is sometimes held across I/O this can be a
significant win.  Some preliminary benchmarking suggested that this patch has
no effect on average throughput but can significantly improve the worst-case
transaction times seen in pgbench.  Concept by Florian Pflug, implementation
by Tom Lane.
2007-09-08 20:31:15 +00:00
Tom Lane 295e63983d Implement lazy XID allocation: transactions that do not modify any database
rows will normally never obtain an XID at all.  We already did things this way
for subtransactions, but this patch extends the concept to top-level
transactions.  In applications where there are lots of short read-only
transactions, this should improve performance noticeably; not so much from
removal of the actual XID-assignments, as from reduction of overhead that's
driven by the rate of XID consumption.  We add a concept of a "virtual
transaction ID" so that active transactions can be uniquely identified even
if they don't have a regular XID.  This is a much lighter-weight concept:
uniqueness of VXIDs is only guaranteed over the short term, and no on-disk
record is made about them.

Florian Pflug, with some editorialization by Tom.
2007-09-05 18:10:48 +00:00
Tom Lane a52e4408b9 Add a debug logging message when a resource manager rejects an attempted
restart point.  Per suggestion from Simon Riggs.
2007-08-28 23:17:47 +00:00
Tom Lane 647fd9a108 Fix two bugs induced in VACUUM FULL by async-commit patch.
First, we cannot assume that XLogAsyncCommitFlush guarantees hint bits will be
settable, because clog.c's inexact LSN bookkeeping results in windows where a
previously flushed transaction is considered unhintable because it shares an
LSN slot with a later unflushed transaction.  But repair_frag requires
XMIN_COMMITTED to be correct so that it can distinguish tuples moved by the
current vacuum.  Since not being able to set the bit is an uncommon corner
case, the most practical way of dealing with it seems to be to abandon
shrinking (ie, don't invoke repair_frag) when we find a non-dead tuple whose
XMIN_COMMITTED bit couldn't be set.

Second, it is possible for the same reason that a RECENTLY_DEAD tuple does not
get its XMAX_COMMITTED bit set during scan_heap.  But by the time repair_frag
examines the tuple it might be possible to set the bit.  We therefore must
take buffer content lock when calling HeapTupleSatisfiesVacuum a second time,
else we can get an Assert failure in SetBufferCommitInfoNeedsSave.  This
latter bug is latent in existing releases, but I think it cannot actually
occur without async commit, since the first HeapTupleSatisfiesVacuum call
should always have set the bit.  So I'm not going to back-patch it.

In passing, reduce the existing "cannot shrink relation" messages from NOTICE
to LOG level.  The new message must be no higher than LOG if we don't want
unpredictable regression test failures, and consistency seems like a good
idea.  Also arrange that only one such message is reported per VACUUM FULL;
in typical scenarios you could get spammed with many such messages, which
seems a bit useless.
2007-08-13 19:08:26 +00:00
Tom Lane bdd6b62245 Switch over to using the src/timezone functions for formatting timestamps
displayed in the postmaster log.  This avoids Windows-specific problems with
localized time zone names that are in the wrong encoding, and generally seems
like a good idea to forestall other potential platform-dependent issues.
To preserve the existing behavior that all backends will log in the same time
zone, create a new GUC variable log_timezone that can only be changed on a
system-wide basis, and reference log-related calculations to that zone instead
of the TimeZone variable.

This fixes the issue reported by Hiroshi Saito that timestamps printed by
xlog.c startup could be improperly localized on Windows.  We still need a
simpler patch for that problem in the back branches, however.
2007-08-04 01:26:54 +00:00