Commit Graph

1768 Commits

Author SHA1 Message Date
Heikki Linnakangas 8c843fff2d Bootstrap WAL to begin at segment logid=0 logseg=1 (000000010000000000000001)
rather than 0/0, so that we can safely use 0/0 as an invalid value. This is a
more future-proof fix for the corner-case bug in streaming replication that
was fixed yesterday. We had a similar corner-case bug with log/seg 0/0 back in
February as well. Avoiding 0/0 as a valid value should prevent bugs like that
in the future. Per Tom Lane's idea.

Back-patch to 9.0. Since this only affects bootstrapping, it makes no
difference to existing installations. We don't need to worry about the
bug in existing installations, because if you've managed to get past the
initial base backup already, you won't hit the bug in the future either.
2010-11-02 11:39:48 +02:00
Heikki Linnakangas 931b6db39b Fix corner-case bug in tracking of latest removed WAL segment during
streaming replication. We used log/seg 0/0 to indicate that no WAL segments
have been removed since startup, but 0/0 is a valid value for the very first
WAL segment after initdb. To make that disambiguous, store
(latest removed WAL segment + 1) in the global variable.

Per report from Matt Chesler, also reproduced by Greg Smith.
2010-11-01 10:05:15 +02:00
Heikki Linnakangas 0c6293dd03 Before removing backup_label and irrevocably changing pg_control file, check
that WAL file containing the checkpoint redo-location can be found. This
avoids making the cluster irrecoverable if the redo location is in an earlie
WAL file than the checkpoint record.

Report, analysis and patch by Jeff Davis, with small changes by me.
2010-10-26 21:43:52 +03:00
Peter Eisentraut 35670340f5 Refactor typenameTypeId()
Split the old typenameTypeId() into two functions: A new typenameTypeId() that
returns only a type OID, and typenameTypeIdAndMod() that returns type OID and
typmod.  This isolates call sites better that actually care about the typmod.
2010-10-25 21:44:49 +03:00
Tom Lane def30e84c4 Don't try to fetch database name when SetTransactionIdLimit() is executed
outside a transaction.

This repairs brain fade in my patch of 2009-08-30: the reason we had been
storing oldest-database name, not OID, in ShmemVariableCache was of course
to avoid having to do a catalog lookup at times when it might be unsafe.

This error explains why Aleksandr Dushein is having trouble getting out of
an XID wraparound state in bug #5718, though not how he got into that state
in the first place.  I suspect pg_upgrade is at fault there.
2010-10-20 12:48:51 -04:00
Alvaro Herrera 17a16663d0 Remove AtStart_Cache() call in CommandCounterIncrement().
This call was present in the aboriginal code from Berkeley, and has
never been touched; it may very well be that it was there to mask
effects of bugs in other places and it may no longer be necessary.
The removal has been foreseen in a code comment since 2007; this seems
to be a good time to test this hypothesis.
2010-10-20 11:33:57 -03:00
Tom Lane 419d2374bf Fix a passel of inappropriately-named global functions in GIN.
The GIN code has absolutely no business exporting GIN-specific functions
with names as generic as compareItemPointers() or newScanKey(); that's
just trouble waiting to happen.  I got annoyed about this again just now
and decided to fix it.  This commit ensures that all global symbols
defined in access/gin/ have names including "gin" or "Gin".  There were a
couple of cases, like names involving "PostingItem", where arguably the
names were already sufficiently nongeneric; but I figured as long as I was
risking creating merge problems for unapplied GIN patches I might as well
impose a uniform policy.

I didn't touch any static symbol names.  There might be some places
where it'd be appropriate to rename some static functions to match
siblings that are exported, but I'll leave that for another time.
2010-10-17 21:43:26 -04:00
Tom Lane 48c7d9f6ff Improve GIN indexscan cost estimation.
The better estimate requires more statistics than we previously stored:
in particular, counts of "entry" versus "data" pages within the index,
as well as knowledge of the number of distinct key values.  We collect
this information during initial index build and update it during VACUUM,
storing the info in new fields on the index metapage.  No initdb is
required because these fields will read as zeroes in a pre-existing
index, and the new gincostestimate code is coded to behave (reasonably)
sanely if they are zeroes.

Teodor Sigaev, reviewed by Jan Urbanski, Tom Lane, and Itagaki Takahiro.
2010-10-17 20:52:32 -04:00
Simon Riggs 3bbcc5c999 Make startup process respond to signals to cancel waiting on latch.
A tidy up for recently committed changes to startup latch.

Fujii Masao
2010-10-14 19:15:26 +01:00
Simon Riggs 45cd9199c2 Fix bug in comment of timeline history file.
Fujii Masao
2010-10-14 19:06:06 +01:00
Tom Lane 4016bdef8a Fix assorted bugs in GIN's WAL replay logic.
The original coding was quite sloppy about handling the case where
XLogReadBuffer fails (because the page has since been deleted).  This
would result in either "bad buffer id: 0" or an Assert failure during
replay, if indeed the page were no longer there.  In a couple of places
it also neglected to check whether the change had already been applied,
which would probably result in corrupted index contents.  I believe that
bug #5703 is an instance of the first problem.  These issues could show up
without replication, but only if you were unfortunate enough to crash
between modification of a GIN index and the next checkpoint.

Back-patch to 8.2, which is as far back as GIN has WAL support.
2010-10-11 19:04:37 -04:00
Tom Lane 9cc8c84e73 Improve logging in VACUUM FULL VERBOSE and CLUSTER VERBOSE.
This patch resurrects some of the information that could be logged by the
old, now-dead implementation of VACUUM FULL, in particular counts of live
and dead tuples and the time taken for the table rebuild proper.  There's
still no logging about the ensuing index rebuilds, though.

Itagaki Takahiro
2010-10-07 21:46:46 -04:00
Magnus Hagander 9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Bruce Momjian cecde97577 Update HOT README about when single-page vacuums happen. 2010-09-19 17:51:44 +00:00
Tom Lane 54d0e2886a Add some documentation about how we WAL-log filesystem actions.
Per a question from Robert Haas.
2010-09-17 00:42:39 +00:00
Heikki Linnakangas 79b54816db Fix two typos in comments, spotted by Fujii Masao and Thom Brown 2010-09-15 13:58:22 +00:00
Heikki Linnakangas 723d0184e2 Use a latch to make startup process wake up and replay immediately when
new WAL arrives via streaming replication. This reduces the latency, and
also allows us to use a longer polling interval, which is good for energy
efficiency.

We still need to poll to check for the appearance of a trigger file, but
the interval is now 5 seconds (instead of 100ms), like when waiting for
a new WAL segment to appear in WAL archive.
2010-09-15 10:35:05 +00:00
Joe Conway 5eb15c9942 SERIALIZABLE transactions are actually implemented beneath the covers with
transaction snapshots, i.e. a snapshot registered at the beginning of
a transaction. Change variable naming and comments to reflect this reality
in preparation for a future, truly serializable mode, e.g.
Serializable Snapshot Isolation (SSI).

For the moment transaction snapshots are still used to implement
SERIALIZABLE, but hopefully not for too much longer. Patch by Kevin
Grittner and Dan Ports with review and some minor wording changes by me.
2010-09-11 18:38:58 +00:00
Heikki Linnakangas 2746e5f21d Introduce latches. A latch is a boolean variable, with the capability to
wait until it is set. Latches can be used to reliably wait until a signal
arrives, which is hard otherwise because signals don't interrupt select()
on some platforms, and even when they do, there's race conditions.

On Unix, latches use the so called self-pipe trick under the covers to
implement the sleep until the latch is set, without race conditions. On
Windows, Windows events are used.

Use the new latch abstraction to sleep in walsender, so that as soon as
a transaction finishes, walsender is woken up to immediately send the WAL
to the standby. This reduces the latency between master and standby, which
is good.

Preliminary work by Fujii Masao. The latch implementation is by me, with
helpful comments from many people.
2010-09-11 15:48:04 +00:00
Tom Lane eb36d1ad51 Fix oversight in RelFileNodeBackend patch: CreateFakeRelcacheEntry needs to
initialize the rd_backend field of a fake Relation entry correctly.
Fortunately, that is easy, since only non-temp relations should ever be
mentioned in the WAL stream.
2010-08-30 16:46:23 +00:00
Simon Riggs ac791d3ca1 Fix misleading DEBUG2 issued during RemoveOldXlogFiles() 2010-08-30 15:37:41 +00:00
Simon Riggs e72f15ed60 Truncate subtrans after each restartpoint.
Issue reported by Harald Kolb, patch by Fujii Masao, review by me.
2010-08-30 14:22:05 +00:00
Tom Lane 8fa30f906b Reduce PANIC to ERROR in some occasionally-reported btree failure cases.
This patch changes _bt_split() and _bt_pagedel() to throw a plain ERROR,
rather than PANIC, for several cases that are reported from the field
from time to time:
* right sibling's left-link doesn't match;
* PageAddItem failure during _bt_split();
* parent page's next child isn't right sibling during _bt_pagedel().
In addition the error messages for these cases have been made a bit
more verbose, with additional values included.

The original motivation for PANIC here was to capture core dumps for
subsequent analysis.  But with so many users whose platforms don't capture
core dumps by default, or who are unprepared to analyze them anyway, it's hard
to justify a forced database restart when we can fairly easily detect the
problems before we've reached the critical sections where PANIC would be
necessary.  It is not currently known whether the reports of these messages
indicate well-hidden bugs in Postgres, or are a result of storage-level
malfeasance; the latter possibility suggests that we ought to try to be more
robust even if there is a bug here that's ultimately found.

Backpatch to 8.2.  The code before that is sufficiently different that
it doesn't seem worth the trouble to back-port further.
2010-08-29 19:33:14 +00:00
Alvaro Herrera 3a1b51de19 Remove duplicate translatable phrase 2010-08-26 19:23:41 +00:00
Robert Haas d37781fa82 Tidy up a few calls to smrgextend().
In the new API introduced by my patch to include the backend ID in
temprel filenames, the last argument to smrgextend() became skipFsync
rather than isTemp, but these calls didn't get the memo.  It's not
really a problem to pass rel->rd_istemp rather than just plain false,
because smgrextend() now automatically skips the fsync for temprels
anyway, but this seems cleaner and saves some minute number of cycles.
2010-08-19 02:58:37 +00:00
Robert Haas debcec7dc3 Include the backend ID in the relpath of temporary relations.
This allows us to reliably remove all leftover temporary relation
files on cluster startup without reference to system catalogs or WAL;
therefore, we no longer include temporary relations in XLOG_XACT_COMMIT
and XLOG_XACT_ABORT WAL records.

Since these changes require including a backend ID in each
SharedInvalSmgrMsg, the size of the SharedInvalidationMessage.id
field has been reduced from two bytes to one, and the maximum number
of connections has been reduced from INT_MAX / 4 to 2^23-1.  It would
be possible to remove these restrictions by increasing the size of
SharedInvalidationMessage by 4 bytes, but right now that doesn't seem
like a good trade-off.

Review by Jaime Casanova and Tom Lane.
2010-08-13 20:10:54 +00:00
Robert Haas 95ef7cd40d Make RecordTransactionCommit() respect wal_level.
Since the only purpose of WAL-loggin SharedInvalidationMessages is to support
Hot Standby operation, they needn't be included when wal_level < hot_standby.

Back-patch to 9.0.

Review by Heikki Linnakanagas and Fujii Masao.
2010-08-13 15:42:21 +00:00
Robert Haas 30c22eb8fc Correct sundry errors in Hot Standby-related comments.
Fujii Masao
2010-08-12 23:24:54 +00:00
Tom Lane d4fe61b083 Fix an additional set of problems in GIN's handling of lossy page pointers.
Although the key-combining code claimed to work correctly if its input
contained both lossy and exact pointers for a single page in a single TID
stream, in fact this did not work, and could not work without pretty
fundamental redesign.  Modify keyGetItem so that it will not return such a
stream, by handling lossy-pointer cases a bit more explicitly than we did
before.

Per followup investigation of a gripe from Artur Dabrowski.
An example of a query that failed given his data set is
select count(*) from search_tab where
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:* | dd:*')) and
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'aa:*'));

Back-patch to 8.4 where the lossy pointer code was introduced.
2010-08-01 19:16:39 +00:00
Tom Lane 0454f13161 Rewrite the rbtree routines so that an RBNode is the first field of the
struct representing a tree entry, rather than being a separately allocated
piece of storage.  This API is at least as clean as the old one (if not
more so --- there were some bizarre choices in there) and it permits a
very substantial memory savings, on the order of 2X in ginbulk.c's usage.

Also, fix minor memory leaks in code called by ginEntryInsert, in
particular in ginInsertValue and entryFillRoot, as well as ginEntryInsert
itself.  These leaks resulted in the GIN index build context continuing
to bloat even after we'd filled it to maintenance_work_mem and started
to dump data out to the index.

In combination these fixes restore the GIN index build code to honoring
the maintenance_work_mem limit about as well as it did in 8.4.  Speed
seems on par with 8.4 too, maybe even a bit faster, for a non-pathological
case in which HEAD was formerly slower.

Back-patch to 9.0 so we don't have a performance regression from 8.4.
2010-08-01 02:12:42 +00:00
Tom Lane 2ab57e089b Rewrite the key-combination logic in GIN's keyGetItem() and scanGetItem()
routines to make them behave better in the presence of "lossy" index pointers.
The previous coding was outright incorrect for some cases, as recently
reported by Artur Dabrowski: scanGetItem would fail to return index entries in
cases where one index key had multiple exact pointers on the same page as
another key had a lossy pointer.  Also, keyGetItem was extremely inefficient
for cases where a single index key generates multiple "entry" streams, such as
an @@ operator with a multiple-clause tsquery.  The presence of a lossy page
pointer in any one stream defeated its ability to use the opclass
consistentFn, resulting in probing many heap pages that didn't really need to
be visited.  In Artur's example case, a query like
	WHERE tsvector @@ to_tsquery('a & b')
was about 50X slower than the theoretically equivalent
	WHERE tsvector @@ to_tsquery('a') AND tsvector @@ to_tsquery('b')
The way that I chose to fix this was to have GIN call the consistentFn
twice with both TRUE and FALSE values for the in-doubt entry stream,
returning a hit if either call produces TRUE, but not if they both return
FALSE.  The code handles this for the case of a single in-doubt entry stream,
but punts (falling back to the stupid behavior) if there's more than one lossy
reference to the same page.  The idea could be scaled up to deal with multiple
lossy references, but I think that would probably be wasted complexity.  At
least to judge by Artur's example, such cases don't occur often enough to be
worth trying to optimize.

Back-patch to 8.4.  8.3 did not have lossy GIN index pointers, so not
subject to these problems.
2010-07-31 00:30:54 +00:00
Simon Riggs 5b8bd0529e Rename asyncCommitLSN to asyncXactLSN to reflect changed role in 9.0.
Transaction aborts now record their LSN to avoid corner case
behaviour in SR/HS, hence change of name of variables and functions.
As pointed out by Fujii Masao. Cosmetic changes only.
2010-07-29 22:27:27 +00:00
Robert Haas 1a078629ac Fix possible page corruption by ALTER TABLE .. SET TABLESPACE.
If a zeroed page is present in the heap, ALTER TABLE .. SET TABLESPACE will
set the LSN and TLI while copying it, which is wrong, and heap_xlog_newpage()
will do the same thing during replay, so the corruption propagates to any
standby.  Note, however, that the bug can't be demonstrated unless archiving
is enabled, since in that case we skip WAL logging altogether, and the LSN/TLI
are not set.

Back-patch to 8.0; prior releases do not have tablespaces.

Analysis and patch by Jeff Davis.  Adjustments for back-branches and minor
wordsmithing by me.
2010-07-29 16:14:36 +00:00
Robert Haas 7be8946c78 Avoid deep recursion when assigning XIDs to multiple levels of subxacts.
Backpatch to 8.0.

Andres Freund, with cleanup and adjustment for older branches by me.
2010-07-23 00:43:00 +00:00
Tom Lane 672efc0865 Update obsolete comment. Noted by Josh Tolley. 2010-07-08 16:08:30 +00:00
Bruce Momjian 239d769e7e pgindent run for 9.0, second run 2010-07-06 19:19:02 +00:00
Tom Lane 8771634666 Don't set recoveryLastXTime when replaying a checkpoint --- that was a bogus
idea from the start since the variable is only meant to track commit/abort
events.  This patch reverts the logic around the variable to what it was in
8.4, except that the value is now kept in shared memory rather than a static
variable, so that it can be reported correctly by CreateRestartPoint (which is
executed in the bgwriter).
2010-07-03 22:15:45 +00:00
Tom Lane e76c1a0f4d Replace max_standby_delay with two parameters, max_standby_archive_delay and
max_standby_streaming_delay, and revise the implementation to avoid assuming
that timestamps found in WAL records can meaningfully be compared to clock
time on the standby server.  Instead, the delay limits are compared to the
elapsed time since we last obtained a new WAL segment from archive or since
we were last "caught up" to WAL data arriving via streaming replication.
This avoids problems with clock skew between primary and standby, as well
as other corner cases that the original coding would misbehave in, such
as the primary server having significant idle time between transactions.
Per my complaint some time ago and considerable ensuing discussion.

Do some desultory editing on the hot standby documentation, too.
2010-07-03 20:43:58 +00:00
Bruce Momjian b57ddccf05 Add C comment about why synchronous_commit=off behavior can lose
committed transactions in a postmaster crash.
2010-06-29 18:44:58 +00:00
Robert Haas 400916b6d7 emode_for_corrupt_record shouldn't reduce LOG messages to WARNING.
In non-interactive sessions, WARNING sorts below LOG.
2010-06-28 19:46:19 +00:00
Tom Lane 09698bb5fb Make RemoveOldXlogFiles's debug printout match style used elsewhere:
log and seg aren't an XLogRecPtr and shouldn't be printed like one.
Fujii Masao
2010-06-17 17:37:23 +00:00
Tom Lane 07e8b6aabc Don't allow walsender to send WAL data until it's been safely fsync'd on the
master.  Otherwise a subsequent crash could cause the master to lose WAL that
has already been applied on the slave, resulting in the slave being out of
sync and soon corrupt.  Per recent discussion and an example from Robert Haas.

Fujii Masao
2010-06-17 16:41:25 +00:00
Heikki Linnakangas 6da07cd80d If a corrupt WAL record is received by streaming replication, disconnect
and retry. If the record is genuinely corrupt in the master database,
there's little hope of recovering, but it's better than simply retrying
to apply the corrupt WAL record in a tight loop without even trying to
retransmit it, which is what we used to do.
2010-06-14 06:04:21 +00:00
Peter Eisentraut c86efdde5f Fix typo/bug, found by Clang compiler 2010-06-12 09:14:52 +00:00
Itagaki Takahiro 56834fc759 Rename restartpoint_command to archive_cleanup_command. 2010-06-10 08:13:50 +00:00
Heikki Linnakangas 0a7cb85531 Make TriggerFile variable static. It's not used outside xlog.c.
Fujii Masao
2010-06-10 07:49:23 +00:00
Heikki Linnakangas 346d7cd7fa Return NULL instead of 0/0 in pg_last_xlog_receive_location() and
pg_last_xlog_replay_location(). Per Robert Haas's suggestion, after
Itagaki Takahiro pointed out an issue in the docs. Also, some wording
changes in the docs by me.
2010-06-10 07:00:27 +00:00
Heikki Linnakangas 71815306e9 In standby mode, respect checkpoint_segments in addition to
checkpoint_timeout to trigger restartpoints. We used to deliberately only
do time-based restartpoints, because if checkpoint_segments is small we
would spend time doing restartpoints more often than really necessary.
But now that restartpoints are done in bgwriter, they're not as
disruptive as they used to be. Secondly, because streaming replication
stores the streamed WAL files in pg_xlog, we want to clean it up more
often to avoid running out of disk space when checkpoint_timeout is large
and checkpoint_segments small.

Patch by Fujii Masao, with some minor changes by me.
2010-06-09 15:04:07 +00:00
Magnus Hagander 8c873bbfa7 Make the walwriter close it's handle to an old xlog segment if it's no longer
the current one. Not doing this would leave the walwriter with a handle to a
deleted file if there was nothing for it to do for a long period of time,
preventing the file from  being completely removed.

Reported by Tollef Fog Heen, and thanks to Heikki for some hand-holding with
the patch.
2010-06-09 10:54:45 +00:00
Itagaki Takahiro b5faba1284 Ensure default-only storage parameters for TOAST relations
to be initialized with proper values. Affected parameters are
fillfactor, analyze_threshold, and analyze_scale_factor.

Especially uninitialized fillfactor caused inefficient page usage
because we built a StdRdOptions struct in which fillfactor is zero
if any reloption is set for the toast table.

In addition, we disallow toast.autovacuum_analyze_threshold and
toast.autovacuum_analyze_scale_factor because we didn't actually
support them; they are always ignored.

Report by Rumko on pgsql-bugs on 12 May 2010.
Analysis by Tom Lane and Alvaro Herrera. Patch by me.

Backpatch to 8.4.
2010-06-07 02:59:02 +00:00
Peter Eisentraut cb6038c168 Fix some inconsistent quoting of wal_level values in messages
When referring to postgresql.conf syntax, then it's without quotes
(wal_level=archive); in narrative it's with double quotes.  But never
single quotes.
2010-06-03 21:02:12 +00:00
Robert Haas d561430b66 On clean shutdown during recovery, don't warn about possible corruption.
Fujii Masao.  Review by Heikki Linnakangas and myself.
2010-06-03 03:20:00 +00:00
Heikki Linnakangas 6b24036365 Fix obsolete comments that I neglected to update in a previous patch.
Fujii Masao
2010-06-02 09:28:44 +00:00
Heikki Linnakangas c5bd8feac6 Adjust comment to reflect that we now have Hot Standby. Pointed out by
Robert Haas.
2010-05-27 00:38:39 +00:00
Robert Haas ea9968c331 Rename PM_RECOVERY_CONSISTENT and PMSIGNAL_RECOVERY_CONSISTENT.
The new names PM_HOT_STANDBY and PMSIGNAL_BEGIN_HOT_STANDBY more accurately
reflect their actual function.
2010-05-15 20:01:32 +00:00
Simon Riggs 4a24c9a063 Fix bug in processing of checkpoint time for max_standby_delay. Latest
log time was incorrectly set, typically leading to dates in the past,
which would cause more cancellations in Hot Standby on a quiet server.
2010-05-15 07:14:43 +00:00
Simon Riggs fd34374b17 Add many new Asserts in code and fix simple bug that slipped through
without them, related to previous commit. Report by Bruce Momjian.
2010-05-14 07:11:49 +00:00
Simon Riggs 463f151a23 Ensure that top level aborts call XLogSetAsyncCommit(). Not doing
so simply leads to data waiting in wal_buffers which then causes
later commits to potentially do emergency writes and for all forms
of replication to be potentially delayed without need or benefit.
Issue pointed out exactly by Fujii Masao, following bug report
by Robert Haas on a separate though related topic.
2010-05-13 11:39:30 +00:00
Simon Riggs 8431e296ea Cleanup initialization of Hot Standby. Clarify working with reanalysis
of requirements and documentation on LogStandbySnapshot(). Fixes
two minor bugs reported by Tom Lane that would lead to an incorrect
snapshot after transaction wraparound. Also fix two other problems
discovered that would give incorrect snapshots in certain cases.
ProcArrayApplyRecoveryInfo() substantially rewritten. Some minor
refactoring of xact_redo_apply() and ExpireTreeKnownAssignedTransactionIds().
2010-05-13 11:15:38 +00:00
Heikki Linnakangas ffe8c7c677 Need to hold ControlFileLock while updating control file. Update
minRecoveryPoint in control file when replaying a parameter change record,
to ensure that we don't allow hot standby on WAL generated without
wal_level='hot_standby' after a standby restart.
2010-05-03 11:17:52 +00:00
Tom Lane 609a63fd85 Improve printing of XLOG_HEAP_NEWPAGE records to include the forknum. 2010-05-02 22:37:43 +00:00
Tom Lane e55e6ecfe4 Fix replay of XLOG_HEAP_NEWPAGE WAL records to pay attention to the forknum
field of the WAL record.  The previous coding always wrote to the main fork,
resulting in data corruption if the page was meant to go into a non-default
fork.

At present, the only operation that can produce such WAL records is
ALTER TABLE/INDEX SET TABLESPACE when executed with archive_mode = on.
Data corruption would be observed on standby slaves, and could occur on the
master as well if a database crash and recovery occurred after committing
the ALTER and before the next checkpoint.  Per report from Gordon Shannon.

Back-patch to 8.4; the problem doesn't exist in earlier branches because
we didn't have a concept of multiple relation forks then.
2010-05-02 22:28:05 +00:00
Tom Lane f9ed327f76 Clean up some awkward, inaccurate, and inefficient processing around
MaxStandbyDelay.  Use the GUC units mechanism for the value, and choose more
appropriate timestamp functions for performing tests with it.  Make the
ps_activity manipulation in ResolveRecoveryConflictWithVirtualXIDs have
behavior similar to ps_activity code elsewhere, notably not updating the
display when update_process_title is off and not truncating the display
contents at an arbitrarily-chosen length.  Improve the docs to be explicit
about what MaxStandbyDelay actually measures, viz the difference between
primary and standby servers' clocks, and the possible hazards if their clocks
aren't in sync.
2010-05-02 02:10:33 +00:00
Heikki Linnakangas 21992dd4f5 Fix handling of b-tree reuse WAL records when hot standby is disabled,
and add missing code in btree_desc for them. This fixes the bug
with "tree_redo: unknown op code 208" error reported by Jaime Casanova.
2010-04-30 06:34:29 +00:00
Tom Lane 69f7a4d8e3 Adjust error checks in pg_start_backup and pg_stop_backup to make it possible
to perform a backup without archive_mode being enabled.  This gives up some
user-error protection in order to improve usefulness for streaming-replication
scenarios.  Per discussion.
2010-04-29 21:49:03 +00:00
Tom Lane f0488bd57c Rename the parameter recovery_connections to hot_standby, to reduce possible
confusion with streaming-replication settings.  Also, change its default
value to "off", because of concern about executing new and poorly-tested
code during ordinary non-replicating operation.  Per discussion.

In passing do some minor editing of related documentation.
2010-04-29 21:36:19 +00:00
Tom Lane 77acab75df Modify ShmemInitStruct and ShmemInitHash to throw errors internally,
rather than returning NULL for some-but-not-all failures as they used to.
Remove now-redundant tests for NULL from call sites.

We had to do something about this because many call sites were failing to
check for NULL; and changing it like this seems a lot more useful and
mistake-proof than adding checks to the call sites without them.
2010-04-28 16:54:16 +00:00
Heikki Linnakangas 9b8a73326e Introduce wal_level GUC to explicitly control if information needed for
archival or hot standby should be WAL-logged, instead of deducing that from
other options like archive_mode. This replaces recovery_connections GUC in
the primary, where it now has no effect, but it's still used in the standby
to enable/disable hot standby.

Remove the WAL-logging of "unlogged operations", like creating an index
without WAL-logging and fsyncing it at the end. Instead, we keep a copy of
the wal_mode setting and the settings that affect how much shared memory a
hot standby server needs to track master transactions (max_connections,
max_prepared_xacts, max_locks_per_xact) in pg_control. Whenever the settings
change, at server restart, write a WAL record noting the new settings and
update pg_control. This allows us to notice the change in those settings in
the standby at the right moment, they used to be included in checkpoint
records, but that meant that a changed value was not reflected in the
standby until the first checkpoint after the change.

Bump PG_CONTROL_VERSION and XLOG_PAGE_MAGIC. Whack XLOG_PAGE_MAGIC back to
the sequence it used to follow, before hot standby and subsequent patches
changed it to 0x9003.
2010-04-28 16:10:43 +00:00
Tom Lane 2871b4618a Replace the KnownAssignedXids hash table with a sorted-array data structure,
and be more tense about the locking requirements for it, to improve performance
in Hot Standby mode.  In passing fix a few bugs and improve a number of
comments in the existing HS code.

Simon Riggs, with some editorialization by Tom
2010-04-28 00:09:05 +00:00
Heikki Linnakangas 3efba16d56 If a base backup is cancelled by server shutdown or crash, throw an error
in WAL recovery when it sees the shutdown checkpoint record. It's more
user-friendly to find out about it at that point than at the end of
recovery, and you're not left wondering why your hot standby server never
opens up for read-only connections.
2010-04-27 09:25:18 +00:00
Robert Haas 33980a0640 Fix various instances of "the the".
Two of these were pointed out by Erik Rijkers; the rest I found.
2010-04-23 23:21:44 +00:00
Simon Riggs 491d1ea5b3 Previous patch revoked following objections. 2010-04-23 20:21:31 +00:00
Simon Riggs 6ca23b1a29 Make CheckRequiredParameterValues() depend upon correct combination
of parameters. Fix bug report by Robert Haas that error message and
hint was incorrect if wrong mode parameters specified on master.
Internal changes only. Proposals for parameter simplification on
master/primary still under way.
2010-04-23 19:57:19 +00:00
Simon Riggs a2555571fb Optimise btree delete processing when no active backends.
Clarify comments, downgrade a message to DEBUG and remove some
debug counters. Direct from ideas by Heikki Linnakangas.
2010-04-22 08:04:25 +00:00
Simon Riggs 781ec6b75d Further reductions in Hot Standby conflict processing. These
come from the realistion that HEAP2_CLEAN records don't
always remove user visible data, so conflict processing for
them can be skipped. Confirm validity using Assert checks,
clarify circumstances under which we log heap_cleanup_info
records. Tuning arises from bug fixing of earlier safety
check failures.
2010-04-22 02:15:45 +00:00
Simon Riggs bc2b85d904 Fix oversight in collecting values for cleanup_info records.
vacuum_log_cleanup_info() now generates log records with a valid
latestRemovedXid set in all cases. Also be careful not to zero the
value when we do a round of vacuuming part-way through lazy_scan_heap().
Incidentally, this reduces frequency of conflicts in Hot Standby.
2010-04-21 17:20:56 +00:00
Robert Haas 481cb5d9b5 Rename standby_keep_segments to wal_keep_segments.
Also, make the name of the GUC and the name of the backing variable match.
Alnong the way, clean up a couple of slight typographical errors in the
related docs.
2010-04-20 11:15:06 +00:00
Tom Lane 39bf46384b Fix uninitialized local variables. Not sure why gcc doesn't complain about
these --- maybe because they're effectively unused?  MSVC does complain though,
per buildfarm.
2010-04-19 17:54:48 +00:00
Simon Riggs d38603bd97 Improve sequence and sense of messages from pg_stop_backup().
Now doesn't report it is waiting until it actually is waiting,
plus message doesn't appear until at least 5 seconds wait, so
we avoid reporting the wait before we've given the archiver
a reasonable time to wake up and archive the file we just
created earlier in the function.
Also add new unconditional message to confirm safe completion.
Now a normal, healthy execution does not report waiting at
all, just safe completion.
2010-04-18 18:44:53 +00:00
Simon Riggs 2847de9df2 Remove some additional changes in previous commit that belong elsewhere. 2010-04-18 18:17:12 +00:00
Simon Riggs 21d6a6a128 Tune GetSnapshotData() during Hot Standby by avoiding loop
through normal backends. Makes code clearer also, since we
avoid various Assert()s. Performance of snapshots taken
during recovery no longer depends upon number of read-only
backends.
2010-04-18 18:06:07 +00:00
Heikki Linnakangas 78974cfb9b In standby mode, suppress repeated LOG messages about a corrupt record,
which just indicates that we've reached the end of valid WAL found in
the standby.
2010-04-16 08:58:16 +00:00
Bruce Momjian ec4b9bcc3d Doc change: effect -> affect, per Robert Haas 2010-04-15 03:05:59 +00:00
Robert Haas 9d137a756f Typo fix. Kevin Grittner. 2010-04-14 20:17:26 +00:00
Simon Riggs 55d7556a4d Fix minor typo in comment in xlog.c 2010-04-14 10:29:07 +00:00
Heikki Linnakangas 361bd1662e Allow Hot Standby to begin from a shutdown checkpoint.
Patch by Simon Riggs & me
2010-04-13 14:17:46 +00:00
Heikki Linnakangas 30556568f5 Update the location of last removed WAL segment in shared memory only
after actually removing one, so that if we can't remove segments because
WAL archiving is lagging behind, we don't unnecessarily forbid streaming
the old not-yet-archived segments that are still perfectly valid. Per
suggestion from Fujii Masao.
2010-04-12 10:40:43 +00:00
Heikki Linnakangas e57cd7f0a1 Change the logic to decide when to delete old WAL segments, so that it
doesn't take into account how far the WAL senders are. This way a hung
WAL sender doesn't prevent old WAL segments from being recycled/removed
in the primary, ultimately causing the disk to fill up. Instead add
standby_keep_segments setting to control how many old WAL segments are
kept in the primary. This also makes it more reliable to use streaming
replication without WAL archiving, assuming that you set
standby_keep_segments high enough.
2010-04-12 09:52:29 +00:00
Heikki Linnakangas 0f11ed5886 Allow quotes to be escaped in recovery.conf, by doubling them. This patch
also makes the parsing a little bit stricter, rejecting garbage after the
parameter value and values with missing ending quotes, for example.
2010-04-07 10:58:49 +00:00
Heikki Linnakangas 370f770c15 Forbid using pg_xlogfile_name() and pg_xlogfile_name_offset() during
recovery. We might want to relax this in the future, but ThisTimeLineID
isn't currently correct in backends during recovery, so the filename
returned was wrong.
2010-04-07 06:12:52 +00:00
Simon Riggs 89c5008158 Further message changes when recovery.conf parameters missing. 2010-04-06 17:51:58 +00:00
Heikki Linnakangas 492d9f2309 Rename "Log-streaming replication parameters" header to "Standby server
parameters" in recovery.conf, to match the grouping in the documentation.

Fujii Masao
2010-04-06 14:53:20 +00:00
Simon Riggs cf2575b8c4 Check compulsory parameters in recovery.conf in standby_mode, per docs. 2010-04-02 21:50:40 +00:00
Simon Riggs 31f00d163b Move system startup message prior to any calls out of data directory.
This allows us to see what mode the server is in before it starts to
perform actions that can block or hang. Otherwise server messages
may not appear until after messages that say FATAL the database
server is starting up.
2010-04-02 13:10:56 +00:00
Robert Haas 54943734f8 Refer to max_wal_senders in a more consistent fashion.
The error message now makes explicit reference to the GUC that must be changed
to fix the problem, using wording suggested by Tom Lane.  Along the way,
rename the GUC from MaxWalSenders to max_wal_senders for consistency and
grep-ability.
2010-04-01 00:43:29 +00:00
Bruce Momjian 55a01b4c0a Change recovery.conf.sample to match postgresql.conf by showing only
default values, with example comments.
2010-03-31 14:18:45 +00:00
Heikki Linnakangas 2a77355ea1 Change the retry-loop in standby mode to also try restoring files from
pg_xlog directory. This is essential for replaying WAL records that
were streamed from the master, after a standby server restart.

If a corrupt record is seen in a file restored from the archive or
streamed from the master, log it as a WARNING and keep retrying. If the
corruption is permanent, and not just a glitch in the whatever copies the
files to the archive or a network error not caught by CRC checks in TCP
for example, we will keep retrying and logging the WARNING indefinitely.
But that's better than shutting down completely, the standby is still
useful for running read-only queries. In PITR the recovery ends at such a
corrupt record, which is a bit questionable, but that's the behavior we
had in previous releases and we don't feel like chaning it now. It does
make sense for tools like pg_standby.
2010-03-30 16:23:57 +00:00
Bruce Momjian e919a844eb Properly initialize local varaible in
btree_xlog_delete_get_latestRemovedXid().  This variable was only tested
in assert builds.
2010-03-30 13:46:09 +00:00
Simon Riggs de66effede Edit recovery.conf.sample so it matches docs. Change standby_mode
example to 'on or 'off' rather than 'true' or 'false', as shown
in docs. Add restartpoint_command. Add section header for recovery
target parameters, matching docs.
2010-03-29 18:50:36 +00:00
Simon Riggs a760893dbd Derive latestRemovedXid for btree deletes by reading heap pages. The
WAL record for btree delete contains a list of tids, even when backup
blocks are present. We follow the tids to their heap tuples, taking
care to follow LP_REDIRECT tuples. We ignore LP_DEAD tuples on the
understanding that they will always have xmin/xmax earlier than any
LP_NORMAL tuples referred to by killed index tuples. Iff all tuples
are LP_DEAD we return InvalidTransactionId. The heap relfilenode is
added to the WAL record, requiring API changes to pass down the heap
Relation. XLOG_PAGE_MAGIC updated.
2010-03-28 09:27:02 +00:00