Commit Graph

348 Commits

Author SHA1 Message Date
Robert Haas 240067b3b0 Merge synchronous_replication setting into synchronous_commit.
This means one less thing to configure when setting up synchronous
replication, and also avoids some ambiguity around what the behavior
should be when the settings of these variables conflict.

Fujii Masao, with additional hacking by me.
2011-04-04 16:25:52 -04:00
Heikki Linnakangas 647f8b3dba Reword the phrase on zero replication_timeout in the docs. 2011-03-31 10:19:29 +03:00
Heikki Linnakangas 754baa21f7 Automatically terminate replication connections that are idle for more
than replication_timeout (a new GUC) milliseconds. The TCP timeout is often
too long, you want the master to notice a dead connection much sooner.
People complained about that in 9.0 too, but with synchronous replication
it's even more important to notice dead connections promptly.

Fujii Masao and Heikki Linnakangas
2011-03-30 10:20:37 +03:00
Robert Haas 9a56dc3389 Fix various possible problems with synchronous replication.
1. Don't ignore query cancel interrupts.  Instead, if the user asks to
cancel the query after we've already committed it, but before it's on
the standby, just emit a warning and let the COMMIT finish.

2. Don't ignore die interrupts (pg_terminate_backend or fast shutdown).
Instead, emit a warning message and close the connection without
acknowledging the commit.  Other backends will still see the effect of
the commit, but there's no getting around that; it's too late to abort
at this point, and ignoring die interrupts altogether doesn't seem like
a good idea.

3. If synchronous_standby_names becomes empty, wake up all backends
waiting for synchronous replication to complete.  Without this, someone
attempting to shut synchronous replication off could easily wedge the
entire system instead.

4. Avoid depending on the assumption that if a walsender updates
MyProc->syncRepState, we'll see the change even if we read it without
holding the lock.  The window for this appears to be quite narrow (and
probably doesn't exist at all on machines with strong memory ordering)
but protecting against it is practically free, so do that.

5. Remove useless state SYNC_REP_MUST_DISCONNECT, which isn't needed and
doesn't actually do anything.

There's still some further work needed here to make the behavior of fast
shutdown plausible, but that looks complex, so I'm leaving it for a
separate commit.  Review by Fujii Masao.
2011-03-17 13:12:21 -04:00
Bruce Momjian e148443ddd Document guc context values, and reference them from the config doc section.
Tom Lane
2011-03-17 00:27:01 -04:00
Bruce Momjian df4a9595c2 Wording adjustment for restart_after_crash entry
Specifically, mention that "restart" is disabled by this parameter.
2011-03-15 12:43:39 -04:00
Robert Haas f0f3617135 Minor sync rep documentation improvements.
- Make the name of the ID tag for the GUC entry match the GUC name.
- Clarify that synchronous_replication waits for xlog flush, not receipt.
- Mention that synchronous_replication won't wait if max_wal_senders=0.
2011-03-15 11:25:04 -04:00
Bruce Momjian 7a8f43968a In docs, rename "backwards compatibility" to "backward compatibility"
for consistency.
2011-03-11 14:33:10 -05:00
Bruce Momjian a1bb5a480d Document how listen_addresses can do only IPv4 or IPv6. 2011-03-11 10:31:43 -05:00
Itagaki Takahiro 1144726d07 synchronous_standby_names is a string parameter. 2011-03-09 19:49:16 +09:00
Robert Haas c74d3aceb9 Synchronous replication doc corrections.
Thom Brown
2011-03-07 11:59:58 -05:00
Simon Riggs a8a8a3e096 Efficient transaction-controlled synchronous replication.
If a standby is broadcasting reply messages and we have named
one or more standbys in synchronous_standby_names then allow
users who set synchronous_replication to wait for commit, which
then provides strict data integrity guarantees. Design avoids
sending and receiving transaction state information so minimises
bookkeeping overheads. We synchronize with the highest priority
standby that is connected and ready to synchronize. Other standbys
can be defined to takeover in case of standby failure.

This version has very strict behaviour; more relaxed options
may be added at a later date.

Simon Riggs and Fujii Masao, with reviews by Yeb Havinga, Jaime
Casanova, Heikki Linnakangas and Robert Haas, plus the assistance
of many other design reviewers.
2011-03-06 22:49:16 +00:00
Robert Haas 59d6a75942 Avoid excessive Hot Standby feedback messages.
Without this patch, when wal_receiver_status_interval=0, indicating that no
status messages should be sent, Hot Standby feedback messages are instead sent
extremely frequently.

Fujii Masao, with documentation changes by me.
2011-03-01 11:34:25 -05:00
Heikki Linnakangas be6668d6ef Increase the default for wal_sender_delay from 200ms to 1s. Now that WAL
sender is immediately woken up by transaction commit, there's no need to
wake up so aggressively.
2011-02-26 23:38:25 +02:00
Simon Riggs bca8b7f16a Hot Standby feedback for avoidance of cleanup conflicts on standby.
Standby optionally sends back information about oldestXmin of queries
which is then checked and applied to the WALSender's proc->xmin.
GetOldestXmin() is modified slightly to agree with GetSnapshotData(),
so that all backends on primary include WALSender within their snapshots.
Note this does nothing to change the snapshot xmin on either master or
standby. Feedback piggybacks on the standby reply message.
vacuum_defer_cleanup_age is no longer used on standby, though parameter
still exists on primary, since some use cases still exist.

Simon Riggs, review comments from Fujii Masao, Heikki Linnakangas, Robert Haas
2011-02-16 19:29:37 +00:00
Robert Haas 883a9659fa Assorted corrections to the patch to add WAL receiver replies.
Per reports from Fujii Masao.
2011-02-15 12:05:00 -05:00
Robert Haas 6a77e9385e Rename max_predicate_locks_per_transaction.
The new name, max_pred_locks_per_transaction, is shorter.

Kevin Grittner, per discussion.
2011-02-15 08:04:55 -05:00
Heikki Linnakangas b186523fd9 Send status updates back from standby server to master, indicating how far
the standby has written, flushed, and applied the WAL. At the moment, this
is for informational purposes only, the values are only shown in
pg_stat_replication system view, but in the future they will also be needed
for synchronous replication.

Extracted from Simon riggs' synchronous replication patch by Robert Haas, with
some tweaking by me.
2011-02-10 21:04:02 +02:00
Robert Haas 32896c40ca Avoid having autovacuum workers wait for relation locks.
Waiting for relation locks can lead to starvation - it pins down an
autovacuum worker for as long as the lock is held.  But if we're doing
an anti-wraparound vacuum, then we still wait; maintenance can no longer
be put off.

To assist with troubleshooting, if log_autovacuum_min_duration >= 0,
we log whenever an autovacuum or autoanalyze is skipped for this reason.

Per a gripe by Josh Berkus, and ensuing discussion.
2011-02-07 22:04:29 -05:00
Heikki Linnakangas dafaa3efb7 Implement genuine serializable isolation level.
Until now, our Serializable mode has in fact been what's called Snapshot
Isolation, which allows some anomalies that could not occur in any
serialized ordering of the transactions. This patch fixes that using a
method called Serializable Snapshot Isolation, based on research papers by
Michael J. Cahill (see README-SSI for full references). In Serializable
Snapshot Isolation, transactions run like they do in Snapshot Isolation,
but a predicate lock manager observes the reads and writes performed and
aborts transactions if it detects that an anomaly might occur. This method
produces some false positives, ie. it sometimes aborts transactions even
though there is no anomaly.

To track reads we implement predicate locking, see storage/lmgr/predicate.c.
Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared
memory is finite, so when a transaction takes many tuple-level locks on a
page, the locks are promoted to a single page-level lock, and further to a
single relation level lock if necessary. To lock key values with no matching
tuple, a sequential scan always takes a relation-level lock, and an index
scan acquires a page-level lock that covers the search key, whether or not
there are any matching keys at the moment.

A predicate lock doesn't conflict with any regular locks or with another
predicate locks in the normal sense. They're only used by the predicate lock
manager to detect the danger of anomalies. Only serializable transactions
participate in predicate locking, so there should be no extra overhead for
for other transactions.

Predicate locks can't be released at commit, but must be remembered until
all the transactions that overlapped with it have completed. That means that
we need to remember an unbounded amount of predicate locks, so we apply a
lossy but conservative method of tracking locks for committed transactions.
If we run short of shared memory, we overflow to a new "pg_serial" SLRU
pool.

We don't currently allow Serializable transactions in Hot Standby mode.
That would be hard, because even read-only transactions can cause anomalies
that wouldn't otherwise occur.

Serializable isolation mode now means the new fully serializable level.
Repeatable Read gives you the old Snapshot Isolation level that we have
always had.

Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and
Anssi Kääriäinen
2011-02-08 00:09:08 +02:00
Bruce Momjian ad76242633 remove tags. 2011-02-06 18:44:43 -05:00
Robert Haas 0af695fd43 Log restartpoints in the same fashion as checkpoints.
Prior to 9.0, restartpoints never created, deleted, or recycled WAL
files, but now they can.  This code makes log_checkpoints treat
checkpoints and restartpoints symmetrically.  It also adjusts up
the documentation of the parameter to mention restartpoints.

Fujii Masao.  Docs by me, as suggested by Itagaki Takahiro.
2011-02-02 21:08:53 -05:00
Bruce Momjian d56d246e70 Properly capitalize hyphenated words in documentation titles. 2011-02-01 17:00:26 -05:00
Bruce Momjian 7106f74e2a Clarify documentation to state that "zero_damaged_pages" does not force
data to disk, so the table or index should be recreated before the
parameter is turned off again.
2011-02-01 16:44:22 -05:00
Bruce Momjian 6c6e6f7fd3 Document that effective cache size does not assume data remains in the
cache between queries.
2011-02-01 15:23:35 -05:00
Itagaki Takahiro 03282bfa89 Add a link from client_encoding parameter to the list of character sets
in documentation.

Thom Brown
2011-02-01 14:26:17 +09:00
Bruce Momjian 5d5678d7c3 Properly capitalize documentation headings; some only had initial-word
capitalization.
2011-01-29 13:01:48 -05:00
Magnus Hagander 048d148fe6 Add pg_basebackup tool for streaming base backups
This tool makes it possible to do the pg_start_backup/
copy files/pg_stop_backup step in a single command.

There are still some steps to be done before this is a
complete backup solution, such as the ability to stream
the required WAL logs, but it's still usable, and
could do with some buildfarm coverage.

In passing, make the checkpoint request optionally
fast instead of hardcoding it.

Magnus Hagander, reviewed by Fujii Masao and Dimitri Fontaine
2011-01-23 12:21:23 +01:00
Tom Lane 0f73aae13d Allow the wal_buffers setting to be auto-tuned to a reasonable value.
If wal_buffers is initially set to -1 (which is now the default), it's
replaced by 1/32nd of shared_buffers, with a minimum of 8 (the old default)
and a maximum of the XLOG segment size.  The allowed range for manual
settings is still from 4 up to whatever will fit in shared memory.

Greg Smith, with implementation correction by me.
2011-01-22 20:31:24 -05:00
Robert Haas 7a32ff9732 Revert patch adding support for logging the current role.
This reverts commit a8a8867912, committed
by me earlier today (2011-01-12).  This isn't safe inside an aborted
transaction.

Noted by Tom Lane.
2011-01-12 11:59:21 -05:00
Robert Haas a8a8867912 Add support for logging the current role.
Stephen Frost, with some editorialization by me.
2011-01-12 11:34:53 -05:00
Tom Lane 576477e73c Force default wal_sync_method to be fdatasync on Linux.
Recent versions of the Linux system header files cause xlogdefs.h to
believe that open_datasync should be the default sync method, whereas
formerly fdatasync was the default on Linux.  open_datasync is a bad
choice, first because it doesn't actually outperform fdatasync (in fact
the reverse), and second because we try to use O_DIRECT with it, causing
failures on certain filesystems (e.g., ext4 with data=journal option).
This part of the patch is largely per a proposal from Marti Raudsepp.
More extensive changes are likely to follow in HEAD, but this is as much
change as we want to back-patch.

Also clean up confusing code and incorrect documentation surrounding the
fsync_writethrough option.  Those changes shouldn't result in any actual
behavioral change, but I chose to back-patch them anyway to keep the
branches looking similar in this area.

In 9.0 and HEAD, also do some copy-editing on the WAL Reliability
documentation section.

Back-patch to all supported branches, since any of them might get used
on modern Linux versions.
2010-12-08 20:01:09 -05:00
Simon Riggs e620ee35b2 Optimize commit_siblings in two ways to improve group commit.
First, avoid scanning the whole ProcArray once we know there
are at least commit_siblings active; second, skip the check
altogether if commit_siblings = 0.

Greg Smith
2010-12-08 18:48:03 +00:00
Tom Lane c623365ff9 Point out in default_tablespace's description that CREATE DATABASE ignores it.
Per gripe from Andreas Scherbaum.
2010-11-27 16:08:32 -05:00
Peter Eisentraut fc946c39ae Remove useless whitespace at end of lines 2010-11-23 22:34:55 +02:00
Robert Haas 5a12c808cf Note that effective_io_concurrency only affects bitmap heap scans.
Josh Kupershmidt
2010-10-26 21:44:14 -04:00
Bruce Momjian f75d6a1b19 Add mention of using tools/fsync to test fsync methods. Restructure
recent wal_sync_method doc paragraph to be clearer.
2010-10-19 14:56:53 +00:00
Robert Haas 694c56af2b Improve WAL reliability documentation, and add more cross-references to it.
In particular, we are now more explicit about the fact that you may need
wal_sync_method=fsync_writethrough for crash-safety on some platforms,
including MaxOS X.  There's also now an explicit caution against assuming
that the default setting of wal_sync_method is either crash-safe or best
for performance.
2010-10-07 12:22:00 -04:00
Magnus Hagander 9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Tom Lane 73b3bd5574 Document the existence of the socket lock file under unix_socket_directory,
which is perhaps not a terribly good spot for it but there doesn't seem to be
a better place.  Also add a source-code comment pointing out a couple reasons
for having a separate lock file.  Per suggestion from Greg Smith.
2010-08-26 22:00:19 +00:00
Bruce Momjian c107c35df3 Update autovacuum_freeze_max_age documentation to mention that the
default is low because of pg_clog file removal.

Backpatch to 9.0.X.
2010-08-24 13:32:25 +00:00
Tom Lane 005e427a22 Make an editorial pass over the 9.0 release notes.
This is mostly about grammar, style, and presentation, though I did find
a few small factual errors.
2010-08-23 02:43:25 +00:00
Bruce Momjian d8986332cb Document that autovacuum_freeze_max_age is used for pg_clog recycling.
We already mentioned xid wraparound.
2010-08-22 02:37:32 +00:00
Tom Lane 79dc97a401 Bring some sanity to the trace_recovery_messages code and docs.
Per gripe from Fujii Masao, though this is not exactly his proposed patch.
Categorize as DEVELOPER_OPTIONS and set context PGC_SIGHUP, as per Fujii,
but set the default to LOG because higher values aren't really sensible
(see the code for trace_recovery()).  Fix the documentation to agree with
the code and to try to explain what the variable actually does.  Get rid
of no-op calls trace_recovery(LOG), which accomplish nothing except to
demonstrate that this option confuses even its author.
2010-08-19 22:55:01 +00:00
Peter Eisentraut 5194b9d049 Spell and markup checking 2010-08-17 04:37:21 +00:00
Tom Lane e20df55cca Fix mangled grammar. 2010-08-03 19:02:21 +00:00
Peter Eisentraut 66424a2848 Fix indentation of verbatim block elements
Block elements with verbatim formatting (literallayout, programlisting,
screen, synopsis) should be aligned at column 0 independent of the surrounding
SGML, because whitespace is significant, and indenting them creates erratic
whitespace in the output.  The CSS stylesheets already take care of indenting
the output.

Assorted markup improvements to go along with it.
2010-07-29 19:34:41 +00:00
Peter Eisentraut d33cfbd2e0 Spelling fixes 2010-07-27 19:01:16 +00:00
Peter Eisentraut f581215660 Remove tab from SGML file 2010-07-24 12:16:20 +00:00
Robert Haas ce68df468a Add options to force quoting of all identifiers.
I've added a quote_all_identifiers GUC which affects the behavior
of the backend, and a --quote-all-identifiers argument to pg_dump
and pg_dumpall which sets the GUC and also affects the quoting done
internally by those applications.

Design by Tom Lane; review by Alex Hunsaker; in response to bug #5488
filed by Hartmut Goebel.
2010-07-22 01:22:35 +00:00