Enhance documentation of the built-in standby mode, explaining the retry
loop in standby mode that tries to restore from the archive, from pg_xlog, and via streaming. Move sections around to make the high availability chapter more coherent: the most prominent part is now a "Log-Shipping Standby Servers" section that describes what a standby server is (like the old "Warm Standby Servers for High Availability" section) and how to set up a warm standby server, including streaming replication, using the built-in standby mode. The pg_standby method is described in another section called "Alternative method for log shipping", with the added caveat that it doesn't work with streaming replication.
parent 55a01b4c0a
commit 991bfe11d2
doc/src/sgml/high-availability.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/high-availability.sgml,v 1.54 2010/03/19 19:31:06 sriggs Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/high-availability.sgml,v 1.55 2010/03/31 19:13:01 heikki Exp $ -->
 
 <chapter id="high-availability">
  <title>High Availability, Load Balancing, and Replication</title>
@@ -455,32 +455,10 @@ protocol to make nodes agree on a serializable transactional order.
 
 </sect1>
 
 
 <sect1 id="warm-standby">
-  <title>File-based Log Shipping</title>
+  <title>Log-Shipping Standby Servers</title>
 
-   <indexterm zone="high-availability">
-    <primary>warm standby</primary>
-   </indexterm>
-
-   <indexterm zone="high-availability">
-    <primary>PITR standby</primary>
-   </indexterm>
-
-   <indexterm zone="high-availability">
-    <primary>standby server</primary>
-   </indexterm>
-
-   <indexterm zone="high-availability">
-    <primary>log shipping</primary>
-   </indexterm>
-
-   <indexterm zone="high-availability">
-    <primary>witness server</primary>
-   </indexterm>
-
-   <indexterm zone="high-availability">
-    <primary>STONITH</primary>
-   </indexterm>
-
  <para>
   Continuous archiving can be used to create a <firstterm>high
@@ -510,8 +488,8 @@ protocol to make nodes agree on a serializable transactional order.
   adjacent system, another system at the same site, or another system on
   the far side of the globe. The bandwidth required for this technique
   varies according to the transaction rate of the primary server.
-  Record-based log shipping is also possible with custom-developed
-  procedures, as discussed in <xref linkend="warm-standby-record">.
+  Record-based log shipping is also possible with streaming replication
+  (see <xref linkend="streaming-replication">).
  </para>
 
  <para>
@@ -519,26 +497,52 @@ protocol to make nodes agree on a serializable transactional order.
   records are shipped after transaction commit. As a result, there is a
   window for data loss should the primary server suffer a catastrophic
   failure; transactions not yet shipped will be lost. The size of the
-  data loss window can be limited by use of the
+  data loss window in file-based log shipping can be limited by use of the
   <varname>archive_timeout</varname> parameter, which can be set as low
   as a few seconds. However such a low setting will
   substantially increase the bandwidth required for file shipping.
   If you need a window of less than a minute or so, consider using
-  <xref linkend="streaming-replication">.
+  streaming replication (see <xref linkend="streaming-replication">).
  </para>
 
  <para>
-  The standby server is not available for access, since it is continually
-  performing recovery processing. Recovery performance is sufficiently
-  good that the standby will typically be only moments away from full
+  Recovery performance is sufficiently good that the standby will
+  typically be only moments away from full
   availability once it has been activated. As a result, this is called
   a warm standby configuration which offers high
   availability. Restoring a server from an archived base backup and
   rollforward will take considerably longer, so that technique only
   offers a solution for disaster recovery, not high availability.
+  A standby server can also be used for read-only queries, in which case
+  it is called a Hot Standby server. See <xref linkend="hot-standby"> for
+  more information.
  </para>
 
- <sect2 id="warm-standby-planning">
+  <indexterm zone="high-availability">
+   <primary>warm standby</primary>
+  </indexterm>
+
+  <indexterm zone="high-availability">
+   <primary>PITR standby</primary>
+  </indexterm>
+
+  <indexterm zone="high-availability">
+   <primary>standby server</primary>
+  </indexterm>
+
+  <indexterm zone="high-availability">
+   <primary>log shipping</primary>
+  </indexterm>
+
+  <indexterm zone="high-availability">
+   <primary>witness server</primary>
+  </indexterm>
+
+  <indexterm zone="high-availability">
+   <primary>STONITH</primary>
+  </indexterm>
+
+ <sect2 id="standby-planning">
  <title>Planning</title>
 
  <para>
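
To make the data-loss window concrete: a sketch of the relevant primary-side setting, with an illustrative value that is not part of this patch, would be:

    # postgresql.conf on the primary -- cap the file-shipping window at ~1 minute
    archive_timeout = 60        # force a WAL segment switch every 60 seconds
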
@@ -573,9 +577,325 @@ protocol to make nodes agree on a serializable transactional order.
   versa.
  </para>
 
+ </sect2>
+
+ <sect2 id="standby-server-operation">
+  <title>Standby Server Operation</title>
+
  <para>
-  There is no special mode required to enable a standby server. The
-  operations that occur on both primary and standby servers are
+   In standby mode, the server continuously applies WAL received from the
+   master server. The standby server can read WAL from a WAL archive
+   (see <varname>restore_command</>) or directly from the master
+   over a TCP connection (streaming replication). The standby server will
+   also attempt to restore any WAL found in the standby cluster's
+   <filename>pg_xlog</> directory. That typically happens after a server
+   restart, when the standby replays WAL that was streamed from the
+   master before the restart, but you can also manually copy files to
+   <filename>pg_xlog</> at any time to have them replayed.
+  </para>
+
+  <para>
+   At startup, the standby begins by restoring all WAL available in the
+   archive location, calling <varname>restore_command</>. Once it
+   reaches the end of WAL available there and <varname>restore_command</>
+   fails, it tries to restore any WAL available in the pg_xlog directory.
+   If that fails, or streaming replication is not configured, or if the
+   connection is later disconnected, the standby goes back to step 1 and
+   tries to restore the file from the archive again. If streaming
+   replication has been configured, the standby tries to connect to the
+   primary server and start streaming WAL from the last valid record found
+   in archive or pg_xlog. This loop of retries from the
+   archive, pg_xlog, and via streaming replication goes on until the server
+   is stopped or failover is triggered by a trigger file.
+  </para>
+
+  <para>
+   Standby mode is exited and the server switches to normal operation
+   when a trigger file is found (<varname>trigger_file</>). Before failover,
+   the server will restore any WAL available in the archive or in pg_xlog,
+   but won't try to connect to the master or wait for files to become
+   available in the archive.
+  </para>
+ </sect2>
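
Taken together, a minimal recovery.conf for this built-in standby mode might look like the sketch below; the paths and connection string are illustrative, not part of the patch:

    # recovery.conf on the standby (illustrative)
    standby_mode = 'on'
    # must fail (return nonzero) when the requested file is absent
    restore_command = 'cp /mnt/archive/%f "%p"'
    # optional: stream WAL directly from the master as well
    primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
    # creating this file triggers failover
    trigger_file = '/var/lib/pgsql/failover.trigger'
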
+
+ <sect2 id="preparing-master-for-standby">
+  <title>Preparing Master for Standby Servers</title>
+
+  <para>
+   Set up continuous archiving to a WAL archive on the master, as described
+   in <xref linkend="continuous-archiving">. The archive location should be
+   accessible from the standby even when the master is down, i.e., it should
+   reside on the standby server itself or on another trusted server, not on
+   the master server.
+  </para>
+
+  <para>
+   If you want to use streaming replication, set up authentication to allow
+   streaming replication connections and set <varname>max_wal_senders</> in
+   the configuration file of the primary server.
+  </para>
+
+  <para>
+   Take a base backup as described in <xref linkend="backup-base-backup">
+   to bootstrap the standby server.
+  </para>
+ </sect2>
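
As a sketch, the master-side preparation described above might translate to settings like these; the hostname, archive path, and sender count are assumptions for illustration:

    # postgresql.conf on the master (illustrative)
    archive_mode = on
    # archive to a location that survives a master failure
    archive_command = 'scp %p standby.example.com:/mnt/archive/%f'
    max_wal_senders = 3     # only needed for streaming replication
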
+
+ <sect2 id="standby-server-setup">
+  <title>Setting up the standby server</title>
+
+  <para>
+   To set up the standby server, restore the base backup taken from the
+   primary server (see <xref linkend="backup-pitr-recovery">). In the
+   recovery command file <filename>recovery.conf</> in the standby's cluster
+   data directory, turn on <varname>standby_mode</>. Set
+   <varname>restore_command</> to a simple command to copy files from the
+   WAL archive. If you want to use streaming replication, set
+   <varname>primary_conninfo</>.
+  </para>
+
+  <note>
+   <para>
+    Do not use pg_standby or similar tools with the built-in standby mode
+    described here. <varname>restore_command</> should return immediately
+    if the file does not exist; the server will retry the command
+    if necessary. See <xref linkend="log-shipping-alternative">
+    for using tools like pg_standby.
+   </para>
+  </note>
+
+  <para>
+   You can use <varname>restartpoint_command</> to prune the archive of
+   files no longer needed by the standby.
+  </para>
+
+  <para>
+   If you're setting up the standby server for high availability purposes,
+   set up WAL archiving, connections and authentication like the primary
+   server, because the standby server will work as a primary server after
+   failover. If you're setting up the standby server for reporting
+   purposes, with no plans to fail over to it, configure the standby
+   accordingly.
+  </para>
+
+  <para>
+   You can have any number of standby servers, but if you use streaming
+   replication, make sure you set <varname>max_wal_senders</> high enough in
+   the primary to allow them to be connected simultaneously.
+  </para>
+ </sect2>
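
For pruning the archive, one plausible setup (assuming restartpoint_command accepts the %r substitution for the oldest file that must be kept, and that the contrib pg_archivecleanup program is installed; both are assumptions, not part of the patch) would be:

    # recovery.conf on the standby (illustrative)
    restartpoint_command = 'pg_archivecleanup /mnt/archive %r'
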
+
+ <sect2 id="streaming-replication">
+  <title>Streaming Replication</title>
+
+  <indexterm zone="high-availability">
+   <primary>Streaming Replication</primary>
+  </indexterm>
+
+  <para>
+   Streaming replication allows a standby server to stay more up-to-date
+   than is possible with file-based log shipping. The standby connects
+   to the primary, which streams WAL records to the standby as they're
+   generated, without waiting for the WAL file to be filled.
+  </para>
+
+  <para>
+   Streaming replication is asynchronous, so there is still a small delay
+   between committing a transaction in the primary and the changes becoming
+   visible in the standby. The delay is however much smaller than with
+   file-based log shipping, typically under one second assuming the standby
+   is powerful enough to keep up with the load. With streaming replication,
+   <varname>archive_timeout</> is not required to reduce the data loss
+   window.
+  </para>
+
+  <para>
+   Streaming replication relies on file-based continuous archiving for
+   making the base backup and for allowing the standby to catch up if it is
+   disconnected from the primary for long enough for the primary to
+   delete old WAL files still required by the standby.
+  </para>
+
+  <para>
+   To use streaming replication, set up a file-based log-shipping standby
+   server as described in <xref linkend="warm-standby">. The step that
+   turns a file-based log-shipping standby into a streaming replication
+   standby is setting <varname>primary_conninfo</> in the
+   <filename>recovery.conf</> file to point to the primary server. Set
+   <xref linkend="guc-listen-addresses"> and authentication options
+   (see <filename>pg_hba.conf</>) on the primary so that the standby server
+   can connect to the <literal>replication</> pseudo-database on the primary
+   server (see <xref linkend="streaming-replication-authentication">).
+  </para>
+
+  <para>
+   On systems that support the keepalive socket option, setting
+   <xref linkend="guc-tcp-keepalives-idle">,
+   <xref linkend="guc-tcp-keepalives-interval"> and
+   <xref linkend="guc-tcp-keepalives-count"> helps the master promptly
+   notice a broken connection.
+  </para>
+
+  <para>
+   Set the maximum number of concurrent connections from the standby servers
+   (see <xref linkend="guc-max-wal-senders"> for details).
+  </para>
+
+  <para>
+   When the standby is started and <varname>primary_conninfo</> is set
+   correctly, the standby will connect to the primary after replaying all
+   WAL files available in the archive. If the connection is established
+   successfully, you will see a walreceiver process in the standby, and
+   a corresponding walsender process in the primary.
+  </para>
+
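
A sketch of the primary-side settings named in the paragraphs above, with illustrative values (not part of the patch):

    # postgresql.conf on the primary (illustrative values)
    listen_addresses = '*'
    max_wal_senders = 3
    tcp_keepalives_idle = 60      # seconds of idle before keepalives start
    tcp_keepalives_interval = 10  # seconds between keepalives
    tcp_keepalives_count = 5      # unanswered keepalives before the link is dropped
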
+  <sect3 id="streaming-replication-authentication">
+   <title>Authentication</title>
+   <para>
+    It is very important that the access privileges for replication be set up
+    properly so that only trusted users can read the WAL stream, because it is
+    easy to extract privileged information from it.
+   </para>
+   <para>
+    Only a superuser is allowed to connect to the primary as the replication
+    standby, so a role with the <literal>SUPERUSER</> and <literal>LOGIN</>
+    privileges needs to be created in the primary.
+   </para>
+   <para>
+    Client authentication for replication is controlled by the
+    <filename>pg_hba.conf</> record specifying <literal>replication</> in the
+    <replaceable>database</> field. For example, if the standby is running on
+    host IP <literal>192.168.1.100</> and the superuser's name for replication
+    is <literal>foo</>, the administrator can add the following line to the
+    <filename>pg_hba.conf</> file on the primary:
+
+<programlisting>
+# Allow the user "foo" from host 192.168.1.100 to connect to the primary
+# as a replication standby if the user's password is correctly supplied.
+#
+# TYPE  DATABASE        USER            CIDR-ADDRESS            METHOD
+host    replication     foo             192.168.1.100/32        md5
+</programlisting>
+   </para>
+   <para>
+    The host name and port number of the primary, the connection user name,
+    and the password are specified in the <filename>recovery.conf</> file or
+    the corresponding environment variable on the standby.
+    For example, if the primary is running on host IP <literal>192.168.1.50</>,
+    port <literal>5432</literal>, the superuser's name for replication is
+    <literal>foo</>, and the password is <literal>foopass</>, the administrator
+    can add the following line to the <filename>recovery.conf</> file on the
+    standby:
+
+<programlisting>
+# The standby connects to the primary that is running on host 192.168.1.50
+# and port 5432 as the user "foo" whose password is "foopass".
+primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
+</programlisting>
+
+    You do not need to specify <literal>database=replication</> in
+    <varname>primary_conninfo</varname>; the required option will be added
+    automatically. If you mention the database parameter at all within
+    <varname>primary_conninfo</varname>, a FATAL error will be raised.
+   </para>
+  </sect3>
+ </sect2>
+ </sect1>
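
If storing the password in recovery.conf is undesirable, the walreceiver connection goes through libpq, so an entry in the postgres user's ~/.pgpass file on the standby should also work (an assumption about the environment, not covered by this patch):

    # ~/.pgpass on the standby (illustrative)
    192.168.1.50:5432:replication:foo:foopass
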
+
+ <sect1 id="warm-standby-failover">
+  <title>Failover</title>
+
+  <para>
+   If the primary server fails then the standby server should begin
+   failover procedures.
+  </para>
+
+  <para>
+   If the standby server fails then no failover need take place. If the
+   standby server can be restarted, even some time later, then the recovery
+   process can also be restarted immediately, taking advantage of
+   restartable recovery. If the standby server cannot be restarted, then a
+   full new standby server instance should be created.
+  </para>
+
+  <para>
+   If the primary server fails and the standby server becomes the
+   new primary, and then the old primary restarts, you must have
+   a mechanism for informing the old primary that it is no longer the primary. This is
+   sometimes known as <acronym>STONITH</> (Shoot The Other Node In The Head), which is
+   necessary to avoid situations where both systems think they are the
+   primary, which will lead to confusion and ultimately data loss.
+  </para>
+
+  <para>
+   Many failover systems use just two systems, the primary and the standby,
+   connected by some kind of heartbeat mechanism to continually verify the
+   connectivity between the two and the viability of the primary. It is
+   also possible to use a third system (called a witness server) to prevent
+   some cases of inappropriate failover, but the additional complexity
+   might not be worthwhile unless it is set up with sufficient care and
+   rigorous testing.
+  </para>
+
+  <para>
+   Once failover to the standby occurs, there is only a
+   single server in operation. This is known as a degenerate state.
+   The former standby is now the primary, but the former primary is down
+   and might stay down. To return to normal operation, a standby server
+   must be recreated,
+   either on the former primary system when it comes up, or on a third,
+   possibly new, system. Once complete the primary and standby can be
+   considered to have switched roles. Some people choose to use a third
+   server to provide backup for the new primary until the new standby
+   server is recreated,
+   though clearly this complicates the system configuration and
+   operational processes.
+  </para>
+
+  <para>
+   So, switching from primary to standby server can be fast but requires
+   some time to re-prepare the failover cluster. Regular switching from
+   primary to standby is useful, since it allows regular downtime on
+   each system for maintenance. This also serves as a test of the
+   failover mechanism to ensure that it will really work when you need it.
+   Written administration procedures are advised.
+  </para>
+
+  <para>
+   To trigger failover of a log-shipping standby server, create a trigger
+   file with the filename and path specified by the <varname>trigger_file</>
+   setting in <filename>recovery.conf</>. If <varname>trigger_file</> is
+   not given, there is no way to exit recovery in the standby and promote
+   it to a master. That can be useful for, e.g., reporting servers that are
+   only used to offload read-only queries from the primary, not for high
+   availability purposes.
+  </para>
+ </sect1>
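
With the trigger_file setting sketched earlier, failover is then requested with nothing more than (path illustrative):

    $ touch /var/lib/pgsql/failover.trigger
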
+
+ <sect1 id="log-shipping-alternative">
+  <title>Alternative method for log shipping</title>
+
+  <para>
+   An alternative to the built-in standby mode described in the previous
+   sections is to use a <varname>restore_command</> that polls the archive
+   location. This was the only option available in versions 8.4 and below.
+   In this setup, set <varname>standby_mode</> off, because you are
+   implementing the polling required for standby operation yourself. See
+   contrib/pg_standby (<xref linkend="pgstandby">) for a reference
+   implementation of this.
+  </para>
+
+  <para>
+   Note that in this mode, the server will apply WAL one file at a
+   time, so if you use the standby server for queries (see Hot Standby),
+   there is a bigger delay between an action in the master and when the
+   action becomes visible in the standby, corresponding to the time it takes
+   to fill up the WAL file. <varname>archive_timeout</> can be used to make
+   that delay shorter. Also note that you can't combine streaming
+   replication with this method.
+  </para>
+
+  <para>
+   The operations that occur on both primary and standby servers are
   normal continuous archiving and recovery tasks. The only point of
   contact between the two database servers is the archive of WAL files
   that both share: primary writing to the archive, standby reading from
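
For reference, a recovery.conf for this alternative, pg_standby-based method might look like the following sketch; the archive path, log file, and trigger path are illustrative:

    # recovery.conf on the standby (pg_standby method; standby_mode stays off)
    restore_command = 'pg_standby -t /var/lib/pgsql/failover.trigger /mnt/archive %f %p %r 2>> standby.log'
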
@@ -639,7 +959,7 @@ if (!triggered)
   and design. One potential option is the <varname>restore_command</>
   command. It is executed once for each WAL file, but the process
   running the <varname>restore_command</> is created and dies for
   each file, so there is no daemon or server process, and
   signals or a signal handler cannot be used. Therefore, the
   <varname>restore_command</> is not suitable to trigger failover.
   It is possible to use a simple timeout facility, especially if
@@ -658,7 +978,6 @@ if (!triggered)
   files are no longer required, assuming the archive is writable from the
   standby server.
  </para>
- </sect2>
 
 <sect2 id="warm-standby-config">
  <title>Implementation</title>
@@ -754,243 +1073,6 @@
  </sect2>
 </sect1>
 
- <sect1 id="streaming-replication">
-  <title>Streaming Replication</title>
-
-  <indexterm zone="high-availability">
-   <primary>Streaming Replication</primary>
-  </indexterm>
-
-  <para>
-   Streaming replication allows a standby server to stay more up-to-date
-   than is possible with file-based log shipping. The standby connects
-   to the primary, which streams WAL records to the standby as they're
-   generated, without waiting for the WAL file to be filled.
-  </para>
-
-  <para>
-   Streaming replication is asynchronous, so there is still a small delay
-   between committing a transaction in the primary and for the changes to
-   become visible in the standby. The delay is however much smaller than with
-   file-based log shipping, typically under one second assuming the standby
-   is powerful enough to keep up with the load. With streaming replication,
-   <varname>archive_timeout</> is not required to reduce the data loss
-   window.
-  </para>
-
-  <para>
-   Streaming replication relies on file-based continuous archiving for
-   making the base backup and for allowing the standby to catch up if it is
-   disconnected from the primary for long enough for the primary to
-   delete old WAL files still required by the standby.
-  </para>
-
-  <sect2 id="streaming-replication-setup">
-   <title>Setup</title>
-   <para>
-    The short procedure for configuring streaming replication is as follows.
-    For full details of each step, refer to other sections as noted.
-
-    <orderedlist>
-     <listitem>
-      <para>
-       Set up primary and standby systems as near identically as possible,
-       including two identical copies of <productname>PostgreSQL</> at the
-       same release level.
-      </para>
-     </listitem>
-     <listitem>
-      <para>
-       Set up continuous archiving from the primary to a WAL archive located
-       in a directory on the standby server. In particular, set
-       <xref linkend="guc-archive-mode"> and
-       <xref linkend="guc-archive-command">
-       to archive WAL files in a location accessible from the standby
-       (see <xref linkend="backup-archiving-wal">).
-      </para>
-     </listitem>
-
-     <listitem>
-      <para>
-       Set <xref linkend="guc-listen-addresses"> and authentication options
-       (see <filename>pg_hba.conf</>) on the primary so that the standby server can connect to
-       the <literal>replication</> pseudo-database on the primary server (see
-       <xref linkend="streaming-replication-authentication">).
-      </para>
-      <para>
-       On systems that support the keepalive socket option, setting
-       <xref linkend="guc-tcp-keepalives-idle">,
-       <xref linkend="guc-tcp-keepalives-interval"> and
-       <xref linkend="guc-tcp-keepalives-count"> helps the master promptly
-       notice a broken connection.
-      </para>
-     </listitem>
-     <listitem>
-      <para>
-       Set the maximum number of concurrent connections from the standby servers
-       (see <xref linkend="guc-max-wal-senders"> for details).
-      </para>
-     </listitem>
-     <listitem>
-      <para>
-       Start the <productname>PostgreSQL</> server on the primary.
-      </para>
-     </listitem>
-     <listitem>
-      <para>
-       Make a base backup of the primary server (see
-       <xref linkend="backup-base-backup">), and load this data onto the
-       standby. Note that all files present in <filename>pg_xlog</>
-       and <filename>pg_xlog/archive_status</> on the <emphasis>standby</>
-       server should be removed because they might be obsolete.
-      </para>
-     </listitem>
-     <listitem>
-      <para>
-       If you're setting up the standby server for high availability purposes,
-       set up WAL archiving, connections and authentication like the primary
-       server, because the standby server will work as a primary server after
-       failover. If you're setting up the standby server for reporting
-       purposes, with no plans to fail over to it, configure the standby
-       accordingly.
-      </para>
-     </listitem>
-     <listitem>
-      <para>
-       Create a recovery command file <filename>recovery.conf</> in the data
-       directory on the standby server. Set <varname>restore_command</varname>
-       as you would in normal recovery from a continuous archiving backup
-       (see <xref linkend="backup-pitr-recovery">). <literal>pg_standby</> or
-       similar tools that wait for the next WAL file to arrive cannot be used
-       with streaming replication, as the server handles retries and waiting
-       itself. Enable <varname>standby_mode</varname>. Set
-       <varname>primary_conninfo</varname> to point to the primary server.
-      </para>
-
-     </listitem>
-     <listitem>
-      <para>
-       Start the <productname>PostgreSQL</> server on the standby. The standby
-       server will go into recovery mode and proceed to receive WAL records
-       from the primary and apply them continuously.
-      </para>
-     </listitem>
-    </orderedlist>
-   </para>
-  </sect2>
-
-  <sect2 id="streaming-replication-authentication">
-   <title>Authentication</title>
-   <para>
-    It is very important that the access privilege for replication be setup
-    properly so that only trusted users can read the WAL stream, because it is
-    easy to extract privileged information from it.
-   </para>
-   <para>
-    Only the superuser is allowed to connect to the primary as the replication
-    standby. So a role with the <literal>SUPERUSER</> and <literal>LOGIN</>
-    privileges needs to be created in the primary.
-   </para>
-   <para>
-    Client authentication for replication is controlled by the
-    <filename>pg_hba.conf</> record specifying <literal>replication</> in the
-    <replaceable>database</> field. For example, if the standby is running on
-    host IP <literal>192.168.1.100</> and the superuser's name for replication
-    is <literal>foo</>, the administrator can add the following line to the
-    <filename>pg_hba.conf</> file on the primary.
-
-<programlisting>
-# Allow the user "foo" from host 192.168.1.100 to connect to the primary
-# as a replication standby if the user's password is correctly supplied.
-#
-# TYPE  DATABASE        USER            CIDR-ADDRESS            METHOD
-host    replication     foo             192.168.1.100/32        md5
-</programlisting>
-   </para>
-   <para>
-    The host name and port number of the primary, connection user name,
-    and password are specified in the <filename>recovery.conf</> file or
-    the corresponding environment variable on the standby.
-    For example, if the primary is running on host IP <literal>192.168.1.50</>,
-    port <literal>5432</literal>, the superuser's name for replication is
-    <literal>foo</>, and the password is <literal>foopass</>, the administrator
-    can add the following line to the <filename>recovery.conf</> file on the
-    standby.
-
-<programlisting>
-# The standby connects to the primary that is running on host 192.168.1.50
-# and port 5432 as the user "foo" whose password is "foopass".
-primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
-</programlisting>
-
-    You do not need to specify <literal>database=replication</> in the
-    <varname>primary_conninfo</varname>. The required option will be added
-    automatically. If you mention the database parameter at all within
-    <varname>primary_conninfo</varname> then a FATAL error will be raised.
-   </para>
-  </sect2>
- </sect1>
-
- <sect1 id="warm-standby-failover">
-  <title>Failover</title>
-
-  <para>
-   If the primary server fails then the standby server should begin
-   failover procedures.
-  </para>
-
-  <para>
-   If the standby server fails then no failover need take place. If the
-   standby server can be restarted, even some time later, then the recovery
-   process can also be restarted immediately, taking advantage of
-   restartable recovery. If the standby server cannot be restarted, then a
-   full new standby server instance should be created.
-  </para>
-
-  <para>
-   If the primary server fails and the standby server becomes the
-   new primary, and then the old primary restarts, you must have
-   a mechanism for informing the old primary that it is no longer the primary. This is
-   sometimes known as <acronym>STONITH</> (Shoot The Other Node In The Head), which is
-   necessary to avoid situations where both systems think they are the
-   primary, which will lead to confusion and ultimately data loss.
-  </para>
-
-  <para>
-   Many failover systems use just two systems, the primary and the standby,
-   connected by some kind of heartbeat mechanism to continually verify the
-   connectivity between the two and the viability of the primary. It is
-   also possible to use a third system (called a witness server) to prevent
-   some cases of inappropriate failover, but the additional complexity
-   might not be worthwhile unless it is set up with sufficient care and
-   rigorous testing.
-  </para>
-
-  <para>
-   Once failover to the standby occurs, there is only a
-   single server in operation. This is known as a degenerate state.
-   The former standby is now the primary, but the former primary is down
-   and might stay down. To return to normal operation, a standby server
-   must be recreated,
-   either on the former primary system when it comes up, or on a third,
-   possibly new, system. Once complete the primary and standby can be
-   considered to have switched roles. Some people choose to use a third
-   server to provide backup for the new primary until the new standby
-   server is recreated,
-   though clearly this complicates the system configuration and
-   operational processes.
-  </para>
-
-  <para>
-   So, switching from primary to standby server can be fast but requires
-   some time to re-prepare the failover cluster. Regular switching from
-   primary to standby is useful, since it allows regular downtime on
-   each system for maintenance. This also serves as a test of the
-   failover mechanism to ensure that it will really work when you need it.
-   Written administration procedures are advised.
-  </para>
- </sect1>
-
 <sect1 id="hot-standby">
  <title>Hot Standby</title>