<!-- $PostgreSQL: pgsql/doc/src/sgml/failover.sgml,v 1.9 2006/11/16 21:45:25 momjian Exp $ -->
<chapter id="failover">
<title>Failover, Replication, Load Balancing, and Clustering Options</title>

<indexterm><primary>failover</></>
<indexterm><primary>replication</></>
<indexterm><primary>load balancing</></>
<indexterm><primary>clustering</></>

<para>
Database servers can work together to allow a backup server to
quickly take over if the primary server fails (failover), or to
allow several computers to serve the same data (load balancing).
Ideally, database servers could work together seamlessly. Web
servers serving static web pages can be combined quite easily by
merely load-balancing web requests to multiple machines. In
fact, read-only database servers can be combined relatively easily
too. Unfortunately, most database servers have a read/write mix
of requests, and read/write servers are much harder to combine.
This is because read-only data needs to be placed on each
server only once, whereas a write to any server has to be propagated
to all servers so that future read requests to those servers return
consistent results.
</para>

<para>
This synchronization problem is the fundamental difficulty for servers
working together. Because there is no single solution that eliminates
the impact of the synchronization problem for all use cases, there are
multiple solutions. Each solution addresses this problem in a different
way, and minimizes its impact for a specific workload.
</para>

<para>
Some solutions deal with synchronization by allowing only one
server to modify the data. Servers that can modify data are
called read/write or "master" servers. Servers with read-only
data are called backup or "slave" servers. As you will see below,
these terms cover a variety of implementations. Some servers
are masters of some data sets, and slaves of others. Some slaves
cannot be accessed until they are changed to master servers,
while other slaves can reply to read-only queries while they are
slaves.
</para>

<para>
Some failover and load balancing solutions are synchronous, meaning that
a data-modifying transaction is not considered committed until all
servers have committed the transaction. This guarantees that a failover
will not lose any data and that all load-balanced servers will return
consistent results with no propagation delay. Asynchronous updating has
a small delay between the time of commit and its propagation to the
other servers, opening the possibility that some transactions might be
lost in the switch to a backup server, and that load-balanced servers
might return slightly stale results. Asynchronous communication is used
when synchronous communication would be too slow.
</para>
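
<para>
The difference can be sketched with a toy model. This is only an
illustration: the <literal>Server</literal> and <literal>Cluster</literal>
classes and their methods are invented for this example and do not
correspond to any <productname>PostgreSQL</> or replication-tool interface.
It assumes one master and one backup, and shows that a failover before
propagation loses commits only in the asynchronous case.
</para>

<programlisting>
```python
class Server:
    """A toy server that stores committed transactions in a list."""

    def __init__(self, name):
        self.name = name
        self.committed = []


class Cluster:
    """One master and one backup, updated synchronously or asynchronously."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.master = Server("master")
        self.backup = Server("backup")
        self.pending = []  # commits not yet propagated to the backup

    def commit(self, txn):
        self.master.committed.append(txn)
        if self.synchronous:
            # The commit is reported only after the backup also has it.
            self.backup.committed.append(txn)
        else:
            # The commit is reported at once; propagation happens later.
            self.pending.append(txn)

    def propagate(self):
        """Ship all pending commits to the backup (asynchronous mode)."""
        self.backup.committed.extend(self.pending)
        self.pending.clear()

    def fail_over(self):
        """The master is lost; return what the backup can still serve."""
        return list(self.backup.committed)


sync_cluster = Cluster(synchronous=True)
async_cluster = Cluster(synchronous=False)
for cluster in (sync_cluster, async_cluster):
    cluster.commit("t1")
    cluster.commit("t2")

# Fail over before the asynchronous cluster has propagated anything.
sync_survivors = sync_cluster.fail_over()
async_survivors = async_cluster.fail_over()
```
</programlisting>

<para>
In this model the synchronous backup holds both transactions after the
failover, while the asynchronous backup holds only what
<literal>propagate()</literal> had already shipped.
</para>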

<para>
Solutions can also be categorized by their granularity. Some solutions
can deal only with an entire database server, while others allow control
at the per-table or per-database level.
</para>

<para>
Performance must be considered in any failover or load balancing
choice. There is usually a tradeoff between functionality and
performance. For example, a fully synchronous solution over a slow
network might cut performance by more than half, while an asynchronous
one might have a minimal performance impact.
</para>

<para>
The remainder of this chapter outlines various failover, replication,
and load balancing solutions.
</para>

<variablelist>

<varlistentry>
<term>Shared Disk Failover</term>
<listitem>

<para>
Shared disk failover avoids synchronization overhead by having only one
copy of the database. It uses a single disk array that is shared by
multiple servers. If the main database server fails, the backup server
is able to mount and start the database as though it were recovering from
a database crash. This allows rapid failover with no data loss.
</para>

<para>
Shared hardware functionality is common in network storage devices. One
significant limitation of this method is that if the shared disk array
fails or becomes corrupt, the primary and backup servers are both
nonfunctional.
</para>
</listitem>
</varlistentry>

<varlistentry>
<term>Warm Standby Using Point-In-Time Recovery</term>
<listitem>

<para>
A warm standby server (see <xref linkend="warm-standby">) can
be kept current by reading a stream of write-ahead log (WAL)
records. If the main server fails, the warm standby contains
almost all of the data of the main server, and can be quickly
made the new master database server. This is asynchronous and
can only be done for the entire database server.
</para>
</listitem>
</varlistentry>

<varlistentry>
<term>Continuously Running Replication Server</term>
<listitem>

<para>
A continuously running replication server allows the backup server to
answer read-only queries while the master server is running. It
receives a continuous stream of write activity from the master server.
Because the backup server can be used for read-only database requests,
it is ideal for data warehouse queries.
</para>

<para>
Slony-I is an example of this type of replication, with per-table
granularity. It updates the backup server in batches, so the replication
is asynchronous and might lose data during a failover.
</para>
</listitem>
</varlistentry>

<varlistentry>
<term>Data Partitioning</term>
<listitem>

<para>
Data partitioning splits tables into data sets. Each set can
be modified by only one server. For example, data can be
partitioned by offices, e.g., London and Paris. While the London
and Paris servers both have all data records, only London can modify
London records, and only Paris can modify Paris records. This
is similar to the "Continuously Running Replication Server"
item above, except that instead of having a read/write server
and a read-only server, each server has a read/write data set
and a read-only data set.
</para>

<para>
Such partitioning provides both failover and load balancing. Failover
is achieved because the data resides on both servers, and this is an
ideal way to enable failover if the servers share a slow communication
channel. Load balancing is possible because read requests can go to any
of the servers, and write requests are split among the servers. Of
course, the communication to keep all the servers up-to-date adds
overhead, so ideally the write load should be low, or localized as in
the London/Paris example above.
</para>

<para>
Data partitioning is usually handled by application code, though rules
and triggers can be used to keep the read-only data sets current. Slony-I
can also be used in such a setup. While Slony-I replicates only entire
tables, London and Paris can be placed in separate tables, and
inheritance can be used to access both tables using a single table name.
</para>
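
<para>
Since partitioning is usually handled by application code, the routing
rule for the London/Paris example might look roughly like the sketch
below. The server names <literal>london-db</literal> and
<literal>paris-db</literal> and all function names are hypothetical,
chosen only for this illustration.
</para>

<programlisting>
```python
# Hypothetical mapping from office to the only server allowed to modify it.
OWNER = {"London": "london-db", "Paris": "paris-db"}


def route_write(office):
    """Return the one server that may modify records of this office."""
    return OWNER[office]


def route_read(office, local_server):
    """Reads stay local, since every server holds all data records."""
    return local_server


def may_modify(server, office):
    """True only when the server owns the office's data set."""
    return OWNER.get(office) == server


# An application running next to the Paris server:
read_target = route_read("London", local_server="paris-db")
write_target = route_write("London")
```
</programlisting>

<para>
Here a read of London records is answered by the local Paris server,
while a write to London records must be sent to the London server.
</para>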
</listitem>
</varlistentry>

<varlistentry>
<term>Query Broadcast Load Balancing</term>
<listitem>

<para>
Query broadcast load balancing is accomplished by having a
program intercept every SQL query and send it to all servers.
This is unique because most replication solutions have the write
server propagate its changes to the other servers. With query
broadcasting, each server operates independently. Read-only
queries can be sent to a single server because there is no need
for all servers to process them.
</para>

<para>
One limitation of this solution is that functions like
<function>random()</>, <function>CURRENT_TIMESTAMP</>, and
sequences can have different values on different servers. This
is because each server operates independently, and because SQL
queries are broadcast (and not actual modified rows). If this
is unacceptable, applications must query such values from a
single server and then use those values in write queries.
Also, care must be taken that all transactions either commit
or abort on all servers, perhaps using two-phase commit (<xref
linkend="sql-prepare-transaction"
endterm="sql-prepare-transaction-title"> and <xref
linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">).
Pgpool is an example of this type of replication.
</para>
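
<para>
The <function>random()</> limitation can be illustrated with a small
simulation. This is a sketch under stated assumptions, not how Pgpool is
implemented: the <literal>Server</literal> class and the
<literal>broadcast</literal> helper are invented for this example, and
each toy server has its own independently seeded random number generator.
</para>

<programlisting>
```python
import random


class Server:
    """A toy server with its own random number generator and one table."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.rows = []

    def insert_random(self):
        # Like inserting random(): evaluated separately on each server.
        self.rows.append(self.rng.random())

    def insert_value(self, value):
        # Like inserting a literal value supplied by the application.
        self.rows.append(value)


def broadcast(servers, statement, *args):
    """Send the same statement to every server, as the middleware would."""
    for server in servers:
        getattr(server, statement)(*args)


servers = [Server(seed=1), Server(seed=2)]

# Broadcasting the function call lets each server compute its own value.
broadcast(servers, "insert_random")
diverged = servers[0].rows[-1] != servers[1].rows[-1]

# Querying the value from a single server first keeps the copies identical.
value = servers[0].rng.random()
broadcast(servers, "insert_value", value)
consistent = servers[0].rows[-1] == servers[1].rows[-1]
```
</programlisting>

<para>
The second pattern corresponds to the advice above: obtain the value
from one server, then use it as a constant in the broadcast write query.
</para>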
</listitem>
</varlistentry>

<varlistentry>
<term>Clustering For Load Balancing</term>
<listitem>

<para>
In clustering, each server can accept write requests, and modified
data is transmitted from the original server to every other
server before each transaction commits. Heavy write activity
can cause excessive locking, leading to poor performance. In
fact, write performance is often worse than that of a single
server. Read requests can be sent to any server. Clustering
is best for mostly-read workloads, though its big advantage is
that any server can accept write requests; there is no need
to partition workloads between read/write and read-only servers.
</para>

<para>
Clustering is implemented by <productname>Oracle</> in their
<productname><acronym>RAC</></> product. <productname>PostgreSQL</>
does not offer this type of load balancing, though
<productname>PostgreSQL</> two-phase commit (<xref
linkend="sql-prepare-transaction"
endterm="sql-prepare-transaction-title"> and <xref
linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">)
can be used to implement this in application code or middleware.
</para>
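
<para>
The control flow that such application code or middleware would follow
can be sketched as below. This is a minimal in-memory model, not a
working client: the <literal>Participant</literal> class is hypothetical,
and its methods merely stand in for issuing <command>PREPARE
TRANSACTION</>, <command>COMMIT PREPARED</>, and <command>ROLLBACK
PREPARED</> on each server.
</para>

<programlisting>
```python
class Participant:
    """A toy server that can prepare, commit, or roll back a transaction."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "idle"

    def prepare(self):
        # Stands in for PREPARE TRANSACTION: a promise that commit will work.
        if self.can_prepare:
            self.state = "prepared"
        return self.can_prepare

    def commit(self):
        # Stands in for COMMIT PREPARED.
        self.state = "committed"

    def rollback(self):
        # Stands in for ROLLBACK PREPARED (or a plain rollback).
        self.state = "aborted"


def two_phase_commit(participants):
    """Commit on all participants, or on none of them."""
    # Phase one: ask each server to prepare; stop at the first refusal.
    if all(p.prepare() for p in participants):
        # Phase two: every server promised, so commit everywhere.
        for p in participants:
            p.commit()
        return True
    # At least one server refused, so undo the transaction everywhere.
    for p in participants:
        p.rollback()
    return False
```
</programlisting>

<para>
A real implementation would also have to handle a coordinator that
fails between the two phases, which this sketch ignores.
</para>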
</listitem>
</varlistentry>

<varlistentry>
<term>Clustering For Parallel Query Execution</term>
<listitem>

<para>
This allows multiple servers to work concurrently on a single
query. One possible way this could work is for the data to be
split among servers, for each server to execute its part of
the query, and for the results to be sent to a central server
to be combined and returned to the user. There currently is no
<productname>PostgreSQL</> open source solution for this.
</para>
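
<para>
The split-and-combine idea can be sketched for a simple average. This is
purely illustrative: the function names and the sample data are made up
for this example, and the three lists stand in for three servers that
each hold part of one table.
</para>

<programlisting>
```python
def partial_aggregate(rows):
    """Run on each server: summarize only the rows that server holds."""
    return (sum(rows), len(rows))


def combine(partials):
    """Run on the central server: merge the partial results."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count if count else None


# Hypothetical split of one column's values across three servers.
shards = [[10, 20], [30], [40, 50, 60]]
average = combine([partial_aggregate(rows) for rows in shards])
```
</programlisting>

<para>
Each server returns a sum and a row count rather than its own average,
because averages of unequal-sized parts cannot simply be averaged again.
</para>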
</listitem>
</varlistentry>

<varlistentry>
<term>Commercial Solutions</term>
<listitem>

<para>
Because <productname>PostgreSQL</> is open source and easily
extended, a number of companies have taken <productname>PostgreSQL</>
and created commercial closed-source solutions with unique
failover, replication, and load balancing capabilities.
</para>
</listitem>
</varlistentry>

</variablelist>
</chapter>