<!-- $PostgreSQL: pgsql/doc/src/sgml/failover.sgml,v 1.11 2006/11/17 08:46:53 meskes Exp $ -->
<chapter id="failover">
<title>Failover, Replication, Load Balancing, and Clustering Options</title>
<indexterm><primary>failover</></>
<indexterm><primary>replication</></>
<indexterm><primary>load balancing</></>
<indexterm><primary>clustering</></>
<para>
Database servers can work together to allow a second server to
quickly take over if the primary server fails (failover), or to
allow several computers to serve the same data (load balancing).
Ideally, database servers could work together seamlessly. Web
servers serving static web pages can be combined quite easily by
merely load-balancing web requests to multiple machines. In
fact, read-only database servers can be combined relatively easily
too. Unfortunately, most database servers have a read/write mix
of requests, and read/write servers are much harder to combine.
Read-only data needs to be placed on each server only once, but
a write to any server must be propagated to all servers so that
future read requests to those servers return consistent results.
</para>
<para>
This synchronization problem is the fundamental difficulty for servers
working together. Because there is no single solution that eliminates
the impact of the sync problem for all use cases, there are multiple
solutions. Each solution addresses this problem in a different way, and
minimizes its impact for a specific workload.
</para>
<para>
Some solutions deal with synchronization by allowing only one
server to modify the data. Servers that can modify data are
called read/write or "master" servers. Servers that can reply
to read-only queries are called "slave" servers. Servers that
cannot be accessed until they are changed to master servers are
called "standby" servers.
</para>
<para>
Some failover and load balancing solutions are synchronous, meaning that
a data-modifying transaction is not considered committed until all
servers have committed the transaction. This guarantees that a failover
will not lose any data and that all load-balanced servers will return
consistent results with no propagation delay. Asynchronous updating has
a small delay between the time of commit and its propagation to the
other servers, opening the possibility that some transactions might be
lost in the switch to a backup server, and that load-balanced servers
might return slightly stale results. Asynchronous communication is used
when synchronous would be too slow.
</para>
<para>
Solutions can also be categorized by their granularity. Some solutions
can deal only with an entire database server, while others allow control
at the per-table or per-database level.
</para>
<para>
Performance must be considered in any failover or load balancing
choice. There is usually a tradeoff between functionality and
performance. For example, a full synchronous solution over a slow
network might cut performance by more than half, while an asynchronous
one might have a minimal performance impact.
</para>
<para>
The remainder of this section outlines various failover, replication,
and load balancing solutions.
</para>
<variablelist>
<varlistentry>
<term>Shared Disk Failover</term>
<listitem>
<para>
Shared disk failover avoids synchronization overhead by having only one
copy of the database. It uses a single disk array that is shared by
multiple servers. If the main database server fails, the standby server
is able to mount and start the database as though it were recovering from
a database crash. This allows rapid failover with no data loss.
</para>
<para>
Shared hardware functionality is common in network storage
devices. Using a network file system is also possible, though
care must be taken that the file system has full POSIX behavior.
One significant limitation of this method is that if the shared
disk array fails or becomes corrupt, the primary and standby
servers are both nonfunctional. Another issue is that the
standby server should never access the shared storage while
the primary server is running.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Warm Standby Using Point-In-Time Recovery</term>
<listitem>
<para>
A warm standby server (see <xref linkend="warm-standby">) can
be kept current by reading a stream of write-ahead log (WAL)
records. If the main server fails, the warm standby contains
almost all of the data of the main server, and can be quickly
made the new master database server. This is asynchronous and
can only be done for the entire database server.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Master/Slave Replication</term>
<listitem>
<para>
A master/slave replication setup sends all data modification
queries to the master server. The master server asynchronously
sends data changes to the slave server. The slave can answer
read-only queries while the master server is running. The
slave server is ideal for data warehouse queries.
</para>
<para>
Slony-I is an example of this type of replication, with per-table
granularity and support for multiple slaves. Because it
updates the slave server asynchronously (in batches), data
might be lost during failover.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Data Partitioning</term>
<listitem>
<para>
Data partitioning splits tables into data sets. Each set can
be modified by only one server. For example, data can be
partitioned by office, e.g., London and Paris. While the London
and Paris servers both hold all data records, only London can modify
London records, and only Paris can modify Paris records. This
is similar to the "Master/Slave Replication" item above, except
that instead of having a read/write server and a read-only
server, each server has a read/write data set and a read-only
data set.
</para>
<para>
Such partitioning provides both failover and load balancing. Failover
is achieved because the data resides on both servers, and this is an
ideal way to enable failover if the servers share a slow communication
channel. Load balancing is possible because read requests can go to any
of the servers, and write requests are split among the servers. Of
course, the communication to keep all the servers up-to-date adds
overhead, so ideally the write load should be low, or localized as in
the London/Paris example above.
</para>
<para>
Data partitioning is usually handled by application code, though rules
and triggers can be used to keep the read-only data sets current. Slony-I
can also be used in such a setup. While Slony-I replicates only entire
tables, London and Paris can be placed in separate tables, and
inheritance can be used to access both tables using a single table name.
</para>
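<para>
The inheritance arrangement described above might look roughly like
the following. This is only a sketch: the table and column names
(<literal>orders</>, <literal>orders_london</>,
<literal>orders_paris</>) are hypothetical and chosen for illustration.
</para>
<programlisting>
-- Parent table; queries against it also see rows stored in its children.
CREATE TABLE orders (
    order_id integer,
    office   text,
    amount   numeric
);

-- One child table per office, so each office's data is a whole table
-- that can be replicated on its own.
CREATE TABLE orders_london () INHERITS (orders);
CREATE TABLE orders_paris  () INHERITS (orders);

-- Reads through the parent name return rows from both offices.
SELECT office, sum(amount) FROM orders GROUP BY office;
</programlisting>
<para>
In such a setup, each server would write only to its own child table
and treat the other child table as read-only.
</para>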
</listitem>
</varlistentry>
<varlistentry>
<term>Multi-Master Replication Using Query Broadcasting</term>
<listitem>
<para>
One way to do multi-master replication is by having a program
intercept every SQL query and send it to all servers. Each
server operates independently. Read-only queries can be sent
to a single server because there is no need for all servers to
process them.
</para>
<para>
One limitation of this solution is that functions like
<function>random()</>, <function>CURRENT_TIMESTAMP</>, and
sequences can have different values on different servers. This
is because each server operates independently, and because SQL
queries are broadcast (rather than the actual modified rows). If this
is unacceptable, applications must query such values from a
single server and then use those values in write queries.
Also, care must be taken that all transactions either commit
or abort on all servers, perhaps using two-phase commit (<xref
linkend="sql-prepare-transaction"
endterm="sql-prepare-transaction-title"> and <xref
linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">).
Pgpool is an example of this type of replication.
</para>
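<para>
As a rough sketch of the two precautions described above, broadcasting
middleware could first obtain a server-dependent value from a single
server and then send statements like the following to every server.
The sequence, table, and transaction identifier names are hypothetical,
and the example assumes each server permits prepared transactions
(<varname>max_prepared_transactions</> greater than zero).
</para>
<programlisting>
-- On one server only: obtain the value that must be identical everywhere.
SELECT nextval('orders_order_id_seq');

-- On every server: use the fetched value as a literal (1001 here is
-- an assumed result), then prepare rather than commit.
BEGIN;
INSERT INTO orders (order_id, office, amount)
    VALUES (1001, 'London', 250.00);
PREPARE TRANSACTION 'txn_1001';

-- Then exactly one of the following, again on every server.
-- If all servers prepared successfully:
COMMIT PREPARED 'txn_1001';
-- If any server failed to prepare, issue this instead on the
-- servers that did prepare:
ROLLBACK PREPARED 'txn_1001';
</programlisting>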
</listitem>
</varlistentry>
<varlistentry>
<term>Multi-Master Replication Using Clustering</term>
<listitem>
<para>
In clustering, each server can accept write requests, and
modified data is transmitted from the original server to every
other server before each transaction commits. Heavy write
activity can cause excessive locking, leading to poor performance.
In fact, write performance is often worse than that of a single
server. Read requests can be sent to any server. Clustering
is best for mostly-read workloads, though its big advantage
is that any server can accept write requests &mdash; there is
no need to partition workloads between master and slave servers.
Also, because the changes themselves are sent from one server to
another, non-deterministic functions like
<function>random()</> are not a problem.
</para>
<para>
Clustering is implemented by <productname>Oracle</> in their
<productname><acronym>RAC</></> product. <productname>PostgreSQL</>
does not offer this type of load balancing, though
<productname>PostgreSQL</> two-phase commit (<xref
linkend="sql-prepare-transaction"
endterm="sql-prepare-transaction-title"> and <xref
linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">)
can be used to implement this in application code or middleware.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Clustering For Parallel Query Execution</term>
<listitem>
<para>
This allows multiple servers to work concurrently on a single
query. One way this could work is for the data to be split among
servers, with each server executing its part of the query and
sending its results to a central server, which combines them and
returns them to the user. There is currently no
<productname>PostgreSQL</> open source solution for this.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Commercial Solutions</term>
<listitem>
<para>
Because <productname>PostgreSQL</> is open source and easily
extended, a number of companies have taken <productname>PostgreSQL</>
and created commercial closed-source solutions with unique
failover, replication, and load balancing capabilities.
</para>
</listitem>
</varlistentry>
</variablelist>
</chapter>