postgresql/doc/src/sgml/high-availability.sgml

278 lines
10 KiB
Plaintext
Raw Normal View History

<!-- $PostgreSQL: pgsql/doc/src/sgml/high-availability.sgml,v 1.11 2006/11/22 04:01:40 momjian Exp $ -->
<chapter id="high-availability">
<title>High Availability and Load Balancing</title>
<indexterm><primary>high availability</></>
<indexterm><primary>failover</></>
<indexterm><primary>replication</></>
<indexterm><primary>load balancing</></>
<indexterm><primary>clustering</></>
<indexterm><primary>data partitioning</></>
<para>
Database servers can work together to allow a second server to
quickly take over quickly if the primary server fails (high
availability), or to allow several computers to serve the same
data (load balancing). Ideally, database servers could work
together seamlessly. Web servers serving static web pages can
be combined quite easily by merely load-balancing web requests
to multiple machines. In fact, read-only database servers can
be combined relatively easily too. Unfortunately, most database
servers have a read/write mix of requests, and read/write servers
are much harder to combine. This is because though read-only
data needs to be placed on each server only once, a write to any
server has to be propagated to all servers so that future read
requests to those servers return consistent results.
</para>
<para>
This synchronization problem is the fundamental difficulty for
servers working together. Because there is no single solution
that eliminates the impact of the sync problem for all use cases,
there are multiple solutions. Each solution addresses this
problem in a different way, and minimizes its impact for a specific
workload.
</para>
<para>
Some solutions deal with synchronization by allowing only one
server to modify the data. Servers that can modify data are
called read/write or "master" servers. Servers that can reply
to read-only queries are called "slave" servers. Servers that
cannot be accessed until they are changed to master servers are
called "standby" servers.
</para>
<para>
Some failover and load balancing solutions are synchronous,
meaning that a data-modifying transaction is not considered
committed until all servers have committed the transaction. This
guarantees that a failover will not lose any data and that all
load-balanced servers will return consistent results with little
propagation delay. Asynchronous updating has a delay between the
time of commit and its propagation to the other servers, opening
the possibility that some transactions might be lost in the switch
to a backup server, and that load balanced servers might return
slightly stale results. Asynchronous communication is used when
synchronous would be too slow.
</para>
<para>
Solutions can also be categorized by their granularity. Some solutions
can deal only with an entire database server, while others allow control
at the per-table or per-database level.
</para>
<para>
Performance must be considered in any failover or load balancing
choice. There is usually a tradeoff between functionality and
performance. For example, a full synchronous solution over a slow
network might cut performance by more than half, while an asynchronous
one might have a minimal performance impact.
</para>
<para>
2006-10-27 14:40:26 +02:00
The remainder of this section outlines various failover, replication,
and load balancing solutions.
</para>
<variablelist>
<varlistentry>
<term>Shared Disk Failover</term>
<listitem>
<para>
Shared disk failover avoids synchronization overhead by having only one
copy of the database. It uses a single disk array that is shared by
multiple servers. If the main database server fails, the standby server
is able to mount and start the database as though it was recovering from
a database crash. This allows rapid failover with no data loss.
</para>
<para>
Shared hardware functionality is common in network storage
devices. Using a network file system is also possible, though
care must be taken that the file system has full POSIX behavior.
One significant limitation of this method is that if the shared
disk array fails or becomes corrupt, the primary and standby
servers are both nonfunctional. Another issue is that the
standby server should never access the shared storage while
the primary server is running. It is also possible to use
some type of file system mirroring to keep the standby server
current, but the mirroring must be done in a way that the
standby server has a consistent copy of the file system.
</para>
<!--
https://forge.continuent.org/pipermail/sequoia/2006-November/004070.html
Oracle RAC is a shared disk approach and just send cache invalidations
to other nodes but not actual data. As the disk is shared, data is
only commited once to disk and there is a distributed locking
protocol to make nodes agree on a serializable transactional order.
-->
</listitem>
</varlistentry>
<varlistentry>
<term>Warm Standby Using Point-In-Time Recovery</term>
<listitem>
<para>
A warm standby server (see <xref linkend="warm-standby">) can
be kept current by reading a stream of write-ahead log (WAL)
records. If the main server fails, the warm standby contains
almost all of the data of the main server, and can be quickly
made the new master database server. This is asynchronous and
can only be done for the entire database server.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Master-Slave Replication</term>
<listitem>
<para>
A master-slave replication setup sends all data modification
2006-11-17 10:00:03 +01:00
queries to the master server. The master server asynchronously
sends data changes to the slave server. The slave can answer
read-only queries while the master server is running. The
slave server is ideal for data warehouse queries.
</para>
<para>
Slony-I is an example of this type of replication, with per-table
granularity, and support for multiple slaves. Because it
updates the slave server asynchronously (in batches), there is
possible data loss during fail over.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Statement-Based Replication Middleware</term>
<listitem>
<para>
With statement-based replication middleware, a program intercepts
every SQL query and sends it to all servers. Each server
operates independently. Read-only queries can be sent to a
single server because there is no need for all servers to
process it.
</para>
<para>
If queries are simply broadcast unmodified, functions like
<function>random()</>, <function>CURRENT_TIMESTAMP</>, and
sequences would have different values on different servers.
This is because each server operates independently, and because
SQL queries are broadcast (and not actual modified rows). If
this is unacceptable, either the middleware or the application
must query such values from a single server and then use those
values in write queries. Also, care must be taken that all
transactions either commit or abort on all servers, perhaps
using two-phase commit (<xref linkend="sql-prepare-transaction"
endterm="sql-prepare-transaction-title"> and <xref
linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">.
Pgpool is an example of this type of replication.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Synchonous Multi-Master Replication</term>
<listitem>
<para>
In synchonous multi-master replication, each server can accept
write requests, and modified data is transmitted from the
original server to every other server before each transaction
commits. Heavy write activity can cause excessive locking,
leading to poor performance. In fact, write performance is
often worse than that of a single server. Read requests can
be sent to any server. Some implementations use cluster-wide
shared memory or shared disk to reduce the communication
overhead. Clustering is best for mostly read workloads, though
its big advantage is that any server can accept write requests
&mdash; there is no need to partition workloads between master
and slave servers, and because the data changes are sent from
one server to another, there is no problem with non-deterministic
functions like <function>random()</>.
</para>
<para>
<productname>PostgreSQL</> does not offer this type of load
balancing, though <productname>PostgreSQL</> two-phase commit
(<xref linkend="sql-prepare-transaction"
endterm="sql-prepare-transaction-title"> and <xref
linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">)
can be used to implement this in application code or middleware.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Asynchronous Multi-Master Replication</term>
<listitem>
<para>
For servers that are not regularly connected, like laptops or
remote servers, keeping data consistent among servers is a
challenge. One simple solution is to allow each server to
modify the data, and have periodic communication compare
databases and ask users to resolve any conflicts.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Data Partitioning</term>
<listitem>
<para>
Data partitioning splits tables into data sets. Each set can
be modified by only one server. For example, data can be
partitioned by offices, e.g. London and Paris, with a server
in each office. If queries combining London and Paris data
are necessary, an application can query both servers, or
master/slave replication can be used to keep a read-only copy
of the other office's data on each server.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Multi-Server Parallel Query Execution</term>
<listitem>
<para>
This allows multiple servers to work concurrently on a single
query. One possible way this could work is for the data to be
split among servers and for each server to execute its part of
the query and results sent to a central server to be combined
and returned to the user. Pgpool-II has this capability.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Commercial Solutions</term>
<listitem>
<para>
Because <productname>PostgreSQL</> is open source and easily
extended, a number of companies have taken <productname>PostgreSQL</>
and created commercial closed-source solutions with unique
failover, replication, and load balancing capabilities.
</para>
</listitem>
</varlistentry>
</variablelist>
</chapter>