<!-- $PostgreSQL: pgsql/doc/src/sgml/failover.sgml,v 1.11 2006/11/17 08:46:53 meskes Exp $ -->

<chapter id="failover">
 <title>Failover, Replication, Load Balancing, and Clustering Options</title>

 <indexterm><primary>failover</></>
 <indexterm><primary>replication</></>
 <indexterm><primary>load balancing</></>
 <indexterm><primary>clustering</></>

 <para>
  Database servers can work together to allow a second server to
  quickly take over if the primary server fails (failover), or to
  allow several computers to serve the same data (load balancing).
  Ideally, database servers could work together seamlessly.  Web
  servers serving static web pages can be combined quite easily by
  merely load-balancing web requests to multiple machines.  In
  fact, read-only database servers can be combined relatively easily
  too.  Unfortunately, most database servers have a read/write mix
  of requests, and read/write servers are much harder to combine.
  This is because, although read-only data needs to be placed on each
  server only once, a write to any server has to be propagated to
  all servers so that future read requests to those servers return
  consistent results.
 </para>

 <para>
  This synchronization problem is the fundamental difficulty for servers
  working together.  Because there is no single solution that eliminates
  the impact of the synchronization problem for all use cases, there are multiple
  solutions.  Each solution addresses this problem in a different way, and
  minimizes its impact for a specific workload.
 </para>

 <para>
  Some solutions deal with synchronization by allowing only one
  server to modify the data.  Servers that can modify data are
  called read/write or "master" servers.  Servers that can reply
  to read-only queries are called "slave" servers.  Servers that
  cannot be accessed until they are changed to master servers are
  called "standby" servers.
 </para>

 <para>
  Some failover and load balancing solutions are synchronous, meaning that
  a data-modifying transaction is not considered committed until all
  servers have committed the transaction.  This guarantees that a failover
  will not lose any data and that all load-balanced servers will return
  consistent results with no propagation delay. Asynchronous updating has
  a small delay between the time of commit and its propagation to the
  other servers, opening the possibility that some transactions might be
  lost in the switch to a backup server, and that load-balanced servers
  might return slightly stale results.  Asynchronous communication is used
  when synchronous would be too slow.
 </para>
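 <para>
  The practical difference can be illustrated with a toy model.  The
  following Python sketch is not how any particular replication system
  is implemented; the class and method names are hypothetical.  It
  assumes an in-memory primary and standby: a synchronous commit is
  acknowledged only after the standby has the change, while an
  asynchronous commit is acknowledged at once and queued for later
  shipping, so a crash before the queue is shipped loses those
  transactions.
 </para>

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """A hypothetical in-memory server holding committed transaction ids."""
    name: str
    committed: list = field(default_factory=list)


class ReplicatedPair:
    """Toy primary/standby pair; not a real replication protocol."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.primary = Server("primary")
        self.standby = Server("standby")
        self.pending = []  # committed on the primary, not yet shipped

    def commit(self, txid: int) -> None:
        self.primary.committed.append(txid)
        if self.synchronous:
            # Synchronous: not acknowledged until the standby has the change.
            self.standby.committed.append(txid)
        else:
            # Asynchronous: acknowledge now, ship later.
            self.pending.append(txid)

    def ship(self) -> None:
        """Propagate queued changes to the standby (asynchronous mode only)."""
        self.standby.committed.extend(self.pending)
        self.pending.clear()

    def fail_over(self) -> list:
        """The primary crashes; whatever was still queued is lost."""
        lost = list(self.pending)
        self.pending.clear()
        return lost


def run(synchronous: bool) -> list:
    pair = ReplicatedPair(synchronous)
    pair.commit(1)
    pair.commit(2)
    pair.ship()      # the queue is always empty in synchronous mode
    pair.commit(3)   # the primary crashes before this change is shipped
    return pair.fail_over()


print("sync lost:", run(True))    # sync lost: []
print("async lost:", run(False))  # async lost: [3]
```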

 <para>
  Solutions can also be categorized by their granularity.  Some solutions
  can deal only with an entire database server, while others allow control
  at the per-table or per-database level.
 </para>

 <para>
  Performance must be considered in any failover or load balancing
  choice.  There is usually a tradeoff between functionality and
  performance.  For example, a fully synchronous solution over a slow
  network might cut performance by more than half, while an asynchronous
  one might have a minimal performance impact.
 </para>
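 <para>
  A rough latency model shows where such a slowdown can come from.  The
  sketch below assumes, purely for illustration, a single client issuing
  one commit at a time, a fixed local commit time, and one extra network
  round trip per synchronous commit.  The figures are hypothetical, not
  measurements, and the model ignores concurrency and batching.
 </para>

```python
def commits_per_second(local_commit_ms: float, round_trip_ms: float = 0.0) -> float:
    """Serial commit throughput for one client under a simple latency model.

    Assumption: each synchronous commit waits for one extra network round
    trip; an asynchronous commit (round_trip_ms=0) waits for none.
    """
    return 1000.0 / (local_commit_ms + round_trip_ms)


local_ms = 2.0   # hypothetical local commit latency
slow_rtt = 5.0   # hypothetical round trip over a slow network

async_rate = commits_per_second(local_ms)
sync_rate = commits_per_second(local_ms, slow_rtt)
print(f"asynchronous: {async_rate:.0f} commits/s")    # asynchronous: 500 commits/s
print(f"synchronous:  {sync_rate:.0f} commits/s")     # synchronous:  143 commits/s
print(f"relative:     {sync_rate / async_rate:.2f}")  # relative:     0.29
```

 <para>
  With these assumed numbers the synchronous configuration reaches only
  about 29% of the asynchronous throughput; with a sub-millisecond round
  trip on a fast local network the gap would be much smaller.
 </para>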

 <para>
  The remainder of this chapter outlines various failover, replication,
  and load balancing solutions.
 </para>

 <variablelist>

 <varlistentry>
  <term>Shared Disk Failover</term>
  <listitem>

   <para>
    Shared disk failover avoids synchronization overhead by having only one
    copy of the database.  It uses a single disk array that is shared by
    multiple servers.  If the main database server fails, the standby server
    is able to mount and start the database as though it were recovering from
    a database crash.  This allows rapid failover with no data loss.
   </para>

   <para>
    Shared hardware functionality is common in network storage
    devices.  Using a network file system is also possible, though
    care must be taken that the file system has full POSIX behavior.
    One significant limitation of this method is that if the shared
    disk array fails or becomes corrupt, the primary and standby
    servers are both nonfunctional.  Another issue is that the
    standby server should never access the shared storage while
    the primary server is running.
   </para>
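   <para>
    The rule that the standby must never touch the shared storage while
    the primary is running is usually enforced by some form of fencing.
    The following Python sketch models one hypothetical approach, a lease
    that the primary must keep renewing; the standby may mount the
    storage only after the lease has lapsed.  The class and method names
    are illustrative only, and real cluster managers use their own
    mechanisms.
   </para>

```python
class SharedStorage:
    """Hypothetical shared disk array guarded by a time-based lease.

    Only the current lease holder may mount the database; the lease lapses
    if its holder stops renewing it (for example, because it crashed).
    """

    def __init__(self, lease_seconds: float):
        self.lease_seconds = lease_seconds
        self.owner = None
        self.lease_expires = 0.0

    def acquire(self, node: str, now: float) -> bool:
        """Grant or renew the lease; refuse while another node holds a live one."""
        if self.owner not in (None, node) and self.lease_expires > now:
            return False
        self.owner = node
        self.lease_expires = now + self.lease_seconds
        return True


storage = SharedStorage(lease_seconds=10.0)
print(storage.acquire("primary", now=0.0))   # True: primary mounts the database
print(storage.acquire("standby", now=5.0))   # False: primary alive, standby fenced off
print(storage.acquire("primary", now=8.0))   # True: primary renews, lease now ends at 18
# The primary crashes and stops renewing.
print(storage.acquire("standby", now=19.0))  # True: standby may mount and run recovery
print(storage.owner)                         # standby
```

   <para>
    A lease only protects the data if the primary also stops writing as
    soon as it can no longer renew it, which is one reason clustering
    software often adds hardware-level fencing as well.
   </para>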
  </listitem>
 </varlistentry>

 <varlistentry>
  <term>Warm Standby Using Point-In-Time Recovery</term>
  <listitem>

   <para>
    A warm standby server (see <xref linkend="warm-standby">) can
    be kept current by reading a stream of write-ahead log (WAL)
    records.  If the main server fails, the warm standby contains
    almost all of the data of the main server, and can be quickly
    made the new master database server.  This is asynchronous and
    can only be done for the entire database server.
   </para>
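   <para>
    Conceptually, the standby replays WAL records strictly in log order,
    so at any moment it reflects a prefix of the primary's history.  The
    Python sketch below models this with hypothetical in-memory records
    (real WAL records describe low-level page changes and are shipped in
    files, not as key/value updates).  Records that the primary generated
    but had not yet shipped when it failed are lost, which is why the
    standby contains almost, but not necessarily all, of the data.
   </para>

```python
from typing import Dict, List, Tuple

# A hypothetical WAL record: (log sequence number, key, new value).
WalRecord = Tuple[int, str, str]


class WarmStandby:
    """Toy standby that replays WAL records strictly in LSN order."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.replayed_lsn = 0

    def replay(self, records: List[WalRecord]) -> None:
        for lsn, key, value in records:
            if lsn != self.replayed_lsn + 1:
                raise ValueError(f"gap in WAL: expected {self.replayed_lsn + 1}, got {lsn}")
            self.data[key] = value
            self.replayed_lsn = lsn


primary_wal = [
    (1, "alice", "100"),
    (2, "bob", "50"),
    (3, "alice", "80"),
    (4, "bob", "70"),  # written on the primary but never shipped
]

standby = WarmStandby()
standby.replay(primary_wal[:3])  # only the shipped records reach the standby
# The primary fails here; the standby is promoted with everything up to LSN 3.
print(standby.replayed_lsn, standby.data)  # 3 {'alice': '80', 'bob': '50'}
```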
  </listitem>
 </varlistentry>

 <varlistentry>
  <term>Master/Slave Replication</term>
  <listitem>

   <para>
    A master/slave replication setup sends all data modification
    queries to the master server.  The master server asynchronously
    sends data changes to the slave server.  The slave can answer
    read-only queries while the master server is running.  The
    slave server is ideal for data warehouse queries.
   </para>

   <para>
    Slony-I is an example of this type of replication, with per-table
    granularity, and support for multiple slaves.  Because it
    updates the slave server asynchronously (in batches), there is
    possible data loss during failover.
   </para>
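   <para>
    With master/slave replication, the application or a middleware layer
    must send each statement to the right server.  A minimal Python
    sketch of such a router follows; the Router class and the server
    names are hypothetical, and classifying statements by their first
    keyword is a deliberate simplification.
   </para>

```python
import itertools


class Router:
    """Hypothetical statement router for a master/slave setup.

    Writes go to the master; read-only statements are spread round-robin
    over the slaves.  Classifying by the first keyword is a simplification:
    SELECT ... INTO, SELECT ... FOR UPDATE, or a SELECT that calls a
    data-modifying function would be misrouted to a slave.
    """

    READ_ONLY = ("SELECT", "SHOW")

    def __init__(self, master: str, slaves: list):
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def route(self, sql: str) -> str:
        words = sql.split(None, 1)
        keyword = words[0].upper() if words else ""
        if keyword in self.READ_ONLY:
            return next(self._slaves)
        return self.master


router = Router("master", ["slave1", "slave2"])
for stmt in ("INSERT INTO orders VALUES (1)",
             "SELECT count(*) FROM orders",
             "select * from orders",
             "UPDATE orders SET qty = 2"):
    print(router.route(stmt), "|", stmt)
```

   <para>
    Because the slaves are updated asynchronously, a read routed to a
    slave immediately after a write may not yet see that write;
    applications that need to read their own writes can send such reads
    to the master instead.
   </para>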
  </listitem>
 </varlistentry>

 <varlistentry>
  <term>Data Partitioning</term>
  <listitem>

   <para>
    Data partitioning splits tables into data sets.  Each set can
    be modified by only one server.  For example, data can be
    partitioned by office, e.g., London and Paris.  While London
    and Paris servers have all data records, only London can modify
    London records, and Paris can only modify Paris records.  This
    is similar to the "Master/Slave Replication" item above, except
    that instead of having a read/write server and a read-only
    server, each server has a read/write data set and a read-only
    data set.
   </para>

   <para>
    Such partitioning provides both failover and load balancing.  Failover
    is achieved because the data resides on both servers, and this is an
    ideal way to enable failover if the servers share a slow communication
    channel. Load balancing is possible because read requests can go to any
    of the servers, and write requests are split among the servers.  Of
    course, the communication to keep all the servers up-to-date adds
    overhead, so ideally the write load should be low, or localized as in
    the London/Paris example above.
   </para>

   <para>
    Data partitioning is usually handled by application code, though rules
    and triggers can be used to keep the read-only data sets current.  Slony-I
    can also be used in such a setup.  While Slony-I replicates only entire
    tables, London and Paris can be placed in separate tables, and
    inheritance can be used to access both tables using a single table name.
   </para>
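   <para>
    The ownership rule can be modeled in a few lines of Python.  In this
    hypothetical sketch each OfficeServer holds every record but accepts
    modifications only for its own office; the replicate function stands
    in for whatever mechanism (rules, triggers, or Slony-I) keeps the
    read-only copies current.
   </para>

```python
class OfficeServer:
    """Hypothetical server in a London/Paris partitioned setup.

    Each server holds every record, but may modify only the data set of its
    own office; copies of the other office's records are read-only.
    """

    def __init__(self, office: str):
        self.office = office
        self.records = {}  # (office, record_id) -> value

    def write(self, office: str, record_id: int, value: str) -> None:
        if office != self.office:
            raise PermissionError(
                f"{self.office} server cannot modify {office} records")
        self.records[(office, record_id)] = value

    def read(self, office: str, record_id: int):
        return self.records.get((office, record_id))


def replicate(source: OfficeServer, target: OfficeServer) -> None:
    """Copy the source's own (read/write) data set into the target's read-only copy."""
    for (office, record_id), value in source.records.items():
        if office == source.office:
            target.records[(office, record_id)] = value


london, paris = OfficeServer("London"), OfficeServer("Paris")
london.write("London", 1, "order from London")
paris.write("Paris", 1, "order from Paris")
replicate(london, paris)
replicate(paris, london)
print(paris.read("London", 1))  # order from London
try:
    paris.write("London", 1, "edited in Paris")
except PermissionError as err:
    print(err)                  # Paris server cannot modify London records
```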
  </listitem>
 </varlistentry>

 <varlistentry>
  <term>Multi-Master Replication Using Query Broadcasting</term>
  <listitem>

   <para>
    One way to do multi-master replication is by having a program
    intercept every SQL query and send it to all servers.  Each
    server operates independently.  Read-only queries can be sent
    to a single server because there is no need for all servers to
    process them.
   </para>

   <para>
    One limitation of this solution is that functions like
    <function>random()</>, <function>CURRENT_TIMESTAMP</>, and
    sequences can have different values on different servers.  This
    is because each server operates independently, and because SQL
    queries are broadcast (and not actual modified rows).  If this
    is unacceptable, applications must query such values from a
    single server and then use those values in write queries.
    Also, care must be taken that all transactions either commit
    or abort on all servers, perhaps using two-phase commit (<xref
    linkend="sql-prepare-transaction"
    endterm="sql-prepare-transaction-title"> and <xref
    linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">).
    Pgpool is an example of this type of replication.
   </para>
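   <para>
    The following Python sketch, built on hypothetical stand-in servers,
    shows why broadcasting the statement text makes the copies diverge
    when it contains <function>random()</>, and how the workaround
    described above (fetching the value from a single server and
    broadcasting the literal) keeps them identical.  Python's
    random.Random with a different seed per server stands in for each
    server's independent generator.
   </para>

```python
import random


class Server:
    """Hypothetical independent server with its own random number generator,
    just as each PostgreSQL server evaluates random() by itself."""

    def __init__(self, seed: int):
        self.rng = random.Random(seed)
        self.rows = []

    def insert_random(self) -> None:
        # Models: INSERT INTO t VALUES (random()), evaluated on this server.
        self.rows.append(self.rng.random())

    def select_random(self) -> float:
        # Models: SELECT random(), run on this server only.
        return self.rng.random()

    def insert_value(self, value: float) -> None:
        # Models: INSERT INTO t VALUES (a literal supplied by the client).
        self.rows.append(value)


def broadcast_statement(servers: list) -> None:
    """Naive query broadcasting: every server runs the same SQL text."""
    for server in servers:
        server.insert_random()


def broadcast_literal(servers: list) -> None:
    """Workaround: fetch the value from one server, then broadcast it."""
    value = servers[0].select_random()
    for server in servers:
        server.insert_value(value)


naive = [Server(seed=1), Server(seed=2)]
broadcast_statement(naive)
print(naive[0].rows == naive[1].rows)  # False: the copies have diverged

fixed = [Server(seed=1), Server(seed=2)]
broadcast_literal(fixed)
print(fixed[0].rows == fixed[1].rows)  # True
```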
  </listitem>
 </varlistentry>

 <varlistentry>
  <term>Multi-Master Replication Using Clustering</term>
  <listitem>

   <para>
    In clustering, each server can accept write requests, and
    modified data is transmitted from the original server to every
    other server before each transaction commits.  Heavy write
    activity can cause excessive locking, leading to poor performance.
    In fact, write performance is often worse than that of a single
    server.  Read requests can be sent to any server.  Clustering
    is best for mostly-read workloads, though its big advantage
    is that any server can accept write requests: there is
    no need to partition workloads between master and slave servers,
    and because the modified data, rather than the query, is sent from
    one server to another, there is no problem with non-deterministic
    functions like <function>random()</>.
   </para>

   <para>
    Clustering is implemented by <productname>Oracle</> in their
    <productname><acronym>RAC</></> product.  <productname>PostgreSQL</>
    does not offer this type of load balancing, though
    <productname>PostgreSQL</> two-phase commit (<xref
    linkend="sql-prepare-transaction"
    endterm="sql-prepare-transaction-title"> and <xref
    linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">)
    can be used to implement this in application code or middleware.
   </para>
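   <para>
    Two-phase commit lets middleware make a change commit on every server
    or on none.  The Python sketch below simulates the protocol with
    hypothetical Participant objects that only record the SQL a real
    implementation would send.  <command>PREPARE TRANSACTION</>,
    <command>COMMIT PREPARED</>, and <command>ROLLBACK PREPARED</> are the
    actual <productname>PostgreSQL</> commands; everything else is an
    illustrative assumption.  As in clustering, it ships already-computed
    rows rather than query text, so non-deterministic functions are
    evaluated only once.
   </para>

```python
class Participant:
    """Hypothetical stand-in for one server; it records the SQL a real
    middleware layer would send instead of talking to a database."""

    def __init__(self, name: str, fail_prepare: bool = False):
        self.name = name
        self.fail_prepare = fail_prepare  # simulate a server that cannot prepare
        self.log = []
        self.rows = []      # committed rows
        self._staged = []   # rows written by the open transaction

    def begin_and_write(self, rows: list) -> None:
        self.log.append("BEGIN")
        self._staged = list(rows)

    def prepare(self, gid: str) -> bool:
        if self.fail_prepare:
            # A failed PREPARE TRANSACTION leaves the transaction rolled back.
            self.log.append("ROLLBACK")
            self._staged = []
            return False
        self.log.append(f"PREPARE TRANSACTION '{gid}'")
        return True

    def commit_prepared(self, gid: str) -> None:
        self.log.append(f"COMMIT PREPARED '{gid}'")
        self.rows.extend(self._staged)
        self._staged = []

    def rollback_prepared(self, gid: str) -> None:
        self.log.append(f"ROLLBACK PREPARED '{gid}'")
        self._staged = []

    def rollback(self) -> None:
        self.log.append("ROLLBACK")
        self._staged = []


def replicate_transaction(participants: list, rows: list, gid: str) -> bool:
    """Ship already-computed rows to every server; commit everywhere or nowhere."""
    for p in participants:
        p.begin_and_write(rows)
    # Phase 1: ask every server to prepare.
    for i, p in enumerate(participants):
        if not p.prepare(gid):
            for q in participants[:i]:
                q.rollback_prepared(gid)  # undo servers that already prepared
            for q in participants[i + 1:]:
                q.rollback()              # abort servers not yet asked
            return False
    # Phase 2: every server promised it can commit, so commit them all.
    for p in participants:
        p.commit_prepared(gid)
    return True


row = ("order-17", 0.42)  # hypothetical row; random() already evaluated once
cluster = [Participant("a"), Participant("b"), Participant("c")]
print(replicate_transaction(cluster, [row], "tx1"))  # True
print(all(p.rows == [row] for p in cluster))         # True

cluster = [Participant("a"), Participant("b", fail_prepare=True), Participant("c")]
print(replicate_transaction(cluster, [row], "tx2"))  # False
print(any(p.rows for p in cluster))                  # False
print(cluster[0].log)
# ['BEGIN', "PREPARE TRANSACTION 'tx2'", "ROLLBACK PREPARED 'tx2'"]
```

   <para>
    A production coordinator must also survive its own crash between the
    two phases, for example by recording its decision and resolving any
    leftover prepared transactions (visible in the
    <structname>pg_prepared_xacts</> view) after a restart; the sketch
    omits this.
   </para>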
  </listitem>
 </varlistentry>

 <varlistentry>
  <term>Clustering For Parallel Query Execution</term>
  <listitem>

   <para>
    This allows multiple servers to work concurrently on a single
    query.  One possible way this could work is for the data to be
    split among servers, for each server to execute its part of
    the query, and for the results to be sent to a central server
    to be combined and returned to the user.  There currently is no
    <productname>PostgreSQL</> open source solution for this.
   </para>
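   <para>
    The scatter/gather idea can be sketched in Python, with hypothetical
    shards standing in for servers and a thread pool standing in for the
    network: each shard computes a partial aggregate over its own rows,
    and the central server combines the partial results.  Note that an
    average has to be combined from partial sums and counts, not by
    averaging the per-shard averages.
   </para>

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data split across three servers (shards).
shards = [
    [12.0, 7.5, 3.0],
    [40.0],
    [1.5, 2.5, 8.0, 5.5],
]


def partial_aggregate(rows: list) -> tuple:
    """Work done on each server: its share of SUM and COUNT."""
    return sum(rows), len(rows)


def parallel_avg(shards: list) -> float:
    """Central server: scatter the query, gather the partials, combine them."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(partial_aggregate, shards))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


print(parallel_avg(shards))  # 10.0
```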
  </listitem>
 </varlistentry>

 <varlistentry>
  <term>Commercial Solutions</term>
  <listitem>

   <para>
    Because <productname>PostgreSQL</> is open source and easily
    extended, a number of companies have taken <productname>PostgreSQL</>
    and created commercial closed-source solutions with unique
    failover, replication, and load balancing capabilities.
   </para>
  </listitem>
 </varlistentry>

 </variablelist>

</chapter>