diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml index 00f2779229..9b24ebbb2a 100644 --- a/doc/src/sgml/high-availability.sgml +++ b/doc/src/sgml/high-availability.sgml @@ -1,4 +1,4 @@ - + High Availability, Load Balancing, and Replication @@ -455,32 +455,10 @@ protocol to make nodes agree on a serializable transactional order. + - File-based Log Shipping + Log-Shipping Standby Servers - - warm standby - - - - PITR standby - - - - standby server - - - - log shipping - - - - witness server - - - - STONITH - Continuous archiving can be used to create a high @@ -510,8 +488,8 @@ protocol to make nodes agree on a serializable transactional order. adjacent system, another system at the same site, or another system on the far side of the globe. The bandwidth required for this technique varies according to the transaction rate of the primary server. - Record-based log shipping is also possible with custom-developed - procedures, as discussed in . + Record-based log shipping is also possible with streaming replication + (see ). @@ -519,26 +497,52 @@ protocol to make nodes agree on a serializable transactional order. records are shipped after transaction commit. As a result, there is a window for data loss should the primary server suffer a catastrophic failure; transactions not yet shipped will be lost. The size of the - data loss window can be limited by use of the + data loss window in file-based log shipping can be limited by use of the archive_timeout parameter, which can be set as low as a few seconds. However such a low setting will substantially increase the bandwidth required for file shipping. If you need a window of less than a minute or so, consider using - . + streaming replication (see ). - The standby server is not available for access, since it is continually - performing recovery processing. Recovery performance is sufficiently - good that the standby will typically be only moments away from full + Recovery performance is sufficiently good that the standby will + typically be only moments away from full availability once it has been activated. As a result, this is called a warm standby configuration which offers high availability. Restoring a server from an archived base backup and rollforward will take considerably longer, so that technique only offers a solution for disaster recovery, not high availability. + A standby server can also be used for read-only queries, in which case + it is called a Hot Standby server. See for + more information. - + + warm standby + + + + PITR standby + + + + standby server + + + + log shipping + + + + witness server + + + + STONITH + + + Planning @@ -573,9 +577,325 @@ protocol to make nodes agree on a serializable transactional order. versa. + + + + Standby Server Operation + - There is no special mode required to enable a standby server. The - operations that occur on both primary and standby servers are + In standby mode, the server continuously applies WAL received from the + master server. The standby server can read WAL from a WAL archive + (see restore_command) or directly from the master + over a TCP connection (streaming replication). The standby server will + also attempt to restore any WAL found in the standby cluster's + pg_xlog directory. That typically happens after a server + restart, when the standby again replays WAL that was streamed from the + master before the restart, but you can also manually copy files to + pg_xlog at any time to have them replayed.
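+
+ For example, an archived segment could be copied into the standby's
+ pg_xlog by hand. A minimal sketch, assuming the archive is mounted at
+ /mnt/wal-archive and the standby's data directory is
+ /var/lib/pgsql/data (both paths and the segment name are illustrative
+ only):
+
+cp /mnt/wal-archive/00000001000000000000000A /var/lib/pgsql/data/pg_xlog/
+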
+ + + + At startup, the standby begins by restoring all WAL available in the + archive location, calling restore_command. Once it + reaches the end of WAL available there and restore_command + fails, it tries to restore any WAL available in the pg_xlog directory. + If that fails, and streaming replication has been configured, the + standby tries to connect to the primary server and start streaming WAL + from the last valid record found in archive or pg_xlog. If that fails + or streaming replication is not configured, or if the connection is + later disconnected, the standby loops back and tries to + restore the file from the archive again. This loop of retries from the + archive, pg_xlog, and via streaming replication goes on until the server + is stopped or failover is triggered by a trigger file. + + + + Standby mode is exited and the server switches to normal operation + when a trigger file is found (trigger_file). Before failover, it will + restore any WAL available in the archive or in pg_xlog, but won't try + to connect to the master or wait for files to become available in the + archive. + + + + + Preparing Master for Standby Servers + + + Set up continuous archiving to a WAL archive on the master, as described + in . The archive location should be + accessible from the standby even when the master is down, i.e., it should + reside on the standby server itself or another trusted server, not on + the master server. + + + + If you want to use streaming replication, set up authentication to allow + streaming replication connections and set max_wal_senders in + the configuration file of the primary server. + + + + Take a base backup as described in + to bootstrap the standby server. + + + + + Setting up the standby server + + + To set up the standby server, restore the base backup taken from the primary + server (see ). In the recovery command file + recovery.conf in the standby's cluster data directory, + turn on standby_mode. Set restore_command to + a simple command to copy files from the WAL archive. If you want to + use streaming replication, set primary_conninfo. + + + + + Do not use pg_standby or similar tools with the built-in standby mode + described here. restore_command should return immediately + if the file does not exist; the server will retry the command + if necessary. See + for using tools like pg_standby. + + + + + You can use restartpoint_command to prune the archive of files no longer + needed by the standby. + + + + If you're setting up the standby server for high availability purposes, + set up WAL archiving, connections and authentication like the primary + server, because the standby server will work as a primary server after + failover. If you're setting up the standby server for reporting + purposes, with no plans to fail over to it, configure the standby + accordingly. + + + + You can have any number of standby servers, but if you use streaming + replication, make sure you set max_wal_senders high enough in + the primary to allow them to be connected simultaneously. + + + + + Streaming Replication + + + Streaming Replication + + + + Streaming replication allows a standby server to stay more up-to-date + than is possible with file-based log shipping. The standby connects + to the primary, which streams WAL records to the standby as they're + generated, without waiting for the WAL file to be filled.
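+
+ As a minimal sketch of the pieces involved (the values are illustrative
+ only; each setting is explained in the surrounding sections), the
+ primary needs WAL senders enabled, and the standby needs a connection
+ string in recovery.conf:
+
+# postgresql.conf on the primary (illustrative)
+max_wal_senders = 3
+
+# recovery.conf on the standby (illustrative)
+standby_mode = 'on'
+primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
+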
+ + + + Streaming replication is asynchronous, so there is still a small delay + between committing a transaction in the primary and the changes + becoming visible in the standby. The delay is, however, much smaller than with + file-based log shipping, typically under one second assuming the standby + is powerful enough to keep up with the load. With streaming replication, + archive_timeout is not required to reduce the data loss + window. + + + + Streaming replication relies on file-based continuous archiving for + making the base backup and for allowing the standby to catch up if it is + disconnected from the primary for long enough for the primary to + delete old WAL files still required by the standby. + + + + To use streaming replication, set up a file-based log-shipping standby + server as described in . The step that + turns a file-based log-shipping standby into a streaming replication + standby is setting primary_conninfo in the + recovery.conf file to point to the primary server. Set + and authentication options + (see pg_hba.conf) on the primary so that the standby server + can connect to the replication pseudo-database on the primary + server (see ). + + + + On systems that support the keepalive socket option, setting + , + and + helps the master promptly + notice a broken connection. + + + + Set the maximum number of concurrent connections from the standby servers + (see for details). + + + + When the standby is started and primary_conninfo is set + correctly, the standby will connect to the primary after replaying all + WAL files available in the archive. If the connection is established + successfully, you will see a walreceiver process in the standby, and + a corresponding walsender process in the primary (an illustrative + process listing appears at the end of this section). + + + + Authentication + + It is very important that the access privileges for replication be set up + properly so that only trusted users can read the WAL stream, because it is + easy to extract privileged information from it. + + + Only a superuser is allowed to connect to the primary as the replication + standby, so a role with the SUPERUSER and LOGIN + privileges needs to be created on the primary. + + + Client authentication for replication is controlled by the + pg_hba.conf record specifying replication in the + database field. For example, if the standby is running on + host IP 192.168.1.100 and the superuser's name for replication + is foo, the administrator can add the following line to the + pg_hba.conf file on the primary. + + +# Allow the user "foo" from host 192.168.1.100 to connect to the primary +# as a replication standby if the user's password is correctly supplied. +# +# TYPE DATABASE USER CIDR-ADDRESS METHOD +host replication foo 192.168.1.100/32 md5 + + + + The host name and port number of the primary, connection user name, + and password are specified in the recovery.conf file or + the corresponding environment variables on the standby. + For example, if the primary is running on host IP 192.168.1.50, + port 5432, the superuser's name for replication is + foo, and the password is foopass, the administrator + can add the following line to the recovery.conf file on the + standby. + + +# The standby connects to the primary that is running on host 192.168.1.50 +# and port 5432 as the user "foo" whose password is "foopass". +primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass' + + + You do not need to specify database=replication in the + primary_conninfo. The required option will be added + automatically. If you mention the database parameter at all within + primary_conninfo, a FATAL error will be raised.
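+
+ As a quick check that streaming is active, look for those two processes
+ in the system process list. Exact process titles vary by platform and
+ version; output along these lines is typical (abbreviated and
+ illustrative):
+
+$ ps -ef | grep wal
+... postgres: wal sender process foo 192.168.1.100(52724) ...
+... postgres: wal receiver process ...
+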
+ + + + + + + Failover + + + If the primary server fails then the standby server should begin + failover procedures. + + + + If the standby server fails then no failover need take place. If the + standby server can be restarted, even some time later, then the recovery + process can also be restarted immediately, taking advantage of + restartable recovery. If the standby server cannot be restarted, then a + full new standby server instance should be created. + + + + If the primary server fails and the standby server becomes the + new primary, and then the old primary restarts, you must have + a mechanism for informing the old primary that it is no longer the primary. This is + sometimes known as STONITH (Shoot The Other Node In The Head), which is + necessary to avoid situations where both systems think they are the + primary, which will lead to confusion and ultimately data loss. + + + + Many failover systems use just two systems, the primary and the standby, + connected by some kind of heartbeat mechanism to continually verify the + connectivity between the two and the viability of the primary. It is + also possible to use a third system (called a witness server) to prevent + some cases of inappropriate failover, but the additional complexity + might not be worthwhile unless it is set up with sufficient care and + rigorous testing. + + + + Once failover to the standby occurs, there is only a + single server in operation. This is known as a degenerate state. + The former standby is now the primary, but the former primary is down + and might stay down. To return to normal operation, a standby server + must be recreated, + either on the former primary system when it comes up, or on a third, + possibly new, system. Once complete, the primary and standby can be + considered to have switched roles. Some people choose to use a third + server to provide backup for the new primary until the new standby + server is recreated, + though clearly this complicates the system configuration and + operational processes. + + + + So, switching from primary to standby server can be fast but requires + some time to re-prepare the failover cluster. Regular switching from + primary to standby is useful, since it allows regular downtime on + each system for maintenance. This also serves as a test of the + failover mechanism to ensure that it will really work when you need it. + Written administration procedures are advised. + + + + To trigger failover of a log-shipping standby server, create a trigger + file with the filename and path specified by the trigger_file + setting in recovery.conf. If trigger_file is + not given, there is no way to exit recovery in the standby and promote + it to a master. That can be useful for, e.g., reporting servers that are + only used to offload read-only queries from the primary, not for high + availability purposes. + + + + + Alternative method for log shipping + + + An alternative to the built-in standby mode described in the previous + sections is to use a restore_command that polls the archive location. + This was the only option available in versions 8.4 and below. In this + setup, set standby_mode off, because you are implementing + the polling required for standby operation yourself. See + contrib/pg_standby () for a reference + implementation of this.
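+
+ A sketch of such a recovery.conf, assuming the archive directory and
+ trigger-file path shown are stand-ins for your own:
+
+# recovery.conf for the alternative, pg_standby-based method (illustrative)
+standby_mode = 'off'
+restore_command = 'pg_standby -t /tmp/pgsql.trigger /mnt/wal-archive %f %p %r'
+
+ Here pg_standby itself waits for each WAL file to appear in the archive
+ and watches for the trigger file, which is why the server's built-in
+ standby mode is left disabled.
+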
+ + + + Note that in this mode, the server will apply WAL one file at a + time, so if you use the standby server for queries (see Hot Standby), + there is a longer delay between an action in the master and when the + action becomes visible in the standby, corresponding to the time it takes + to fill up the WAL file. archive_timeout can be used to make that delay + shorter. Also note that you can't combine streaming replication with + this method. + + + + The operations that occur on both primary and standby servers are normal continuous archiving and recovery tasks. The only point of contact between the two database servers is the archive of WAL files that both share: primary writing to the archive, standby reading from @@ -639,7 +959,7 @@ if (!triggered) and design. One potential option is the restore_command command. It is executed once for each WAL file, but the process running the restore_command is created and dies for - each file, so there is no daemon or server process, and + each file, so there is no daemon or server process, and signals or a signal handler cannot be used. Therefore, the restore_command is not suitable to trigger failover. It is possible to use a simple timeout facility, especially if @@ -658,7 +978,6 @@ if (!triggered) files are no longer required, assuming the archive is writable from the standby server. - Implementation @@ -754,243 +1073,6 @@ if (!triggered) - - Streaming Replication - - - Streaming Replication - - - - Streaming replication allows a standby server to stay more up-to-date - than is possible with file-based log shipping. The standby connects - to the primary, which streams WAL records to the standby as they're - generated, without waiting for the WAL file to be filled. - - - - Streaming replication is asynchronous, so there is still a small delay - between committing a transaction in the primary and for the changes to - become visible in the standby. The delay is however much smaller than with - file-based log shipping, typically under one second assuming the standby - is powerful enough to keep up with the load. With streaming replication, - archive_timeout is not required to reduce the data loss - window. - - - - Streaming replication relies on file-based continuous archiving for - making the base backup and for allowing the standby to catch up if it is - disconnected from the primary for long enough for the primary to - delete old WAL files still required by the standby. - - - - Setup - - The short procedure for configuring streaming replication is as follows. - For full details of each step, refer to other sections as noted. - - - - - Set up primary and standby systems as near identically as possible, - including two identical copies of PostgreSQL at the - same release level. - - - - - Set up continuous archiving from the primary to a WAL archive located - in a directory on the standby server. In particular, set - and - - to archive WAL files in a location accessible from the standby - (see ). - - - - - - Set and authentication options - (see pg_hba.conf) on the primary so that the standby server can connect to - the replication pseudo-database on the primary server (see - ). - - - On systems that support the keepalive socket option, setting - , - and - helps the master promptly - notice a broken connection. - - - - - Set the maximum number of concurrent connections from the standby servers - (see for details). - - - - - Start the PostgreSQL server on the primary.
- - - - - Make a base backup of the primary server (see - ), and load this data onto the - standby. Note that all files present in pg_xlog - and pg_xlog/archive_status on the standby - server should be removed because they might be obsolete. - - - - - If you're setting up the standby server for high availability purposes, - set up WAL archiving, connections and authentication like the primary - server, because the standby server will work as a primary server after - failover. If you're setting up the standby server for reporting - purposes, with no plans to fail over to it, configure the standby - accordingly. - - - - - Create a recovery command file recovery.conf in the data - directory on the standby server. Set restore_command - as you would in normal recovery from a continuous archiving backup - (see ). pg_standby or - similar tools that wait for the next WAL file to arrive cannot be used - with streaming replication, as the server handles retries and waiting - itself. Enable standby_mode. Set - primary_conninfo to point to the primary server. - - - - - - Start the PostgreSQL server on the standby. The standby - server will go into recovery mode and proceed to receive WAL records - from the primary and apply them continuously. - - - - - - - - Authentication - - It is very important that the access privilege for replication be setup - properly so that only trusted users can read the WAL stream, because it is - easy to extract privileged information from it. - - - Only the superuser is allowed to connect to the primary as the replication - standby. So a role with the SUPERUSER and LOGIN - privileges needs to be created in the primary. - - - Client authentication for replication is controlled by the - pg_hba.conf record specifying replication in the - database field. For example, if the standby is running on - host IP 192.168.1.100 and the superuser's name for replication - is foo, the administrator can add the following line to the - pg_hba.conf file on the primary. - - -# Allow the user "foo" from host 192.168.1.100 to connect to the primary -# as a replication standby if the user's password is correctly supplied. -# -# TYPE DATABASE USER CIDR-ADDRESS METHOD -host replication foo 192.168.1.100/32 md5 - - - - The host name and port number of the primary, connection user name, - and password are specified in the recovery.conf file or - the corresponding environment variable on the standby. - For example, if the primary is running on host IP 192.168.1.50, - port 5432, the superuser's name for replication is - foo, and the password is foopass, the administrator - can add the following line to the recovery.conf file on the - standby. - - -# The standby connects to the primary that is running on host 192.168.1.50 -# and port 5432 as the user "foo" whose password is "foopass". -primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass' - - - You do not need to specify database=replication in the - primary_conninfo. The required option will be added - automatically. If you mention the database parameter at all within - primary_conninfo then a FATAL error will be raised. - - - - - - Failover - - - If the primary server fails then the standby server should begin - failover procedures. - - - - If the standby server fails then no failover need take place. If the - standby server can be restarted, even some time later, then the recovery - process can also be restarted immediately, taking advantage of - restartable recovery. 
If the standby server cannot be restarted, then a - full new standby server instance should be created. - - - - If the primary server fails and the standby server becomes the - new primary, and then the old primary restarts, you must have - a mechanism for informing the old primary that it is no longer the primary. This is - sometimes known as STONITH (Shoot The Other Node In The Head), which is - necessary to avoid situations where both systems think they are the - primary, which will lead to confusion and ultimately data loss. - - - - Many failover systems use just two systems, the primary and the standby, - connected by some kind of heartbeat mechanism to continually verify the - connectivity between the two and the viability of the primary. It is - also possible to use a third system (called a witness server) to prevent - some cases of inappropriate failover, but the additional complexity - might not be worthwhile unless it is set up with sufficient care and - rigorous testing. - - - - Once failover to the standby occurs, there is only a - single server in operation. This is known as a degenerate state. - The former standby is now the primary, but the former primary is down - and might stay down. To return to normal operation, a standby server - must be recreated, - either on the former primary system when it comes up, or on a third, - possibly new, system. Once complete the primary and standby can be - considered to have switched roles. Some people choose to use a third - server to provide backup for the new primary until the new standby - server is recreated, - though clearly this complicates the system configuration and - operational processes. - - - - So, switching from primary to standby server can be fast but requires - some time to re-prepare the failover cluster. Regular switching from - primary to standby is useful, since it allows regular downtime on - each system for maintenance. This also serves as a test of the - failover mechanism to ensure that it will really work when you need it. - Written administration procedures are advised. - - - Hot Standby