<!-- doc/src/sgml/replication-origins.sgml -->
<chapter id="replication-origins">
 <title>Replication Progress Tracking</title>
 <indexterm zone="replication-origins">
  <primary>Replication Progress Tracking</primary>
 </indexterm>
 <indexterm zone="replication-origins">
  <primary>Replication Origins</primary>
 </indexterm>

 <para>
  Replication origins are intended to make it easier to implement
  logical replication solutions on top
  of <xref linkend="logicaldecoding">. They provide a solution to two
  common problems:
  <itemizedlist>
   <listitem><para>How to safely keep track of replication progress</para></listitem>
   <listitem><para>How to change replication behavior based on the
   origin of a row; e.g. to avoid loops in bidirectional replication
   setups</para></listitem>
  </itemizedlist>
 </para>

 <para>
  Replication origins consist of a name and an OID. The name, which
  is what should be used to refer to the origin across systems, is
  free-form text. It should be chosen so that conflicts
  between replication origins created by different replication
  solutions are unlikely; e.g. by prefixing the replication solution's
  name to it. The OID is used only to avoid having to store the long
  version in situations where space efficiency is important. It should
  never be shared between systems.
 </para>

 <para>
  Replication origins can be created using the function
  <link linkend="pg-replication-origin-create"><function>pg_replication_origin_create()</function></link>;
  dropped using
  <link linkend="pg-replication-origin-drop"><function>pg_replication_origin_drop()</function></link>;
  and seen in the
  <link linkend="catalog-pg-replication-origin"><structname>pg_replication_origin</structname></link>
  system catalog.
 </para>
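 <para>
  To make the split between the globally meaningful name and the locally
  assigned identifier concrete, the following Python sketch models one
  server's set of replication origins in memory. It is not PostgreSQL code:
  the class <literal>OriginRegistry</literal>, the helper
  <literal>origin_name()</literal> and the <literal>solution_peer</literal>
  naming convention are hypothetical. Its <literal>create()</literal>,
  <literal>drop()</literal> and <literal>entries()</literal> methods loosely
  stand in for <function>pg_replication_origin_create()</function>,
  <function>pg_replication_origin_drop()</function> and the
  <structname>pg_replication_origin</structname> catalog. Limiting ids to
  16 bits and reserving 0 for locally made changes are assumptions of the
  model.
 </para>

```python
class OriginRegistry:
    """Toy, in-memory model of one server's replication origins (not PostgreSQL code)."""

    MAX_ID = 2**16 - 1  # assumption: ids fit in 16 bits, 0 is reserved for "local"

    def __init__(self):
        self._ids = {}  # origin name -> id assigned by *this* server only

    def create(self, name):
        """Register a name and return its locally assigned id."""
        if name in self._ids:
            raise ValueError(f"replication origin {name!r} already exists")
        used = set(self._ids.values())
        for candidate in range(1, self.MAX_ID + 1):  # hand out the lowest free id
            if candidate not in used:
                self._ids[name] = candidate
                return candidate
        raise RuntimeError("no free replication origin id")

    def drop(self, name):
        """Forget an origin; raises KeyError if it does not exist."""
        del self._ids[name]

    def lookup(self, name):
        return self._ids[name]

    def entries(self):
        """(name, id) pairs ordered by id, like a small catalog listing."""
        return sorted(self._ids.items(), key=lambda item: item[1])


def origin_name(solution, peer):
    """Hypothetical naming convention: prefix with the solution's name."""
    return f"{solution}_{peer}"


# Two servers register the same peer names in a different order.
server_a, server_b = OriginRegistry(), OriginRegistry()
server_a.create(origin_name("mysync", "peer1"))
server_b.create(origin_name("mysync", "peer2"))
server_b.create(origin_name("mysync", "peer1"))
print(server_a.lookup("mysync_peer1"), server_b.lookup("mysync_peer1"))  # 1 2
```

 <para>
  Because each server assigns ids independently, the same name ends up with
  different ids on different servers, which is why only the name should be
  used to refer to an origin across systems.
 </para>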

 <para>
  When replicating from one system to another (regardless of whether
  those two are in the same cluster, or even the same database), one
  nontrivial part of building a replication solution is keeping track of
  replay progress in a safe manner. When the applying process, or the whole
  cluster, dies, it needs to be possible to find out up to where data has
  successfully been replicated. Naive solutions to this, such as updating a
  row in a table for every replayed transaction, have problems like run-time
  overhead and database bloat.
 </para>
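 <para>
  The following Python sketch models only the correctness half of this
  problem: whatever records progress has to become durable together with the
  replayed transaction, otherwise a crash between the two steps makes the
  restarted applier replay a transaction twice. Integer LSNs, the
  <literal>Target</literal> class and the crash simulation are hypothetical
  simplifications, and the sketch does not model the run-time overhead or
  bloat of the naive approach.
 </para>

```python
from dataclasses import dataclass, field


class Crash(Exception):
    """Simulated death of the apply process."""


@dataclass
class Target:
    rows: list = field(default_factory=list)
    progress: int = 0  # highest remote commit LSN recorded as replayed


def replay(target, stream, atomic, crash_at=None):
    """Apply (remote_lsn, row) pairs, resuming after target.progress."""
    for lsn, row in stream:
        if lsn <= target.progress:
            continue  # replayed before the last crash; skip it
        if atomic:
            # Row and progress become durable in the same commit.
            target.rows.append(row)
            target.progress = lsn
            if lsn == crash_at:
                raise Crash
        else:
            # Row committed first, progress recorded in a separate step.
            target.rows.append(row)
            if lsn == crash_at:
                raise Crash  # dies between the two steps
            target.progress = lsn


def crash_and_resume(atomic):
    target = Target()
    stream = [(10, "a"), (20, "b"), (30, "c")]
    try:
        replay(target, stream, atomic, crash_at=20)
    except Crash:
        pass
    replay(target, stream, atomic)  # restart from the recorded progress
    return target.rows


print(crash_and_resume(True), crash_and_resume(False))
```

 <para>
  With <literal>atomic=True</literal> the resumed run skips everything up to
  the recorded progress and each row is applied once; with
  <literal>atomic=False</literal> the row committed just before the crash is
  applied a second time.
 </para>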

 <para>
  Using the replication origin infrastructure, a session can be
  marked as replaying from a remote node (using the
  <link linkend="pg-replication-origin-session-setup"><function>pg_replication_origin_session_setup()</function></link>
  function). Additionally, the <acronym>LSN</acronym> and commit
  timestamp of every source transaction can be configured on a
  per-transaction basis using
  <link linkend="pg-replication-origin-xact-setup"><function>pg_replication_origin_xact_setup()</function></link>.
  If that is done, replication progress will persist in a crash-safe
  manner. Replay progress for all replication origins can be seen in the
  <link linkend="catalog-pg-replication-origin-status">
   <structname>pg_replication_origin_status</structname>
  </link> view. An individual origin's progress, e.g. when resuming
  replication, can be acquired using
  <link linkend="pg-replication-origin-progress"><function>pg_replication_origin_progress()</function></link>
  for any origin or
  <link linkend="pg-replication-origin-session-progress"><function>pg_replication_origin_session_progress()</function></link>
  for the origin configured in the current session.
 </para>
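 <para>
  As a sketch of the call sequence described above, the following Python
  function only builds the SQL statements an apply process could send during
  one session; it does not talk to a server. The helpers
  <literal>replay_plan()</literal> and <literal>resume_point_query()</literal>,
  the <literal>%s</literal> placeholder style, the
  <literal>accounts</literal> table and the sample LSN and timestamp values
  are hypothetical; the SQL functions are the ones named in this section.
  <function>pg_replication_origin_session_setup()</function> is issued once
  per session, while <function>pg_replication_origin_xact_setup()</function>
  is issued inside each replayed transaction before it commits.
 </para>

```python
def replay_plan(origin_name, transactions):
    """Return (sql, params) pairs an apply process could execute, in order.

    transactions: iterable of (remote_commit_lsn, remote_commit_ts, statements),
    where statements are (sql, params) pairs for the replayed row changes.
    """
    # Once per session: mark this session as replaying from origin_name.
    plan = [("SELECT pg_replication_origin_session_setup(%s)", (origin_name,))]
    for lsn, commit_ts, statements in transactions:
        plan.append(("BEGIN", None))
        # Tag the local transaction with the source transaction's LSN and
        # commit timestamp so that progress is recorded when it commits.
        plan.append(("SELECT pg_replication_origin_xact_setup(%s, %s)",
                     (lsn, commit_ts)))
        plan.extend(statements)
        plan.append(("COMMIT", None))
    return plan


def resume_point_query(origin_name):
    # flush=true: guarantee the local transaction corresponding to the
    # reported position has been flushed to disk before resuming from it.
    return ("SELECT pg_replication_origin_progress(%s, true)", (origin_name,))


plan = replay_plan("mysync_peer1", [
    ("0/3000060", "2015-04-29 19:30:53+02",
     [("INSERT INTO accounts VALUES (%s, %s)", (1, "alice"))]),
])
for sql, params in plan:
    print(sql, params)
```

 <para>
  With a driver such as psycopg2, each pair could be passed to
  <literal>cursor.execute(sql, params)</literal> on a connection with
  autocommit enabled, so that the explicit <command>BEGIN</command> and
  <command>COMMIT</command> delimit the replayed transactions; that wiring is
  an assumption of this sketch.
 </para>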

 <para>
  In replication topologies more complex than replication from exactly one
  system to one other system, another problem can be that it is hard to avoid
  replicating replayed rows again. That can lead both to cycles in the
  replication and to inefficiencies. Replication origins provide an optional
  mechanism to recognize and prevent that. When configured using the functions
  referenced in the previous paragraph, every change and transaction passed to
  output plugin callbacks (see <xref linkend="logicaldecoding-output-plugin">)
  generated by the session is tagged with the replication origin of the
  generating session. This allows treating them differently in the output
  plugin, e.g. ignoring all but locally originating rows. Additionally,
  the <link linkend="logicaldecoding-output-plugin-filter-by-origin">
  <function>filter_by_origin_cb</function></link> callback can be used
  to filter the logical decoding change stream based on the
  source. While less flexible, filtering via that callback is
  considerably more efficient.
 </para>
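 <para>
  The loop-prevention idea can be illustrated with a toy simulation of two
  nodes replicating to each other. It is written in Python and is not an
  output plugin: in PostgreSQL, <function>filter_by_origin_cb</function> is a
  C callback that receives the decoding context and the origin's id and
  returns true to have that origin's changes filtered out. The predicate
  <literal>only_local()</literal> mirrors that convention; treating origin 0
  as "made locally", the peer id, the <literal>Change</literal> class and
  <literal>changes_sent()</literal> are hypothetical. The model assumes each
  apply process tags replayed rows with the origin it replays from, as
  arranged by <function>pg_replication_origin_session_setup()</function>.
 </para>

```python
from dataclasses import dataclass

LOCAL = 0  # model assumption: origin 0 marks a change made locally, not replayed


@dataclass
class Change:
    row: str
    origin: int  # LOCAL, or the id of the origin it was replayed from


def only_local(origin):
    """filter_by_origin-style predicate: True means 'skip this change'."""
    return origin != LOCAL


def never_skip(origin):
    return False


def changes_sent(rounds, skip):
    """Count changes shipped in a two-node, bidirectional toy setup."""
    peer_id = 1  # hypothetical id each node assigned to its peer's origin
    wal = {"A": [Change("x", LOCAL)], "B": []}
    sent_upto = {"A": 0, "B": 0}
    sent = 0
    for _ in range(rounds):
        for src, dst in (("A", "B"), ("B", "A")):
            pending = wal[src][sent_upto[src]:]
            sent_upto[src] = len(wal[src])
            for change in pending:
                if skip(change.origin):
                    continue
                # The apply process on dst tags the replayed row with src's origin.
                wal[dst].append(Change(change.row, peer_id))
                sent += 1
    return sent


print(changes_sent(3, never_skip), changes_sent(3, only_local))  # 6 1
```

 <para>
  Without filtering, the single row bounces between the nodes in every round;
  with <literal>only_local()</literal> it is sent exactly once.
 </para>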
</chapter>
Introduce replication progress tracking infrastructure.

When implementing a replication solution on top of logical decoding, two
related problems exist:
* How to safely keep track of replication progress
* How to change replication behavior, based on the origin of a row;
  e.g. to avoid loops in bi-directional replication setups

The solution to these problems, as implemented here, consists of three
parts:

1) 'replication origins', which identify nodes in a replication setup.
2) 'replication progress tracking', which remembers, for each
   replication origin, how far replay has progressed in an efficient and
   crash-safe manner.
3) The ability to filter out changes performed at the behest of a
   replication origin during logical decoding; this allows complex
   replication topologies, e.g. by filtering all replayed changes out.

Most of this could also be implemented in "userspace", e.g. by inserting
additional rows containing origin information, but that ends up being
much less efficient and more complicated. We don't want to require
various replication solutions to reimplement logic for this
independently. The infrastructure is intended to be generic enough to be
reusable.

This infrastructure also replaces the 'nodeid' infrastructure of commit
timestamps. It is intended to provide all the former capabilities,
except that there are only 2^16 different origins; but now they
integrate with logical decoding. Additionally, more functionality is
accessible via SQL. Since the commit timestamp infrastructure has also
been introduced in 9.5 (commit 73c986add), changing the API is not a
problem.

For now the number of origins for which the replication progress can be
tracked simultaneously is determined by the max_replication_slots
GUC. That GUC is not a perfect match to configure this, but there
doesn't seem to be sufficient reason to introduce a separate new one.

Bumps both catversion and wal page magic.

Author: Andres Freund, with contributions from Petr Jelinek and Craig Ringer
Reviewed-By: Heikki Linnakangas, Petr Jelinek, Robert Haas, Steve Singer
Discussion: 20150216002155.GI15326@awork2.anarazel.de, 20140923182422.GA15776@alap3.anarazel.de, 20131114172632.GE7522@alap2.anarazel.de
2015-04-29 19:30:53 +02:00