mirror of
https://git.postgresql.org/git/postgresql.git
synced 2024-09-15 21:10:26 +02:00
3c49c6facb
Since some preparation work had already been done, the only source changes left were changing empty-element tags like <xref linkend="foo"> to <xref linkend="foo"/>, and changing the DOCTYPE. The source files are still named *.sgml, but they are actually XML files now. Renaming could be considered later. In the build system, the intermediate step to convert from SGML to XML is removed. Everything is built straight from the source files again. The OpenSP (or the old SP) package is no longer needed. The documentation toolchain instructions are updated and are much simpler now. Peter Eisentraut, Alexander Lakhin, Jürgen Purtz
<!-- doc/src/sgml/replication-origins.sgml -->
<chapter id="replication-origins">
 <title>Replication Progress Tracking</title>

 <indexterm zone="replication-origins">
  <primary>Replication Progress Tracking</primary>
 </indexterm>
 <indexterm zone="replication-origins">
  <primary>Replication Origins</primary>
 </indexterm>

 <para>
  Replication origins are intended to make it easier to implement
  logical replication solutions on top
  of <link linkend="logicaldecoding">logical decoding</link>.
  They provide a solution to two common problems:
  <itemizedlist>
   <listitem>
    <para>How to safely keep track of replication progress</para>
   </listitem>
   <listitem>
    <para>How to change replication behavior based on the
     origin of a row; for example, to prevent loops in bi-directional
     replication setups</para>
   </listitem>
  </itemizedlist>
 </para>

 <para>
  Replication origins have just two properties, a name and an OID. The name,
  which is what should be used to refer to the origin across systems, is
  free-form <type>text</type>. It should be chosen so that conflicts
  between replication origins created by different replication solutions
  are unlikely; e.g., by prefixing it with the replication solution's name.
  The OID is used only to avoid having to store the full name
  in situations where space efficiency is important. It should never be shared
  across systems.
 </para>

 <para>
  Replication origins can be created using the function
  <link linkend="pg-replication-origin-create"><function>pg_replication_origin_create()</function></link>;
  dropped using
  <link linkend="pg-replication-origin-drop"><function>pg_replication_origin_drop()</function></link>;
  and seen in the
  <link linkend="catalog-pg-replication-origin"><structname>pg_replication_origin</structname></link>
  system catalog.
 </para>
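
 <para>
  As a minimal sketch of these calls, the following session creates an
  origin, looks it up in the catalog, and drops it again. The origin name
  <literal>myrepl_node_a</literal> is a hypothetical example that uses a
  solution-specific prefix; the OID returned will differ between systems.
  By default, these functions can only be called by a superuser.
 </para>
<programlisting>
-- Create an origin; the function returns the OID assigned to it.
SELECT pg_replication_origin_create('myrepl_node_a');

-- List all origins in this cluster (roident is the OID, roname the name).
SELECT roident, roname FROM pg_replication_origin;

-- Remove the origin once it is no longer needed.
SELECT pg_replication_origin_drop('myrepl_node_a');
</programlisting>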

 <para>
  One nontrivial part of building a replication solution is to keep track of
  replay progress in a safe manner. When the applying process, or the whole
  cluster, dies, it needs to be possible to find out up to where data has
  successfully been replicated. Naive solutions to this, such as updating a
  row in a table for every replayed transaction, have problems like run-time
  overhead and database bloat.
 </para>

 <para>
  Using the replication origin infrastructure, a session can be
  marked as replaying from a remote node (using the
  <link linkend="pg-replication-origin-session-setup"><function>pg_replication_origin_session_setup()</function></link>
  function). Additionally, the <acronym>LSN</acronym> and commit
  time stamp of every source transaction can be configured on a
  per-transaction basis using
  <link linkend="pg-replication-origin-xact-setup"><function>pg_replication_origin_xact_setup()</function></link>.
  If that is done, replication progress will persist in a crash-safe
  manner. Replay progress for all replication origins can be seen in the
  <link linkend="view-pg-replication-origin-status"><structname>pg_replication_origin_status</structname></link>
  view. An individual origin's progress, e.g., when resuming
  replication, can be acquired using
  <link linkend="pg-replication-origin-progress"><function>pg_replication_origin_progress()</function></link>
  for any origin or
  <link linkend="pg-replication-origin-session-progress"><function>pg_replication_origin_session_progress()</function></link>
  for the origin configured in the current session.
 </para>
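
 <para>
  For an apply process, the sequence described above might look like the
  following sketch. The origin name, the table
  <literal>replicated_tbl</literal>, and the LSN and time stamp values are
  hypothetical placeholders; in a real solution the LSN and time stamp come
  from the change stream received from the remote node.
 </para>
<programlisting>
-- Assumed setup for this sketch: a target table and an existing origin.
CREATE TABLE IF NOT EXISTS replicated_tbl (id int PRIMARY KEY, payload text);
SELECT pg_replication_origin_create('myrepl_node_a');

-- Mark this session as replaying changes from the given origin.
SELECT pg_replication_origin_session_setup('myrepl_node_a');

BEGIN;
-- Record the remote commit LSN and commit time stamp of the transaction
-- that is being replayed (placeholder values).
SELECT pg_replication_origin_xact_setup('0/16B3748', '2024-01-01 12:00:00+00');
-- Apply the remote transaction's changes.
INSERT INTO replicated_tbl (id, payload) VALUES (1, 'replayed row');
COMMIT;

-- Progress of every origin, as recorded by the server.
SELECT external_id, remote_lsn, local_lsn FROM pg_replication_origin_status;

-- Progress of one named origin, and of the origin set up in this session.
-- The boolean argument controls whether the corresponding local
-- transaction is guaranteed to have been flushed to disk.
SELECT pg_replication_origin_progress('myrepl_node_a', true);
SELECT pg_replication_origin_session_progress(true);

-- Detach the session from the origin when replay is finished.
SELECT pg_replication_origin_session_reset();
</programlisting>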

 <para>
  In replication topologies more complex than replication from exactly one
  system to one other system, another problem can be that it is hard to avoid
  replicating replayed rows again. That can lead both to cycles in the
  replication and to inefficiencies. Replication origins provide an optional
  mechanism to recognize and prevent that. When configured using the functions
  described above, every change and transaction passed to
  output plugin callbacks (see <xref linkend="logicaldecoding-output-plugin"/>)
  generated by the session is tagged with the replication origin of the
  generating session. This allows treating them differently in the output
  plugin, e.g., ignoring all but locally-originating rows. Additionally,
  the <link linkend="logicaldecoding-output-plugin-filter-origin"><function>filter_by_origin_cb</function></link>
  callback can be used
  to filter the logical decoding change stream based on the
  source. While less flexible, filtering via that callback is
  considerably more efficient than doing it in the output plugin.
 </para>
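
 <para>
  One way to observe origin-based filtering is the
  <filename>test_decoding</filename> contrib module, whose
  <literal>only-local</literal> option filters out transactions that carry
  a replication origin. The sketch below assumes that
  <varname>wal_level</varname> is set to <literal>logical</literal> and that
  <filename>test_decoding</filename> is installed; the slot, origin, and
  table names are hypothetical.
 </para>
<programlisting>
-- Assumed setup for this sketch: a target table, created before the slot.
CREATE TABLE IF NOT EXISTS replicated_tbl (id int PRIMARY KEY, payload text);

-- Create a logical slot that uses the test_decoding output plugin.
SELECT pg_create_logical_replication_slot('demo_slot', 'test_decoding');

-- A locally originating change.
INSERT INTO replicated_tbl (id, payload) VALUES (2, 'local row');

-- A change made while the session is attached to an origin,
-- as an apply process would do.
SELECT pg_replication_origin_create('myrepl_node_b');
SELECT pg_replication_origin_session_setup('myrepl_node_b');
INSERT INTO replicated_tbl (id, payload) VALUES (3, 'replayed row');
SELECT pg_replication_origin_session_reset();

-- With only-local enabled, only the first INSERT is expected in the output.
SELECT data
  FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL,
                                   'skip-empty-xacts', '1',
                                   'only-local', '1');

-- Clean up.
SELECT pg_drop_replication_slot('demo_slot');
SELECT pg_replication_origin_drop('myrepl_node_b');
</programlisting>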

</chapter>