postgresql/doc/src/sgml/ref/create_subscription.sgml

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

558 lines
23 KiB
Plaintext
Raw Normal View History

<!--
doc/src/sgml/ref/create_subscription.sgml
PostgreSQL documentation
-->
<refentry id="sql-createsubscription">
<indexterm zone="sql-createsubscription">
<primary>CREATE SUBSCRIPTION</primary>
</indexterm>
<refmeta>
<refentrytitle>CREATE SUBSCRIPTION</refentrytitle>
<manvolnum>7</manvolnum>
<refmiscinfo>SQL - Language Statements</refmiscinfo>
</refmeta>
<refnamediv>
<refname>CREATE SUBSCRIPTION</refname>
<refpurpose>define a new subscription</refpurpose>
</refnamediv>
<refsynopsisdiv>
<synopsis>
CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceable>
CONNECTION '<replaceable class="parameter">conninfo</replaceable>'
PUBLICATION <replaceable class="parameter">publication_name</replaceable> [, ...]
[ WITH ( <replaceable class="parameter">subscription_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) ]
</synopsis>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para>
<command>CREATE SUBSCRIPTION</command> adds a new logical-replication
Add new predefined role pg_create_subscription. This role can be granted to non-superusers to allow them to issue CREATE SUBSCRIPTION. The non-superuser must additionally have CREATE permissions on the database in which the subscription is to be created. Most forms of ALTER SUBSCRIPTION, including ALTER SUBSCRIPTION .. SKIP, now require only that the role performing the operation own the subscription, or inherit the privileges of the owner. However, to use ALTER SUBSCRIPTION ... RENAME or ALTER SUBSCRIPTION ... OWNER TO, you also need CREATE permission on the database. This is similar to what we do for schemas. To change the owner of a schema, you must also have permission to SET ROLE to the new owner, similar to what we do for other object types. Non-superusers are required to specify a password for authentication and the remote side must use the password, similar to what is required for postgres_fdw and dblink. A superuser who wants a non-superuser to own a subscription that does not rely on password authentication may set the new password_required=false property on that subscription. A non-superuser may not set password_required=false and may not modify a subscription that already has password_required=false. This new password_required subscription property works much like the eponymous postgres_fdw property. In both cases, the actual semantics are that a password is not required if either (1) the property is set to false or (2) the relevant user is the superuser. Patch by me, reviewed by Andres Freund, Jeff Davis, Mark Dilger, and Stephen Frost (but some of those people did not fully endorse all of the decisions that the patch makes). Discussion: http://postgr.es/m/CA+TgmoaDH=0Xj7OBiQnsHTKcF2c4L+=gzPBUKSJLh8zed2_+Dg@mail.gmail.com
2023-03-30 17:37:19 +02:00
subscription. The user that creates a subscription becomes the owner
of the subscription. The subscription name must be distinct from the name of
any existing subscription in the current database.
</para>
<para>
A subscription represents a replication connection to the publisher.
Hence, in addition to adding definitions in the local catalogs, this
command normally creates a replication slot on the publisher.
</para>
<para>
A logical replication worker will be started to replicate data for the new
subscription at the commit of the transaction where this command is run,
unless the subscription is initially disabled.
</para>
Add new predefined role pg_create_subscription. This role can be granted to non-superusers to allow them to issue CREATE SUBSCRIPTION. The non-superuser must additionally have CREATE permissions on the database in which the subscription is to be created. Most forms of ALTER SUBSCRIPTION, including ALTER SUBSCRIPTION .. SKIP, now require only that the role performing the operation own the subscription, or inherit the privileges of the owner. However, to use ALTER SUBSCRIPTION ... RENAME or ALTER SUBSCRIPTION ... OWNER TO, you also need CREATE permission on the database. This is similar to what we do for schemas. To change the owner of a schema, you must also have permission to SET ROLE to the new owner, similar to what we do for other object types. Non-superusers are required to specify a password for authentication and the remote side must use the password, similar to what is required for postgres_fdw and dblink. A superuser who wants a non-superuser to own a subscription that does not rely on password authentication may set the new password_required=false property on that subscription. A non-superuser may not set password_required=false and may not modify a subscription that already has password_required=false. This new password_required subscription property works much like the eponymous postgres_fdw property. In both cases, the actual semantics are that a password is not required if either (1) the property is set to false or (2) the relevant user is the superuser. Patch by me, reviewed by Andres Freund, Jeff Davis, Mark Dilger, and Stephen Frost (but some of those people did not fully endorse all of the decisions that the patch makes). Discussion: http://postgr.es/m/CA+TgmoaDH=0Xj7OBiQnsHTKcF2c4L+=gzPBUKSJLh8zed2_+Dg@mail.gmail.com
2023-03-30 17:37:19 +02:00
<para>
To be able to create a subscription, you must have the privileges of
Add new predefined role pg_create_subscription. This role can be granted to non-superusers to allow them to issue CREATE SUBSCRIPTION. The non-superuser must additionally have CREATE permissions on the database in which the subscription is to be created. Most forms of ALTER SUBSCRIPTION, including ALTER SUBSCRIPTION .. SKIP, now require only that the role performing the operation own the subscription, or inherit the privileges of the owner. However, to use ALTER SUBSCRIPTION ... RENAME or ALTER SUBSCRIPTION ... OWNER TO, you also need CREATE permission on the database. This is similar to what we do for schemas. To change the owner of a schema, you must also have permission to SET ROLE to the new owner, similar to what we do for other object types. Non-superusers are required to specify a password for authentication and the remote side must use the password, similar to what is required for postgres_fdw and dblink. A superuser who wants a non-superuser to own a subscription that does not rely on password authentication may set the new password_required=false property on that subscription. A non-superuser may not set password_required=false and may not modify a subscription that already has password_required=false. This new password_required subscription property works much like the eponymous postgres_fdw property. In both cases, the actual semantics are that a password is not required if either (1) the property is set to false or (2) the relevant user is the superuser. Patch by me, reviewed by Andres Freund, Jeff Davis, Mark Dilger, and Stephen Frost (but some of those people did not fully endorse all of the decisions that the patch makes). Discussion: http://postgr.es/m/CA+TgmoaDH=0Xj7OBiQnsHTKcF2c4L+=gzPBUKSJLh8zed2_+Dg@mail.gmail.com
2023-03-30 17:37:19 +02:00
the <literal>pg_create_subscription</literal> role, as well as
<literal>CREATE</literal> privileges on the current database.
</para>
<para>
Additional information about subscriptions and logical replication as a
whole is available at <xref linkend="logical-replication-subscription"/> and
<xref linkend="logical-replication"/>.
</para>
</refsect1>
<refsect1>
<title>Parameters</title>
<variablelist>
<varlistentry id="sql-createsubscription-name">
<term><replaceable class="parameter">subscription_name</replaceable></term>
<listitem>
<para>
The name of the new subscription.
</para>
</listitem>
</varlistentry>
<varlistentry id="sql-createsubscription-connection">
<term><literal>CONNECTION '<replaceable class="parameter">conninfo</replaceable>'</literal></term>
<listitem>
<para>
The <application>libpq</application> connection string defining how
to connect to the publisher database. For details see
<xref linkend="libpq-connstring"/>.
</para>
</listitem>
</varlistentry>
<varlistentry id="sql-createsubscription-publication">
<term><literal>PUBLICATION <replaceable class="parameter">publication_name</replaceable> [, ...]</literal></term>
<listitem>
<para>
Names of the publications on the publisher to subscribe to.
</para>
</listitem>
</varlistentry>
<varlistentry id="sql-createsubscription-with">
<term><literal>WITH ( <replaceable class="parameter">subscription_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
<listitem>
<para>
This clause specifies optional parameters for a subscription.
</para>
<para>
The following parameters control what happens during subscription creation:
<variablelist>
<varlistentry id="sql-createsubscription-with-connect">
<term><literal>connect</literal> (<type>boolean</type>)</term>
<listitem>
<para>
Specifies whether the <command>CREATE SUBSCRIPTION</command>
command should connect to the publisher at all. The default
is <literal>true</literal>. Setting this to
<literal>false</literal> will force the values of
<literal>create_slot</literal>, <literal>enabled</literal> and
<literal>copy_data</literal> to <literal>false</literal>.
(You cannot combine setting <literal>connect</literal>
to <literal>false</literal> with
setting <literal>create_slot</literal>, <literal>enabled</literal>,
or <literal>copy_data</literal> to <literal>true</literal>.)
</para>
<para>
Since no connection is made when this option is
<literal>false</literal>, no tables are subscribed. To initiate
replication, you must manually create the replication slot, enable
the subscription, and refresh the subscription. See
<xref linkend="logical-replication-subscription-examples-deferred-slot"/>
for examples.
</para>
</listitem>
</varlistentry>
<varlistentry id="sql-createsubscription-with-create-slot">
<term><literal>create_slot</literal> (<type>boolean</type>)</term>
<listitem>
<para>
Specifies whether the command should create the replication slot on
the publisher. The default is <literal>true</literal>.
</para>
<para>
If set to <literal>false</literal>, you are responsible for
creating the publisher's slot in some other way. See
<xref linkend="logical-replication-subscription-examples-deferred-slot"/>
for examples.
</para>
</listitem>
</varlistentry>
<varlistentry id="sql-createsubscription-with-enabled">
<term><literal>enabled</literal> (<type>boolean</type>)</term>
<listitem>
<para>
Specifies whether the subscription should be actively replicating
or whether it should just be set up but not started yet. The default
is <literal>true</literal>.
</para>
</listitem>
</varlistentry>
<varlistentry id="sql-createsubscription-with-slot-name">
<term><literal>slot_name</literal> (<type>string</type>)</term>
<listitem>
<para>
Name of the publisher's replication slot to use. The default is
to use the name of the subscription for the slot name.
</para>
<para>
Setting <literal>slot_name</literal> to <literal>NONE</literal>
means there will be no replication slot associated with the
subscription. Such subscriptions must also have both
<literal>enabled</literal> and <literal>create_slot</literal> set to
<literal>false</literal>. Use this when you will be creating the
replication slot later manually. See
<xref linkend="logical-replication-subscription-examples-deferred-slot"/>
for examples.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
<para>
The following parameters control the subscription's replication
behavior after it has been created:
<variablelist>
<varlistentry id="sql-createsubscription-with-binary">
<term><literal>binary</literal> (<type>boolean</type>)</term>
<listitem>
<para>
Specifies whether the subscription will request the publisher to send
the data in binary format (as opposed to text). The default is
<literal>false</literal>. Any initial table synchronization copy
(see <literal>copy_data</literal>) also uses the same format. Binary
format can be faster than the text format, but it is less portable
across machine architectures and <productname>PostgreSQL</productname>
versions. Binary format is very data type specific; for example, it
will not allow copying from a <type>smallint</type> column to an
<type>integer</type> column, even though that would work fine in text
format. Even when this option is enabled, only data types having binary
send and receive functions will be transferred in binary. Note that
the initial synchronization requires all data types to have binary
send and receive functions, otherwise the synchronization will fail
(see <xref linkend="sql-createtype"/> for more about send/receive
functions).
</para>
<para>
When doing cross-version replication, it could be that the
publisher has a binary send function for some data type, but the
subscriber lacks a binary receive function for that type. In
such a case, data transfer will fail, and
the <literal>binary</literal> option cannot be used.
</para>
<para>
If the publisher is a <productname>PostgreSQL</productname> version
before 16, then any initial table synchronization will use text format
even if <literal>binary = true</literal>.
</para>
</listitem>
</varlistentry>
<varlistentry id="sql-createsubscription-with-copy-data">
<term><literal>copy_data</literal> (<type>boolean</type>)</term>
<listitem>
<para>
Specifies whether to copy pre-existing data in the publications
that are being subscribed to when the replication starts.
The default is <literal>true</literal>.
</para>
Allow specifying row filters for logical replication of tables. This feature adds row filtering for publication tables. When a publication is defined or modified, an optional WHERE clause can be specified. Rows that don't satisfy this WHERE clause will be filtered out. This allows a set of tables to be partially replicated. The row filter is per table. A new row filter can be added simply by specifying a WHERE clause after the table name. The WHERE clause must be enclosed by parentheses. The row filter WHERE clause for a table added to a publication that publishes UPDATE and/or DELETE operations must contain only columns that are covered by REPLICA IDENTITY. The row filter WHERE clause for a table added to a publication that publishes INSERT can use any column. If the row filter evaluates to NULL, it is regarded as "false". The WHERE clause only allows simple expressions that don't have user-defined functions, user-defined operators, user-defined types, user-defined collations, non-immutable built-in functions, or references to system columns. These restrictions could be addressed in the future. If you choose to do the initial table synchronization, only data that satisfies the row filters is copied to the subscriber. If the subscription has several publications in which a table has been published with different WHERE clauses, rows that satisfy ANY of the expressions will be copied. If a subscriber is a pre-15 version, the initial table synchronization won't use row filters even if they are defined in the publisher. The row filters are applied before publishing the changes. If the subscription has several publications in which the same table has been published with different filters (for the same publish operation), those expressions get OR'ed together so that rows satisfying any of the expressions will be replicated. This means all the other filters become redundant if (a) one of the publications have no filter at all, (b) one of the publications was created using FOR ALL TABLES, (c) one of the publications was created using FOR ALL TABLES IN SCHEMA and the table belongs to that same schema. If your publication contains a partitioned table, the publication parameter publish_via_partition_root determines if it uses the partition's row filter (if the parameter is false, the default) or the root partitioned table's row filter. Psql commands \dRp+ and \d <table-name> will display any row filters. Author: Hou Zhijie, Euler Taveira, Peter Smith, Ajin Cherian Reviewed-by: Greg Nancarrow, Haiying Tang, Amit Kapila, Tomas Vondra, Dilip Kumar, Vignesh C, Alvaro Herrera, Andres Freund, Wei Wang Discussion: https://www.postgresql.org/message-id/flat/CAHE3wggb715X%2BmK_DitLXF25B%3DjE6xyNCH4YOwM860JR7HarGQ%40mail.gmail.com
2022-02-22 03:24:12 +01:00
<para>
If the publications contain <literal>WHERE</literal> clauses, it
will affect what data is copied. Refer to the
<xref linkend="sql-createsubscription-notes" /> for details.
</para>
<para>
See <xref linkend="sql-createsubscription-notes"/> for details of how
<literal>copy_data = true</literal> can interact with the
<literal>origin</literal> parameter.
</para>
</listitem>
</varlistentry>
<varlistentry id="sql-createsubscription-with-streaming">
Perform apply of large transactions by parallel workers. Currently, for large transactions, the publisher sends the data in multiple streams (changes divided into chunks depending upon logical_decoding_work_mem), and then on the subscriber-side, the apply worker writes the changes into temporary files and once it receives the commit, it reads from those files and applies the entire transaction. To improve the performance of such transactions, we can instead allow them to be applied via parallel workers. In this approach, we assign a new parallel apply worker (if available) as soon as the xact's first stream is received and the leader apply worker will send changes to this new worker via shared memory. The parallel apply worker will directly apply the change instead of writing it to temporary files. However, if the leader apply worker times out while attempting to send a message to the parallel apply worker, it will switch to "partial serialize" mode - in this mode, the leader serializes all remaining changes to a file and notifies the parallel apply workers to read and apply them at the end of the transaction. We use a non-blocking way to send the messages from the leader apply worker to the parallel apply to avoid deadlocks. We keep this parallel apply assigned till the transaction commit is received and also wait for the worker to finish at commit. This preserves commit ordering and avoid writing to and reading from files in most cases. We still need to spill if there is no worker available. This patch also extends the SUBSCRIPTION 'streaming' parameter so that the user can control whether to apply the streaming transaction in a parallel apply worker or spill the change to disk. The user can set the streaming parameter to 'on/off', or 'parallel'. The parameter value 'parallel' means the streaming will be applied via a parallel apply worker, if available. The parameter value 'on' means the streaming transaction will be spilled to disk. The default value is 'off' (same as current behaviour). In addition, the patch extends the logical replication STREAM_ABORT message so that abort_lsn and abort_time can also be sent which can be used to update the replication origin in parallel apply worker when the streaming transaction is aborted. Because this message extension is needed to support parallel streaming, parallel streaming is not supported for publications on servers < PG16. Author: Hou Zhijie, Wang wei, Amit Kapila with design inputs from Sawada Masahiko Reviewed-by: Sawada Masahiko, Peter Smith, Dilip Kumar, Shi yu, Kuroda Hayato, Shveta Mallik Discussion: https://postgr.es/m/CAA4eK1+wyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw@mail.gmail.com
2023-01-09 02:30:39 +01:00
<term><literal>streaming</literal> (<type>enum</type>)</term>
<listitem>
<para>
Specifies whether to enable streaming of in-progress transactions
Perform apply of large transactions by parallel workers. Currently, for large transactions, the publisher sends the data in multiple streams (changes divided into chunks depending upon logical_decoding_work_mem), and then on the subscriber-side, the apply worker writes the changes into temporary files and once it receives the commit, it reads from those files and applies the entire transaction. To improve the performance of such transactions, we can instead allow them to be applied via parallel workers. In this approach, we assign a new parallel apply worker (if available) as soon as the xact's first stream is received and the leader apply worker will send changes to this new worker via shared memory. The parallel apply worker will directly apply the change instead of writing it to temporary files. However, if the leader apply worker times out while attempting to send a message to the parallel apply worker, it will switch to "partial serialize" mode - in this mode, the leader serializes all remaining changes to a file and notifies the parallel apply workers to read and apply them at the end of the transaction. We use a non-blocking way to send the messages from the leader apply worker to the parallel apply to avoid deadlocks. We keep this parallel apply assigned till the transaction commit is received and also wait for the worker to finish at commit. This preserves commit ordering and avoid writing to and reading from files in most cases. We still need to spill if there is no worker available. This patch also extends the SUBSCRIPTION 'streaming' parameter so that the user can control whether to apply the streaming transaction in a parallel apply worker or spill the change to disk. The user can set the streaming parameter to 'on/off', or 'parallel'. The parameter value 'parallel' means the streaming will be applied via a parallel apply worker, if available. The parameter value 'on' means the streaming transaction will be spilled to disk. The default value is 'off' (same as current behaviour). In addition, the patch extends the logical replication STREAM_ABORT message so that abort_lsn and abort_time can also be sent which can be used to update the replication origin in parallel apply worker when the streaming transaction is aborted. Because this message extension is needed to support parallel streaming, parallel streaming is not supported for publications on servers < PG16. Author: Hou Zhijie, Wang wei, Amit Kapila with design inputs from Sawada Masahiko Reviewed-by: Sawada Masahiko, Peter Smith, Dilip Kumar, Shi yu, Kuroda Hayato, Shveta Mallik Discussion: https://postgr.es/m/CAA4eK1+wyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw@mail.gmail.com
2023-01-09 02:30:39 +01:00
for this subscription. The default value is <literal>off</literal>,
meaning all transactions are fully decoded on the publisher and only
then sent to the subscriber as a whole.
</para>
<para>
If set to <literal>on</literal>, the incoming changes are written to
temporary files and then applied only after the transaction is
committed on the publisher and received by the subscriber.
</para>
<para>
If set to <literal>parallel</literal>, incoming changes are directly
applied via one of the parallel apply workers, if available. If no
parallel apply worker is free to handle streaming transactions then
the changes are written to temporary files and applied after the
transaction is committed. Note that if an error happens in a
parallel apply worker, the finish LSN of the remote transaction
might not be reported in the server log.
</para>
</listitem>
</varlistentry>
<varlistentry id="sql-createsubscription-with-synchronous-commit">
<term><literal>synchronous_commit</literal> (<type>enum</type>)</term>
<listitem>
<para>
The value of this parameter overrides the
<xref linkend="guc-synchronous-commit"/> setting within this
subscription's apply worker processes. The default value
is <literal>off</literal>.
</para>
<para>
It is safe to use <literal>off</literal> for logical replication:
If the subscriber loses transactions because of missing
synchronization, the data will be sent again from the publisher.
</para>
Add support for prepared transactions to built-in logical replication. To add support for streaming transactions at prepare time into the built-in logical replication, we need to do the following things: * Modify the output plugin (pgoutput) to implement the new two-phase API callbacks, by leveraging the extended replication protocol. * Modify the replication apply worker, to properly handle two-phase transactions by replaying them on prepare. * Add a new SUBSCRIPTION option "two_phase" to allow users to enable two-phase transactions. We enable the two_phase once the initial data sync is over. We however must explicitly disable replication of two-phase transactions during replication slot creation, even if the plugin supports it. We don't need to replicate the changes accumulated during this phase, and moreover, we don't have a replication connection open so we don't know where to send the data anyway. The streaming option is not allowed with this new two_phase option. This can be done as a separate patch. We don't allow to toggle two_phase option of a subscription because it can lead to an inconsistent replica. For the same reason, we don't allow to refresh the publication once the two_phase is enabled for a subscription unless copy_data option is false. Author: Peter Smith, Ajin Cherian and Amit Kapila based on previous work by Nikhil Sontakke and Stas Kelvich Reviewed-by: Amit Kapila, Sawada Masahiko, Vignesh C, Dilip Kumar, Takamichi Osumi, Greg Nancarrow Tested-By: Haiying Tang Discussion: https://postgr.es/m/02DA5F5E-CECE-4D9C-8B4B-418077E2C010@postgrespro.ru Discussion: https://postgr.es/m/CAA4eK1+opiV4aFTmWWUF9h_32=HfPOW9vZASHarT0UA5oBrtGw@mail.gmail.com
2021-07-14 04:03:50 +02:00
<para>
A different setting might be appropriate when doing synchronous
logical replication. The logical replication workers report the
positions of writes and flushes to the publisher, and when using
synchronous replication, the publisher will wait for the actual
flush. This means that setting
<literal>synchronous_commit</literal> for the subscriber to
<literal>off</literal> when the subscription is used for
synchronous replication might increase the latency for
<command>COMMIT</command> on the publisher. In this scenario, it
can be advantageous to set <literal>synchronous_commit</literal>
to <literal>local</literal> or higher.
</para>
Add support for prepared transactions to built-in logical replication. To add support for streaming transactions at prepare time into the built-in logical replication, we need to do the following things: * Modify the output plugin (pgoutput) to implement the new two-phase API callbacks, by leveraging the extended replication protocol. * Modify the replication apply worker, to properly handle two-phase transactions by replaying them on prepare. * Add a new SUBSCRIPTION option "two_phase" to allow users to enable two-phase transactions. We enable the two_phase once the initial data sync is over. We however must explicitly disable replication of two-phase transactions during replication slot creation, even if the plugin supports it. We don't need to replicate the changes accumulated during this phase, and moreover, we don't have a replication connection open so we don't know where to send the data anyway. The streaming option is not allowed with this new two_phase option. This can be done as a separate patch. We don't allow to toggle two_phase option of a subscription because it can lead to an inconsistent replica. For the same reason, we don't allow to refresh the publication once the two_phase is enabled for a subscription unless copy_data option is false. Author: Peter Smith, Ajin Cherian and Amit Kapila based on previous work by Nikhil Sontakke and Stas Kelvich Reviewed-by: Amit Kapila, Sawada Masahiko, Vignesh C, Dilip Kumar, Takamichi Osumi, Greg Nancarrow Tested-By: Haiying Tang Discussion: https://postgr.es/m/02DA5F5E-CECE-4D9C-8B4B-418077E2C010@postgrespro.ru Discussion: https://postgr.es/m/CAA4eK1+opiV4aFTmWWUF9h_32=HfPOW9vZASHarT0UA5oBrtGw@mail.gmail.com
2021-07-14 04:03:50 +02:00
</listitem>
</varlistentry>
<varlistentry id="sql-createsubscription-with-two-phase">
Add support for prepared transactions to built-in logical replication. To add support for streaming transactions at prepare time into the built-in logical replication, we need to do the following things: * Modify the output plugin (pgoutput) to implement the new two-phase API callbacks, by leveraging the extended replication protocol. * Modify the replication apply worker, to properly handle two-phase transactions by replaying them on prepare. * Add a new SUBSCRIPTION option "two_phase" to allow users to enable two-phase transactions. We enable the two_phase once the initial data sync is over. We however must explicitly disable replication of two-phase transactions during replication slot creation, even if the plugin supports it. We don't need to replicate the changes accumulated during this phase, and moreover, we don't have a replication connection open so we don't know where to send the data anyway. The streaming option is not allowed with this new two_phase option. This can be done as a separate patch. We don't allow to toggle two_phase option of a subscription because it can lead to an inconsistent replica. For the same reason, we don't allow to refresh the publication once the two_phase is enabled for a subscription unless copy_data option is false. Author: Peter Smith, Ajin Cherian and Amit Kapila based on previous work by Nikhil Sontakke and Stas Kelvich Reviewed-by: Amit Kapila, Sawada Masahiko, Vignesh C, Dilip Kumar, Takamichi Osumi, Greg Nancarrow Tested-By: Haiying Tang Discussion: https://postgr.es/m/02DA5F5E-CECE-4D9C-8B4B-418077E2C010@postgrespro.ru Discussion: https://postgr.es/m/CAA4eK1+opiV4aFTmWWUF9h_32=HfPOW9vZASHarT0UA5oBrtGw@mail.gmail.com
2021-07-14 04:03:50 +02:00
<term><literal>two_phase</literal> (<type>boolean</type>)</term>
<listitem>
<para>
Specifies whether two-phase commit is enabled for this subscription.
The default is <literal>false</literal>.
</para>
<para>
When two-phase commit is enabled, prepared transactions are sent
to the subscriber at the time of <command>PREPARE
TRANSACTION</command>, and are processed as two-phase
transactions on the subscriber too. Otherwise, prepared
transactions are sent to the subscriber only when committed, and
are then processed immediately by the subscriber.
Add support for prepared transactions to built-in logical replication. To add support for streaming transactions at prepare time into the built-in logical replication, we need to do the following things: * Modify the output plugin (pgoutput) to implement the new two-phase API callbacks, by leveraging the extended replication protocol. * Modify the replication apply worker, to properly handle two-phase transactions by replaying them on prepare. * Add a new SUBSCRIPTION option "two_phase" to allow users to enable two-phase transactions. We enable the two_phase once the initial data sync is over. We however must explicitly disable replication of two-phase transactions during replication slot creation, even if the plugin supports it. We don't need to replicate the changes accumulated during this phase, and moreover, we don't have a replication connection open so we don't know where to send the data anyway. The streaming option is not allowed with this new two_phase option. This can be done as a separate patch. We don't allow to toggle two_phase option of a subscription because it can lead to an inconsistent replica. For the same reason, we don't allow to refresh the publication once the two_phase is enabled for a subscription unless copy_data option is false. Author: Peter Smith, Ajin Cherian and Amit Kapila based on previous work by Nikhil Sontakke and Stas Kelvich Reviewed-by: Amit Kapila, Sawada Masahiko, Vignesh C, Dilip Kumar, Takamichi Osumi, Greg Nancarrow Tested-By: Haiying Tang Discussion: https://postgr.es/m/02DA5F5E-CECE-4D9C-8B4B-418077E2C010@postgrespro.ru Discussion: https://postgr.es/m/CAA4eK1+opiV4aFTmWWUF9h_32=HfPOW9vZASHarT0UA5oBrtGw@mail.gmail.com
2021-07-14 04:03:50 +02:00
</para>
<para>
The implementation of two-phase commit requires that replication
has successfully finished the initial table synchronization
phase. So even when <literal>two_phase</literal> is enabled for a
subscription, the internal two-phase state remains
temporarily <quote>pending</quote> until the initialization phase
completes. See column <structfield>subtwophasestate</structfield>
of <link linkend="catalog-pg-subscription"><structname>pg_subscription</structname></link>
Add support for prepared transactions to built-in logical replication. To add support for streaming transactions at prepare time into the built-in logical replication, we need to do the following things: * Modify the output plugin (pgoutput) to implement the new two-phase API callbacks, by leveraging the extended replication protocol. * Modify the replication apply worker, to properly handle two-phase transactions by replaying them on prepare. * Add a new SUBSCRIPTION option "two_phase" to allow users to enable two-phase transactions. We enable the two_phase once the initial data sync is over. We however must explicitly disable replication of two-phase transactions during replication slot creation, even if the plugin supports it. We don't need to replicate the changes accumulated during this phase, and moreover, we don't have a replication connection open so we don't know where to send the data anyway. The streaming option is not allowed with this new two_phase option. This can be done as a separate patch. We don't allow to toggle two_phase option of a subscription because it can lead to an inconsistent replica. For the same reason, we don't allow to refresh the publication once the two_phase is enabled for a subscription unless copy_data option is false. Author: Peter Smith, Ajin Cherian and Amit Kapila based on previous work by Nikhil Sontakke and Stas Kelvich Reviewed-by: Amit Kapila, Sawada Masahiko, Vignesh C, Dilip Kumar, Takamichi Osumi, Greg Nancarrow Tested-By: Haiying Tang Discussion: https://postgr.es/m/02DA5F5E-CECE-4D9C-8B4B-418077E2C010@postgrespro.ru Discussion: https://postgr.es/m/CAA4eK1+opiV4aFTmWWUF9h_32=HfPOW9vZASHarT0UA5oBrtGw@mail.gmail.com
2021-07-14 04:03:50 +02:00
to know the actual two-phase state.
</para>
</listitem>
</varlistentry>
<varlistentry id="sql-createsubscription-with-disable-on-error">
<term><literal>disable_on_error</literal> (<type>boolean</type>)</term>
<listitem>
<para>
Specifies whether the subscription should be automatically disabled
if any errors are detected by subscription workers during data
replication from the publisher. The default is
<literal>false</literal>.
</para>
</listitem>
</varlistentry>
2022-07-21 05:17:38 +02:00
<varlistentry id="sql-createsubscription-with-password-required">
<term><literal>password_required</literal> (<type>boolean</type>)</term>
<listitem>
<para>
Specifies whether connections to the publisher made as a result
of this subscription must use password authentication. This setting
is ignored when the subscription is owned by a superuser.
The default is <literal>true</literal>. Only superusers can set
this value to <literal>false</literal>.
</para>
</listitem>
</varlistentry>
<varlistentry id="sql-createsubscription-with-run-as-owner">
<term><literal>run_as_owner</literal> (<type>boolean</type>)</term>
<listitem>
<para>
If true, all replication actions are performed as the subscription
owner. If false, replication workers will perform actions on each
table as the owner of that table. The latter configuration is
generally much more secure; for details, see
<xref linkend="logical-replication-security" />.
The default is <literal>false</literal>.
</para>
</listitem>
</varlistentry>
<varlistentry id="sql-createsubscription-with-origin">
2022-07-21 05:17:38 +02:00
<term><literal>origin</literal> (<type>string</type>)</term>
<listitem>
<para>
Specifies whether the subscription will request the publisher to only
send changes that don't have an origin or send changes regardless of
origin. Setting <literal>origin</literal> to <literal>none</literal>
means that the subscription will request the publisher to only send
changes that don't have an origin. Setting <literal>origin</literal>
to <literal>any</literal> means that the publisher sends changes
regardless of their origin. The default is <literal>any</literal>.
</para>
<para>
See <xref linkend="sql-createsubscription-notes"/> for details of how
<literal>copy_data = true</literal> can interact with the
<literal>origin</literal> parameter.
</para>
2022-07-21 05:17:38 +02:00
</listitem>
</varlistentry>
</variablelist></para>
</listitem>
</varlistentry>
</variablelist>
<para>
When specifying a parameter of type <type>boolean</type>, the
<literal>=</literal> <replaceable class="parameter">value</replaceable>
part can be omitted, which is equivalent to
specifying <literal>TRUE</literal>.
</para>
</refsect1>
Allow specifying row filters for logical replication of tables. This feature adds row filtering for publication tables. When a publication is defined or modified, an optional WHERE clause can be specified. Rows that don't satisfy this WHERE clause will be filtered out. This allows a set of tables to be partially replicated. The row filter is per table. A new row filter can be added simply by specifying a WHERE clause after the table name. The WHERE clause must be enclosed by parentheses. The row filter WHERE clause for a table added to a publication that publishes UPDATE and/or DELETE operations must contain only columns that are covered by REPLICA IDENTITY. The row filter WHERE clause for a table added to a publication that publishes INSERT can use any column. If the row filter evaluates to NULL, it is regarded as "false". The WHERE clause only allows simple expressions that don't have user-defined functions, user-defined operators, user-defined types, user-defined collations, non-immutable built-in functions, or references to system columns. These restrictions could be addressed in the future. If you choose to do the initial table synchronization, only data that satisfies the row filters is copied to the subscriber. If the subscription has several publications in which a table has been published with different WHERE clauses, rows that satisfy ANY of the expressions will be copied. If a subscriber is a pre-15 version, the initial table synchronization won't use row filters even if they are defined in the publisher. The row filters are applied before publishing the changes. If the subscription has several publications in which the same table has been published with different filters (for the same publish operation), those expressions get OR'ed together so that rows satisfying any of the expressions will be replicated. This means all the other filters become redundant if (a) one of the publications have no filter at all, (b) one of the publications was created using FOR ALL TABLES, (c) one of the publications was created using FOR ALL TABLES IN SCHEMA and the table belongs to that same schema. If your publication contains a partitioned table, the publication parameter publish_via_partition_root determines if it uses the partition's row filter (if the parameter is false, the default) or the root partitioned table's row filter. Psql commands \dRp+ and \d <table-name> will display any row filters. Author: Hou Zhijie, Euler Taveira, Peter Smith, Ajin Cherian Reviewed-by: Greg Nancarrow, Haiying Tang, Amit Kapila, Tomas Vondra, Dilip Kumar, Vignesh C, Alvaro Herrera, Andres Freund, Wei Wang Discussion: https://www.postgresql.org/message-id/flat/CAHE3wggb715X%2BmK_DitLXF25B%3DjE6xyNCH4YOwM860JR7HarGQ%40mail.gmail.com
2022-02-22 03:24:12 +01:00
<refsect1 id="sql-createsubscription-notes" xreflabel="Notes">
<title>Notes</title>
<para>
See <xref linkend="logical-replication-security"/> for details on
how to configure access control between the subscription and the
publication instance.
</para>
<para>
When creating a replication slot (the default behavior), <command>CREATE
SUBSCRIPTION</command> cannot be executed inside a transaction block.
</para>
<para>
Creating a subscription that connects to the same database cluster (for
example, to replicate between databases in the same cluster or to replicate
within the same database) will only succeed if the replication slot is not
created as part of the same command. Otherwise, the <command>CREATE
SUBSCRIPTION</command> call will hang. To make this work, create the
replication slot separately (using the
function <function>pg_create_logical_replication_slot</function> with the
plugin name <literal>pgoutput</literal>) and create the subscription using
the parameter <literal>create_slot = false</literal>. See
<xref linkend="logical-replication-subscription-examples-deferred-slot"/>
for examples. This is an implementation restriction that might be lifted in a
future release.
</para>
Allow specifying row filters for logical replication of tables. This feature adds row filtering for publication tables. When a publication is defined or modified, an optional WHERE clause can be specified. Rows that don't satisfy this WHERE clause will be filtered out. This allows a set of tables to be partially replicated. The row filter is per table. A new row filter can be added simply by specifying a WHERE clause after the table name. The WHERE clause must be enclosed by parentheses. The row filter WHERE clause for a table added to a publication that publishes UPDATE and/or DELETE operations must contain only columns that are covered by REPLICA IDENTITY. The row filter WHERE clause for a table added to a publication that publishes INSERT can use any column. If the row filter evaluates to NULL, it is regarded as "false". The WHERE clause only allows simple expressions that don't have user-defined functions, user-defined operators, user-defined types, user-defined collations, non-immutable built-in functions, or references to system columns. These restrictions could be addressed in the future. If you choose to do the initial table synchronization, only data that satisfies the row filters is copied to the subscriber. If the subscription has several publications in which a table has been published with different WHERE clauses, rows that satisfy ANY of the expressions will be copied. If a subscriber is a pre-15 version, the initial table synchronization won't use row filters even if they are defined in the publisher. The row filters are applied before publishing the changes. If the subscription has several publications in which the same table has been published with different filters (for the same publish operation), those expressions get OR'ed together so that rows satisfying any of the expressions will be replicated. This means all the other filters become redundant if (a) one of the publications have no filter at all, (b) one of the publications was created using FOR ALL TABLES, (c) one of the publications was created using FOR ALL TABLES IN SCHEMA and the table belongs to that same schema. If your publication contains a partitioned table, the publication parameter publish_via_partition_root determines if it uses the partition's row filter (if the parameter is false, the default) or the root partitioned table's row filter. Psql commands \dRp+ and \d <table-name> will display any row filters. Author: Hou Zhijie, Euler Taveira, Peter Smith, Ajin Cherian Reviewed-by: Greg Nancarrow, Haiying Tang, Amit Kapila, Tomas Vondra, Dilip Kumar, Vignesh C, Alvaro Herrera, Andres Freund, Wei Wang Discussion: https://www.postgresql.org/message-id/flat/CAHE3wggb715X%2BmK_DitLXF25B%3DjE6xyNCH4YOwM860JR7HarGQ%40mail.gmail.com
2022-02-22 03:24:12 +01:00
<para>
If any table in the publication has a <literal>WHERE</literal> clause, rows
for which the <replaceable class="parameter">expression</replaceable>
evaluates to false or null will not be published. If the subscription has
several publications in which the same table has been published with
different <literal>WHERE</literal> clauses, a row will be published if any
of the expressions (referring to that publish operation) are satisfied. In
the case of different <literal>WHERE</literal> clauses, if one of the
publications has no <literal>WHERE</literal> clause (referring to that
publish operation) or the publication is declared as
<link linkend="sql-createpublication-for-all-tables"><literal>FOR ALL TABLES</literal></link>
or <link linkend="sql-createpublication-for-tables-in-schema"><literal>FOR TABLES IN SCHEMA</literal></link>,
rows are always published regardless of the definition of the other
expressions. If the subscriber is a <productname>PostgreSQL</productname>
version before 15, then any row filtering is ignored during the initial data
synchronization phase. For this case, the user might want to consider
deleting any initially copied data that would be incompatible with
subsequent filtering. Because initial data synchronization does not take
into account the publication
<link linkend="sql-createpublication-with-publish"><literal>publish</literal></link>
parameter when copying existing table data, some rows may be copied that
would not be replicated using DML. See
<xref linkend="logical-replication-subscription-examples"/> for examples.
Allow specifying row filters for logical replication of tables. This feature adds row filtering for publication tables. When a publication is defined or modified, an optional WHERE clause can be specified. Rows that don't satisfy this WHERE clause will be filtered out. This allows a set of tables to be partially replicated. The row filter is per table. A new row filter can be added simply by specifying a WHERE clause after the table name. The WHERE clause must be enclosed by parentheses. The row filter WHERE clause for a table added to a publication that publishes UPDATE and/or DELETE operations must contain only columns that are covered by REPLICA IDENTITY. The row filter WHERE clause for a table added to a publication that publishes INSERT can use any column. If the row filter evaluates to NULL, it is regarded as "false". The WHERE clause only allows simple expressions that don't have user-defined functions, user-defined operators, user-defined types, user-defined collations, non-immutable built-in functions, or references to system columns. These restrictions could be addressed in the future. If you choose to do the initial table synchronization, only data that satisfies the row filters is copied to the subscriber. If the subscription has several publications in which a table has been published with different WHERE clauses, rows that satisfy ANY of the expressions will be copied. If a subscriber is a pre-15 version, the initial table synchronization won't use row filters even if they are defined in the publisher. The row filters are applied before publishing the changes. If the subscription has several publications in which the same table has been published with different filters (for the same publish operation), those expressions get OR'ed together so that rows satisfying any of the expressions will be replicated. This means all the other filters become redundant if (a) one of the publications have no filter at all, (b) one of the publications was created using FOR ALL TABLES, (c) one of the publications was created using FOR ALL TABLES IN SCHEMA and the table belongs to that same schema. If your publication contains a partitioned table, the publication parameter publish_via_partition_root determines if it uses the partition's row filter (if the parameter is false, the default) or the root partitioned table's row filter. Psql commands \dRp+ and \d <table-name> will display any row filters. Author: Hou Zhijie, Euler Taveira, Peter Smith, Ajin Cherian Reviewed-by: Greg Nancarrow, Haiying Tang, Amit Kapila, Tomas Vondra, Dilip Kumar, Vignesh C, Alvaro Herrera, Andres Freund, Wei Wang Discussion: https://www.postgresql.org/message-id/flat/CAHE3wggb715X%2BmK_DitLXF25B%3DjE6xyNCH4YOwM860JR7HarGQ%40mail.gmail.com
2022-02-22 03:24:12 +01:00
</para>
<para>
Subscriptions having several publications in which the same table has been
published with different column lists are not supported.
</para>
<para>
We allow non-existent publications to be specified so that users can add
those later. This means
<link linkend="catalog-pg-subscription"><structname>pg_subscription</structname></link>
can have non-existent publications.
</para>
<para>
When using a subscription parameter combination of
<literal>copy_data = true</literal> and <literal>origin = NONE</literal>,
the initial sync table data is copied directly from the publisher, meaning
that knowledge of the true origin of that data is not possible. If the
publisher also has subscriptions then the copied table data might have
originated from further upstream. This scenario is detected and a WARNING is
logged to the user, but the warning is only an indication of a potential
problem; it is the user's responsibility to make the necessary checks to
ensure the copied data origins are really as wanted or not.
</para>
<para>
To find which tables might potentially include non-local origins (due to
other subscriptions created on the publisher) try this SQL query:
<programlisting>
# substitute &lt;pub-names&gt; below with your publication name(s) to be queried
SELECT DISTINCT PT.schemaname, PT.tablename
FROM pg_publication_tables PT,
pg_subscription_rel PS
JOIN pg_class C ON (C.oid = PS.srrelid)
JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE N.nspname = PT.schemaname AND
C.relname = PT.tablename AND
PT.pubname IN (&lt;pub-names&gt;);
</programlisting></para>
</refsect1>
<refsect1>
<title>Examples</title>
<para>
Create a subscription to a remote server that replicates tables in
the publications <literal>mypublication</literal> and
<literal>insert_only</literal> and starts replicating immediately on
commit:
<programlisting>
CREATE SUBSCRIPTION mysub
CONNECTION 'host=192.168.1.50 port=5432 user=foo dbname=foodb'
PUBLICATION mypublication, insert_only;
</programlisting>
</para>
<para>
Create a subscription to a remote server that replicates tables in
the <literal>insert_only</literal> publication and does not start replicating
until enabled at a later time.
<programlisting>
CREATE SUBSCRIPTION mysub
CONNECTION 'host=192.168.1.50 port=5432 user=foo dbname=foodb'
PUBLICATION insert_only
WITH (enabled = false);
2017-06-14 19:55:43 +02:00
</programlisting></para>
</refsect1>
<refsect1>
<title>Compatibility</title>
<para>
<command>CREATE SUBSCRIPTION</command> is a <productname>PostgreSQL</productname>
extension.
</para>
</refsect1>
<refsect1>
<title>See Also</title>
<simplelist type="inline">
<member><xref linkend="sql-altersubscription"/></member>
<member><xref linkend="sql-dropsubscription"/></member>
<member><xref linkend="sql-createpublication"/></member>
<member><xref linkend="sql-alterpublication"/></member>
</simplelist>
</refsect1>
</refentry>