2016-09-21 14:37:02 +02:00
|
|
|
<!-- doc/src/sgml/parallel.sgml -->
|
|
|
|
|
|
|
|
<chapter id="parallel-query">
|
|
|
|
<title>Parallel Query</title>
|
|
|
|
|
|
|
|
<indexterm zone="parallel-query">
|
|
|
|
<primary>parallel query</primary>
|
|
|
|
</indexterm>
|
|
|
|
|
|
|
|
<para>
|
2021-09-02 04:35:38 +02:00
|
|
|
<productname>PostgreSQL</productname> can devise query plans that can leverage
|
2016-09-21 14:37:02 +02:00
|
|
|
multiple CPUs in order to answer queries faster. This feature is known
|
|
|
|
as parallel query. Many queries cannot benefit from parallel query, either
|
|
|
|
due to limitations of the current implementation or because there is no
|
2021-09-02 04:35:38 +02:00
|
|
|
imaginable query plan that is any faster than the serial query plan.
|
2016-09-21 14:37:02 +02:00
|
|
|
However, for queries that can benefit, the speedup from parallel query
|
|
|
|
is often very significant. Many queries can run more than twice as fast
|
|
|
|
when using parallel query, and some queries can run four times faster or
|
|
|
|
even more. Queries that touch a large amount of data but return only a
|
|
|
|
few rows to the user will typically benefit most. This chapter explains
|
|
|
|
some details of how parallel query works and in which situations it can be
|
|
|
|
used so that users who wish to make use of it can understand what to expect.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<sect1 id="how-parallel-query-works">
|
|
|
|
<title>How Parallel Query Works</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
When the optimizer determines that parallel query is the fastest execution
|
2021-09-02 04:35:38 +02:00
|
|
|
strategy for a particular query, it will create a query plan that includes
|
2017-08-10 19:22:31 +02:00
|
|
|
a <firstterm>Gather</firstterm> or <firstterm>Gather Merge</firstterm>
|
|
|
|
node. Here is a simple example:
|
2016-09-21 14:37:02 +02:00
|
|
|
|
|
|
|
<screen>
|
|
|
|
EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
|
2022-04-20 17:04:28 +02:00
|
|
|
QUERY PLAN
|
2020-05-15 00:13:08 +02:00
|
|
|
-------------------------------------------------------------------&zwsp;------------------
|
2016-09-21 14:37:02 +02:00
|
|
|
Gather (cost=1000.00..217018.43 rows=1 width=97)
|
|
|
|
Workers Planned: 2
|
|
|
|
-> Parallel Seq Scan on pgbench_accounts (cost=0.00..216018.33 rows=1 width=97)
|
|
|
|
Filter: (filler ~~ '%x%'::text)
|
|
|
|
(4 rows)
|
|
|
|
</screen>
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
2017-08-10 19:22:31 +02:00
|
|
|
In all cases, the <literal>Gather</literal> or
|
|
|
|
<literal>Gather Merge</literal> node will have exactly one
|
2016-09-21 14:37:02 +02:00
|
|
|
child plan, which is the portion of the plan that will be executed in
|
2017-10-09 03:44:17 +02:00
|
|
|
parallel. If the <literal>Gather</literal> or <literal>Gather Merge</literal> node is
|
2017-08-10 19:22:31 +02:00
|
|
|
at the very top of the plan tree, then the entire query will execute in
|
|
|
|
parallel. If it is somewhere else in the plan tree, then only the portion
|
|
|
|
of the plan below it will run in parallel. In the example above, the
|
|
|
|
query accesses only one table, so there is only one plan node other than
|
2017-10-09 03:44:17 +02:00
|
|
|
the <literal>Gather</literal> node itself; since that plan node is a child of the
|
|
|
|
<literal>Gather</literal> node, it will run in parallel.
|
2016-09-21 14:37:02 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
2017-10-09 03:44:17 +02:00
|
|
|
<link linkend="using-explain">Using EXPLAIN</link>, you can see the number of
|
|
|
|
workers chosen by the planner. When the <literal>Gather</literal> node is reached
|
2021-09-02 04:35:38 +02:00
|
|
|
during query execution, the process that is implementing the user's
|
2016-09-21 14:37:02 +02:00
|
|
|
session will request a number of <link linkend="bgworker">background
|
|
|
|
worker processes</link> equal to the number
|
2017-08-10 19:22:31 +02:00
|
|
|
of workers chosen by the planner. The number of background workers that
|
|
|
|
the planner will consider using is limited to at most
|
2017-11-23 15:39:47 +01:00
|
|
|
<xref linkend="guc-max-parallel-workers-per-gather"/>. The total number
|
2017-08-10 19:22:31 +02:00
|
|
|
of background workers that can exist at any one time is limited by both
|
2017-11-23 15:39:47 +01:00
|
|
|
<xref linkend="guc-max-worker-processes"/> and
|
|
|
|
<xref linkend="guc-max-parallel-workers"/>. Therefore, it is possible for a
|
2016-09-21 14:37:02 +02:00
|
|
|
parallel query to run with fewer workers than planned, or even with
|
|
|
|
no workers at all. The optimal plan may depend on the number of workers
|
|
|
|
that are available, so this can result in poor query performance. If this
|
2017-08-10 19:22:31 +02:00
|
|
|
occurrence is frequent, consider increasing
|
2017-10-09 03:44:17 +02:00
|
|
|
<varname>max_worker_processes</varname> and <varname>max_parallel_workers</varname>
|
2016-12-05 17:03:17 +01:00
|
|
|
so that more workers can be run simultaneously or alternatively reducing
|
2017-08-10 19:22:31 +02:00
|
|
|
<varname>max_parallel_workers_per_gather</varname> so that the planner
|
2016-09-21 14:37:02 +02:00
|
|
|
requests fewer workers.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
2021-09-02 04:35:38 +02:00
|
|
|
Every background worker process that is successfully started for a given
|
2017-08-10 19:22:31 +02:00
|
|
|
parallel query will execute the parallel portion of the plan. The leader
|
|
|
|
will also execute that portion of the plan, but it has an additional
|
|
|
|
responsibility: it must also read all of the tuples generated by the
|
|
|
|
workers. When the parallel portion of the plan generates only a small
|
|
|
|
number of tuples, the leader will often behave very much like an additional
|
|
|
|
worker, speeding up query execution. Conversely, when the parallel portion
|
|
|
|
of the plan generates a large number of tuples, the leader may be almost
|
|
|
|
entirely occupied with reading the tuples generated by the workers and
|
2021-09-02 04:35:38 +02:00
|
|
|
performing any further processing steps that are required by plan nodes
|
2017-08-10 19:22:31 +02:00
|
|
|
above the level of the <literal>Gather</literal> node or
|
|
|
|
<literal>Gather Merge</literal> node. In such cases, the leader will
|
|
|
|
do very little of the work of executing the parallel portion of the plan.
|
2016-09-21 14:37:02 +02:00
|
|
|
</para>
|
2017-08-10 19:22:31 +02:00
|
|
|
|
|
|
|
<para>
|
|
|
|
When the node at the top of the parallel portion of the plan is
|
2017-10-09 03:44:17 +02:00
|
|
|
<literal>Gather Merge</literal> rather than <literal>Gather</literal>, it indicates that
|
2017-08-10 19:22:31 +02:00
|
|
|
each process executing the parallel portion of the plan is producing
|
|
|
|
tuples in sorted order, and that the leader is performing an
|
2017-10-09 03:44:17 +02:00
|
|
|
order-preserving merge. In contrast, <literal>Gather</literal> reads tuples
|
2017-08-10 19:22:31 +02:00
|
|
|
from the workers in whatever order is convenient, destroying any sort
|
|
|
|
order that may have existed.
|
2019-04-08 22:27:35 +02:00
|
|
|
</para>
|
2016-09-21 14:37:02 +02:00
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<sect1 id="when-can-parallel-query-be-used">
|
|
|
|
<title>When Can Parallel Query Be Used?</title>
|
|
|
|
|
|
|
|
<para>
|
2021-09-02 04:35:38 +02:00
|
|
|
There are several settings that can cause the query planner not to
|
2016-09-21 14:37:02 +02:00
|
|
|
generate a parallel query plan under any circumstances. In order for
|
|
|
|
any parallel query plans whatsoever to be generated, the following
|
|
|
|
settings must be configured as indicated.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<itemizedlist>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
2017-11-23 15:39:47 +01:00
|
|
|
<xref linkend="guc-max-parallel-workers-per-gather"/> must be set to a
|
2021-09-02 04:35:38 +02:00
|
|
|
value that is greater than zero. This is a special case of the more
|
2016-09-21 14:37:02 +02:00
|
|
|
general principle that no more workers should be used than the number
|
|
|
|
configured via <varname>max_parallel_workers_per_gather</varname>.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</itemizedlist>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
In addition, the system must not be running in single-user mode. Since
|
2023-01-03 08:26:14 +01:00
|
|
|
the entire database system is running as a single process in this situation,
|
2016-09-21 14:37:02 +02:00
|
|
|
no background workers will be available.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Even when it is in general possible for parallel query plans to be
|
|
|
|
generated, the planner will not generate them for a given query
|
|
|
|
if any of the following are true:
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<itemizedlist>
|
|
|
|
<listitem>
|
2016-12-17 18:00:00 +01:00
|
|
|
<para>
|
2016-09-21 14:37:02 +02:00
|
|
|
The query writes any data or locks any database rows. If a query
|
|
|
|
contains a data-modifying operation either at the top level or within
|
2017-10-05 17:34:38 +02:00
|
|
|
a CTE, no parallel plans for that query will be generated. As an
|
2021-09-02 04:35:38 +02:00
|
|
|
exception, the following commands, which create a new table and populate
|
|
|
|
it, can use a parallel plan for the underlying <literal>SELECT</literal>
|
2021-03-17 01:43:08 +01:00
|
|
|
part of the query:
|
|
|
|
|
|
|
|
<itemizedlist>
|
|
|
|
<listitem>
|
|
|
|
<para><command>CREATE TABLE ... AS</command></para>
|
|
|
|
</listitem>
|
|
|
|
<listitem>
|
|
|
|
<para><command>SELECT INTO</command></para>
|
|
|
|
</listitem>
|
|
|
|
<listitem>
|
|
|
|
<para><command>CREATE MATERIALIZED VIEW</command></para>
|
|
|
|
</listitem>
|
|
|
|
<listitem>
|
|
|
|
<para><command>REFRESH MATERIALIZED VIEW</command></para>
|
|
|
|
</listitem>
|
|
|
|
</itemizedlist>
|
2016-09-21 14:37:02 +02:00
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
The query might be suspended during execution. In any situation in
|
|
|
|
which the system thinks that partial or incremental execution might
|
|
|
|
occur, no parallel plan is generated. For example, a cursor created
|
|
|
|
using <link linkend="sql-declare">DECLARE CURSOR</link> will never use
|
2017-04-05 06:38:25 +02:00
|
|
|
a parallel plan. Similarly, a PL/pgSQL loop of the form
|
2016-09-21 14:37:02 +02:00
|
|
|
<literal>FOR x IN query LOOP .. END LOOP</literal> will never use a
|
|
|
|
parallel plan, because the parallel query system is unable to verify
|
|
|
|
that the code in the loop is safe to execute while parallel query is
|
2016-12-17 18:00:00 +01:00
|
|
|
active.
|
2016-09-21 14:37:02 +02:00
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
The query uses any function marked <literal>PARALLEL UNSAFE</literal>.
|
|
|
|
Most system-defined functions are <literal>PARALLEL SAFE</literal>,
|
|
|
|
but user-defined functions are marked <literal>PARALLEL
|
|
|
|
UNSAFE</literal> by default. See the discussion of
|
2017-11-23 15:39:47 +01:00
|
|
|
<xref linkend="parallel-safety"/>.
|
2016-09-21 14:37:02 +02:00
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
The query is running inside of another query that is already parallel.
|
|
|
|
For example, if a function called by a parallel query issues an SQL
|
|
|
|
query itself, that query will never use a parallel plan. This is a
|
|
|
|
limitation of the current implementation, but it may not be desirable
|
|
|
|
to remove this limitation, since it could result in a single query
|
2016-12-17 18:00:00 +01:00
|
|
|
using a very large number of processes.
|
2016-09-21 14:37:02 +02:00
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</itemizedlist>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Even when parallel query plan is generated for a particular query, there
|
|
|
|
are several circumstances under which it will be impossible to execute
|
|
|
|
that plan in parallel at execution time. If this occurs, the leader
|
2017-10-09 03:44:17 +02:00
|
|
|
will execute the portion of the plan below the <literal>Gather</literal>
|
|
|
|
node entirely by itself, almost as if the <literal>Gather</literal> node were
|
2016-09-21 14:37:02 +02:00
|
|
|
not present. This will happen if any of the following conditions are met:
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<itemizedlist>
|
|
|
|
<listitem>
|
2016-12-17 18:00:00 +01:00
|
|
|
<para>
|
2016-09-21 14:37:02 +02:00
|
|
|
No background workers can be obtained because of the limitation that
|
|
|
|
the total number of background workers cannot exceed
|
2017-11-23 15:39:47 +01:00
|
|
|
<xref linkend="guc-max-worker-processes"/>.
|
2016-09-21 14:37:02 +02:00
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
|
2016-12-05 17:03:17 +01:00
|
|
|
<listitem>
|
2016-12-17 18:00:00 +01:00
|
|
|
<para>
|
2016-12-05 17:03:17 +01:00
|
|
|
No background workers can be obtained because of the limitation that
|
|
|
|
the total number of background workers launched for purposes of
|
2017-11-23 15:39:47 +01:00
|
|
|
parallel query cannot exceed <xref linkend="guc-max-parallel-workers"/>.
|
2016-12-05 17:03:17 +01:00
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
|
2016-09-21 14:37:02 +02:00
|
|
|
<listitem>
|
2016-12-17 18:00:00 +01:00
|
|
|
<para>
|
2016-09-21 14:37:02 +02:00
|
|
|
The client sends an Execute message with a non-zero fetch count.
|
|
|
|
See the discussion of the
|
|
|
|
<link linkend="protocol-flow-ext-query">extended query protocol</link>.
|
|
|
|
Since <link linkend="libpq">libpq</link> currently provides no way to
|
|
|
|
send such a message, this can only occur when using a client that
|
|
|
|
does not rely on libpq. If this is a frequent
|
|
|
|
occurrence, it may be a good idea to set
|
2017-11-23 15:39:47 +01:00
|
|
|
<xref linkend="guc-max-parallel-workers-per-gather"/> to zero in
|
2017-08-10 19:22:31 +02:00
|
|
|
sessions where it is likely, so as to avoid generating query plans
|
|
|
|
that may be suboptimal when run serially.
|
2016-09-21 14:37:02 +02:00
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</itemizedlist>
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<sect1 id="parallel-plans">
|
|
|
|
<title>Parallel Plans</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Because each worker executes the parallel portion of the plan to
|
|
|
|
completion, it is not possible to simply take an ordinary query plan
|
|
|
|
and run it using multiple workers. Each worker would produce a full
|
|
|
|
copy of the output result set, so the query would not run any faster
|
|
|
|
than normal but would produce incorrect results. Instead, the parallel
|
|
|
|
portion of the plan must be what is known internally to the query
|
2017-10-09 03:44:17 +02:00
|
|
|
optimizer as a <firstterm>partial plan</firstterm>; that is, it must be constructed
|
2021-09-02 04:35:38 +02:00
|
|
|
so that each process that executes the plan will generate only a
|
2016-09-21 14:37:02 +02:00
|
|
|
subset of the output rows in such a way that each required output row
|
|
|
|
is guaranteed to be generated by exactly one of the cooperating processes.
|
2017-08-10 19:22:31 +02:00
|
|
|
Generally, this means that the scan on the driving table of the query
|
|
|
|
must be a parallel-aware scan.
|
2016-09-21 14:37:02 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<sect2 id="parallel-scans">
|
|
|
|
<title>Parallel Scans</title>
|
|
|
|
|
|
|
|
<para>
|
2017-03-09 19:02:34 +01:00
|
|
|
The following types of parallel-aware table scans are currently supported.
|
|
|
|
|
|
|
|
<itemizedlist>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
2017-10-09 03:44:17 +02:00
|
|
|
In a <emphasis>parallel sequential scan</emphasis>, the table's blocks will
|
2022-10-20 22:29:08 +02:00
|
|
|
be divided into ranges and shared among the cooperating processes. Each
|
|
|
|
worker process will complete the scanning of its given range of blocks before
|
|
|
|
requesting an additional range of blocks.
|
2017-03-09 19:02:34 +01:00
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
2017-10-09 03:44:17 +02:00
|
|
|
In a <emphasis>parallel bitmap heap scan</emphasis>, one process is chosen
|
2017-03-09 19:02:34 +01:00
|
|
|
as the leader. That process performs a scan of one or more indexes
|
|
|
|
and builds a bitmap indicating which table blocks need to be visited.
|
|
|
|
These blocks are then divided among the cooperating processes as in
|
|
|
|
a parallel sequential scan. In other words, the heap scan is performed
|
|
|
|
in parallel, but the underlying index scan is not.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
2017-10-09 03:44:17 +02:00
|
|
|
In a <emphasis>parallel index scan</emphasis> or <emphasis>parallel index-only
|
|
|
|
scan</emphasis>, the cooperating processes take turns reading data from the
|
2017-03-09 19:02:34 +01:00
|
|
|
index. Currently, parallel index scans are supported only for
|
|
|
|
btree indexes. Each process will claim a single index block and will
|
2020-01-03 06:22:46 +01:00
|
|
|
scan and return all tuples referenced by that block; other processes can
|
2017-03-09 19:02:34 +01:00
|
|
|
at the same time be returning tuples from a different index block.
|
|
|
|
The results of a parallel btree scan are returned in sorted order
|
|
|
|
within each worker process.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</itemizedlist>
|
|
|
|
|
2017-08-10 19:22:31 +02:00
|
|
|
Other scan types, such as scans of non-btree indexes, may support
|
|
|
|
parallel scans in the future.
|
2016-09-21 14:37:02 +02:00
|
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<sect2 id="parallel-joins">
|
|
|
|
<title>Parallel Joins</title>
|
|
|
|
|
|
|
|
<para>
|
2017-03-09 19:02:34 +01:00
|
|
|
Just as in a non-parallel plan, the driving table may be joined to one or
|
|
|
|
more other tables using a nested loop, hash join, or merge join. The
|
|
|
|
inner side of the join may be any kind of non-parallel plan that is
|
|
|
|
otherwise supported by the planner provided that it is safe to run within
|
2018-03-22 18:25:59 +01:00
|
|
|
a parallel worker. Depending on the join type, the inner side may also be
|
|
|
|
a parallel plan.
|
2017-03-09 19:02:34 +01:00
|
|
|
</para>
|
|
|
|
|
2018-03-22 18:25:59 +01:00
|
|
|
<itemizedlist>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
In a <emphasis>nested loop join</emphasis>, the inner side is always
|
|
|
|
non-parallel. Although it is executed in full, this is efficient if
|
|
|
|
the inner side is an index scan, because the outer tuples and thus
|
|
|
|
the loops that look up values in the index are divided over the
|
|
|
|
cooperating processes.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
In a <emphasis>merge join</emphasis>, the inner side is always
|
|
|
|
a non-parallel plan and therefore executed in full. This may be
|
|
|
|
inefficient, especially if a sort must be performed, because the work
|
|
|
|
and resulting data are duplicated in every cooperating process.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
In a <emphasis>hash join</emphasis> (without the "parallel" prefix),
|
|
|
|
the inner side is executed in full by every cooperating process
|
|
|
|
to build identical copies of the hash table. This may be inefficient
|
|
|
|
if the hash table is large or the plan is expensive. In a
|
|
|
|
<emphasis>parallel hash join</emphasis>, the inner side is a
|
|
|
|
<emphasis>parallel hash</emphasis> that divides the work of building
|
|
|
|
a shared hash table over the cooperating processes.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</itemizedlist>
|
2016-09-21 14:37:02 +02:00
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<sect2 id="parallel-aggregation">
|
|
|
|
<title>Parallel Aggregation</title>
|
|
|
|
<para>
|
2017-10-09 03:44:17 +02:00
|
|
|
<productname>PostgreSQL</productname> supports parallel aggregation by aggregating in
|
2017-02-14 15:37:31 +01:00
|
|
|
two stages. First, each process participating in the parallel portion of
|
|
|
|
the query performs an aggregation step, producing a partial result for
|
|
|
|
each group of which that process is aware. This is reflected in the plan
|
2017-10-09 03:44:17 +02:00
|
|
|
as a <literal>Partial Aggregate</literal> node. Second, the partial results are
|
|
|
|
transferred to the leader via <literal>Gather</literal> or <literal>Gather
|
|
|
|
Merge</literal>. Finally, the leader re-aggregates the results across all
|
2017-08-10 19:22:31 +02:00
|
|
|
workers in order to produce the final result. This is reflected in the
|
2017-10-09 03:44:17 +02:00
|
|
|
plan as a <literal>Finalize Aggregate</literal> node.
|
2017-02-14 15:37:31 +01:00
|
|
|
</para>
|
2019-04-08 22:27:35 +02:00
|
|
|
|
2017-02-14 15:37:31 +01:00
|
|
|
<para>
|
2017-10-09 03:44:17 +02:00
|
|
|
Because the <literal>Finalize Aggregate</literal> node runs on the leader
|
2021-09-02 04:35:38 +02:00
|
|
|
process, queries that produce a relatively large number of groups in
|
2017-02-14 15:37:31 +01:00
|
|
|
comparison to the number of input rows will appear less favorable to the
|
|
|
|
query planner. For example, in the worst-case scenario the number of
|
2017-10-09 03:44:17 +02:00
|
|
|
groups seen by the <literal>Finalize Aggregate</literal> node could be as many as
|
2021-09-02 04:35:38 +02:00
|
|
|
the number of input rows that were seen by all worker processes in the
|
2017-10-09 03:44:17 +02:00
|
|
|
<literal>Partial Aggregate</literal> stage. For such cases, there is clearly
|
2017-02-14 15:37:31 +01:00
|
|
|
going to be no performance benefit to using parallel aggregation. The
|
|
|
|
query planner takes this into account during the planning process and is
|
|
|
|
unlikely to choose parallel aggregate in this scenario.
|
2016-09-21 14:37:02 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Parallel aggregation is not supported in all situations. Each aggregate
|
2017-10-09 03:44:17 +02:00
|
|
|
must be <link linkend="parallel-safety">safe</link> for parallelism and must
|
2016-09-21 14:37:02 +02:00
|
|
|
have a combine function. If the aggregate has a transition state of type
|
2017-10-09 03:44:17 +02:00
|
|
|
<literal>internal</literal>, it must have serialization and deserialization
|
2017-11-23 15:39:47 +01:00
|
|
|
functions. See <xref linkend="sql-createaggregate"/> for more details.
|
2017-02-14 15:37:31 +01:00
|
|
|
Parallel aggregation is not supported if any aggregate function call
|
2017-10-09 03:44:17 +02:00
|
|
|
contains <literal>DISTINCT</literal> or <literal>ORDER BY</literal> clause and is also
|
2017-02-14 15:37:31 +01:00
|
|
|
not supported for ordered set aggregates or when the query involves
|
2017-10-09 03:44:17 +02:00
|
|
|
<literal>GROUPING SETS</literal>. It can only be used when all joins involved in
|
2017-02-14 15:37:31 +01:00
|
|
|
the query are also part of the parallel portion of the plan.
|
2016-09-21 14:37:02 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
</sect2>
|
|
|
|
|
2018-08-01 14:14:05 +02:00
|
|
|
<sect2 id="parallel-append">
|
|
|
|
<title>Parallel Append</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Whenever <productname>PostgreSQL</productname> needs to combine rows
|
|
|
|
from multiple sources into a single result set, it uses an
|
|
|
|
<literal>Append</literal> or <literal>MergeAppend</literal> plan node.
|
|
|
|
This commonly happens when implementing <literal>UNION ALL</literal> or
|
|
|
|
when scanning a partitioned table. Such nodes can be used in parallel
|
|
|
|
plans just as they can in any other plan. However, in a parallel plan,
|
|
|
|
the planner may instead use a <literal>Parallel Append</literal> node.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
When an <literal>Append</literal> node is used in a parallel plan, each
|
|
|
|
process will execute the child plans in the order in which they appear,
|
|
|
|
so that all participating processes cooperate to execute the first child
|
|
|
|
plan until it is complete and then move to the second plan at around the
|
|
|
|
same time. When a <literal>Parallel Append</literal> is used instead, the
|
|
|
|
executor will instead spread out the participating processes as evenly as
|
|
|
|
possible across its child plans, so that multiple child plans are executed
|
|
|
|
simultaneously. This avoids contention, and also avoids paying the startup
|
|
|
|
cost of a child plan in those processes that never execute it.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Also, unlike a regular <literal>Append</literal> node, which can only have
|
|
|
|
partial children when used within a parallel plan, a <literal>Parallel
|
|
|
|
Append</literal> node can have both partial and non-partial child plans.
|
|
|
|
Non-partial children will be scanned by only a single process, since
|
|
|
|
scanning them more than once would produce duplicate results. Plans that
|
|
|
|
involve appending multiple results sets can therefore achieve
|
|
|
|
coarse-grained parallelism even when efficient partial plans are not
|
|
|
|
available. For example, consider a query against a partitioned table
|
2021-09-02 04:35:38 +02:00
|
|
|
that can only be implemented efficiently by using an index that does
|
2018-08-01 14:14:05 +02:00
|
|
|
not support parallel scans. The planner might choose a <literal>Parallel
|
|
|
|
Append</literal> of regular <literal>Index Scan</literal> plans; each
|
|
|
|
individual index scan would have to be executed to completion by a single
|
|
|
|
process, but different scans could be performed at the same time by
|
|
|
|
different processes.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
<xref linkend="guc-enable-parallel-append" /> can be used to disable
|
|
|
|
this feature.
|
|
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
|
2016-09-21 14:37:02 +02:00
|
|
|
<sect2 id="parallel-plan-tips">
|
|
|
|
<title>Parallel Plan Tips</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
If a query that is expected to do so does not produce a parallel plan,
|
2017-11-23 15:39:47 +01:00
|
|
|
you can try reducing <xref linkend="guc-parallel-setup-cost"/> or
|
|
|
|
<xref linkend="guc-parallel-tuple-cost"/>. Of course, this plan may turn
|
2021-09-02 04:35:38 +02:00
|
|
|
out to be slower than the serial plan that the planner preferred, but
|
2016-09-21 14:37:02 +02:00
|
|
|
this will not always be the case. If you don't get a parallel
|
2020-09-01 00:33:37 +02:00
|
|
|
plan even with very small values of these settings (e.g., after setting
|
2016-09-21 14:37:02 +02:00
|
|
|
them both to zero), there may be some reason why the query planner is
|
|
|
|
unable to generate a parallel plan for your query. See
|
2017-11-23 15:39:47 +01:00
|
|
|
<xref linkend="when-can-parallel-query-be-used"/> and
|
|
|
|
<xref linkend="parallel-safety"/> for information on why this may be
|
2016-09-21 14:37:02 +02:00
|
|
|
the case.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
When executing a parallel plan, you can use <literal>EXPLAIN (ANALYZE,
|
2016-10-14 02:03:25 +02:00
|
|
|
VERBOSE)</literal> to display per-worker statistics for each plan node.
|
2016-09-21 14:37:02 +02:00
|
|
|
This may be useful in determining whether the work is being evenly
|
|
|
|
distributed between all plan nodes and more generally in understanding the
|
|
|
|
performance characteristics of the plan.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
</sect2>
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<sect1 id="parallel-safety">
|
|
|
|
<title>Parallel Safety</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The planner classifies operations involved in a query as either
|
2017-10-09 03:44:17 +02:00
|
|
|
<firstterm>parallel safe</firstterm>, <firstterm>parallel restricted</firstterm>,
|
2021-09-02 04:35:38 +02:00
|
|
|
or <firstterm>parallel unsafe</firstterm>. A parallel safe operation is one that
|
2016-09-21 14:37:02 +02:00
|
|
|
does not conflict with the use of parallel query. A parallel restricted
|
2021-09-02 04:35:38 +02:00
|
|
|
operation is one that cannot be performed in a parallel worker, but that
|
2016-09-21 14:37:02 +02:00
|
|
|
can be performed in the leader while parallel query is in use. Therefore,
|
2017-10-09 03:44:17 +02:00
|
|
|
parallel restricted operations can never occur below a <literal>Gather</literal>
|
2021-09-02 04:35:38 +02:00
|
|
|
or <literal>Gather Merge</literal> node, but can occur elsewhere in a plan that
|
|
|
|
contains such a node. A parallel unsafe operation is one that cannot
|
2016-09-21 14:37:02 +02:00
|
|
|
be performed while parallel query is in use, not even in the leader.
|
2021-09-02 04:35:38 +02:00
|
|
|
When a query contains anything that is parallel unsafe, parallel query
|
2016-09-21 14:37:02 +02:00
|
|
|
is completely disabled for that query.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
2020-10-19 18:28:54 +02:00
|
|
|
The following operations are always parallel restricted:
|
2016-09-21 14:37:02 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<itemizedlist>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Scans of common table expressions (CTEs).
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Scans of temporary tables.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Scans of foreign tables, unless the foreign data wrapper has
|
2021-09-02 04:35:38 +02:00
|
|
|
an <literal>IsForeignScanParallelSafe</literal> API that indicates otherwise.
|
2016-09-21 14:37:02 +02:00
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
|
2018-05-09 21:15:03 +02:00
|
|
|
Plan nodes to which an <literal>InitPlan</literal> is attached.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
|
2021-09-02 04:35:38 +02:00
|
|
|
Plan nodes that reference a correlated <literal>SubPlan</literal>.
|
2016-09-21 14:37:02 +02:00
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</itemizedlist>
|
|
|
|
|
|
|
|
<sect2 id="parallel-labeling">
|
|
|
|
<title>Parallel Labeling for Functions and Aggregates</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The planner cannot automatically determine whether a user-defined
|
|
|
|
function or aggregate is parallel safe, parallel restricted, or parallel
|
2021-09-02 04:35:38 +02:00
|
|
|
unsafe, because this would require predicting every operation that the
|
2016-09-21 14:37:02 +02:00
|
|
|
function could possibly perform. In general, this is equivalent to the
|
|
|
|
Halting Problem and therefore impossible. Even for simple functions
|
2018-08-17 01:32:55 +02:00
|
|
|
where it could conceivably be done, we do not try, since this would be expensive
|
2016-09-21 14:37:02 +02:00
|
|
|
and error-prone. Instead, all user-defined functions are assumed to
|
|
|
|
be parallel unsafe unless otherwise marked. When using
|
2017-11-23 15:39:47 +01:00
|
|
|
<xref linkend="sql-createfunction"/> or
|
|
|
|
<xref linkend="sql-alterfunction"/>, markings can be set by specifying
|
2017-10-09 03:44:17 +02:00
|
|
|
<literal>PARALLEL SAFE</literal>, <literal>PARALLEL RESTRICTED</literal>, or
|
|
|
|
<literal>PARALLEL UNSAFE</literal> as appropriate. When using
|
2017-11-23 15:39:47 +01:00
|
|
|
<xref linkend="sql-createaggregate"/>, the
|
2017-10-09 03:44:17 +02:00
|
|
|
<literal>PARALLEL</literal> option can be specified with <literal>SAFE</literal>,
|
|
|
|
<literal>RESTRICTED</literal>, or <literal>UNSAFE</literal> as the corresponding value.
|
2016-09-21 14:37:02 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
Allow "internal" subtransactions in parallel mode.
Allow use of BeginInternalSubTransaction() in parallel mode, so long
as the subtransaction doesn't attempt to acquire an XID or increment
the command counter. Given those restrictions, the other parallel
processes don't need to know about the subtransaction at all, so
this should be safe. The benefit is that it allows subtransactions
intended for error recovery, such as pl/pgsql exception blocks,
to be used in PARALLEL SAFE functions.
Another reason for doing this is that the API of
BeginInternalSubTransaction() doesn't allow reporting failure.
pl/python for one, and perhaps other PLs, copes very poorly with an
error longjmp out of BeginInternalSubTransaction(). The headline
feature of this patch removes the only easily-triggerable failure
case within that function. There remain some resource-exhaustion
and similar cases, which we now deal with by promoting them to FATAL
errors, so that callers need not try to clean up. (It is likely
that such errors would leave us with corrupted transaction state
inside xact.c, making recovery difficult if not impossible anyway.)
Although this work started because of a report of a pl/python crash,
we're not going to do anything about that in the back branches.
Back-patching this particular fix is obviously not very wise.
While we could contemplate some narrower band-aid, pl/python is
already an untrusted language, so it seems okay to classify this
as a "so don't do that" case.
Patch by me, per report from Hao Zhang. Thanks to Robert Haas for
review.
Discussion: https://postgr.es/m/CALY6Dr-2yLVeVPhNMhuBnRgOZo1UjoTETgtKBx1B2gUi8yy+3g@mail.gmail.com
2024-03-28 17:43:10 +01:00
|
|
|
Functions and aggregates must be marked <literal>PARALLEL UNSAFE</literal>
|
|
|
|
if they write to the database, change the transaction state (other than by
|
|
|
|
using a subtransaction for error recovery), access sequences, or make
|
|
|
|
persistent changes to
|
2016-09-21 14:37:02 +02:00
|
|
|
settings. Similarly, functions must be marked <literal>PARALLEL
|
2017-10-09 03:44:17 +02:00
|
|
|
RESTRICTED</literal> if they access temporary tables, client connection state,
|
2021-09-02 04:35:38 +02:00
|
|
|
cursors, prepared statements, or miscellaneous backend-local state that
|
2016-09-21 14:37:02 +02:00
|
|
|
the system cannot synchronize across workers. For example,
|
2017-10-09 03:44:17 +02:00
|
|
|
<literal>setseed</literal> and <literal>random</literal> are parallel restricted for
|
2016-09-21 14:37:02 +02:00
|
|
|
this last reason.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
In general, if a function is labeled as being safe when it is restricted or
|
|
|
|
unsafe, or if it is labeled as being restricted when it is in fact unsafe,
|
|
|
|
it may throw errors or produce wrong answers when used in a parallel query.
|
|
|
|
C-language functions could in theory exhibit totally undefined behavior if
|
|
|
|
mislabeled, since there is no way for the system to protect itself against
|
|
|
|
arbitrary C code, but in most likely cases the result will be no worse than
|
|
|
|
for any other function. If in doubt, it is probably best to label functions
|
2017-10-09 03:44:17 +02:00
|
|
|
as <literal>UNSAFE</literal>.
|
2016-09-21 14:37:02 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
2021-09-02 04:35:38 +02:00
|
|
|
If a function executed within a parallel worker acquires locks that are
|
2016-09-21 14:37:02 +02:00
|
|
|
not held by the leader, for example by querying a table not referenced in
|
|
|
|
the query, those locks will be released at worker exit, not end of
|
2021-09-02 04:35:38 +02:00
|
|
|
transaction. If you write a function that does this, and this behavior
|
2016-09-21 14:37:02 +02:00
|
|
|
difference is important to you, mark such functions as
|
|
|
|
<literal>PARALLEL RESTRICTED</literal>
|
2016-12-17 18:00:00 +01:00
|
|
|
to ensure that they execute only in the leader.
|
2016-09-21 14:37:02 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Note that the query planner does not consider deferring the evaluation of
|
|
|
|
parallel-restricted functions or aggregates involved in the query in
|
2017-10-09 03:44:17 +02:00
|
|
|
order to obtain a superior plan. So, for example, if a <literal>WHERE</literal>
|
2016-09-21 14:37:02 +02:00
|
|
|
clause applied to a particular table is parallel restricted, the query
|
2017-08-10 19:22:31 +02:00
|
|
|
planner will not consider performing a scan of that table in the parallel
|
|
|
|
portion of a plan. In some cases, it would be
|
2016-09-21 14:37:02 +02:00
|
|
|
possible (and perhaps even efficient) to include the scan of that table in
|
|
|
|
the parallel portion of the query and defer the evaluation of the
|
2017-10-09 03:44:17 +02:00
|
|
|
<literal>WHERE</literal> clause so that it happens above the <literal>Gather</literal>
|
2016-09-21 14:37:02 +02:00
|
|
|
node. However, the planner does not do this.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
</chapter>
|