mirror of
https://git.postgresql.org/git/postgresql.git
synced 2024-09-30 18:11:18 +02:00
dde5f09fad
Thomas Munro and Robert Haas, reviewed by Haribabu Kommi
302 lines
14 KiB
Plaintext
<!-- doc/src/sgml/bgworker.sgml -->
<chapter id="bgworker">
 <title>Background Worker Processes</title>

 <indexterm zone="bgworker">
  <primary>Background workers</primary>
 </indexterm>

 <para>
  PostgreSQL can be extended to run user-supplied code in separate processes.
  Such processes are started, stopped and monitored by <command>postgres</command>,
  which permits them to have a lifetime closely linked to the server's status.
  These processes have the option to attach to <productname>PostgreSQL</>'s
  shared memory area and to connect to databases internally; they can also run
  multiple transactions serially, just like a regular client-connected server
  process. Also, by linking to <application>libpq</> they can connect to the
  server and behave like a regular client application.
 </para>

 <warning>
  <para>
   There are considerable robustness and security risks in using background
   worker processes because, being written in the <literal>C</> language,
   they have unrestricted access to data. Administrators wishing to enable
   modules that include background worker processes should exercise extreme
   caution. Only carefully audited modules should be permitted to run
   background worker processes.
  </para>
 </warning>

 <para>
  Background workers can be initialized at the time that
  <productname>PostgreSQL</> is started by including the module name in
  <varname>shared_preload_libraries</>. A module wishing to run a background
  worker can register it by calling
  <function>RegisterBackgroundWorker(<type>BackgroundWorker *worker</type>)</function>
  from its <function>_PG_init()</>. Background workers can also be started
  after the system is up and running by calling the function
  <function>RegisterDynamicBackgroundWorker(<type>BackgroundWorker
  *worker, BackgroundWorkerHandle **handle</type>)</function>. Unlike
  <function>RegisterBackgroundWorker</>, which can only be called from within
  the postmaster, <function>RegisterDynamicBackgroundWorker</function> must be
  called from a regular backend.
 </para>

 <para>
  The structure <structname>BackgroundWorker</structname> is defined thus:
<programlisting>
typedef void (*bgworker_main_type)(Datum main_arg);
typedef struct BackgroundWorker
{
    char        bgw_name[BGW_MAXLEN];
    int         bgw_flags;
    BgWorkerStartTime bgw_start_time;
    int         bgw_restart_time;       /* in seconds, or BGW_NEVER_RESTART */
    bgworker_main_type bgw_main;
    char        bgw_library_name[BGW_MAXLEN];  /* only if bgw_main is NULL */
    char        bgw_function_name[BGW_MAXLEN]; /* only if bgw_main is NULL */
    Datum       bgw_main_arg;
    char        bgw_extra[BGW_EXTRALEN];
    int         bgw_notify_pid;
} BackgroundWorker;
</programlisting>
 </para>

 <para>
  <structfield>bgw_name</> is a string to be used in log messages, process
  listings and similar contexts.
 </para>

 <para>
  <structfield>bgw_flags</> is a bitwise-or'd bit mask indicating the
  capabilities that the module wants. Possible values are:
  <variablelist>

   <varlistentry>
    <term><literal>BGWORKER_SHMEM_ACCESS</literal></term>
    <listitem>
     <para>
      <indexterm><primary>BGWORKER_SHMEM_ACCESS</primary></indexterm>
      Requests shared memory access. Workers without shared memory access
      cannot access any of <productname>PostgreSQL</productname>'s shared
      data structures, such as heavyweight or lightweight locks, shared
      buffers, or any custom data structures which the worker itself may
      wish to create and use.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term><literal>BGWORKER_BACKEND_DATABASE_CONNECTION</literal></term>
    <listitem>
     <para>
      <indexterm><primary>BGWORKER_BACKEND_DATABASE_CONNECTION</primary></indexterm>
      Requests the ability to establish a database connection through which it
      can later run transactions and queries. A background worker using
      <literal>BGWORKER_BACKEND_DATABASE_CONNECTION</literal> to connect to a
      database must also attach shared memory using
      <literal>BGWORKER_SHMEM_ACCESS</literal>, or worker start-up will fail.
     </para>
    </listitem>
   </varlistentry>

  </variablelist>
 </para>
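
 <para>
  The flag rule above amounts to a simple consistency check. The following
  self-contained sketch is a model for illustration, not server code: the
  helper <function>flags_are_valid</function> is hypothetical, and the numeric
  flag values are assumed only so that the example compiles on its own. A real
  module takes both macros from <filename>postmaster/bgworker.h</filename>
  and must not redefine them.
<programlisting><![CDATA[
#include <stdbool.h>

/*
 * Assumed values, for a standalone build only.  A real module gets these
 * macros from postmaster/bgworker.h and must not redefine them.
 */
#define BGWORKER_SHMEM_ACCESS                 0x0001
#define BGWORKER_BACKEND_DATABASE_CONNECTION  0x0002

/*
 * Hypothetical helper: returns true if the requested capabilities are
 * consistent, i.e. a database connection is only requested together with
 * shared memory access.
 */
static bool
flags_are_valid(int bgw_flags)
{
    bool        wants_db = (bgw_flags & BGWORKER_BACKEND_DATABASE_CONNECTION) != 0;
    bool        has_shmem = (bgw_flags & BGWORKER_SHMEM_ACCESS) != 0;

    /* A connection without shared memory access makes start-up fail. */
    return !wants_db || has_shmem;
}
]]></programlisting>
 </para>

 <para>
  Under this rule, a worker that intends to run queries sets both bits, for
  example
  <literal>BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION</literal>.
 </para>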

 <para>
  <structfield>bgw_start_time</structfield> is the server state during which
  <command>postgres</> should start the process; it can be one of
  <literal>BgWorkerStart_PostmasterStart</> (start as soon as
  <command>postgres</> itself has finished its own initialization; processes
  requesting this are not eligible for database connections),
  <literal>BgWorkerStart_ConsistentState</> (start as soon as a consistent state
  has been reached in a hot standby, allowing processes to connect to
  databases and run read-only queries), and
  <literal>BgWorkerStart_RecoveryFinished</> (start as soon as the system has
  entered normal read-write state). The last two values are equivalent
  in a server that is not a hot standby. Note that this setting only indicates
  when the processes are to be started; they do not stop when a different state
  is reached.
 </para>
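
 <para>
  The start-time rules can be summarized as a predicate over the server's
  state. The sketch below is a simplified model, not the postmaster's logic:
  the enumeration <type>ServerPhase</type> and the helper
  <function>worker_may_start</function> are hypothetical, and
  <type>BgWorkerStartTime</type> is redeclared here only so that the example
  builds on its own.
<programlisting><![CDATA[
#include <stdbool.h>

/* Mirrors the three documented start times (redeclared for a standalone build). */
typedef enum
{
    BgWorkerStart_PostmasterStart,
    BgWorkerStart_ConsistentState,
    BgWorkerStart_RecoveryFinished
} BgWorkerStartTime;

/* Hypothetical, simplified server phases; not the postmaster's own states. */
typedef enum
{
    PHASE_STARTUP,          /* postmaster initialized; recovery may be running */
    PHASE_HOT_STANDBY,      /* consistent state reached; read-only queries allowed */
    PHASE_RUNNING           /* normal read-write operation */
} ServerPhase;

/* Hypothetical helper: may a worker with this start time be launched now? */
static bool
worker_may_start(BgWorkerStartTime start_time, ServerPhase phase)
{
    switch (start_time)
    {
        case BgWorkerStart_PostmasterStart:
            return true;    /* eligible as soon as the postmaster is up */
        case BgWorkerStart_ConsistentState:
            return phase == PHASE_HOT_STANDBY || phase == PHASE_RUNNING;
        case BgWorkerStart_RecoveryFinished:
            return phase == PHASE_RUNNING;
    }
    return false;
}
]]></programlisting>
 </para>

 <para>
  A server that is not a hot standby never passes through
  <literal>PHASE_HOT_STANDBY</literal> in this model, so workers using either
  of the last two start times become eligible at the same moment. The model
  only decides when a worker may start; it does not stop running workers.
 </para>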

 <para>
  <structfield>bgw_restart_time</structfield> is the interval, in seconds, that
  <command>postgres</command> should wait before restarting the process, in
  case it crashes. It can be any positive value,
  or <literal>BGW_NEVER_RESTART</literal>, indicating not to restart the
  process in case of a crash.
 </para>

 <para>
  <structfield>bgw_main</structfield> is a pointer to the function to run when
  the process is started. This field can only safely be used to launch
  functions within the core server, because shared libraries may be loaded
  at different starting addresses in different backend processes. This will
  happen on all platforms when the library is loaded using any mechanism
  other than <xref linkend="guc-shared-preload-libraries">. Even when that
  mechanism is used, address space layout variations will still occur on
  Windows, and when <literal>EXEC_BACKEND</> is used. Therefore, most users
  of this API should set this field to NULL. If it is non-NULL, it takes
  precedence over <structfield>bgw_library_name</> and
  <structfield>bgw_function_name</>.
 </para>

 <para>
  <structfield>bgw_library_name</structfield> is the name of a library in
  which the initial entry point for the background worker should be sought.
  The named library will be dynamically loaded by the worker process and
  <structfield>bgw_function_name</structfield> will be used to identify the
  function to be called. If loading a function from the core code,
  <structfield>bgw_main</> should be set instead.
 </para>

 <para>
  <structfield>bgw_function_name</structfield> is the name of a function in
  a dynamically loaded library which should be used as the initial entry point
  for a new background worker.
 </para>
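
 <para>
  The precedence between <structfield>bgw_main</> and the two name fields can
  be sketched as follows. This is a model under stated assumptions: dynamic
  loading is replaced by a static lookup table, <type>Datum</> is approximated
  by <type>uintptr_t</type>, the value of <literal>BGW_MAXLEN</literal> is
  assumed, and the names <structname>EntryPointSpec</structname>,
  <function>my_worker_main</function> and
  <function>resolve_entry_point</function> are hypothetical.
<programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t Datum;            /* stand-in for the server's Datum type */
typedef void (*bgworker_main_type) (Datum main_arg);

#define BGW_MAXLEN 64               /* assumed value */

/* Only the BackgroundWorker fields that select the entry point. */
typedef struct
{
    bgworker_main_type bgw_main;
    char        bgw_library_name[BGW_MAXLEN];
    char        bgw_function_name[BGW_MAXLEN];
} EntryPointSpec;

/* A worker main function: takes one Datum, returns void. */
static void
my_worker_main(Datum main_arg)
{
    (void) main_arg;
}

/*
 * Hypothetical symbol table.  It stands in for dynamically loading
 * bgw_library_name and looking up bgw_function_name in it.
 */
static const struct
{
    const char *library;
    const char *function;
    bgworker_main_type fn;
} known_symbols[] = {
    {"my_extension", "my_worker_main", my_worker_main},
};

/* bgw_main wins when set; otherwise resolve the library/function names. */
static bgworker_main_type
resolve_entry_point(const EntryPointSpec *spec)
{
    if (spec->bgw_main != NULL)
        return spec->bgw_main;

    for (size_t i = 0; i < sizeof(known_symbols) / sizeof(known_symbols[0]); i++)
    {
        if (strcmp(known_symbols[i].library, spec->bgw_library_name) == 0 &&
            strcmp(known_symbols[i].function, spec->bgw_function_name) == 0)
            return known_symbols[i].fn;
    }
    return NULL;                    /* lookup failed; the worker could not start */
}
]]></programlisting>
 </para>

 <para>
  For code in an extension, the recommendation above corresponds to leaving
  <structfield>bgw_main</> NULL so that only the name-based path is used.
 </para>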

 <para>
  <structfield>bgw_main_arg</structfield> is the <type>Datum</> argument
  to the background worker main function. Regardless of whether that
  function is specified via <structfield>bgw_main</> or via the combination
  of <structfield>bgw_library_name</> and <structfield>bgw_function_name</>,
  this main function should take a single argument of type <type>Datum</>
  and return <type>void</>. <structfield>bgw_main_arg</structfield> will be
  passed as the argument. In addition, the global variable
  <literal>MyBgworkerEntry</literal>
  points to a copy of the <structname>BackgroundWorker</structname> structure
  passed at registration time; the worker may find it helpful to examine
  this structure.
 </para>

 <para>
  On Windows (and anywhere else where <literal>EXEC_BACKEND</literal> is
  defined) or in dynamic background workers, it is not safe to pass a
  <type>Datum</> by reference, only by value. If an argument is required, it
  is safest to pass an int32 or other small value and use that as an index
  into an array allocated in shared memory. If a value like a <type>cstring</>
  or <type>text</type> is passed, the pointer will not be valid in the
  new background worker process.
 </para>
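
 <para>
  The index-into-shared-memory approach can be sketched within a single
  process. In this sketch an ordinary static array stands in for an array
  allocated in shared memory, <type>Datum</> is approximated by
  <type>uintptr_t</type>, and <function>Int32GetDatum</function> and
  <function>DatumGetInt32</function> are simplified stand-ins for the server
  macros of the same names; <structname>TaskSlot</structname>,
  <function>publish_task</function> and <function>task_for_worker</function>
  are hypothetical.
<programlisting><![CDATA[
#include <stdint.h>
#include <string.h>

typedef uintptr_t Datum;            /* stand-in for the server's Datum type */

/* Simplified stand-ins for the server's conversion macros. */
#define Int32GetDatum(x)  ((Datum) (uint32_t) (x))
#define DatumGetInt32(d)  ((int32_t) (uint32_t) (d))

#define MAX_TASKS     8             /* hypothetical array size */
#define TASK_NAME_LEN 32

/* Hypothetical per-worker parameters. */
typedef struct
{
    char        relname[TASK_NAME_LEN];
    int         naptime_secs;
} TaskSlot;

/* In a real module this array would live in shared memory. */
static TaskSlot task_slots[MAX_TASKS];

/*
 * Registering side: fill a slot and return the Datum to store in
 * bgw_main_arg.  The caller guarantees 0 <= slot < MAX_TASKS.
 */
static Datum
publish_task(int32_t slot, const char *relname, int naptime_secs)
{
    TaskSlot   *t = &task_slots[slot];

    strncpy(t->relname, relname, TASK_NAME_LEN - 1);
    t->relname[TASK_NAME_LEN - 1] = '\0';
    t->naptime_secs = naptime_secs;
    return Int32GetDatum(slot);     /* pass the index by value, never a pointer */
}

/* Worker side: turn the by-value Datum back into a slot reference. */
static const TaskSlot *
task_for_worker(Datum main_arg)
{
    return &task_slots[DatumGetInt32(main_arg)];
}
]]></programlisting>
 </para>

 <para>
  Only the small integer travels through
  <structfield>bgw_main_arg</structfield>; every process that maps the shared
  array can turn it back into the same slot, which is what makes the pattern
  safe where a pointer would not be.
 </para>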

 <para>
  <structfield>bgw_extra</structfield> can contain extra data to be passed
  to the background worker. Unlike <structfield>bgw_main_arg</>, this data
  is not passed as an argument to the worker's main function, but it can be
  accessed via <literal>MyBgworkerEntry</literal>, as discussed above.
 </para>
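
 <para>
  A fixed-size structure can be passed through
  <structfield>bgw_extra</structfield> by copying its bytes in and out. The
  sketch assumes a value of 128 for <literal>BGW_EXTRALEN</literal> so that it
  compiles on its own; the payload type <structname>MyWorkerParams</structname>
  and the helpers <function>pack_extra</function> and
  <function>unpack_extra</function> are hypothetical. The payload should hold
  plain values only, since a pointer copied this way would be no more valid in
  the worker than one passed through <structfield>bgw_main_arg</structfield>.
<programlisting><![CDATA[
#include <stdint.h>
#include <string.h>

#define BGW_EXTRALEN 128            /* assumed value */

/* Hypothetical payload a module might hand to its worker. */
typedef struct
{
    uint32_t    database_oid;
    int32_t     batch_size;
} MyWorkerParams;

/* Fail the build if the payload cannot fit in bgw_extra. */
_Static_assert(sizeof(MyWorkerParams) <= BGW_EXTRALEN,
               "payload must fit in bgw_extra");

/* Registering side: copy the payload into the fixed-size buffer. */
static void
pack_extra(char *bgw_extra, const MyWorkerParams *params)
{
    memcpy(bgw_extra, params, sizeof(MyWorkerParams));
}

/* Worker side: copy it back out; memcpy avoids alignment assumptions. */
static MyWorkerParams
unpack_extra(const char *bgw_extra)
{
    MyWorkerParams params;

    memcpy(&params, bgw_extra, sizeof(MyWorkerParams));
    return params;
}
]]></programlisting>
 </para>

 <para>
  In a running worker, the buffer passed to the unpacking step would be
  <literal>MyBgworkerEntry->bgw_extra</literal>.
 </para>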

 <para>
  <structfield>bgw_notify_pid</structfield> is the PID of a PostgreSQL
  backend process to which the postmaster should send <literal>SIGUSR1</>
  when the process is started or exits. It should be 0 for workers registered
  at postmaster startup time, or when the backend registering the worker does
  not wish to wait for the worker to start up. Otherwise, it should be
  initialized to <literal>MyProcPid</>.
 </para>

 <para>Once running, the process can connect to a database by calling
  <function>BackgroundWorkerInitializeConnection(<parameter>char *dbname</parameter>, <parameter>char *username</parameter>)</function> or
  <function>BackgroundWorkerInitializeConnectionByOid(<parameter>Oid dboid</parameter>, <parameter>Oid useroid</parameter>)</function>.
  This allows the process to run transactions and queries using the
  <literal>SPI</literal> interface. If <varname>dbname</> is NULL or
  <varname>dboid</> is <literal>InvalidOid</>, the session is not connected
  to any particular database, but shared catalogs can be accessed.
  If <varname>username</> is NULL or <varname>useroid</> is
  <literal>InvalidOid</>, the process will run as the superuser created
  during <command>initdb</>.
  A background worker can only call one of these two functions, and only
  once. It is not possible to switch databases.
 </para>

 <para>
  Signals are initially blocked when control reaches the background worker's
  main function, and must be unblocked by it; this is to
  allow the process to customize its signal handlers, if necessary.
  Signals can be unblocked in the new process by calling
  <function>BackgroundWorkerUnblockSignals</> and blocked by calling
  <function>BackgroundWorkerBlockSignals</>.
 </para>
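
 <para>
  The block, customize, then unblock sequence can be illustrated with plain
  POSIX calls. The following sketch is not <productname>PostgreSQL</> code: it
  blocks all signals to mimic the state a worker starts in, installs a
  <literal>SIGTERM</> handler that only sets a flag, and then unblocks. In a
  real worker the last step would be a call to
  <function>BackgroundWorkerUnblockSignals</>; the names
  <varname>got_sigterm</varname>, <function>worker_signal_setup</function> and
  <function>worker_unblock_signals</function> are hypothetical.
<programlisting><![CDATA[
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_sigterm = 0;

/* Handler does only async-signal-safe work: record that SIGTERM arrived. */
static void
handle_sigterm(int signo)
{
    (void) signo;
    got_sigterm = 1;
}

/* Hypothetical set-up routine following the documented order of steps. */
static int
worker_signal_setup(void)
{
    sigset_t    all;
    struct sigaction sa;

    /* 1. Start with signals blocked, as a new background worker does. */
    sigfillset(&all);
    if (sigprocmask(SIG_BLOCK, &all, NULL) != 0)
        return -1;

    /* 2. Install custom handlers while nothing can be delivered. */
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_sigterm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTERM, &sa, NULL) != 0)
        return -1;

    return 0;
}

/* 3. Unblock; plays the role of BackgroundWorkerUnblockSignals(). */
static int
worker_unblock_signals(void)
{
    sigset_t    all;

    sigfillset(&all);
    return sigprocmask(SIG_UNBLOCK, &all, NULL);
}
]]></programlisting>
 </para>

 <para>
  Installing handlers before unblocking means no signal is acted on under the
  default disposition; a <literal>SIGTERM</> sent in the meantime stays
  pending and is delivered once the mask is cleared.
 </para>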

 <para>
  If <structfield>bgw_restart_time</structfield> for a background worker is
  configured as <literal>BGW_NEVER_RESTART</>, or if it exits with an exit
  code of 0 or is terminated by <function>TerminateBackgroundWorker</>,
  it will be automatically unregistered by the postmaster on exit.
  Otherwise, it will be restarted after the time period configured via
  <structfield>bgw_restart_time</>, or immediately if the postmaster
  reinitializes the cluster due to a backend failure. Background workers
  that need to suspend execution only temporarily should use an
  interruptible sleep rather than exiting; this can be achieved by calling
  <function>WaitLatch()</function>. Make sure the
  <literal>WL_POSTMASTER_DEATH</> flag is set when calling that function, and
  check the return code so that the worker exits promptly in the emergency
  case where <command>postgres</> itself has terminated.
 </para>
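
 <para>
  The unregister-or-restart decision described above can be written as a
  small function. This models the documented rules only and is not the
  postmaster's implementation; <structname>WorkerExit</structname>,
  <function>restart_delay</function> and
  <literal>WORKER_UNREGISTERED</literal> are hypothetical, and the value of
  <literal>BGW_NEVER_RESTART</literal> is assumed.
<programlisting><![CDATA[
#include <stdbool.h>

#define BGW_NEVER_RESTART   (-1)    /* assumed value */
#define WORKER_UNREGISTERED (-1)    /* hypothetical result: do not restart */

/* Hypothetical summary of how a worker process ended. */
typedef struct
{
    int         bgw_restart_time;   /* seconds, or BGW_NEVER_RESTART */
    int         exit_code;          /* exit status of the worker process */
    bool        terminated;         /* stopped via TerminateBackgroundWorker() */
} WorkerExit;

/*
 * Returns WORKER_UNREGISTERED if the worker is forgotten on exit, otherwise
 * the delay in seconds before it is started again.
 */
static int
restart_delay(const WorkerExit *w, bool cluster_reinitialized)
{
    /* Any of these three conditions unregisters the worker. */
    if (w->bgw_restart_time == BGW_NEVER_RESTART ||
        w->exit_code == 0 ||
        w->terminated)
        return WORKER_UNREGISTERED;

    /* After a backend failure the cluster is reinitialized: restart now. */
    if (cluster_reinitialized)
        return 0;

    return w->bgw_restart_time;
}
]]></programlisting>
 </para>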

 <para>
  When a background worker is registered using the
  <function>RegisterDynamicBackgroundWorker</function> function, it is
  possible for the backend performing the registration to obtain information
  regarding the status of the worker. Backends wishing to do this should
  pass the address of a <type>BackgroundWorkerHandle *</type> as the second
  argument to <function>RegisterDynamicBackgroundWorker</function>. If the
  worker is successfully registered, this pointer will be initialized with an
  opaque handle that can subsequently be passed to
  <function>GetBackgroundWorkerPid(<parameter>BackgroundWorkerHandle *</parameter>, <parameter>pid_t *</parameter>)</function> or
  <function>TerminateBackgroundWorker(<parameter>BackgroundWorkerHandle *</parameter>)</function>.
  <function>GetBackgroundWorkerPid</> can be used to poll the status of the
  worker: a return value of <literal>BGWH_NOT_YET_STARTED</> indicates that
  the worker has not yet been started by the postmaster;
  <literal>BGWH_STOPPED</literal> indicates that it has been started but is
  no longer running; and <literal>BGWH_STARTED</literal> indicates that it is
  currently running. In this last case, the PID will also be returned via the
  second argument.
  <function>TerminateBackgroundWorker</> causes the postmaster to send
  <literal>SIGTERM</> to the worker if it is running, and to unregister it
  as soon as it is not.
 </para>
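
 <para>
  The three results of <function>GetBackgroundWorkerPid</> can be illustrated
  with a simplified model. Here a hypothetical registry slot records a PID of
  -1 before the postmaster launches the worker and 0 after it exits; that
  encoding, the <structname>WorkerSlot</structname> type and
  <function>model_get_worker_pid</function> are assumptions for illustration,
  and the enumeration only mirrors the status names used in this chapter.
<programlisting><![CDATA[
#include <stdbool.h>
#include <sys/types.h>

/* Mirrors the status names used in this chapter (values not significant). */
typedef enum
{
    BGWH_STARTED,
    BGWH_NOT_YET_STARTED,
    BGWH_STOPPED,
    BGWH_POSTMASTER_DIED
} BgwHandleStatus;

/* Hypothetical registry slot: pid is -1 before launch and 0 after exit. */
typedef struct
{
    bool        in_use;     /* false once the worker has been unregistered */
    pid_t       pid;
} WorkerSlot;

/* Simplified model of the status lookup behind GetBackgroundWorkerPid(). */
static BgwHandleStatus
model_get_worker_pid(const WorkerSlot *slot, pid_t *pidp)
{
    if (slot->in_use && slot->pid == -1)
        return BGWH_NOT_YET_STARTED;    /* postmaster has not launched it */
    if (!slot->in_use || slot->pid == 0)
        return BGWH_STOPPED;            /* it ran, but is no longer running */
    *pidp = slot->pid;                  /* the PID is reported only here */
    return BGWH_STARTED;
}
]]></programlisting>
 </para>

 <para>
  As in the interface described above, the PID output is written only for
  <literal>BGWH_STARTED</literal>, so callers should not read it after any
  other result.
 </para>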

 <para>
  In some cases, a process which registers a background worker may wish to
  wait for the worker to start up. This can be accomplished by initializing
  <structfield>bgw_notify_pid</structfield> to <literal>MyProcPid</> and
  then passing the <type>BackgroundWorkerHandle *</type> obtained at
  registration time to the
  <function>WaitForBackgroundWorkerStartup(<parameter>BackgroundWorkerHandle
  *handle</parameter>, <parameter>pid_t *</parameter>)</function> function.
  This function will block until the postmaster has attempted to start the
  background worker, or until the postmaster dies. If the background worker
  is running, the return value will be <literal>BGWH_STARTED</>, and
  the PID will be written to the provided address. Otherwise, the return
  value will be <literal>BGWH_STOPPED</literal> or
  <literal>BGWH_POSTMASTER_DIED</literal>.
 </para>

 <para>
  If a background worker sends asynchronous notifications with the
  <command>NOTIFY</command> command via the Server Programming Interface
  (<acronym>SPI</acronym>), it should call
  <function>ProcessCompletedNotifies</function> explicitly after committing
  the enclosing transaction so that any notifications can be delivered. If a
  background worker registers to receive asynchronous notifications with
  the <command>LISTEN</command> command through <acronym>SPI</acronym>, the
  worker will log those notifications, but there is no programmatic way for
  the worker to intercept and respond to those notifications.
 </para>

 <para>
  The <filename>worker_spi</> contrib module contains a working example,
  which demonstrates some useful techniques.
 </para>

 <para>
  The maximum number of registered background workers is limited by
  <xref linkend="guc-max-worker-processes">.
 </para>
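
 <para>
  Because this limit can only be set at server start, registration behaves
  like claiming a slot in a fixed-size array. The sketch below models that
  with eight slots, an arbitrary size chosen for the example;
  <function>register_worker</function> and
  <function>unregister_worker</function> are hypothetical, and returning -1
  plays the role of <function>RegisterDynamicBackgroundWorker</function>
  returning false when no slot is free.
<programlisting><![CDATA[
#include <stdbool.h>

#define MAX_WORKER_SLOTS 8          /* arbitrary size for this example */

/* Hypothetical registry, sized once when the server starts. */
static bool slot_in_use[MAX_WORKER_SLOTS];

/*
 * Claim a free slot.  Returns its index, or -1 when every slot is taken,
 * which plays the role of RegisterDynamicBackgroundWorker() returning false.
 */
static int
register_worker(void)
{
    for (int i = 0; i < MAX_WORKER_SLOTS; i++)
    {
        if (!slot_in_use[i])
        {
            slot_in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Release a slot when its worker is unregistered, making it reusable. */
static void
unregister_worker(int slot)
{
    slot_in_use[slot] = false;
}
]]></programlisting>
 </para>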

</chapter>