postgresql/doc/src/sgml/bgworker.sgml

<!-- doc/src/sgml/bgworker.sgml -->

<chapter id="bgworker">
 <title>Background Worker Processes</title>

 <indexterm zone="bgworker">
  <primary>Background workers</primary>
 </indexterm>

 <para>
  PostgreSQL can be extended to run user-supplied code in separate processes.
  Such processes are started, stopped and monitored by <command>postgres</command>,
  which permits them to have a lifetime closely linked to the server's status.
  These processes have the option to attach to <productname>PostgreSQL</>'s
  shared memory area and to connect to databases internally; they can also run
  multiple transactions serially, just like a regular client-connected server
  process.  Also, by linking to <application>libpq</> they can connect to the
  server and behave like a regular client application.
 </para>

 <warning>
  <para>
   There are considerable robustness and security risks in using background
   worker processes because, being written in the <literal>C</> language,
   they have unrestricted access to data.  Administrators wishing to enable
   modules that include background worker process should exercise extreme
   caution.  Only carefully audited modules should be permitted to run
   background worker processes.
  </para>
 </warning>

 <para>
  Background workers can be initialized at the time that
  <productname>PostgreSQL</> is started by including the module name in
  <varname>shared_preload_libraries</>.  A module wishing to run a background
  worker can register it by calling
  <function>RegisterBackgroundWorker(<type>BackgroundWorker *worker</type>)</function>
  from its <function>_PG_init()</>.  Background workers can also be started
  after the system is up and running by calling the function
  <function>RegisterDynamicBackgroundWorker(<type>BackgroundWorker
  *worker, BackgroundWorkerHandle **handle</type>)</function>.  Unlike
  <function>RegisterBackgroundWorker</>, which can only be called from within
  the postmaster, <function>RegisterDynamicBackgroundWorker</function> must be
  called from a regular backend.
 </para>

 <para>
  The structure <structname>BackgroundWorker</structname> is defined thus:
<programlisting>
typedef void (*bgworker_main_type)(Datum main_arg);
typedef struct BackgroundWorker
{
    char        bgw_name[BGW_MAXLEN];
    char        bgw_type[BGW_MAXLEN];
    int         bgw_flags;
    BgWorkerStartTime bgw_start_time;
    int         bgw_restart_time;       /* in seconds, or BGW_NEVER_RESTART */
    char        bgw_library_name[BGW_MAXLEN];
    char        bgw_function_name[BGW_MAXLEN];
    Datum       bgw_main_arg;
    char        bgw_extra[BGW_EXTRALEN];
    int         bgw_notify_pid;
} BackgroundWorker;
</programlisting>
  </para>

  <para>
   <structfield>bgw_name</> and <structfield>bgw_type</structfield> are
   strings to be used in log messages, process listings and similar contexts.
   <structfield>bgw_type</structfield> should be the same for all background
   workers of the same type, so that it is possible to group such workers in a
   process listing, for example.  <structfield>bgw_name</structfield> on the
   other hand can contain additional information about the specific process.
   (Typically, the string for <structfield>bgw_name</structfield> will contain
   the type somehow, but that is not strictly required.)
  </para>

  <para>
   <structfield>bgw_flags</> is a bitwise-or'd bit mask indicating the
   capabilities that the module wants.  Possible values are:
   <variablelist>

    <varlistentry>
     <term><literal>BGWORKER_SHMEM_ACCESS</literal></term>
     <listitem>
      <para>
       <indexterm><primary>BGWORKER_SHMEM_ACCESS</primary></indexterm>
       Requests shared memory access.  Workers without shared memory access
       cannot access any of <productname>PostgreSQL's</productname> shared
       data structures, such as heavyweight or lightweight locks, shared
       buffers, or any custom data structures which the worker itself may
       wish to create and use.
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term><literal>BGWORKER_BACKEND_DATABASE_CONNECTION</literal></term>
     <listitem>
      <para>
       <indexterm><primary>BGWORKER_BACKEND_DATABASE_CONNECTION</primary></indexterm>
       Requests the ability to establish a database connection through which it
       can later run transactions and queries.  A background worker using
       <literal>BGWORKER_BACKEND_DATABASE_CONNECTION</literal> to connect to a
       database must also attach shared memory using
       <literal>BGWORKER_SHMEM_ACCESS</literal>, or worker start-up will fail.
      </para>
     </listitem>
    </varlistentry>

   </variablelist>

  </para>

  <para>
   <structfield>bgw_start_time</structfield> is the server state during which
   <command>postgres</> should start the process; it can be one of
   <literal>BgWorkerStart_PostmasterStart</> (start as soon as
   <command>postgres</> itself has finished its own initialization; processes
   requesting this are not eligible for database connections),
   <literal>BgWorkerStart_ConsistentState</> (start as soon as a consistent state
   has been reached in a hot standby, allowing processes to connect to
   databases and run read-only queries), and
   <literal>BgWorkerStart_RecoveryFinished</> (start as soon as the system has
   entered normal read-write state).  Note the last two values are equivalent
   in a server that's not a hot standby.  Note that this setting only indicates
   when the processes are to be started; they do not stop when a different state
   is reached.
  </para>

  <para>
   <structfield>bgw_restart_time</structfield> is the interval, in seconds, that
   <command>postgres</command> should wait before restarting the process, in
   case it crashes.  It can be any positive value,
   or <literal>BGW_NEVER_RESTART</literal>, indicating not to restart the
   process in case of a crash.
  </para>

  <para>
   <structfield>bgw_library_name</structfield> is the name of a library in
   which the initial entry point for the background worker should be sought.
   The named library will be dynamically loaded by the worker process and
   <structfield>bgw_function_name</structfield> will be used to identify the
   function to be called.  If loading a function from the core code, this must
   be set to "postgres".
  </para>

  <para>
   <structfield>bgw_function_name</structfield> is the name of a function in
   a dynamically loaded library which should be used as the initial entry point
   for a new background worker.
  </para>

  <para>
   <structfield>bgw_main_arg</structfield> is the <type>Datum</> argument
   to the background worker main function.  This main function should take a
   single argument of type <type>Datum</> and return <type>void</>.
   <structfield>bgw_main_arg</structfield> will be passed as the argument.
   In addition, the global variable <literal>MyBgworkerEntry</literal>
   points to a copy of the <structname>BackgroundWorker</structname> structure
   passed at registration time; the worker may find it helpful to examine
   this structure.
  </para>

  <para>
   On Windows (and anywhere else where <literal>EXEC_BACKEND</literal> is
   defined) or in dynamic background workers it is not safe to pass a
   <type>Datum</> by reference, only by value. If an argument is required, it
   is safest to pass an int32 or other small value and use that as an index
   into an array allocated in shared memory. If a value like a <type>cstring</>
   or <type>text</type> is passed then the pointer won't be valid from the
   new background worker process.
  </para>

  <para>
   <structfield>bgw_extra</structfield> can contain extra data to be passed
   to the background worker.  Unlike <structfield>bgw_main_arg</>, this data
   is not passed as an argument to the worker's main function, but it can be
   accessed via <literal>MyBgworkerEntry</literal>, as discussed above.
  </para>

  <para>
   <structfield>bgw_notify_pid</structfield> is the PID of a PostgreSQL
   backend process to which the postmaster should send <literal>SIGUSR1</>
   when the process is started or exits.  It should be 0 for workers registered
   at postmaster startup time, or when the backend registering the worker does
   not wish to wait for the worker to start up.  Otherwise, it should be
   initialized to <literal>MyProcPid</>.
  </para>

  <para>Once running, the process can connect to a database by calling
   <function>BackgroundWorkerInitializeConnection(<parameter>char *dbname</parameter>, <parameter>char *username</parameter>)</function> or
   <function>BackgroundWorkerInitializeConnectionByOid(<parameter>Oid dboid</parameter>, <parameter>Oid useroid</parameter>)</function>.
   This allows the process to run transactions and queries using the
   <literal>SPI</literal> interface.  If <varname>dbname</> is NULL or
   <varname>dboid</> is <literal>InvalidOid</>, the session is not connected
   to any particular database, but shared catalogs can be accessed.
   If <varname>username</> is NULL or <varname>useroid</> is
   <literal>InvalidOid</>, the process will run as the superuser created
   during <command>initdb</>.
   A background worker can only call one of these two functions, and only
   once.  It is not possible to switch databases.
  </para>

  <para>
   Signals are initially blocked when control reaches the
   background worker's main function, and must be unblocked by it; this is to
   allow the process to customize its signal handlers, if necessary.
   Signals can be unblocked in the new process by calling
   <function>BackgroundWorkerUnblockSignals</> and blocked by calling
   <function>BackgroundWorkerBlockSignals</>.
  </para>

  <para>
   If <structfield>bgw_restart_time</structfield> for a background worker is
   configured as <literal>BGW_NEVER_RESTART</>, or if it exits with an exit
   code of 0 or is terminated by <function>TerminateBackgroundWorker</>,
   it will be automatically unregistered by the postmaster on exit.
   Otherwise, it will be restarted after the time period configured via
   <structfield>bgw_restart_time</>, or immediately if the postmaster
   reinitializes the cluster due to a backend failure.  Backends which need
   to suspend execution only temporarily should use an interruptible sleep
   rather than exiting; this can be achieved by calling
   <function>WaitLatch()</function>. Make sure the
   <literal>WL_POSTMASTER_DEATH</> flag is set when calling that function, and
   verify the return code for a prompt exit in the emergency case that
   <command>postgres</> itself has terminated.
  </para>

  <para>
   When a background worker is registered using the
   <function>RegisterDynamicBackgroundWorker</function> function, it is
   possible for the backend performing the registration to obtain information
   regarding the status of the worker.  Backends wishing to do this should
   pass the address of a <type>BackgroundWorkerHandle *</type> as the second
   argument to <function>RegisterDynamicBackgroundWorker</function>.  If the
   worker is successfully registered, this pointer will be initialized with an
   opaque handle that can subsequently be passed to
   <function>GetBackgroundWorkerPid(<parameter>BackgroundWorkerHandle *</parameter>, <parameter>pid_t *</parameter>)</function> or
   <function>TerminateBackgroundWorker(<parameter>BackgroundWorkerHandle *</parameter>)</function>.
   <function>GetBackgroundWorkerPid</> can be used to poll the status of the
   worker: a return value of <literal>BGWH_NOT_YET_STARTED</> indicates that
   the worker has not yet been started by the postmaster;
   <literal>BGWH_STOPPED</literal> indicates that it has been started but is
   no longer running; and <literal>BGWH_STARTED</literal> indicates that it is
   currently running.  In this last case, the PID will also be returned via the
   second argument.
   <function>TerminateBackgroundWorker</> causes the postmaster to send
   <literal>SIGTERM</> to the worker if it is running, and to unregister it
   as soon as it is not.
  </para>

  <para>
   In some cases, a process which registers a background worker may wish to
   wait for the worker to start up.  This can be accomplished by initializing
   <structfield>bgw_notify_pid</structfield> to <literal>MyProcPid</> and
   then passing the <type>BackgroundWorkerHandle *</type> obtained at
   registration time to
   <function>WaitForBackgroundWorkerStartup(<parameter>BackgroundWorkerHandle
   *handle</parameter>, <parameter>pid_t *</parameter>)</function> function.
   This function will block until the postmaster has attempted to start the
   background worker, or until the postmaster dies.  If the background runner
   is running, the return value will <literal>BGWH_STARTED</>, and
   the PID will be written to the provided address.  Otherwise, the return
   value will be <literal>BGWH_STOPPED</literal> or
   <literal>BGWH_POSTMASTER_DIED</literal>.
  </para>

  <para>
   If a background worker sends asynchronous notifications with the
   <command>NOTIFY</command> command via the Server Programming Interface
   (<acronym>SPI</acronym>), it should call
   <function>ProcessCompletedNotifies</function> explicitly after committing
   the enclosing transaction so that any notifications can be delivered.  If a
   background worker registers to receive asynchronous notifications with
   the <command>LISTEN</command> through <acronym>SPI</acronym>, the worker
   will log those notifications, but there is no programmatic way for the
   worker to intercept and respond to those notifications.
  </para>

  <para>
   The <filename>src/test/modules/worker_spi</> module
   contains a working example,
   which demonstrates some useful techniques.
  </para>

  <para>
   The maximum number of registered background workers is limited by
   <xref linkend="guc-max-worker-processes">.
  </para>
</chapter>