postgresql/doc/src/sgml/jit.sgml
Andres Freund c03c1449c0 Fix issues around EXPLAIN with JIT.
I (Andres) was more than a bit hasty in committing 33001fd7a7
after last minute changes, leading to a number of problems (jit output
was only shown for JIT in parallel workers, and just EXPLAIN without
ANALYZE didn't work).  Lukas luckily found these issues quickly.

Instead of combining instrumentation in in standard_ExecutorEnd(), do
so on demand in the new ExplainPrintJITSummary().

Also update a documentation example of the JIT output, changed in
52050ad8eb.

Author: Lukas Fittl, with minor changes by me
Discussion: https://postgr.es/m/CAP53PkxmgJht69pabxBXJBM+0oc6kf3KHMborLP7H2ouJ0CCtQ@mail.gmail.com
Backpatch: 11, where JIT compilation was introduced
2018-10-03 12:48:37 -07:00

286 lines
11 KiB
Plaintext

<!-- doc/src/sgml/jit.sgml -->
<chapter id="jit">
<title>Just-in-Time Compilation (<acronym>JIT</acronym>)</title>
<indexterm zone="jit">
<primary><acronym>JIT</acronym></primary>
</indexterm>
<indexterm>
<primary>Just-In-Time compilation</primary>
<see><acronym>JIT</acronym></see>
</indexterm>
<para>
This chapter explains what just-in-time compilation is, and how it can be
configured in <productname>PostgreSQL</productname>.
</para>
<sect1 id="jit-reason">
<title>What is <acronym>JIT</acronym> compilation?</title>
<para>
Just-in-Time (<acronym>JIT</acronym>) compilation is the process of turning
some form of interpreted program evaluation into a native program, and
doing so at run time.
For example, instead of using general-purpose code that can evaluate
arbitrary SQL expressions to evaluate a particular SQL predicate
like <literal>WHERE a.col = 3</literal>, it is possible to generate a
function that is specific to that expression and can be natively executed
by the CPU, yielding a speedup.
</para>
<para>
<productname>PostgreSQL</productname> has builtin support to perform
<acronym>JIT</acronym> compilation using <ulink
url="https://llvm.org/"><productname>LLVM</productname></ulink> when
<productname>PostgreSQL</productname> is built with
<link linkend="configure-with-llvm"><literal>--with-llvm</literal></link>.
</para>
<para>
See <filename>src/backend/jit/README</filename> for further details.
</para>
<sect2 id="jit-accelerated-operations">
<title><acronym>JIT</acronym> Accelerated Operations</title>
<para>
Currently <productname>PostgreSQL</productname>'s <acronym>JIT</acronym>
implementation has support for accelerating expression evaluation and
tuple deforming. Several other operations could be accelerated in the
future.
</para>
<para>
Expression evaluation is used to evaluate <literal>WHERE</literal>
clauses, target lists, aggregates and projections. It can be accelerated
by generating code specific to each case.
</para>
<para>
Tuple deforming is the process of transforming an on-disk tuple (see <xref
linkend="storage-tuple-layout"/>) into its in-memory representation.
It can be accelerated by creating a function specific to the table layout
and the number of columns to be extracted.
</para>
</sect2>
<sect2 id="jit-inlining">
<title>Inlining</title>
<para>
<productname>PostgreSQL</productname> is very extensible and allows new
data types, functions, operators and other database objects to be defined;
see <xref linkend="extend"/>. In fact the built-in objects are implemented
using nearly the same mechanisms. This extensibility implies some
overhead, for example due to function calls (see <xref linkend="xfunc"/>).
To reduce that overhead, <acronym>JIT</acronym> compilation can inline the
bodies of small functions into the expressions using them. That allows a
significant percentage of the overhead to be optimized away.
</para>
</sect2>
<sect2 id="jit-optimization">
<title>Optimization</title>
<para>
<productname>LLVM</productname> has support for optimizing generated
code. Some of the optimizations are cheap enough to be performed whenever
<acronym>JIT</acronym> is used, while others are only beneficial for
longer-running queries.
See <ulink url="https://llvm.org/docs/Passes.html#transform-passes"/> for
more details about optimizations.
</para>
</sect2>
</sect1>
<sect1 id="jit-decision">
<title>When to <acronym>JIT</acronym>?</title>
<para>
<acronym>JIT</acronym> compilation is beneficial primarily for long-running
CPU-bound queries. Frequently these will be analytical queries. For short
queries the added overhead of performing <acronym>JIT</acronym> compilation
will often be higher than the time it can save.
</para>
<para>
To determine whether <acronym>JIT</acronym> compilation should be used,
the total estimated cost of a query (see
<xref linkend="planner-stats-details"/> and
<xref linkend="runtime-config-query-constants"/>) is used.
The estimated cost of the query will be compared with the setting of <xref
linkend="guc-jit-above-cost"/>. If the cost is higher,
<acronym>JIT</acronym> compilation will be performed.
Two further decisions are then needed.
Firstly, if the estimated cost is more
than the setting of <xref linkend="guc-jit-inline-above-cost"/>, short
functions and operators used in the query will be inlined.
Secondly, if the estimated cost is more than the setting of <xref
linkend="guc-jit-optimize-above-cost"/>, expensive optimizations are
applied to improve the generated code.
Each of these options increases the <acronym>JIT</acronym> compilation
overhead, but can reduce query execution time considerably.
</para>
<para>
These cost-based decisions will be made at plan time, not execution
time. This means that when prepared statements are in use, and a generic
plan is used (see <xref linkend="sql-prepare"/>), the values of the
configuration parameters in effect at prepare time control the decisions,
not the settings at execution time.
</para>
<note>
<para>
If <xref linkend="guc-jit"/> is set to <literal>off</literal>, or if no
<acronym>JIT</acronym> implementation is available (for example because
the server was compiled without <literal>--with-llvm</literal>),
<acronym>JIT</acronym> will not be performed, even if it would be
beneficial based on the above criteria. Setting <xref linkend="guc-jit"/>
to <literal>off</literal> has effects at both plan and execution time.
</para>
</note>
<para>
<xref linkend="sql-explain"/> can be used to see whether
<acronym>JIT</acronym> is used or not. As an example, here is a query that
is not using <acronym>JIT</acronym>:
<screen>
=# EXPLAIN ANALYZE SELECT SUM(relpages) FROM pg_class;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Aggregate (cost=16.27..16.29 rows=1 width=8) (actual time=0.303..0.303 rows=1 loops=1)
-> Seq Scan on pg_class (cost=0.00..15.42 rows=342 width=4) (actual time=0.017..0.111 rows=356 loops=1)
Planning Time: 0.116 ms
Execution Time: 0.365 ms
(4 rows)
</screen>
Given the cost of the plan, it is entirely reasonable that no
<acronym>JIT</acronym> was used; the cost of <acronym>JIT</acronym> would
have been bigger than the potential savings. Adjusting the cost limits
will lead to <acronym>JIT</acronym> use:
<screen>
=# SET jit_above_cost = 10;
SET
=# EXPLAIN ANALYZE SELECT SUM(relpages) FROM pg_class;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Aggregate (cost=16.27..16.29 rows=1 width=8) (actual time=6.049..6.049 rows=1 loops=1)
-> Seq Scan on pg_class (cost=0.00..15.42 rows=342 width=4) (actual time=0.019..0.052 rows=356 loops=1)
Planning Time: 0.133 ms
JIT:
Functions: 3
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 1.259 ms, Inlining 0.000 ms, Optimization 0.797 ms, Emission 5.048 ms, Total 7.104 ms
Execution Time: 7.416 ms
</screen>
As visible here, <acronym>JIT</acronym> was used, but inlining and
expensive optimization were not. If <xref
linkend="guc-jit-inline-above-cost"/> or <xref
linkend="guc-jit-optimize-above-cost"/> were also lowered,
that would change.
</para>
</sect1>
<sect1 id="jit-configuration" xreflabel="JIT Configuration">
<title>Configuration</title>
<para>
The configuration variable
<xref linkend="guc-jit"/> determines whether <acronym>JIT</acronym>
compilation is enabled or disabled.
If it is enabled, the configuration variables
<xref linkend="guc-jit-above-cost"/>, <xref
linkend="guc-jit-inline-above-cost"/>, and <xref
linkend="guc-jit-optimize-above-cost"/> determine
whether <acronym>JIT</acronym> compilation is performed for a query,
and how much effort is spent doing so.
</para>
<para>
<xref linkend="guc-jit-provider"/> determines which <acronym>JIT</acronym>
implementation is used. It is rarely required to be changed. See <xref
linkend="jit-pluggable"/>.
</para>
<para>
For development and debugging purposes a few additional configuration
parameters exist, as described in
<xref linkend="runtime-config-developer"/>.
</para>
</sect1>
<sect1 id="jit-extensibility">
<title>Extensibility</title>
<sect2 id="jit-extensibility-bitcode">
<title>Inlining Support for Extensions</title>
<para>
<productname>PostgreSQL</productname>'s <acronym>JIT</acronym>
implementation can inline the bodies of functions
of types <literal>C</literal> and <literal>internal</literal>, as well as
operators based on such functions. To do so for functions in extensions,
the definitions of those functions need to be made available.
When using <link linkend="extend-pgxs">PGXS</link> to build an extension
against a server that has been compiled with LLVM JIT support, the
relevant files will be built and installed automatically.
</para>
<para>
The relevant files have to be installed into
<filename>$pkglibdir/bitcode/$extension/</filename> and a summary of them
into <filename>$pkglibdir/bitcode/$extension.index.bc</filename>, where
<literal>$pkglibdir</literal> is the directory returned by
<literal>pg_config --pkglibdir</literal> and <literal>$extension</literal>
is the base name of the extension's shared library.
<note>
<para>
For functions built into <productname>PostgreSQL</productname> itself,
the bitcode is installed into
<literal>$pkglibdir/bitcode/postgres</literal>.
</para>
</note>
</para>
</sect2>
<sect2 id="jit-pluggable">
<title>Pluggable <acronym>JIT</acronym> Providers</title>
<para>
<productname>PostgreSQL</productname> provides a <acronym>JIT</acronym>
implementation based on <productname>LLVM</productname>. The interface to
the <acronym>JIT</acronym> provider is pluggable and the provider can be
changed without recompiling (although currently, the build process only
provides inlining support data for <productname>LLVM</productname>).
The active provider is chosen via the setting
<xref linkend="guc-jit-provider"/>.
</para>
<sect3>
<title><acronym>JIT</acronym> Provider Interface</title>
<para>
A <acronym>JIT</acronym> provider is loaded by dynamically loading the
named shared library. The normal library search path is used to locate
the library. To provide the required <acronym>JIT</acronym> provider
callbacks and to indicate that the library is actually a
<acronym>JIT</acronym> provider, it needs to provide a C function named
<function>_PG_jit_provider_init</function>. This function is passed a
struct that needs to be filled with the callback function pointers for
individual actions:
<programlisting>
struct JitProviderCallbacks
{
JitProviderResetAfterErrorCB reset_after_error;
JitProviderReleaseContextCB release_context;
JitProviderCompileExprCB compile_expr;
};
extern void _PG_jit_provider_init(JitProviderCallbacks *cb);
</programlisting>
</para>
</sect3>
</sect2>
</sect1>
</chapter>