<!-- doc/src/sgml/plhandler.sgml -->
<chapter id="plhandler">
<title>Writing a Procedural Language Handler</title>
<indexterm zone="plhandler">
<primary>procedural language</primary>
<secondary>handler for</secondary>
</indexterm>
<para>
All calls to functions that are written in a language other than
the current <quote>version 1</quote> interface for compiled
languages (this includes functions in user-defined procedural languages
and functions written in SQL) go through a <firstterm>call handler</firstterm>
function for the specific language. It is the responsibility of
the call handler to execute the function in a meaningful way, such
as by interpreting the supplied source text. This chapter outlines
how a new procedural language's call handler can be written.
</para>
<para>
The call handler for a procedural language is a
<quote>normal</quote> function that must be written in a compiled
language such as C, using the version-1 interface, and registered
with <productname>PostgreSQL</productname> as taking no arguments
and returning the type <type>language_handler</type>. This
special pseudo-type identifies the function as a call handler and
prevents it from being called directly in SQL commands.
For more details on C language calling conventions and dynamic loading,
see <xref linkend="xfunc-c"/>.
</para>
<para>
The call handler is called in the same way as any other function:
It receives a pointer to a
<structname>FunctionCallInfoBaseData</structname> <type>struct</type> containing
argument values and information about the called function, and it
is expected to return a <type>Datum</type> result (and possibly
set the <structfield>isnull</structfield> field of the
<structname>FunctionCallInfoBaseData</structname> structure, if it wishes
to return an SQL null result). The difference between a call
handler and an ordinary callee function is that the
<structfield>flinfo->fn_oid</structfield> field of the
<structname>FunctionCallInfoBaseData</structname> structure will contain
the OID of the actual function to be called, not of the call
handler itself. The call handler must use this field to determine
which function to execute. Also, the passed argument list has
been set up according to the declaration of the target function,
not of the call handler.
</para>
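<para>
To make this calling convention concrete, here is a self-contained C sketch that does not depend on the <productname>PostgreSQL</productname> headers. Every <literal>Sketch</literal> or <literal>sketch_</literal> name in it is hypothetical: <structname>SketchFmgrInfo</structname>, <structname>SketchNullableDatum</structname> and <structname>SketchCallInfo</structname> are simplified stand-ins for <structname>FmgrInfo</structname>, <structname>NullableDatum</structname> and <structname>FunctionCallInfoBaseData</structname>, not their real definitions, and the OIDs in the <literal>sketch_pg_proc</literal> table are made up. The handler serves two user functions, tells them apart only by <structfield>flinfo->fn_oid</structfield>, expects an argument list shaped by the target function's declaration, and reports an SQL null result through <structfield>isnull</structfield>.
</para>

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-ins for FmgrInfo, NullableDatum and
 * FunctionCallInfoBaseData. These are NOT the real fmgr.h definitions;
 * only the fields discussed in the text are modeled.
 */
typedef long SketchDatum;
typedef unsigned int SketchOid;

typedef struct SketchFmgrInfo
{
    SketchOid   fn_oid;         /* OID of the function being called */
} SketchFmgrInfo;

typedef struct SketchNullableDatum
{
    SketchDatum value;
    int         isnull;
} SketchNullableDatum;

typedef struct SketchCallInfo
{
    SketchFmgrInfo *flinfo;
    int         isnull;         /* set by the callee to return SQL null */
    int         nargs;          /* per the target function's declaration */
    SketchNullableDatum args[4];    /* fixed size only to keep this short */
} SketchCallInfo;

/* Hypothetical pg_proc rows for two functions in the same language. */
typedef enum { SKETCH_SUM, SKETCH_PRODUCT } SketchBody;

typedef struct SketchProcRow
{
    SketchOid   oid;
    int         pronargs;
    SketchBody  body;
} SketchProcRow;

static const SketchProcRow sketch_pg_proc[] = {
    {16384, 2, SKETCH_SUM},
    {16385, 3, SKETCH_PRODUCT},
};

/* One call handler serves every function written in the language. */
SketchDatum
sketch_call_handler(SketchCallInfo *fcinfo)
{
    const SketchProcRow *proc = NULL;
    SketchDatum result;

    /* Look up the target function's OID, never the handler's own. */
    for (size_t i = 0; i < sizeof(sketch_pg_proc) / sizeof(sketch_pg_proc[0]); i++)
        if (sketch_pg_proc[i].oid == fcinfo->flinfo->fn_oid)
            proc = &sketch_pg_proc[i];

    /* A real handler would report these problems with ereport(ERROR). */
    assert(proc != NULL);
    assert(fcinfo->nargs == proc->pronargs);

    /* Policy chosen for this sketch: any null argument gives a null result. */
    fcinfo->isnull = 0;
    for (int i = 0; i < fcinfo->nargs; i++)
    {
        if (fcinfo->args[i].isnull)
        {
            fcinfo->isnull = 1;
            return 0;
        }
    }

    result = (proc->body == SKETCH_SUM) ? 0 : 1;
    for (int i = 0; i < fcinfo->nargs; i++)
    {
        if (proc->body == SKETCH_SUM)
            result += fcinfo->args[i].value;
        else
            result *= fcinfo->args[i].value;
    }
    return result;
}
```

<para>
The null-in, null-out policy is only a choice made for this sketch. A real handler is declared with <literal>PG_FUNCTION_ARGS</literal> and uses the <filename>fmgr.h</filename> macros to read its arguments and return its result; <literal>src/test/modules/plsample</literal> shows that form.
</para>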
<para>
It's up to the call handler to fetch the entry of the function from the
<classname>pg_proc</classname> system catalog and to analyze the argument
and return types of the called function. The <literal>AS</literal> clause from the
<command>CREATE FUNCTION</command> command for the function will be found
in the <literal>prosrc</literal> column of the
<classname>pg_proc</classname> row. This is commonly source
text in the procedural language, but in theory it could be something else,
such as a path name to a file, or anything else that tells the call handler
what to do in detail.
</para>
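<para>
The sketch below illustrates one way a handler might interpret <literal>prosrc</literal>. It assumes a hypothetical toy language in which a function body is a whitespace-separated postfix expression over integers, with <literal>$n</literal> denoting the n-th argument. The <literal>sketch_pg_proc</literal> rows, their OIDs and the function <function>sketch_execute</function> are invented for illustration; a real handler would fetch the <classname>pg_proc</classname> row through the system cache (for example with <function>SearchSysCache1</function> and <literal>PROCOID</literal>) instead of scanning an array.
</para>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int SketchOid;

/* Hypothetical pg_proc rows: function OID plus its prosrc (AS clause) text. */
typedef struct SketchProcRow
{
    SketchOid   oid;
    const char *prosrc;
} SketchProcRow;

static const SketchProcRow sketch_pg_proc[] = {
    {16384, "$1 $2 +"},         /* computes $1 + $2 */
    {16385, "$1 $1 * 10 -"},    /* computes $1 * $1 - 10 */
};

enum { SKETCH_MAX_DEPTH = 16 };

/* Stand-in for fetching the function's pg_proc row and reading prosrc. */
static const char *
sketch_fetch_prosrc(SketchOid fn_oid)
{
    for (size_t i = 0; i < sizeof(sketch_pg_proc) / sizeof(sketch_pg_proc[0]); i++)
        if (sketch_pg_proc[i].oid == fn_oid)
            return sketch_pg_proc[i].prosrc;
    return NULL;
}

/*
 * Interpret the prosrc text of function fn_oid as a postfix expression.
 * Returns 0 and stores the value in *result, or -1 where a real handler
 * would report an error.
 */
int
sketch_execute(SketchOid fn_oid, const long *args, int nargs, long *result)
{
    const char *src = sketch_fetch_prosrc(fn_oid);
    long        stack[SKETCH_MAX_DEPTH];
    int         depth = 0;
    char        buf[256];

    assert(result != NULL);
    if (src == NULL || strlen(src) >= sizeof(buf))
        return -1;
    strcpy(buf, src);           /* strtok() modifies the string it scans */

    for (char *tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " "))
    {
        if (tok[0] == '$')
        {
            int         n = atoi(tok + 1);  /* 1-based argument number */

            if (n < 1 || n > nargs || depth == SKETCH_MAX_DEPTH)
                return -1;
            stack[depth++] = args[n - 1];
        }
        else if (tok[1] == '\0' && strchr("+-*", tok[0]) != NULL)
        {
            long        b,
                        a;

            if (depth < 2)
                return -1;      /* operator without two operands */
            b = stack[--depth];
            a = stack[--depth];
            stack[depth++] = tok[0] == '+' ? a + b : tok[0] == '-' ? a - b : a * b;
        }
        else
        {
            char       *end;
            long        v = strtol(tok, &end, 10);

            if (end == tok || *end != '\0' || depth == SKETCH_MAX_DEPTH)
                return -1;      /* not an integer literal, or stack full */
            stack[depth++] = v;
        }
    }
    if (depth != 1)
        return -1;
    *result = stack[0];
    return 0;
}
```

<para>
For example, in this sketch a function whose <literal>AS</literal> clause is <literal>$1 $2 +</literal>, called with arguments 2 and 3, yields 5.
</para>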
<para>
Often, the same function is called many times per SQL statement.
A call handler can avoid repeated lookups of information about the
called function by using the
<structfield>flinfo->fn_extra</structfield> field. This will
initially be <symbol>NULL</symbol>, but can be set by the call handler to point at
information about the called function. On subsequent calls, if
<structfield>flinfo->fn_extra</structfield> is already non-<symbol>NULL</symbol>,
then it can be used and the information lookup step skipped. The
call handler must make sure that
<structfield>flinfo->fn_extra</structfield> is made to point at
memory that will live at least until the end of the current query,
since an <structname>FmgrInfo</structname> data structure could be
kept that long. One way to do this is to allocate the extra data
in the memory context specified by
<structfield>flinfo->fn_mcxt</structfield>; such data will
normally have the same lifespan as the
<structname>FmgrInfo</structname> itself. But the handler could
also choose to use a longer-lived memory context so that it can cache
function definition information across queries.
</para>
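<para>
The following C sketch shows the <structfield>fn_extra</structfield> caching pattern. <structname>SketchFmgrInfo</structname> is a hypothetical, simplified stand-in for <structname>FmgrInfo</structname>, and <structname>SketchFunctionCache</structname>, <function>sketch_lookup_function</function> and <function>sketch_get_function_cache</function> are invented names. Plain <function>malloc()</function> takes the place of allocating in <structfield>flinfo->fn_mcxt</structfield> (in a real handler, typically with <function>MemoryContextAlloc()</function>) so that the sketch compiles without <productname>PostgreSQL</productname> headers.
</para>

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int SketchOid;

/* Hypothetical stand-in for FmgrInfo: fn_extra starts out NULL. */
typedef struct SketchFmgrInfo
{
    SketchOid   fn_oid;
    void       *fn_extra;
} SketchFmgrInfo;

/* Hypothetical per-function data worth caching between calls. */
typedef struct SketchFunctionCache
{
    SketchOid   fn_oid;
    int         nargs;
} SketchFunctionCache;

int         sketch_lookup_count = 0;    /* number of expensive lookups done */

/*
 * Stand-in for fetching the pg_proc row and parsing prosrc. malloc()
 * stands in for allocating in flinfo->fn_mcxt, which would tie the
 * data's lifetime to the FmgrInfo; this sketch never frees it.
 */
static SketchFunctionCache *
sketch_lookup_function(SketchOid fn_oid)
{
    SketchFunctionCache *cache = malloc(sizeof(*cache));

    assert(cache != NULL);
    sketch_lookup_count++;
    cache->fn_oid = fn_oid;
    cache->nargs = 2;           /* would come from pg_proc.pronargs */
    return cache;
}

/* Called at the start of every invocation of the handler. */
SketchFunctionCache *
sketch_get_function_cache(SketchFmgrInfo *flinfo)
{
    SketchFunctionCache *cache = flinfo->fn_extra;

    if (cache == NULL)
    {
        /* First call through this FmgrInfo: do the lookup once. */
        cache = sketch_lookup_function(flinfo->fn_oid);
        flinfo->fn_extra = cache;
    }
    return cache;
}
```

<para>
However many times the handler runs through the same <structname>SketchFmgrInfo</structname>, the lookup counter stays at one. Caching across queries would instead keep the data in a longer-lived memory context, and would presumably also need some way to notice that the function's definition has changed; that is not modeled here.
</para>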
<para>
When a procedural-language function is invoked as a trigger, no arguments
are passed in the usual way, but the
<structname>FunctionCallInfoBaseData</structname>'s
<structfield>context</structfield> field points at a
<structname>TriggerData</structname> structure, rather than being <symbol>NULL</symbol>
as it is in a plain function call. A language handler should
provide mechanisms for procedural-language functions to get at the trigger
information.
</para>
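<para>
Here is a self-contained C sketch of how a handler might tell trigger calls from plain calls. The types are hypothetical, simplified stand-ins: <structname>SketchTriggerData</structname> is not the real <structname>TriggerData</structname>, and its <structfield>event</structfield> field is invented. The test in <function>sketch_called_as_trigger</function> mirrors what the real <literal>CALLED_AS_TRIGGER()</literal> macro from <filename>commands/trigger.h</filename> checks: that <structfield>context</structfield> is non-<symbol>NULL</symbol> and points at a node tagged as trigger data. Checking the tag matters because <structfield>context</structfield> can carry other kinds of nodes in other situations.
</para>

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins. As with real nodes, each context struct begins
 * with a type tag, so the handler can check what context points at.
 */
typedef enum SketchNodeTag
{
    SKETCH_T_TriggerData,
    SKETCH_T_OtherContext
} SketchNodeTag;

typedef struct SketchTriggerData
{
    SketchNodeTag type;         /* always SKETCH_T_TriggerData */
    const char *event;          /* hypothetical field, e.g. "INSERT" */
} SketchTriggerData;

typedef struct SketchCallInfo
{
    SketchNodeTag *context;     /* NULL for a plain function call */
} SketchCallInfo;

typedef enum SketchCallKind
{
    SKETCH_PLAIN_CALL,
    SKETCH_TRIGGER_CALL
} SketchCallKind;

/* Same test as the real CALLED_AS_TRIGGER(): non-NULL and correctly tagged. */
static int
sketch_called_as_trigger(const SketchCallInfo *fcinfo)
{
    return fcinfo->context != NULL && *fcinfo->context == SKETCH_T_TriggerData;
}

/*
 * Decide how the function was invoked. For a trigger call, *event receives
 * the trigger event so it could be exposed to the PL code; a plain call
 * would read its arguments from the call info instead.
 */
SketchCallKind
sketch_call_handler(const SketchCallInfo *fcinfo, const char **event)
{
    assert(event != NULL);
    if (sketch_called_as_trigger(fcinfo))
    {
        /* The tag is the first member, so converting back is well defined. */
        const SketchTriggerData *tdata = (const SketchTriggerData *) fcinfo->context;

        *event = tdata->event;
        return SKETCH_TRIGGER_CALL;
    }
    *event = NULL;
    return SKETCH_PLAIN_CALL;
}
```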
<para>
A template for a procedural-language handler written as a C extension is
provided in <literal>src/test/modules/plsample</literal>. This is a
working sample demonstrating one way to create a procedural-language
handler, process parameters, and return a value.
</para>
<para>
Although providing a call handler is sufficient to create a minimal
procedural language, there are two other functions that can optionally
be provided to make the language more convenient to use. These
are a <firstterm>validator</firstterm> and an
<firstterm>inline handler</firstterm>. A validator can be provided
to allow language-specific checking to be done during
<xref linkend="sql-createfunction"/>.
An inline handler can be provided to allow the language to support
anonymous code blocks executed via the <xref linkend="sql-do"/> command.
</para>
<para>
If a validator is provided by a procedural language, it
must be declared as a function taking a single parameter of type
<type>oid</type>. The validator's result is ignored, so it is customarily
declared to return <type>void</type>. The validator will be called at
the end of a <command>CREATE FUNCTION</command> command that has created
or updated a function written in the procedural language.
The passed-in OID is the OID of the function's <classname>pg_proc</classname>
row. The validator must fetch this row in the usual way and do
whatever checking is appropriate.
First, call <function>CheckFunctionValidatorAccess()</function> to diagnose
explicit calls to the validator that the user could not achieve through
<command>CREATE FUNCTION</command>. Typical checks then include verifying
that the function's argument and result types are supported by the
language, and that the function's body is syntactically correct
in the language. If the validator finds the function to be okay,
it should just return. If it finds an error, it should report that
via the normal <function>ereport()</function> error reporting mechanism.
Throwing an error will force a transaction rollback and thus prevent
the incorrect function definition from being committed.
</para>
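<para>
The sketch below outlines the checks a validator might perform, using a hypothetical toy language whose function bodies are whitespace-separated postfix expressions over integers, with <literal>$n</literal> referring to the n-th parameter. The <literal>sketch_pg_proc</literal> rows and all <literal>sketch_</literal> functions are invented for illustration; the type OIDs are those of the built-in <type>int8</type>, <type>int4</type> and <type>text</type> types. Returning a message stands in for <function>ereport()</function>, and the initial call to <function>CheckFunctionValidatorAccess()</function> that a real validator makes is omitted so that the sketch compiles on its own. The body check only parses the text; it never executes it.
</para>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int SketchOid;

/* OIDs of the built-in types int8, int4 and text. */
#define SKETCH_INT8OID 20
#define SKETCH_INT4OID 23
#define SKETCH_TEXTOID 25

/* Hypothetical pg_proc rows for functions in a toy postfix language. */
typedef struct SketchProcRow
{
    SketchOid   oid;
    int         pronargs;
    SketchOid   proargtypes[4];
    SketchOid   prorettype;
    const char *prosrc;
} SketchProcRow;

static const SketchProcRow sketch_pg_proc[] = {
    {16384, 2, {SKETCH_INT4OID, SKETCH_INT4OID}, SKETCH_INT4OID, "$1 $2 +"},
    {16385, 1, {SKETCH_TEXTOID}, SKETCH_INT4OID, "$1"},     /* text argument */
    {16386, 1, {SKETCH_INT8OID}, SKETCH_INT8OID, "$1 +"},   /* bad syntax */
    {16387, 1, {SKETCH_INT4OID}, SKETCH_INT4OID, "$2"},     /* no such $2 */
};

static int
sketch_type_supported(SketchOid typid)
{
    return typid == SKETCH_INT4OID || typid == SKETCH_INT8OID;
}

/* Syntax check only: track the stack depth, never evaluate anything. */
static int
sketch_body_is_valid(const char *src, int nargs)
{
    int         depth = 0;
    const char *p = src + strspn(src, " ");

    while (*p != '\0')
    {
        size_t      len = strcspn(p, " ");  /* length of this token */

        if (p[0] == '$')
        {
            int         n = atoi(p + 1);

            if (n < 1 || n > nargs)
                return 0;       /* no such parameter */
            depth++;
        }
        else if (len == 1 && strchr("+-*", p[0]) != NULL)
        {
            if (depth < 2)
                return 0;       /* operator lacks two operands */
            depth--;            /* pops two values, pushes one */
        }
        else
        {
            char       *end;

            (void) strtol(p, &end, 10);
            if (end != p + len)
                return 0;       /* not an integer literal */
            depth++;
        }
        p += len;
        p += strspn(p, " ");    /* skip to the next token */
    }
    return depth == 1;          /* exactly one value must remain */
}

/*
 * Returns NULL if the function is acceptable, otherwise a message that a
 * real validator would raise with ereport(ERROR).
 */
const char *
sketch_validator(SketchOid fn_oid)
{
    const SketchProcRow *proc = NULL;

    for (size_t i = 0; i < sizeof(sketch_pg_proc) / sizeof(sketch_pg_proc[0]); i++)
        if (sketch_pg_proc[i].oid == fn_oid)
            proc = &sketch_pg_proc[i];
    if (proc == NULL)
        return "function not found";
    assert(proc->pronargs <= 4);
    for (int i = 0; i < proc->pronargs; i++)
        if (!sketch_type_supported(proc->proargtypes[i]))
            return "argument type not supported by this language";
    if (!sketch_type_supported(proc->prorettype))
        return "return type not supported by this language";
    if (!sketch_body_is_valid(proc->prosrc, proc->pronargs))
        return "syntax error in function body";
    return NULL;                /* the function is okay: just return */
}
```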
<para>
Validator functions should typically honor the <xref
linkend="guc-check-function-bodies"/> parameter: if it is turned off, then
any expensive or context-sensitive checking should be skipped. If the
language provides for code execution at compilation time, the validator
must suppress checks that would induce such execution. In particular,
this parameter is turned off by <application>pg_dump</application> so that it can
load procedural language functions without worrying about side effects or
dependencies of the function bodies on other database objects.
(Because of this requirement, the call handler should avoid
assuming that the validator has fully checked the function. The point
of having a validator is not to let the call handler omit checks, but
to notify the user immediately if there are obvious errors in a
<command>CREATE FUNCTION</command> command.)
While the choice of exactly what to check is mostly left to the
discretion of the validator function, note that the core
<command>CREATE FUNCTION</command> code only executes <literal>SET</literal> clauses
attached to a function when <varname>check_function_bodies</varname> is on.
Therefore, checks whose results might be affected by GUC parameters
definitely should be skipped when <varname>check_function_bodies</varname> is
off, to avoid false failures when restoring a dump.
</para>
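<para>
A minimal sketch of honoring <varname>check_function_bodies</varname>, under stated assumptions: the global <literal>sketch_check_function_bodies</literal> is a hypothetical stand-in for the server setting, the <literal>types_supported</literal> flag stands in for type checks done elsewhere, and <function>sketch_check_body</function> is a placeholder for any expensive or context-sensitive check. The cheap, context-free check runs unconditionally, while the body check is skipped when the setting is off, as it is when <application>pg_dump</application> output is restored.
</para>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the check_function_bodies setting. */
bool        sketch_check_function_bodies = true;

/* Counts how often the expensive check actually ran. */
int         sketch_body_checks_run = 0;

/*
 * Placeholder for an expensive or context-sensitive check, such as fully
 * parsing the body, resolving the tables it mentions, or anything whose
 * result depends on GUC settings. Toy rule: the body must not be blank.
 */
static bool
sketch_check_body(const char *prosrc)
{
    sketch_body_checks_run++;
    return prosrc[strspn(prosrc, " \n")] != '\0';
}

/*
 * Returns NULL if the function is acceptable, otherwise a message that a
 * real validator would raise with ereport(ERROR).
 */
const char *
sketch_validator(bool types_supported, const char *prosrc)
{
    assert(prosrc != NULL);

    /* Cheap, context-free checks cannot fail spuriously during a restore. */
    if (!types_supported)
        return "unsupported argument or result type";

    /* pg_dump turns the setting off; skip everything below in that case. */
    if (!sketch_check_function_bodies)
        return NULL;

    if (!sketch_check_body(prosrc))
        return "function body is empty";
    return NULL;
}
```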
<para>
If an inline handler is provided by a procedural language, it
must be declared as a function taking a single parameter of type
<type>internal</type>. The inline handler's result is ignored, so it is
customarily declared to return <type>void</type>. The inline handler
will be called when a <command>DO</command> statement is executed specifying
the procedural language. The parameter actually passed is a pointer
to an <structname>InlineCodeBlock</structname> struct, which contains information
about the <command>DO</command> statement's parameters, in particular the
text of the anonymous code block to be executed. The inline handler
should execute this code and return.
</para>
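<para>
Here is a self-contained C sketch of an inline handler. <structname>SketchInlineCodeBlock</structname> is a hypothetical stand-in that models only the <structfield>source_text</structfield> field of the real <structname>InlineCodeBlock</structname>, and the toy language (a <command>DO</command> block is a list of integers whose sum is reported) is invented for illustration. The global <literal>sketch_last_notice</literal> takes the place of raising a notice, and a return value of -1 takes the place of <function>ereport(ERROR)</function>; otherwise the handler has nothing useful to return, since its result is ignored.
</para>

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for InlineCodeBlock. The real struct is a Node
 * with more fields; only the source text is modeled here.
 */
typedef struct SketchInlineCodeBlock
{
    const char *source_text;    /* body of the DO statement */
} SketchInlineCodeBlock;

long        sketch_last_notice = 0; /* stand-in for ereport(NOTICE) output */

/*
 * Execute an anonymous block in a toy language whose programs are
 * whitespace-separated integer literals; "executing" one reports their
 * sum. There are no parameters to bind, so a "$n" token is an error.
 * Returns 0 on success, or -1 where a real handler would ereport(ERROR).
 * A real handler would obtain the pointer from its single internal
 * argument, for example with PG_GETARG_POINTER(0).
 */
int
sketch_inline_handler(const SketchInlineCodeBlock *codeblock)
{
    const char *p;
    long        sum = 0;

    assert(codeblock != NULL && codeblock->source_text != NULL);
    p = codeblock->source_text;
    while (*p != '\0')
    {
        char       *end;
        long        value;

        if (*p == ' ' || *p == '\n')
        {
            p++;
            continue;
        }
        value = strtol(p, &end, 10);
        if (end == p || (*end != '\0' && *end != ' ' && *end != '\n'))
            return -1;          /* not an integer literal */
        sum += value;
        p = end;
    }
    sketch_last_notice = sum;
    return 0;
}
```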
<para>
It's recommended that you wrap all these function declarations,
as well as the <command>CREATE LANGUAGE</command> command itself, into
an <firstterm>extension</firstterm> so that a simple <command>CREATE EXTENSION</command>
command is sufficient to install the language. See
<xref linkend="extend-extensions"/> for information about writing
extensions.
</para>
<para>
The procedural languages included in the standard distribution
are good references when trying to write your own language handler.
Look into the <filename>src/pl</filename> subdirectory of the source tree.
The <xref linkend="sql-createlanguage"/>
reference page also has some useful details.
</para>
</chapter>