2000-11-19 23:07:16 +01:00
|
|
|
Proposal for function-manager redesign 19-Nov-2000
|
2000-05-28 20:06:55 +02:00
|
|
|
--------------------------------------
|
|
|
|
|
|
|
|
We know that the existing mechanism for calling Postgres functions needs
|
|
|
|
to be redesigned. It has portability problems because it makes
|
|
|
|
assumptions about parameter passing that violate ANSI C; it fails to
|
|
|
|
handle NULL arguments and results cleanly; and "function handlers" that
|
|
|
|
support a class of functions (such as fmgr_pl) can only be done via a
|
|
|
|
really ugly, non-reentrant kluge. (Global variable set during every
|
|
|
|
function call, forsooth.) Here is a proposal for fixing these problems.
|
|
|
|
|
|
|
|
In the past, the major objections to redoing the function-manager
|
|
|
|
interface have been (a) it'll be quite tedious to implement, since every
|
|
|
|
built-in function and everyplace that calls such functions will need to
|
|
|
|
be touched; (b) such wide-ranging changes will be difficult to make in
|
|
|
|
parallel with other development work; (c) it will break existing
|
|
|
|
user-written loadable modules that define "C language" functions. While
|
|
|
|
I have no solution to the "tedium" aspect, I believe I see an answer to
|
|
|
|
the other problems: by use of function handlers, we can support both old
|
|
|
|
and new interfaces in parallel for both callers and callees, at some
|
|
|
|
small efficiency cost for the old styles. That way, most of the changes
|
|
|
|
can be done on an incremental file-by-file basis --- we won't need a
|
|
|
|
"big bang" where everything changes at once. Support for callees
|
|
|
|
written in the old style can be left in place indefinitely, to provide
|
|
|
|
backward compatibility for user-written C functions.
|
|
|
|
|
|
|
|
|
|
|
|
Changes in pg_proc (system data about a function)
|
|
|
|
-------------------------------------------------
|
|
|
|
|
|
|
|
A new column "proisstrict" will be added to the system pg_proc table.
|
|
|
|
This is a boolean value which will be TRUE if the function is "strict",
|
|
|
|
that is it always returns NULL when any of its inputs are NULL. The
|
|
|
|
function manager will check this field and skip calling the function when
|
|
|
|
it's TRUE and there are NULL inputs. This allows us to remove explicit
|
2000-11-19 23:07:16 +01:00
|
|
|
NULL-value tests from many functions that currently need them (not to
|
|
|
|
mention fixing many more that need them but don't have them). A function
|
2000-05-28 20:06:55 +02:00
|
|
|
that is not marked "strict" is responsible for checking whether its inputs
|
|
|
|
are NULL or not. Most builtin functions will be marked "strict".
|
|
|
|
|
|
|
|
An optional WITH parameter will be added to CREATE FUNCTION to allow
|
|
|
|
specification of whether user-defined functions are strict or not. I am
|
|
|
|
inclined to make the default be "not strict", since that seems to be the
|
|
|
|
more useful case for functions expressed in SQL or a PL language, but
|
|
|
|
am open to arguments for the other choice.
|
|
|
|
|
|
|
|
|
|
|
|
The new function-manager interface
|
|
|
|
----------------------------------
|
|
|
|
|
|
|
|
The core of the new design is revised data structures for representing
|
|
|
|
the result of a function lookup and for representing the parameters
|
|
|
|
passed to a specific function invocation. (We want to keep function
|
|
|
|
lookup separate from function call, since many parts of the system apply
|
|
|
|
the same function over and over; the lookup overhead should be paid once
|
|
|
|
per query, not once per tuple.)
|
|
|
|
|
|
|
|
|
|
|
|
When a function is looked up in pg_proc, the result is represented as
|
|
|
|
|
|
|
|
typedef struct
|
|
|
|
{
|
|
|
|
PGFunction fn_addr; /* pointer to function or handler to be called */
|
|
|
|
Oid fn_oid; /* OID of function (NOT of handler, if any) */
|
|
|
|
short fn_nargs; /* 0..FUNC_MAX_ARGS, or -1 if variable arg count */
|
|
|
|
bool fn_strict; /* function is "strict" (NULL in => NULL out) */
|
2000-11-19 23:07:16 +01:00
|
|
|
bool fn_retset; /* function returns a set (over multiple calls) */
|
2000-05-28 20:06:55 +02:00
|
|
|
void *fn_extra; /* extra space for use by handler */
|
2000-11-19 23:07:16 +01:00
|
|
|
MemoryContext fn_mcxt; /* memory context to store fn_extra in */
|
2003-04-09 01:20:04 +02:00
|
|
|
Node *fn_expr; /* expression parse tree for call, or NULL */
|
2000-05-28 20:06:55 +02:00
|
|
|
} FmgrInfo;
|
|
|
|
|
|
|
|
For an ordinary built-in function, fn_addr is just the address of the C
|
|
|
|
routine that implements the function. Otherwise it is the address of a
|
|
|
|
handler for the class of functions that includes the target function.
|
|
|
|
The handler can use the function OID and perhaps also the fn_extra slot
|
|
|
|
to find the specific code to execute. (fn_oid = InvalidOid can be used
|
|
|
|
to denote a not-yet-initialized FmgrInfo struct. fn_extra will always
|
|
|
|
be NULL when an FmgrInfo is first filled by the function lookup code, but
|
|
|
|
a function handler could set it to avoid making repeated lookups of its
|
|
|
|
own when the same FmgrInfo is used repeatedly during a query.) fn_nargs
|
2000-11-19 23:07:16 +01:00
|
|
|
is the number of arguments expected by the function, fn_strict is its
|
|
|
|
strictness flag, and fn_retset shows whether it returns a set; all of
|
2003-04-09 01:20:04 +02:00
|
|
|
these values come from the function's pg_proc entry. If the function is
|
|
|
|
being called as part of a SQL expression, fn_expr will point to the
|
|
|
|
expression parse tree for the function call; this can be used to extract
|
|
|
|
parse-time knowledge about the actual arguments.
|
2000-05-28 20:06:55 +02:00
|
|
|
|
|
|
|
FmgrInfo already exists in the current code, but has fewer fields. This
|
|
|
|
change should be transparent at the source-code level.
|
|
|
|
|
|
|
|
|
|
|
|
During a call of a function, the following data structure is created
|
|
|
|
and passed to the function:
|
|
|
|
|
|
|
|
typedef struct
|
|
|
|
{
|
|
|
|
FmgrInfo *flinfo; /* ptr to lookup info used for this call */
|
|
|
|
Node *context; /* pass info about context of call */
|
|
|
|
Node *resultinfo; /* pass or return extra info about result */
|
|
|
|
bool isnull; /* function must set true if result is NULL */
|
|
|
|
short nargs; /* # arguments actually passed */
|
|
|
|
Datum arg[FUNC_MAX_ARGS]; /* Arguments passed to function */
|
|
|
|
bool argnull[FUNC_MAX_ARGS]; /* T if arg[i] is actually NULL */
|
|
|
|
} FunctionCallInfoData;
|
|
|
|
typedef FunctionCallInfoData* FunctionCallInfo;
|
|
|
|
|
|
|
|
flinfo points to the lookup info used to make the call. Ordinary functions
|
|
|
|
will probably ignore this field, but function class handlers will need it
|
|
|
|
to find out the OID of the specific function being called.
|
|
|
|
|
|
|
|
context is NULL for an "ordinary" function call, but may point to additional
|
|
|
|
info when the function is called in certain contexts. (For example, the
|
|
|
|
trigger manager will pass information about the current trigger event here.)
|
|
|
|
If context is used, it should point to some subtype of Node; the particular
|
2000-11-19 23:07:16 +01:00
|
|
|
kind of context is indicated by the node type field. (A callee should
|
|
|
|
always check the node type before assuming it knows what kind of context is
|
|
|
|
being passed.) fmgr itself puts no other restrictions on the use of this
|
|
|
|
field.
|
2000-05-28 20:06:55 +02:00
|
|
|
|
|
|
|
resultinfo is NULL when calling any function from which a simple Datum
|
|
|
|
result is expected. It may point to some subtype of Node if the function
|
2000-11-19 23:07:16 +01:00
|
|
|
returns more than a Datum. (For example, resultinfo is used when calling a
|
|
|
|
function that returns a set, as discussed below.) Like the context field,
|
|
|
|
resultinfo is a hook for expansion; fmgr itself doesn't constrain the use
|
|
|
|
of the field.
|
2000-05-28 20:06:55 +02:00
|
|
|
|
|
|
|
nargs, arg[], and argnull[] hold the arguments being passed to the function.
|
|
|
|
Notice that all the arguments passed to a function (as well as its result
|
|
|
|
value) will now uniformly be of type Datum. As discussed below, callers
|
|
|
|
and callees should apply the standard Datum-to-and-from-whatever macros
|
|
|
|
to convert to the actual argument types of a particular function. The
|
|
|
|
value in arg[i] is unspecified when argnull[i] is true.
|
|
|
|
|
|
|
|
It is generally the responsibility of the caller to ensure that the
|
|
|
|
number of arguments passed matches what the callee is expecting; except
|
|
|
|
for callees that take a variable number of arguments, the callee will
|
|
|
|
typically ignore the nargs field and just grab values from arg[].
|
|
|
|
|
|
|
|
The isnull field will be initialized to "false" before the call. On
|
|
|
|
return from the function, isnull is the null flag for the function result:
|
|
|
|
if it is true the function's result is NULL, regardless of the actual
|
|
|
|
function return value. Note that simple "strict" functions can ignore
|
|
|
|
both isnull and argnull[], since they won't even get called when there
|
|
|
|
are any TRUE values in argnull[].
|
|
|
|
|
|
|
|
FunctionCallInfo replaces FmgrValues plus a bunch of ad-hoc parameter
|
|
|
|
conventions, global variables (fmgr_pl_finfo and CurrentTriggerData at
|
|
|
|
least), and other uglinesses.
|
|
|
|
|
|
|
|
|
|
|
|
Callees, whether they be individual functions or function handlers,
|
|
|
|
shall always have this signature:
|
|
|
|
|
|
|
|
Datum function (FunctionCallInfo fcinfo);
|
|
|
|
|
|
|
|
which is represented by the typedef
|
|
|
|
|
|
|
|
typedef Datum (*PGFunction) (FunctionCallInfo fcinfo);
|
|
|
|
|
|
|
|
The function is responsible for setting fcinfo->isnull appropriately
|
|
|
|
as well as returning a result represented as a Datum. Note that since
|
|
|
|
all callees will now have exactly the same signature, and will be called
|
|
|
|
through a function pointer declared with exactly that signature, we
|
|
|
|
should have no portability or optimization problems.
|
|
|
|
|
|
|
|
|
|
|
|
Function coding conventions
|
|
|
|
---------------------------
|
|
|
|
|
|
|
|
As an example, int4 addition goes from old-style
|
|
|
|
|
|
|
|
int32
|
|
|
|
int4pl(int32 arg1, int32 arg2)
|
|
|
|
{
|
|
|
|
return arg1 + arg2;
|
|
|
|
}
|
|
|
|
|
|
|
|
to new-style
|
|
|
|
|
|
|
|
Datum
|
|
|
|
int4pl(FunctionCallInfo fcinfo)
|
|
|
|
{
|
|
|
|
/* we assume the function is marked "strict", so we can ignore
|
|
|
|
* NULL-value handling */
|
|
|
|
|
|
|
|
return Int32GetDatum(DatumGetInt32(fcinfo->arg[0]) +
|
|
|
|
DatumGetInt32(fcinfo->arg[1]));
|
|
|
|
}
|
|
|
|
|
|
|
|
This is, of course, much uglier than the old-style code, but we can
|
|
|
|
improve matters with some well-chosen macros for the boilerplate parts.
|
|
|
|
I propose below macros that would make the code look like
|
|
|
|
|
|
|
|
Datum
|
|
|
|
int4pl(PG_FUNCTION_ARGS)
|
|
|
|
{
|
|
|
|
int32 arg1 = PG_GETARG_INT32(0);
|
|
|
|
int32 arg2 = PG_GETARG_INT32(1);
|
|
|
|
|
|
|
|
PG_RETURN_INT32( arg1 + arg2 );
|
|
|
|
}
|
|
|
|
|
|
|
|
This is still more code than before, but it's fairly readable, and it's
|
|
|
|
also amenable to machine processing --- for example, we could probably
|
|
|
|
write a script that scans code like this and extracts argument and result
|
|
|
|
type info for comparison to the pg_proc table.
|
|
|
|
|
|
|
|
For the standard data types float4, float8, and int8, these macros should
|
|
|
|
hide the indirection and space allocation involved, so that the function's
|
|
|
|
code is not explicitly aware that these types are pass-by-reference. This
|
|
|
|
will offer a considerable gain in readability, and it also opens up the
|
|
|
|
opportunity to make these types be pass-by-value on machines where it's
|
|
|
|
feasible to do so. (For example, on an Alpha it's pretty silly to make int8
|
|
|
|
be pass-by-ref, since Datum is going to be 64 bits anyway. float4 could
|
|
|
|
become pass-by-value on all machines...)
|
|
|
|
|
|
|
|
Here are the proposed macros and coding conventions:
|
|
|
|
|
|
|
|
The definition of an fmgr-callable function will always look like
|
|
|
|
|
|
|
|
Datum
|
|
|
|
function_name(PG_FUNCTION_ARGS)
|
|
|
|
{
|
|
|
|
...
|
|
|
|
}
|
|
|
|
|
|
|
|
"PG_FUNCTION_ARGS" just expands to "FunctionCallInfo fcinfo". The main
|
|
|
|
reason for using this macro is to make it easy for scripts to spot function
|
|
|
|
definitions. However, if we ever decide to change the calling convention
|
|
|
|
again, it might come in handy to have this macro in place.
|
|
|
|
|
|
|
|
A nonstrict function is responsible for checking whether each individual
|
|
|
|
argument is null or not, which it can do with PG_ARGISNULL(n) (which is
|
|
|
|
just "fcinfo->argnull[n]"). It should avoid trying to fetch the value
|
|
|
|
of any argument that is null.
|
|
|
|
|
|
|
|
Both strict and nonstrict functions can return NULL, if needed, with
|
|
|
|
PG_RETURN_NULL();
|
|
|
|
which expands to
|
|
|
|
{ fcinfo->isnull = true; return (Datum) 0; }
|
|
|
|
|
|
|
|
Argument values are ordinarily fetched using code like
|
|
|
|
int32 name = PG_GETARG_INT32(number);
|
|
|
|
|
|
|
|
For float4, float8, and int8, the PG_GETARG macros will hide the pass-by-
|
|
|
|
reference nature of the data types; for example PG_GETARG_FLOAT4 expands to
|
|
|
|
(* (float4 *) DatumGetPointer(fcinfo->arg[number]))
|
|
|
|
and would typically be called like this:
|
|
|
|
float4 arg = PG_GETARG_FLOAT4(0);
|
|
|
|
Note that "float4" and "float8" are the recommended typedefs to use, not
|
|
|
|
"float32data" and "float64data", and the macros are named accordingly.
|
|
|
|
But 64-bit ints should be declared as "int64".
|
|
|
|
|
|
|
|
Non-null values are returned with a PG_RETURN_XXX macro of the appropriate
|
|
|
|
type. For example, PG_RETURN_INT32 expands to
|
|
|
|
return Int32GetDatum(x)
|
|
|
|
PG_RETURN_FLOAT4, PG_RETURN_FLOAT8, and PG_RETURN_INT64 hide the pass-by-
|
|
|
|
reference nature of their datatypes.
|
|
|
|
|
|
|
|
fmgr.h will provide PG_GETARG and PG_RETURN macros for all the basic data
|
|
|
|
types. Modules or header files that define specialized SQL datatypes
|
|
|
|
(eg, timestamp) should define appropriate macros for those types, so that
|
|
|
|
functions manipulating the types can be coded in the standard style.
|
|
|
|
|
2000-11-19 23:07:16 +01:00
|
|
|
For non-primitive data types (particularly variable-length types) it won't
|
|
|
|
be very practical to hide the pass-by-reference nature of the data type,
|
|
|
|
so the PG_GETARG and PG_RETURN macros for those types won't do much more
|
|
|
|
than DatumGetPointer/PointerGetDatum plus the appropriate typecast (but see
|
|
|
|
TOAST discussion, below). Functions returning such types will need to
|
|
|
|
palloc() their result space explicitly. I recommend naming the GETARG and
|
|
|
|
RETURN macros for such types to end in "_P", as a reminder that they
|
2000-05-28 20:06:55 +02:00
|
|
|
produce or take a pointer. For example, PG_GETARG_TEXT_P yields "text *".
|
|
|
|
|
|
|
|
When a function needs to access fcinfo->flinfo or one of the other auxiliary
|
|
|
|
fields of FunctionCallInfo, it should just do it. I doubt that providing
|
|
|
|
syntactic-sugar macros for these cases is useful.
|
|
|
|
|
|
|
|
|
|
|
|
Call-site coding conventions
|
|
|
|
----------------------------
|
|
|
|
|
|
|
|
There are many places in the system that call either a specific function
|
|
|
|
(for example, the parser invokes "textin" by name in places) or a
|
|
|
|
particular group of functions that have a common argument list (for
|
|
|
|
example, the optimizer invokes selectivity estimation functions with
|
|
|
|
a fixed argument list). These places will need to change, but we should
|
|
|
|
try to avoid making them significantly uglier than before.
|
|
|
|
|
|
|
|
Places that invoke an arbitrary function with an arbitrary argument list
|
|
|
|
can simply be changed to fill a FunctionCallInfoData structure directly;
|
|
|
|
that'll be no worse and possibly cleaner than what they do now.
|
|
|
|
|
|
|
|
When invoking a specific built-in function by name, we have generally
|
|
|
|
just written something like
|
|
|
|
result = textin ( ... args ... )
|
|
|
|
which will not work after textin() is converted to the new call style.
|
|
|
|
I suggest that code like this be converted to use "helper" functions
|
|
|
|
that will create and fill in a FunctionCallInfoData struct. For
|
|
|
|
example, if textin is being called with one argument, it'd look
|
|
|
|
something like
|
|
|
|
result = DirectFunctionCall1(textin, PointerGetDatum(argument));
|
|
|
|
These helper routines will have declarations like
|
|
|
|
Datum DirectFunctionCall2(PGFunction func, Datum arg1, Datum arg2);
|
|
|
|
Note it will be the caller's responsibility to convert to and from
|
|
|
|
Datum; appropriate conversion macros should be used.
|
|
|
|
|
|
|
|
The DirectFunctionCallN routines will not bother to fill in
|
|
|
|
fcinfo->flinfo (indeed cannot, since they have no idea about an OID for
|
|
|
|
the target function); they will just set it NULL. This is unlikely to
|
|
|
|
bother any built-in function that could be called this way. Note also
|
|
|
|
that this style of coding cannot pass a NULL input value nor cope with
|
|
|
|
a NULL result (it couldn't before, either!). We can make the helper
|
|
|
|
routines elog an error if they see that the function returns a NULL.
|
|
|
|
|
|
|
|
When invoking a function that has a known argument signature, we have
|
|
|
|
usually written either
|
|
|
|
result = fmgr(targetfuncOid, ... args ... );
|
|
|
|
or
|
|
|
|
result = fmgr_ptr(FmgrInfo *finfo, ... args ... );
|
|
|
|
depending on whether an FmgrInfo lookup has been done yet or not.
|
|
|
|
This kind of code can be recast using helper routines, in the same
|
|
|
|
style as above:
|
|
|
|
result = OidFunctionCall1(funcOid, PointerGetDatum(argument));
|
|
|
|
result = FunctionCall2(funcCallInfo,
|
|
|
|
PointerGetDatum(argument),
|
|
|
|
Int32GetDatum(argument));
|
|
|
|
Again, this style of coding does not allow for expressing NULL inputs
|
|
|
|
or receiving a NULL result.
|
|
|
|
|
|
|
|
As with the callee-side situation, I propose adding argument conversion
|
|
|
|
macros that hide the pass-by-reference nature of int8, float4, and
|
|
|
|
float8, with an eye to making those types relatively painless to convert
|
|
|
|
to pass-by-value.
|
|
|
|
|
|
|
|
The existing helper functions fmgr(), fmgr_c(), etc will be left in
|
|
|
|
place until all uses of them are gone. Of course their internals will
|
|
|
|
have to change in the first step of implementation, but they can
|
|
|
|
continue to support the same external appearance.
|
|
|
|
|
|
|
|
|
2000-11-19 23:07:16 +01:00
|
|
|
Support for TOAST-able data types
|
|
|
|
---------------------------------
|
|
|
|
|
|
|
|
For TOAST-able data types, the PG_GETARG macro will deliver a de-TOASTed
|
|
|
|
data value. There might be a few cases where the still-toasted value is
|
|
|
|
wanted, but the vast majority of cases want the de-toasted result, so
|
|
|
|
that will be the default. To get the argument value without causing
|
|
|
|
de-toasting, use PG_GETARG_RAW_VARLENA_P(n).
|
|
|
|
|
|
|
|
Some functions require a modifiable copy of their input values. In these
|
|
|
|
cases, it's silly to do an extra copy step if we copied the data anyway
|
|
|
|
to de-TOAST it. Therefore, each toastable datatype has an additional
|
|
|
|
fetch macro, for example PG_GETARG_TEXT_P_COPY(n), which delivers a
|
|
|
|
guaranteed-fresh copy, combining this with the detoasting step if possible.
|
|
|
|
|
|
|
|
There is also a PG_FREE_IF_COPY(ptr,n) macro, which pfree's the given
|
|
|
|
pointer if and only if it is different from the original value of the n'th
|
|
|
|
argument. This can be used to free the de-toasted value of the n'th
|
|
|
|
argument, if it was actually de-toasted. Currently, doing this is not
|
|
|
|
necessary for the majority of functions because the core backend code
|
|
|
|
releases temporary space periodically, so that memory leaked in function
|
|
|
|
execution isn't a big problem. However, as of 7.1 memory leaks in
|
|
|
|
functions that are called by index searches will not be cleaned up until
|
|
|
|
end of transaction. Therefore, functions that are listed in pg_amop or
|
|
|
|
pg_amproc should be careful not to leak detoasted copies, and so these
|
|
|
|
functions do need to use PG_FREE_IF_COPY() for toastable inputs.
|
|
|
|
|
|
|
|
A function should never try to re-TOAST its result value; it should just
|
|
|
|
deliver an untoasted result that's been palloc'd in the current memory
|
|
|
|
context. When and if the value is actually stored into a tuple, the
|
|
|
|
tuple toaster will decide whether toasting is needed.
|
|
|
|
|
|
|
|
|
|
|
|
Functions accepting or returning sets
|
|
|
|
-------------------------------------
|
|
|
|
|
2002-08-30 02:28:41 +02:00
|
|
|
[ this section revised 29-Aug-2002 for 7.3 ]
|
2000-11-19 23:07:16 +01:00
|
|
|
|
|
|
|
If a function is marked in pg_proc as returning a set, then it is called
|
|
|
|
with fcinfo->resultinfo pointing to a node of type ReturnSetInfo. A
|
|
|
|
function that desires to return a set should raise an error "called in
|
|
|
|
context that does not accept a set result" if resultinfo is NULL or does
|
2002-08-30 02:28:41 +02:00
|
|
|
not point to a ReturnSetInfo node.
|
|
|
|
|
|
|
|
There are currently two modes in which a function can return a set result:
|
|
|
|
value-per-call, or materialize. In value-per-call mode, the function returns
|
|
|
|
one value each time it is called, and finally reports "done" when it has no
|
|
|
|
more values to return. In materialize mode, the function's output set is
|
|
|
|
instantiated in a Tuplestore object; all the values are returned in one call.
|
|
|
|
Additional modes might be added in future.
|
|
|
|
|
|
|
|
ReturnSetInfo contains a field "allowedModes" which is set (by the caller)
|
|
|
|
to a bitmask that's the OR of the modes the caller can support. The actual
|
|
|
|
mode used by the function is returned in another field "returnMode". For
|
|
|
|
backwards-compatibility reasons, returnMode is initialized to value-per-call
|
|
|
|
and need only be changed if the function wants to use a different mode.
|
|
|
|
The function should elog() if it cannot use any of the modes the caller is
|
|
|
|
willing to support.
|
|
|
|
|
|
|
|
Value-per-call mode works like this: ReturnSetInfo contains a field
|
2000-11-19 23:07:16 +01:00
|
|
|
"isDone", which should be set to one of these values:
|
|
|
|
|
|
|
|
ExprSingleResult /* expression does not return a set */
|
|
|
|
ExprMultipleResult /* this result is an element of a set */
|
|
|
|
ExprEndResult /* there are no more elements in the set */
|
|
|
|
|
2002-08-30 02:28:41 +02:00
|
|
|
(the caller will initialize it to ExprSingleResult). If the function simply
|
|
|
|
returns a Datum without touching ReturnSetInfo, then the call is over and a
|
|
|
|
single-item set has been returned. To return a set, the function must set
|
|
|
|
isDone to ExprMultipleResult for each set element. After all elements have
|
|
|
|
been returned, the next call should set isDone to ExprEndResult and return a
|
|
|
|
null result. (Note it is possible to return an empty set by doing this on
|
|
|
|
the first call.)
|
2000-11-19 23:07:16 +01:00
|
|
|
|
2002-08-30 02:28:41 +02:00
|
|
|
The ReturnSetInfo node also contains a link to the ExprContext within which
|
|
|
|
the function is being evaluated. This is useful for value-per-call functions
|
2002-05-12 22:10:05 +02:00
|
|
|
that need to close down internal state when they are not run to completion:
|
|
|
|
they can register a shutdown callback function in the ExprContext.
|
|
|
|
|
2002-08-30 02:28:41 +02:00
|
|
|
Materialize mode works like this: the function creates a Tuplestore holding
|
|
|
|
the (possibly empty) result set, and returns it. There are no multiple calls.
|
|
|
|
The function must also return a TupleDesc that indicates the tuple structure.
|
|
|
|
The Tuplestore and TupleDesc should be created in the context
|
|
|
|
econtext->ecxt_per_query_memory (note this will *not* be the context the
|
|
|
|
function is called in). The function stores pointers to the Tuplestore and
|
|
|
|
TupleDesc into ReturnSetInfo, sets returnMode to indicate materialize mode,
|
|
|
|
and returns null. isDone is not used and should be left at ExprSingleResult.
|
|
|
|
|
2002-08-31 01:59:46 +02:00
|
|
|
If the function is being called as a table function (ie, it appears in a
|
|
|
|
FROM item), then the expected tuple descriptor is passed in ReturnSetInfo;
|
|
|
|
in other contexts the expectedDesc field will be NULL. The function need
|
|
|
|
not pay attention to expectedDesc, but it may be useful in special cases.
|
|
|
|
|
2002-08-30 02:28:41 +02:00
|
|
|
There is no support for functions accepting sets; instead, the function will
|
|
|
|
be called multiple times, once for each element of the input set.
|
|
|
|
|
2000-11-19 23:07:16 +01:00
|
|
|
|
2000-05-28 20:06:55 +02:00
|
|
|
Notes about function handlers
|
|
|
|
-----------------------------
|
|
|
|
|
|
|
|
Handlers for classes of functions should find life much easier and
|
|
|
|
cleaner in this design. The OID of the called function is directly
|
|
|
|
reachable from the passed parameters; we don't need the global variable
|
|
|
|
fmgr_pl_finfo anymore. Also, by modifying fcinfo->flinfo->fn_extra,
|
|
|
|
the handler can cache lookup info to avoid repeat lookups when the same
|
|
|
|
function is invoked many times. (fn_extra can only be used as a hint,
|
|
|
|
since callers are not required to re-use an FmgrInfo struct.
|
|
|
|
But in performance-critical paths they normally will do so.)
|
|
|
|
|
2000-11-19 23:07:16 +01:00
|
|
|
If the handler wants to allocate memory to hold fn_extra data, it should
|
|
|
|
NOT do so in CurrentMemoryContext, since the current context may well be
|
|
|
|
much shorter-lived than the context where the FmgrInfo is. Instead,
|
|
|
|
allocate the memory in context flinfo->fn_mcxt, or in a long-lived cache
|
|
|
|
context. fn_mcxt normally points at the context that was
|
|
|
|
CurrentMemoryContext at the time the FmgrInfo structure was created;
|
|
|
|
in any case it is required to be a context at least as long-lived as the
|
|
|
|
FmgrInfo itself.
|
|
|
|
|
|
|
|
|
|
|
|
Telling the difference between old- and new-style functions
|
|
|
|
-----------------------------------------------------------
|
|
|
|
|
|
|
|
During the conversion process, we carried two different pg_language
|
|
|
|
entries, "internal" and "newinternal", for internal functions. The
|
|
|
|
function manager used the language code to distinguish which calling
|
|
|
|
convention to use. (Old-style internal functions were supported via
|
|
|
|
a function handler.) As of Nov. 2000, no old-style internal functions
|
|
|
|
remain, so we can drop support for them. We will remove the old "internal"
|
|
|
|
pg_language entry and rename "newinternal" to "internal".
|
|
|
|
|
|
|
|
The interim solution for dynamically-loaded compiled functions has been
|
|
|
|
similar: two pg_language entries "C" and "newC". This naming convention
|
|
|
|
is not desirable for the long run, and yet we cannot stop supporting
|
|
|
|
old-style user functions. Instead, it seems better to use just one
|
|
|
|
pg_language entry "C", and require the dynamically-loaded library to
|
|
|
|
provide additional information that identifies new-style functions.
|
|
|
|
This avoids compatibility problems --- for example, existing dump
|
|
|
|
scripts will identify PL language handlers as being in language "C",
|
|
|
|
which would be wrong under the "newC" convention. Also, this approach
|
|
|
|
should generalize more conveniently for future extensions to the function
|
|
|
|
interface specification.
|
|
|
|
|
|
|
|
Given a dynamically loaded function named "foo" (note that the name being
|
|
|
|
considered here is the link-symbol name, not the SQL-level function name),
|
|
|
|
the function manager will look for another function in the same dynamically
|
|
|
|
loaded library named "pg_finfo_foo". If this second function does not
|
|
|
|
exist, then foo is assumed to be called old-style, thus ensuring backwards
|
|
|
|
compatibility with existing libraries. If the info function does exist,
|
|
|
|
it is expected to have the signature
|
|
|
|
|
|
|
|
Pg_finfo_record * pg_finfo_foo (void);
|
|
|
|
|
|
|
|
The info function will be called by the fmgr, and must return a pointer
|
|
|
|
to a Pg_finfo_record struct. (The returned struct will typically be a
|
|
|
|
statically allocated constant in the dynamic-link library.) The current
|
|
|
|
definition of the struct is just
|
|
|
|
|
|
|
|
typedef struct {
|
|
|
|
int api_version;
|
|
|
|
} Pg_finfo_record;
|
|
|
|
|
|
|
|
where api_version is 0 to indicate old-style or 1 to indicate new-style
|
|
|
|
calling convention. In future releases, additional fields may be defined
|
|
|
|
after api_version, but these additional fields will only be used if
|
2000-11-19 23:11:56 +01:00
|
|
|
api_version is greater than 1.
|
2000-11-19 23:07:16 +01:00
|
|
|
|
|
|
|
These details will be hidden from the author of a dynamically loaded
|
|
|
|
function by using a macro. To define a new-style dynamically loaded
|
|
|
|
function named foo, write
|
|
|
|
|
|
|
|
PG_FUNCTION_INFO_V1(foo);
|
|
|
|
|
|
|
|
Datum
|
|
|
|
foo(PG_FUNCTION_ARGS)
|
|
|
|
{
|
|
|
|
...
|
|
|
|
}
|
|
|
|
|
|
|
|
The function itself is written using the same conventions as for new-style
|
|
|
|
internal functions; you just need to add the PG_FUNCTION_INFO_V1() macro.
|
|
|
|
Note that old-style and new-style functions can be intermixed in the same
|
|
|
|
library, depending on whether or not you write a PG_FUNCTION_INFO_V1() for
|
|
|
|
each one.
|
|
|
|
|
|
|
|
The SQL declaration for a dynamically-loaded function is CREATE FUNCTION
|
|
|
|
foo ... LANGUAGE 'C' regardless of whether it is old- or new-style.
|
|
|
|
|
|
|
|
New-style dynamic functions will be invoked directly by fmgr, and will
|
|
|
|
therefore have the same performance as internal functions after the initial
|
|
|
|
pg_proc lookup overhead. Old-style dynamic functions will be invoked via
|
|
|
|
a handler, and will therefore have a small performance penalty.
|
|
|
|
|
|
|
|
To allow old-style dynamic functions to work safely on toastable datatypes,
|
|
|
|
the handler for old-style functions will automatically detoast toastable
|
|
|
|
arguments before passing them to the old-style function. A new-style
|
|
|
|
function is expected to take care of toasted arguments by using the
|
|
|
|
standard argument access macros defined above.
|