postgresql/doc/src/sgml/xfunc.sgml

2224 lines
72 KiB
Plaintext
Raw Normal View History

<!--
$Header: /cvsroot/pgsql/doc/src/sgml/xfunc.sgml,v 1.53 2002/07/18 04:47:17 momjian Exp $
-->
<chapter id="xfunc">
<title id="xfunc-title">Extending <acronym>SQL</acronym>: Functions</title>
2001-11-12 20:19:39 +01:00
<indexterm zone="xfunc"><primary>function</></>
2001-09-15 21:56:59 +02:00
<sect1 id="xfunc-intro">
<title>Introduction</title>
<comment>
Historically, functions were perhaps considered a tool for creating
types. Today, few people build their own types but many write
their own functions. This introduction ought to be changed to
reflect this.
</comment>
<para>
As it turns out, part of defining a new type is the
definition of functions that describe its behavior.
Consequently, while it is possible to define a new
function without defining a new type, the reverse is
not true. We therefore describe how to add new functions
to <productname>PostgreSQL</productname> before describing
how to add new types.
</para>
<para>
2001-09-15 21:56:59 +02:00
<productname>PostgreSQL</productname> provides four kinds of
functions:
<itemizedlist>
<listitem>
<para>
query language functions
(functions written in <acronym>SQL</acronym>)
</para>
</listitem>
<listitem>
<para>
procedural language
2001-09-15 21:56:59 +02:00
functions (functions written in, for example, <application>PL/Tcl</> or <application>PL/pgSQL</>)
</para>
</listitem>
<listitem>
<para>
internal functions
</para>
</listitem>
<listitem>
<para>
C language functions
</para>
</listitem>
</itemizedlist>
2001-09-15 21:56:59 +02:00
</para>
2001-09-15 21:56:59 +02:00
<para>
Every kind
2002-01-07 03:29:15 +01:00
of function can take a base type, a composite type, or
some combination as arguments (parameters). In addition,
every kind of function can return a base type or
a composite type. It's easiest to define <acronym>SQL</acronym>
functions, so we'll start with those. Examples in this section
can also be found in <filename>funcs.sql</filename>
2001-09-15 21:56:59 +02:00
and <filename>funcs.c</filename> in the tutorial directory.
</para>
2002-01-07 03:29:15 +01:00
<para>
Throughout this chapter, it can be useful to look at the reference
page of the <command>CREATE FUNCTION</command> command to
understand the examples better.
</para>
2001-09-15 21:56:59 +02:00
</sect1>
<sect1 id="xfunc-sql">
<title>Query Language (<acronym>SQL</acronym>) Functions</title>
2001-11-12 20:19:39 +01:00
<indexterm zone="xfunc-sql"><primary>function</><secondary>SQL</></>
<para>
2001-09-15 21:56:59 +02:00
SQL functions execute an arbitrary list of SQL statements, returning
2001-11-01 05:07:29 +01:00
the result of the last query in the list, which must be a
<literal>SELECT</>.
In the simple (non-set)
case, the first row of the last query's result will be returned.
2002-03-22 20:20:45 +01:00
(Bear in mind that <quote>the first row</quote> of a multirow
2001-11-01 05:07:29 +01:00
result is not well-defined unless you use <literal>ORDER BY</>.)
If the last query happens
to return no rows at all, NULL will be returned.
</para>
<para>
2001-11-12 20:19:39 +01:00
<indexterm><primary>SETOF</><seealso>function</></>
Alternatively, an SQL function may be declared to return a set,
by specifying the function's return type
as <literal>SETOF</literal> <replaceable>sometype</>. In this case
all rows of the last query's result are returned. Further details
appear below.
</para>
<para>
2001-09-15 21:56:59 +02:00
The body of an SQL function should be a list of one or more SQL
statements separated by semicolons. Note that because the syntax
of the <command>CREATE FUNCTION</command> command requires the body of the
function to be enclosed in single quotes, single quote marks
(<literal>'</>) used
2001-09-15 21:56:59 +02:00
in the body of the function must be escaped, by writing two single
quotes (<literal>''</>) or a backslash (<literal>\'</>) where each
quote is desired.
</para>
<para>
2001-09-15 21:56:59 +02:00
Arguments to the SQL function may be referenced in the function
body using the syntax <literal>$<replaceable>n</></>: $1 refers to
the first argument, $2 to the second, and so on. If an argument
is of a composite type, then the <quote>dot notation</quote>,
e.g., <literal>$1.emp</literal>, may be used to access attributes
of the argument.
</para>
<sect2>
<title>Examples</title>
<para>
To illustrate a simple SQL function, consider the following,
which might be used to debit a bank account:
2001-09-15 21:56:59 +02:00
<programlisting>
CREATE FUNCTION tp1 (integer, numeric) RETURNS integer AS '
2001-09-15 21:56:59 +02:00
UPDATE bank
SET balance = balance - $2
WHERE accountno = $1;
SELECT 1;
2001-09-15 21:56:59 +02:00
' LANGUAGE SQL;
</programlisting>
A user could execute this function to debit account 17 by $100.00 as
follows:
2001-09-15 21:56:59 +02:00
<programlisting>
SELECT tp1(17, 100.0);
</programlisting>
</para>
<para>
In practice one would probably like a more useful result from the
function than a constant <quote>1</>, so a more likely definition
is
2001-09-15 21:56:59 +02:00
<programlisting>
CREATE FUNCTION tp1 (integer, numeric) RETURNS numeric AS '
UPDATE bank
SET balance = balance - $2
WHERE accountno = $1;
SELECT balance FROM bank WHERE accountno = $1;
2001-09-15 21:56:59 +02:00
' LANGUAGE SQL;
</programlisting>
which adjusts the balance and returns the new balance.
</para>
<para>
Any collection of commands in the <acronym>SQL</acronym>
language can be packaged together and defined as a function.
The commands can include data modification (i.e.,
<command>INSERT</command>, <command>UPDATE</command>, and
<command>DELETE</command>) as well
as <command>SELECT</command> queries. However, the final command
must be a <command>SELECT</command> that returns whatever is
specified as the function's return type.
<programlisting>
CREATE FUNCTION clean_EMP () RETURNS integer AS '
DELETE FROM EMP
WHERE EMP.salary &lt;= 0;
SELECT 1 AS ignore_this;
' LANGUAGE SQL;
SELECT clean_EMP();
</programlisting>
<screen>
x
---
1
</screen>
</para>
</sect2>
<sect2>
<title><acronym>SQL</acronym> Functions on Base Types</title>
<para>
The simplest possible <acronym>SQL</acronym> function has no arguments and
2001-09-15 21:56:59 +02:00
simply returns a base type, such as <type>integer</type>:
1998-03-01 09:16:16 +01:00
2001-09-15 21:56:59 +02:00
<programlisting>
CREATE FUNCTION one() RETURNS integer AS '
SELECT 1 as RESULT;
' LANGUAGE SQL;
1998-03-01 09:16:16 +01:00
SELECT one();
2001-09-15 21:56:59 +02:00
</programlisting>
1998-03-01 09:16:16 +01:00
2001-09-15 21:56:59 +02:00
<screen>
one
-----
1
2001-09-15 21:56:59 +02:00
</screen>
</para>
2001-09-15 21:56:59 +02:00
<para>
2001-09-15 21:56:59 +02:00
Notice that we defined a column alias within the function body for the result of the function
(with the name <literal>RESULT</>), but this column alias is not visible
2002-01-20 23:19:57 +01:00
outside the function. Hence, the result is labeled <literal>one</>
instead of <literal>RESULT</>.
</para>
2001-09-15 21:56:59 +02:00
<para>
2001-09-15 21:56:59 +02:00
It is almost as easy to define <acronym>SQL</acronym> functions
that take base types as arguments. In the example below, notice
2001-09-15 21:56:59 +02:00
how we refer to the arguments within the function as <literal>$1</>
and <literal>$2</>:
2001-09-15 21:56:59 +02:00
<programlisting>
CREATE FUNCTION add_em(integer, integer) RETURNS integer AS '
SELECT $1 + $2;
' LANGUAGE SQL;
1998-03-01 09:16:16 +01:00
SELECT add_em(1, 2) AS answer;
2001-09-15 21:56:59 +02:00
</programlisting>
1998-03-01 09:16:16 +01:00
2001-09-15 21:56:59 +02:00
<screen>
answer
--------
3
</screen>
</para>
</sect2>
1998-03-01 09:16:16 +01:00
<sect2>
<title><acronym>SQL</acronym> Functions on Composite Types</title>
1998-03-01 09:16:16 +01:00
<para>
1998-03-01 09:16:16 +01:00
When specifying functions with arguments of composite
types, we must not only specify which
2001-09-15 21:56:59 +02:00
argument we want (as we did above with <literal>$1</> and <literal>$2</literal>) but
also the attributes of that argument. For example, suppose that
<type>EMP</type> is a table containing employee data, and therefore
also the name of the composite type of each row of the table. Here
is a function <function>double_salary</function> that computes what your
salary would be if it were doubled:
2001-09-15 21:56:59 +02:00
<programlisting>
CREATE FUNCTION double_salary(EMP) RETURNS integer AS '
SELECT $1.salary * 2 AS salary;
' LANGUAGE SQL;
SELECT name, double_salary(EMP) AS dream
FROM EMP
WHERE EMP.cubicle ~= point '(2,1)';
2001-09-15 21:56:59 +02:00
</programlisting>
2001-09-15 21:56:59 +02:00
<screen>
name | dream
------+-------
Sam | 2400
</screen>
</para>
2001-09-15 21:56:59 +02:00
<para>
Notice the use of the syntax <literal>$1.salary</literal>
to select one field of the argument row value. Also notice
2002-01-07 03:29:15 +01:00
how the calling <command>SELECT</> command uses a table name to denote
the entire current row of that table as a composite value.
</para>
2001-09-15 21:56:59 +02:00
<para>
It is also possible to build a function that returns a composite type.
(However, as we'll see below, there are some
unfortunate restrictions on how the function may be used.)
This is an example of a function
2001-09-15 21:56:59 +02:00
that returns a single <type>EMP</type> row:
2001-09-15 21:56:59 +02:00
<programlisting>
CREATE FUNCTION new_emp() RETURNS EMP AS '
SELECT text ''None'' AS name,
1000 AS salary,
25 AS age,
2001-09-15 21:56:59 +02:00
point ''(2,2)'' AS cubicle;
' LANGUAGE SQL;
</programlisting>
</para>
2001-09-15 21:56:59 +02:00
<para>
1998-03-01 09:16:16 +01:00
In this case we have specified each of the attributes
with a constant value, but any computation or expression
could have been substituted for these constants.
Note two important things about defining the function:
<itemizedlist>
<listitem>
<para>
The target list order must be exactly the same as
that in which the columns appear in the table associated
with the composite type. (Naming the columns, as we did above,
is irrelevant to the system.)
</para>
</listitem>
<listitem>
<para>
You must typecast the expressions to match the
2001-09-15 21:56:59 +02:00
definition of the composite type, or you will get errors like this:
<screen>
<computeroutput>
ERROR: function declared to return emp returns varchar instead of text at column 1
2001-09-15 21:56:59 +02:00
</computeroutput>
</screen>
</para>
</listitem>
</itemizedlist>
</para>
<para>
In the present release of <productname>PostgreSQL</productname>
there are some unpleasant restrictions on how functions returning
composite types can be used. Briefly, when calling a function that
returns a row, we cannot retrieve the entire row. We must either
extract a single attribute out of the row or pass the entire row into
another function. (Trying to display the entire row value will yield
a meaningless number.) For example,
2001-09-15 21:56:59 +02:00
<programlisting>
SELECT (new_emp()).name;
2001-09-15 21:56:59 +02:00
</programlisting>
1998-03-01 09:16:16 +01:00
2001-09-15 21:56:59 +02:00
<screen>
name
------
2001-09-15 21:56:59 +02:00
None
</screen>
We need the extra parentheses to keep the parser from getting confused:
<screen>
SELECT new_emp().name;
ERROR: parser: parse error at or near "."
2001-09-15 21:56:59 +02:00
</screen>
</para>
<para>
Another approach is to use
functional notation for extracting attributes. The simple way
to explain this is that we can use the
notations <literal>attribute(table)</> and <literal>table.attribute</>
interchangeably:
<programlisting>
SELECT name(new_emp());
</programlisting>
<screen>
name
------
None
</screen>
<programlisting>
--
-- this is the same as:
-- SELECT EMP.name AS youngster FROM EMP WHERE EMP.age &lt; 30
--
SELECT name(EMP) AS youngster
FROM EMP
WHERE age(EMP) &lt; 30;
</programlisting>
<screen>
youngster
-----------
Sam
</screen>
</para>
<para>
Another way to use a function returning a row result is to declare a
2002-01-07 03:29:15 +01:00
second function accepting a row type parameter, and pass the function
result to it:
2001-09-15 21:56:59 +02:00
<programlisting>
CREATE FUNCTION getname(emp) RETURNS text AS
'SELECT $1.name;'
LANGUAGE SQL;
</programlisting>
<screen>
SELECT getname(new_emp());
getname
---------
None
(1 row)
</screen>
</para>
</sect2>
<sect2>
<title><acronym>SQL</acronym> Table Functions (Functions Returning Sets)</title>
<para>
A table function is one that may be used in the <command>FROM</command>
clause of a query. All SQL Language functions may be used in this manner.
If the function is defined to return a base type, the table function
produces a one column result set. If the function is defined to
return <literal>SETOF <replaceable>sometype</></literal>, the table
function returns multiple rows. To illustrate a SQL table function,
consider the following, which returns <literal>SETOF</literal> a
composite type:
<programlisting>
CREATE TABLE foo (fooid int, foosubid int, fooname text, primary key(fooid,foosubid));
INSERT INTO foo VALUES(1,1,'Joe');
INSERT INTO foo VALUES(1,2,'Ed');
INSERT INTO foo VALUES(2,1,'Mary');
CREATE FUNCTION getfoo(int) RETURNS setof foo AS '
SELECT * FROM foo WHERE fooid = $1;
' LANGUAGE SQL;
SELECT * FROM getfoo(1) AS t1;
</programlisting>
<screen>
fooid | foosubid | fooname
-------+----------+---------
1 | 1 | Joe
1 | 2 | Ed
(2 rows)
</screen>
</para>
<para>
When an SQL function is declared as returning <literal>SETOF
<replaceable>sometype</></literal>, the function's final
<command>SELECT</> query is executed to completion, and each row it
outputs is returned as an element of the set.
</para>
<para>
Functions returning sets may also currently be called in the target list
of a <command>SELECT</> query. For each row that the <command>SELECT</>
generates by itself, the function returning set is invoked, and an output
row is generated for each element of the function's result set. Note,
however, that this capability is deprecated and may be removed in future
releases. The following is an example function returning a set from the
target list:
<programlisting>
CREATE FUNCTION listchildren(text) RETURNS SETOF text AS
'SELECT name FROM nodes WHERE parent = $1'
LANGUAGE SQL;
2001-09-15 21:56:59 +02:00
</programlisting>
2001-09-15 21:56:59 +02:00
<screen>
SELECT * FROM nodes;
name | parent
-----------+--------
Top |
Child1 | Top
Child2 | Top
Child3 | Top
SubChild1 | Child1
SubChild2 | Child1
(6 rows)
SELECT listchildren('Top');
listchildren
--------------
Child1
Child2
Child3
(3 rows)
SELECT name, listchildren(name) FROM nodes;
name | listchildren
--------+--------------
Top | Child1
Top | Child2
Top | Child3
Child1 | SubChild1
Child1 | SubChild2
(5 rows)
2001-09-15 21:56:59 +02:00
</screen>
2002-01-07 03:29:15 +01:00
In the last <command>SELECT</command>,
notice that no output row appears for <literal>Child2</>, <literal>Child3</>, etc.
This happens because <function>listchildren</function> returns an empty set
for those inputs, so no output rows are generated.
</para>
</sect2>
</sect1>
<sect1 id="xfunc-pl">
<title>Procedural Language Functions</title>
<para>
2001-09-15 21:56:59 +02:00
Procedural languages aren't built into the <productname>PostgreSQL</productname> server; they are offered
by loadable modules. Please refer to the documentation of the
procedural language in question for details about the syntax and how the function body
is interpreted for each language.
</para>
1998-03-01 09:16:16 +01:00
<para>
There are currently four procedural languages available in the
standard <productname>PostgreSQL</productname> distribution:
2001-09-15 21:56:59 +02:00
<application>PL/pgSQL</application>, <application>PL/Tcl</application>,
<application>PL/Perl</application>, and <application>PL/Python</application>. Other languages can be
defined by users. Refer to <xref linkend="xplang"> for more
2001-09-15 21:56:59 +02:00
information. The basics of developing a new procedural language are covered in <xref linkend="xfunc-plhandler">.
</para>
</sect1>
1998-03-01 09:16:16 +01:00
<sect1 id="xfunc-internal">
<title>Internal Functions</title>
1998-03-01 09:16:16 +01:00
2001-11-12 20:19:39 +01:00
<indexterm zone="xfunc-internal"><primary>function</><secondary>internal</></>
<para>
Internal functions are functions written in C that have been statically
2001-09-15 21:56:59 +02:00
linked into the <productname>PostgreSQL</productname> server.
The <quote>body</quote> of the function definition
specifies the C-language name of the function, which need not be the
same as the name being declared for SQL use.
2001-09-15 21:56:59 +02:00
(For reasons of backwards compatibility, an empty body
is accepted as meaning that the C-language function name is the
same as the SQL name.)
</para>
<para>
2001-09-15 21:56:59 +02:00
Normally, all internal functions present in the
backend are declared during the initialization of the database cluster (<command>initdb</command>),
but a user could use <command>CREATE FUNCTION</command>
to create additional alias names for an internal function.
Internal functions are declared in <command>CREATE FUNCTION</command>
2001-09-15 21:56:59 +02:00
with language name <literal>internal</literal>. For instance, to
create an alias for the <function>sqrt</function> function:
<programlisting>
CREATE FUNCTION square_root(double precision) RETURNS double precision
AS 'dsqrt'
LANGUAGE INTERNAL
WITH (isStrict);
</programlisting>
(Most internal functions expect to be declared <quote>strict</quote>.)
</para>
2001-09-15 21:56:59 +02:00
<note>
<para>
Not all <quote>predefined</quote> functions are
<quote>internal</quote> in the above sense. Some predefined
functions are written in SQL.
</para>
</note>
</sect1>
<sect1 id="xfunc-c">
<title>C Language Functions</title>
<para>
User-defined functions can be written in C (or a language that can
be made compatible with C, such as C++). Such functions are
compiled into dynamically loadable objects (also called shared
libraries) and are loaded by the server on demand. The dynamic
loading feature is what distinguishes <quote>C language</> functions
from <quote>internal</> functions --- the actual coding conventions
are essentially the same for both. (Hence, the standard internal
function library is a rich source of coding examples for user-defined
C functions.)
</para>
<para>
Two different calling conventions are currently used for C functions.
The newer <quote>version 1</quote> calling convention is indicated by writing
a <literal>PG_FUNCTION_INFO_V1()</literal> macro call for the function,
as illustrated below. Lack of such a macro indicates an old-style
("version 0") function. The language name specified in <command>CREATE FUNCTION</command>
is <literal>C</literal> in either case. Old-style functions are now deprecated
because of portability problems and lack of functionality, but they
are still supported for compatibility reasons.
</para>
<sect2 id="xfunc-c-dynload">
<title>Dynamic Loading</title>
<para>
The first time a user-defined function in a particular
loadable object file is called in a backend session,
the dynamic loader loads that object file into memory so that the
function can be called. The <command>CREATE FUNCTION</command>
for a user-defined C function must therefore specify two pieces of
information for the function: the name of the loadable
object file, and the C name (link symbol) of the specific function to call
within that object file. If the C name is not explicitly specified then
it is assumed to be the same as the SQL function name.
</para>
<para>
The following algorithm is used to locate the shared object file
based on the name given in the <command>CREATE FUNCTION</command>
command:
<orderedlist>
<listitem>
<para>
If the name is an absolute path, the given file is loaded.
</para>
</listitem>
<listitem>
<para>
If the name starts with the string <literal>$libdir</literal>,
that part is replaced by the <productname>PostgreSQL</> package
library directory
2001-11-12 20:19:39 +01:00
name, which is determined at build time.<indexterm><primary>$libdir</></>
</para>
</listitem>
<listitem>
<para>
If the name does not contain a directory part, the file is
searched for in the path specified by the configuration variable
2001-11-12 20:19:39 +01:00
<varname>dynamic_library_path</varname>.<indexterm><primary>dynamic_library_path</></>
</para>
</listitem>
<listitem>
<para>
Otherwise (the file was not found in the path, or it contains a
non-absolute directory part), the dynamic loader will try to
take the name as given, which will most likely fail. (It is
unreliable to depend on the current working directory.)
</para>
</listitem>
</orderedlist>
If this sequence does not work, the platform-specific shared
library file name extension (often <filename>.so</filename>) is
appended to the given name and this sequence is tried again. If
that fails as well, the load will fail.
</para>
<note>
<para>
2002-01-07 03:29:15 +01:00
The user ID the <application>PostgreSQL</application> server runs
as must be able to traverse the path to the file you intend to
load. Making the file or a higher-level directory not readable
2002-03-22 20:20:45 +01:00
and/or not executable by the <systemitem>postgres</systemitem> user is a
common mistake.
</para>
</note>
<para>
In any case, the file name that is given in the
<command>CREATE FUNCTION</command> command is recorded literally
in the system catalogs, so if the file needs to be loaded again
the same procedure is applied.
</para>
<note>
<para>
<application>PostgreSQL</application> will not compile a C function
automatically. The object file must be compiled before it is referenced
in a <command>CREATE
FUNCTION</> command. See <xref linkend="dfunc"> for additional
information.
</para>
</note>
<note>
<para>
After it is used for the first time, a dynamically loaded object
file is retained in memory. Future calls in the same session to the
function(s) in that file will only incur the small overhead of a symbol
table lookup. If you need to force a reload of an object file, for
example after recompiling it, use the <command>LOAD</> command or
begin a fresh session.
</para>
</note>
<para>
It is recommended to locate shared libraries either relative to
<literal>$libdir</literal> or through the dynamic library path.
This simplifies version upgrades if the new installation is at a
different location. The actual directory that
<literal>$libdir</literal> stands for can be found out with the
command <literal>pg_config --pkglibdir</literal>.
</para>
<note>
<para>
Before <application>PostgreSQL</application> release 7.2, only exact
absolute paths to object files could be specified in <command>CREATE
FUNCTION</>. This approach is now deprecated since it makes the
function definition unnecessarily unportable. It's best to specify
just the shared library name with no path nor extension, and let
the search mechanism provide that information instead.
</para>
</note>
</sect2>
<sect2>
<title>Base Types in C-Language Functions</title>
<para>
2001-09-15 21:56:59 +02:00
<xref linkend="xfunc-c-type-table"> gives the C type required for
parameters in the C functions that will be loaded into
2002-01-07 03:29:15 +01:00
<productname>PostgreSQL</>.
2001-09-15 21:56:59 +02:00
The <quote>Defined In</quote> column gives the header file that
needs to be included to get the type definition. (The actual
definition may be in a different file that is included by the
listed file. It is recommended that users stick to the defined
interface.) Note that you should always include
<filename>postgres.h</filename> first in any source file, because
it declares a number of things that you will need anyway.
</para>
<table tocentry="1" id="xfunc-c-type-table">
<title>Equivalent C Types
2001-09-15 21:56:59 +02:00
for Built-In <productname>PostgreSQL</productname> Types</title>
<titleabbrev>Equivalent C Types</titleabbrev>
<tgroup cols="3">
<thead>
<row>
<entry>
2001-09-15 21:56:59 +02:00
SQL Type
</entry>
<entry>
C Type
</entry>
<entry>
Defined In
</entry>
</row>
</thead>
<tbody>
<row>
2001-09-15 21:56:59 +02:00
<entry><type>abstime</type></entry>
<entry><type>AbsoluteTime</type></entry>
<entry><filename>utils/nabstime.h</filename></entry>
</row>
<row>
<entry><type>boolean</type></entry>
<entry><type>bool</type></entry>
<entry><filename>postgres.h</filename> (maybe compiler built-in)</entry>
</row>
<row>
<entry><type>box</type></entry>
<entry><type>BOX*</type></entry>
<entry><filename>utils/geo_decls.h</filename></entry>
2001-09-15 21:56:59 +02:00
</row>
<row>
<entry><type>bytea</type></entry>
<entry><type>bytea*</type></entry>
<entry><filename>postgres.h</filename></entry>
</row>
<row>
<entry><type>"char"</type></entry>
<entry><type>char</type></entry>
<entry>(compiler built-in)</entry>
</row>
<row>
2001-09-15 21:56:59 +02:00
<entry><type>character</type></entry>
<entry><type>BpChar*</type></entry>
<entry><filename>postgres.h</filename></entry>
</row>
<row>
2001-09-15 21:56:59 +02:00
<entry><type>cid</type></entry>
<entry><type>CommandId</type></entry>
<entry><filename>postgres.h</filename></entry>
</row>
<row>
2001-09-15 21:56:59 +02:00
<entry><type>date</type></entry>
<entry><type>DateADT</type></entry>
<entry><filename>utils/date.h</filename></entry>
</row>
<row>
2001-09-15 21:56:59 +02:00
<entry><type>smallint</type> (<type>int2</type>)</entry>
<entry><type>int2</type> or <type>int16</type></entry>
<entry><filename>postgres.h</filename></entry>
</row>
<row>
2001-09-15 21:56:59 +02:00
<entry><type>int2vector</type></entry>
<entry><type>int2vector*</type></entry>
<entry><filename>postgres.h</filename></entry>
</row>
<row>
2001-09-15 21:56:59 +02:00
<entry><type>integer</type> (<type>int4</type>)</entry>
<entry><type>int4</type> or <type>int32</type></entry>
<entry><filename>postgres.h</filename></entry>
</row>
<row>
2001-09-15 21:56:59 +02:00
<entry><type>real</type> (<type>float4</type>)</entry>
<entry><type>float4*</type></entry>
<entry><filename>postgres.h</filename></entry>
</row>
<row>
2001-09-15 21:56:59 +02:00
<entry><type>double precision</type> (<type>float8</type>)</entry>
<entry><type>float8*</type></entry>
<entry><filename>postgres.h</filename></entry>
</row>
<row>
2001-09-15 21:56:59 +02:00
<entry><type>interval</type></entry>
<entry><type>Interval*</type></entry>
<entry><filename>utils/timestamp.h</filename></entry>
</row>
<row>
2001-09-15 21:56:59 +02:00
<entry><type>lseg</type></entry>
<entry><type>LSEG*</type></entry>
<entry><filename>utils/geo_decls.h</filename></entry>
</row>
<row>
2001-09-15 21:56:59 +02:00
<entry><type>name</type></entry>
<entry><type>Name</type></entry>
<entry><filename>postgres.h</filename></entry>
</row>
<row>
2001-09-15 21:56:59 +02:00
<entry><type>oid</type></entry>
<entry><type>Oid</type></entry>
<entry><filename>postgres.h</filename></entry>
</row>
<row>
2001-09-15 21:56:59 +02:00
<entry><type>oidvector</type></entry>
<entry><type>oidvector*</type></entry>
<entry><filename>postgres.h</filename></entry>
</row>
<row>
2001-09-15 21:56:59 +02:00
<entry><type>path</type></entry>
<entry><type>PATH*</type></entry>
<entry><filename>utils/geo_decls.h</filename></entry>
</row>
<row>
2001-09-15 21:56:59 +02:00
<entry><type>point</type></entry>
<entry><type>POINT*</type></entry>
<entry><filename>utils/geo_decls.h</filename></entry>
</row>
<row>
2001-09-15 21:56:59 +02:00
<entry><type>regproc</type></entry>
<entry><type>regproc</type></entry>
<entry><filename>postgres.h</filename></entry>
</row>
<row>
2001-09-15 21:56:59 +02:00
<entry><type>reltime</type></entry>
<entry><type>RelativeTime</type></entry>
<entry><filename>utils/nabstime.h</filename></entry>
</row>
<row>
2001-09-15 21:56:59 +02:00
<entry><type>text</type></entry>
<entry><type>text*</type></entry>
<entry><filename>postgres.h</filename></entry>
</row>
<row>
2001-09-15 21:56:59 +02:00
<entry><type>tid</type></entry>
<entry><type>ItemPointer</type></entry>
<entry><filename>storage/itemptr.h</filename></entry>
</row>
<row>
2001-09-15 21:56:59 +02:00
<entry><type>time</type></entry>
<entry><type>TimeADT</type></entry>
<entry><filename>utils/date.h</filename></entry>
</row>
<row>
2001-09-15 21:56:59 +02:00
<entry><type>time with time zone</type></entry>
<entry><type>TimeTzADT</type></entry>
<entry><filename>utils/date.h</filename></entry>
</row>
<row>
2001-09-15 21:56:59 +02:00
<entry><type>timestamp</type></entry>
<entry><type>Timestamp*</type></entry>
<entry><filename>utils/timestamp.h</filename></entry>
</row>
<row>
2001-09-15 21:56:59 +02:00
<entry><type>tinterval</type></entry>
<entry><type>TimeInterval</type></entry>
<entry><filename>utils/nabstime.h</filename></entry>
</row>
<row>
2001-09-15 21:56:59 +02:00
<entry><type>varchar</type></entry>
<entry><type>VarChar*</type></entry>
<entry><filename>postgres.h</filename></entry>
</row>
<row>
<entry><type>xid</type></entry>
<entry><type>TransactionId</type></entry>
<entry><filename>postgres.h</filename></entry>
</row>
</tbody>
</tgroup>
</table>
<para>
Internally, <productname>PostgreSQL</productname> regards a
base type as a <quote>blob of memory</quote>. The user-defined
functions that you define over a type in turn define the
way that <productname>PostgreSQL</productname> can operate
on it. That is, <productname>PostgreSQL</productname> will
only store and retrieve the data from disk and use your
user-defined functions to input, process, and output the data.
1998-03-01 09:16:16 +01:00
Base types can have one of three internal formats:
<itemizedlist>
<listitem>
<para>
pass by value, fixed-length
</para>
</listitem>
<listitem>
<para>
pass by reference, fixed-length
</para>
</listitem>
<listitem>
<para>
pass by reference, variable-length
</para>
</listitem>
</itemizedlist>
</para>
<para>
1998-03-01 09:16:16 +01:00
By-value types can only be 1, 2 or 4 bytes in length
2001-09-15 21:56:59 +02:00
(also 8 bytes, if <literal>sizeof(Datum)</literal> is 8 on your machine).
You should be careful
to define your types such that they will be the same
size (in bytes) on all architectures. For example, the
<literal>long</literal> type is dangerous because it
is 4 bytes on some machines and 8 bytes on others, whereas
2001-09-15 21:56:59 +02:00
<type>int</type> type is 4 bytes on most
Unix machines. A reasonable implementation of
the <type>int4</type> type on Unix
1998-03-01 09:16:16 +01:00
machines might be:
<programlisting>
/* 4-byte integer, passed by value */
typedef int int4;
</programlisting>
2001-09-15 21:56:59 +02:00
<productname>PostgreSQL</productname> automatically figures
things out so that the integer types really have the size they
advertise.
</para>
1998-03-01 09:16:16 +01:00
<para>
1998-03-01 09:16:16 +01:00
On the other hand, fixed-length types of any size may
be passed by-reference. For example, here is a sample
2001-09-15 21:56:59 +02:00
implementation of a <productname>PostgreSQL</productname> type:
1998-03-01 09:16:16 +01:00
<programlisting>
/* 16-byte structure, passed by reference */
typedef struct
{
double x, y;
} Point;
</programlisting>
</para>
<para>
1998-03-01 09:16:16 +01:00
Only pointers to such types can be used when passing
them in and out of <productname>PostgreSQL</productname> functions.
To return a value of such a type, allocate the right amount of
memory with <literal>palloc()</literal>, fill in the allocated memory,
and return a pointer to it. (Alternatively, you can return an input
value of the same type by returning its pointer. <emphasis>Never</>
modify the contents of a pass-by-reference input value, however.)
</para>
<para>
1998-03-01 09:16:16 +01:00
Finally, all variable-length types must also be passed
by reference. All variable-length types must begin
with a length field of exactly 4 bytes, and all data to
be stored within that type must be located in the memory
immediately following that length field. The
length field is the total length of the structure
(i.e., it includes the size of the length field
itself). We can define the text type as follows:
<programlisting>
typedef struct {
int4 length;
char data[1];
} text;
</programlisting>
</para>
<para>
Obviously, the data field declared here is not long enough to hold
all possible strings. Since it's impossible to declare a variable-size
structure in <acronym>C</acronym>, we rely on the knowledge that the
<acronym>C</acronym> compiler won't range-check array subscripts. We
just allocate the necessary amount of space and then access the array as
if it were declared the right length. (If this isn't a familiar trick to
you, you may wish to spend some time with an introductory
<acronym>C</acronym> programming textbook before delving deeper into
<productname>PostgreSQL</productname> server programming.)
When manipulating
variable-length types, we must be careful to allocate
the correct amount of memory and set the length field correctly.
For example, if we wanted to store 40 bytes in a text
1998-03-01 09:16:16 +01:00
structure, we might use a code fragment like this:
<programlisting>
#include "postgres.h"
...
char buffer[40]; /* our source data */
...
text *destination = (text *) palloc(VARHDRSZ + 40);
destination-&gt;length = VARHDRSZ + 40;
memcpy(destination-&gt;data, buffer, 40);
...
</programlisting>
<literal>VARHDRSZ</> is the same as <literal>sizeof(int4)</>, but
it's considered good style to use the macro <literal>VARHDRSZ</>
to refer to the size of the overhead for a variable-length type.
</para>
<para>
1998-03-01 09:16:16 +01:00
Now that we've gone over all of the possible structures
for base types, we can show some examples of real functions.
</para>
</sect2>
<sect2>
<title>Version-0 Calling Conventions for C-Language Functions</title>
<para>
We present the <quote>old style</quote> calling convention first --- although
this approach is now deprecated, it's easier to get a handle on
initially. In the version-0 method, the arguments and result
of the C function are just declared in normal C style, but being
careful to use the C representation of each SQL data type as shown
above.
</para>
<para>
Here are some examples:
<programlisting>
#include "postgres.h"
#include &lt;string.h&gt;
/* By Value */
int
add_one(int arg)
{
return arg + 1;
}
/* By Reference, Fixed Length */
float8 *
add_one_float8(float8 *arg)
{
float8 *result = (float8 *) palloc(sizeof(float8));
*result = *arg + 1.0;
return result;
}
Point *
makepoint(Point *pointx, Point *pointy)
{
Point *new_point = (Point *) palloc(sizeof(Point));
new_point->x = pointx->x;
new_point->y = pointy->y;
return new_point;
}
/* By Reference, Variable Length */
text *
copytext(text *t)
{
/*
* VARSIZE is the total size of the struct in bytes.
*/
text *new_t = (text *) palloc(VARSIZE(t));
VARATT_SIZEP(new_t) = VARSIZE(t);
/*
* VARDATA is a pointer to the data region of the struct.
*/
memcpy((void *) VARDATA(new_t), /* destination */
(void *) VARDATA(t), /* source */
VARSIZE(t)-VARHDRSZ); /* how many bytes */
return new_t;
}
text *
concat_text(text *arg1, text *arg2)
{
int32 new_text_size = VARSIZE(arg1) + VARSIZE(arg2) - VARHDRSZ;
text *new_text = (text *) palloc(new_text_size);
VARATT_SIZEP(new_text) = new_text_size;
memcpy(VARDATA(new_text), VARDATA(arg1), VARSIZE(arg1)-VARHDRSZ);
memcpy(VARDATA(new_text) + (VARSIZE(arg1)-VARHDRSZ),
VARDATA(arg2), VARSIZE(arg2)-VARHDRSZ);
return new_text;
}
</programlisting>
</para>
1998-03-01 09:16:16 +01:00
<para>
Supposing that the above code has been prepared in file
<filename>funcs.c</filename> and compiled into a shared object,
we could define the functions to <productname>PostgreSQL</productname>
with commands like this:
1998-03-01 09:16:16 +01:00
<programlisting>
CREATE FUNCTION add_one(int4) RETURNS int4
AS '<replaceable>PGROOT</replaceable>/tutorial/funcs' LANGUAGE C
WITH (isStrict);
-- note overloading of SQL function name add_one()
CREATE FUNCTION add_one(float8) RETURNS float8
AS '<replaceable>PGROOT</replaceable>/tutorial/funcs',
'add_one_float8'
LANGUAGE C WITH (isStrict);
CREATE FUNCTION makepoint(point, point) RETURNS point
AS '<replaceable>PGROOT</replaceable>/tutorial/funcs' LANGUAGE C
WITH (isStrict);
CREATE FUNCTION copytext(text) RETURNS text
AS '<replaceable>PGROOT</replaceable>/tutorial/funcs' LANGUAGE C
WITH (isStrict);
CREATE FUNCTION concat_text(text, text) RETURNS text
AS '<replaceable>PGROOT</replaceable>/tutorial/funcs' LANGUAGE C
WITH (isStrict);
</programlisting>
</para>
1998-03-01 09:16:16 +01:00
<para>
Here <replaceable>PGROOT</replaceable> stands for the full path to
the <productname>PostgreSQL</productname> source tree. (Better style would
be to use just <literal>'funcs'</> in the <literal>AS</> clause,
after having added <replaceable>PGROOT</replaceable><literal>/tutorial</>
to the search path. In any case, we may omit the system-specific
extension for a shared library, commonly <literal>.so</literal> or
<literal>.sl</literal>.)
</para>
<para>
Notice that we have specified the functions as <quote>strict</quote>,
meaning that
the system should automatically assume a NULL result if any input
value is NULL. By doing this, we avoid having to check for NULL inputs
in the function code. Without this, we'd have to check for NULLs
explicitly, for example by checking for a null pointer for each
pass-by-reference argument. (For pass-by-value arguments, we don't
even have a way to check!)
</para>
<para>
Although this calling convention is simple to use,
it is not very portable; on some architectures there are problems
with passing smaller-than-int data types this way. Also, there is
no simple way to return a NULL result, nor to cope with NULL arguments
in any way other than making the function strict. The version-1
convention, presented next, overcomes these objections.
</para>
</sect2>
1998-03-01 09:16:16 +01:00
<sect2>
<title>Version-1 Calling Conventions for C-Language Functions</title>
<para>
The version-1 calling convention relies on macros to suppress most
of the complexity of passing arguments and results. The C declaration
of a version-1 function is always
<programlisting>
Datum funcname(PG_FUNCTION_ARGS)
</programlisting>
In addition, the macro call
<programlisting>
PG_FUNCTION_INFO_V1(funcname);
</programlisting>
must appear in the same source file (conventionally it's written
just before the function itself). This macro call is not needed
for <literal>internal</>-language functions, since
<productname>PostgreSQL</> currently
assumes all internal functions are version-1. However, it is
<emphasis>required</emphasis> for dynamically-loaded functions.
</para>
<para>
In a version-1 function, each actual argument is fetched using a
<function>PG_GETARG_<replaceable>xxx</replaceable>()</function>
2002-03-22 20:20:45 +01:00
macro that corresponds to the argument's data type, and the result
is returned using a
2001-01-22 17:11:17 +01:00
<function>PG_RETURN_<replaceable>xxx</replaceable>()</function>
macro for the return type.
</para>
<para>
2001-01-22 17:11:17 +01:00
Here we show the same functions as above, coded in version-1 style:
<programlisting>
#include "postgres.h"
#include &lt;string.h&gt;
#include "fmgr.h"
/* By Value */
PG_FUNCTION_INFO_V1(add_one);
Datum
add_one(PG_FUNCTION_ARGS)
{
int32 arg = PG_GETARG_INT32(0);
PG_RETURN_INT32(arg + 1);
}
/* By Reference, Fixed Length */
PG_FUNCTION_INFO_V1(add_one_float8);
Datum
add_one_float8(PG_FUNCTION_ARGS)
{
/* The macros for FLOAT8 hide its pass-by-reference nature */
float8 arg = PG_GETARG_FLOAT8(0);
PG_RETURN_FLOAT8(arg + 1.0);
}
PG_FUNCTION_INFO_V1(makepoint);
Datum
makepoint(PG_FUNCTION_ARGS)
{
2001-01-22 17:11:17 +01:00
/* Here, the pass-by-reference nature of Point is not hidden */
Point *pointx = PG_GETARG_POINT_P(0);
Point *pointy = PG_GETARG_POINT_P(1);
Point *new_point = (Point *) palloc(sizeof(Point));
new_point->x = pointx->x;
new_point->y = pointy->y;
PG_RETURN_POINT_P(new_point);
}
/* By Reference, Variable Length */
PG_FUNCTION_INFO_V1(copytext);
Datum
copytext(PG_FUNCTION_ARGS)
{
text *t = PG_GETARG_TEXT_P(0);
/*
* VARSIZE is the total size of the struct in bytes.
*/
text *new_t = (text *) palloc(VARSIZE(t));
VARATT_SIZEP(new_t) = VARSIZE(t);
/*
* VARDATA is a pointer to the data region of the struct.
*/
memcpy((void *) VARDATA(new_t), /* destination */
(void *) VARDATA(t), /* source */
VARSIZE(t)-VARHDRSZ); /* how many bytes */
PG_RETURN_TEXT_P(new_t);
}
PG_FUNCTION_INFO_V1(concat_text);
Datum
concat_text(PG_FUNCTION_ARGS)
{
text *arg1 = PG_GETARG_TEXT_P(0);
text *arg2 = PG_GETARG_TEXT_P(1);
int32 new_text_size = VARSIZE(arg1) + VARSIZE(arg2) - VARHDRSZ;
text *new_text = (text *) palloc(new_text_size);
VARATT_SIZEP(new_text) = new_text_size;
memcpy(VARDATA(new_text), VARDATA(arg1), VARSIZE(arg1)-VARHDRSZ);
memcpy(VARDATA(new_text) + (VARSIZE(arg1)-VARHDRSZ),
VARDATA(arg2), VARSIZE(arg2)-VARHDRSZ);
PG_RETURN_TEXT_P(new_text);
}
</programlisting>
</para>
<para>
The <command>CREATE FUNCTION</command> commands are the same as
2001-01-22 17:11:17 +01:00
for the version-0 equivalents.
</para>
<para>
At first glance, the version-1 coding conventions may appear to
be just pointless obscurantism. However, they do offer a number
of improvements, because the macros can hide unnecessary detail.
2002-01-07 03:29:15 +01:00
An example is that in coding <function>add_one_float8</>, we no longer need to
be aware that <type>float8</type> is a pass-by-reference type. Another
example is that the <literal>GETARG</> macros for variable-length types hide
the need to deal with fetching <quote>toasted</quote> (compressed or
out-of-line) values. The old-style <function>copytext</function>
and <function>concat_text</function> functions shown above are
actually wrong in the presence of toasted values, because they
don't call <function>pg_detoast_datum()</function> on their
inputs. (The handler for old-style dynamically-loaded functions
currently takes care of this detail, but it does so less
efficiently than is possible for a version-1 function.)
</para>
<para>
One big improvement in version-1 functions is better handling of NULL
2002-01-07 03:29:15 +01:00
inputs and results. The macro <function>PG_ARGISNULL(<replaceable>n</>)</function>
allows a function to test whether each input is NULL (of course, doing
this is only necessary in functions not declared <quote>strict</>).
As with the
<function>PG_GETARG_<replaceable>xxx</replaceable>()</function> macros,
the input arguments are counted beginning at zero. Note that one
should refrain from executing
<function>PG_GETARG_<replaceable>xxx</replaceable>()</function> until
one has verified that the argument isn't NULL.
To return a NULL result, execute <function>PG_RETURN_NULL()</function>;
2002-01-07 03:29:15 +01:00
this works in both strict and nonstrict functions.
</para>
<para>
Other options provided in the new-style interface are two
variants of the
<function>PG_GETARG_<replaceable>xxx</replaceable>()</function>
macros. The first of these,
<function>PG_GETARG_<replaceable>xxx</replaceable>_COPY()</function>
guarantees to return a copy of the specified parameter which is
safe for writing into. (The normal macros will sometimes return a
pointer to the value which must not be written to. Using the
<function>PG_GETARG_<replaceable>xxx</replaceable>_COPY()</function>
macros guarantees a writable result.)
</para>
<para>
The second variant consists of the
<function>PG_GETARG_<replaceable>xxx</replaceable>_SLICE()</function>
macros which take three parameters. The first is the number of the
parameter (as above). The second and third are the offset and
length of the segment to be returned. Offsets are counted from
zero, and a negative length requests that the remainder of the
value be returned. These routines provide more efficient access to
parts of large values in the case where they have storage type
"external". (The storage type of a column can be specified using
2002-03-11 06:03:52 +01:00
<literal>ALTER TABLE <replaceable>tablename</replaceable> ALTER
COLUMN <replaceable>colname</replaceable> SET STORAGE
2002-03-11 06:03:52 +01:00
<replaceable>storagetype</replaceable></literal>. Storage type is one of
<literal>plain</>, <literal>external</>, <literal>extended</literal>,
or <literal>main</>.)
</para>
<para>
The version-1 function call conventions make it possible to
return <quote>set</quote> results and implement trigger functions and
2001-01-22 17:11:17 +01:00
procedural-language call handlers. Version-1 code is also more
portable than version-0, because it does not break ANSI C restrictions
on function call protocol. For more details see
<filename>src/backend/utils/fmgr/README</filename> in the source
distribution.
</para>
</sect2>
<sect2>
<title>Composite Types in C-Language Functions</title>
1998-03-01 09:16:16 +01:00
<para>
1998-03-01 09:16:16 +01:00
Composite types do not have a fixed layout like C
structures. Instances of a composite type may contain
null fields. In addition, composite types that are
part of an inheritance hierarchy may have different
fields than other members of the same inheritance hierarchy.
Therefore, <productname>PostgreSQL</productname> provides
a procedural interface for accessing fields of composite types
from C. As <productname>PostgreSQL</productname> processes
a set of rows, each row will be passed into your
function as an opaque structure of type <literal>TUPLE</literal>.
1998-03-01 09:16:16 +01:00
Suppose we want to write a function to answer the query
<programlisting>
SELECT name, c_overpaid(emp, 1500) AS overpaid
FROM emp
WHERE name = 'Bill' OR name = 'Sam';
</programlisting>
2002-01-07 03:29:15 +01:00
In the query above, we can define <function>c_overpaid</> as:
1998-03-01 09:16:16 +01:00
<programlisting>
#include "postgres.h"
#include "executor/executor.h" /* for GetAttributeByName() */
bool
c_overpaid(TupleTableSlot *t, /* the current row of EMP */
int32 limit)
{
bool isnull;
int32 salary;
salary = DatumGetInt32(GetAttributeByName(t, "salary", &amp;isnull));
if (isnull)
return (false);
return salary &gt; limit;
}
/* In version-1 coding, the above would look like this: */
PG_FUNCTION_INFO_V1(c_overpaid);
Datum
c_overpaid(PG_FUNCTION_ARGS)
{
TupleTableSlot *t = (TupleTableSlot *) PG_GETARG_POINTER(0);
int32 limit = PG_GETARG_INT32(1);
bool isnull;
int32 salary;
salary = DatumGetInt32(GetAttributeByName(t, "salary", &amp;isnull));
if (isnull)
PG_RETURN_BOOL(false);
/* Alternatively, we might prefer to do PG_RETURN_NULL() for null salary */
PG_RETURN_BOOL(salary &gt; limit);
}
</programlisting>
</para>
1998-03-01 09:16:16 +01:00
<para>
<function>GetAttributeByName</function> is the
<productname>PostgreSQL</productname> system function that
returns attributes out of the current row. It has
three arguments: the argument of type <type>TupleTableSlot*</type> passed into
1998-03-01 09:16:16 +01:00
the function, the name of the desired attribute, and a
return parameter that tells whether the attribute
2002-01-07 03:29:15 +01:00
is null. <function>GetAttributeByName</function> returns a <type>Datum</type>
value that you can convert to the proper data type by using the
appropriate <function>DatumGet<replaceable>XXX</replaceable>()</function> macro.
</para>
1998-03-01 09:16:16 +01:00
<para>
2002-01-07 03:29:15 +01:00
The following command lets <productname>PostgreSQL</productname>
know about the <function>c_overpaid</function> function:
<programlisting>
CREATE FUNCTION c_overpaid(emp, int4)
2000-05-20 13:24:37 +02:00
RETURNS bool
AS '<replaceable>PGROOT</replaceable>/tutorial/funcs'
LANGUAGE C;
</programlisting>
</para>
</sect2>
<sect2>
<title>Table Function API</title>
<para>
The Table Function API assists in the creation of a user defined
C Language table functions (<xref linkend="xfunc-tablefunctions">).
Table functions are functions that produce a set of rows, made up of
either base (scalar) data types, or composite (multi-column) data types.
The API is split into two main components: support for returning
composite data types, and support for returning multiple rows
(set returning functions or SRFs).
</para>
<para>
The Table Function API relies on macros and functions to suppress most
of the complexity of building composite data types and return multiple
results. In addition to the version-1 conventions discussed elsewhere,
a table function always requires the following:
<programlisting>
#include "funcapi.h"
</programlisting>
</para>
<para>
The Table Function API support for returning composite data types
(or tuples) starts with the AttInMetadata struct. This struct holds
arrays of individual attribute information needed to create a tuple from
raw C strings. It also requires a copy of the TupleDesc. The information
carried here is derived from the TupleDesc, but it is stored here to
avoid redundant cpu cycles on each call to a Table Function.
<programlisting>
typedef struct
{
/* full TupleDesc */
TupleDesc tupdesc;
/* pointer to array of attribute "type"in finfo */
FmgrInfo *attinfuncs;
/* pointer to array of attribute type typelem */
Oid *attelems;
/* pointer to array of attribute type typtypmod */
int4 *atttypmods;
} AttInMetadata;
</programlisting>
To assist you in populating this struct, several functions and a macro
are available. Use
<programlisting>
TupleDesc RelationNameGetTupleDesc(char *relname)
</programlisting>
to get a TupleDesc based on the function's return type relation, or
<programlisting>
TupleDesc TypeGetTupleDesc(Oid typeoid, List *colaliases)
</programlisting>
to get a TupleDesc based on the function's type oid. This can be used to
get a TupleDesc for a base (scalar), or composite (relation) type. Then
<programlisting>
AttInMetadata *TupleDescGetAttInMetadata(TupleDesc tupdesc)
</programlisting>
will return a pointer to an AttInMetadata struct, initialized based on
the function's TupleDesc. AttInMetadata is be used in conjunction with
C strings to produce a properly formed tuple. The metadata is stored here
for use across calls to avoid redundant work.
</para>
<para>
In order to return a tuple you must create a tuple slot based on the
TupleDesc. You can use
<programlisting>
TupleTableSlot *TupleDescGetSlot(TupleDesc tupdesc)
</programlisting>
to initialize this tuple slot, or obtain one through other (user provided)
means. The tuple slot is needed to create a Datum for return by the
function.
</para>
<para>
If desired,
<programlisting>
HeapTuple BuildTupleFromCStrings(AttInMetadata *attinmeta, char **values)
</programlisting>
can be used to build a HeapTuple given user data in C string form.
"values" is an array of C strings, one for each attribute of the return
tuple. The C strings should be in the form expected by the "in" function
of the attribute data type. For more information on this requirement,
see the individual data type "in" functions in the source code
(e.g. textin() for data type TEXT). In order to return a NULL value for
one of the attributes, the corresponding pointer in the "values" array
should be set to NULL.
</para>
<para>
Finally, in order to return a tuple using the SRF portion of the API
(described below), the tuple must be converted into a Datum. Use
<programlisting>
TupleGetDatum(TupleTableSlot *slot, HeapTuple tuple)
</programlisting>
to get a Datum given a tuple and a slot.
</para>
<para>
The Table Function API support for set returning functions starts with
the FuncCallContext struct. This struct holds function context for
SRFs using fcinfo->flinfo->fn_extra to hold a pointer to it across calls.
<programlisting>
typedef struct
{
/*
* Number of times we've been called before.
*
* call_cntr is initialized to 0 for you by SRF_FIRSTCALL_INIT(), and
* incremented for you every time SRF_RETURN_NEXT() is called.
*/
uint32 call_cntr;
/*
* OPTIONAL maximum number of calls
*
* max_calls is here for convenience ONLY and setting it is OPTIONAL.
* If not set, you must provide alternative means to know when the
* function is done.
*/
uint32 max_calls;
/*
* OPTIONAL pointer to result slot
*
* slot is for use when returning tuples (i.e. composite data types)
* and is not needed when returning base (i.e. scalar) data types.
*/
TupleTableSlot *slot;
/*
* OPTIONAL pointer to misc user provided context info
*
* user_fctx is for use as a pointer to your own struct to retain
* arbitrary context information between calls for your function.
*/
void *user_fctx;
/*
* OPTIONAL pointer to struct containing arrays of attribute type input
* metainfo
*
* attinmeta is for use when returning tuples (i.e. composite data types)
* and is not needed when returning base (i.e. scalar) data types. It
* is ONLY needed if you intend to use BuildTupleFromCStrings() to create
* the return tuple.
*/
AttInMetadata *attinmeta;
/*
* memory context used to initialize structure
*
* fmctx is set by SRF_FIRSTCALL_INIT() for you, and used by
* SRF_RETURN_DONE() for cleanup. It is primarily for internal use
* by the API.
*/
MemoryContext fmctx;
} FuncCallContext;
</programlisting>
To assist you in populating this struct, several functions and macros
are available. Use
<programlisting>
SRF_IS_FIRSTCALL()
</programlisting>
to determine if your function has been called for the first or a
subsequent time. On the first call (only) use
<programlisting>
SRF_FIRSTCALL_INIT()
</programlisting>
to initialize the FuncCallContext struct. On every function call,
including the first, use
<programlisting>
SRF_PERCALL_SETUP()
</programlisting>
to properly set up for using the FuncCallContext struct and clearing
any previously returned data left over from the previous pass.
</para>
<para>
If your function has data to return, use
<programlisting>
SRF_RETURN_NEXT(funcctx, result)
</programlisting>
to send it and prepare for the next call. Finally, when your function
is finished returning data, use
<programlisting>
SRF_RETURN_DONE(funcctx)
</programlisting>
to clean up and end the SRF.
</para>
<para>
A complete pseudo-code example looks like the following:
<programlisting>
Datum
my_Set_Returning_Function(PG_FUNCTION_ARGS)
{
FuncCallContext *funcctx;
Datum result;
[user defined declarations]
if(SRF_IS_FIRSTCALL())
{
[user defined code]
funcctx = SRF_FIRSTCALL_INIT();
[if returning composite]
[obtain slot]
funcctx->slot = slot;
[endif returning composite]
[user defined code]
}
[user defined code]
funcctx = SRF_PERCALL_SETUP();
[user defined code]
if (funcctx->call_cntr < funcctx->max_calls)
{
[user defined code]
[obtain result Datum]
SRF_RETURN_NEXT(funcctx, result);
}
else
{
SRF_RETURN_DONE(funcctx);
}
}
</programlisting>
</para>
<para>
An example of a simple composite returning SRF looks like:
<programlisting>
PG_FUNCTION_INFO_V1(testpassbyval);
Datum
testpassbyval(PG_FUNCTION_ARGS)
{
FuncCallContext *funcctx;
int call_cntr;
int max_calls;
TupleDesc tupdesc;
TupleTableSlot *slot;
AttInMetadata *attinmeta;
/* stuff done only on the first call of the function */
if(SRF_IS_FIRSTCALL())
{
/* create a function context for cross-call persistence */
funcctx = SRF_FIRSTCALL_INIT();
/* total number of tuples to be returned */
funcctx->max_calls = PG_GETARG_UINT32(0);
/*
* Build a tuple description for a __testpassbyval tuple
*/
tupdesc = RelationNameGetTupleDesc("__testpassbyval");
/* allocate a slot for a tuple with this tupdesc */
slot = TupleDescGetSlot(tupdesc);
/* assign slot to function context */
funcctx->slot = slot;
/*
* Generate attribute metadata needed later to produce tuples from raw
* C strings
*/
attinmeta = TupleDescGetAttInMetadata(tupdesc);
funcctx->attinmeta = attinmeta;
}
/* stuff done on every call of the function */
funcctx = SRF_PERCALL_SETUP();
call_cntr = funcctx->call_cntr;
max_calls = funcctx->max_calls;
slot = funcctx->slot;
attinmeta = funcctx->attinmeta;
if (call_cntr < max_calls) /* do when there is more left to send */
{
char **values;
HeapTuple tuple;
Datum result;
/*
* Prepare a values array for storage in our slot.
* This should be an array of C strings which will
* be processed later by the appropriate "in" functions.
*/
values = (char **) palloc(3 * sizeof(char *));
values[0] = (char *) palloc(16 * sizeof(char));
values[1] = (char *) palloc(16 * sizeof(char));
values[2] = (char *) palloc(16 * sizeof(char));
snprintf(values[0], 16, "%d", 1 * PG_GETARG_INT32(1));
snprintf(values[1], 16, "%d", 2 * PG_GETARG_INT32(1));
snprintf(values[2], 16, "%d", 3 * PG_GETARG_INT32(1));
/* build a tuple */
tuple = BuildTupleFromCStrings(attinmeta, values);
/* make the tuple into a datum */
result = TupleGetDatum(slot, tuple);
/* Clean up */
pfree(values[0]);
pfree(values[1]);
pfree(values[2]);
pfree(values);
SRF_RETURN_NEXT(funcctx, result);
}
else /* do when there is no more left */
{
SRF_RETURN_DONE(funcctx);
}
}
</programlisting>
with supporting SQL code of
<programlisting>
CREATE VIEW __testpassbyval AS
SELECT
0::INT4 AS f1,
0::INT4 AS f2,
0::INT4 AS f3;
CREATE OR REPLACE FUNCTION testpassbyval(int4, int4) RETURNS setof __testpassbyval
AS 'MODULE_PATHNAME','testpassbyval' LANGUAGE 'c' IMMUTABLE STRICT;
</programlisting>
</para>
1998-03-01 09:16:16 +01:00
<para>
See contrib/tablefunc for more examples of Table Functions.
</para>
</sect2>
1998-03-01 09:16:16 +01:00
<sect2>
<title>Writing Code</title>
1998-03-01 09:16:16 +01:00
<para>
1998-03-01 09:16:16 +01:00
We now turn to the more difficult task of writing
programming language functions. Be warned: this section
of the manual will not make you a programmer. You must
have a good understanding of <acronym>C</acronym>
(including the use of pointers and the malloc memory manager)
before trying to write <acronym>C</acronym> functions for
use with <productname>PostgreSQL</productname>. While it may
be possible to load functions written in languages other
than <acronym>C</acronym> into <productname>PostgreSQL</productname>,
this is often difficult (when it is possible at all)
because other languages, such as <acronym>FORTRAN</acronym>
and <acronym>Pascal</acronym> often do not follow the same
<firstterm>calling convention</firstterm>
as <acronym>C</acronym>. That is, other
1998-03-01 09:16:16 +01:00
languages do not pass argument and return values
between functions in the same way. For this reason, we
will assume that your programming language functions
are written in <acronym>C</acronym>.
</para>
<para>
The basic rules for building <acronym>C</acronym> functions
are as follows:
1998-03-01 09:16:16 +01:00
<itemizedlist>
<listitem>
<para>
2001-11-12 20:19:39 +01:00
Use <literal>pg_config --includedir-server</literal><indexterm><primary>pg_config</></> to find
out where the <productname>PostgreSQL</> server header files are installed on
your system (or the system that your users will be running
on). This option is new with <productname>PostgreSQL</> 7.2.
For <productname>PostgreSQL</>
7.1 you should use the option <option>--includedir</option>.
(<command>pg_config</command> will exit with a non-zero status
if it encounters an unknown option.) For releases prior to
7.1 you will have to guess, but since that was before the
current calling conventions were introduced, it is unlikely
that you want to support those releases.
</para>
</listitem>
<listitem>
<para>
When allocating memory, use the
<productname>PostgreSQL</productname> routines
<function>palloc</function> and <function>pfree</function>
instead of the corresponding <acronym>C</acronym> library
routines <function>malloc</function> and
<function>free</function>. The memory allocated by
<function>palloc</function> will be freed automatically at the
end of each transaction, preventing memory leaks.
</para>
</listitem>
<listitem>
<para>
Always zero the bytes of your structures using
<function>memset</function> or <function>bzero</function>.
Several routines (such as the hash access method, hash join
and the sort algorithm) compute functions of the raw bits
contained in your structure. Even if you initialize all
fields of your structure, there may be several bytes of
alignment padding (holes in the structure) that may contain
garbage values.
</para>
</listitem>
<listitem>
<para>
Most of the internal <productname>PostgreSQL</productname> types
are declared in <filename>postgres.h</filename>, while the function
manager interfaces (<symbol>PG_FUNCTION_ARGS</symbol>, etc.)
are in <filename>fmgr.h</filename>, so you will need to
include at least these two files. For portability reasons it's best
to include <filename>postgres.h</filename> <emphasis>first</>,
before any other system or user header files.
Including <filename>postgres.h</filename> will also include
<filename>elog.h</filename> and <filename>palloc.h</filename>
for you.
</para>
</listitem>
<listitem>
<para>
Symbol names defined within object files must not conflict
with each other or with symbols defined in the
<productname>PostgreSQL</productname> server executable. You
will have to rename your functions or variables if you get
error messages to this effect.
</para>
</listitem>
<listitem>
<para>
Compiling and linking your object code so that
it can be dynamically loaded into
<productname>PostgreSQL</productname>
always requires special flags.
2000-12-26 01:10:37 +01:00
See <xref linkend="dfunc">
for a detailed explanation of how to do it for
your particular operating system.
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
&dfunc;
</sect1>
<sect1 id="xfunc-overload">
<title>Function Overloading</title>
2001-11-12 20:19:39 +01:00
<indexterm zone="xfunc-overload"><primary>overloading</></>
<para>
More than one function may be defined with the same SQL name, so long
2001-09-15 21:56:59 +02:00
as the arguments they take are different. In other words,
function names can be <firstterm>overloaded</firstterm>. When a
query is executed, the server will determine which function to
call from the data types and the number of the provided arguments.
Overloading can also be used to simulate functions with a variable
number of arguments, up to a finite maximum number.
</para>
<para>
A function may also have the same name as an attribute. In the case
that there is an ambiguity between a function on a complex type and
an attribute of the complex type, the attribute will always be used.
</para>
2001-09-15 21:56:59 +02:00
<para>
When creating a family of overloaded functions, one should be
careful not to create ambiguities. For instance, given the
functions
<programlisting>
CREATE FUNCTION test(int, real) RETURNS ...
CREATE FUNCTION test(smallint, double precision) RETURNS ...
</programlisting>
it is not immediately clear which function would be called with
some trivial input like <literal>test(1, 1.5)</literal>. The
currently implemented resolution rules are described in the
<citetitle>User's Guide</citetitle>, but it is unwise to design a
system that subtly relies on this behavior.
</para>
2001-09-15 21:56:59 +02:00
<para>
When overloading C language functions, there is an additional
constraint: The C name of each function in the family of
overloaded functions must be different from the C names of all
other functions, either internal or dynamically loaded. If this
rule is violated, the behavior is not portable. You might get a
run-time linker error, or one of the functions will get called
(usually the internal one). The alternative form of the
<literal>AS</> clause for the SQL <command>CREATE
FUNCTION</command> command decouples the SQL function name from
the function name in the C source code. E.g.,
<programlisting>
CREATE FUNCTION test(int) RETURNS int
AS '<replaceable>filename</>', 'test_1arg'
LANGUAGE C;
CREATE FUNCTION test(int, int) RETURNS int
AS '<replaceable>filename</>', 'test_2arg'
LANGUAGE C;
</programlisting>
The names of the C functions here reflect one of many possible conventions.
</para>
2001-09-15 21:56:59 +02:00
<para>
Prior to <productname>PostgreSQL</productname> 7.0, this
alternative syntax did not exist. There is a trick to get around
the problem, by defining a set of C functions with different names
and then define a set of identically-named SQL function wrappers
that take the appropriate argument types and call the matching C
function.
</para>
</sect1>
<sect1 id="xfunc-tablefunctions">
<title>Table Functions</title>
<indexterm zone="xfunc-tablefunctions"><primary>function</></>
<para>
Table functions are functions that produce a set of rows, made up of
either base (scalar) data types, or composite (multi-column) data types.
They are used like a table, view, or subselect in the <literal>FROM</>
clause of a query. Columns returned by table functions may be included in
<literal>SELECT</>, <literal>JOIN</>, or <literal>WHERE</> clauses in the
same manner as a table, view, or subselect column.
</para>
<para>
If a table function returns a base data type, the single result column
is named for the function. If the function returns a composite type, the
result columns get the same names as the individual attributes of the type.
</para>
<para>
A table function may be aliased in the <literal>FROM</> clause, but it also
may be left unaliased. If a function is used in the FROM clause with no
alias, the function name is used as the relation name.
</para>
<para>
Table functions work wherever tables do in <literal>SELECT</> statements.
For example
<programlisting>
CREATE TABLE foo (fooid int, foosubid int, fooname text, primary key(fooid,foosubid));
CREATE FUNCTION getfoo(int) RETURNS foo AS 'SELECT * FROM foo WHERE fooid = $1;' LANGUAGE SQL;
SELECT * FROM getfoo(1) AS t1;
SELECT * FROM foo where foosubid in (select foosubid from getfoo(foo.fooid) z where z.fooid = foo.fooid);
CREATE VIEW vw_getfoo AS SELECT * FROM getfoo(1);
SELECT * FROM vw_getfoo;
</programlisting>
are all valid statements.
</para>
<para>
Currently, table functions are supported as SQL language functions
(<xref linkend="xfunc-sql">) and C language functions
(<xref linkend="xfunc-c">). See these individual sections for more
details.
</para>
</sect1>
<sect1 id="xfunc-plhandler">
<title>Procedural Language Handlers</title>
<para>
All calls to functions that are written in a language other than
the current <quote>version 1</quote> interface for compiled
languages (this includes functions in user-defined procedural languages,
functions written in SQL, and functions using the version 0 compiled
language interface), go through a <firstterm>call handler</firstterm>
function for the specific language. It is the responsibility of
the call handler to execute the function in a meaningful way, such
as by interpreting the supplied source text. This section
describes how a language call handler can be written. This is not
a common task, in fact, it has only been done a handful of times
in the history of <productname>PostgreSQL</productname>, but the
topic naturally belongs in this chapter, and the material might
give some insight into the extensible nature of the
<productname>PostgreSQL</productname> system.
</para>
<para>
The call handler for a procedural language is a
<quote>normal</quote> function, which must be written in a
compiled language such as C and registered with
<productname>PostgreSQL</productname> as taking no arguments and
returning the <type>opaque</type> type, a placeholder for
unspecified or undefined types. This prevents the call handler
from being called directly as a function from queries. (However,
arguments may be supplied in the actual call to the handler when a
function in the language offered by the handler is to be
executed.)
</para>
<note>
<para>
In <productname>PostgreSQL</productname> 7.1 and later, call
handlers must adhere to the <quote>version 1</quote> function
manager interface, not the old-style interface.
</para>
</note>
<para>
The call handler is called in the same way as any other function:
It receives a pointer to a
<structname>FunctionCallInfoData</structname> struct containing
argument values and information about the called function, and it
is expected to return a <type>Datum</type> result (and possibly
set the <structfield>isnull</structfield> field of the
<structname>FunctionCallInfoData</structname> struct, if it wishes
to return an SQL NULL result). The difference between a call
handler and an ordinary callee function is that the
<structfield>flinfo-&gt;fn_oid</structfield> field of the
<structname>FunctionCallInfoData</structname> struct will contain
the OID of the actual function to be called, not of the call
handler itself. The call handler must use this field to determine
which function to execute. Also, the passed argument list has
been set up according to the declaration of the target function,
not of the call handler.
</para>
<para>
It's up to the call handler to fetch the
<classname>pg_proc</classname> entry and to analyze the argument
and return types of the called procedure. The AS clause from the
<command>CREATE FUNCTION</command> of the procedure will be found
in the <literal>prosrc</literal> attribute of the
<classname>pg_proc</classname> table entry. This may be the source
text in the procedural language itself (like for PL/Tcl), a
path name to a file, or anything else that tells the call handler
what to do in detail.
</para>
<para>
Often, the same function is called many times per SQL statement.
A call handler can avoid repeated lookups of information about the
called function by using the
<structfield>flinfo-&gt;fn_extra</structfield> field. This will
initially be NULL, but can be set by the call handler to point at
information about the PL function. On subsequent calls, if
<structfield>flinfo-&gt;fn_extra</structfield> is already non-NULL
then it can be used and the information lookup step skipped. The
call handler must be careful that
<structfield>flinfo-&gt;fn_extra</structfield> is made to point at
memory that will live at least until the end of the current query,
since an <structname>FmgrInfo</structname> data structure could be
kept that long. One way to do this is to allocate the extra data
in the memory context specified by
<structfield>flinfo-&gt;fn_mcxt</structfield>; such data will
normally have the same lifespan as the
<structname>FmgrInfo</structname> itself. But the handler could
also choose to use a longer-lived context so that it can cache
function definition information across queries.
</para>
<para>
When a PL function is invoked as a trigger, no explicit arguments
are passed, but the
<structname>FunctionCallInfoData</structname>'s
<structfield>context</structfield> field points at a
<structname>TriggerData</structname> node, rather than being NULL
as it is in a plain function call. A language handler should
provide mechanisms for PL functions to get at the trigger
information.
</para>
<para>
This is a template for a PL handler written in C:
<programlisting>
#include "postgres.h"
#include "executor/spi.h"
#include "commands/trigger.h"
#include "utils/elog.h"
#include "fmgr.h"
#include "access/heapam.h"
#include "utils/syscache.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
PG_FUNCTION_INFO_V1(plsample_call_handler);
Datum
plsample_call_handler(PG_FUNCTION_ARGS)
{
Datum retval;
if (CALLED_AS_TRIGGER(fcinfo))
{
/*
* Called as a trigger procedure
*/
TriggerData *trigdata = (TriggerData *) fcinfo->context;
retval = ...
}
else {
/*
* Called as a function
*/
retval = ...
}
return retval;
}
</programlisting>
</para>
<para>
Only a few thousand lines of code have to be added instead of the
dots to complete the call handler. See <xref linkend="xfunc-c">
for information on how to compile it into a loadable module.
</para>
<para>
The following commands then register the sample procedural
language:
<programlisting>
CREATE FUNCTION plsample_call_handler () RETURNS opaque
AS '/usr/local/pgsql/lib/plsample'
LANGUAGE C;
CREATE LANGUAGE plsample
HANDLER plsample_call_handler;
</programlisting>
</para>
</sect1>
</chapter>
<!-- Keep this comment at the end of the file
Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/lib/sgml/catalog")
sgml-local-ecat-files:nil
End:
-->