Extending SQL: Functions

Introduction

As it turns out, part of defining a new type is the definition of functions that describe its behavior. Consequently, while it is possible to define a new function without defining a new type, the reverse is not true. We therefore describe how to add new functions to PostgreSQL before describing how to add new types.

PostgreSQL provides four kinds of functions:

- query language functions (functions written in SQL)
- procedural language functions (functions written in, for example, PL/Tcl or PL/pgSQL)
- internal functions
- C language functions

Every kind of function can take base types, composite types, or some combination of these as arguments (parameters). In addition, every kind of function can return a base type or a composite type. It's easiest to define SQL functions, so we'll start with those. Examples in this section can also be found in funcs.sql and funcs.c in the tutorial directory.

Query Language (SQL) Functions

SQL functions execute an arbitrary list of SQL statements, returning the result of the last query in the list, which must be a SELECT. In the simple (non-set) case, the first row of the last query's result will be returned. (Bear in mind that "the first row" of a multirow result is not well-defined unless you use ORDER BY.) If the last query happens to return no rows at all, NULL will be returned.

Alternatively, an SQL function may be declared to return a set, by specifying the function's return type as SETOF sometype. In this case all rows of the last query's result are returned. Further details appear below.

The body of an SQL function should be a list of one or more SQL statements separated by semicolons.
Note that because the syntax of the CREATE FUNCTION command requires the body of the function to be enclosed in single quotes, single quote marks (') used in the body of the function must be escaped, by writing two single quotes ('') or a backslash (\') where each quote is desired.

Arguments to the SQL function may be referenced in the function body using the syntax $n: $1 refers to the first argument, $2 to the second, and so on. If an argument is of a composite type, then the dot notation, e.g., $1.emp, may be used to access attributes of the argument.

Examples

To illustrate a simple SQL function, consider the following, which might be used to debit a bank account:

    CREATE FUNCTION tp1 (integer, numeric) RETURNS integer AS '
        UPDATE bank
            SET balance = balance - $2
            WHERE accountno = $1;
        SELECT 1;
    ' LANGUAGE SQL;

A user could execute this function to debit account 17 by $100.00 as follows:

    SELECT tp1(17, 100.0);

In practice one would probably like a more useful result from the function than a constant 1, so a more likely definition is

    CREATE FUNCTION tp1 (integer, numeric) RETURNS numeric AS '
        UPDATE bank
            SET balance = balance - $2
            WHERE accountno = $1;
        SELECT balance FROM bank WHERE accountno = $1;
    ' LANGUAGE SQL;

which adjusts the balance and returns the new balance.

Any collection of commands in the SQL language can be packaged together and defined as a function. The commands can include data modification (i.e., INSERT, UPDATE, and DELETE) as well as SELECT queries. However, the final command must be a SELECT that returns whatever is specified as the function's return type.
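Doubling quotes by hand is error-prone, especially when a client program generates CREATE FUNCTION commands. The standalone C sketch below applies the quote-doubling rule described at the start of this section. quote_function_body is a hypothetical helper, not part of PostgreSQL. It leaves backslashes alone, even though a body containing them would presumably need them escaped as well.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of PostgreSQL): returns a malloc'd copy of
 * an SQL function body with every single quote doubled, ready to be placed
 * between the quotes of CREATE FUNCTION ... AS '...'. Backslashes are NOT
 * escaped by this sketch. The caller must free() the result; NULL is
 * returned if allocation fails.
 */
char *
quote_function_body(const char *body)
{
    size_t      len;
    size_t      quotes = 0;
    const char *p;
    char       *out;
    char       *q;

    assert(body != NULL);
    len = strlen(body);
    for (p = body; *p != '\0'; p++)
        if (*p == '\'')
            quotes++;

    /* one extra byte per quote, plus the terminating NUL */
    out = malloc(len + quotes + 1);
    if (out == NULL)
        return NULL;

    for (p = body, q = out; *p != '\0'; p++)
    {
        *q++ = *p;
        if (*p == '\'')
            *q++ = '\'';        /* write the quote a second time */
    }
    *q = '\0';
    return out;
}
```

For instance, the body SELECT text 'None' AS name; comes back as SELECT text ''None'' AS name;, which is the form that can appear inside the quoted AS clause.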
The following function, which deletes all employees whose salary is not positive, shows this pattern. It has no useful value to return, so its final command is a dummy SELECT:

    CREATE FUNCTION clean_EMP () RETURNS integer AS '
        DELETE FROM EMP
            WHERE EMP.salary <= 0;
        SELECT 1 AS ignore_this;
    ' LANGUAGE SQL;

    SELECT clean_EMP();

     clean_emp
    -----------
             1

SQL Functions on Base Types

The simplest possible SQL function has no arguments and simply returns a base type, such as integer:

    CREATE FUNCTION one() RETURNS integer AS '
        SELECT 1 as RESULT;
    ' LANGUAGE SQL;

    SELECT one();

     one
    -----
       1

Notice that we defined a column alias within the function body for the result of the function (with the name RESULT), but this column alias is not visible outside the function. Hence, the result is labeled one instead of RESULT.

It is almost as easy to define SQL functions that take base types as arguments. In the example below, notice how we refer to the arguments within the function as $1 and $2:

    CREATE FUNCTION add_em(integer, integer) RETURNS integer AS '
        SELECT $1 + $2;
    ' LANGUAGE SQL;

    SELECT add_em(1, 2) AS answer;

     answer
    --------
          3

SQL Functions on Composite Types

When specifying functions with arguments of composite types, we must specify not only which argument we want (as we did above with $1 and $2) but also the attributes of that argument. For example, suppose that EMP is a table containing employee data, and therefore also the name of the composite type of each row of the table. Here is a function double_salary that computes what your salary would be if it were doubled:

    CREATE FUNCTION double_salary(EMP) RETURNS integer AS '
        SELECT $1.salary * 2 AS salary;
    ' LANGUAGE SQL;

    SELECT name, double_salary(EMP) AS dream
        FROM EMP
        WHERE EMP.cubicle ~= point '(2,1)';

     name | dream
    ------+-------
     Sam  |  2400

Notice the use of the syntax $1.salary to select one field of the argument row value. Also notice how the calling SELECT command uses a table name to denote the entire current row of that table as a composite value.

It is also possible to build a function that returns a composite type.
(However, as we'll see below, there are some unfortunate restrictions on how the function may be used.) This is an example of a function that returns a single EMP row:

    CREATE FUNCTION new_emp() RETURNS EMP AS '
        SELECT text ''None'' AS name,
            1000 AS salary,
            25 AS age,
            point ''(2,2)'' AS cubicle;
    ' LANGUAGE SQL;

In this case we have specified each of the attributes with a constant value, but any computation or expression could have been substituted for these constants.

Note two important things about defining the function:

- The target list order must be exactly the same as that in which the columns appear in the table associated with the composite type.
- You must typecast the expressions to match the definition of the composite type, or you will get errors like this:

      ERROR:  function declared to return emp returns varchar instead of text at column 1

In the present release of PostgreSQL there are some unpleasant restrictions on how functions returning composite types can be used. Briefly, when calling a function that returns a row, we cannot retrieve the entire row. We must either project a single attribute out of the row or pass the entire row into another function. (Trying to display the entire row value will yield a meaningless number.) For example,

    SELECT name(new_emp());

     name
    ------
     None

This example makes use of the function notation for projecting attributes. The simple way to explain this is that we can usually use the notations attribute(table) and table.attribute interchangeably:

    --
    -- this is the same as:
    --  SELECT EMP.name AS youngster FROM EMP WHERE EMP.age < 30
    --
    SELECT name(EMP) AS youngster
        FROM EMP
        WHERE age(EMP) < 30;

     youngster
    -----------
     Sam

The reason why, in general, we must use the function syntax for projecting attributes of function return values is that the parser just doesn't understand the dot syntax for projection when combined with function calls.

    SELECT new_emp().name AS nobody;
    ERROR:  parser: parse error at or near "."
Another way to use a function returning a row result is to declare a second function accepting a row type parameter, and pass the function result to it:

    CREATE FUNCTION getname(emp) RETURNS text AS
        'SELECT $1.name;'
        LANGUAGE SQL;

    SELECT getname(new_emp());
     getname
    ---------
     None
    (1 row)

SQL Functions Returning Sets

As previously mentioned, an SQL function may be declared as returning SETOF sometype. In this case the function's final SELECT query is executed to completion, and each row it outputs is returned as an element of the set.

Functions returning sets may only be called in the target list of a SELECT query. For each row that the SELECT generates by itself, the function returning a set is invoked, and an output row is generated for each element of the function's result set. An example:

    CREATE FUNCTION listchildren(text) RETURNS SETOF text AS
        'SELECT name FROM nodes WHERE parent = $1'
        LANGUAGE SQL;

    SELECT * FROM nodes;
       name    | parent
    -----------+--------
     Top       |
     Child1    | Top
     Child2    | Top
     Child3    | Top
     SubChild1 | Child1
     SubChild2 | Child1
    (6 rows)

    SELECT listchildren('Top');
     listchildren
    --------------
     Child1
     Child2
     Child3
    (3 rows)

    SELECT name, listchildren(name) FROM nodes;
      name  | listchildren
    --------+--------------
     Top    | Child1
     Top    | Child2
     Top    | Child3
     Child1 | SubChild1
     Child1 | SubChild2
    (5 rows)

In the last SELECT, notice that no output row appears for Child2, Child3, etc. This happens because listchildren returns an empty set for those inputs, so no output rows are generated.

Procedural Language Functions

Procedural languages aren't built into the PostgreSQL server; they are offered by loadable modules. Please refer to the documentation of the procedural language in question for details about the syntax and how the function body is interpreted for each language.

There are currently four procedural languages available in the standard PostgreSQL distribution: PL/pgSQL, PL/Tcl, PL/Perl, and PL/Python. Other languages can be defined by users.
Refer to the procedural language documentation for more information. The basics of developing a new procedural language are covered in the section on procedural language handlers at the end of this chapter.

Internal Functions

Internal functions are functions written in C that have been statically linked into the PostgreSQL server. The "body" of the function definition specifies the C-language name of the function, which need not be the same as the name being declared for SQL use. (For reasons of backward compatibility, an empty body is accepted as meaning that the C-language function name is the same as the SQL name.)

Normally, all internal functions present in the backend are declared during the initialization of the database cluster (initdb), but a user could use CREATE FUNCTION to create additional alias names for an internal function. Internal functions are declared in CREATE FUNCTION with language name internal. For instance, to create an alias for the sqrt function:

    CREATE FUNCTION square_root(double precision) RETURNS double precision
        AS 'dsqrt'
        LANGUAGE INTERNAL
        WITH (isStrict);

(Most internal functions expect to be declared "strict".)

Not all "predefined" functions are "internal" in the above sense. Some predefined functions are written in SQL.

C Language Functions

User-defined functions can be written in C (or a language that can be made compatible with C, such as C++). Such functions are compiled into dynamically loadable objects (also called shared libraries) and are loaded by the server on demand. The dynamic loading feature is what distinguishes "C language" functions from "internal" functions; the actual coding conventions are essentially the same for both. (Hence, the standard internal function library is a rich source of coding examples for user-defined C functions.)

Two different calling conventions are currently used for C functions. The newer "version 1" calling convention is indicated by writing a PG_FUNCTION_INFO_V1() macro call for the function, as illustrated below. Lack of such a macro indicates an old-style ("version 0") function.
The language name specified in CREATE FUNCTION is C in either case. Old-style functions are now deprecated because of portability problems and lack of functionality, but they are still supported for compatibility reasons.

Dynamic Loading

The first time a user-defined function in a particular loadable object file is called in a backend session, the dynamic loader loads that object file into memory so that the function can be called. The CREATE FUNCTION for a user-defined C function must therefore specify two pieces of information for the function: the name of the loadable object file, and the C name (link symbol) of the specific function to call within that object file. If the C name is not explicitly specified, it is assumed to be the same as the SQL function name.

The following algorithm is used to locate the shared object file based on the name given in the CREATE FUNCTION command:

1. If the name is an absolute path, the given file is loaded.
2. If the name starts with the string $libdir, that part is replaced by the PostgreSQL package library directory name, which is determined at build time.
3. If the name does not contain a directory part, the file is searched for in the path specified by the configuration variable dynamic_library_path.
4. Otherwise (the file was not found in the path, or it contains a non-absolute directory part), the dynamic loader will try to take the name as given, which will most likely fail. (It is unreliable to depend on the current working directory.)

If this sequence does not work, the platform-specific shared library file name extension (often .so) is appended to the given name and this sequence is tried again. If that fails as well, the load will fail.

The user ID under which the PostgreSQL server runs must be able to traverse the path to the file you intend to load. Making the file or a higher-level directory unreadable and/or unexecutable by the postgres user is a common mistake.
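The four-step lookup and the retry with an extension can be modeled in a few lines of standalone C. This is a sketch, not the server's actual code. resolve_library, try_resolve, and demo_exists are hypothetical names. The file-existence test is passed in as a callback, and demo_exists simply pretends that exactly two paths exist. The extension is assumed to be .so, and path elements are taken literally, with no $libdir expansion inside dynamic_library_path.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Predicate deciding whether a candidate file exists (and is loadable). */
typedef int (*exists_fn)(const char *path);

/* Hypothetical stand-in for a real check such as access(2); demo only. */
int
demo_exists(const char *path)
{
    return strcmp(path, "/usr/local/pgsql/lib/funcs.so") == 0 ||
           strcmp(path, "/opt/ext/mylib.so") == 0;
}

/* One pass of the four-step lookup for a single candidate name. */
static int
try_resolve(const char *name, const char *libdir, const char *search_path,
            exists_fn exists, char *out, size_t outlen)
{
    if (name[0] == '/')                             /* 1. absolute path */
        snprintf(out, outlen, "%s", name);
    else if (strncmp(name, "$libdir", 7) == 0)      /* 2. $libdir prefix */
        snprintf(out, outlen, "%s%s", libdir, name + 7);
    else if (strchr(name, '/') == NULL)             /* 3. bare name: search */
    {
        const char *p = search_path;

        while (*p != '\0')
        {
            size_t      n = strcspn(p, ":");        /* one path element */

            snprintf(out, outlen, "%.*s/%s", (int) n, p, name);
            if (exists(out))
                return 1;
            p += n;
            if (*p == ':')
                p++;
        }
        snprintf(out, outlen, "%s", name);          /* 4. not found: as given */
    }
    else                                            /* 4. relative dir part */
        snprintf(out, outlen, "%s", name);
    return exists(out);
}

/*
 * Full lookup: try the name as written, then retry with the shared library
 * extension appended (".so" is assumed here; it is platform-specific).
 * Returns 1 and leaves the chosen path in 'out' on success, else 0.
 */
int
resolve_library(const char *name, const char *libdir, const char *search_path,
                exists_fn exists, char *out, size_t outlen)
{
    char        with_ext[1024];

    if (try_resolve(name, libdir, search_path, exists, out, outlen))
        return 1;
    snprintf(with_ext, sizeof(with_ext), "%s.so", name);
    return try_resolve(with_ext, libdir, search_path, exists, out, outlen);
}
```

Consider a made-up library directory /usr/local/pgsql/lib and search path /opt/ext:/usr/local/pgsql/lib. The bare name funcs resolves to /usr/local/pgsql/lib/funcs.so only on the second pass, after .so is appended. By contrast, sub/funcs is never found, because a name with a relative directory part is not searched for.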
Whichever way the file is found, the file name that is given in the CREATE FUNCTION command is recorded literally in the system catalogs, so if the file needs to be loaded again the same procedure is applied.

PostgreSQL will not compile a C function automatically. The object file must be compiled before it is referenced in a CREATE FUNCTION command. See the section on compiling and linking dynamically-loaded functions for additional information.

After it is used for the first time, a dynamically loaded object file is retained in memory. Future calls in the same session to the function(s) in that file will only incur the small overhead of a symbol table lookup. If you need to force a reload of an object file, for example after recompiling it, use the LOAD command or begin a fresh session.

It is recommended to locate shared libraries either relative to $libdir or through the dynamic library path. This simplifies version upgrades if the new installation is at a different location. The actual directory that $libdir stands for can be found out with the command pg_config --pkglibdir.

Before PostgreSQL release 7.2, only exact absolute paths to object files could be specified in CREATE FUNCTION. This approach is now deprecated since it makes the function definition unnecessarily unportable. It's best to specify just the shared library name with no path or extension, and let the search mechanism provide that information instead.

Base Types in C-Language Functions

The following table gives the C type required for parameters in the C functions that will be loaded into PostgreSQL. The "Defined In" column gives the header file that needs to be included to get the type definition. (The actual definition may be in a different file that is included by the listed file. It is recommended that users stick to the defined interface.) Note that you should always include postgres.h first in any source file, because it declares a number of things that you will need anyway.
Table: Equivalent C Types for Built-In PostgreSQL Types

    SQL Type                   C Type          Defined In
    -------------------------  --------------  ------------------------------------
    abstime                    AbsoluteTime    utils/nabstime.h
    boolean                    bool            postgres.h (maybe compiler built-in)
    box                        BOX*            utils/geo_decls.h
    bytea                      bytea*          postgres.h
    "char"                     char            (compiler built-in)
    character                  BpChar*         postgres.h
    cid                        CommandId       postgres.h
    date                       DateADT         utils/date.h
    smallint (int2)            int2 or int16   postgres.h
    int2vector                 int2vector*     postgres.h
    integer (int4)             int4 or int32   postgres.h
    real (float4)              float4*         postgres.h
    double precision (float8)  float8*         postgres.h
    interval                   Interval*       utils/timestamp.h
    lseg                       LSEG*           utils/geo_decls.h
    name                       Name            postgres.h
    oid                        Oid             postgres.h
    oidvector                  oidvector*      postgres.h
    path                       PATH*           utils/geo_decls.h
    point                      Point*          utils/geo_decls.h
    regproc                    regproc         postgres.h
    reltime                    RelativeTime    utils/nabstime.h
    text                       text*           postgres.h
    tid                        ItemPointer     storage/itemptr.h
    time                       TimeADT         utils/date.h
    time with time zone        TimeTzADT       utils/date.h
    timestamp                  Timestamp*      utils/timestamp.h
    tinterval                  TimeInterval    utils/nabstime.h
    varchar                    VarChar*        postgres.h
    xid                        TransactionId   postgres.h
Internally, PostgreSQL regards a base type as a "blob of memory". The user-defined functions that you define over a type in turn define the way that PostgreSQL can operate on it. That is, PostgreSQL will only store and retrieve the data from disk and use your user-defined functions to input, process, and output the data.

Base types can have one of three internal formats:

- pass by value, fixed-length
- pass by reference, fixed-length
- pass by reference, variable-length

By-value types can only be 1, 2 or 4 bytes in length (also 8 bytes, if sizeof(Datum) is 8 on your machine). You should be careful to define your types such that they will be the same size (in bytes) on all architectures. For example, the long type is dangerous because it is 4 bytes on some machines and 8 bytes on others, whereas the int type is 4 bytes on most Unix machines. A reasonable implementation of the int4 type on Unix machines might be:

    /* 4-byte integer, passed by value */
    typedef int int4;

PostgreSQL automatically figures things out so that the integer types really have the size they advertise.

On the other hand, fixed-length types of any size may be passed by reference. For example, here is a sample implementation of a PostgreSQL type:

    /* 16-byte structure, passed by reference */
    typedef struct
    {
        double  x, y;
    } Point;

Only pointers to such types can be used when passing them in and out of PostgreSQL functions. To return a value of such a type, allocate the right amount of memory with palloc(), fill in the allocated memory, and return a pointer to it. (Alternatively, you can return an input value of the same type by returning its pointer. Never modify the contents of a pass-by-reference input value, however.)

Finally, all variable-length types must also be passed by reference. All variable-length types must begin with a length field of exactly 4 bytes, and all data to be stored within that type must be located in the memory immediately following that length field.
The length field is the total length of the structure (i.e., it includes the size of the length field itself). We can define the text type as follows:

    typedef struct {
        int4 length;
        char data[1];
    } text;

Obviously, the data field declared here is not long enough to hold all possible strings. Since standard C before C99 offers no way to declare a variable-size structure, we rely on the knowledge that the C compiler won't range-check array subscripts. We just allocate the necessary amount of space and then access the array as if it were declared the right length. (If this isn't a familiar trick to you, you may wish to spend some time with an introductory C programming textbook before delving deeper into PostgreSQL server programming.)

When manipulating variable-length types, we must be careful to allocate the correct amount of memory and set the length field correctly. For example, if we wanted to store 40 bytes in a text structure, we might use a code fragment like this:

    #include "postgres.h"
    ...
    char buffer[40]; /* our source data */
    ...
    text *destination = (text *) palloc(VARHDRSZ + 40);
    destination->length = VARHDRSZ + 40;
    memcpy(destination->data, buffer, 40);
    ...

VARHDRSZ is the same as sizeof(int4), but it's considered good style to use the macro VARHDRSZ to refer to the size of the overhead for a variable-length type.

Now that we've gone over all of the possible structures for base types, we can show some examples of real functions.
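Before turning to real server functions, the three layouts can be tried out in a small standalone C sketch that does not need the server headers. All names here are hypothetical stand-ins: my_int4 for int4, make_point for makepoint-style code, and my_text, MY_VARHDRSZ and make_text for text and VARHDRSZ. malloc() replaces palloc() so that the code compiles on its own. The variable-length struct uses a C99 flexible array member in place of the data[1] trick.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* By value: fixed-width integer, 4 bytes on every platform (unlike long). */
typedef int32_t my_int4;

/* By reference, fixed length: the 16-byte Point from the text. */
typedef struct
{
    double      x,
                y;
} Point;

/* By reference, variable length: a 4-byte total length, then the data. */
typedef struct
{
    int32_t     length;         /* total size in bytes, header included */
    char        data[];         /* C99 flexible array member */
} my_text;

#define MY_VARHDRSZ ((int32_t) offsetof(my_text, data))

/* Return a freshly allocated Point; the inputs are only read. */
Point *
make_point(const Point *px, const Point *py)
{
    Point      *p = malloc(sizeof(Point));  /* malloc stands in for palloc */

    if (p == NULL)
        return NULL;
    p->x = px->x;
    p->y = py->y;
    return p;
}

/* Copy n bytes from src into a new variable-length value. */
my_text *
make_text(const char *src, int32_t n)
{
    my_text    *t = malloc((size_t) (MY_VARHDRSZ + n));

    if (t == NULL)
        return NULL;
    t->length = MY_VARHDRSZ + n;    /* the length counts the header too */
    memcpy(t->data, src, (size_t) n);
    return t;
}
```

make_text("hello", 5) produces a value whose length field is 9: 4 header bytes plus 5 data bytes. As with text, the data is not NUL-terminated, so it must always be handled through the length field.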
Version-0 Calling Conventions for C-Language Functions

We present the "old style" calling convention first; although this approach is now deprecated, it's easier to get a handle on initially. In the version-0 method, the arguments and result of the C function are just declared in normal C style, but being careful to use the C representation of each SQL data type as shown above.

Here are some examples:

    #include "postgres.h"
    #include <string.h>
    #include "utils/geo_decls.h"    /* for Point */

    /* By Value */

    int
    add_one(int arg)
    {
        return arg + 1;
    }

    /* By Reference, Fixed Length */

    float8 *
    add_one_float8(float8 *arg)
    {
        float8    *result = (float8 *) palloc(sizeof(float8));

        *result = *arg + 1.0;

        return result;
    }

    Point *
    makepoint(Point *pointx, Point *pointy)
    {
        Point     *new_point = (Point *) palloc(sizeof(Point));

        new_point->x = pointx->x;
        new_point->y = pointy->y;

        return new_point;
    }

    /* By Reference, Variable Length */

    text *
    copytext(text *t)
    {
        /*
         * VARSIZE is the total size of the struct in bytes.
         */
        text *new_t = (text *) palloc(VARSIZE(t));
        VARATT_SIZEP(new_t) = VARSIZE(t);
        /*
         * VARDATA is a pointer to the data region of the struct.
         */
        memcpy((void *) VARDATA(new_t), /* destination */
               (void *) VARDATA(t),     /* source */
               VARSIZE(t)-VARHDRSZ);    /* how many bytes */
        return new_t;
    }

    text *
    concat_text(text *arg1, text *arg2)
    {
        int32 new_text_size = VARSIZE(arg1) + VARSIZE(arg2) - VARHDRSZ;
        text *new_text = (text *) palloc(new_text_size);

        VARATT_SIZEP(new_text) = new_text_size;
        memcpy(VARDATA(new_text), VARDATA(arg1), VARSIZE(arg1)-VARHDRSZ);
        memcpy(VARDATA(new_text) + (VARSIZE(arg1)-VARHDRSZ),
               VARDATA(arg2), VARSIZE(arg2)-VARHDRSZ);
        return new_text;
    }

Supposing that the above code has been prepared in file funcs.c and compiled into a shared object, we could define the functions to PostgreSQL with commands like this:

    CREATE FUNCTION add_one(int4) RETURNS int4
         AS 'PGROOT/tutorial/funcs' LANGUAGE C
         WITH (isStrict);

    -- note overloading of SQL function name add_one()
    CREATE FUNCTION add_one(float8) RETURNS float8
         AS 'PGROOT/tutorial/funcs',
            'add_one_float8'
         LANGUAGE C WITH (isStrict);

    CREATE FUNCTION makepoint(point, point) RETURNS point
         AS 'PGROOT/tutorial/funcs' LANGUAGE C
         WITH (isStrict);

    CREATE FUNCTION copytext(text) RETURNS text
         AS 'PGROOT/tutorial/funcs' LANGUAGE C
         WITH (isStrict);

    CREATE FUNCTION concat_text(text, text) RETURNS text
         AS 'PGROOT/tutorial/funcs' LANGUAGE C
         WITH (isStrict);

Here PGROOT stands for the full path to the PostgreSQL source tree. (Better style would be to use just 'funcs' in the AS clause, after having added PGROOT/tutorial to the search path. In any case, we may omit the system-specific extension for a shared library, commonly .so or .sl.)

Notice that we have specified the functions as "strict", meaning that the system should automatically assume a NULL result if any input value is NULL. By doing this, we avoid having to check for NULL inputs in the function code. Without this, we'd have to check for NULLs explicitly, for example by checking for a null pointer for each pass-by-reference argument. (For pass-by-value arguments, we don't even have a way to check!)
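The following standalone C sketch makes the effect of strictness concrete by modeling what the caller does for a strict function. It inspects the NULL flag of every argument and skips the call entirely if any flag is set. The nullable_int type and the function names are hypothetical. The server's real function manager uses its own data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical argument/result representation: a value plus a NULL flag. */
typedef struct
{
    int         value;
    bool        isnull;
} nullable_int;

/* Version-0 style body: plain C, with no way to see or return a NULL. */
static int
add_one_v0(int arg)
{
    return arg + 1;
}

/*
 * What "strict" means to the caller: if any argument is NULL, the function
 * is never invoked and the result is NULL.
 */
nullable_int
call_strict_add_one(const nullable_int *args, size_t nargs)
{
    nullable_int result = {0, true};    /* value is ignored while isnull */
    size_t      i;

    assert(nargs >= 1);
    for (i = 0; i < nargs; i++)
        if (args[i].isnull)
            return result;              /* short-circuit: skip the call */

    result.value = add_one_v0(args[0].value);
    result.isnull = false;
    return result;
}
```

Because the check happens outside add_one_v0, the C function itself never has to deal with NULL. This is also why a version-0 function has no way to produce a NULL result on its own.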
Although the version-0 calling convention is simple to use, it is not very portable; on some architectures there are problems with passing smaller-than-int data types this way. Also, there is no simple way to return a NULL result, nor to cope with NULL arguments in any way other than making the function strict. The version-1 convention, presented next, overcomes these objections.

Version-1 Calling Conventions for C-Language Functions

The version-1 calling convention relies on macros to suppress most of the complexity of passing arguments and results. The C declaration of a version-1 function is always

    Datum funcname(PG_FUNCTION_ARGS)

In addition, the macro call

    PG_FUNCTION_INFO_V1(funcname);

must appear in the same source file (conventionally it's written just before the function itself). This macro call is not needed for internal-language functions, since PostgreSQL currently assumes all internal functions are version-1. However, it is required for dynamically-loaded functions.

In a version-1 function, each actual argument is fetched using a PG_GETARG_xxx() macro that corresponds to the argument's data type, and the result is returned using a PG_RETURN_xxx() macro for the return type.
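The following standalone sketch mimics the shape of this interface so that the mechanism is visible without the server headers. Arguments arrive in a call-info structure together with per-argument NULL flags. The function returns a single Datum-like value and signals a NULL result through a flag. Everything here is hypothetical and much simpler than the real fmgr.h definitions: my_datum, my_fcinfo, the MY_ macros and add_or_default. It only mirrors how PG_GETARG_xxx(), PG_ARGISNULL() and PG_RETURN_NULL() are used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins; NOT the real fmgr.h types. */
typedef intptr_t my_datum;

typedef struct
{
    int         nargs;
    my_datum    arg[4];         /* argument values */
    bool        argnull[4];     /* argument NULL flags */
    bool        isnull;         /* set by the callee to return NULL */
} my_fcinfo;

#define MY_FUNCTION_ARGS    my_fcinfo *fcinfo
#define MY_ARGISNULL(n)     (fcinfo->argnull[(n)])
#define MY_GETARG_INT32(n)  ((int32_t) fcinfo->arg[(n)])
#define MY_RETURN_INT32(x)  return (my_datum) (x)
#define MY_RETURN_NULL()    do { fcinfo->isnull = true; return (my_datum) 0; } while (0)

/*
 * Non-strict example: a NULL first argument gives a NULL result;
 * a NULL second argument is treated as 0.
 */
my_datum
add_or_default(MY_FUNCTION_ARGS)
{
    int32_t     a;
    int32_t     b;

    if (MY_ARGISNULL(0))
        MY_RETURN_NULL();
    a = MY_GETARG_INT32(0);     /* fetch only after the NULL test */
    b = MY_ARGISNULL(1) ? 0 : MY_GETARG_INT32(1);
    MY_RETURN_INT32(a + b);
}
```

Because every function has the same C signature, the caller never needs to know the argument types to make the call. That uniformity is what lets NULL flags, sets and other extras travel alongside the arguments.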
Here we show the same functions as in the version-0 examples above, coded in version-1 style:

    #include "postgres.h"
    #include <string.h>
    #include "fmgr.h"
    #include "utils/geo_decls.h"    /* for Point and PG_GETARG_POINT_P */

    /* By Value */

    PG_FUNCTION_INFO_V1(add_one);

    Datum
    add_one(PG_FUNCTION_ARGS)
    {
        int32   arg = PG_GETARG_INT32(0);

        PG_RETURN_INT32(arg + 1);
    }

    /* By Reference, Fixed Length */

    PG_FUNCTION_INFO_V1(add_one_float8);

    Datum
    add_one_float8(PG_FUNCTION_ARGS)
    {
        /* The macros for FLOAT8 hide its pass-by-reference nature */
        float8   arg = PG_GETARG_FLOAT8(0);

        PG_RETURN_FLOAT8(arg + 1.0);
    }

    PG_FUNCTION_INFO_V1(makepoint);

    Datum
    makepoint(PG_FUNCTION_ARGS)
    {
        /* Here, the pass-by-reference nature of Point is not hidden */
        Point     *pointx = PG_GETARG_POINT_P(0);
        Point     *pointy = PG_GETARG_POINT_P(1);
        Point     *new_point = (Point *) palloc(sizeof(Point));

        new_point->x = pointx->x;
        new_point->y = pointy->y;

        PG_RETURN_POINT_P(new_point);
    }

    /* By Reference, Variable Length */

    PG_FUNCTION_INFO_V1(copytext);

    Datum
    copytext(PG_FUNCTION_ARGS)
    {
        text     *t = PG_GETARG_TEXT_P(0);
        /*
         * VARSIZE is the total size of the struct in bytes.
         */
        text     *new_t = (text *) palloc(VARSIZE(t));
        VARATT_SIZEP(new_t) = VARSIZE(t);
        /*
         * VARDATA is a pointer to the data region of the struct.
         */
        memcpy((void *) VARDATA(new_t), /* destination */
               (void *) VARDATA(t),     /* source */
               VARSIZE(t)-VARHDRSZ);    /* how many bytes */
        PG_RETURN_TEXT_P(new_t);
    }

    PG_FUNCTION_INFO_V1(concat_text);

    Datum
    concat_text(PG_FUNCTION_ARGS)
    {
        text  *arg1 = PG_GETARG_TEXT_P(0);
        text  *arg2 = PG_GETARG_TEXT_P(1);
        int32 new_text_size = VARSIZE(arg1) + VARSIZE(arg2) - VARHDRSZ;
        text *new_text = (text *) palloc(new_text_size);

        VARATT_SIZEP(new_text) = new_text_size;
        memcpy(VARDATA(new_text), VARDATA(arg1), VARSIZE(arg1)-VARHDRSZ);
        memcpy(VARDATA(new_text) + (VARSIZE(arg1)-VARHDRSZ),
               VARDATA(arg2), VARSIZE(arg2)-VARHDRSZ);
        PG_RETURN_TEXT_P(new_text);
    }

The CREATE FUNCTION commands are the same as for the version-0 equivalents.

At first glance, the version-1 coding conventions may appear to be just pointless obscurantism.
However, they do offer a number of improvements, because the macros can hide unnecessary detail. An example is that in coding add_one_float8, we no longer need to be aware that float8 is a pass-by-reference type. Another example is that the GETARG macros for variable-length types hide the need to deal with fetching "toasted" (compressed or out-of-line) values. The old-style copytext and concat_text functions shown above are actually wrong in the presence of toasted values, because they don't call pg_detoast_datum() on their inputs. (The handler for old-style dynamically-loaded functions currently takes care of this detail, but it does so less efficiently than is possible for a version-1 function.)

One big improvement in version-1 functions is better handling of NULL inputs and results. The macro PG_ARGISNULL(n) allows a function to test whether each input is NULL (of course, doing this is only necessary in functions not declared strict). As with the PG_GETARG_xxx() macros, the input arguments are counted beginning at zero. Note that one should refrain from executing PG_GETARG_xxx() until one has verified that the argument isn't NULL. To return a NULL result, execute PG_RETURN_NULL(); this works in both strict and non-strict functions.

The version-1 function call conventions make it possible to return set results and implement trigger functions and procedural-language call handlers. Version-1 code is also more portable than version-0, because it does not break ANSI C restrictions on function call protocol. For more details see src/backend/utils/fmgr/README in the source distribution.

Composite Types in C-Language Functions

Composite types do not have a fixed layout like C structures. Instances of a composite type may contain null fields. In addition, composite types that are part of an inheritance hierarchy may have different fields than other members of the same inheritance hierarchy.
Therefore, PostgreSQL provides a procedural interface for accessing fields of composite types from C. As PostgreSQL processes a set of rows, each row will be passed into your function as an opaque structure of type TupleTableSlot. Suppose we want to write a function to answer the query

    SELECT name, c_overpaid(emp, 1500) AS overpaid
        FROM emp
        WHERE name = 'Bill' OR name = 'Sam';

In the query above, we can define c_overpaid as:

    #include "postgres.h"
    #include "executor/executor.h"  /* for GetAttributeByName() */

    bool
    c_overpaid(TupleTableSlot *t, /* the current row of EMP */
               int32 limit)
    {
        bool isnull;
        int32 salary;

        salary = DatumGetInt32(GetAttributeByName(t, "salary", &isnull));
        if (isnull)
            return (false);
        return salary > limit;
    }

    /* In version-1 coding, the above would look like this: */

    PG_FUNCTION_INFO_V1(c_overpaid);

    Datum
    c_overpaid(PG_FUNCTION_ARGS)
    {
        TupleTableSlot  *t = (TupleTableSlot *) PG_GETARG_POINTER(0);
        int32            limit = PG_GETARG_INT32(1);
        bool isnull;
        int32 salary;

        salary = DatumGetInt32(GetAttributeByName(t, "salary", &isnull));
        if (isnull)
            PG_RETURN_BOOL(false);
        /* Alternatively, we might prefer to do PG_RETURN_NULL() for null salary */

        PG_RETURN_BOOL(salary > limit);
    }

GetAttributeByName is the PostgreSQL system function that returns attributes out of the current row. It has three arguments: the argument of type TupleTableSlot* passed into the function, the name of the desired attribute, and a return parameter that tells whether the attribute is null. GetAttributeByName returns a Datum value that you can convert to the proper data type by using the appropriate DatumGetXXX() macro.

The following command lets PostgreSQL know about the c_overpaid function:

    CREATE FUNCTION c_overpaid(emp, int4)
    RETURNS bool
    AS 'PGROOT/tutorial/funcs'
    LANGUAGE C;

While there are ways to construct new rows or modify existing rows from within a C function, these are far too complex to discuss in this manual. Consult the backend source code for examples.
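The lookup that GetAttributeByName performs can be pictured with a standalone C sketch in which a row is just a set of parallel arrays. my_row, my_get_attribute_by_name and my_overpaid are hypothetical. A real TupleTableSlot is opaque and must only be accessed through the server's interface, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical row: parallel arrays of attribute names, values, NULL flags. */
typedef struct
{
    int             natts;
    const char    **names;
    const int32_t  *values;
    const bool     *nulls;
} my_row;

/*
 * Look up an attribute by name, reporting NULL-ness through *isnull.
 * For simplicity an unknown name is reported as NULL here instead of
 * raising an error.
 */
static int32_t
my_get_attribute_by_name(const my_row *row, const char *attname, bool *isnull)
{
    int         i;

    for (i = 0; i < row->natts; i++)
        if (strcmp(row->names[i], attname) == 0)
        {
            *isnull = row->nulls[i];
            return row->values[i];
        }
    *isnull = true;
    return 0;
}

/* Same decision as c_overpaid: a NULL salary counts as "not overpaid". */
bool
my_overpaid(const my_row *emp, int32_t limit)
{
    bool        isnull;
    int32_t     salary = my_get_attribute_by_name(emp, "salary", &isnull);

    if (isnull)
        return false;
    return salary > limit;
}
```

Because fields are located by name at run time rather than by a fixed struct offset, the same function works for rows whose layout differs. That is the point of the procedural interface for composite types.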
Writing Code

We now turn to the more difficult task of writing programming language functions. Be warned: this section of the manual will not make you a programmer. You must have a good understanding of C (including the use of pointers and the malloc memory manager) before trying to write C functions for use with PostgreSQL. While it may be possible to load functions written in languages other than C into PostgreSQL, this is often difficult (when it is possible at all) because other languages, such as FORTRAN and Pascal, often do not follow the same calling convention as C. That is, other languages do not pass argument and return values between functions in the same way. For this reason, we will assume that your programming language functions are written in C.

The basic rules for building C functions are as follows:

- Use pg_config --includedir-server to find out where the PostgreSQL server header files are installed on your system (or the system that your users will be running on). This option is new with PostgreSQL 7.2. For PostgreSQL 7.1 you should use the older include-directory option instead. (pg_config will exit with a non-zero status if it encounters an unknown option.) For releases prior to 7.1 you will have to guess, but since that was before the current calling conventions were introduced, it is unlikely that you want to support those releases.

- When allocating memory, use the PostgreSQL routines palloc and pfree instead of the corresponding C library routines malloc and free. The memory allocated by palloc will be freed automatically at the end of each transaction, preventing memory leaks.

- Always zero the bytes of your structures using memset or bzero. Several routines (such as the hash access method, hash join and the sort algorithm) compute functions of the raw bits contained in your structure. Even if you initialize all fields of your structure, there may be several bytes of alignment padding (holes in the structure) that may contain garbage values.
- Most of the internal PostgreSQL types are declared in postgres.h, while the function manager interfaces (PG_FUNCTION_ARGS, etc.) are in fmgr.h, so you will need to include at least these two files. For portability reasons it's best to include postgres.h first, before any other system or user header files. Including postgres.h will also include elog.h and palloc.h for you.

- Symbol names defined within object files must not conflict with each other or with symbols defined in the PostgreSQL server executable. You will have to rename your functions or variables if you get error messages to this effect.

- Compiling and linking your object code so that it can be dynamically loaded into PostgreSQL always requires special flags. See the section on compiling and linking dynamically-loaded functions for a detailed explanation of how to do it for your particular operating system.
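The rule about zeroing structures can be illustrated with a standalone C sketch. Two values with equal fields compare equal byte for byte, and therefore hash equally, only if their padding bytes were cleared too. padded_item, raw_bytes_hash and padded_item_init are hypothetical names. The djb2-style hash merely stands in for the server's own code that works on raw bits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* On common ABIs there are padding bytes between 'tag' and 'value'. */
typedef struct
{
    char        tag;
    double      value;
} padded_item;

/* Hypothetical byte-wise hash, like code that hashes a value's raw bits. */
unsigned long
raw_bytes_hash(const void *p, size_t len)
{
    const unsigned char *b = p;
    unsigned long h = 5381;
    size_t      i;

    for (i = 0; i < len; i++)
        h = h * 33 + b[i];      /* djb2-style mixing */
    return h;
}

/*
 * Zero every byte (padding included) before setting the fields, so equal
 * field values give equal raw bytes. Strictly, C leaves padding unspecified
 * after member stores, but in practice this is what keeps the bytes stable.
 */
void
padded_item_init(padded_item *item, char tag, double value)
{
    memset(item, 0, sizeof(*item));
    item->tag = tag;
    item->value = value;
}
```

Without the memset call, two items built from different stack garbage could carry different padding bytes. They would then land in different hash buckets even though every field is equal.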
Function Overloading

More than one function may be defined with the same SQL name, so long as the arguments they take are different. In other words, function names can be overloaded. When a query is executed, the server determines which function to call from the number and data types of the supplied arguments. Overloading can also be used to simulate functions with a variable number of arguments, up to a finite maximum number.

A function may also have the same name as an attribute. If there is an ambiguity between a function on a complex type and an attribute of that complex type, the attribute will always be used.

When creating a family of overloaded functions, one should be careful not to create ambiguities. For instance, given the functions

    CREATE FUNCTION test(int, real) RETURNS ...
    CREATE FUNCTION test(smallint, double precision) RETURNS ...

it is not immediately clear which function would be called with some trivial input like test(1, 1.5). The currently implemented resolution rules are described in the User's Guide, but it is unwise to design a system that subtly relies on this behavior.

When overloading C language functions, there is an additional constraint: the C name of each function in the family of overloaded functions must be different from the C names of all other functions, either internal or dynamically loaded. If this rule is violated, the behavior is not portable: you might get a run-time linker error, or one of the functions will get called (usually the internal one). The alternative form of the AS clause for the SQL CREATE FUNCTION command decouples the SQL function name from the function name in the C source code. For example:

    CREATE FUNCTION test(int) RETURNS int
        AS 'filename', 'test_1arg'
        LANGUAGE C;
    CREATE FUNCTION test(int, int) RETURNS int
        AS 'filename', 'test_2arg'
        LANGUAGE C;

The names of the C functions here reflect one of many possible conventions.
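To see why the SQL name may be shared while the C names may not, here is a toy model of the lookup written as self-contained C rather than against the server. OverloadEntry, overload_catalog, and resolve_overload are hypothetical, and the bodies given to test_1arg and test_2arg are invented for illustration. The model matches on the number of arguments only; the real resolver also considers argument types, under the rules described in the User's Guide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Uniform calling convention for the model: arguments in an array. */
typedef int (*ArgvFunc)(const int *argv);

typedef struct OverloadEntry
{
    const char *sql_name;   /* shared by every overload */
    int         nargs;      /* what tells the overloads apart here */
    const char *c_name;     /* C symbol name: must be unique */
    ArgvFunc    fn;
} OverloadEntry;

/* Hypothetical bodies for the two C functions named in the text. */
static int test_1arg(const int *argv) { return argv[0] * 10; }
static int test_2arg(const int *argv) { return argv[0] + argv[1]; }

static const OverloadEntry overload_catalog[] = {
    {"test", 1, "test_1arg", test_1arg},
    {"test", 2, "test_2arg", test_2arg},
};

/* Return the entry matching the SQL name and argument count, or NULL. */
static const OverloadEntry *
resolve_overload(const char *sql_name, int nargs)
{
    size_t      n = sizeof(overload_catalog) / sizeof(overload_catalog[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(overload_catalog[i].sql_name, sql_name) == 0 &&
            overload_catalog[i].nargs == nargs)
            return &overload_catalog[i];
    }
    return NULL;
}
```

Each entry's c_name plays the role of the second string in the alternative AS clause syntax: it is the name the dynamic loader looks up, so it has to be unique even though every entry shares the SQL name test.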
Prior to PostgreSQL 7.0, the alternative AS clause syntax did not exist. There is a trick to get around the problem: define a set of C functions with different names, and then define a set of identically-named SQL function wrappers that take the appropriate argument types and call the matching C function.

Procedural Language Handlers

All calls to functions that are not written against the current version 1 interface for compiled languages (this includes functions in user-defined procedural languages, functions written in SQL, and functions using the version 0 compiled language interface) go through a call handler function for the specific language. It is the responsibility of the call handler to execute the function in a meaningful way, such as by interpreting the supplied source text. This section describes how a language call handler can be written. This is not a common task; in fact, it has been done only a handful of times in the history of PostgreSQL. But the topic naturally belongs in this chapter, and the material might give some insight into the extensible nature of the PostgreSQL system.

The call handler for a procedural language is a normal function, which must be written in a compiled language such as C and registered with PostgreSQL as taking no arguments and returning the opaque type, a placeholder for unspecified or undefined types. This prevents the call handler from being called directly as a function from queries. (However, arguments may be supplied in the actual call to the handler when a function in the language offered by the handler is to be executed.) In PostgreSQL 7.1 and later, call handlers must adhere to the version 1 function manager interface, not the old-style interface.
The call handler is called in the same way as any other function: it receives a pointer to a FunctionCallInfoData struct containing argument values and information about the called function, and it is expected to return a Datum result (and possibly set the isnull field of the FunctionCallInfoData struct, if it wishes to return an SQL NULL result). The difference between a call handler and an ordinary callee function is that the flinfo->fn_oid field of the FunctionCallInfoData struct will contain the OID of the actual function to be called, not of the call handler itself. The call handler must use this field to determine which function to execute. Also, the passed argument list has been set up according to the declaration of the target function, not of the call handler.

It's up to the call handler to fetch the pg_proc entry and to analyze the argument and return types of the called procedure. The AS clause from the CREATE FUNCTION of the procedure will be found in the prosrc attribute of the pg_proc table entry. This may be the source text in the procedural language itself (as for PL/Tcl), a path name to a file, or anything else that tells the call handler what to do in detail.

Often, the same function is called many times per SQL statement. A call handler can avoid repeated lookups of information about the called function by using the flinfo->fn_extra field. This will initially be NULL, but can be set by the call handler to point at information about the PL function. On subsequent calls, if flinfo->fn_extra is already non-NULL then it can be used and the information lookup step skipped. The call handler must ensure that flinfo->fn_extra points at memory that will live at least until the end of the current query, since an FmgrInfo data structure could be kept that long. One way to do this is to allocate the extra data in the memory context specified by flinfo->fn_mcxt; such data will normally have the same lifespan as the FmgrInfo itself.
But the handler could also choose to use a longer-lived context so that it can cache function definition information across queries.

When a PL function is invoked as a trigger, no explicit arguments are passed, but the FunctionCallInfoData's context field points at a TriggerData node, rather than being NULL as it is in a plain function call. A language handler should provide mechanisms for PL functions to get at the trigger information.

This is a template for a PL handler written in C:

    #include "postgres.h"
    #include "executor/spi.h"
    #include "commands/trigger.h"
    #include "utils/elog.h"
    #include "fmgr.h"
    #include "access/heapam.h"
    #include "utils/syscache.h"
    #include "catalog/pg_proc.h"
    #include "catalog/pg_type.h"

    PG_FUNCTION_INFO_V1(plsample_call_handler);

    Datum
    plsample_call_handler(PG_FUNCTION_ARGS)
    {
        Datum       retval;

        if (CALLED_AS_TRIGGER(fcinfo))
        {
            /*
             * Called as a trigger procedure
             */
            TriggerData *trigdata = (TriggerData *) fcinfo->context;

            retval = ...
        }
        else
        {
            /*
             * Called as a function
             */

            retval = ...
        }

        return retval;
    }

Only a few thousand lines of code have to be added instead of the dots to complete the call handler. See for information on how to compile it into a loadable module.

The following commands then register the sample procedural language:

    CREATE FUNCTION plsample_call_handler () RETURNS opaque
        AS '/usr/local/pgsql/lib/plsample'
        LANGUAGE C;
    CREATE LANGUAGE plsample
        HANDLER plsample_call_handler;
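The control flow described in this section (fn_oid naming the target function rather than the handler, fn_extra caching the looked-up information, and a non-NULL context signalling a trigger call) can be modeled without the server headers. The following is a self-contained sketch under stated assumptions: every Mock* type, lookup_pl_function, and mock_call_handler are hypothetical stand-ins, not PostgreSQL's real FmgrInfo or FunctionCallInfoData. malloc is used only because there is no server here; a real handler would allocate the cached data in flinfo->fn_mcxt (or a longer-lived context), as described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int Oid;

/* Hypothetical node tag; the server has its own NodeTag enum. */
typedef enum MockNodeTag { T_MockTriggerData = 1 } MockNodeTag;

typedef struct MockNode { MockNodeTag type; } MockNode;

typedef struct MockTriggerData
{
    MockNodeTag type;       /* first field, like a real Node header */
    const char *event;      /* e.g., "INSERT" */
} MockTriggerData;

typedef struct MockFmgrInfo
{
    Oid         fn_oid;     /* OID of the PL function, not the handler */
    void       *fn_extra;   /* NULL until the handler caches something */
} MockFmgrInfo;

typedef struct MockCallInfo
{
    MockFmgrInfo *flinfo;
    MockNode   *context;    /* NULL for a plain function call */
} MockCallInfo;

/* Hypothetical cached description of one PL function. */
typedef struct MockPLFunction
{
    Oid         oid;
    const char *prosrc;     /* would come from pg_proc.prosrc */
} MockPLFunction;

static int  catalog_lookups = 0;    /* counts the "expensive" lookups */

/* Stands in for fetching and analyzing the pg_proc entry. */
static MockPLFunction *
lookup_pl_function(Oid oid)
{
    MockPLFunction *fn = malloc(sizeof(*fn));

    if (fn == NULL)
        abort();
    catalog_lookups++;
    fn->oid = oid;
    fn->prosrc = "return 42";       /* pretend source text */
    return fn;
}

/* Modeled on the check made by CALLED_AS_TRIGGER(fcinfo). */
static int
mock_called_as_trigger(const MockCallInfo *fcinfo)
{
    return fcinfo->context != NULL &&
           fcinfo->context->type == T_MockTriggerData;
}

static const char *
mock_call_handler(MockCallInfo *fcinfo)
{
    MockFmgrInfo *flinfo = fcinfo->flinfo;
    MockPLFunction *fn = flinfo->fn_extra;

    if (fn == NULL)
    {
        /* First call through this FmgrInfo: look up, then cache. */
        fn = lookup_pl_function(flinfo->fn_oid);
        flinfo->fn_extra = fn;
    }

    if (mock_called_as_trigger(fcinfo))
    {
        MockTriggerData *trigdata = (MockTriggerData *) fcinfo->context;

        return trigdata->event;     /* trigger path */
    }
    return fn->prosrc;              /* plain function path */
}
```

However many times the handler is entered through the same FmgrInfo, the lookup runs once; the trigger check only decides which branch uses the cached information.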