mirror of
https://git.postgresql.org/git/postgresql.git
synced 2024-09-10 06:39:45 +02:00
294 lines
9.4 KiB
Plaintext
<!-- doc/src/sgml/xtypes.sgml -->

 <sect1 id="xtypes">
  <title>User-defined Types</title>

  <indexterm zone="xtypes">
   <primary>data type</primary>
   <secondary>user-defined</secondary>
  </indexterm>

  <para>
   As described in <xref linkend="extend-type-system">,
   <productname>PostgreSQL</productname> can be extended to support new
   data types.  This section describes how to define new base types,
   which are data types defined below the level of the <acronym>SQL</>
   language.  Creating a new base type requires implementing functions
   to operate on the type in a low-level language, usually C.
  </para>

  <para>
   The examples in this section can be found in
   <filename>complex.sql</filename> and <filename>complex.c</filename>
   in the <filename>src/tutorial</> directory of the source distribution.
   See the <filename>README</> file in that directory for instructions
   about running the examples.
  </para>

  <para>
   <indexterm>
    <primary>input function</primary>
   </indexterm>
   <indexterm>
    <primary>output function</primary>
   </indexterm>
   A user-defined type must always have input and output functions.
   These functions determine how the type appears in strings (for input
   by the user and output to the user) and how the type is organized in
   memory.  The input function takes a null-terminated character string
   as its argument and returns the internal (in-memory) representation
   of the type.  The output function takes the internal representation
   of the type as its argument and returns a null-terminated character
   string.  If we want to do anything more with the type than merely
   store it, we must provide additional functions to implement whatever
   operations we'd like to have for the type.
  </para>

  <para>
   Suppose we want to define a type <type>complex</> that represents
   complex numbers.  A natural way to represent a complex number in
   memory would be the following C structure:

<programlisting>
typedef struct Complex {
    double      x;
    double      y;
} Complex;
</programlisting>

   We will need to make this a pass-by-reference type, since it's too
   large to fit into a single <type>Datum</> value.
  </para>

  <para>
   As the external string representation of the type, we choose a
   string of the form <literal>(x,y)</literal>.
  </para>

  <para>
   The input and output functions are usually not hard to write,
   especially the output function.  But when defining the external
   string representation of the type, remember that you must eventually
   write a complete and robust parser for that representation as your
   input function.  For instance:

<programlisting><![CDATA[
PG_FUNCTION_INFO_V1(complex_in);

Datum
complex_in(PG_FUNCTION_ARGS)
{
    char       *str = PG_GETARG_CSTRING(0);
    double      x,
                y;
    Complex    *result;

    if (sscanf(str, " ( %lf , %lf )", &x, &y) != 2)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                 errmsg("invalid input syntax for complex: \"%s\"",
                        str)));

    result = (Complex *) palloc(sizeof(Complex));
    result->x = x;
    result->y = y;
    PG_RETURN_POINTER(result);
}
]]>
</programlisting>

   The output function can simply be:

<programlisting><![CDATA[
PG_FUNCTION_INFO_V1(complex_out);

Datum
complex_out(PG_FUNCTION_ARGS)
{
    Complex    *complex = (Complex *) PG_GETARG_POINTER(0);
    char       *result;

    result = psprintf("(%g,%g)", complex->x, complex->y);
    PG_RETURN_CSTRING(result);
}
]]>
</programlisting>
  </para>
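
  <para>
   The <function>sscanf</function> call in the input function above
   returns 2 as soon as both numbers have been converted, so it would
   also accept strings such as <literal>(1,2</literal> or
   <literal>(1,2)junk</literal>.  The following standalone sketch shows
   one way a stricter parser could look.  It is not taken from
   <filename>complex.c</filename>: the helper names
   <function>skip_spaces</function> and
   <function>parse_complex_strict</function> are hypothetical, and a real
   input function would report failures with <function>ereport</function>
   rather than a return code.
  </para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

typedef struct Complex {
    double      x;
    double      y;
} Complex;

/* Skip ASCII whitespace and return the first non-space position. */
static const char *
skip_spaces(const char *p)
{
    while (isspace((unsigned char) *p))
        p++;
    return p;
}

/*
 * Hypothetical helper (not part of complex.c): parse "(x,y)" strictly.
 * Returns 1 on success and 0 on any syntax error, including a missing
 * closing parenthesis or trailing characters after it.
 */
static int
parse_complex_strict(const char *str, Complex *out)
{
    const char *p;
    char       *end;
    double      x,
                y;

    assert(str != NULL && out != NULL);

    p = skip_spaces(str);
    if (*p++ != '(')
        return 0;

    errno = 0;
    x = strtod(p, &end);        /* strtod skips leading whitespace itself */
    if (end == p || errno == ERANGE)
        return 0;

    p = skip_spaces(end);
    if (*p++ != ',')
        return 0;

    y = strtod(p, &end);
    if (end == p || errno == ERANGE)
        return 0;

    p = skip_spaces(end);
    if (*p++ != ')')
        return 0;

    /* Only whitespace may follow the closing parenthesis. */
    if (*skip_spaces(p) != '\0')
        return 0;

    out->x = x;
    out->y = y;
    return 1;
}
```
]]></programlisting>

  <para>
   Whether trailing whitespace or out-of-range values should be accepted
   is a design decision for the type; the sketch simply rejects anything
   it does not fully understand.
  </para>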

  <para>
   You should be careful to make the input and output functions inverses of
   each other.  If you do not, you will have severe problems when you
   need to dump your data into a file and then read it back in.  This
   is a particularly common problem when floating-point numbers are
   involved.
  </para>
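
  <para>
   To illustrate the floating-point pitfall: <literal>%g</literal> prints
   only six significant digits by default, so a value written that way
   does not always read back as the same <type>double</type>.  The
   standalone sketch below checks whether a given
   <function>printf</function> format round-trips a value.  The helper
   <function>round_trips</function> is hypothetical and the sketch
   assumes IEEE 754 doubles, for which 17 significant digits are
   sufficient.
  </para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: format a double with the given printf format,
 * parse the text back with strtod, and report whether the original
 * value survived unchanged (1) or not (0).
 */
static int
round_trips(double value, const char *fmt)
{
    char        buf[64];

    assert(fmt != NULL);
    snprintf(buf, sizeof(buf), fmt, value);
    return strtod(buf, NULL) == value;
}
```
]]></programlisting>

  <para>
   Under these assumptions, <literal>round_trips(1.0 / 3.0, "%g")</literal>
   yields 0, while <literal>round_trips(1.0 / 3.0, "%.17g")</literal>
   yields 1.
  </para>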

  <para>
   Optionally, a user-defined type can provide binary input and output
   routines.  Binary I/O is normally faster but less portable than textual
   I/O.  As with textual I/O, it is up to you to define exactly what the
   external binary representation is.  Most of the built-in data types
   try to provide a machine-independent binary representation.  For
   <type>complex</type>, we will piggy-back on the binary I/O converters
   for type <type>float8</>:

<programlisting><![CDATA[
PG_FUNCTION_INFO_V1(complex_recv);

Datum
complex_recv(PG_FUNCTION_ARGS)
{
    StringInfo  buf = (StringInfo) PG_GETARG_POINTER(0);
    Complex    *result;

    result = (Complex *) palloc(sizeof(Complex));
    result->x = pq_getmsgfloat8(buf);
    result->y = pq_getmsgfloat8(buf);
    PG_RETURN_POINTER(result);
}

PG_FUNCTION_INFO_V1(complex_send);

Datum
complex_send(PG_FUNCTION_ARGS)
{
    Complex    *complex = (Complex *) PG_GETARG_POINTER(0);
    StringInfoData buf;

    pq_begintypsend(&buf);
    pq_sendfloat8(&buf, complex->x);
    pq_sendfloat8(&buf, complex->y);
    PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
}
]]>
</programlisting>
  </para>
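
  <para>
   A machine-independent binary representation usually means fixing the
   byte order of the external form.  The standalone sketch below encodes
   a <type>double</type> as eight bytes, most significant byte first,
   and decodes it again.  It only illustrates the idea and is not the
   implementation behind <function>pq_sendfloat8</function>; the helper
   names <function>put_float8_be</function> and
   <function>get_float8_be</function> are hypothetical, and the sketch
   assumes IEEE 754 doubles whose byte order matches that of 64-bit
   integers on the same machine.
  </para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write a double as 8 bytes, most significant first. */
static void
put_float8_be(unsigned char out[8], double value)
{
    uint64_t    bits;
    int         i;

    assert(sizeof(bits) == sizeof(value));
    memcpy(&bits, &value, sizeof(bits));    /* copy the bit pattern */
    for (i = 0; i < 8; i++)
        out[i] = (unsigned char) (bits >> (56 - 8 * i));
}

/* Hypothetical helper: inverse of put_float8_be. */
static double
get_float8_be(const unsigned char in[8])
{
    uint64_t    bits = 0;
    double      value;
    int         i;

    for (i = 0; i < 8; i++)
        bits = (bits << 8) | in[i];
    memcpy(&value, &bits, sizeof(value));
    return value;
}
```
]]></programlisting>

  <para>
   Because the shifts operate on the integer value rather than on memory,
   the resulting byte sequence is the same on little-endian and
   big-endian hosts.
  </para>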

  <para>
   Once we have written the I/O functions and compiled them into a shared
   library, we can define the <type>complex</type> type in SQL.
   First we declare it as a shell type:

<programlisting>
CREATE TYPE complex;
</programlisting>

   This serves as a placeholder that allows us to reference the type while
   defining its I/O functions.  Now we can define the I/O functions:

<programlisting>
CREATE FUNCTION complex_in(cstring)
    RETURNS complex
    AS '<replaceable>filename</replaceable>'
    LANGUAGE C IMMUTABLE STRICT;

CREATE FUNCTION complex_out(complex)
    RETURNS cstring
    AS '<replaceable>filename</replaceable>'
    LANGUAGE C IMMUTABLE STRICT;

CREATE FUNCTION complex_recv(internal)
   RETURNS complex
   AS '<replaceable>filename</replaceable>'
   LANGUAGE C IMMUTABLE STRICT;

CREATE FUNCTION complex_send(complex)
   RETURNS bytea
   AS '<replaceable>filename</replaceable>'
   LANGUAGE C IMMUTABLE STRICT;
</programlisting>
  </para>

  <para>
   Finally, we can provide the full definition of the data type:
<programlisting>
CREATE TYPE complex (
   internallength = 16,
   input = complex_in,
   output = complex_out,
   receive = complex_recv,
   send = complex_send,
   alignment = double
);
</programlisting>
  </para>
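
  <para>
   The value <literal>internallength = 16</literal> has to agree with the
   size of the C structure.  As a sketch of how that assumption could be
   checked at compile time (this check is not part of the tutorial
   sources and assumes a C11 compiler), one might write:
  </para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

typedef struct Complex {
    double      x;
    double      y;
} Complex;

/* internallength = 16 is only correct if the struct occupies 16 bytes. */
static_assert(sizeof(Complex) == 16,
              "Complex must match internallength = 16");

/* The second field must follow the first without padding. */
static_assert(offsetof(Complex, y) == sizeof(double),
              "unexpected padding in Complex");
```
]]></programlisting>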

  <para>
   <indexterm>
    <primary>array</primary>
    <secondary>of user-defined type</secondary>
   </indexterm>
   When you define a new base type,
   <productname>PostgreSQL</productname> automatically provides support
   for arrays of that type.  The array type typically
   has the same name as the base type with the underscore character
   (<literal>_</>) prepended.
  </para>

  <para>
   Once the data type exists, we can declare additional functions to
   provide useful operations on the data type.  Operators can then be
   defined atop the functions, and if needed, operator classes can be
   created to support indexing of the data type.  These additional
   layers are discussed in the following sections.
  </para>

  <para>
   <indexterm>
    <primary>TOAST</primary>
    <secondary>and user-defined types</secondary>
   </indexterm>
   If the values of your data type vary in size (in internal form), you should
   make the data type <acronym>TOAST</>-able (see <xref
   linkend="storage-toast">).  You should do this even if the data are always
   too small to be compressed or stored externally, because
   <acronym>TOAST</> can save space on small data too, by reducing header
   overhead.
  </para>

  <para>
   To do this, the internal representation must follow the standard layout for
   variable-length data: the first four bytes must be a <type>char[4]</type>
   field which is never accessed directly (customarily named
   <structfield>vl_len_</>).  You
   must use <function>SET_VARSIZE()</function> to store the size of the datum
   in this field and <function>VARSIZE()</function> to retrieve it.  The C
   functions operating on the data type must always be careful to unpack any
   toasted values they are handed, by using <function>PG_DETOAST_DATUM</>.
   (This detail is customarily hidden by defining type-specific
   <function>GETARG_DATATYPE_P</function> macros.)  Then, when running the
   <command>CREATE TYPE</command> command, specify the internal length as
   <literal>variable</> and select the appropriate storage option.
  </para>
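
  <para>
   The layout described above can be pictured with the following
   standalone sketch.  Everything in it is illustrative: the type
   <type>PointList</type> and the helpers
   <function>toy_set_size</function>, <function>toy_get_size</function>
   and <function>make_point_list</function> are hypothetical.  In
   particular, the toy helpers merely copy a plain integer into the
   header, whereas real code must use
   <function>SET_VARSIZE()</function> and <function>VARSIZE()</function>,
   which use their own encoding of the header bytes.
  </para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy variable-length value: a 4-byte length header, then the payload. */
typedef struct PointList {
    char        vl_len_[4];     /* total size in bytes; never read directly */
    int32_t     npoints;        /* number of entries in coords */
    double      coords[];       /* flexible array member */
} PointList;

/* Toy stand-in for SET_VARSIZE(); the real macro encodes the header differently. */
static void
toy_set_size(PointList *p, uint32_t size)
{
    memcpy(p->vl_len_, &size, sizeof(size));
}

/* Toy stand-in for VARSIZE(). */
static uint32_t
toy_get_size(const PointList *p)
{
    uint32_t    size;

    memcpy(&size, p->vl_len_, sizeof(size));
    return size;
}

/* Allocate a zeroed value with room for npoints coordinates. */
static PointList *
make_point_list(int32_t npoints)
{
    size_t      size;
    PointList  *p;

    assert(npoints >= 0);
    size = offsetof(PointList, coords) + (size_t) npoints * sizeof(double);
    p = calloc(1, size);        /* backend code would allocate with palloc0 */
    if (p == NULL)
        return NULL;
    toy_set_size(p, (uint32_t) size);
    p->npoints = npoints;
    return p;
}
```
]]></programlisting>

  <para>
   The point of the sketch is that the stored size covers the header as
   well as the payload, and that the header is only ever touched through
   accessor routines.
  </para>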

  <para>
   If the alignment is unimportant (either just for a specific function or
   because the data type specifies byte alignment anyway) then it's possible
   to avoid some of the overhead of <function>PG_DETOAST_DATUM</>. You can use
   <function>PG_DETOAST_DATUM_PACKED</> instead (customarily hidden by
   defining a <function>GETARG_DATATYPE_PP</> macro) and use the macros
   <function>VARSIZE_ANY_EXHDR</> and <function>VARDATA_ANY</> to access
   a potentially-packed datum.
   Note that the data returned by these macros is not aligned even if the data
   type definition specifies an alignment. If the alignment is important you
   must go through the regular <function>PG_DETOAST_DATUM</> interface.
  </para>

  <note>
   <para>
    Older code frequently declares <structfield>vl_len_</> as an
    <type>int32</> field instead of <type>char[4]</>.  This is OK as long as
    the struct definition has other fields that have at least <type>int32</>
    alignment.  But it is dangerous to use such a struct definition when
    working with a potentially unaligned datum; the compiler may take it as
    license to assume the datum actually is aligned, leading to core dumps on
    architectures that are strict about alignment.
   </para>
  </note>
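
  <para>
   The alignment hazard is a general C issue and can be shown outside
   <productname>PostgreSQL</productname>.  The standalone sketch below
   reads a 32-bit integer from an address that may be misaligned by
   copying the bytes instead of dereferencing a cast pointer.  The helper
   name <function>read_int32_unaligned</function> is hypothetical.
  </para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: read a 32-bit integer from an address that may
 * not be 4-byte aligned.  Casting the pointer to (const int32_t *) and
 * dereferencing it would be undefined behavior for a misaligned
 * address; memcpy has no alignment requirement.
 */
static int32_t
read_int32_unaligned(const void *bytes)
{
    int32_t     value;

    assert(bytes != NULL);
    memcpy(&value, bytes, sizeof(value));
    return value;
}
```
]]></programlisting>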

  <para>
   For further details see the description of the
   <xref linkend="sql-createtype"> command.
  </para>
 </sect1>