postgresql/doc/src/sgml/xaggr.sgml

185 lines
6.6 KiB
Plaintext
Raw Normal View History

<!-- $PostgreSQL: pgsql/doc/src/sgml/xaggr.sgml,v 1.33 2006/09/16 00:30:16 momjian Exp $ -->
<sect1 id="xaggr">
<title>User-Defined Aggregates</title>
1998-03-01 09:16:16 +01:00
<indexterm zone="xaggr">
2003-08-31 19:32:24 +02:00
<primary>aggregate function</primary>
<secondary>user-defined</secondary>
</indexterm>
<para>
Aggregate functions in <productname>PostgreSQL</productname>
are expressed in terms of <firstterm>state values</firstterm>
and <firstterm>state transition functions</firstterm>.
That is, an aggregate operates using a state value that is updated
as each successive input row is processed.
To define a new aggregate
function, one selects a data type for the state value,
an initial value for the state, and a state transition
function. The state transition function is just an
ordinary function that could also be used outside the
context of the aggregate. A <firstterm>final function</firstterm>
can also be specified, in case the desired result of the aggregate
is different from the data that needs to be kept in the running
state value.
</para>
<para>
Thus, in addition to the argument and result data types seen by a user
of the aggregate, there is an internal state-value data type that
may be different from both the argument and result types.
</para>
<para>
If we define an aggregate that does not use a final function,
we have an aggregate that computes a running function of
the column values from each row. <function>sum</> is an
example of this kind of aggregate. <function>sum</> starts at
zero and always adds the current row's value to
2002-01-07 03:29:15 +01:00
its running total. For example, if we want to make a <function>sum</>
aggregate to work on a data type for complex numbers,
we only need the addition function for that data type.
The aggregate definition would be:
<screen>
CREATE AGGREGATE sum (complex)
(
sfunc = complex_add,
stype = complex,
initcond = '(0,0)'
);
1998-03-01 09:16:16 +01:00
SELECT sum(a) FROM test_complex;
1998-03-01 09:16:16 +01:00
sum
-----------
2002-01-07 03:29:15 +01:00
(34,53.9)
</screen>
(Notice that we are relying on function overloading: there is more than
one aggregate named <function>sum</>, but
<productname>PostgreSQL</productname> can figure out which kind
of sum applies to a column of type <type>complex</type>.)
</para>
1998-03-01 09:16:16 +01:00
<para>
2002-01-07 03:29:15 +01:00
The above definition of <function>sum</function> will return zero (the initial
state condition) if there are no nonnull input values.
Perhaps we want to return null in that case instead &mdash; the SQL standard
2002-01-07 03:29:15 +01:00
expects <function>sum</function> to behave that way. We can do this simply by
omitting the <literal>initcond</literal> phrase, so that the initial state
condition is null. Ordinarily this would mean that the <literal>sfunc</literal>
would need to check for a null state-condition input, but for
<function>sum</function> and some other simple aggregates like
<function>max</> and <function>min</>,
it is sufficient to insert the first nonnull input value into
the state variable and then start applying the transition function
at the second nonnull input value. <productname>PostgreSQL</productname>
will do that automatically if the initial condition is null and
the transition function is marked <quote>strict</> (i.e., not to be called
for null inputs).
</para>
<para>
Another bit of default behavior for a <quote>strict</> transition function
is that the previous state value is retained unchanged whenever a
null input value is encountered. Thus, null values are ignored. If you
need some other behavior for null inputs, do not declare your
transition function as strict; instead code it to test for null inputs and
do whatever is needed.
</para>
<para>
<function>avg</> (average) is a more complex example of an aggregate.
It requires
two pieces of running state: the sum of the inputs and the count
of the number of inputs. The final result is obtained by dividing
these quantities. Average is typically implemented by using a
two-element array as the state value. For example,
the built-in implementation of <function>avg(float8)</function>
looks like:
2002-01-07 03:29:15 +01:00
<programlisting>
CREATE AGGREGATE avg (float8)
(
sfunc = float8_accum,
stype = float8[],
finalfunc = float8_avg,
initcond = '{0,0}'
);
2002-01-07 03:29:15 +01:00
</programlisting>
</para>
<para>
Aggregate functions may use polymorphic
state transition functions or final functions, so that the same functions
can be used to implement multiple aggregates.
2003-08-31 19:32:24 +02:00
See <xref linkend="extend-types-polymorphic">
for an explanation of polymorphic functions.
Going a step further, the aggregate function itself may be specified
with polymorphic input type(s) and state type, allowing a single
aggregate definition to serve for multiple input data types.
Here is an example of a polymorphic aggregate:
<programlisting>
CREATE AGGREGATE array_accum (anyelement)
(
sfunc = array_append,
stype = anyarray,
initcond = '{}'
);
</programlisting>
Here, the actual state type for any aggregate call is the array type
having the actual input type as elements.
</para>
<para>
Here's the output using two different actual data types as arguments:
<programlisting>
SELECT attrelid::regclass, array_accum(attname)
2003-11-12 23:47:47 +01:00
FROM pg_attribute
WHERE attnum &gt; 0 AND attrelid = 'pg_tablespace'::regclass
2003-11-12 23:47:47 +01:00
GROUP BY attrelid;
attrelid | array_accum
---------------+---------------------------------------
pg_tablespace | {spcname,spcowner,spclocation,spcacl}
(1 row)
SELECT attrelid::regclass, array_accum(atttypid)
2003-11-12 23:47:47 +01:00
FROM pg_attribute
WHERE attnum &gt; 0 AND attrelid = 'pg_tablespace'::regclass
2003-11-12 23:47:47 +01:00
GROUP BY attrelid;
attrelid | array_accum
---------------+-----------------
pg_tablespace | {19,26,25,1034}
(1 row)
</programlisting>
</para>
<para>
A function written in C can detect that it is being called as an
aggregate transition or final function by seeing if it was passed
an <structname>AggState</> node as the function call <quote>context</>,
for example by
<programlisting>
if (fcinfo->context &amp;&amp; IsA(fcinfo->context, AggState))
</programlisting>
One reason for checking this is that when it is true, the first input
must be a temporary transition value and can therefore safely be modified
in-place rather than allocating a new copy. (This is the <emphasis>only</>
case where it is safe for a function to modify a pass-by-reference input.)
2005-03-30 04:08:39 +02:00
See <literal>int8inc()</> for an example.
</para>
<para>
For further details see the
<xref linkend="sql-createaggregate" endterm="sql-createaggregate-title">
command.
</para>
</sect1>