
<!-- $PostgreSQL: pgsql/doc/src/sgml/xaggr.sgml,v 1.33 2006/09/16 00:30:16 momjian Exp $ -->

 <sect1 id="xaggr">
  <title>User-Defined Aggregates</title>

  <indexterm zone="xaggr">
   <primary>aggregate function</primary>
   <secondary>user-defined</secondary>
  </indexterm>

  <para>
   Aggregate functions in <productname>PostgreSQL</productname>
   are expressed in terms of <firstterm>state values</firstterm>
   and <firstterm>state transition functions</firstterm>.
   That is, an aggregate operates using a state value that is updated
   as each successive input row is processed.
   To define a new aggregate
   function, one selects a data type for the state value,
   an initial value for the state, and a state transition
   function.  The state transition function is just an
   ordinary function that could also be used outside the
   context of the aggregate.  A <firstterm>final function</firstterm>
   can also be specified, in case the desired result of the aggregate
   is different from the data that needs to be kept in the running
   state value.
  </para>

  <para>
   Thus, in addition to the argument and result data types seen by a user
   of the aggregate, there is an internal state-value data type that
   may be different from both the argument and result types.
  </para>

  <para>
   If we define an aggregate that does not use a final function,
   we have an aggregate that computes a running function of
   the column values from each row.  <function>sum</> is an
   example of this kind of aggregate.  <function>sum</> starts at
   zero and always adds the current row's value to
   its running total.  For example, if we want to make a <function>sum</>
   aggregate that works on a data type for complex numbers,
   we only need the addition function for that data type.
   The aggregate definition would be:
   
<screen>
CREATE AGGREGATE sum (complex)
(
    sfunc = complex_add,
    stype = complex,
    initcond = '(0,0)'
);

SELECT sum(a) FROM test_complex;

   sum
-----------
 (34,53.9)
</screen>

   (Notice that we are relying on function overloading: there is more than
    one aggregate named <function>sum</>, but
   <productname>PostgreSQL</productname> can figure out which kind
   of sum applies to a column of type <type>complex</type>.)
  </para>

  <para>
   The above definition of <function>sum</function> will return zero (the initial
   state condition) if there are no nonnull input values.
   Perhaps we want to return null in that case instead &mdash; the SQL standard
   expects <function>sum</function> to behave that way.  We can do this simply by
   omitting the <literal>initcond</literal> phrase, so that the initial state
   condition is null.  Ordinarily this would mean that the <literal>sfunc</literal>
   would need to check for a null state-condition input, but for
   <function>sum</function> and some other simple aggregates like
   <function>max</> and <function>min</>,
   it is sufficient to insert the first nonnull input value into
   the state variable and then start applying the transition function
   at the second nonnull input value.  <productname>PostgreSQL</productname>
   will do that automatically if the initial condition is null and
   the transition function is marked <quote>strict</> (i.e., not to be called
   for null inputs).
  </para>
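
  <para>
   As a sketch of the null-returning variant just described, the
   definition below simply leaves out <literal>initcond</literal>.
   It assumes that <function>complex_add</function> was created with the
   <literal>STRICT</literal> attribute, and that any earlier definition of
   <function>sum(complex)</function> has been dropped first.  Note that this
   shortcut is only available when the input data type is the same as
   (or binary-compatible with) the state data type, since the first
   nonnull input is used directly as the state value.

<programlisting>
-- Assumes complex_add(complex, complex) was declared STRICT.
CREATE AGGREGATE sum (complex)
(
    sfunc = complex_add,
    stype = complex
    -- no initcond: the state value starts out null
);
</programlisting>
  </para>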
  
  <para>
   Another bit of default behavior for a <quote>strict</> transition function
   is that the previous state value is retained unchanged whenever a
   null input value is encountered.  Thus, null values are ignored.  If you
   need some other behavior for null inputs, do not declare your
   transition function as strict; instead code it to test for null inputs and
   do whatever is needed.
  </para>
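
  <para>
   To illustrate a transition function that handles null inputs itself,
   here is a minimal sketch of an aggregate that counts the null values
   in an <type>integer</type> column.  The names
   <function>count_nulls</function> and
   <function>count_nulls_step</function> are hypothetical and chosen only
   for this example.  Because the transition function is not declared
   strict, it is called for every row, including those with null input.

<programlisting>
-- Hypothetical transition function; deliberately not STRICT,
-- so it is also called when the second argument is null.
CREATE FUNCTION count_nulls_step(bigint, integer) RETURNS bigint AS $$
    SELECT CASE WHEN $2 IS NULL THEN $1 + 1 ELSE $1 END;
$$ LANGUAGE SQL;

CREATE AGGREGATE count_nulls (integer)
(
    sfunc = count_nulls_step,
    stype = bigint,
    initcond = '0'
);

-- Two of the four input rows are null, so this returns 2.
SELECT count_nulls(x) FROM (VALUES (1), (NULL), (2), (NULL)) AS t(x);
</programlisting>
  </para>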
  
  <para>
   <function>avg</> (average) is a more complex example of an aggregate.
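   It requires
   two pieces of running state: the sum of the inputs and the count
   of the number of inputs.  The final result is obtained by dividing
   these quantities.  Average is typically implemented by using an
   array as the state value.  For example,
   the built-in implementation of <function>avg(float8)</function>
   looks like:

<programlisting>
CREATE AGGREGATE avg (float8)
(
    sfunc = float8_accum,
    stype = float8[],
    finalfunc = float8_avg,
    initcond = '{0,0,0}'
);
</programlisting>

   (<function>float8_accum</> requires a three-element array, not just
   two elements, because it accumulates the sum of squares as well as
   the sum and count of the inputs.  This is so that it can be used for
   some other aggregates besides <function>avg</>.)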
  </para>
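
  <para>
   The sum-and-count scheme can also be written entirely in SQL.  The
   following is only a sketch, not the way the built-in
   <function>avg</function> is implemented; the names
   <function>my_avg</function>, <function>my_avg_accum</function>, and
   <function>my_avg_final</function> are hypothetical.  The state is a
   two-element array holding the running sum and the running count.

<programlisting>
-- Hypothetical transition function: state is {sum, count}.
-- STRICT, so null inputs leave the state unchanged.
CREATE FUNCTION my_avg_accum(float8[], float8) RETURNS float8[] AS $$
    SELECT ARRAY[$1[1] + $2, $1[2] + 1];
$$ LANGUAGE SQL STRICT;

-- Hypothetical final function: divide sum by count,
-- returning null when no nonnull inputs were seen.
CREATE FUNCTION my_avg_final(float8[]) RETURNS float8 AS $$
    SELECT CASE WHEN $1[2] = 0 THEN NULL ELSE $1[1] / $1[2] END;
$$ LANGUAGE SQL;

CREATE AGGREGATE my_avg (float8)
(
    sfunc = my_avg_accum,
    stype = float8[],
    finalfunc = my_avg_final,
    initcond = '{0,0}'
);
</programlisting>

   Here the state type (<type>float8[]</type>) differs from both the
   input type and the result type (<type>float8</type>), which is why a
   final function is needed.
  </para>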

  <para>
   Aggregate functions may use polymorphic
   state transition functions or final functions, so that the same functions
   can be used to implement multiple aggregates.
   See <xref linkend="extend-types-polymorphic">
   for an explanation of polymorphic functions.
   Going a step further, the aggregate function itself may be specified
   with polymorphic input type(s) and state type, allowing a single
   aggregate definition to serve for multiple input data types.
   Here is an example of a polymorphic aggregate:

<programlisting>
CREATE AGGREGATE array_accum (anyelement)
(
    sfunc = array_append,
    stype = anyarray,
    initcond = '{}'
);
</programlisting>

   Here, the actual state type for any aggregate call is the array type
   having the actual input type as elements.
  </para>

  <para>
   Here's the output using two different actual data types as arguments:

<programlisting>
SELECT attrelid::regclass, array_accum(attname)
    FROM pg_attribute
    WHERE attnum &gt; 0 AND attrelid = 'pg_tablespace'::regclass
    GROUP BY attrelid;

   attrelid    |              array_accum              
---------------+---------------------------------------
 pg_tablespace | {spcname,spcowner,spclocation,spcacl}
(1 row)

SELECT attrelid::regclass, array_accum(atttypid)
    FROM pg_attribute
    WHERE attnum &gt; 0 AND attrelid = 'pg_tablespace'::regclass
    GROUP BY attrelid;

   attrelid    |   array_accum   
---------------+-----------------
 pg_tablespace | {19,26,25,1034}
(1 row)
</programlisting>
  </para>

  <para>
   A function written in C can detect that it is being called as an
   aggregate transition or final function by checking whether it was passed
   an <structname>AggState</> node as the function call <quote>context</>,
   for example with a test such as:
<programlisting>
        if (fcinfo->context &amp;&amp; IsA(fcinfo->context, AggState))
</programlisting>
   One reason for checking this is that when it is true, the first input
   must be a temporary transition value and can therefore safely be modified
   in-place rather than allocating a new copy.  (This is the <emphasis>only</>
   case where it is safe for a function to modify a pass-by-reference input.)
   See <literal>int8inc()</> for an example.
  </para>

  <para>
   For further details see the
   <xref linkend="sql-createaggregate" endterm="sql-createaggregate-title">
   command.
  </para>
 </sect1>