postgresql/doc/src/sgml/intagg.sgml

<!-- $PostgreSQL: pgsql/doc/src/sgml/intagg.sgml,v 1.3 2007/12/10 05:32:51 tgl Exp $ -->

<sect1 id="intagg">
 <title>intagg</title>

 <indexterm zone="intagg">
  <primary>intagg</primary>
 </indexterm>

 <para>
  The <filename>intagg</filename> module provides an integer aggregator and an
  enumerator.
 </para>

 <sect2>
  <title>Functions</title>

 <para>
  The aggregator is an aggregate function
  <function>int_array_aggregate(integer)</>
  that produces an integer array
  containing exactly the integers it is fed.
  Here is a not-tremendously-useful example:
 </para>

 <programlisting>
test=# select int_array_aggregate(i) from
test-#   generate_series(1,10,2) i;
 int_array_aggregate
---------------------
 {1,3,5,7,9}
(1 row)
 </programlisting>

 <para>
  The enumerator is a function
  <function>int_array_enum(integer[])</>
  that returns <type>setof integer</>.  It is essentially the reverse
  operation of the aggregator: given an array of integers, expand it
  into a set of rows.  For example,
 </para>

 <programlisting>
test=# select * from int_array_enum(array[1,3,5,7,9]);
 int_array_enum
----------------
              1
              3
              5
              7
              9
(5 rows)
 </programlisting>

 </sect2>

 <sect2>
  <title>Sample Uses</title>

  <para>
   Many database systems have the notion of a one to many table. Such a table
   usually sits between two indexed tables, for example:
  </para>

 <programlisting>
CREATE TABLE left (id INT PRIMARY KEY, ...);
CREATE TABLE right (id INT PRIMARY KEY, ...);
CREATE TABLE one_to_many(left INT REFERENCES left, right INT REFERENCES right);
 </programlisting>

 <para>
  It is typically used like this:
 </para>

 <programlisting>
  SELECT right.* from right JOIN one_to_many ON (right.id = one_to_many.right)
    WHERE one_to_many.left = <replaceable>item</>;
 </programlisting>

 <para>
  This will return all the items in the right hand table for an entry
  in the left hand table. This is a very common construct in SQL.
 </para>

 <para>
  Now, this methodology can be cumbersome with a very large number of
  entries in the <structname>one_to_many</> table.  Often,
  a join like this would result in an index scan
  and a fetch for each right hand entry in the table for a particular
  left hand entry. If you have a very dynamic system, there is not much you
  can do. However, if you have some data which is fairly static, you can
  create a summary table with the aggregator.
 </para>

 <programlisting>
CREATE TABLE summary as
  SELECT left, int_array_aggregate(right) AS right
  FROM one_to_many
  GROUP BY left;
 </programlisting>

 <para>
  This will create a table with one row per left item, and an array
  of right items. Now this is pretty useless without some way of using
  the array; that's why there is an array enumerator.  You can do
 </para>

 <programlisting>
SELECT left, int_array_enum(right) FROM summary WHERE left = <replaceable>item</>;
 </programlisting>

 <para>
  The above query using <function>int_array_enum</> produces the same results
  as
 </para>

 <programlisting>
SELECT left, right FROM one_to_many WHERE left = <replaceable>item</>;
 </programlisting>

 <para>
  The difference is that the query against the summary table has to get
  only one row from the table, whereas the direct query against
  <structname>one_to_many</> must index scan and fetch a row for each entry.
 </para>

 <para>
  On one system, an <command>EXPLAIN</> showed a query with a cost of 8488 was
  reduced to a cost of 329.  The original query was a join involving the
  <structname>one_to_many</> table, which was replaced by:
 </para>

 <programlisting>
SELECT right, count(right) FROM
  ( SELECT left, int_array_enum(right) AS right
    FROM summary JOIN (SELECT left FROM left_table WHERE left = <replaceable>item</>) AS lefts
         ON (summary.left = lefts.left)
  ) AS list
  GROUP BY right
  ORDER BY count DESC;
 </programlisting>

 </sect2>

</sect1>