postgresql/doc/src/sgml/xindex.sgml

<!--
$Header: /cvsroot/pgsql/doc/src/sgml/xindex.sgml,v 1.20 2001/10/26 21:17:03 tgl Exp $
Postgres documentation
-->

 <chapter id="xindex">
  <title>Interfacing Extensions To Indexes</title>

  <para>
   The procedures described thus far let you define a new type, new
   functions and new operators.  However, we cannot yet define a secondary
   index (such as a <acronym>B-tree</acronym>, <acronym>R-tree</acronym> or
   hash access method) over a new type or its operators.
  </para>

  <para>
   Look back at
   <xref linkend="EXTEND-CATALOGS">.
   The right half shows the  catalogs  that we must modify in order to tell
   <productname>Postgres</productname> how to use a user-defined type and/or
   user-defined  operators with an index (i.e., <filename>pg_am, pg_amop,
    pg_amproc, pg_operator</filename> and <filename>pg_opclass</filename>).
   Unfortunately, there is no simple command to do this.  We will demonstrate
   how to modify these catalogs through a running example:  a  new  operator
   class for the <acronym>B-tree</acronym> access method that stores and
   sorts complex numbers in ascending absolute value order.
  </para>

  <para>
   The <filename>pg_am</filename> table contains one row for every index
   access method.  Support for the heap access method is built into
   <productname>Postgres</productname>, but every other access method is
   described in <filename>pg_am</filename>.  The schema is

   <table tocentry="1">
    <title>Index Access Method Schema</title>

    <tgroup cols="2">
     <thead>
      <row>
       <entry>Column</entry>
       <entry>Description</entry>
      </row>
     </thead>
     <tbody>
      <row>
       <entry>amname</entry>
       <entry>name of the access method</entry>
      </row>
      <row>
       <entry>amowner</entry>
       <entry>user id of the owner</entry>
      </row>
      <row>
       <entry>amstrategies</entry>
       <entry>number of strategies for this access method (see below)</entry>
      </row>
      <row>
       <entry>amsupport</entry>
       <entry>number of support routines for this access method (see below)</entry>
      </row>
      <row>
       <entry>amorderstrategy</entry>
       <entry>zero if the index offers no sort order, otherwise the strategy
        number of the strategy operator that describes the sort order</entry>
      </row>
      <row>
       <entry>amcanunique</entry>
       <entry>does AM support UNIQUE indexes?</entry>
      </row>
      <row>
       <entry>amcanmulticol</entry>
       <entry>does AM support multicolumn indexes?</entry>
      </row>
      <row>
       <entry>amindexnulls</entry>
       <entry>does AM support NULL index entries?</entry>
      </row>
      <row>
       <entry>amconcurrent</entry>
       <entry>does AM support concurrent updates?</entry>
      </row>
      <row>
       <entry>amgettuple</entry>
      </row>
      <row>
       <entry>aminsert</entry>
      </row>
      <row>
       <entry>...</entry>
       <entry>procedure  identifiers  for  interface routines to the access
        method.  For example, regproc ids for opening,  closing,  and
        getting rows from the access method appear here.</entry>
      </row>
     </tbody>
    </tgroup>
   </table>
  </para>

  <para>
   The <acronym>object ID</acronym> of the row in
   <filename>pg_am</filename> is used as a foreign key in a lot of other
   tables.  You  do not  need to  add a new row to this table; all that
   you are interested in is the <acronym>object ID</acronym> of the access
   method you want to extend:

   <programlisting>
SELECT oid FROM pg_am WHERE amname = 'btree';

 oid
-----
 403
(1 row)
   </programlisting>

   We will use that <command>SELECT</command> in a <command>WHERE</command>
   clause later.
  </para>

  <para>
   The <filename>amstrategies</filename> column exists to standardize
   comparisons across data types.  For example, <acronym>B-tree</acronym>s
   impose a strict ordering on keys, lesser to greater.  Since
   <productname>Postgres</productname> allows the user to define operators,
   <productname>Postgres</productname> cannot look at the name of an operator
   (e.g., <literal>&gt;</> or <literal>&lt;</>) and tell what kind of comparison it is.  In fact,
   some  access methods don't impose any ordering at all.  For example,
   <acronym>R-tree</acronym>s express a rectangle-containment relationship,
   whereas a hashed data structure expresses only bitwise similarity based
   on the value of a hash function.  <productname>Postgres</productname>
   needs some consistent way of taking a qualification in your query,
   looking at the operator and then deciding if a usable index exists.  This
   implies that <productname>Postgres</productname> needs to know, for
   example, that the  <literal>&lt;=</>  and  <literal>&gt;</> operators partition a
   <acronym>B-tree</acronym>.  <productname>Postgres</productname>
   uses strategies to express these relationships  between
   operators and the way they can be used to scan indexes.
  </para>

  <para>
   Defining a new set of strategies is beyond the scope of this discussion,
   but we'll explain how <acronym>B-tree</acronym> strategies work because
   you'll need to know that to add a new B-tree operator class. In the
   <filename>pg_am</filename> table, the amstrategies column is the
   number of strategies defined for this access method. For
   <acronym>B-tree</acronym>s, this number is 5.  These strategies
   correspond to

   <table tocentry="1">
    <title>B-tree Strategies</title>
    <titleabbrev>B-tree</titleabbrev>
    <tgroup cols="2">
     <thead>
      <row>
       <entry>Operation</entry>
       <entry>Index</entry>
      </row>
     </thead>
     <tbody>
      <row>
       <entry>less than</entry>
       <entry>1</entry>
      </row>
      <row>
       <entry>less than or equal</entry>
       <entry>2</entry>
      </row>
      <row>
       <entry>equal</entry>
       <entry>3</entry>
      </row>
      <row>
       <entry>greater than or equal</entry>
       <entry>4</entry>
      </row>
      <row>
       <entry>greater than</entry>
       <entry>5</entry>
      </row>
     </tbody>
    </tgroup>
   </table>
  </para>

  <para>
   The idea is that you'll need to add operators corresponding to the
   comparisons above to the <filename>pg_amop</filename> relation (see below).
   The access method code can use these strategy numbers, regardless of data
   type, to figure out how to partition the <acronym>B-tree</acronym>,
   compute selectivity, and so on.  Don't worry about the details of adding
   operators yet; just understand that there must be a set of these
   operators for <filename>int2, int4, oid,</filename> and every other
   data type on which a <acronym>B-tree</acronym> can operate.
  </para>

  <para>
   Sometimes, strategies aren't enough information for the system to figure
   out how to use an index.  Some access methods require additional support
   routines in order to work. For example, the <acronym>B-tree</acronym>
   access method must be able to compare two keys and determine whether one
   is greater than, equal to, or less than the other.  Similarly, the
   <acronym>R-tree</acronym> access method must be able to compute
   intersections,  unions, and sizes of rectangles.  These
   operations do not correspond to operators used in qualifications in
   SQL queries;  they are administrative routines used by
   the access methods, internally.
  </para>

  <para>
   In order to manage diverse support routines consistently across all
   <productname>Postgres</productname> access methods,
   <filename>pg_am</filename> includes a column called
   <filename>amsupport</filename>.  This column records the number of
   support routines used by an access method.  For <acronym>B-tree</acronym>s,
   this number is one -- the routine to take two keys and return -1, 0, or
   +1, depending on whether the first key is less than, equal
   to, or greater than the second.

   <note>
    <para>
     Strictly  speaking, this routine can return a negative
     number (&lt; 0), zero, or a non-zero positive number (&gt; 0).
    </para>
   </note>
  </para>

  <para>
   The <filename>amstrategies</filename> entry in <filename>pg_am</filename>
   is just the number
   of strategies defined for the access method in question.  The operators
   for less than, less equal, and so on don't appear in
   <filename>pg_am</filename>.  Similarly, <filename>amsupport</filename>
   is just the number of support routines required by  the  access
   method.  The actual routines are listed elsewhere.
  </para>

  <para>
   By the way, the <filename>amorderstrategy</filename> entry tells whether
   the access method supports ordered scan.  Zero means it doesn't; if it
   does, <filename>amorderstrategy</filename> is the number of the strategy
   routine that corresponds to the ordering operator.  For example, B-tree
   has <filename>amorderstrategy</filename> = 1 which is its
   <quote>less than</quote> strategy number.
  </para>

  <para>
   The next table of interest is <filename>pg_opclass</filename>.  This table
   defines operator class names and input data types for each of the operator
   classes supported by a given index access method.  The same class name
   can be used for several different access methods (for example, both B-tree
   and hash access methods have operator classes named
   <filename>oid_ops</filename>), but a separate
   <filename>pg_opclass</filename> row must appear for each access method.
   The <filename>oid</filename> of the <filename>pg_opclass</filename> row is
   used as a foreign 
   key in other tables to associate specific operators and support routines
   with the operator class.
  </para>

  <para>
   You need to add a row with your opclass name (for example,
   <filename>complex_abs_ops</filename>) to
   <filename>pg_opclass</filename>:

   <programlisting>
INSERT INTO pg_opclass (opcamid, opcname, opcintype, opcdefault, opckeytype)
    VALUES (
        (SELECT oid FROM pg_am WHERE amname = 'btree'),
        'complex_abs_ops',
        (SELECT oid FROM pg_type WHERE typname = 'complex'),
        true,
        0);

SELECT oid, *
    FROM pg_opclass
    WHERE opcname = 'complex_abs_ops';

  oid   | opcamid |     opcname     | opcintype | opcdefault | opckeytype
--------+---------+-----------------+-----------+------------+------------
 277975 |     403 | complex_abs_ops |    277946 | t          |          0
(1 row)
   </programlisting>

   Note that the oid for your <filename>pg_opclass</filename> row will
   be different!  Don't worry about this though.  We'll get this number
   from the system later just like we got the oid of the type here.
  </para>

  <para>
   The above example assumes that you want to make this new opclass the
   default B-tree opclass for the <filename>complex</filename> data type.
   If you don't, just set <filename>opcdefault</filename> to false instead.
   <filename>opckeytype</filename> is not described here; it should always
   be zero for B-tree opclasses.
  </para>

  <para>
   So now we have an access method and an operator  class.
   We  still  need  a  set of operators.  The procedure for
   defining operators was discussed earlier in  this  manual.
   For  the  <filename>complex_abs_ops</filename>  operator  class on B-trees,
   the operators we require are:

   <programlisting>
        absolute value less-than
        absolute value less-than-or-equal
        absolute value equal
        absolute value greater-than-or-equal
        absolute value greater-than
   </programlisting>
  </para>

  <para>
   Suppose the code that implements these functions
   is stored in the file
   <replaceable>PGROOT</replaceable><filename>/tutorial/complex.c</filename>,
   which we have compiled into
   <replaceable>PGROOT</replaceable><filename>/tutorial/complex.so</filename>.
  </para>

  <para>
   Part of the C code looks like this: (note that we will only show the
   equality operator for the rest of the examples.  The other four
   operators are very similar.  Refer to <filename>complex.c</filename>
   or <filename>complex.source</filename> for the details.)

   <programlisting>
#define Mag(c) ((c)-&gt;x*(c)-&gt;x + (c)-&gt;y*(c)-&gt;y)

         bool
         complex_abs_eq(Complex *a, Complex *b)
         {
             double amag = Mag(a), bmag = Mag(b);
             return (amag==bmag);
         }
   </programlisting>
  </para>

  <para>
   We make the function known to Postgres like this:
   <programlisting>
CREATE FUNCTION complex_abs_eq(complex, complex)
              RETURNS bool
              AS '<replaceable>PGROOT</replaceable>/tutorial/complex'
              LANGUAGE C;
   </programlisting>
  </para>

  <para>
   There are some important things that are happening here.
  </para>

  <para>
   First, note that operators for less-than, less-than-or-equal, equal,
   greater-than-or-equal, and greater-than for <filename>complex</filename>
   are being defined.  We can only have one operator named, say, = and
   taking type <filename>complex</filename> for both operands.  In this case
   we don't have any other operator = for <filename>complex</filename>,
   but if we were building a practical data type we'd probably want = to
   be the ordinary equality operation for complex numbers.  In that case,
   we'd need to use some other operator name for complex_abs_eq.
  </para>

  <para>
   Second, although Postgres can cope with operators having
   the same name as long as they have different input data types, C can only
   cope with one global routine having a given name, period.  So we shouldn't
   name the C function something simple like <filename>abs_eq</filename>.
   Usually it's a good practice to include the data type name in the C
   function name, so as not to conflict with functions for other data types.
  </para>

  <para>
   Third, we could have made the Postgres name of the function
   <filename>abs_eq</filename>, relying on Postgres to distinguish it
   by input data types from any other Postgres function of the same name.
   To keep the example simple, we make the function have the same names
   at the C level and Postgres level.
  </para>

  <para>
   Finally, note that these operator functions return Boolean values.
   In practice, all operators defined as index access method strategies
   must return Boolean, since they must appear at the top level of a WHERE
   clause to be used with an index.
   (On the other
   hand, the support function returns whatever the particular access method
   expects -- in this case, a signed integer.)
  </para>

  <para>
   The final routine in the
   file is the <quote>support routine</quote> mentioned when we discussed the amsupport
   column of the <filename>pg_am</filename> table.  We will use this
   later on.  For now, ignore it.
  </para>

  <para>
   Now we are ready to define the operators:

   <programlisting>
CREATE OPERATOR = (
     leftarg = complex, rightarg = complex,
     procedure = complex_abs_eq,
     restrict = eqsel, join = eqjoinsel
         );
   </programlisting>

   The important
   things here are the procedure names (which are the <acronym>C</acronym>
   functions defined above) and the restriction and join selectivity
   functions.  You should just use the selectivity functions used in
   the example (see <filename>complex.source</filename>).
   Note that there
   are different such functions for the less-than, equal, and greater-than
   cases.  These must be supplied, or the optimizer will be unable to
   make effective use of the index.
  </para>

  <para>
   The next step is to add entries for these operators to
   the <filename>pg_amop</filename> relation.  To do this,
   we'll need the <filename>oid</filename>s of the operators we just
   defined.  We'll look up the names of all the operators that take
   two <filename>complex</filename>es, and pick ours out:
   
   <programlisting>
    SELECT o.oid AS opoid, o.oprname
     INTO TEMP TABLE complex_ops_tmp
     FROM pg_operator o, pg_type t
     WHERE o.oprleft = t.oid and o.oprright = t.oid
      and t.typname = 'complex';

 opoid  | oprname
--------+---------
 277963 | +
 277970 | &lt;
 277971 | &lt;=
 277972 | =
 277973 | &gt;=
 277974 | &gt;
(6 rows)
   </programlisting>

   (Again, some of your <filename>oid</filename> numbers will almost
   certainly be different.)  The operators we are interested in are those
   with <filename>oid</filename>s 277970 through 277974.  The values you
   get will probably be different, and you should substitute them for the
   values below.  We will do this with a select statement.
  </para>

  <para>
   Now we are ready to insert entries into <filename>pg_amop</filename> for
   our new operator class.  These entries must associate the correct
   B-tree strategy numbers with each of the operators we need.
   The command to insert the less-than operator looks like:

   <programlisting>
    INSERT INTO pg_amop (amopclaid, amopstrategy, amopreqcheck, amopopr)
        SELECT opcl.oid, 1, false, c.opoid
        FROM pg_opclass opcl, complex_ops_tmp c
        WHERE
            opcamid = (SELECT oid FROM pg_am WHERE amname = 'btree') AND
            opcname = 'complex_abs_ops' AND
            c.oprname = '&lt;';
   </programlisting>

   Now do this for the other operators substituting for the <literal>1</> in the
   second line above and the <literal>&lt;</> in the last line.  Note the order:
   <quote>less than</> is 1, <quote>less than or equal</> is 2,
   <quote>equal</> is 3, <quote>greater than or equal</quote> is 4, and
   <quote>greater than</quote> is 5.
  </para>

  <para>
   The field <filename>amopreqcheck</filename> is not discussed here; it
   should always be false for B-tree operators.
  </para>

  <para>
   The final step is registration of the <quote>support routine</quote> previously
   described in our discussion of <filename>pg_am</filename>.  The
   <filename>oid</filename> of this support routine is stored in the
   <filename>pg_amproc</filename> table, keyed by the operator class
   <filename>oid</filename> and the support routine number.
   First, we need to register the function in
   <productname>Postgres</productname> (recall that we put the
   <acronym>C</acronym> code that implements this routine in the bottom of
   the file in which we implemented the operator routines):

   <programlisting>
    CREATE FUNCTION complex_abs_cmp(complex, complex)
     RETURNS int4
     AS '<replaceable>PGROOT</replaceable>/tutorial/complex'
     LANGUAGE C;

    SELECT oid, proname FROM pg_proc
     WHERE proname = 'complex_abs_cmp';

  oid   |     proname
--------+-----------------
 277997 | complex_abs_cmp
(1 row)
   </programlisting>

   (Again, your <filename>oid</filename> number will probably be different.)
   We can add the new row as follows:

   <programlisting>
    INSERT INTO pg_amproc (amopclaid, amprocnum, amproc)
        SELECT opcl.oid, 1, p.oid
        FROM pg_opclass opcl, pg_proc p
        WHERE
            opcamid = (SELECT oid FROM pg_am WHERE amname = 'btree') AND
            opcname = 'complex_abs_ops' AND
            p.proname = 'complex_abs_cmp';
   </programlisting>
  </para>

  <para>
   And we're done!  (Whew.)  It should now be possible to create
   and use B-tree indexes on <filename>complex</filename> columns.
  </para>

 </chapter>

<!-- Keep this comment at the end of the file
Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/lib/sgml/catalog")
sgml-local-ecat-files:nil
End:
-->