postgresql/doc/src/sgml/xindex.sgml

580 lines
20 KiB
Plaintext
Raw Normal View History

<!--
$Header: /cvsroot/pgsql/doc/src/sgml/xindex.sgml,v 1.24 2002/04/17 20:57:56 tgl Exp $
PostgreSQL documentation
-->
2002-01-07 03:29:15 +01:00
<chapter id="xindex">
<title>Interfacing Extensions To Indexes</title>
<sect1 id="xindex-intro">
<title>Introduction</title>
<para>
2002-01-07 03:29:15 +01:00
The procedures described thus far let you define new types, new
functions, and new operators. However, we cannot yet define a secondary
index (such as a B-tree, R-tree, or
hash access method) over a new type or its operators.
</para>
<para>
Look back at
2000-12-26 01:10:37 +01:00
<xref linkend="EXTEND-CATALOGS">.
The right half shows the catalogs that we must modify in order to tell
<productname>PostgreSQL</productname> how to use a user-defined type and/or
user-defined operators with an index (i.e., <filename>pg_am, pg_amop,
pg_amproc, pg_operator</filename> and <filename>pg_opclass</filename>).
Unfortunately, there is no simple command to do this. We will demonstrate
how to modify these catalogs through a running example: a new operator
class for the B-tree access method that stores and
sorts complex numbers in ascending absolute value order.
</para>
2002-01-07 03:29:15 +01:00
</sect1>
<sect1 id="xindex-am">
<title>Access Methods</title>
<para>
2002-01-07 03:29:15 +01:00
The <filename>pg_am</filename> table contains one row for every
index access method. Support for the heap access method is built
into <productname>PostgreSQL</productname>, but all other access
methods are described in <filename>pg_am</filename>. The schema is
shown in <xref linkend="xindex-pgam-table">.
2002-01-07 03:29:15 +01:00
<table tocentry="1" id="xindex-pgam-table">
<title>Index Access Method Schema</title>
<tgroup cols="2">
<thead>
<row>
<entry>Column</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry>amname</entry>
<entry>name of the access method</entry>
</row>
<row>
<entry>amowner</entry>
2002-01-07 03:29:15 +01:00
<entry>user ID of the owner (currently not used)</entry>
</row>
<row>
<entry>amstrategies</entry>
<entry>number of strategies for this access method (see below)</entry>
</row>
<row>
<entry>amsupport</entry>
<entry>number of support routines for this access method (see below)</entry>
</row>
<row>
<entry>amorderstrategy</entry>
<entry>zero if the index offers no sort order, otherwise the strategy
number of the strategy operator that describes the sort order</entry>
</row>
<row>
<entry>amcanunique</entry>
2002-01-07 03:29:15 +01:00
<entry>does AM support unique indexes?</entry>
</row>
<row>
<entry>amcanmulticol</entry>
<entry>does AM support multicolumn indexes?</entry>
</row>
<row>
<entry>amindexnulls</entry>
<entry>does AM support NULL index entries?</entry>
</row>
<row>
<entry>amconcurrent</entry>
<entry>does AM support concurrent updates?</entry>
</row>
<row>
<entry>amgettuple</entry>
</row>
<row>
<entry>aminsert</entry>
</row>
<row>
<entry>...</entry>
<entry>procedure identifiers for interface routines to the access
2002-01-07 03:29:15 +01:00
method. For example, regproc IDs for opening, closing, and
getting rows from the access method appear here.</entry>
</row>
</tbody>
</tgroup>
</table>
</para>
<para>
The <acronym>OID</acronym> of the row in
<filename>pg_am</filename> is used as a foreign key in a lot of other
tables. You do not need to add a new row to this table; all that
you are interested in is the <acronym>OID</acronym> of the access
method you want to extend:
2002-01-07 03:29:15 +01:00
<screen>
1998-03-01 09:16:16 +01:00
SELECT oid FROM pg_am WHERE amname = 'btree';
oid
-----
403
(1 row)
2002-01-07 03:29:15 +01:00
</screen>
2002-01-07 03:29:15 +01:00
We will use that query in a <literal>WHERE</literal>
clause later.
</para>
2002-01-07 03:29:15 +01:00
</sect1>
<sect1 id="xindex-strategies">
<title>Access Method Strategies</title>
<para>
2002-01-07 03:29:15 +01:00
The <structfield>amstrategies</structfield> column exists to standardize
comparisons across data types. For example, B-trees
impose a strict ordering on keys, lesser to greater. Since
<productname>PostgreSQL</productname> allows the user to define operators,
<productname>PostgreSQL</productname> cannot look at the name of an operator
(e.g., <literal>&gt;</> or <literal>&lt;</>) and tell what kind of comparison it is. In fact,
some access methods don't impose any ordering at all. For example,
2002-01-07 03:29:15 +01:00
R-trees express a rectangle-containment relationship,
whereas a hashed data structure expresses only bitwise similarity based
on the value of a hash function. <productname>PostgreSQL</productname>
needs some consistent way of taking a qualification in your query,
2002-01-07 03:29:15 +01:00
looking at the operator, and then deciding if a usable index exists. This
implies that <productname>PostgreSQL</productname> needs to know, for
example, that the <literal>&lt;=</> and <literal>&gt;</> operators partition a
2002-01-07 03:29:15 +01:00
B-tree. <productname>PostgreSQL</productname>
uses <firstterm>strategies</firstterm> to express these relationships between
operators and the way they can be used to scan indexes.
</para>
<para>
2002-01-07 03:29:15 +01:00
Defining a new set of strategies is beyond the scope of this
discussion, but we'll explain how B-tree strategies work because
you'll need to know that to add a new B-tree operator class. In the
2002-01-07 03:29:15 +01:00
<classname>pg_am</classname> table, the
<structfield>amstrategies</structfield> column sets the number of
strategies defined for this access method. For B-trees, this number
is 5. The meanings of these strategies are shown in <xref
linkend="xindex-btree-table">.
</para>
2002-01-07 03:29:15 +01:00
<table tocentry="1" id="xindex-btree-table">
<title>B-tree Strategies</title>
<titleabbrev>B-tree</titleabbrev>
<tgroup cols="2">
<thead>
<row>
<entry>Operation</entry>
<entry>Index</entry>
</row>
</thead>
<tbody>
<row>
<entry>less than</entry>
<entry>1</entry>
</row>
<row>
<entry>less than or equal</entry>
<entry>2</entry>
</row>
<row>
<entry>equal</entry>
<entry>3</entry>
</row>
<row>
<entry>greater than or equal</entry>
<entry>4</entry>
</row>
<row>
<entry>greater than</entry>
<entry>5</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
2002-01-07 03:29:15 +01:00
The idea is that you'll need to add operators corresponding to these strategies
to the <classname>pg_amop</classname> relation (see below).
The access method code can use these strategy numbers, regardless of data
2002-01-07 03:29:15 +01:00
type, to figure out how to partition the B-tree,
compute selectivity, and so on. Don't worry about the details of adding
operators yet; just understand that there must be a set of these
2002-01-07 03:29:15 +01:00
operators for <type>int2</>, <type>int4</>, <type>oid</>, and all other
data types on which a B-tree can operate.
</para>
2002-01-07 03:29:15 +01:00
</sect1>
<sect1 id="xindex-support">
<title>Access Method Support Routines</title>
<para>
Sometimes, strategies aren't enough information for the system to figure
out how to use an index. Some access methods require additional support
2002-01-07 03:29:15 +01:00
routines in order to work. For example, the B-tree
access method must be able to compare two keys and determine whether one
is greater than, equal to, or less than the other. Similarly, the
2002-01-07 03:29:15 +01:00
R-tree access method must be able to compute
intersections, unions, and sizes of rectangles. These
operations do not correspond to operators used in qualifications in
SQL queries; they are administrative routines used by
the access methods, internally.
</para>
<para>
In order to manage diverse support routines consistently across all
<productname>PostgreSQL</productname> access methods,
2002-01-07 03:29:15 +01:00
<classname>pg_am</classname> includes a column called
<structfield>amsupport</structfield>. This column records the
number of support routines used by an access method. For B-trees,
this number is one: the routine to take two keys and return -1, 0,
or +1, depending on whether the first key is less than, equal to,
or greater than the second. (Strictly speaking, this routine can
return a negative number (&lt; 0), zero, or a non-zero positive
number (&gt; 0).)
</para>
<para>
2002-01-07 03:29:15 +01:00
The <structfield>amstrategies</structfield> entry in
<classname>pg_am</classname> is just the number of strategies
defined for the access method in question. The operators for less
than, less equal, and so on don't appear in
<classname>pg_am</classname>. Similarly,
<structfield>amsupport</structfield> is just the number of support
routines required by the access method. The actual routines are
listed elsewhere.
</para>
<para>
2002-01-07 03:29:15 +01:00
By the way, the <structfield>amorderstrategy</structfield> column tells whether
the access method supports ordered scan. Zero means it doesn't; if it
2002-01-07 03:29:15 +01:00
does, <structfield>amorderstrategy</structfield> is the number of the strategy
routine that corresponds to the ordering operator. For example, B-tree
2002-01-07 03:29:15 +01:00
has <structfield>amorderstrategy</structfield> = 1, which is its
<quote>less than</quote> strategy number.
</para>
2002-01-07 03:29:15 +01:00
</sect1>
<sect1 id="xindex-opclass">
<title>Operator Classes</title>
<para>
2002-01-07 03:29:15 +01:00
The next table of interest is <classname>pg_opclass</classname>. This table
defines operator class names and input data types for each of the operator
classes supported by a given index access method. The same class name
can be used for several different access methods (for example, both B-tree
and hash access methods have operator classes named
2002-01-07 03:29:15 +01:00
<literal>oid_ops</literal>), but a separate
<filename>pg_opclass</filename> row must appear for each access method.
2002-01-07 03:29:15 +01:00
The OID of the <classname>pg_opclass</classname> row is
used as a foreign
key in other tables to associate specific operators and support routines
with the operator class.
</para>
<para>
2002-01-07 03:29:15 +01:00
You need to add a row with your operator class name (for example,
<literal>complex_abs_ops</literal>) to
<classname>pg_opclass</classname>:
2002-01-07 03:29:15 +01:00
<programlisting>
INSERT INTO pg_opclass (opcamid, opcname, opcnamespace, opcowner, opcintype, opcdefault, opckeytype)
VALUES (
(SELECT oid FROM pg_am WHERE amname = 'btree'),
'complex_abs_ops',
(SELECT oid FROM pg_namespace WHERE nspname = 'pg_catalog'),
1, -- UID of superuser is hardwired to 1 as of PG 7.3
(SELECT oid FROM pg_type WHERE typname = 'complex'),
true,
0);
SELECT oid, *
1998-03-01 09:16:16 +01:00
FROM pg_opclass
WHERE opcname = 'complex_abs_ops';
oid | opcamid | opcname | opcnamespace | opcowner | opcintype | opcdefault | opckeytype
--------+---------+-----------------+--------------+----------+-----------+------------+------------
277975 | 403 | complex_abs_ops | 11 | 1 | 277946 | t | 0
(1 row)
2002-01-07 03:29:15 +01:00
</programlisting>
2002-01-07 03:29:15 +01:00
Note that the OID for your <classname>pg_opclass</classname> row will
be different! Don't worry about this though. We'll get this number
2002-01-07 03:29:15 +01:00
from the system later just like we got the OID of the type here.
</para>
<para>
2002-01-07 03:29:15 +01:00
The above example assumes that you want to make this new operator class the
default B-tree operator class for the <type>complex</type> data type.
If you don't, just set <structfield>opcdefault</structfield> to false instead.
<structfield>opckeytype</structfield> is not described here; it should always
be zero for B-tree operator classes.
</para>
2002-01-07 03:29:15 +01:00
</sect1>
<sect1 id="xindex-operators">
<title>Creating the Operators and Support Routines</title>
<para>
So now we have an access method and an operator class.
We still need a set of operators. The procedure for
2002-01-07 03:29:15 +01:00
defining operators was discussed in <xref linkend="xoper">.
For the <literal>complex_abs_ops</literal> operator class on B-trees,
the operators we require are:
2002-01-07 03:29:15 +01:00
<itemizedlist spacing="compact">
<listitem><simpara>absolute-value less-than (strategy 1)</></>
<listitem><simpara>absolute-value less-than-or-equal (strategy 2)</></>
<listitem><simpara>absolute-value equal (strategy 3)</></>
<listitem><simpara>absolute-value greater-than-or-equal (strategy 4)</></>
<listitem><simpara>absolute-value greater-than (strategy 5)</></>
</itemizedlist>
</para>
<para>
Suppose the code that implements these functions
is stored in the file
2002-01-07 03:29:15 +01:00
<filename><replaceable>PGROOT</replaceable>/src/tutorial/complex.c</filename>,
which we have compiled into
2002-01-07 03:29:15 +01:00
<filename><replaceable>PGROOT</replaceable>/src/tutorial/complex.so</filename>.
Part of the C code looks like this:
2002-01-07 03:29:15 +01:00
<programlisting>
1998-03-01 09:16:16 +01:00
#define Mag(c) ((c)-&gt;x*(c)-&gt;x + (c)-&gt;y*(c)-&gt;y)
bool
complex_abs_eq(Complex *a, Complex *b)
{
double amag = Mag(a), bmag = Mag(b);
return (amag==bmag);
}
2002-01-07 03:29:15 +01:00
</programlisting>
(Note that we will only show the equality operator for the rest of
the examples. The other four operators are very similar. Refer to
<filename>complex.c</filename> or
<filename>complex.source</filename> for the details.)
</para>
<para>
We make the function known to <productname>PostgreSQL</productname> like this:
2002-01-07 03:29:15 +01:00
<programlisting>
CREATE FUNCTION complex_abs_eq(complex, complex) RETURNS boolean
AS '<replaceable>PGROOT</replaceable>/src/tutorial/complex'
LANGUAGE C;
</programlisting>
</para>
<para>
2002-01-07 03:29:15 +01:00
There are some important things that are happening here:
2002-01-07 03:29:15 +01:00
<itemizedlist>
<listitem>
<para>
First, note that operators for less-than, less-than-or-equal, equal,
greater-than-or-equal, and greater-than for <filename>complex</filename>
are being defined. We can only have one operator named, say, = and
taking type <filename>complex</filename> for both operands. In this case
we don't have any other operator = for <filename>complex</filename>,
but if we were building a practical data type we'd probably want = to
be the ordinary equality operation for complex numbers. In that case,
2002-01-07 03:29:15 +01:00
we'd need to use some other operator name for <function>complex_abs_eq</>.
</para>
2002-01-07 03:29:15 +01:00
</listitem>
2002-01-07 03:29:15 +01:00
<listitem>
<para>
Second, although <productname>PostgreSQL</productname> can cope with operators having
the same name as long as they have different input data types, C can only
cope with one global routine having a given name, period. So we shouldn't
name the C function something simple like <filename>abs_eq</filename>.
Usually it's a good practice to include the data type name in the C
function name, so as not to conflict with functions for other data types.
</para>
2002-01-07 03:29:15 +01:00
</listitem>
1998-03-01 09:16:16 +01:00
2002-01-07 03:29:15 +01:00
<listitem>
<para>
Third, we could have made the <productname>PostgreSQL</productname> name of the function
<filename>abs_eq</filename>, relying on <productname>PostgreSQL</productname> to distinguish it
by input data types from any other <productname>PostgreSQL</productname> function of the same name.
To keep the example simple, we make the function have the same names
at the C level and <productname>PostgreSQL</productname> level.
</para>
2002-01-07 03:29:15 +01:00
</listitem>
2002-01-07 03:29:15 +01:00
<listitem>
<para>
Finally, note that these operator functions return Boolean values.
2002-01-07 03:29:15 +01:00
In practice, all operators defined as index access method
strategies must return type <type>boolean</type>, since they must
appear at the top level of a <literal>WHERE</> clause to be used with an index.
(On the other hand, the support function returns whatever the
particular access method expects -- in this case, a signed
integer.)
</para>
</listitem>
</itemizedlist>
</para>
<para>
2002-01-07 03:29:15 +01:00
The final routine in the file is the <quote>support routine</quote>
mentioned when we discussed the <structfield>amsupport</> column of the
<classname>pg_am</classname> table. We will use this later on. For
now, ignore it.
</para>
<para>
Now we are ready to define the operators:
2002-01-07 03:29:15 +01:00
<programlisting>
1998-03-01 09:16:16 +01:00
CREATE OPERATOR = (
leftarg = complex, rightarg = complex,
1998-03-01 09:16:16 +01:00
procedure = complex_abs_eq,
restrict = eqsel, join = eqjoinsel
2001-08-31 06:17:13 +02:00
);
2002-01-07 03:29:15 +01:00
</programlisting>
The important
2002-01-07 03:29:15 +01:00
things here are the procedure names (which are the C
functions defined above) and the restriction and join selectivity
functions. You should just use the selectivity functions used in
the example (see <filename>complex.source</filename>).
Note that there
are different such functions for the less-than, equal, and greater-than
2002-01-07 03:29:15 +01:00
cases. These must be supplied or the optimizer will be unable to
make effective use of the index.
</para>
<para>
The next step is to add entries for these operators to
2002-01-07 03:29:15 +01:00
the <classname>pg_amop</classname> relation. To do this,
we'll need the OIDs of the operators we just
defined. We'll look up the names of all the operators that take
2002-01-07 03:29:15 +01:00
two operands of type <type>complex</type>, and pick ours out:
2002-01-07 03:29:15 +01:00
<screen>
SELECT o.oid AS opoid, o.oprname
INTO TEMP TABLE complex_ops_tmp
FROM pg_operator o, pg_type t
WHERE o.oprleft = t.oid and o.oprright = t.oid
and t.typname = 'complex';
opoid | oprname
--------+---------
277963 | +
277970 | &lt;
277971 | &lt;=
277972 | =
277973 | &gt;=
277974 | &gt;
(6 rows)
2002-01-07 03:29:15 +01:00
</screen>
2002-01-07 03:29:15 +01:00
(Again, some of your OID numbers will almost
certainly be different.) The operators we are interested in are those
2002-01-07 03:29:15 +01:00
with OIDs 277970 through 277974. The values you
get will probably be different, and you should substitute them for the
values below. We will do this with a select statement.
</para>
<para>
2002-01-07 03:29:15 +01:00
Now we are ready to insert entries into <classname>pg_amop</classname> for
our new operator class. These entries must associate the correct
B-tree strategy numbers with each of the operators we need.
The command to insert the less-than operator looks like:
2002-01-07 03:29:15 +01:00
<programlisting>
INSERT INTO pg_amop (amopclaid, amopstrategy, amopreqcheck, amopopr)
SELECT opcl.oid, 1, false, c.opoid
FROM pg_opclass opcl, complex_ops_tmp c
WHERE
opcamid = (SELECT oid FROM pg_am WHERE amname = 'btree') AND
opcname = 'complex_abs_ops' AND
c.oprname = '&lt;';
2002-01-07 03:29:15 +01:00
</programlisting>
Now do this for the other operators substituting for the <literal>1</> in the
second line above and the <literal>&lt;</> in the last line. Note the order:
<quote>less than</> is 1, <quote>less than or equal</> is 2,
<quote>equal</> is 3, <quote>greater than or equal</quote> is 4, and
<quote>greater than</quote> is 5.
</para>
<para>
The field <filename>amopreqcheck</filename> is not discussed here; it
should always be false for B-tree operators.
</para>
<para>
2002-01-07 03:29:15 +01:00
The final step is the registration of the <quote>support routine</quote> previously
described in our discussion of <classname>pg_am</classname>. The
OID of this support routine is stored in the
<classname>pg_amproc</classname> table, keyed by the operator class
OID and the support routine number.
</para>
<para>
First, we need to register the function in
<productname>PostgreSQL</productname> (recall that we put the
2002-01-07 03:29:15 +01:00
C code that implements this routine in the bottom of
the file in which we implemented the operator routines):
2002-01-07 03:29:15 +01:00
<programlisting>
CREATE FUNCTION complex_abs_cmp(complex, complex)
RETURNS integer
AS '<replaceable>PGROOT</replaceable>/src/tutorial/complex'
LANGUAGE C;
1998-03-01 09:16:16 +01:00
2002-01-07 03:29:15 +01:00
SELECT oid, proname FROM pg_proc
WHERE proname = 'complex_abs_cmp';
oid | proname
--------+-----------------
277997 | complex_abs_cmp
(1 row)
2002-01-07 03:29:15 +01:00
</programlisting>
(Again, your OID number will probably be different.)
</para>
2002-01-07 03:29:15 +01:00
<para>
We can add the new row as follows:
2002-01-07 03:29:15 +01:00
<programlisting>
INSERT INTO pg_amproc (amopclaid, amprocnum, amproc)
SELECT opcl.oid, 1, p.oid
FROM pg_opclass opcl, pg_proc p
WHERE
opcamid = (SELECT oid FROM pg_am WHERE amname = 'btree') AND
opcname = 'complex_abs_ops' AND
p.proname = 'complex_abs_cmp';
2002-01-07 03:29:15 +01:00
</programlisting>
</para>
<para>
And we're done! (Whew.) It should now be possible to create
2002-01-07 03:29:15 +01:00
and use B-tree indexes on <type>complex</type> columns.
</para>
2002-01-07 03:29:15 +01:00
</sect1>
2002-01-07 03:29:15 +01:00
</chapter>
<!-- Keep this comment at the end of the file
Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/lib/sgml/catalog")
sgml-local-ecat-files:nil
End:
-->