<!--
$Header: /cvsroot/pgsql/doc/src/sgml/xindex.sgml,v 1.25 2002/05/29 17:36:40 tgl Exp $
PostgreSQL documentation
-->
<chapter id="xindex">
<title>Interfacing Extensions To Indexes</title>
<sect1 id="xindex-intro">
<title>Introduction</title>
<para>
The procedures described thus far let you define new types, new
functions, and new operators. However, we cannot yet define a secondary
index (such as a B-tree, R-tree, or
hash index) over a new type or its operators.
</para>
<para>
Look back at
<xref linkend="EXTEND-CATALOGS">.
The right half shows the catalogs that we must modify in order to tell
<productname>PostgreSQL</productname> how to use a user-defined type and/or
user-defined operators with an index (i.e., <filename>pg_am, pg_amop,
pg_amproc, pg_operator</filename> and <filename>pg_opclass</filename>).
Unfortunately, there is no simple command to do this. We will demonstrate
how to modify these catalogs through a running example: a new operator
class for the B-tree access method that stores and
sorts complex numbers in ascending absolute value order.
</para>
</sect1>
<sect1 id="xindex-am">
<title>Access Methods</title>
<para>
The <filename>pg_am</filename> table contains one row for every
index access method. Support for the heap access method is built
into <productname>PostgreSQL</productname>, but all other access
methods are described in <filename>pg_am</filename>. The schema is
shown in <xref linkend="xindex-pgam-table">.
<table tocentry="1" id="xindex-pgam-table">
<title>Index Access Method Schema</title>
<tgroup cols="2">
<thead>
<row>
<entry>Column</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry>amname</entry>
<entry>name of the access method</entry>
</row>
<row>
<entry>amowner</entry>
<entry>user ID of the owner (currently not used)</entry>
</row>
<row>
<entry>amstrategies</entry>
<entry>number of strategies for this access method (see below)</entry>
</row>
<row>
<entry>amsupport</entry>
<entry>number of support routines for this access method (see below)</entry>
</row>
<row>
<entry>amorderstrategy</entry>
<entry>zero if the index offers no sort order, otherwise the strategy
number of the strategy operator that describes the sort order</entry>
</row>
<row>
<entry>amcanunique</entry>
<entry>does AM support unique indexes?</entry>
</row>
<row>
<entry>amcanmulticol</entry>
<entry>does AM support multicolumn indexes?</entry>
</row>
<row>
<entry>amindexnulls</entry>
<entry>does AM support NULL index entries?</entry>
</row>
<row>
<entry>amconcurrent</entry>
<entry>does AM support concurrent updates?</entry>
</row>
<row>
<entry>amgettuple</entry>
</row>
<row>
<entry>aminsert</entry>
</row>
<row>
<entry>...</entry>
<entry>procedure identifiers for interface routines to the access
method. For example, regproc IDs for opening, closing, and
getting rows from the access method appear here.</entry>
</row>
</tbody>
</tgroup>
</table>
</para>
<note>
<para>
An index AM that supports multiple columns (has
<structfield>amcanmulticol</structfield> true) <emphasis>must</>
support indexing nulls in columns after the first, because the planner
will assume the index can be used for queries on just the first
column(s). For example, consider an index on (a,b) and a query
WHERE a = 4. The system will assume the index can be used to scan for
rows with a = 4, which is wrong if the index omits rows where b is null.
However, it is okay to omit rows where the first indexed column is null.
(GiST currently does so.)
<structfield>amindexnulls</structfield> should be set true only if the
index AM indexes all rows, including arbitrary combinations of nulls.
</para>
</note>
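<para>
 The reasoning in the note above can be illustrated with a small
 simulation. This is only a sketch: the <literal>Row</literal> type and
 both function names are hypothetical and are not part of
 <productname>PostgreSQL</productname>. It models an index on (a,b)
 that may or may not omit rows where b is null, and compares what a
 scan for a = 4 finds through the index with what the table holds.
</para>

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-column row; b_is_null models an SQL NULL in column b. */
typedef struct {
    int  a;
    int  b;
    bool b_is_null;
} Row;

/* Count table rows with a = key: the answer the query must produce. */
size_t count_heap_matches(const Row *rows, size_t n, int key)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (rows[i].a == key)
            hits++;
    return hits;
}

/*
 * Count rows with a = key that a scan of an index on (a,b) would return.
 * If skip_null_b is true, the index omitted rows whose b is null when it
 * was built, so those rows are invisible to the scan.
 */
size_t count_index_matches(const Row *rows, size_t n, int key,
                           bool skip_null_b)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (skip_null_b && rows[i].b_is_null)
            continue;           /* this row never entered the index */
        if (rows[i].a == key)
            hits++;
    }
    return hits;
}
```

<para>
 Whenever the table contains a row with a = 4 and a null b, the scan
 that skips null b values returns fewer rows than the table holds,
 which is the wrong answer the note warns about.
</para>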
<para>
The <acronym>OID</acronym> of the row in
<filename>pg_am</filename> is used as a foreign key in many other
tables. You do not need to add a new row to this table; all that
you are interested in is the <acronym>OID</acronym> of the access
method you want to extend:
<screen>
SELECT oid FROM pg_am WHERE amname = 'btree';
oid
-----
403
(1 row)
</screen>
We will use that query in a <literal>WHERE</literal>
clause later.
</para>
</sect1>
<sect1 id="xindex-strategies">
<title>Access Method Strategies</title>
<para>
The <structfield>amstrategies</structfield> column exists to standardize
comparisons across data types. For example, B-trees
impose a strict ordering on keys, lesser to greater. Since
<productname>PostgreSQL</productname> allows the user to define operators,
<productname>PostgreSQL</productname> cannot look at the name of an operator
(e.g., <literal>&gt;</> or <literal>&lt;</>) and tell what kind of comparison it is. In fact,
some access methods don't impose any ordering at all. For example,
R-trees express a rectangle-containment relationship,
whereas a hashed data structure expresses only bitwise similarity based
on the value of a hash function. <productname>PostgreSQL</productname>
needs some consistent way of taking a qualification in your query,
looking at the operator, and then deciding if a usable index exists. This
implies that <productname>PostgreSQL</productname> needs to know, for
example, that the <literal>&lt;=</> and <literal>&gt;</> operators partition a
B-tree. <productname>PostgreSQL</productname>
uses <firstterm>strategies</firstterm> to express these relationships between
operators and the way they can be used to scan indexes.
</para>
<para>
Defining a new set of strategies is beyond the scope of this
discussion, but we'll explain how B-tree strategies work because
you'll need to know that to add a new B-tree operator class. In the
<classname>pg_am</classname> table, the
<structfield>amstrategies</structfield> column sets the number of
strategies defined for this access method. For B-trees, this number
is 5. The meanings of these strategies are shown in <xref
linkend="xindex-btree-table">.
</para>
<table tocentry="1" id="xindex-btree-table">
<title>B-tree Strategies</title>
<titleabbrev>B-tree</titleabbrev>
<tgroup cols="2">
<thead>
<row>
<entry>Operation</entry>
<entry>Strategy Number</entry>
</row>
</thead>
<tbody>
<row>
<entry>less than</entry>
<entry>1</entry>
</row>
<row>
<entry>less than or equal</entry>
<entry>2</entry>
</row>
<row>
<entry>equal</entry>
<entry>3</entry>
</row>
<row>
<entry>greater than or equal</entry>
<entry>4</entry>
</row>
<row>
<entry>greater than</entry>
<entry>5</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
The idea is that you'll need to add operators corresponding to these strategies
to the <classname>pg_amop</classname> relation (see below).
The access method code can use these strategy numbers, regardless of data
type, to figure out how to partition the B-tree,
compute selectivity, and so on. Don't worry about the details of adding
operators yet; just understand that there must be a set of these
operators for <type>int2</>, <type>int4</>, <type>oid</>, and all other
data types on which a B-tree can operate.
</para>
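<para>
 To make the role of strategy numbers concrete, the following sketch
 shows how type-independent code could interpret them. The enumeration
 names and the function <function>strategy_satisfied</function> are
 invented for this illustration and are not the identifiers used in the
 <productname>PostgreSQL</productname> sources; only the numbers 1
 through 5 come from the table above.
</para>

```c
#include <stdbool.h>

/* B-tree strategy numbers, as listed in the table above.
 * The names are hypothetical; only the numeric values matter. */
enum {
    BT_LESS          = 1,
    BT_LESS_EQUAL    = 2,
    BT_EQUAL         = 3,
    BT_GREATER_EQUAL = 4,
    BT_GREATER       = 5
};

/*
 * Decide whether an index key satisfies a scan condition.
 * cmp is the result of a three-way comparison of the index key with
 * the scan value: negative, zero, or positive.
 * Unknown strategy numbers are treated as "not satisfied".
 */
bool strategy_satisfied(int strategy, int cmp)
{
    switch (strategy) {
        case BT_LESS:          return cmp < 0;
        case BT_LESS_EQUAL:    return cmp <= 0;
        case BT_EQUAL:         return cmp == 0;
        case BT_GREATER_EQUAL: return cmp >= 0;
        case BT_GREATER:       return cmp > 0;
        default:               return false;
    }
}
```

<para>
 Nothing in this function depends on the data type being indexed, which
 is the point of the strategy numbering: the data type supplies the
 comparison, and the strategy number says how to interpret it.
</para>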
</sect1>
<sect1 id="xindex-support">
<title>Access Method Support Routines</title>
<para>
Sometimes, strategies aren't enough information for the system to figure
out how to use an index. Some access methods require additional support
routines in order to work. For example, the B-tree
access method must be able to compare two keys and determine whether one
is greater than, equal to, or less than the other. Similarly, the
R-tree access method must be able to compute
intersections, unions, and sizes of rectangles. These
operations do not correspond to operators used in qualifications in
SQL queries; they are administrative routines used by
the access methods, internally.
</para>
<para>
In order to manage diverse support routines consistently across all
<productname>PostgreSQL</productname> access methods,
<classname>pg_am</classname> includes a column called
<structfield>amsupport</structfield>. This column records the
number of support routines used by an access method. For B-trees,
this number is one: the routine to take two keys and return -1, 0,
or +1, depending on whether the first key is less than, equal to,
or greater than the second. (Strictly speaking, this routine can
return a negative number (&lt; 0), zero, or a non-zero positive
number (&gt; 0).)
</para>
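<para>
 A minimal sketch of such a comparison routine, written as plain C for
 the absolute-value ordering of complex numbers used in this chapter,
 might look as follows. The structure layout (real part
 <literal>x</literal>, imaginary part <literal>y</literal>) and the
 name <function>complex_abs_cmp_sketch</function> are assumptions made
 for illustration; this is not the routine from the tutorial sources.
</para>

```c
/* Assumed layout: real part x, imaginary part y. */
typedef struct {
    double x;
    double y;
} Complex;

/* Squared magnitude. Squaring preserves the ordering of magnitudes
 * because magnitudes are never negative. */
static double mag_squared(const Complex *c)
{
    return c->x * c->x + c->y * c->y;
}

/*
 * Three-way comparison by absolute value.
 * Returns -1, 0, or +1 when a is less than, equal to, or greater
 * than b. Callers should rely only on the sign of the result.
 */
int complex_abs_cmp_sketch(const Complex *a, const Complex *b)
{
    double amag = mag_squared(a);
    double bmag = mag_squared(b);

    if (amag < bmag)
        return -1;
    if (amag > bmag)
        return 1;
    return 0;
}
```

<para>
 Note that two different complex numbers, such as 3+4i and 0+5i,
 compare as equal under this ordering because they have the same
 absolute value.
</para>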
<para>
The <structfield>amstrategies</structfield> entry in
<classname>pg_am</classname> is just the number of strategies
defined for the access method in question. The operators for less
than, less than or equal, and so on don't appear in
<classname>pg_am</classname>. Similarly,
<structfield>amsupport</structfield> is just the number of support
routines required by the access method. The actual routines are
listed elsewhere.
</para>
<para>
By the way, the <structfield>amorderstrategy</structfield> column tells whether
the access method supports ordered scans. Zero means it doesn't; if it
does, <structfield>amorderstrategy</structfield> is the strategy number
that corresponds to the ordering operator. For example, B-tree
has <structfield>amorderstrategy</structfield> = 1, which is its
<quote>less than</quote> strategy number.
</para>
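<para>
 The connection between a <quote>less than</quote> operator and a sort
 order can be sketched as follows: a sort routine that is handed only a
 less-than function has everything it needs to order the keys. The type
 <literal>less_than_fn</literal> and both function names are
 hypothetical, and plain <literal>double</literal> keys stand in for an
 arbitrary indexed data type.
</para>

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical signature for a "less than" operator on the key type. */
typedef bool (*less_than_fn)(double a, double b);

/*
 * Insertion sort that consults nothing but the supplied less-than
 * operator, so the operator alone determines the resulting order.
 */
void sort_by_less_than(double *keys, size_t n, less_than_fn lt)
{
    for (size_t i = 1; i < n; i++) {
        double key = keys[i];
        size_t j = i;

        /* Shift larger elements right until key's slot is found. */
        while (j > 0 && lt(key, keys[j - 1])) {
            keys[j] = keys[j - 1];
            j--;
        }
        keys[j] = key;
    }
}

/* Example operator: order by ascending absolute value. */
bool abs_less_than(double a, double b)
{
    double abs_a = (a < 0) ? -a : a;
    double abs_b = (b < 0) ? -b : b;
    return abs_a < abs_b;
}
```

<para>
 Sorting the keys 3, -1, -4, 2 with
 <function>abs_less_than</function> yields -1, 2, 3, -4. Supplying a
 different less-than function would yield a different order from the
 same sort routine.
</para>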
</sect1>
<sect1 id="xindex-opclass">
<title>Operator Classes</title>
<para>
The next table of interest is <classname>pg_opclass</classname>. This table
defines operator class names and input data types for each of the operator
classes supported by a given index access method. The same class name
can be used for several different access methods (for example, both B-tree
and hash access methods have operator classes named
<literal>oid_ops</literal>), but a separate
<filename>pg_opclass</filename> row must appear for each access method.
The OID of the <classname>pg_opclass</classname> row is
used as a foreign
key in other tables to associate specific operators and support routines
with the operator class.
</para>
<para>
You need to add a row with your operator class name (for example,
<literal>complex_abs_ops</literal>) to
<classname>pg_opclass</classname>:
<programlisting>
INSERT INTO pg_opclass (opcamid, opcname, opcnamespace, opcowner, opcintype, opcdefault, opckeytype)
VALUES (
(SELECT oid FROM pg_am WHERE amname = 'btree'),
'complex_abs_ops',
(SELECT oid FROM pg_namespace WHERE nspname = 'pg_catalog'),
1, -- UID of superuser is hardwired to 1 as of PG 7.3
(SELECT oid FROM pg_type WHERE typname = 'complex'),
true,
0);
SELECT oid, *
FROM pg_opclass
WHERE opcname = 'complex_abs_ops';
oid | opcamid | opcname | opcnamespace | opcowner | opcintype | opcdefault | opckeytype
--------+---------+-----------------+--------------+----------+-----------+------------+------------
277975 | 403 | complex_abs_ops | 11 | 1 | 277946 | t | 0
(1 row)
</programlisting>
Note that the OID for your <classname>pg_opclass</classname> row will
be different! Don't worry about this though. We'll get this number
from the system later just like we got the OID of the type here.
</para>
<para>
The above example assumes that you want to make this new operator class the
default B-tree operator class for the <type>complex</type> data type.
If you don't, just set <structfield>opcdefault</structfield> to false instead.
<structfield>opckeytype</structfield> is not described here; it should always
be zero for B-tree operator classes.
</para>
</sect1>
<sect1 id="xindex-operators">
<title>Creating the Operators and Support Routines</title>
<para>
So now we have an access method and an operator class.
We still need a set of operators. The procedure for
defining operators was discussed in <xref linkend="xoper">.
For the <literal>complex_abs_ops</literal> operator class on B-trees,
the operators we require are:
<itemizedlist spacing="compact">
<listitem><simpara>absolute-value less-than (strategy 1)</></>
<listitem><simpara>absolute-value less-than-or-equal (strategy 2)</></>
<listitem><simpara>absolute-value equal (strategy 3)</></>
<listitem><simpara>absolute-value greater-than-or-equal (strategy 4)</></>
<listitem><simpara>absolute-value greater-than (strategy 5)</></>
</itemizedlist>
</para>
<para>
Suppose the code that implements these functions
is stored in the file
<filename><replaceable>PGROOT</replaceable>/src/tutorial/complex.c</filename>,
which we have compiled into
<filename><replaceable>PGROOT</replaceable>/src/tutorial/complex.so</filename>.
Part of the C code looks like this:
<programlisting>
#define Mag(c) ((c)-&gt;x*(c)-&gt;x + (c)-&gt;y*(c)-&gt;y)
bool
complex_abs_eq(Complex *a, Complex *b)
{
double amag = Mag(a), bmag = Mag(b);
return (amag==bmag);
}
</programlisting>
(Note that we will only show the equality operator for the rest of
the examples. The other four operators are very similar. Refer to
<filename>complex.c</filename> or
<filename>complex.source</filename> for the details.)
</para>
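<para>
 For readers without the tutorial sources at hand, the four functions
 not shown could be sketched along the same lines as
 <function>complex_abs_eq</function>. The function names below simply
 follow the naming pattern of the equality function, and the
 <literal>Complex</literal> layout is assumed from the members used by
 the <literal>Mag</literal> macro; consult
 <filename>complex.c</filename> for the actual definitions.
</para>

```c
#include <stdbool.h>

/* Assumed layout, inferred from the x and y members used by Mag(). */
typedef struct {
    double x;
    double y;
} Complex;

/* Squared magnitude, repeated here so the sketch stands alone. */
#define Mag(c) ((c)->x*(c)->x + (c)->y*(c)->y)

/* absolute-value less-than (strategy 1) */
bool complex_abs_lt(Complex *a, Complex *b)
{
    return Mag(a) < Mag(b);
}

/* absolute-value less-than-or-equal (strategy 2) */
bool complex_abs_le(Complex *a, Complex *b)
{
    return Mag(a) <= Mag(b);
}

/* absolute-value greater-than-or-equal (strategy 4) */
bool complex_abs_ge(Complex *a, Complex *b)
{
    return Mag(a) >= Mag(b);
}

/* absolute-value greater-than (strategy 5) */
bool complex_abs_gt(Complex *a, Complex *b)
{
    return Mag(a) > Mag(b);
}
```

<para>
 Whatever the exact code, the five functions must agree with one
 another: for any pair of values, exactly one of less-than, equal, and
 greater-than should hold.
</para>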
<para>
We make the function known to <productname>PostgreSQL</productname> like this:
<programlisting>
CREATE FUNCTION complex_abs_eq(complex, complex) RETURNS boolean
AS '<replaceable>PGROOT</replaceable>/src/tutorial/complex'
LANGUAGE C;
</programlisting>
</para>
<para>
There are some important things that are happening here:
<itemizedlist>
<listitem>
<para>
First, note that operators for less-than, less-than-or-equal, equal,
greater-than-or-equal, and greater-than for <filename>complex</filename>
are being defined. We can only have one operator named, say, = and
taking type <filename>complex</filename> for both operands. In this case
we don't have any other operator = for <filename>complex</filename>,
but if we were building a practical data type we'd probably want = to
be the ordinary equality operation for complex numbers. In that case,
we'd need to use some other operator name for <function>complex_abs_eq</>.
</para>
</listitem>
<listitem>
<para>
Second, although <productname>PostgreSQL</productname> can cope with operators having
the same name as long as they have different input data types, C can only
cope with one global routine having a given name, period. So we shouldn't
name the C function something simple like <filename>abs_eq</filename>.
Usually it's a good practice to include the data type name in the C
function name, so as not to conflict with functions for other data types.
</para>
</listitem>
<listitem>
<para>
Third, we could have made the <productname>PostgreSQL</productname> name of the function
<filename>abs_eq</filename>, relying on <productname>PostgreSQL</productname> to distinguish it
by input data types from any other <productname>PostgreSQL</productname> function of the same name.
To keep the example simple, we give the function the same name
at the C level and the <productname>PostgreSQL</productname> level.
</para>
</listitem>
<listitem>
<para>
Finally, note that these operator functions return Boolean values.
In practice, all operators defined as index access method
strategies must return type <type>boolean</type>, since they must
appear at the top level of a <literal>WHERE</> clause to be used with an index.
(On the other hand, the support function returns whatever the
particular access method expects -- in this case, a signed
integer.)
</para>
</listitem>
</itemizedlist>
</para>
<para>
The final routine in the file is the <quote>support routine</quote>
mentioned when we discussed the <structfield>amsupport</> column of the
<classname>pg_am</classname> table. We will use this later on. For
now, ignore it.
</para>
<para>
Now we are ready to define the operators:
<programlisting>
CREATE OPERATOR = (
leftarg = complex, rightarg = complex,
procedure = complex_abs_eq,
restrict = eqsel, join = eqjoinsel
);
</programlisting>
The important
things here are the procedure names (which are the C
functions defined above) and the restriction and join selectivity
functions. You should just use the selectivity functions used in
the example (see <filename>complex.source</filename>).
Note that there
are different such functions for the less-than, equal, and greater-than
cases. These must be supplied or the optimizer will be unable to
make effective use of the index.
</para>
<para>
The next step is to add entries for these operators to
the <classname>pg_amop</classname> relation. To do this,
we'll need the OIDs of the operators we just
defined. We'll look up the names of all the operators that take
two operands of type <type>complex</type>, and pick ours out:
<screen>
SELECT o.oid AS opoid, o.oprname
INTO TEMP TABLE complex_ops_tmp
FROM pg_operator o, pg_type t
WHERE o.oprleft = t.oid and o.oprright = t.oid
and t.typname = 'complex';
SELECT * FROM complex_ops_tmp;
opoid | oprname
--------+---------
277963 | +
277970 | &lt;
277971 | &lt;=
277972 | =
277973 | &gt;=
277974 | &gt;
(6 rows)
</screen>
(Again, some of your OID numbers will almost
certainly be different.) The operators we are interested in are those
with OIDs 277970 through 277974. Rather than copying these values by
hand, the commands below look them up in
<classname>complex_ops_tmp</classname> with a select statement.
</para>
<para>
Now we are ready to insert entries into <classname>pg_amop</classname> for
our new operator class. These entries must associate the correct
B-tree strategy numbers with each of the operators we need.
The command to insert the less-than operator looks like:
<programlisting>
INSERT INTO pg_amop (amopclaid, amopstrategy, amopreqcheck, amopopr)
SELECT opcl.oid, 1, false, c.opoid
FROM pg_opclass opcl, complex_ops_tmp c
WHERE
opcamid = (SELECT oid FROM pg_am WHERE amname = 'btree') AND
opcname = 'complex_abs_ops' AND
c.oprname = '&lt;';
</programlisting>
Now do this for the other operators, substituting for the <literal>1</> in the
second line above and the <literal>&lt;</> in the last line. Note the order:
<quote>less than</> is 1, <quote>less than or equal</> is 2,
<quote>equal</> is 3, <quote>greater than or equal</quote> is 4, and
<quote>greater than</quote> is 5.
</para>
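<para>
 After all five rows are inserted, the operator class amounts to a
 mapping from operators to strategy numbers. The sketch below models
 that mapping in memory. It is purely illustrative: the
 <literal>AmopEntry</literal> type, the table name, and the lookup
 function are hypothetical, and the real
 <classname>pg_amop</classname> rows identify operators by OID rather
 than by name.
</para>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory picture of one pg_amop row for this class. */
typedef struct {
    const char *oprname;   /* operator name (the catalog stores an OID) */
    int         strategy;  /* B-tree strategy number, 1 through 5 */
} AmopEntry;

/* The five entries that complex_abs_ops should end up with. */
static const AmopEntry complex_abs_ops_sketch[] = {
    { "<",  1 },
    { "<=", 2 },
    { "=",  3 },
    { ">=", 4 },
    { ">",  5 }
};

/*
 * Return the strategy number registered for an operator, or 0 if the
 * operator is not a member of the operator class.
 */
int strategy_for_operator(const char *oprname)
{
    size_t n = sizeof complex_abs_ops_sketch
             / sizeof complex_abs_ops_sketch[0];

    for (size_t i = 0; i < n; i++)
        if (strcmp(complex_abs_ops_sketch[i].oprname, oprname) == 0)
            return complex_abs_ops_sketch[i].strategy;
    return 0;
}
```

<para>
 An operator such as <literal>+</literal>, which takes two
 <type>complex</type> operands but has no row in the class, maps to no
 strategy and therefore cannot be used to scan the index.
</para>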
<para>
The field <filename>amopreqcheck</filename> is not discussed here; it
should always be false for B-tree operators.
</para>
<para>
The final step is the registration of the <quote>support routine</quote> previously
described in our discussion of <classname>pg_am</classname>. The
OID of this support routine is stored in the
<classname>pg_amproc</classname> table, keyed by the operator class
OID and the support routine number.
</para>
<para>
First, we need to register the function in
<productname>PostgreSQL</productname> (recall that we put the
C code that implements this routine at the bottom of
the file in which we implemented the operator routines):
<programlisting>
CREATE FUNCTION complex_abs_cmp(complex, complex)
RETURNS integer
AS '<replaceable>PGROOT</replaceable>/src/tutorial/complex'
LANGUAGE C;
SELECT oid, proname FROM pg_proc
WHERE proname = 'complex_abs_cmp';
oid | proname
--------+-----------------
277997 | complex_abs_cmp
(1 row)
</programlisting>
(Again, your OID number will probably be different.)
</para>
<para>
We can add the new row as follows:
<programlisting>
INSERT INTO pg_amproc (amopclaid, amprocnum, amproc)
SELECT opcl.oid, 1, p.oid
FROM pg_opclass opcl, pg_proc p
WHERE
opcamid = (SELECT oid FROM pg_am WHERE amname = 'btree') AND
opcname = 'complex_abs_ops' AND
p.proname = 'complex_abs_cmp';
</programlisting>
</para>
<para>
And we're done! (Whew.) It should now be possible to create
and use B-tree indexes on <type>complex</type> columns.
</para>
</sect1>
</chapter>
<!-- Keep this comment at the end of the file
Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/lib/sgml/catalog")
sgml-local-ecat-files:nil
End:
-->