mirror of
https://git.postgresql.org/git/postgresql.git
synced 2024-10-03 02:06:52 +02:00
501 lines
18 KiB
Plaintext
501 lines
18 KiB
Plaintext
<!--
|
|
$Header: /cvsroot/pgsql/doc/src/sgml/xindex.sgml,v 1.10 2000/03/28 02:59:55 tgl Exp $
|
|
Postgres documentation
|
|
-->
|
|
|
|
<chapter id="xindex">
|
|
<title>Interfacing Extensions To Indices</title>
|
|
|
|
<para>
|
|
The procedures described thus far let you define a new type, new
|
|
functions and new operators. However, we cannot yet define a secondary
|
|
index (such as a <acronym>B-tree</acronym>, <acronym>R-tree</acronym> or
|
|
hash access method) over a new type or its operators.
|
|
</para>
|
|
|
|
<para>
|
|
Look back at
|
|
<xref endterm="EXTEND-CATALOGS" linkend="EXTEND-CATALOGS">.
|
|
The right half shows the catalogs that we must modify in order to tell
|
|
<productname>Postgres</productname> how to use a user-defined type and/or
|
|
user-defined operators with an index (i.e., <filename>pg_am, pg_amop,
|
|
pg_amproc, pg_operator</filename> and <filename>pg_opclass</filename>).
|
|
Unfortunately, there is no simple command to do this. We will demonstrate
|
|
how to modify these catalogs through a running example: a new operator
|
|
class for the <acronym>B-tree</acronym> access method that stores and
|
|
sorts complex numbers in ascending absolute value order.
|
|
</para>
|
|
|
|
<para>
|
|
The <filename>pg_am</filename> class contains one instance for every user
|
|
defined access method. Support for the heap access method is built into
|
|
<productname>Postgres</productname>, but every other access method is
|
|
described here. The schema is
|
|
|
|
<table tocentry="1">
|
|
<title>Index Schema</title>
|
|
<titleabbrev>Indices</titleabbrev>
|
|
<tgroup cols="2">
|
|
<thead>
|
|
<row>
|
|
<entry>Attribute</entry>
|
|
<entry>Description</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody>
|
|
<row>
|
|
<entry>amname</entry>
|
|
<entry>name of the access method</entry>
|
|
</row>
|
|
<row>
|
|
<entry>amowner</entry>
|
|
<entry>object id of the owner's instance in pg_user</entry>
|
|
</row>
|
|
<row>
|
|
<entry>amstrategies</entry>
|
|
<entry>number of strategies for this access method (see below)</entry>
|
|
</row>
|
|
<row>
|
|
<entry>amsupport</entry>
|
|
<entry>number of support routines for this access method (see below)</entry>
|
|
</row>
|
|
<row>
|
|
<entry>amorderstrategy</entry>
|
|
<entry>zero if the index offers no sort order, otherwise the strategy
|
|
number of the strategy operator that describes the sort order</entry>
|
|
</row>
|
|
<row>
|
|
<entry>amgettuple</entry>
|
|
</row>
|
|
<row>
|
|
<entry>aminsert</entry>
|
|
</row>
|
|
<row>
|
|
<entry>...</entry>
|
|
<entry>procedure identifiers for interface routines to the access
|
|
method. For example, regproc ids for opening, closing, and
|
|
getting instances from the access method appear here.</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
</para>
|
|
|
|
<para>
|
|
The <acronym>object ID</acronym> of the instance in
|
|
<filename>pg_am</filename> is used as a foreign key in lots of other
|
|
classes. You don't need to add a new instance to this class; all
|
|
you're interested in is the <acronym>object ID</acronym> of the access
|
|
method instance you want to extend:
|
|
|
|
<programlisting>
|
|
SELECT oid FROM pg_am WHERE amname = 'btree';
|
|
|
|
oid
|
|
-----
|
|
403
|
|
(1 row)
|
|
</programlisting>
|
|
|
|
We will use that <command>SELECT</command> in a <command>WHERE</command>
|
|
clause later.
|
|
</para>
|
|
|
|
<para>
|
|
The <filename>amstrategies</filename> attribute exists to standardize
|
|
comparisons across data types. For example, <acronym>B-tree</acronym>s
|
|
impose a strict ordering on keys, lesser to greater. Since
|
|
<productname>Postgres</productname> allows the user to define operators,
|
|
<productname>Postgres</productname> cannot look at the name of an operator
|
|
(eg, ">" or "<") and tell what kind of comparison it is. In fact,
|
|
some access methods don't impose any ordering at all. For example,
|
|
<acronym>R-tree</acronym>s express a rectangle-containment relationship,
|
|
whereas a hashed data structure expresses only bitwise similarity based
|
|
on the value of a hash function. <productname>Postgres</productname>
|
|
needs some consistent way of taking a qualification in your query,
|
|
looking at the operator and then deciding if a usable index exists. This
|
|
implies that <productname>Postgres</productname> needs to know, for
|
|
example, that the "<=" and ">" operators partition a
|
|
<acronym>B-tree</acronym>. <productname>Postgres</productname>
|
|
uses strategies to express these relationships between
|
|
operators and the way they can be used to scan indices.
|
|
</para>
|
|
|
|
<para>
|
|
Defining a new set of strategies is beyond the scope of this discussion,
|
|
but we'll explain how <acronym>B-tree</acronym> strategies work because
|
|
you'll need to know that to add a new operator class. In the
|
|
<filename>pg_am</filename> class, the amstrategies attribute is the
|
|
number of strategies defined for this access method. For
|
|
<acronym>B-tree</acronym>s, this number is 5. These strategies
|
|
correspond to
|
|
|
|
<table tocentry="1">
|
|
<title>B-tree Strategies</title>
|
|
<titleabbrev>B-tree</titleabbrev>
|
|
<tgroup cols="2">
|
|
<thead>
|
|
<row>
|
|
<entry>Operation</entry>
|
|
<entry>Index</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody>
|
|
<row>
|
|
<entry>less than</entry>
|
|
<entry>1</entry>
|
|
</row>
|
|
<row>
|
|
<entry>less than or equal</entry>
|
|
<entry>2</entry>
|
|
</row>
|
|
<row>
|
|
<entry>equal</entry>
|
|
<entry>3</entry>
|
|
</row>
|
|
<row>
|
|
<entry>greater than or equal</entry>
|
|
<entry>4</entry>
|
|
</row>
|
|
<row>
|
|
<entry>greater than</entry>
|
|
<entry>5</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
</para>
|
|
|
|
<para>
|
|
The idea is that you'll need to add procedures corresponding to the
|
|
comparisons above to the <filename>pg_amop</filename> relation (see below).
|
|
The access method code can use these strategy numbers, regardless of data
|
|
type, to figure out how to partition the <acronym>B-tree</acronym>,
|
|
compute selectivity, and so on. Don't worry about the details of adding
|
|
procedures yet; just understand that there must be a set of these
|
|
procedures for <filename>int2, int4, oid,</filename> and every other
|
|
data type on which a <acronym>B-tree</acronym> can operate.
|
|
</para>
|
|
|
|
<para>
|
|
Sometimes, strategies aren't enough information for the system to figure
|
|
out how to use an index. Some access methods require other support
|
|
routines in order to work. For example, the <acronym>B-tree</acronym>
|
|
access method must be able to compare two keys and determine whether one
|
|
is greater than, equal to, or less than the other. Similarly, the
|
|
<acronym>R-tree</acronym> access method must be able to compute
|
|
intersections, unions, and sizes of rectangles. These
|
|
operations do not correspond to user qualifications in
|
|
SQL queries; they are administrative routines used by
|
|
the access methods, internally.
|
|
</para>
|
|
|
|
<para>
|
|
In order to manage diverse support routines consistently across all
|
|
<productname>Postgres</productname> access methods,
|
|
<filename>pg_am</filename> includes an attribute called
|
|
<filename>amsupport</filename>. This attribute records the number of
|
|
support routines used by an access method. For <acronym>B-tree</acronym>s,
|
|
this number is one -- the routine to take two keys and return -1, 0, or
|
|
+1, depending on whether the first key is less than, equal
|
|
to, or greater than the second.
|
|
|
|
<note>
|
|
<para>
|
|
Strictly speaking, this routine can return a negative
|
|
number (< 0), 0, or a non-zero positive number (> 0).
|
|
</para>
|
|
</note>
|
|
</para>
|
|
|
|
<para>
|
|
The <filename>amstrategies</filename> entry in <filename>pg_am</filename>
|
|
is just the number
|
|
of strategies defined for the access method in question. The procedures
|
|
for less than, less equal, and so on don't appear in
|
|
<filename>pg_am</filename>. Similarly, <filename>amsupport</filename>
|
|
is just the number of support routines required by the access
|
|
method. The actual routines are listed elsewhere.
|
|
</para>
|
|
|
|
<para>
|
|
By the way, the <filename>amorderstrategy</filename> entry tells whether
|
|
the access method supports ordered scan. Zero means it doesn't; if it
|
|
does, <filename>amorderstrategy</filename> is the number of the strategy
|
|
routine that corresponds to the ordering operator. For example, btree
|
|
has <filename>amorderstrategy</filename> = 1 which is its
|
|
"less than" strategy number.
|
|
</para>
|
|
|
|
<para>
|
|
The next class of interest is <filename>pg_opclass</filename>. This class
|
|
exists only to associate an operator class name and perhaps a default type
|
|
with an operator class oid. Some existing opclasses are <filename>int2_ops,
|
|
int4_ops,</filename> and <filename>oid_ops</filename>. You need to add an
|
|
instance with your opclass name (for example,
|
|
<filename>complex_abs_ops</filename>) to
|
|
<filename>pg_opclass</filename>. The <filename>oid</filename> of
|
|
this instance will be a foreign key in other classes, notably
|
|
<filename>pg_amop</filename>.
|
|
|
|
<programlisting>
|
|
INSERT INTO pg_opclass (opcname, opcdeftype)
|
|
SELECT 'complex_abs_ops', oid FROM pg_type WHERE typname = 'complex';
|
|
|
|
SELECT oid, opcname, opcdeftype
|
|
FROM pg_opclass
|
|
WHERE opcname = 'complex_abs_ops';
|
|
|
|
oid | opcname | opcdeftype
|
|
--------+-----------------+------------
|
|
277975 | complex_abs_ops | 277946
|
|
(1 row)
|
|
</programlisting>
|
|
|
|
Note that the oid for your <filename>pg_opclass</filename> instance will
|
|
be different! Don't worry about this though. We'll get this number
|
|
from the system later just like we got the oid of the type here.
|
|
</para>
|
|
|
|
<para>
|
|
The above example assumes that you want to make this new opclass the
|
|
default index opclass for the <filename>complex</filename> datatype.
|
|
If you don't, just insert zero into <filename>opcdeftype</filename>,
|
|
rather than inserting the datatype's oid:
|
|
|
|
<programlisting>
|
|
INSERT INTO pg_opclass (opcname, opcdeftype) VALUES ('complex_abs_ops', 0);
|
|
</programlisting>
|
|
|
|
</para>
|
|
|
|
<para>
|
|
So now we have an access method and an operator class.
|
|
We still need a set of operators. The procedure for
|
|
defining operators was discussed earlier in this manual.
|
|
For the <filename>complex_abs_ops</filename> operator class on Btrees,
|
|
the operators we require are:
|
|
|
|
<programlisting>
|
|
absolute value less-than
|
|
absolute value less-than-or-equal
|
|
absolute value equal
|
|
absolute value greater-than-or-equal
|
|
absolute value greater-than
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
Suppose the code that implements the functions defined
|
|
is stored in the file
|
|
<filename>PGROOT/src/tutorial/complex.c</filename>
|
|
</para>
|
|
|
|
<para>
|
|
Part of the C code looks like this: (note that we will only show the
|
|
equality operator for the rest of the examples. The other four
|
|
operators are very similar. Refer to <filename>complex.c</filename>
|
|
or <filename>complex.source</filename> for the details.)
|
|
|
|
<programlisting>
|
|
#define Mag(c) ((c)->x*(c)->x + (c)->y*(c)->y)
|
|
|
|
bool
|
|
complex_abs_eq(Complex *a, Complex *b)
|
|
{
|
|
double amag = Mag(a), bmag = Mag(b);
|
|
return (amag==bmag);
|
|
}
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
We make the function known to Postgres like this:
|
|
<programlisting>
|
|
CREATE FUNCTION complex_abs_eq(complex, complex)
|
|
RETURNS bool
|
|
AS 'PGROOT/tutorial/obj/complex.so'
|
|
LANGUAGE 'c';
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
There are some important things that are happening here.
|
|
</para>
|
|
|
|
<para>
|
|
First, note that operators for less-than, less-than-or-equal, equal,
|
|
greater-than-or-equal, and greater-than for <filename>complex</filename>
|
|
are being defined. We can only have one operator named, say, = and
|
|
taking type <filename>complex</filename> for both operands. In this case
|
|
we don't have any other operator = for <filename>complex</filename>,
|
|
but if we were building a practical datatype we'd probably want = to
|
|
be the ordinary equality operation for complex numbers. In that case,
|
|
we'd need to use some other operator name for complex_abs_eq.
|
|
</para>
|
|
|
|
<para>
|
|
Second, although Postgres can cope with operators having
|
|
the same name as long as they have different input datatypes, C can only
|
|
cope with one global routine having a given name, period. So we shouldn't
|
|
name the C function something simple like <filename>abs_eq</filename>.
|
|
Usually it's a good practice to include the datatype name in the C
|
|
function name, so as not to conflict with functions for other datatypes.
|
|
</para>
|
|
|
|
<para>
|
|
Third, we could have made the Postgres name of the function
|
|
<filename>abs_eq</filename>, relying on Postgres to distinguish it
|
|
by input datatypes from any other Postgres function of the same name.
|
|
To keep the example simple, we make the function have the same names
|
|
at the C level and Postgres level.
|
|
</para>
|
|
|
|
<para>
|
|
Finally, note that these operator functions return Boolean values.
|
|
The access methods rely on this fact. (On the other
|
|
hand, the support function returns whatever the particular access method
|
|
expects -- in this case, a signed integer.) The final routine in the
|
|
file is the "support routine" mentioned when we discussed the amsupport
|
|
attribute of the <filename>pg_am</filename> class. We will use this
|
|
later on. For now, ignore it.
|
|
</para>
|
|
|
|
<para>
|
|
Now we are ready to define the operators:
|
|
|
|
<programlisting>
|
|
CREATE OPERATOR = (
|
|
leftarg = complex, rightarg = complex,
|
|
procedure = complex_abs_eq,
|
|
restrict = eqsel, join = eqjoinsel
|
|
)
|
|
</programlisting>
|
|
|
|
The important
|
|
things here are the procedure names (which are the <acronym>C</acronym>
|
|
functions defined above) and the restriction and join selectivity
|
|
functions. You should just use the selectivity functions used in
|
|
the example (see <filename>complex.source</filename>).
|
|
Note that there
|
|
are different such functions for the less-than, equal, and greater-than
|
|
cases. These must be supplied, or the optimizer will be unable to
|
|
make effective use of the index.
|
|
</para>
|
|
|
|
<para>
|
|
The next step is to add entries for these operators to
|
|
the <filename>pg_amop</filename> relation. To do this,
|
|
we'll need the <filename>oid</filename>s of the operators we just
|
|
defined. We'll look up the names of all the operators that take
|
|
two <filename>complex</filename>es, and pick ours out:
|
|
|
|
<programlisting>
|
|
SELECT o.oid AS opoid, o.oprname
|
|
INTO TABLE complex_ops_tmp
|
|
FROM pg_operator o, pg_type t
|
|
WHERE o.oprleft = t.oid and o.oprright = t.oid
|
|
and t.typname = 'complex';
|
|
|
|
opoid | oprname
|
|
--------+---------
|
|
277963 | +
|
|
277970 | <
|
|
277971 | <=
|
|
277972 | =
|
|
277973 | >=
|
|
277974 | >
|
|
(6 rows)
|
|
</programlisting>
|
|
|
|
(Again, some of your <filename>oid</filename> numbers will almost
|
|
certainly be different.) The operators we are interested in are those
|
|
with <filename>oid</filename>s 277970 through 277974. The values you
|
|
get will probably be different, and you should substitute them for the
|
|
values below. We will do this with a select statement.
|
|
</para>
|
|
|
|
<para>
|
|
Now we're ready to update <filename>pg_amop</filename> with our new
|
|
operator class. The most important thing in this entire discussion
|
|
is that the operators are ordered, from less than through greater
|
|
than, in <filename>pg_amop</filename>. We add the instances we need:
|
|
|
|
<programlisting>
|
|
INSERT INTO pg_amop (amopid, amopclaid, amopopr, amopstrategy)
|
|
SELECT am.oid, opcl.oid, c.opoid, 1
|
|
FROM pg_am am, pg_opclass opcl, complex_ops_tmp c
|
|
WHERE amname = 'btree' AND
|
|
opcname = 'complex_abs_ops' AND
|
|
c.oprname = '<';
|
|
</programlisting>
|
|
|
|
Now do this for the other operators substituting for the "1" in the
|
|
third line above and the "<" in the last line. Note the order:
|
|
"less than" is 1, "less than or equal" is 2, "equal" is 3, "greater
|
|
than or equal" is 4, and "greater than" is 5.
|
|
</para>
|
|
|
|
<para>
|
|
The next step is registration of the "support routine" previously
|
|
described in our discussion of <filename>pg_am</filename>. The
|
|
<filename>oid</filename> of this support routine is stored in the
|
|
<filename>pg_amproc</filename> class, keyed by the access method
|
|
<filename>oid</filename> and the operator class <filename>oid</filename>.
|
|
First, we need to register the function in
|
|
<productname>Postgres</productname> (recall that we put the
|
|
<acronym>C</acronym> code that implements this routine in the bottom of
|
|
the file in which we implemented the operator routines):
|
|
|
|
<programlisting>
|
|
CREATE FUNCTION complex_abs_cmp(complex, complex)
|
|
RETURNS int4
|
|
AS 'PGROOT/tutorial/obj/complex.so'
|
|
LANGUAGE 'c';
|
|
|
|
SELECT oid, proname FROM pg_proc
|
|
WHERE proname = 'complex_abs_cmp';
|
|
|
|
oid | proname
|
|
--------+-----------------
|
|
277997 | complex_abs_cmp
|
|
(1 row)
|
|
</programlisting>
|
|
|
|
(Again, your <filename>oid</filename> number will probably be different.)
|
|
We can add the new instance as follows:
|
|
|
|
<programlisting>
|
|
INSERT INTO pg_amproc (amid, amopclaid, amproc, amprocnum)
|
|
SELECT a.oid, b.oid, c.oid, 1
|
|
FROM pg_am a, pg_opclass b, pg_proc c
|
|
WHERE a.amname = 'btree' AND
|
|
b.opcname = 'complex_abs_ops' AND
|
|
c.proname = 'complex_abs_cmp';
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
And we're done! (Whew.) It should now be possible to create
|
|
and use btree indexes on <filename>complex</filename> columns.
|
|
</para>
|
|
|
|
</chapter>
|
|
|
|
<!-- Keep this comment at the end of the file
|
|
Local variables:
|
|
mode: sgml
|
|
sgml-omittag:nil
|
|
sgml-shorttag:t
|
|
sgml-minimize-attributes:nil
|
|
sgml-always-quote-attributes:t
|
|
sgml-indent-step:1
|
|
sgml-indent-data:t
|
|
sgml-parent-document:nil
|
|
sgml-default-dtd-file:"./reference.ced"
|
|
sgml-exposed-tags:nil
|
|
sgml-local-catalogs:"/usr/lib/sgml/catalog"
|
|
sgml-local-ecat-files:nil
|
|
End:
|
|
-->
|