Interfacing Extensions To Indexes Introduction The procedures described thus far let you define new types, new functions, and new operators. However, we cannot yet define a secondary index (such as a B-tree, R-tree, or hash access method) over a new type or its operators. Look back at . The right half shows the catalogs that we must modify in order to tell PostgreSQL how to use a user-defined type and/or user-defined operators with an index (i.e., pg_am, pg_amop, pg_amproc, pg_operator and pg_opclass). Unfortunately, there is no simple command to do this. We will demonstrate how to modify these catalogs through a running example: a new operator class for the B-tree access method that stores and sorts complex numbers in ascending absolute value order. Access Methods The pg_am table contains one row for every index access method. Support for the heap access method is built into PostgreSQL, but all other access methods are described in pg_am. The schema is shown in . Index Access Method Schema Column Description amname name of the access method amowner user ID of the owner (currently not used) amstrategies number of strategies for this access method (see below) amsupport number of support routines for this access method (see below) amorderstrategy zero if the index offers no sort order, otherwise the strategy number of the strategy operator that describes the sort order amcanunique does AM support unique indexes? amcanmulticol does AM support multicolumn indexes? amindexnulls does AM support NULL index entries? amconcurrent does AM support concurrent updates? amgettuple aminsert ... procedure identifiers for interface routines to the access method. For example, regproc IDs for opening, closing, and getting rows from the access method appear here.
The OID of the row in pg_am is used as a foreign key in a lot of other tables. You do not need to add a new row to this table; all that you are interested in is the OID of the access method you want to extend: SELECT oid FROM pg_am WHERE amname = 'btree'; oid ----- 403 (1 row) We will use that query in a WHERE clause later.
Access Method Strategies The amstrategies column exists to standardize comparisons across data types. For example, B-trees impose a strict ordering on keys, lesser to greater. Since PostgreSQL allows the user to define operators, PostgreSQL cannot look at the name of an operator (e.g., > or <) and tell what kind of comparison it is. In fact, some access methods don't impose any ordering at all. For example, R-trees express a rectangle-containment relationship, whereas a hashed data structure expresses only bitwise similarity based on the value of a hash function. PostgreSQL needs some consistent way of taking a qualification in your query, looking at the operator, and then deciding if a usable index exists. This implies that PostgreSQL needs to know, for example, that the <= and > operators partition a B-tree. PostgreSQL uses strategies to express these relationships between operators and the way they can be used to scan indexes. Defining a new set of strategies is beyond the scope of this discussion, but we'll explain how B-tree strategies work because you'll need to know that to add a new B-tree operator class. In the pg_am table, the amstrategies column sets the number of strategies defined for this access method. For B-trees, this number is 5. The meanings of these strategies are shown in . B-tree StrategiesB-tree Operation Index less than 1 less than or equal 2 equal 3 greater than or equal 4 greater than 5
The idea is that you'll need to add operators corresponding to these strategies to the pg_amop relation (see below). The access method code can use these strategy numbers, regardless of data type, to figure out how to partition the B-tree, compute selectivity, and so on. Don't worry about the details of adding operators yet; just understand that there must be a set of these operators for int2, int4, oid, and all other data types on which a B-tree can operate.
Access Method Support Routines Sometimes, strategies aren't enough information for the system to figure out how to use an index. Some access methods require additional support routines in order to work. For example, the B-tree access method must be able to compare two keys and determine whether one is greater than, equal to, or less than the other. Similarly, the R-tree access method must be able to compute intersections, unions, and sizes of rectangles. These operations do not correspond to operators used in qualifications in SQL queries; they are administrative routines used by the access methods, internally. In order to manage diverse support routines consistently across all PostgreSQL access methods, pg_am includes a column called amsupport. This column records the number of support routines used by an access method. For B-trees, this number is one: the routine to take two keys and return -1, 0, or +1, depending on whether the first key is less than, equal to, or greater than the second. (Strictly speaking, this routine can return a negative number (< 0), zero, or a non-zero positive number (> 0).) The amstrategies entry in pg_am is just the number of strategies defined for the access method in question. The operators for less than, less equal, and so on don't appear in pg_am. Similarly, amsupport is just the number of support routines required by the access method. The actual routines are listed elsewhere. By the way, the amorderstrategy column tells whether the access method supports ordered scan. Zero means it doesn't; if it does, amorderstrategy is the number of the strategy routine that corresponds to the ordering operator. For example, B-tree has amorderstrategy = 1, which is its less than strategy number. Operator Classes The next table of interest is pg_opclass. This table defines operator class names and input data types for each of the operator classes supported by a given index access method. The same class name can be used for several different access methods (for example, both B-tree and hash access methods have operator classes named oid_ops), but a separate pg_opclass row must appear for each access method. The OID of the pg_opclass row is used as a foreign key in other tables to associate specific operators and support routines with the operator class. You need to add a row with your operator class name (for example, complex_abs_ops) to pg_opclass: INSERT INTO pg_opclass (opcamid, opcname, opcnamespace, opcowner, opcintype, opcdefault, opckeytype) VALUES ( (SELECT oid FROM pg_am WHERE amname = 'btree'), 'complex_abs_ops', (SELECT oid FROM pg_namespace WHERE nspname = 'pg_catalog'), 1, -- UID of superuser is hardwired to 1 as of PG 7.3 (SELECT oid FROM pg_type WHERE typname = 'complex'), true, 0); SELECT oid, * FROM pg_opclass WHERE opcname = 'complex_abs_ops'; oid | opcamid | opcname | opcnamespace | opcowner | opcintype | opcdefault | opckeytype --------+---------+-----------------+--------------+----------+-----------+------------+------------ 277975 | 403 | complex_abs_ops | 11 | 1 | 277946 | t | 0 (1 row) Note that the OID for your pg_opclass row will be different! Don't worry about this though. We'll get this number from the system later just like we got the OID of the type here. The above example assumes that you want to make this new operator class the default B-tree operator class for the complex data type. If you don't, just set opcdefault to false instead. opckeytype is not described here; it should always be zero for B-tree operator classes. Creating the Operators and Support Routines So now we have an access method and an operator class. We still need a set of operators. The procedure for defining operators was discussed in . For the complex_abs_ops operator class on B-trees, the operators we require are: absolute-value less-than (strategy 1) absolute-value less-than-or-equal (strategy 2) absolute-value equal (strategy 3) absolute-value greater-than-or-equal (strategy 4) absolute-value greater-than (strategy 5) Suppose the code that implements these functions is stored in the file PGROOT/src/tutorial/complex.c, which we have compiled into PGROOT/src/tutorial/complex.so. Part of the C code looks like this: #define Mag(c) ((c)->x*(c)->x + (c)->y*(c)->y) bool complex_abs_eq(Complex *a, Complex *b) { double amag = Mag(a), bmag = Mag(b); return (amag==bmag); } (Note that we will only show the equality operator for the rest of the examples. The other four operators are very similar. Refer to complex.c or complex.source for the details.) We make the function known to PostgreSQL like this: CREATE FUNCTION complex_abs_eq(complex, complex) RETURNS boolean AS 'PGROOT/src/tutorial/complex' LANGUAGE C; There are some important things that are happening here: First, note that operators for less-than, less-than-or-equal, equal, greater-than-or-equal, and greater-than for complex are being defined. We can only have one operator named, say, = and taking type complex for both operands. In this case we don't have any other operator = for complex, but if we were building a practical data type we'd probably want = to be the ordinary equality operation for complex numbers. In that case, we'd need to use some other operator name for complex_abs_eq. Second, although PostgreSQL can cope with operators having the same name as long as they have different input data types, C can only cope with one global routine having a given name, period. So we shouldn't name the C function something simple like abs_eq. Usually it's a good practice to include the data type name in the C function name, so as not to conflict with functions for other data types. Third, we could have made the PostgreSQL name of the function abs_eq, relying on PostgreSQL to distinguish it by input data types from any other PostgreSQL function of the same name. To keep the example simple, we make the function have the same names at the C level and PostgreSQL level. Finally, note that these operator functions return Boolean values. In practice, all operators defined as index access method strategies must return type boolean, since they must appear at the top level of a WHERE clause to be used with an index. (On the other hand, the support function returns whatever the particular access method expects -- in this case, a signed integer.) The final routine in the file is the support routine mentioned when we discussed the amsupport column of the pg_am table. We will use this later on. For now, ignore it. Now we are ready to define the operators: CREATE OPERATOR = ( leftarg = complex, rightarg = complex, procedure = complex_abs_eq, restrict = eqsel, join = eqjoinsel ); The important things here are the procedure names (which are the C functions defined above) and the restriction and join selectivity functions. You should just use the selectivity functions used in the example (see complex.source). Note that there are different such functions for the less-than, equal, and greater-than cases. These must be supplied or the optimizer will be unable to make effective use of the index. The next step is to add entries for these operators to the pg_amop relation. To do this, we'll need the OIDs of the operators we just defined. We'll look up the names of all the operators that take two operands of type complex, and pick ours out: SELECT o.oid AS opoid, o.oprname INTO TEMP TABLE complex_ops_tmp FROM pg_operator o, pg_type t WHERE o.oprleft = t.oid and o.oprright = t.oid and t.typname = 'complex'; opoid | oprname --------+--------- 277963 | + 277970 | < 277971 | <= 277972 | = 277973 | >= 277974 | > (6 rows) (Again, some of your OID numbers will almost certainly be different.) The operators we are interested in are those with OIDs 277970 through 277974. The values you get will probably be different, and you should substitute them for the values below. We will do this with a select statement. Now we are ready to insert entries into pg_amop for our new operator class. These entries must associate the correct B-tree strategy numbers with each of the operators we need. The command to insert the less-than operator looks like: INSERT INTO pg_amop (amopclaid, amopstrategy, amopreqcheck, amopopr) SELECT opcl.oid, 1, false, c.opoid FROM pg_opclass opcl, complex_ops_tmp c WHERE opcamid = (SELECT oid FROM pg_am WHERE amname = 'btree') AND opcname = 'complex_abs_ops' AND c.oprname = '<'; Now do this for the other operators substituting for the 1 in the second line above and the < in the last line. Note the order: less than is 1, less than or equal is 2, equal is 3, greater than or equal is 4, and greater than is 5. The field amopreqcheck is not discussed here; it should always be false for B-tree operators. The final step is the registration of the support routine previously described in our discussion of pg_am. The OID of this support routine is stored in the pg_amproc table, keyed by the operator class OID and the support routine number. First, we need to register the function in PostgreSQL (recall that we put the C code that implements this routine in the bottom of the file in which we implemented the operator routines): CREATE FUNCTION complex_abs_cmp(complex, complex) RETURNS integer AS 'PGROOT/src/tutorial/complex' LANGUAGE C; SELECT oid, proname FROM pg_proc WHERE proname = 'complex_abs_cmp'; oid | proname --------+----------------- 277997 | complex_abs_cmp (1 row) (Again, your OID number will probably be different.) We can add the new row as follows: INSERT INTO pg_amproc (amopclaid, amprocnum, amproc) SELECT opcl.oid, 1, p.oid FROM pg_opclass opcl, pg_proc p WHERE opcamid = (SELECT oid FROM pg_am WHERE amname = 'btree') AND opcname = 'complex_abs_ops' AND p.proname = 'complex_abs_cmp'; And we're done! (Whew.) It should now be possible to create and use B-tree indexes on complex columns.