postgresql/contrib/intarray
Tom Lane f933766ba7 Restructure pg_opclass, pg_amop, and pg_amproc per previous discussions in
pgsql-hackers.  pg_opclass now has a row for each opclass supported by each
index AM, not a row for each opclass name.  This allows pg_opclass to show
directly whether an AM supports an opclass, and furthermore makes it possible
to store additional information about an opclass that might be AM-dependent.
pg_opclass and pg_amop now store "lossy" and "haskeytype" information that we
previously expected the user to remember to provide in CREATE INDEX commands.
Lossiness is no longer an index-level property, but is associated with the
use of a particular operator in a particular index opclass.

Along the way, IndexSupportInitialize now uses the syscaches to retrieve
pg_amop and pg_amproc entries.  I find this reduces backend launch time by
about ten percent, at the cost of a couple more special cases in catcache.c's
IndexScanOK.

Initial work by Oleg Bartunov and Teodor Sigaev, further hacking by Tom Lane.

initdb forced.
2001-08-21 16:36:06 +00:00
..
bench Restructure pg_opclass, pg_amop, and pg_amproc per previous discussions in 2001-08-21 16:36:06 +00:00
data 1. Fixed error with empty array ( '{}' ), 2001-08-04 19:35:32 +00:00
expected Restructure pg_opclass, pg_amop, and pg_amproc per previous discussions in 2001-08-21 16:36:06 +00:00
sql Restructure pg_opclass, pg_amop, and pg_amproc per previous discussions in 2001-08-21 16:36:06 +00:00
Makefile The attached patch enables the contrib subtree to build cleanly under 2001-06-18 21:38:02 +00:00
README.intarray Restructure pg_opclass, pg_amop, and pg_amproc per previous discussions in 2001-08-21 16:36:06 +00:00
_int.c Looks okay in a quick glance, except error message spelling is poor: 2001-08-04 19:36:45 +00:00
_int.sql.in Restructure pg_opclass, pg_amop, and pg_amproc per previous discussions in 2001-08-21 16:36:06 +00:00

README.intarray

This is an implementation of RD-tree data structure using GiST interface
of PostgreSQL. It has built-in lossy compression.

Current implementation provides index support for one-dimensional array of
int4's - gist__int_ops, suitable for small and medium size of arrays (used on
default), and gist__intbig_ops for indexing large arrays (we use superimposed
signature with length of 4096 bits to represent sets).

All work was done by Teodor Sigaev (teodor@stack.net) and Oleg Bartunov
(oleg@sai.msu.su). See http://www.sai.msu.su/~megera/postgres/gist
for additional information.

CHANGES:

March 19, 2001
   1. Added support for toastable keys
   2. Improved split algorithm for intbig (selection speedup is about 30%)

INSTALLATION:

  gmake
  gmake install
  -- load functions
  psql <database> < _int.sql 

REGRESSION TEST:

   gmake installcheck

EXAMPLE USAGE:

  create table message (mid int not null,sections int[]);
  create table message_section_map (mid int not null,sid int not null);

  -- create indices
CREATE unique index message_key on message ( mid );
CREATE unique index message_section_map_key2 on message_section_map (sid, mid );
CREATE INDEX message_rdtree_idx on message using gist ( sections gist__int_ops);

  -- select some messages with section in 1 OR 2 - OVERLAP operator
  select message.mid from message where message.sections && '{1,2}';  

  -- select messages contains in sections 1 AND 2 - CONTAINS operator
  select message.mid from message where message.sections @ '{1,2}';
  -- the same, CONTAINED operator
  select message.mid from message where '{1,2}' ~ message.sections;

BENCHMARK:

  subdirectory bench contains benchmark suite.
  cd ./bench
  1. createdb TEST
  2. psql TEST < ../_int.sql
  3. ./create_test.pl | psql TEST
  4. ./bench.pl - perl script to benchmark queries, supports OR, AND queries
                  with/without RD-Tree. Run script without arguments to 
                  see availbale options.

     a)test without RD-Tree (OR)
       ./bench.pl -d TEST -s 1,2 -v
     b)test with RD-Tree 
       ./bench.pl -d TEST -s 1,2 -v -r

BENCHMARKS:

Size of table <message>: 200000
Size of table <message_section_map>: 268538 

Distribution of messages by sections:

section 0: 73899 messages
section 1: 16298 messages
section 50: 1241 messages
section 99: 705 messages

old - without RD-Tree support,
new - with RD-Tree

+----------+---------------+----------------+
|Search set|OR, time in sec|AND, time in sec|
|          +-------+-------+--------+-------+
|          |  old  |  new  |   old  |  new  |
+----------+-------+-------+--------+-------+
|         1|  1.427|  0.215|       -|      -|
+----------+-------+-------+--------+-------+
|        99|  1.029|  0.018|       -|      -|
+----------+-------+-------+--------+-------+
|       1,2|  1.829|  0.334|   5.654|  0.042|
+----------+-------+-------+--------+-------+
| 1,2,50,60|  2.057|  0.359|   5.044|  0.007|
+----------+-------+-------+--------+-------+