postgresql

History

Tom Lane e11349fdba Prevent buffer overrun while parsing an integer in a "query_int" value. contrib/intarray's gettoken() uses a fixed-size buffer to collect an integer's digits, and did not guard against overrunning the buffer. This is at least a backend crash risk, and in principle might allow arbitrary code execution. The code didn't check for overflow of the integer value either, which while not presenting a crash risk was still bad. Thanks to Apple Inc's security team for reporting this issue and supplying the fix. Security: CVE-2010-4015		2011-01-27 17:43:45 -05:00
..
bench	Add CVS tag lines to files that were lacking them.	2006-03-11 04:38:42 +00:00
data	1. Fixed error with empty array ( '{}' ),	2001-08-04 19:35:32 +00:00
expected	Remove ill-considered (not to mention undocumented) attempt to make	2007-09-14 03:25:37 +00:00
sql	Remove ill-considered (not to mention undocumented) attempt to make	2007-09-14 03:25:37 +00:00
.gitignore	Some more gitignore cleanups: cover contrib and PL regression test outputs.	2010-09-22 17:23:10 -04:00
Makefile	Make GIN opclass worked with intarray extensions	2006-05-03 16:31:07 +00:00
README.intarray	Remove ill-considered (not to mention undocumented) attempt to make	2007-09-14 03:25:37 +00:00
_int.h	pgindent run for 8.2.	2006-10-04 00:30:14 +00:00
_int.sql.in	Remove ill-considered (not to mention undocumented) attempt to make	2007-09-14 03:25:37 +00:00
_int_bool.c	Prevent buffer overrun while parsing an integer in a "query_int" value.	2011-01-27 17:43:45 -05:00
_int_gin.c	pgindent run for 8.2.	2006-10-04 00:30:14 +00:00
_int_gist.c	pgindent run for 8.2.	2006-10-04 00:30:14 +00:00
_int_op.c	Magic blocks don't do us any good unless we use 'em ... so install one	2006-05-30 22:12:16 +00:00
_int_tool.c	pgindent run for 8.2.	2006-10-04 00:30:14 +00:00
_intbig_gist.c	pgindent run for 8.2.	2006-10-04 00:30:14 +00:00
uninstall__int.sql	Remove ill-considered (not to mention undocumented) attempt to make	2007-09-14 03:25:37 +00:00

README.intarray

This is an implementation of RD-tree data structure using GiST interface
of PostgreSQL. It has built-in lossy compression.

Current implementation provides index support for one-dimensional array of
integers: gist__int_ops, suitable for small and medium size of arrays (used by
default), and gist__intbig_ops for indexing large arrays (we use superimposed
signature with length of 4096 bits to represent sets).  There is also a
non-default gin__int_ops for GIN indexes on integer arrays.

All work was done by Teodor Sigaev (teodor@stack.net) and Oleg Bartunov
(oleg@sai.msu.su). See http://www.sai.msu.su/~megera/postgres/gist
for additional information. Andrey Oktyabrski did a great work on 
adding new functions and operations.


FUNCTIONS:

  int   icount(int[]) - the number of elements in intarray

test=# select icount('{1,2,3}'::int[]);   
 icount 
--------
      3
(1 row)

  int[] sort(int[], 'asc' | 'desc') - sort intarray

test=# select sort('{1,2,3}'::int[],'desc');
  sort   
---------
 {3,2,1}
(1 row)

  int[] sort(int[]) - sort in ascending order
  int[] sort_asc(int[]),sort_desc(int[]) - shortcuts for sort 

  int[] uniq(int[]) - returns unique elements

test=# select uniq(sort('{1,2,3,2,1}'::int[]));
  uniq   
---------
 {1,2,3}
(1 row)

  int   idx(int[], int item) - returns index of first intarray matching element to item, or
                              '0' if matching failed.

test=# select idx('{1,2,3,2,1}'::int[],2);
 idx 
-----
   2
(1 row)


  int[] subarray(int[],int START [, int LEN]) - returns part of intarray starting from
                                                element number START (from 1) and length LEN. 

test=# select subarray('{1,2,3,2,1}'::int[],2,3);
 subarray 
----------
 {2,3,2}
(1 row)

  int[] intset(int4) - casting int4 to int[]

test=# select intset(1);
 intset 
--------
 {1}
(1 row)

OPERATIONS:

  int[] && int[]  - overlap - returns TRUE if arrays have at least one common element
  int[] @> int[]  - contains - returns TRUE if left array contains right array
  int[] <@ int[]  - contained - returns TRUE if left array is contained in right array
  # int[]         - returns the number of elements in array
  int[] + int     - push element to array ( add to end of array)
  int[] + int[]   - merge of arrays (right array added to the end of left one)
  int[] - int     - remove entries matched by right argument from array
  int[] - int[]   - remove right array from left
  int[] | int     - returns intarray - union of arguments
  int[] | int[]   - returns intarray as a union of two arrays
  int[] & int[]   - returns intersection of arrays
  int[] @@ query_int - returns TRUE if array satisfies query (like '1&(2|3)') 
  query_int ~~ int[] - returns TRUE if array satisfies query (commutator of @@)

(Before PostgreSQL 8.2, the containment operators @> and <@ were
respectively called @ and ~.  These names are still available, but are
deprecated and will eventually be retired.  Notice that the old names
are reversed from the convention formerly followed by the core geometric
datatypes!)

CHANGES:

August 6, 2002
   1. Reworked patch from Andrey Oktyabrski (ano@spider.ru) with
      functions: icount, sort, sort_asc, uniq, idx, subarray
      operations: #, +, -, |, &
October 1, 2001
   1. Change search method in array to binary
September 28, 2001
   1. gist__int_ops now is without lossy
   2. add sort entry in picksplit
September 21, 2001
   1. Added support for boolean query (indexable operator @@, looks like
      a @@ '1|(2&3)', perfomance is better in any case )
   2. Done some small optimizations
March 19, 2001
   1. Added support for toastable keys
   2. Improved split algorithm for intbig (selection speedup is about 30%)

INSTALLATION:

  gmake
  gmake install
  -- load functions
  psql <database> < _int.sql 

REGRESSION TEST:

   gmake installcheck

EXAMPLE USAGE:

  create table message (mid int not null,sections int[]);
  create table message_section_map (mid int not null,sid int not null);

  -- create indices
CREATE unique index message_key on message ( mid );
CREATE unique index message_section_map_key2 on message_section_map (sid, mid );
CREATE INDEX message_rdtree_idx on message using gist ( sections gist__int_ops);

  -- select some messages with section in 1 OR 2 - OVERLAP operator
  select message.mid from message where message.sections && '{1,2}';  

  -- select messages contains in sections 1 AND 2 - CONTAINS operator
  select message.mid from message where message.sections @> '{1,2}';
  -- the same, CONTAINED operator
  select message.mid from message where '{1,2}' <@ message.sections;

BENCHMARK:

  subdirectory bench contains benchmark suite.
  cd ./bench
  1. createdb TEST
  2. psql TEST < ../_int.sql
  3. ./create_test.pl | psql TEST
  4. ./bench.pl - perl script to benchmark queries, supports OR, AND queries
                  with/without RD-Tree. Run script without arguments to 
                  see availbale options.

     a)test without RD-Tree (OR)
       ./bench.pl -d TEST -c -s 1,2 -v
     b)test with RD-Tree 
       ./bench.pl -d TEST -c -s 1,2 -v -r

BENCHMARKS:

Size of table <message>: 200000
Size of table <message_section_map>: 269133 

Distribution of messages by sections:

section 0: 74377 messages
section 1: 16284 messages
section 50: 1229 messages
section 99: 683 messages

old - without RD-Tree support,
new - with RD-Tree

+----------+---------------+----------------+
|Search set|OR, time in sec|AND, time in sec|
|          +-------+-------+--------+-------+
|          |  old  |  new  |   old  |  new  |
+----------+-------+-------+--------+-------+
|         1|  0.625|  0.101|       -|      -|
+----------+-------+-------+--------+-------+
|        99|  0.018|  0.017|       -|      -|
+----------+-------+-------+--------+-------+
|       1,2|  0.766|  0.133|   0.628|  0.045|
+----------+-------+-------+--------+-------+
| 1,2,50,65|  0.794|  0.141|   0.030|  0.006|
+----------+-------+-------+--------+-------+