diff --git a/doc/src/sgml/bloom.sgml b/doc/src/sgml/bloom.sgml index 285b67b3f1..d1cf9ac24a 100644 --- a/doc/src/sgml/bloom.sgml +++ b/doc/src/sgml/bloom.sgml @@ -110,75 +110,70 @@ CREATE INDEX bloomidx ON tbloom USING bloom (i1,i2,i3) FROM generate_series(1,10000000); SELECT 10000000 -=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6); -CREATE INDEX -=# SELECT pg_size_pretty(pg_relation_size('bloomidx')); - pg_size_pretty ----------------- - 153 MB -(1 row) -=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6); -CREATE INDEX -=# SELECT pg_size_pretty(pg_relation_size('btreeidx')); - pg_size_pretty ----------------- - 387 MB -(1 row) A sequential scan over this large table takes a long time: =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451; - QUERY PLAN --------------------------------------------------------------------&zwsp;----------------------------------------- - Seq Scan on tbloom (cost=0.00..213694.08 rows=1 width=24) (actual time=1445.438..1445.438 rows=0 loops=1) + QUERY PLAN +-------------------------------------------------------------------&zwsp;----------------------------------- + Seq Scan on tbloom (cost=0.00..2137.14 rows=3 width=24) (actual time=16.971..16.971 rows=0 loops=1) Filter: ((i2 = 898732) AND (i5 = 123451)) - Rows Removed by Filter: 10000000 - Planning time: 0.177 ms - Execution time: 1445.473 ms + Rows Removed by Filter: 100000 + Planning Time: 0.346 ms + Execution Time: 16.988 ms (5 rows) - So the planner will usually select an index scan if possible. - With a btree index, we get results like this: + Even with the btree index defined the result will still be a + sequential scan: +=# CREATE INDEX btreeidx ON tbloom (i1, i2, i3, i4, i5, i6); +CREATE INDEX +=# SELECT pg_size_pretty(pg_relation_size('btreeidx')); + pg_size_pretty +---------------- + 3976 kB +(1 row) =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451; - QUERY PLAN --------------------------------------------------------------------&zwsp;------------------------------------------------------------- - Index Only Scan using btreeidx on tbloom (cost=0.56..298311.96 rows=1 width=24) (actual time=445.709..445.709 rows=0 loops=1) - Index Cond: ((i2 = 898732) AND (i5 = 123451)) - Heap Fetches: 0 - Planning time: 0.193 ms - Execution time: 445.770 ms + QUERY PLAN +-------------------------------------------------------------------&zwsp;----------------------------------- + Seq Scan on tbloom (cost=0.00..2137.00 rows=2 width=24) (actual time=12.805..12.805 rows=0 loops=1) + Filter: ((i2 = 898732) AND (i5 = 123451)) + Rows Removed by Filter: 100000 + Planning Time: 0.138 ms + Execution Time: 12.817 ms (5 rows) - Bloom is better than btree in handling this type of search: + Having the bloom index defined on the table is better than btree in + handling this type of search: +=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6); +CREATE INDEX +=# SELECT pg_size_pretty(pg_relation_size('bloomidx')); + pg_size_pretty +---------------- + 1584 kB +(1 row) =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451; - QUERY PLAN --------------------------------------------------------------------&zwsp;-------------------------------------------------------- - Bitmap Heap Scan on tbloom (cost=178435.39..178439.41 rows=1 width=24) (actual time=76.698..76.698 rows=0 loops=1) + QUERY PLAN +-------------------------------------------------------------------&zwsp;-------------------------------------------------- + Bitmap Heap Scan on tbloom (cost=1792.00..1799.69 rows=2 width=24) (actual time=0.388..0.388 rows=0 loops=1) Recheck Cond: ((i2 = 898732) AND (i5 = 123451)) - Rows Removed by Index Recheck: 2439 - Heap Blocks: exact=2408 - -> Bitmap Index Scan on bloomidx (cost=0.00..178435.39 rows=1 width=0) (actual time=72.455..72.455 rows=2439 loops=1) + Rows Removed by Index Recheck: 29 + Heap Blocks: exact=28 + -> Bitmap Index Scan on bloomidx (cost=0.00..1792.00 rows=2 width=0) (actual time=0.356..0.356 rows=29 loops=1) Index Cond: ((i2 = 898732) AND (i5 = 123451)) - Planning time: 0.475 ms - Execution time: 76.778 ms + Planning Time: 0.099 ms + Execution Time: 0.408 ms (8 rows) - Note the relatively large number of false positives: 2439 rows were - selected to be visited in the heap, but none actually matched the - query. We could reduce that by specifying a larger signature length. - In this example, creating the index with length=200 - reduced the number of false positives to 55; but it doubled the index size - (to 306 MB) and ended up being slower for this query (125 ms overall). @@ -187,24 +182,36 @@ CREATE INDEX A better strategy for btree is to create a separate index on each column. Then the planner will choose something like this: +=# CREATE INDEX btreeidx1 ON tbloom (i1); +CREATE INDEX +=# CREATE INDEX btreeidx2 ON tbloom (i2); +CREATE INDEX +=# CREATE INDEX btreeidx3 ON tbloom (i3); +CREATE INDEX +=# CREATE INDEX btreeidx4 ON tbloom (i4); +CREATE INDEX +=# CREATE INDEX btreeidx5 ON tbloom (i5); +CREATE INDEX +=# CREATE INDEX btreeidx6 ON tbloom (i6); +CREATE INDEX =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451; - QUERY PLAN --------------------------------------------------------------------&zwsp;----------------------------------------------------------- - Bitmap Heap Scan on tbloom (cost=9.29..13.30 rows=1 width=24) (actual time=0.148..0.148 rows=0 loops=1) + QUERY PLAN +-------------------------------------------------------------------&zwsp;-------------------------------------------------------- + Bitmap Heap Scan on tbloom (cost=24.34..32.03 rows=2 width=24) (actual time=0.028..0.029 rows=0 loops=1) Recheck Cond: ((i5 = 123451) AND (i2 = 898732)) - -> BitmapAnd (cost=9.29..9.29 rows=1 width=0) (actual time=0.145..0.145 rows=0 loops=1) - -> Bitmap Index Scan on tbloom_i5_idx (cost=0.00..4.52 rows=11 width=0) (actual time=0.089..0.089 rows=10 loops=1) + -> BitmapAnd (cost=24.34..24.34 rows=2 width=0) (actual time=0.027..0.027 rows=0 loops=1) + -> Bitmap Index Scan on btreeidx5 (cost=0.00..12.04 rows=500 width=0) (actual time=0.026..0.026 rows=0 loops=1) Index Cond: (i5 = 123451) - -> Bitmap Index Scan on tbloom_i2_idx (cost=0.00..4.52 rows=11 width=0) (actual time=0.048..0.048 rows=8 loops=1) + -> Bitmap Index Scan on btreeidx2 (cost=0.00..12.04 rows=500 width=0) (never executed) Index Cond: (i2 = 898732) - Planning time: 2.049 ms - Execution time: 0.280 ms + Planning Time: 0.491 ms + Execution Time: 0.055 ms (9 rows) Although this query runs much faster than with either of the single - indexes, we pay a large penalty in index size. Each of the single-column - btree indexes occupies 214 MB, so the total space needed is over 1.2GB, - more than 8 times the space used by the bloom index. + indexes, we pay a penalty in index size. Each of the single-column + btree indexes occupies 2 MB, so the total space needed is 12 MB, + eight times the space used by the bloom index.