BRIN: Block Range Indexes
BRIN is a new index access method intended to accelerate scans of very
large tables, without the maintenance overhead of btrees or other
traditional indexes. They work by maintaining "summary" data about
block ranges. Bitmap index scans work by reading each summary tuple and
comparing them with the query quals; all pages in the range are returned
in a lossy TID bitmap if the quals are consistent with the values in the
summary tuple, otherwise not. Normal index scans are not supported
because these indexes do not store TIDs.
As new tuples are added into the index, the summary information is
updated (if the block range in which the tuple is added is already
summarized) or not; in the latter case, a subsequent pass of VACUUM or
the brin_summarize_new_values() function will create the summary
information.
For data types with natural 1-D sort orders, the summary info consists
of the maximum and the minimum values of each indexed column within each
page range. This type of operator class we call "Minmax", and we
supply a bunch of them for most data types with B-tree opclasses.
Since the BRIN code is generalized, other approaches are possible for
things such as arrays, geometric types, ranges, etc; even for things
such as enum types we could do something different than minmax with
better results. In this commit I only include minmax.
Catalog version bumped due to new builtin catalog entries.
There's more that could be done here, but this is a good step forwards.
Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera,
with contribution by Heikki Linnakangas.
Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas.
Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo.
PS:
The research leading to these results has received funding from the
European Union's Seventh Framework Programme (FP7/2007-2013) under
grant agreement n° 318633.
2014-11-07 20:38:14 +01:00
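The behaviour described in the commit message above can be tried out with a short session like the following. This is only a sketch: the table `brin_demo`, the index `brin_demo_idx`, the row count and the `pages_per_range` value are illustrative assumptions and are not part of the regression test. It assumes a server that has the BRIN access method and `brin_summarize_new_values()`.

```sql
-- Hypothetical names throughout; not part of the original test file.
CREATE TABLE brin_demo (id integer, created date);

-- pages_per_range controls how many heap pages one summary tuple covers.
CREATE INDEX brin_demo_idx ON brin_demo USING brin (id, created)
    WITH (pages_per_range = 16);

-- Rows that land in block ranges without a summary do not create one.
INSERT INTO brin_demo
    SELECT g, date '2014-11-07' + g FROM generate_series(1, 10000) g;

-- Summarize the not-yet-summarized ranges explicitly (VACUUM does the same);
-- the function returns the number of range summaries it inserted.
SELECT brin_summarize_new_values('brin_demo_idx');

-- BRIN stores no TIDs, so the index can only be used through a bitmap scan.
SET enable_seqscan = off;
EXPLAIN (COSTS OFF)
    SELECT count(*) FROM brin_demo WHERE id BETWEEN 100 AND 200;
```

With sequential scans disabled, the plan would normally be expected to use a bitmap heap scan on top of a bitmap index scan on `brin_demo_idx`; the exact plan text depends on the server version and settings.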
CREATE TABLE brintest (byteacol bytea,
    charcol "char",
    namecol name,
    int8col bigint,
    int2col smallint,
    int4col integer,
    textcol text,
    oidcol oid,
    tidcol tid,
    float4col real,
    float8col double precision,
    macaddrcol macaddr,
    inetcol inet,
    cidrcol cidr,
    bpcharcol character,
    datecol date,
    timecol time without time zone,
    timestampcol timestamp without time zone,
    timestamptzcol timestamp with time zone,
    intervalcol interval,
    timetzcol time with time zone,
    bitcol bit(10),
    varbitcol bit varying(16),
    numericcol numeric,
    uuidcol uuid,
    int4rangecol int4range,
    lsncol pg_lsn,
    boxcol box
) WITH (fillfactor=10, autovacuum_enabled=off);
INSERT INTO brintest SELECT
    repeat(stringu1, 8)::bytea,
    substr(stringu1, 1, 1)::"char",
    stringu1::name, 142857 * tenthous,
    thousand,
    twothousand,
    repeat(stringu1, 8),
    unique1::oid,
    format('(%s,%s)', tenthous, twenty)::tid,
    (four + 1.0)/(hundred+1),
    odd::float8 / (tenthous + 1),
    format('%s:00:%s:00:%s:00', to_hex(odd), to_hex(even), to_hex(hundred))::macaddr,
    inet '10.2.3.4/24' + tenthous,
    cidr '10.2.3/24' + tenthous,
    substr(stringu1, 1, 1)::bpchar,
    date '1995-08-15' + tenthous,
    time '01:20:30' + thousand * interval '18.5 second',
    timestamp '1942-07-23 03:05:09' + tenthous * interval '36.38 hours',
    timestamptz '1972-10-10 03:00' + thousand * interval '1 hour',
    justify_days(justify_hours(tenthous * interval '12 minutes')),
    timetz '01:30:20+02' + hundred * interval '15 seconds',
    thousand::bit(10),
    tenthous::bit(16)::varbit,
    tenthous::numeric(36,30) * fivethous * even / (hundred + 1),
    format('%s%s-%s-%s-%s-%s%s%s', to_char(tenthous, 'FM0000'), to_char(tenthous, 'FM0000'), to_char(tenthous, 'FM0000'), to_char(tenthous, 'FM0000'), to_char(tenthous, 'FM0000'), to_char(tenthous, 'FM0000'), to_char(tenthous, 'FM0000'), to_char(tenthous, 'FM0000'))::uuid,
    int4range(thousand, twothousand),
    format('%s/%s%s', odd, even, tenthous)::pg_lsn,
    box(point(odd, even), point(thousand, twothousand))
FROM tenk1 ORDER BY unique2 LIMIT 100;

-- throw in some NULL's and different values
INSERT INTO brintest (inetcol, cidrcol, int4rangecol) SELECT
    inet 'fe80::6e40:8ff:fea9:8c46' + tenthous,
    cidr 'fe80::6e40:8ff:fea9:8c46' + tenthous,
    'empty'::int4range
FROM tenk1 ORDER BY thousand, tenthous LIMIT 25;
CREATE INDEX brinidx ON brintest USING brin (
    byteacol,
    charcol,
    namecol,
    int8col,
    int2col,
    int4col,
    textcol,
    oidcol,
    tidcol,
    float4col,
    float8col,
    macaddrcol,
    inetcol inet_inclusion_ops,
    inetcol inet_minmax_ops,
    cidrcol inet_inclusion_ops,
    cidrcol inet_minmax_ops,
    bpcharcol,
    datecol,
    timecol,
    timestampcol,
    timestamptzcol,
    intervalcol,
    timetzcol,
    bitcol,
    varbitcol,
    numericcol,
    uuidcol,
    int4rangecol,
    lsncol,
    boxcol
) with (pages_per_range = 1);

CREATE TABLE brinopers (colname name, typ text,
    op text[], value text[], matches int[],
    check (cardinality(op) = cardinality(value)),
    check (cardinality(op) = cardinality(matches)));
INSERT INTO brinopers VALUES
    ('byteacol', 'bytea',
     '{>, >=, =, <=, <}',
     '{AAAAAA, AAAAAA, BNAAAABNAAAABNAAAABNAAAABNAAAABNAAAABNAAAABNAAAA, ZZZZZZ, ZZZZZZ}',
     '{100, 100, 1, 100, 100}'),
    ('charcol', '"char"',
     '{>, >=, =, <=, <}',
     '{A, A, M, Z, Z}',
     '{97, 100, 6, 100, 98}'),
    ('namecol', 'name',
     '{>, >=, =, <=, <}',
     '{AAAAAA, AAAAAA, MAAAAA, ZZAAAA, ZZAAAA}',
     '{100, 100, 2, 100, 100}'),
    ('int2col', 'int2',
     '{>, >=, =, <=, <}',
     '{0, 0, 800, 999, 999}',
     '{100, 100, 1, 100, 100}'),
    ('int2col', 'int4',
     '{>, >=, =, <=, <}',
     '{0, 0, 800, 999, 1999}',
     '{100, 100, 1, 100, 100}'),
    ('int2col', 'int8',
     '{>, >=, =, <=, <}',
     '{0, 0, 800, 999, 1428427143}',
     '{100, 100, 1, 100, 100}'),
    ('int4col', 'int2',
     '{>, >=, =, <=, <}',
     '{0, 0, 800, 1999, 1999}',
     '{100, 100, 1, 100, 100}'),
    ('int4col', 'int4',
     '{>, >=, =, <=, <}',
     '{0, 0, 800, 1999, 1999}',
     '{100, 100, 1, 100, 100}'),
    ('int4col', 'int8',
     '{>, >=, =, <=, <}',
     '{0, 0, 800, 1999, 1428427143}',
     '{100, 100, 1, 100, 100}'),
    ('int8col', 'int2',
     '{>, >=}',
     '{0, 0}',
     '{100, 100}'),
    ('int8col', 'int4',
     '{>, >=}',
     '{0, 0}',
     '{100, 100}'),
    ('int8col', 'int8',
     '{>, >=, =, <=, <}',
     '{0, 0, 1257141600, 1428427143, 1428427143}',
     '{100, 100, 1, 100, 100}'),
    ('textcol', 'text',
     '{>, >=, =, <=, <}',
     '{ABABAB, ABABAB, BNAAAABNAAAABNAAAABNAAAABNAAAABNAAAABNAAAABNAAAA, ZZAAAA, ZZAAAA}',
     '{100, 100, 1, 100, 100}'),
    ('oidcol', 'oid',
     '{>, >=, =, <=, <}',
     '{0, 0, 8800, 9999, 9999}',
     '{100, 100, 1, 100, 100}'),
    ('tidcol', 'tid',
     '{>, >=, =, <=, <}',
     '{"(0,0)", "(0,0)", "(8800,0)", "(9999,19)", "(9999,19)"}',
     '{100, 100, 1, 100, 100}'),
    ('float4col', 'float4',
     '{>, >=, =, <=, <}',
     '{0.0103093, 0.0103093, 1, 1, 1}',
     '{100, 100, 4, 100, 96}'),
    ('float4col', 'float8',
     '{>, >=, =, <=, <}',
     '{0.0103093, 0.0103093, 1, 1, 1}',
     '{100, 100, 4, 100, 96}'),
    ('float8col', 'float4',
     '{>, >=, =, <=, <}',
     '{0, 0, 0, 1.98, 1.98}',
     '{99, 100, 1, 100, 100}'),
    ('float8col', 'float8',
     '{>, >=, =, <=, <}',
     '{0, 0, 0, 1.98, 1.98}',
     '{99, 100, 1, 100, 100}'),
    ('macaddrcol', 'macaddr',
     '{>, >=, =, <=, <}',
     '{00:00:01:00:00:00, 00:00:01:00:00:00, 2c:00:2d:00:16:00, ff:fe:00:00:00:00, ff:fe:00:00:00:00}',
     '{99, 100, 2, 100, 100}'),
    ('inetcol', 'inet',
     '{&&, =, <, <=, >, >=, >>=, >>, <<=, <<}',
     '{10/8, 10.2.14.231/24, 255.255.255.255, 255.255.255.255, 0.0.0.0, 0.0.0.0, 10.2.14.231/24, 10.2.14.231/25, 10.2.14.231/8, 0/0}',
     '{100, 1, 100, 100, 125, 125, 2, 2, 100, 100}'),
    ('inetcol', 'inet',
     '{&&, >>=, <<=, =}',
     '{fe80::6e40:8ff:fea9:a673/32, fe80::6e40:8ff:fea9:8c46, fe80::6e40:8ff:fea9:a673/32, fe80::6e40:8ff:fea9:8c46}',
     '{25, 1, 25, 1}'),
    ('inetcol', 'cidr',
     '{&&, <, <=, >, >=, >>=, >>, <<=, <<}',
     '{10/8, 255.255.255.255, 255.255.255.255, 0.0.0.0, 0.0.0.0, 10.2.14/24, 10.2.14/25, 10/8, 0/0}',
     '{100, 100, 100, 125, 125, 2, 2, 100, 100}'),
    ('inetcol', 'cidr',
     '{&&, >>=, <<=, =}',
     '{fe80::/32, fe80::6e40:8ff:fea9:8c46, fe80::/32, fe80::6e40:8ff:fea9:8c46}',
     '{25, 1, 25, 1}'),
    ('cidrcol', 'inet',
     '{&&, =, <, <=, >, >=, >>=, >>, <<=, <<}',
     '{10/8, 10.2.14/24, 255.255.255.255, 255.255.255.255, 0.0.0.0, 0.0.0.0, 10.2.14.231/24, 10.2.14.231/25, 10.2.14.231/8, 0/0}',
     '{100, 2, 100, 100, 125, 125, 2, 2, 100, 100}'),
    ('cidrcol', 'inet',
     '{&&, >>=, <<=, =}',
     '{fe80::6e40:8ff:fea9:a673/32, fe80::6e40:8ff:fea9:8c46, fe80::6e40:8ff:fea9:a673/32, fe80::6e40:8ff:fea9:8c46}',
     '{25, 1, 25, 1}'),
    ('cidrcol', 'cidr',
     '{&&, =, <, <=, >, >=, >>=, >>, <<=, <<}',
     '{10/8, 10.2.14/24, 255.255.255.255, 255.255.255.255, 0.0.0.0, 0.0.0.0, 10.2.14/24, 10.2.14/25, 10/8, 0/0}',
     '{100, 2, 100, 100, 125, 125, 2, 2, 100, 100}'),
    ('cidrcol', 'cidr',
     '{&&, >>=, <<=, =}',
     '{fe80::/32, fe80::6e40:8ff:fea9:8c46, fe80::/32, fe80::6e40:8ff:fea9:8c46}',
     '{25, 1, 25, 1}'),
    ('bpcharcol', 'bpchar',
     '{>, >=, =, <=, <}',
     '{A, A, W, Z, Z}',
     '{97, 100, 6, 100, 98}'),
    ('datecol', 'date',
     '{>, >=, =, <=, <}',
     '{1995-08-15, 1995-08-15, 2009-12-01, 2022-12-30, 2022-12-30}',
     '{100, 100, 1, 100, 100}'),
    ('timecol', 'time',
     '{>, >=, =, <=, <}',
     '{01:20:30, 01:20:30, 02:28:57, 06:28:31.5, 06:28:31.5}',
     '{100, 100, 1, 100, 100}'),
    ('timestampcol', 'timestamp',
     '{>, >=, =, <=, <}',
     '{1942-07-23 03:05:09, 1942-07-23 03:05:09, 1964-03-24 19:26:45, 1984-01-20 22:42:21, 1984-01-20 22:42:21}',
     '{100, 100, 1, 100, 100}'),
    ('timestampcol', 'timestamptz',
     '{>, >=, =, <=, <}',
     '{1942-07-23 03:05:09, 1942-07-23 03:05:09, 1964-03-24 19:26:45, 1984-01-20 22:42:21, 1984-01-20 22:42:21}',
     '{100, 100, 1, 100, 100}'),
    ('timestamptzcol', 'timestamptz',
     '{>, >=, =, <=, <}',
     '{1972-10-10 03:00:00-04, 1972-10-10 03:00:00-04, 1972-10-19 09:00:00-07, 1972-11-20 19:00:00-03, 1972-11-20 19:00:00-03}',
     '{100, 100, 1, 100, 100}'),
    ('intervalcol', 'interval',
     '{>, >=, =, <=, <}',
     '{00:00:00, 00:00:00, 1 mons 13 days 12:24, 2 mons 23 days 07:48:00, 1 year}',
     '{100, 100, 1, 100, 100}'),
    ('timetzcol', 'timetz',
     '{>, >=, =, <=, <}',
     '{01:30:20+02, 01:30:20+02, 01:35:50+02, 23:55:05+02, 23:55:05+02}',
     '{99, 100, 2, 100, 100}'),
    ('bitcol', 'bit(10)',
     '{>, >=, =, <=, <}',
     '{0000000010, 0000000010, 0011011110, 1111111000, 1111111000}',
     '{100, 100, 1, 100, 100}'),
    ('varbitcol', 'varbit(16)',
     '{>, >=, =, <=, <}',
     '{0000000000000100, 0000000000000100, 0001010001100110, 1111111111111000, 1111111111111000}',
     '{100, 100, 1, 100, 100}'),
    ('numericcol', 'numeric',
     '{>, >=, =, <=, <}',
     '{0.00, 0.01, 2268164.347826086956521739130434782609, 99470151.9, 99470151.9}',
     '{100, 100, 1, 100, 100}'),
    ('uuidcol', 'uuid',
     '{>, >=, =, <=, <}',
     '{00040004-0004-0004-0004-000400040004, 00040004-0004-0004-0004-000400040004, 52225222-5222-5222-5222-522252225222, 99989998-9998-9998-9998-999899989998, 99989998-9998-9998-9998-999899989998}',
     '{100, 100, 1, 100, 100}'),
('int4rangecol', 'int4range',
|
|
|
|
'{<<, &<, &&, &>, >>, @>, <@, =, <, <=, >, >=}',
|
|
|
|
'{"[10000,)","[10000,)","(,]","[3,4)","[36,44)","(1500,1501]","[3,4)","[222,1222)","[36,44)","[43,1043)","[367,4466)","[519,)"}',
|
|
|
|
'{53, 53, 53, 53, 50, 22, 72, 1, 74, 75, 34, 21}'),
|
|
|
|
('int4rangecol', 'int4range',
|
|
|
|
'{@>, <@, =, <=, >, >=}',
|
|
|
|
'{empty, empty, empty, empty, empty, empty}',
|
|
|
|
'{125, 72, 72, 72, 53, 125}'),
|
|
|
|
('int4rangecol', 'int4',
|
|
|
|
'{@>}',
|
|
|
|
'{1500}',
|
|
|
|
'{22}'),
|
|
|
|
('lsncol', 'pg_lsn',
|
|
|
|
'{>, >=, =, <=, <, IS, IS NOT}',
|
|
|
|
'{0/1200, 0/1200, 44/455222, 198/1999799, 198/1999799, NULL, NULL}',
|
|
|
|
'{100, 100, 1, 100, 100, 25, 100}'),
|
|
|
|
('boxcol', 'point',
|
|
|
|
'{@>}',
|
|
|
|
'{"(500,43)"}',
|
|
|
|
'{11}'),
|
|
|
|
('boxcol', 'box',
|
|
|
|
'{<<, &<, &&, &>, >>, <<|, &<|, |&>, |>>, @>, <@, ~=}',
|
|
|
|
'{"((1000,2000),(3000,4000))","((1,2),(3000,4000))","((1,2),(3000,4000))","((1,2),(3000,4000))","((1,2),(3,4))","((1000,2000),(3000,4000))","((1,2000),(3,4000))","((1000,2),(3000,4))","((1,2),(3,4))","((1,2),(300,400))","((1,2),(3000,4000))","((222,1222),(44,45))"}',
|
|
|
|
'{100, 100, 100, 99, 96, 100, 100, 99, 96, 1, 99, 1}');
|
2014-11-07 20:38:14 +01:00
|
|
|
|
|
|
|
DO $x$
|
|
|
|
DECLARE
|
2015-05-07 18:02:22 +02:00
|
|
|
r record;
|
|
|
|
r2 record;
|
|
|
|
cond text;
|
2017-02-07 22:34:11 +01:00
|
|
|
idx_ctids tid[];
|
|
|
|
ss_ctids tid[];
|
2015-05-07 18:02:22 +02:00
|
|
|
count int;
|
2015-06-04 20:39:52 +02:00
|
|
|
plan_ok bool;
|
|
|
|
plan_line text;
|
2014-11-07 20:38:14 +01:00
|
|
|
BEGIN
|
2015-06-04 20:39:52 +02:00
|
|
|
FOR r IN SELECT colname, oper, typ, value[ordinality], matches[ordinality] FROM brinopers, unnest(op) WITH ORDINALITY AS oper LOOP
|
2015-05-07 18:02:22 +02:00
|
|
|
|
|
|
|
-- prepare the condition
|
|
|
|
IF r.value IS NULL THEN
|
|
|
|
cond := format('%I %s %L', r.colname, r.oper, r.value);
|
|
|
|
ELSE
|
|
|
|
cond := format('%I %s %L::%s', r.colname, r.oper, r.value, r.typ);
|
|
|
|
END IF;
|
|
|
|
|
|
|
|
-- run the query using the brin index
|
|
|
|
SET enable_seqscan = 0;
|
|
|
|
SET enable_bitmapscan = 1;
|
2015-06-04 20:39:52 +02:00
|
|
|
|
|
|
|
plan_ok := false;
|
2017-02-07 22:34:11 +01:00
|
|
|
FOR plan_line IN EXECUTE format($y$EXPLAIN SELECT array_agg(ctid) FROM brintest WHERE %s $y$, cond) LOOP
|
|
|
|
IF plan_line LIKE '%Bitmap Heap Scan on brintest%' THEN
|
2015-06-04 20:39:52 +02:00
|
|
|
plan_ok := true;
|
|
|
|
END IF;
|
|
|
|
END LOOP;
|
|
|
|
IF NOT plan_ok THEN
|
|
|
|
RAISE WARNING 'did not get bitmap indexscan plan for %', r;
|
|
|
|
END IF;
|
|
|
|
|
2017-02-07 22:34:11 +01:00
|
|
|
EXECUTE format($y$SELECT array_agg(ctid) FROM brintest WHERE %s $y$, cond)
|
|
|
|
INTO idx_ctids;
|
2015-05-07 18:02:22 +02:00
|
|
|
|
|
|
|
-- run the query using a seqscan
|
|
|
|
SET enable_seqscan = 1;
|
|
|
|
SET enable_bitmapscan = 0;
|
2015-06-04 20:39:52 +02:00
|
|
|
|
|
|
|
plan_ok := false;
|
2017-02-07 22:34:11 +01:00
|
|
|
FOR plan_line IN EXECUTE format($y$EXPLAIN SELECT array_agg(ctid) FROM brintest WHERE %s $y$, cond) LOOP
|
|
|
|
IF plan_line LIKE '%Seq Scan on brintest%' THEN
|
2015-06-04 20:39:52 +02:00
|
|
|
plan_ok := true;
|
|
|
|
END IF;
|
|
|
|
END LOOP;
|
|
|
|
IF NOT plan_ok THEN
|
|
|
|
RAISE WARNING 'did not get seqscan plan for %', r;
|
|
|
|
END IF;
|
|
|
|
|
2017-02-07 22:34:11 +01:00
|
|
|
EXECUTE format($y$SELECT array_agg(ctid) FROM brintest WHERE %s $y$, cond)
|
|
|
|
INTO ss_ctids;
|
2015-05-07 18:02:22 +02:00
|
|
|
|
|
|
|
-- make sure both return the same results
|
2017-02-07 22:34:11 +01:00
|
|
|
count := array_length(idx_ctids, 1);
|
2015-05-07 18:02:22 +02:00
|
|
|
|
2017-02-07 22:34:11 +01:00
|
|
|
IF NOT (count = array_length(ss_ctids, 1) AND
|
|
|
|
idx_ctids @> ss_ctids AND
|
|
|
|
idx_ctids <@ ss_ctids) THEN
|
|
|
|
-- report the results of each scan to make the differences obvious
|
2015-05-07 18:02:22 +02:00
|
|
|
RAISE WARNING 'something not right in %: count %', r, count;
|
|
|
|
SET enable_seqscan = 1;
|
|
|
|
SET enable_bitmapscan = 0;
|
|
|
|
FOR r2 IN EXECUTE 'SELECT ' || r.colname || ' FROM brintest WHERE ' || cond LOOP
|
|
|
|
RAISE NOTICE 'seqscan: %', r2;
|
|
|
|
END LOOP;
|
|
|
|
|
|
|
|
SET enable_seqscan = 0;
|
|
|
|
SET enable_bitmapscan = 1;
|
|
|
|
FOR r2 IN EXECUTE 'SELECT ' || r.colname || ' FROM brintest WHERE ' || cond LOOP
|
|
|
|
RAISE NOTICE 'bitmapscan: %', r2;
|
|
|
|
END LOOP;
|
|
|
|
END IF;
|
|
|
|
|
2015-06-04 20:39:52 +02:00
|
|
|
-- make sure we found expected number of matches
|
|
|
|
IF count != r.matches THEN RAISE WARNING 'unexpected number of results % for %', count, r; END IF;
|
2015-05-07 18:02:22 +02:00
|
|
|
END LOOP;
|
|
|
|
END;
|
2014-11-07 20:38:14 +01:00
|
|
|
$x$;
|
|
|
|
|
2017-04-06 22:49:26 +02:00
|
|
|
RESET enable_seqscan;
|
|
|
|
RESET enable_bitmapscan;
|
|
|
|
|
2014-11-07 20:38:14 +01:00
|
|
|
INSERT INTO brintest SELECT
|
|
|
|
repeat(stringu1, 42)::bytea,
|
|
|
|
substr(stringu1, 1, 1)::"char",
|
|
|
|
stringu1::name, 142857 * tenthous,
|
|
|
|
thousand,
|
|
|
|
twothousand,
|
|
|
|
repeat(stringu1, 42),
|
|
|
|
unique1::oid,
|
|
|
|
format('(%s,%s)', tenthous, twenty)::tid,
|
|
|
|
(four + 1.0)/(hundred+1),
|
|
|
|
odd::float8 / (tenthous + 1),
|
|
|
|
format('%s:00:%s:00:%s:00', to_hex(odd), to_hex(even), to_hex(hundred))::macaddr,
|
|
|
|
inet '10.2.3.4' + tenthous,
|
2015-05-07 18:02:22 +02:00
|
|
|
cidr '10.2.3/24' + tenthous,
|
2014-11-07 20:38:14 +01:00
|
|
|
substr(stringu1, 1, 1)::bpchar,
|
|
|
|
date '1995-08-15' + tenthous,
|
|
|
|
time '01:20:30' + thousand * interval '18.5 second',
|
|
|
|
timestamp '1942-07-23 03:05:09' + tenthous * interval '36.38 hours',
|
|
|
|
timestamptz '1972-10-10 03:00' + thousand * interval '1 hour',
|
|
|
|
justify_days(justify_hours(tenthous * interval '12 minutes')),
|
|
|
|
timetz '01:30:20' + hundred * interval '15 seconds',
|
|
|
|
thousand::bit(10),
|
|
|
|
tenthous::bit(16)::varbit,
|
|
|
|
tenthous::numeric(36,30) * fivethous * even / (hundred + 1),
|
|
|
|
format('%s%s-%s-%s-%s-%s%s%s', to_char(tenthous, 'FM0000'), to_char(tenthous, 'FM0000'), to_char(tenthous, 'FM0000'), to_char(tenthous, 'FM0000'), to_char(tenthous, 'FM0000'), to_char(tenthous, 'FM0000'), to_char(tenthous, 'FM0000'), to_char(tenthous, 'FM0000'))::uuid,
|
2015-05-15 23:05:22 +02:00
|
|
|
int4range(thousand, twothousand),
|
|
|
|
format('%s/%s%s', odd, even, tenthous)::pg_lsn,
|
|
|
|
box(point(odd, even), point(thousand, twothousand))
|
2015-06-04 19:46:34 +02:00
|
|
|
FROM tenk1 ORDER BY unique2 LIMIT 5 OFFSET 5;
|
2014-11-07 20:38:14 +01:00
|
|
|
|
2017-04-01 21:10:04 +02:00
|
|
|
SELECT brin_desummarize_range('brinidx', 0);
|
2015-05-26 20:10:46 +02:00
|
|
|
VACUUM brintest; -- force a summarization cycle in brinidx
|
2014-11-07 20:38:14 +01:00
|
|
|
|
|
|
|
UPDATE brintest SET int8col = int8col * int4col;
|
2014-11-14 20:27:26 +01:00
|
|
|
UPDATE brintest SET textcol = '' WHERE textcol IS NOT NULL;
|
2015-12-26 18:56:09 +01:00
|
|
|
|
|
|
|
-- Tests for brin_summarize_new_values
|
|
|
|
SELECT brin_summarize_new_values('brintest'); -- error, not an index
|
|
|
|
SELECT brin_summarize_new_values('tenk1_unique1'); -- error, not a BRIN index
|
|
|
|
SELECT brin_summarize_new_values('brinidx'); -- ok, no change expected
|
BRIN auto-summarization
Previously, only VACUUM would cause a page range to get initially
summarized by BRIN indexes, which for some use cases happens too long
after the inserts occur. To avoid the delay, have brininsert request a
summarization run for the previous range as soon as the first tuple is
inserted into the first page of the next range. Autovacuum is in charge
of processing these requests, after doing all the regular vacuuming/
analyzing work on tables.
This doesn't impose any new tasks on autovacuum, because autovacuum was
already in charge of doing summarizations. The only actual effect is to
change the timing, i.e. that it occurs earlier. For this reason, we
don't go to any great lengths to record these requests very robustly; if
they are lost because of a server crash or restart, they will happen at
a later time anyway.
Most of the new code here is in autovacuum, which can now be told about
"work items" to process. This can be used for other things such as GIN
pending list cleaning, perhaps visibility map bit setting, both of which
are currently invoked during vacuum, but do not really depend on vacuum
taking place.
The requests are at the page range level, a granularity for which we did
not have SQL-level access; we only had index-level summarization
requests via brin_summarize_new_values(). It seems reasonable to add
SQL-level access to range-level summarization too, so add a function
brin_summarize_range() to do that.
Authors: Álvaro Herrera, based on sketch from Simon Riggs.
Reviewed-by: Thomas Munro.
Discussion: https://postgr.es/m/20170301045823.vneqdqkmsd4as4ds@alvherre.pgsql
2017-04-01 19:00:53 +02:00
|
|
|
|
2017-04-01 21:10:04 +02:00
|
|
|
-- Tests for brin_desummarize_range
|
|
|
|
SELECT brin_desummarize_range('brinidx', -1); -- error, invalid range
|
|
|
|
SELECT brin_desummarize_range('brinidx', 0);
|
|
|
|
SELECT brin_desummarize_range('brinidx', 0);
|
|
|
|
SELECT brin_desummarize_range('brinidx', 100000000);
|
|
|
|
|
2017-04-01 19:00:53 +02:00
|
|
|
-- Test brin_summarize_range
|
|
|
|
CREATE TABLE brin_summarize (
|
|
|
|
value int
|
|
|
|
) WITH (fillfactor=10, autovacuum_enabled=false);
|
|
|
|
CREATE INDEX brin_summarize_idx ON brin_summarize USING brin (value) WITH (pages_per_range=2);
|
|
|
|
-- Fill a few pages
|
|
|
|
DO $$
|
|
|
|
DECLARE curtid tid;
|
|
|
|
BEGIN
|
|
|
|
LOOP
|
|
|
|
INSERT INTO brin_summarize VALUES (1) RETURNING ctid INTO curtid;
|
|
|
|
EXIT WHEN curtid > tid '(2, 0)';
|
|
|
|
END LOOP;
|
|
|
|
END;
|
|
|
|
$$;
|
|
|
|
|
|
|
|
-- summarize one range
|
|
|
|
SELECT brin_summarize_range('brin_summarize_idx', 0);
|
|
|
|
-- nothing: already summarized
|
|
|
|
SELECT brin_summarize_range('brin_summarize_idx', 1);
|
|
|
|
-- summarize one range
|
|
|
|
SELECT brin_summarize_range('brin_summarize_idx', 2);
|
|
|
|
-- nothing: page doesn't exist in table
|
|
|
|
SELECT brin_summarize_range('brin_summarize_idx', 4294967295);
|
|
|
|
-- invalid block number values
|
|
|
|
SELECT brin_summarize_range('brin_summarize_idx', -1);
|
|
|
|
SELECT brin_summarize_range('brin_summarize_idx', 4294967296);
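-- Illustrative sketch, not part of the upstream regression test. It shows
-- how the auto-summarization described in the commit message above might be
-- enabled per index, assuming the autosummarize storage parameter of
-- PostgreSQL 10 or later. The table and index names are hypothetical.
CREATE TABLE brin_autosummarize_demo (value int) WITH (fillfactor=10);
CREATE INDEX brin_autosummarize_demo_idx ON brin_autosummarize_demo
  USING brin (value) WITH (pages_per_range=2, autosummarize=on);
INSERT INTO brin_autosummarize_demo SELECT 1 FROM generate_series(1, 1000);
-- Autovacuum processes the summarization requests asynchronously, so the
-- result is timing dependent; a range can still be summarized on demand.
SELECT brin_summarize_range('brin_autosummarize_demo_idx', 0);
DROP TABLE brin_autosummarize_demo;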
|
2017-04-06 22:49:26 +02:00
|
|
|
|
2020-01-22 22:35:05 +01:00
|
|
|
-- test value merging in add_value
|
Avoid loss of code coverage with unlogged-index test cases.
Commit 4fb5c794e intended to add coverage of some ambuildempty
methods that were not getting reached, without removing any
test coverage. However, by changing a temp table to unlogged
it managed to negate the intent of 4c51a2d1e, which means that
we didn't have reliable test coverage of ginvacuum.c anymore.
As things stand, much of that file might or might not get reached
depending on timing, which seems pretty undesirable.
Although this is only clearly broken for the GIN test, it seems
best to revert 4fb5c794e altogether and instead add bespoke test
cases covering unlogged indexes for these four AMs. We don't
need to do very much with them, so the extra tests are cheap.
(Note that btree, hash, and bloom already have similar test cases,
so they need no additional work.)
We can also undo dec8ad367. Since the testing deficiency that that
hacked around was later fixed by 2f2e24d90, let's intentionally leave
an unlogged table behind to improve test coverage in the modules that
use the regression database for other test purposes. (The case I used
also leaves an unlogged sequence behind.)
Per report from Alex Kozhemyakin. Back-patch to v15 where the
faulty test came in.
Discussion: https://postgr.es/m/b00c8ee096ee46cd25c183125562a1a7@postgrespro.ru
2022-09-25 19:10:10 +02:00
|
|
|
CREATE TABLE brintest_2 (n numrange);
|
2020-01-22 22:35:05 +01:00
|
|
|
CREATE INDEX brinidx_2 ON brintest_2 USING brin (n);
|
|
|
|
INSERT INTO brintest_2 VALUES ('empty');
|
|
|
|
INSERT INTO brintest_2 VALUES (numrange(0, 2^1000::numeric));
|
|
|
|
INSERT INTO brintest_2 VALUES ('(-1, 0)');
|
|
|
|
|
|
|
|
SELECT brin_desummarize_range('brinidx', 0);
|
|
|
|
SELECT brin_summarize_range('brinidx', 0);
|
|
|
|
DROP TABLE brintest_2;
|
2017-04-06 22:49:26 +02:00
|
|
|
|
|
|
|
-- test brin cost estimates behave sanely based on correlation of values
|
|
|
|
CREATE TABLE brin_test (a INT, b INT);
|
|
|
|
INSERT INTO brin_test SELECT x/100,x%100 FROM generate_series(1,10000) x(x);
|
|
|
|
CREATE INDEX brin_test_a_idx ON brin_test USING brin (a) WITH (pages_per_range = 2);
|
|
|
|
CREATE INDEX brin_test_b_idx ON brin_test USING brin (b) WITH (pages_per_range = 2);
|
|
|
|
VACUUM ANALYZE brin_test;
|
|
|
|
|
|
|
|
-- Ensure brin index is used when columns are perfectly correlated
|
|
|
|
EXPLAIN (COSTS OFF) SELECT * FROM brin_test WHERE a = 1;
|
|
|
|
-- Ensure brin index is not used when values are not correlated
|
|
|
|
EXPLAIN (COSTS OFF) SELECT * FROM brin_test WHERE b = 1;
|
2020-11-07 00:39:19 +01:00
|
|
|
|
|
|
|
-- make sure data are properly de-toasted in BRIN index
|
|
|
|
CREATE TABLE brintest_3 (a text, b text, c text, d text);
|
|
|
|
|
|
|
|
-- long random strings (~2000 chars each, so ~6kB for min/max on two
|
|
|
|
-- columns) to trigger toasting
|
2023-03-13 10:15:44 +01:00
|
|
|
WITH rand_value AS (SELECT string_agg(fipshash(i::text),'') AS val FROM generate_series(1,60) s(i))
|
2020-11-07 00:39:19 +01:00
|
|
|
INSERT INTO brintest_3
|
|
|
|
SELECT val, val, val, val FROM rand_value;
|
|
|
|
|
|
|
|
CREATE INDEX brin_test_toast_idx ON brintest_3 USING brin (b, c);
|
|
|
|
DELETE FROM brintest_3;
|
|
|
|
|
|
|
|
-- We need to wait a bit for all transactions to complete, so that the
|
|
|
|
-- vacuum actually removes the TOAST rows. Creating an index concurrently
|
|
|
|
-- is one way to achieve that, because it performs exactly such a wait.
|
|
|
|
CREATE INDEX CONCURRENTLY brin_test_temp_idx ON brintest_3(a);
|
|
|
|
DROP INDEX brin_test_temp_idx;
|
|
|
|
|
|
|
|
-- vacuum the table, to discard TOAST data
|
|
|
|
VACUUM brintest_3;
|
|
|
|
|
|
|
|
-- retry insert with a different random-looking (but deterministic) value
|
|
|
|
-- the value is different, and so should replace either min or max in the
|
|
|
|
-- brin summary
|
2023-03-13 10:15:44 +01:00
|
|
|
WITH rand_value AS (SELECT string_agg(fipshash((-i)::text),'') AS val FROM generate_series(1,60) s(i))
|
2020-11-07 00:39:19 +01:00
|
|
|
INSERT INTO brintest_3
|
|
|
|
SELECT val, val, val, val FROM rand_value;
|
|
|
|
|
|
|
|
-- now try some queries, accessing the brin index
|
|
|
|
SET enable_seqscan = off;
|
|
|
|
|
|
|
|
EXPLAIN (COSTS OFF)
|
|
|
|
SELECT * FROM brintest_3 WHERE b < '0';
|
|
|
|
|
|
|
|
SELECT * FROM brintest_3 WHERE b < '0';
|
|
|
|
|
|
|
|
DROP TABLE brintest_3;
|
|
|
|
RESET enable_seqscan;
|
2022-09-25 19:10:10 +02:00
|
|
|
|
|
|
|
-- test an unlogged table, mostly to get coverage of brinbuildempty
|
|
|
|
CREATE UNLOGGED TABLE brintest_unlogged (n numrange);
|
|
|
|
CREATE INDEX brinidx_unlogged ON brintest_unlogged USING brin (n);
|
|
|
|
INSERT INTO brintest_unlogged VALUES (numrange(0, 2^1000::numeric));
|
|
|
|
DROP TABLE brintest_unlogged;
|