postgresql/doc/src/sgml/pageinspect.sgml

732 lines
22 KiB
Plaintext
Raw Normal View History

2010-09-20 22:08:53 +02:00
<!-- doc/src/sgml/pageinspect.sgml -->
<sect1 id="pageinspect" xreflabel="pageinspect">
<title>pageinspect</title>
<indexterm zone="pageinspect">
<primary>pageinspect</primary>
</indexterm>
<para>
The <filename>pageinspect</filename> module provides functions that allow you to
inspect the contents of database pages at a low level, which is useful for
debugging purposes. All of these functions may be used only by superusers.
</para>
<sect2>
<title>General Functions</title>
<variablelist>
<varlistentry>
<term>
<function>get_raw_page(relname text, fork text, blkno int) returns bytea</function>
<indexterm>
<primary>get_raw_page</primary>
</indexterm>
</term>
<listitem>
<para>
<function>get_raw_page</function> reads the specified block of the named
relation and returns a copy as a <type>bytea</type> value. This allows a
single time-consistent copy of the block to be obtained.
<replaceable>fork</replaceable> should be <literal>'main'</literal> for
the main data fork, <literal>'fsm'</literal> for the free space map,
<literal>'vm'</literal> for the visibility map, or <literal>'init'</literal>
for the initialization fork.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<function>get_raw_page(relname text, blkno int) returns bytea</function>
</term>
<listitem>
<para>
A shorthand version of <function>get_raw_page</function>, for reading
from the main fork. Equivalent to
<literal>get_raw_page(relname, 'main', blkno)</literal>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<function>page_header(page bytea) returns record</function>
<indexterm>
<primary>page_header</primary>
</indexterm>
</term>
<listitem>
<para>
<function>page_header</function> shows fields that are common to all
<productname>PostgreSQL</productname> heap and index pages.
</para>
<para>
A page image obtained with <function>get_raw_page</function> should be
passed as argument. For example:
<screen>
test=# SELECT * FROM page_header(get_raw_page('pg_class', 0));
lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
-----------+----------+--------+-------+-------+---------+----------+---------+-----------
0/24A1B50 | 0 | 1 | 232 | 368 | 8192 | 8192 | 4 | 0
</screen>
The returned columns correspond to the fields in the
<structname>PageHeaderData</structname> struct.
See <filename>src/include/storage/bufpage.h</filename> for details.
</para>
<para>
The <structfield>checksum</structfield> field is the checksum stored in
the page, which might be incorrect if the page is somehow corrupted. If
data checksums are not enabled for this instance, then the value stored
is meaningless.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<function>page_checksum(page bytea, blkno int4) returns smallint</function>
<indexterm>
<primary>page_checksum</primary>
</indexterm>
</term>
<listitem>
<para>
<function>page_checksum</function> computes the checksum for the page, as if
it was located at the given block.
</para>
<para>
A page image obtained with <function>get_raw_page</function> should be
passed as argument. For example:
<screen>
test=# SELECT page_checksum(get_raw_page('pg_class', 0), 0);
page_checksum
---------------
13443
</screen>
Note that the checksum depends on the block number, so matching block
numbers should be passed (except when doing esoteric debugging).
</para>
<para>
The checksum computed with this function can be compared with
the <structfield>checksum</structfield> result field of the
function <function>page_header</function>. If data checksums are
enabled for this instance, then the two values should be equal.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<function>fsm_page_contents(page bytea) returns text</function>
<indexterm>
<primary>fsm_page_contents</primary>
</indexterm>
</term>
<listitem>
<para>
<function>fsm_page_contents</function> shows the internal node structure
of a FSM page. For example:
<screen>
test=# SELECT fsm_page_contents(get_raw_page('pg_class', 'fsm', 0));
</screen>
The output is a multiline string, with one line per node in the binary
tree within the page. Only those nodes that are not zero are printed.
The so-called "next" pointer, which points to the next slot to be
returned from the page, is also printed.
</para>
<para>
See <filename>src/backend/storage/freespace/README</filename> for more
information on the structure of an FSM page.
</para>
</listitem>
</varlistentry>
</variablelist>
</sect2>
<sect2>
<title>Heap Functions</title>
<variablelist>
<varlistentry>
<term>
<function>heap_page_items(page bytea) returns setof record</function>
<indexterm>
<primary>heap_page_items</primary>
</indexterm>
</term>
<listitem>
<para>
<function>heap_page_items</function> shows all line pointers on a heap
page. For those line pointers that are in use, tuple headers as well
as tuple raw data are also shown. All tuples are shown, whether or not
the tuples were visible to an MVCC snapshot at the time the raw page
was copied.
</para>
<para>
A heap page image obtained with <function>get_raw_page</function> should
be passed as argument. For example:
<screen>
test=# SELECT * FROM heap_page_items(get_raw_page('pg_class', 0));
</screen>
See <filename>src/include/storage/itemid.h</filename> and
<filename>src/include/access/htup_details.h</filename> for explanations of the fields
returned.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<function>tuple_data_split(rel_oid oid, t_data bytea, t_infomask integer, t_infomask2 integer, t_bits text [, do_detoast bool]) returns bytea[]</function>
<indexterm>
<primary>tuple_data_split</primary>
</indexterm>
</term>
<listitem>
<para>
<function>tuple_data_split</function> splits tuple data into attributes
in the same way as backend internals.
<screen>
test=# SELECT tuple_data_split('pg_class'::regclass, t_data, t_infomask, t_infomask2, t_bits) FROM heap_page_items(get_raw_page('pg_class', 0));
</screen>
This function should be called with the same arguments as the return
attributes of <function>heap_page_items</function>.
</para>
<para>
If <parameter>do_detoast</parameter> is <literal>true</literal>,
attribute that will be detoasted as needed. Default value is
<literal>false</literal>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<function>heap_page_item_attrs(page bytea, rel_oid regclass [, do_detoast bool]) returns setof record</function>
<indexterm>
<primary>heap_page_item_attrs</primary>
</indexterm>
</term>
<listitem>
<para>
<function>heap_page_item_attrs</function> is equivalent to
<function>heap_page_items</function> except that it returns
tuple raw data as an array of attributes that can optionally
be detoasted by <parameter>do_detoast</parameter> which is
<literal>false</literal> by default.
</para>
<para>
A heap page image obtained with <function>get_raw_page</function> should
be passed as argument. For example:
<screen>
test=# SELECT * FROM heap_page_item_attrs(get_raw_page('pg_class', 0), 'pg_class'::regclass);
</screen>
</para>
</listitem>
</varlistentry>
</variablelist>
</sect2>
<sect2>
<title>B-Tree Functions</title>
<variablelist>
<varlistentry>
<term>
<function>bt_metap(relname text) returns record</function>
<indexterm>
<primary>bt_metap</primary>
</indexterm>
</term>
<listitem>
<para>
2010-08-17 06:37:21 +02:00
<function>bt_metap</function> returns information about a B-tree
index's metapage. For example:
<screen>
test=# SELECT * FROM bt_metap('pg_cast_oid_index');
Skip full index scan during cleanup of B-tree indexes when possible Vacuum of index consists from two stages: multiple (zero of more) ambulkdelete calls and one amvacuumcleanup call. When workload on particular table is append-only, then autovacuum isn't intended to touch this table. However, user may run vacuum manually in order to fill visibility map and get benefits of index-only scans. Then ambulkdelete wouldn't be called for indexes of such table (because no heap tuples were deleted), only amvacuumcleanup would be called In this case, amvacuumcleanup would perform full index scan for two objectives: put recyclable pages into free space map and update index statistics. This patch allows btvacuumclanup to skip full index scan when two conditions are satisfied: no pages are going to be put into free space map and index statistics isn't stalled. In order to check first condition, we store oldest btpo_xact in the meta-page. When it's precedes RecentGlobalXmin, then there are some recyclable pages. In order to check second condition we store number of heap tuples observed during previous full index scan by cleanup. If fraction of newly inserted tuples is less than vacuum_cleanup_index_scale_factor, then statistics isn't considered to be stalled. vacuum_cleanup_index_scale_factor can be defined as both reloption and GUC (default). This patch bumps B-tree meta-page version. Upgrade of meta-page is performed "on the fly": during VACUUM meta-page is rewritten with new version. No special handling in pg_upgrade is required. Author: Masahiko Sawada, Alexander Korotkov Review by: Peter Geoghegan, Kyotaro Horiguchi, Alexander Korotkov, Yura Sokolov Discussion: https://www.postgresql.org/message-id/flat/CAD21AoAX+d2oD_nrd9O2YkpzHaFr=uQeGr9s1rKC3O4ENc568g@mail.gmail.com
2018-04-04 18:29:00 +02:00
-[ RECORD 1 ]-----------+-------
magic | 340322
version | 3
root | 1
level | 0
fastroot | 1
fastlevel | 0
oldest_xact | 582
last_cleanup_num_tuples | 1000
</screen>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<function>bt_page_stats(relname text, blkno int) returns record</function>
<indexterm>
<primary>bt_page_stats</primary>
</indexterm>
</term>
<listitem>
<para>
<function>bt_page_stats</function> returns summary information about
2010-08-17 06:37:21 +02:00
single pages of B-tree indexes. For example:
<screen>
test=# SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
-[ RECORD 1 ]-+-----
blkno | 1
type | l
live_items | 256
dead_items | 0
avg_item_size | 12
page_size | 8192
free_size | 4056
btpo_prev | 0
btpo_next | 0
btpo | 0
btpo_flags | 3
</screen>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<function>bt_page_items(relname text, blkno int) returns setof record</function>
<indexterm>
<primary>bt_page_items</primary>
</indexterm>
</term>
<listitem>
<para>
<function>bt_page_items</function> returns detailed information about
2010-08-17 06:37:21 +02:00
all of the items on a B-tree index page. For example:
<screen>
test=# SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
itemoffset | ctid | itemlen | nulls | vars | data
------------+---------+---------+-------+------+-------------
1 | (0,1) | 12 | f | f | 23 27 00 00
2 | (0,2) | 12 | f | f | 24 27 00 00
3 | (0,3) | 12 | f | f | 25 27 00 00
4 | (0,4) | 12 | f | f | 26 27 00 00
5 | (0,5) | 12 | f | f | 27 27 00 00
6 | (0,6) | 12 | f | f | 28 27 00 00
7 | (0,7) | 12 | f | f | 29 27 00 00
8 | (0,8) | 12 | f | f | 2a 27 00 00
</screen>
In a B-tree leaf page, <structfield>ctid</structfield> points to a heap tuple.
In an internal page, the block number part of <structfield>ctid</structfield>
points to another page in the index itself, while the offset part
(the second number) is ignored and is usually 1.
</para>
<para>
Note that the first item on any non-rightmost page (any page with
a non-zero value in the <structfield>btpo_next</structfield> field) is the
page's <quote>high key</quote>, meaning its <structfield>data</structfield>
serves as an upper bound on all items appearing on the page, while
its <structfield>ctid</structfield> field is meaningless. Also, on non-leaf
pages, the first real data item (the first item that is not a high
key) is a <quote>minus infinity</quote> item, with no actual value
in its <structfield>data</structfield> field. Such an item does have a valid
downlink in its <structfield>ctid</structfield> field, however.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<function>bt_page_items(page bytea) returns setof record</function>
<indexterm>
<primary>bt_page_items</primary>
</indexterm>
</term>
<listitem>
<para>
It is also possible to pass a page to <function>bt_page_items</function>
as a <type>bytea</type> value. A page image obtained
with <function>get_raw_page</function> should be passed as argument. So
the last example could also be rewritten like this:
<screen>
test=# SELECT * FROM bt_page_items(get_raw_page('pg_cast_oid_index', 1));
itemoffset | ctid | itemlen | nulls | vars | data
------------+---------+---------+-------+------+-------------
1 | (0,1) | 12 | f | f | 23 27 00 00
2 | (0,2) | 12 | f | f | 24 27 00 00
3 | (0,3) | 12 | f | f | 25 27 00 00
4 | (0,4) | 12 | f | f | 26 27 00 00
5 | (0,5) | 12 | f | f | 27 27 00 00
6 | (0,6) | 12 | f | f | 28 27 00 00
7 | (0,7) | 12 | f | f | 29 27 00 00
8 | (0,8) | 12 | f | f | 2a 27 00 00
</screen>
All the other details are the same as explained in the previous item.
</para>
</listitem>
</varlistentry>
</variablelist>
</sect2>
<sect2>
<title>BRIN Functions</title>
<variablelist>
BRIN: Block Range Indexes BRIN is a new index access method intended to accelerate scans of very large tables, without the maintenance overhead of btrees or other traditional indexes. They work by maintaining "summary" data about block ranges. Bitmap index scans work by reading each summary tuple and comparing them with the query quals; all pages in the range are returned in a lossy TID bitmap if the quals are consistent with the values in the summary tuple, otherwise not. Normal index scans are not supported because these indexes do not store TIDs. As new tuples are added into the index, the summary information is updated (if the block range in which the tuple is added is already summarized) or not; in the latter case, a subsequent pass of VACUUM or the brin_summarize_new_values() function will create the summary information. For data types with natural 1-D sort orders, the summary info consists of the maximum and the minimum values of each indexed column within each page range. This type of operator class we call "Minmax", and we supply a bunch of them for most data types with B-tree opclasses. Since the BRIN code is generalized, other approaches are possible for things such as arrays, geometric types, ranges, etc; even for things such as enum types we could do something different than minmax with better results. In this commit I only include minmax. Catalog version bumped due to new builtin catalog entries. There's more that could be done here, but this is a good step forwards. Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera, with contribution by Heikki Linnakangas. Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas. Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo. PS: The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 318633.
2014-11-07 20:38:14 +01:00
<varlistentry>
<term>
<function>brin_page_type(page bytea) returns text</function>
<indexterm>
<primary>brin_page_type</primary>
</indexterm>
</term>
<listitem>
<para>
<function>brin_page_type</function> returns the page type of the given
<acronym>BRIN</acronym> index page, or throws an error if the page is
not a valid <acronym>BRIN</acronym> page. For example:
<screen>
test=# SELECT brin_page_type(get_raw_page('brinidx', 0));
BRIN: Block Range Indexes BRIN is a new index access method intended to accelerate scans of very large tables, without the maintenance overhead of btrees or other traditional indexes. They work by maintaining "summary" data about block ranges. Bitmap index scans work by reading each summary tuple and comparing them with the query quals; all pages in the range are returned in a lossy TID bitmap if the quals are consistent with the values in the summary tuple, otherwise not. Normal index scans are not supported because these indexes do not store TIDs. As new tuples are added into the index, the summary information is updated (if the block range in which the tuple is added is already summarized) or not; in the latter case, a subsequent pass of VACUUM or the brin_summarize_new_values() function will create the summary information. For data types with natural 1-D sort orders, the summary info consists of the maximum and the minimum values of each indexed column within each page range. This type of operator class we call "Minmax", and we supply a bunch of them for most data types with B-tree opclasses. Since the BRIN code is generalized, other approaches are possible for things such as arrays, geometric types, ranges, etc; even for things such as enum types we could do something different than minmax with better results. In this commit I only include minmax. Catalog version bumped due to new builtin catalog entries. There's more that could be done here, but this is a good step forwards. Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera, with contribution by Heikki Linnakangas. Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas. Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo. PS: The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 318633.
2014-11-07 20:38:14 +01:00
brin_page_type
----------------
meta
</screen>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<function>brin_metapage_info(page bytea) returns record</function>
<indexterm>
<primary>brin_metapage_info</primary>
</indexterm>
</term>
<listitem>
<para>
<function>brin_metapage_info</function> returns assorted information
about a <acronym>BRIN</acronym> index metapage. For example:
<screen>
test=# SELECT * FROM brin_metapage_info(get_raw_page('brinidx', 0));
BRIN: Block Range Indexes BRIN is a new index access method intended to accelerate scans of very large tables, without the maintenance overhead of btrees or other traditional indexes. They work by maintaining "summary" data about block ranges. Bitmap index scans work by reading each summary tuple and comparing them with the query quals; all pages in the range are returned in a lossy TID bitmap if the quals are consistent with the values in the summary tuple, otherwise not. Normal index scans are not supported because these indexes do not store TIDs. As new tuples are added into the index, the summary information is updated (if the block range in which the tuple is added is already summarized) or not; in the latter case, a subsequent pass of VACUUM or the brin_summarize_new_values() function will create the summary information. For data types with natural 1-D sort orders, the summary info consists of the maximum and the minimum values of each indexed column within each page range. This type of operator class we call "Minmax", and we supply a bunch of them for most data types with B-tree opclasses. Since the BRIN code is generalized, other approaches are possible for things such as arrays, geometric types, ranges, etc; even for things such as enum types we could do something different than minmax with better results. In this commit I only include minmax. Catalog version bumped due to new builtin catalog entries. There's more that could be done here, but this is a good step forwards. Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera, with contribution by Heikki Linnakangas. Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas. Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo. PS: The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 318633.
2014-11-07 20:38:14 +01:00
magic | version | pagesperrange | lastrevmappage
------------+---------+---------------+----------------
0xA8109CFA | 1 | 4 | 2
</screen>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<function>brin_revmap_data(page bytea) returns setof tid</function>
<indexterm>
<primary>brin_revmap_data</primary>
</indexterm>
</term>
<listitem>
<para>
<function>brin_revmap_data</function> returns the list of tuple
identifiers in a <acronym>BRIN</acronym> index range map page.
For example:
<screen>
2016-10-27 18:00:00 +02:00
test=# SELECT * FROM brin_revmap_data(get_raw_page('brinidx', 2)) LIMIT 5;
BRIN: Block Range Indexes BRIN is a new index access method intended to accelerate scans of very large tables, without the maintenance overhead of btrees or other traditional indexes. They work by maintaining "summary" data about block ranges. Bitmap index scans work by reading each summary tuple and comparing them with the query quals; all pages in the range are returned in a lossy TID bitmap if the quals are consistent with the values in the summary tuple, otherwise not. Normal index scans are not supported because these indexes do not store TIDs. As new tuples are added into the index, the summary information is updated (if the block range in which the tuple is added is already summarized) or not; in the latter case, a subsequent pass of VACUUM or the brin_summarize_new_values() function will create the summary information. For data types with natural 1-D sort orders, the summary info consists of the maximum and the minimum values of each indexed column within each page range. This type of operator class we call "Minmax", and we supply a bunch of them for most data types with B-tree opclasses. Since the BRIN code is generalized, other approaches are possible for things such as arrays, geometric types, ranges, etc; even for things such as enum types we could do something different than minmax with better results. In this commit I only include minmax. Catalog version bumped due to new builtin catalog entries. There's more that could be done here, but this is a good step forwards. Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera, with contribution by Heikki Linnakangas. Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas. Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo. PS: The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 318633.
2014-11-07 20:38:14 +01:00
pages
---------
(6,137)
(6,138)
(6,139)
(6,140)
(6,141)
</screen>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<function>brin_page_items(page bytea, index oid) returns setof record</function>
<indexterm>
<primary>brin_page_items</primary>
</indexterm>
</term>
<listitem>
<para>
<function>brin_page_items</function> returns the data stored in the
<acronym>BRIN</acronym> data page. For example:
<screen>
test=# SELECT * FROM brin_page_items(get_raw_page('brinidx', 5),
'brinidx')
ORDER BY blknum, attnum LIMIT 6;
BRIN: Block Range Indexes BRIN is a new index access method intended to accelerate scans of very large tables, without the maintenance overhead of btrees or other traditional indexes. They work by maintaining "summary" data about block ranges. Bitmap index scans work by reading each summary tuple and comparing them with the query quals; all pages in the range are returned in a lossy TID bitmap if the quals are consistent with the values in the summary tuple, otherwise not. Normal index scans are not supported because these indexes do not store TIDs. As new tuples are added into the index, the summary information is updated (if the block range in which the tuple is added is already summarized) or not; in the latter case, a subsequent pass of VACUUM or the brin_summarize_new_values() function will create the summary information. For data types with natural 1-D sort orders, the summary info consists of the maximum and the minimum values of each indexed column within each page range. This type of operator class we call "Minmax", and we supply a bunch of them for most data types with B-tree opclasses. Since the BRIN code is generalized, other approaches are possible for things such as arrays, geometric types, ranges, etc; even for things such as enum types we could do something different than minmax with better results. In this commit I only include minmax. Catalog version bumped due to new builtin catalog entries. There's more that could be done here, but this is a good step forwards. Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera, with contribution by Heikki Linnakangas. Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas. Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo. PS: The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 318633.
2014-11-07 20:38:14 +01:00
itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
------------+--------+--------+----------+----------+-------------+--------------
137 | 0 | 1 | t | f | f |
137 | 0 | 2 | f | f | f | {1 .. 88}
138 | 4 | 1 | t | f | f |
138 | 4 | 2 | f | f | f | {89 .. 176}
139 | 8 | 1 | t | f | f |
139 | 8 | 2 | f | f | f | {177 .. 264}
</screen>
The returned columns correspond to the fields in the
<structname>BrinMemTuple</structname> and <structname>BrinValues</structname> structs.
See <filename>src/include/access/brin_tuple.h</filename> for details.
BRIN: Block Range Indexes BRIN is a new index access method intended to accelerate scans of very large tables, without the maintenance overhead of btrees or other traditional indexes. They work by maintaining "summary" data about block ranges. Bitmap index scans work by reading each summary tuple and comparing them with the query quals; all pages in the range are returned in a lossy TID bitmap if the quals are consistent with the values in the summary tuple, otherwise not. Normal index scans are not supported because these indexes do not store TIDs. As new tuples are added into the index, the summary information is updated (if the block range in which the tuple is added is already summarized) or not; in the latter case, a subsequent pass of VACUUM or the brin_summarize_new_values() function will create the summary information. For data types with natural 1-D sort orders, the summary info consists of the maximum and the minimum values of each indexed column within each page range. This type of operator class we call "Minmax", and we supply a bunch of them for most data types with B-tree opclasses. Since the BRIN code is generalized, other approaches are possible for things such as arrays, geometric types, ranges, etc; even for things such as enum types we could do something different than minmax with better results. In this commit I only include minmax. Catalog version bumped due to new builtin catalog entries. There's more that could be done here, but this is a good step forwards. Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera, with contribution by Heikki Linnakangas. Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas. Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo. PS: The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 318633.
2014-11-07 20:38:14 +01:00
</para>
</listitem>
</varlistentry>
</variablelist>
</sect2>
<sect2>
<title>GIN Functions</title>
BRIN: Block Range Indexes BRIN is a new index access method intended to accelerate scans of very large tables, without the maintenance overhead of btrees or other traditional indexes. They work by maintaining "summary" data about block ranges. Bitmap index scans work by reading each summary tuple and comparing them with the query quals; all pages in the range are returned in a lossy TID bitmap if the quals are consistent with the values in the summary tuple, otherwise not. Normal index scans are not supported because these indexes do not store TIDs. As new tuples are added into the index, the summary information is updated (if the block range in which the tuple is added is already summarized) or not; in the latter case, a subsequent pass of VACUUM or the brin_summarize_new_values() function will create the summary information. For data types with natural 1-D sort orders, the summary info consists of the maximum and the minimum values of each indexed column within each page range. This type of operator class we call "Minmax", and we supply a bunch of them for most data types with B-tree opclasses. Since the BRIN code is generalized, other approaches are possible for things such as arrays, geometric types, ranges, etc; even for things such as enum types we could do something different than minmax with better results. In this commit I only include minmax. Catalog version bumped due to new builtin catalog entries. There's more that could be done here, but this is a good step forwards. Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera, with contribution by Heikki Linnakangas. Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas. Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo. PS: The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 318633.
2014-11-07 20:38:14 +01:00
<variablelist>
<varlistentry>
<term>
<function>gin_metapage_info(page bytea) returns record</function>
<indexterm>
<primary>gin_metapage_info</primary>
</indexterm>
</term>
<listitem>
<para>
<function>gin_metapage_info</function> returns information about
a <acronym>GIN</acronym> index metapage. For example:
<screen>
test=# SELECT * FROM gin_metapage_info(get_raw_page('gin_index', 0));
-[ RECORD 1 ]----+-----------
pending_head | 4294967295
pending_tail | 4294967295
tail_free_size | 0
n_pending_pages | 0
n_pending_tuples | 0
n_total_pages | 7
n_entry_pages | 6
n_data_pages | 0
n_entries | 693
version | 2
</screen>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<function>gin_page_opaque_info(page bytea) returns record</function>
<indexterm>
<primary>gin_page_opaque_info</primary>
</indexterm>
</term>
<listitem>
<para>
<function>gin_page_opaque_info</function> returns information about
a <acronym>GIN</acronym> index opaque area, like the page type.
For example:
<screen>
test=# SELECT * FROM gin_page_opaque_info(get_raw_page('gin_index', 2));
rightlink | maxoff | flags
-----------+--------+------------------------
5 | 0 | {data,leaf,compressed}
(1 row)
</screen>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<function>gin_leafpage_items(page bytea) returns setof record</function>
<indexterm>
<primary>gin_leafpage_items</primary>
</indexterm>
</term>
<listitem>
<para>
<function>gin_leafpage_items</function> returns information about
the data stored in a <acronym>GIN</acronym> leaf page. For example:
<screen>
2016-10-27 18:00:00 +02:00
test=# SELECT first_tid, nbytes, tids[0:5] AS some_tids
FROM gin_leafpage_items(get_raw_page('gin_test_idx', 2));
first_tid | nbytes | some_tids
-----------+--------+----------------------------------------------------------
(8,41) | 244 | {"(8,41)","(8,43)","(8,44)","(8,45)","(8,46)"}
(10,45) | 248 | {"(10,45)","(10,46)","(10,47)","(10,48)","(10,49)"}
(12,52) | 248 | {"(12,52)","(12,53)","(12,54)","(12,55)","(12,56)"}
(14,59) | 320 | {"(14,59)","(14,60)","(14,61)","(14,62)","(14,63)"}
(167,16) | 376 | {"(167,16)","(167,17)","(167,18)","(167,19)","(167,20)"}
(170,30) | 376 | {"(170,30)","(170,31)","(170,32)","(170,33)","(170,34)"}
(173,44) | 197 | {"(173,44)","(173,45)","(173,46)","(173,47)","(173,48)"}
(7 rows)
</screen>
</para>
</listitem>
</varlistentry>
</variablelist>
</sect2>
<sect2>
<title>Hash Functions</title>
<variablelist>
<varlistentry>
<term>
<function>hash_page_type(page bytea) returns text</function>
<indexterm>
<primary>hash_page_type</primary>
</indexterm>
</term>
<listitem>
<para>
<function>hash_page_type</function> returns page type of
the given <acronym>HASH</acronym> index page. For example:
<screen>
test=# SELECT hash_page_type(get_raw_page('con_hash_index', 0));
hash_page_type
----------------
metapage
</screen>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<function>hash_page_stats(page bytea) returns setof record</function>
<indexterm>
<primary>hash_page_stats</primary>
</indexterm>
</term>
<listitem>
<para>
<function>hash_page_stats</function> returns information about
a bucket or overflow page of a <acronym>HASH</acronym> index.
For example:
<screen>
test=# SELECT * FROM hash_page_stats(get_raw_page('con_hash_index', 1));
-[ RECORD 1 ]---+-----------
live_items | 407
dead_items | 0
page_size | 8192
free_size | 8
hasho_prevblkno | 4096
hasho_nextblkno | 8474
hasho_bucket | 0
hasho_flag | 66
hasho_page_id | 65408
</screen>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<function>hash_page_items(page bytea) returns setof record</function>
<indexterm>
<primary>hash_page_items</primary>
</indexterm>
</term>
<listitem>
<para>
<function>hash_page_items</function> returns information about
the data stored in a bucket or overflow page of a <acronym>HASH</acronym>
index page. For example:
<screen>
test=# SELECT * FROM hash_page_items(get_raw_page('con_hash_index', 1)) LIMIT 5;
itemoffset | ctid | data
------------+-----------+------------
1 | (899,77) | 1053474816
2 | (897,29) | 1053474816
3 | (894,207) | 1053474816
4 | (892,159) | 1053474816
5 | (890,111) | 1053474816
</screen>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<function>hash_bitmap_info(index oid, blkno int) returns record</function>
<indexterm>
<primary>hash_bitmap_info</primary>
</indexterm>
</term>
<listitem>
<para>
<function>hash_bitmap_info</function> shows the status of a bit
in the bitmap page for a particular overflow page of <acronym>HASH</acronym>
index. For example:
<screen>
test=# SELECT * FROM hash_bitmap_info('con_hash_index', 2052);
bitmapblkno | bitmapbit | bitstatus
-------------+-----------+-----------
65 | 3 | t
</screen>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<function>hash_metapage_info(page bytea) returns record</function>
<indexterm>
<primary>hash_metapage_info</primary>
</indexterm>
</term>
<listitem>
<para>
<function>hash_metapage_info</function> returns information stored
in meta page of a <acronym>HASH</acronym> index. For example:
<screen>
test=# SELECT magic, version, ntuples, ffactor, bsize, bmsize, bmshift,
test-# maxbucket, highmask, lowmask, ovflpoint, firstfree, nmaps, procid,
test-# regexp_replace(spares::text, '(,0)*}', '}') as spares,
test-# regexp_replace(mapp::text, '(,0)*}', '}') as mapp
test-# FROM hash_metapage_info(get_raw_page('con_hash_index', 0));
-[ RECORD 1 ]-------------------------------------------------------------------------------
magic | 105121344
version | 4
ntuples | 500500
ffactor | 40
bsize | 8152
bmsize | 4096
bmshift | 15
maxbucket | 12512
highmask | 16383
lowmask | 8191
ovflpoint | 28
firstfree | 1204
nmaps | 1
procid | 450
spares | {0,0,0,0,0,0,1,1,1,1,1,1,1,1,3,4,4,4,45,55,58,59,508,567,628,704,1193,1202,1204}
mapp | {65}
</screen>
</para>
</listitem>
</varlistentry>
</variablelist>
</sect2>
</sect1>