postgresql/contrib/pageinspect/expected/btree.out

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

103 lines
3.1 KiB
Plaintext
Raw Normal View History

pageinspect: Add more sanity checks to prevent out-of-bound reads A couple of code paths use the special area on the page passed by the function caller, expecting to find some data in it. However, feeding an incorrect page can lead to out-of-bound reads when trying to access the page special area (like a heap page that has no special area, leading PageGetSpecialPointer() to grab a pointer outside the allocated page). The functions used for hash and btree indexes have some protection already against that, while some other functions using a relation OID as argument would make sure that the access method involved is correct, but functions taking in input a raw page without knowing the relation the page is attached to would run into problems. This commit improves the set of checks used in the code paths of BRIN, btree (including one check if a leaf page is found with a non-zero level), GIN and GiST to verify that the page given in input has a special area size that fits with each access method, which is done though PageGetSpecialSize(), becore calling PageGetSpecialPointer(). The scope of the checks done is limited to work with pages that one would pass after getting a block with get_raw_page(), as it is possible to craft byteas that could bypass existing code paths. Having too many checks would also impact the usability of pageinspect, as the existing code is very useful to look at the content details in a corrupted page, so the focus is really to avoid out-of-bound reads as this is never a good thing even with functions whose execution is limited to superusers. The safest approach could be to rework the functions so as these fetch a block using a relation OID and a block number, but there are also cases where using a raw page is useful. Tests are added to cover all the code paths that needed such checks, and an error message for hash indexes is reworded to fit better with what this commit adds. Reported-By: Alexander Lakhin Author: Julien Rouhaud, Michael Paquier Discussion: https://postgr.es/m/16527-ef7606186f0610a1@postgresql.org Discussion: https://postgr.es/m/561e187b-3549-c8d5-03f5-525c14e65bd0@postgrespro.ru Backpatch-through: 10
2022-03-27 10:53:55 +02:00
CREATE TABLE test1 (a int8, b int4range);
INSERT INTO test1 VALUES (72057594037927937, '[0,1)');
2016-09-29 18:00:00 +02:00
CREATE INDEX test1_a_idx ON test1 USING btree (a);
\x
SELECT * FROM bt_metap('test1_a_idx');
Skip full index scan during cleanup of B-tree indexes when possible Vacuum of index consists from two stages: multiple (zero of more) ambulkdelete calls and one amvacuumcleanup call. When workload on particular table is append-only, then autovacuum isn't intended to touch this table. However, user may run vacuum manually in order to fill visibility map and get benefits of index-only scans. Then ambulkdelete wouldn't be called for indexes of such table (because no heap tuples were deleted), only amvacuumcleanup would be called In this case, amvacuumcleanup would perform full index scan for two objectives: put recyclable pages into free space map and update index statistics. This patch allows btvacuumclanup to skip full index scan when two conditions are satisfied: no pages are going to be put into free space map and index statistics isn't stalled. In order to check first condition, we store oldest btpo_xact in the meta-page. When it's precedes RecentGlobalXmin, then there are some recyclable pages. In order to check second condition we store number of heap tuples observed during previous full index scan by cleanup. If fraction of newly inserted tuples is less than vacuum_cleanup_index_scale_factor, then statistics isn't considered to be stalled. vacuum_cleanup_index_scale_factor can be defined as both reloption and GUC (default). This patch bumps B-tree meta-page version. Upgrade of meta-page is performed "on the fly": during VACUUM meta-page is rewritten with new version. No special handling in pg_upgrade is required. Author: Masahiko Sawada, Alexander Korotkov Review by: Peter Geoghegan, Kyotaro Horiguchi, Alexander Korotkov, Yura Sokolov Discussion: https://www.postgresql.org/message-id/flat/CAD21AoAX+d2oD_nrd9O2YkpzHaFr=uQeGr9s1rKC3O4ENc568g@mail.gmail.com
2018-04-04 18:29:00 +02:00
-[ RECORD 1 ]-----------+-------
magic | 340322
Make heap TID a tiebreaker nbtree index column. Make nbtree treat all index tuples as having a heap TID attribute. Index searches can distinguish duplicates by heap TID, since heap TID is always guaranteed to be unique. This general approach has numerous benefits for performance, and is prerequisite to teaching VACUUM to perform "retail index tuple deletion". Naively adding a new attribute to every pivot tuple has unacceptable overhead (it bloats internal pages), so suffix truncation of pivot tuples is added. This will usually truncate away the "extra" heap TID attribute from pivot tuples during a leaf page split, and may also truncate away additional user attributes. This can increase fan-out, especially in a multi-column index. Truncation can only occur at the attribute granularity, which isn't particularly effective, but works well enough for now. A future patch may add support for truncating "within" text attributes by generating truncated key values using new opclass infrastructure. Only new indexes (BTREE_VERSION 4 indexes) will have insertions that treat heap TID as a tiebreaker attribute, or will have pivot tuples undergo suffix truncation during a leaf page split (on-disk compatibility with versions 2 and 3 is preserved). Upgrades to version 4 cannot be performed on-the-fly, unlike upgrades from version 2 to version 3. contrib/amcheck continues to work with version 2 and 3 indexes, while also enforcing stricter invariants when verifying version 4 indexes. These stricter invariants are the same invariants described by "3.1.12 Sequencing" from the Lehman and Yao paper. A later patch will enhance the logic used by nbtree to pick a split point. This patch is likely to negatively impact performance without smarter choices around the precise point to split leaf pages at. Making these two mostly-distinct sets of enhancements into distinct commits seems like it might clarify their design, even though neither commit is particularly useful on its own. The maximum allowed size of new tuples is reduced by an amount equal to the space required to store an extra MAXALIGN()'d TID in a new high key during leaf page splits. The user-facing definition of the "1/3 of a page" restriction is already imprecise, and so does not need to be revised. However, there should be a compatibility note in the v12 release notes. Author: Peter Geoghegan Reviewed-By: Heikki Linnakangas, Alexander Korotkov Discussion: https://postgr.es/m/CAH2-WzkVb0Kom=R+88fDFb=JSxZMFvbHVC6Mn9LJ2n=X=kS-Uw@mail.gmail.com
2019-03-20 18:04:01 +01:00
version | 4
Skip full index scan during cleanup of B-tree indexes when possible Vacuum of index consists from two stages: multiple (zero of more) ambulkdelete calls and one amvacuumcleanup call. When workload on particular table is append-only, then autovacuum isn't intended to touch this table. However, user may run vacuum manually in order to fill visibility map and get benefits of index-only scans. Then ambulkdelete wouldn't be called for indexes of such table (because no heap tuples were deleted), only amvacuumcleanup would be called In this case, amvacuumcleanup would perform full index scan for two objectives: put recyclable pages into free space map and update index statistics. This patch allows btvacuumclanup to skip full index scan when two conditions are satisfied: no pages are going to be put into free space map and index statistics isn't stalled. In order to check first condition, we store oldest btpo_xact in the meta-page. When it's precedes RecentGlobalXmin, then there are some recyclable pages. In order to check second condition we store number of heap tuples observed during previous full index scan by cleanup. If fraction of newly inserted tuples is less than vacuum_cleanup_index_scale_factor, then statistics isn't considered to be stalled. vacuum_cleanup_index_scale_factor can be defined as both reloption and GUC (default). This patch bumps B-tree meta-page version. Upgrade of meta-page is performed "on the fly": during VACUUM meta-page is rewritten with new version. No special handling in pg_upgrade is required. Author: Masahiko Sawada, Alexander Korotkov Review by: Peter Geoghegan, Kyotaro Horiguchi, Alexander Korotkov, Yura Sokolov Discussion: https://www.postgresql.org/message-id/flat/CAD21AoAX+d2oD_nrd9O2YkpzHaFr=uQeGr9s1rKC3O4ENc568g@mail.gmail.com
2018-04-04 18:29:00 +02:00
root | 1
level | 0
fastroot | 1
fastlevel | 0
oldest_xact | 0
last_cleanup_num_tuples | -1
allequalimage | t
2016-09-29 18:00:00 +02:00
SELECT * FROM bt_page_stats('test1_a_idx', 0);
ERROR: block 0 is a meta page
SELECT * FROM bt_page_stats('test1_a_idx', 1);
-[ RECORD 1 ]-+-----
blkno | 1
type | l
live_items | 1
dead_items | 0
avg_item_size | 16
page_size | 8192
free_size | 8128
btpo_prev | 0
btpo_next | 0
btpo | 0
btpo_flags | 3
SELECT * FROM bt_page_stats('test1_a_idx', 2);
ERROR: block number out of range
SELECT * FROM bt_page_items('test1_a_idx', 0);
ERROR: block 0 is a meta page
SELECT * FROM bt_page_items('test1_a_idx', 1);
-[ RECORD 1 ]-----------------------
itemoffset | 1
ctid | (0,1)
itemlen | 16
nulls | f
vars | f
data | 01 00 00 00 00 00 00 01
dead | f
htid | (0,1)
tids |
2016-09-29 18:00:00 +02:00
SELECT * FROM bt_page_items('test1_a_idx', 2);
ERROR: block number out of range
SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 0));
ERROR: block is a meta page
SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 1));
-[ RECORD 1 ]-----------------------
itemoffset | 1
ctid | (0,1)
itemlen | 16
nulls | f
vars | f
data | 01 00 00 00 00 00 00 01
dead | f
htid | (0,1)
tids |
SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 2));
ERROR: block number 2 is out of range for relation "test1_a_idx"
2022-03-16 03:20:51 +01:00
-- Failure when using a non-btree index.
CREATE INDEX test1_a_hash ON test1 USING hash(a);
SELECT bt_metap('test1_a_hash');
ERROR: "test1_a_hash" is not a btree index
SELECT bt_page_stats('test1_a_hash', 0);
ERROR: "test1_a_hash" is not a btree index
SELECT bt_page_items('test1_a_hash', 0);
ERROR: "test1_a_hash" is not a btree index
pageinspect: Add more sanity checks to prevent out-of-bound reads A couple of code paths use the special area on the page passed by the function caller, expecting to find some data in it. However, feeding an incorrect page can lead to out-of-bound reads when trying to access the page special area (like a heap page that has no special area, leading PageGetSpecialPointer() to grab a pointer outside the allocated page). The functions used for hash and btree indexes have some protection already against that, while some other functions using a relation OID as argument would make sure that the access method involved is correct, but functions taking in input a raw page without knowing the relation the page is attached to would run into problems. This commit improves the set of checks used in the code paths of BRIN, btree (including one check if a leaf page is found with a non-zero level), GIN and GiST to verify that the page given in input has a special area size that fits with each access method, which is done though PageGetSpecialSize(), becore calling PageGetSpecialPointer(). The scope of the checks done is limited to work with pages that one would pass after getting a block with get_raw_page(), as it is possible to craft byteas that could bypass existing code paths. Having too many checks would also impact the usability of pageinspect, as the existing code is very useful to look at the content details in a corrupted page, so the focus is really to avoid out-of-bound reads as this is never a good thing even with functions whose execution is limited to superusers. The safest approach could be to rework the functions so as these fetch a block using a relation OID and a block number, but there are also cases where using a raw page is useful. Tests are added to cover all the code paths that needed such checks, and an error message for hash indexes is reworded to fit better with what this commit adds. Reported-By: Alexander Lakhin Author: Julien Rouhaud, Michael Paquier Discussion: https://postgr.es/m/16527-ef7606186f0610a1@postgresql.org Discussion: https://postgr.es/m/561e187b-3549-c8d5-03f5-525c14e65bd0@postgrespro.ru Backpatch-through: 10
2022-03-27 10:53:55 +02:00
SELECT bt_page_items(get_raw_page('test1_a_hash', 0));
ERROR: block is a meta page
CREATE INDEX test1_b_gist ON test1 USING gist(b);
-- Special area of GiST is the same as btree, this complains about inconsistent
-- leaf data on the page.
SELECT bt_page_items(get_raw_page('test1_b_gist', 0));
ERROR: block is not a valid btree leaf page
-- Several failure modes.
2022-03-16 03:20:51 +01:00
-- Suppress the DETAIL message, to allow the tests to work across various
pageinspect: Add more sanity checks to prevent out-of-bound reads A couple of code paths use the special area on the page passed by the function caller, expecting to find some data in it. However, feeding an incorrect page can lead to out-of-bound reads when trying to access the page special area (like a heap page that has no special area, leading PageGetSpecialPointer() to grab a pointer outside the allocated page). The functions used for hash and btree indexes have some protection already against that, while some other functions using a relation OID as argument would make sure that the access method involved is correct, but functions taking in input a raw page without knowing the relation the page is attached to would run into problems. This commit improves the set of checks used in the code paths of BRIN, btree (including one check if a leaf page is found with a non-zero level), GIN and GiST to verify that the page given in input has a special area size that fits with each access method, which is done though PageGetSpecialSize(), becore calling PageGetSpecialPointer(). The scope of the checks done is limited to work with pages that one would pass after getting a block with get_raw_page(), as it is possible to craft byteas that could bypass existing code paths. Having too many checks would also impact the usability of pageinspect, as the existing code is very useful to look at the content details in a corrupted page, so the focus is really to avoid out-of-bound reads as this is never a good thing even with functions whose execution is limited to superusers. The safest approach could be to rework the functions so as these fetch a block using a relation OID and a block number, but there are also cases where using a raw page is useful. Tests are added to cover all the code paths that needed such checks, and an error message for hash indexes is reworded to fit better with what this commit adds. Reported-By: Alexander Lakhin Author: Julien Rouhaud, Michael Paquier Discussion: https://postgr.es/m/16527-ef7606186f0610a1@postgresql.org Discussion: https://postgr.es/m/561e187b-3549-c8d5-03f5-525c14e65bd0@postgrespro.ru Backpatch-through: 10
2022-03-27 10:53:55 +02:00
-- page sizes and architectures.
2022-03-16 03:20:51 +01:00
\set VERBOSITY terse
pageinspect: Add more sanity checks to prevent out-of-bound reads A couple of code paths use the special area on the page passed by the function caller, expecting to find some data in it. However, feeding an incorrect page can lead to out-of-bound reads when trying to access the page special area (like a heap page that has no special area, leading PageGetSpecialPointer() to grab a pointer outside the allocated page). The functions used for hash and btree indexes have some protection already against that, while some other functions using a relation OID as argument would make sure that the access method involved is correct, but functions taking in input a raw page without knowing the relation the page is attached to would run into problems. This commit improves the set of checks used in the code paths of BRIN, btree (including one check if a leaf page is found with a non-zero level), GIN and GiST to verify that the page given in input has a special area size that fits with each access method, which is done though PageGetSpecialSize(), becore calling PageGetSpecialPointer(). The scope of the checks done is limited to work with pages that one would pass after getting a block with get_raw_page(), as it is possible to craft byteas that could bypass existing code paths. Having too many checks would also impact the usability of pageinspect, as the existing code is very useful to look at the content details in a corrupted page, so the focus is really to avoid out-of-bound reads as this is never a good thing even with functions whose execution is limited to superusers. The safest approach could be to rework the functions so as these fetch a block using a relation OID and a block number, but there are also cases where using a raw page is useful. Tests are added to cover all the code paths that needed such checks, and an error message for hash indexes is reworded to fit better with what this commit adds. Reported-By: Alexander Lakhin Author: Julien Rouhaud, Michael Paquier Discussion: https://postgr.es/m/16527-ef7606186f0610a1@postgresql.org Discussion: https://postgr.es/m/561e187b-3549-c8d5-03f5-525c14e65bd0@postgrespro.ru Backpatch-through: 10
2022-03-27 10:53:55 +02:00
-- invalid page size
2022-03-16 03:20:51 +01:00
SELECT bt_page_items('aaa'::bytea);
ERROR: invalid page size
pageinspect: Add more sanity checks to prevent out-of-bound reads A couple of code paths use the special area on the page passed by the function caller, expecting to find some data in it. However, feeding an incorrect page can lead to out-of-bound reads when trying to access the page special area (like a heap page that has no special area, leading PageGetSpecialPointer() to grab a pointer outside the allocated page). The functions used for hash and btree indexes have some protection already against that, while some other functions using a relation OID as argument would make sure that the access method involved is correct, but functions taking in input a raw page without knowing the relation the page is attached to would run into problems. This commit improves the set of checks used in the code paths of BRIN, btree (including one check if a leaf page is found with a non-zero level), GIN and GiST to verify that the page given in input has a special area size that fits with each access method, which is done though PageGetSpecialSize(), becore calling PageGetSpecialPointer(). The scope of the checks done is limited to work with pages that one would pass after getting a block with get_raw_page(), as it is possible to craft byteas that could bypass existing code paths. Having too many checks would also impact the usability of pageinspect, as the existing code is very useful to look at the content details in a corrupted page, so the focus is really to avoid out-of-bound reads as this is never a good thing even with functions whose execution is limited to superusers. The safest approach could be to rework the functions so as these fetch a block using a relation OID and a block number, but there are also cases where using a raw page is useful. Tests are added to cover all the code paths that needed such checks, and an error message for hash indexes is reworded to fit better with what this commit adds. Reported-By: Alexander Lakhin Author: Julien Rouhaud, Michael Paquier Discussion: https://postgr.es/m/16527-ef7606186f0610a1@postgresql.org Discussion: https://postgr.es/m/561e187b-3549-c8d5-03f5-525c14e65bd0@postgrespro.ru Backpatch-through: 10
2022-03-27 10:53:55 +02:00
-- invalid special area size
CREATE INDEX test1_a_brin ON test1 USING brin(a);
SELECT bt_page_items(get_raw_page('test1', 0));
ERROR: input page is not a valid btree page
SELECT bt_page_items(get_raw_page('test1_a_brin', 0));
ERROR: input page is not a valid btree page
2022-03-16 03:20:51 +01:00
\set VERBOSITY default
-- Tests with all-zero pages.
SHOW block_size \gset
SELECT bt_page_items(decode(repeat('00', :block_size), 'hex'));
-[ RECORD 1 ]-+-
bt_page_items |
2016-09-29 18:00:00 +02:00
DROP TABLE test1;