postgresql/contrib/pageinspect/expected/btree.out

CREATE TABLE test1 (a int8, b int4range);
INSERT INTO test1 VALUES (72057594037927937, '[0,1)');
CREATE INDEX test1_a_idx ON test1 USING btree (a);
\x
SELECT * FROM bt_metap('test1_a_idx');
-[ RECORD 1 ]-------------+-------
magic                     | 340322
version                   | 4
root                      | 1
level                     | 0
fastroot                  | 1
fastlevel                 | 0
last_cleanup_num_delpages | 0
last_cleanup_num_tuples   | -1
allequalimage             | t

SELECT * FROM bt_page_stats('test1_a_idx', -1);
ERROR:  invalid block number
SELECT * FROM bt_page_stats('test1_a_idx', 0);
ERROR:  block 0 is a meta page
SELECT * FROM bt_page_stats('test1_a_idx', 1);
-[ RECORD 1 ]-+-----
blkno         | 1
type          | l
live_items    | 1
dead_items    | 0
avg_item_size | 16
page_size     | 8192
free_size     | 8128
btpo_prev     | 0
btpo_next     | 0
btpo_level    | 0
btpo_flags    | 3

SELECT * FROM bt_page_stats('test1_a_idx', 2);
ERROR:  block number out of range
SELECT * FROM bt_page_items('test1_a_idx', -1);
ERROR:  invalid block number
SELECT * FROM bt_page_items('test1_a_idx', 0);
ERROR:  block 0 is a meta page
SELECT * FROM bt_page_items('test1_a_idx', 1);
-[ RECORD 1 ]-----------------------
itemoffset | 1
ctid       | (0,1)
itemlen    | 16
nulls      | f
vars       | f
data       | 01 00 00 00 00 00 00 01
dead       | f
htid       | (0,1)
tids       |

SELECT * FROM bt_page_items('test1_a_idx', 2);
ERROR:  block number out of range
SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', -1));
ERROR:  invalid block number
SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 0));
ERROR:  block is a meta page
SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 1));
-[ RECORD 1 ]-----------------------
itemoffset | 1
ctid       | (0,1)
itemlen    | 16
nulls      | f
vars       | f
data       | 01 00 00 00 00 00 00 01
dead       | f
htid       | (0,1)
tids       |

SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 2));
ERROR:  block number 2 is out of range for relation "test1_a_idx"
-- Failure when using a non-btree index.
CREATE INDEX test1_a_hash ON test1 USING hash(a);
SELECT bt_metap('test1_a_hash');
ERROR:  "test1_a_hash" is not a btree index
SELECT bt_page_stats('test1_a_hash', 0);
ERROR:  "test1_a_hash" is not a btree index
SELECT bt_page_items('test1_a_hash', 0);
ERROR:  "test1_a_hash" is not a btree index
SELECT bt_page_items(get_raw_page('test1_a_hash', 0));
ERROR:  block is a meta page
CREATE INDEX test1_b_gist ON test1 USING gist(b);
-- The special area of a GiST page is the same size as a btree's, so this
-- complains about inconsistent leaf data on the page.
SELECT bt_page_items(get_raw_page('test1_b_gist', 0));
ERROR:  block is not a valid btree leaf page
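-- Aside (illustrative, not part of the regression output): the same raw page
-- can instead be decoded with a GiST-specific pageinspect function; a minimal
-- sketch, reusing the test1_b_gist index created above:
--     SELECT * FROM gist_page_opaque_info(get_raw_page('test1_b_gist', 0));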
-- Several failure modes.
-- Suppress the DETAIL message, to allow the tests to work across various
-- page sizes and architectures.
\set VERBOSITY terse
-- invalid page size
SELECT bt_page_items('aaa'::bytea);
ERROR:  invalid page size
-- invalid special area size
CREATE INDEX test1_a_brin ON test1 USING brin(a);
SELECT bt_page_items(get_raw_page('test1', 0));
ERROR:  input page is not a valid btree page
SELECT bt_page_items(get_raw_page('test1_a_brin', 0));
ERROR:  input page is not a valid btree page
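-- Aside (illustrative, not part of the regression output): the special-area
-- size that this check inspects is visible via page_header(); on a heap page
-- such as test1's, "special" equals "pagesize", i.e. the special area is empty:
--     SELECT lower, upper, special, pagesize
--     FROM page_header(get_raw_page('test1', 0));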
\set VERBOSITY default
-- Tests with all-zero pages.
SHOW block_size \gset
SELECT bt_page_items(decode(repeat('00', :block_size), 'hex'));
-[ RECORD 1 ]-+-
bt_page_items |

DROP TABLE test1;
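-- Aside (illustrative, not part of the regression output): a sketch of running
-- these functions by hand; the pageinspect extension must be installed first,
-- and its functions are superuser-only by default (the object names here are
-- hypothetical):
--     CREATE EXTENSION pageinspect;
--     CREATE TABLE t (a int8);
--     INSERT INTO t VALUES (1);
--     CREATE INDEX t_a_idx ON t USING btree (a);
--     SELECT * FROM bt_metap('t_a_idx');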