Fix nbtree backward scan race condition comments.

Remove comments that supposed that holding a pin was a useful interlock
for _bt_walk_left().  There are times when _bt_walk_left() doesn't hold
either a lock or a pin on any page, so clearly this can't be true.
_bt_walk_left() is even prepared to deal with concurrent deletion of
both the original page and any pages to its left.

Oversight in commit 2ed5b87f96.
Peter Geoghegan 2023-12-08 15:37:53 -08:00
parent dc3f9bc549
commit aa210e0c12
1 changed file with 8 additions and 19 deletions

@@ -2036,8 +2036,8 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
  * _bt_readnextpage() -- Read next page containing valid data for scan
  *
  * On success exit, so->currPos is updated to contain data from the next
- * interesting page. Caller is responsible to release lock and pin on
- * buffer on success. We return true to indicate success.
+ * interesting page, and we return true. Caller must release the lock (and
+ * maybe the pin) on the buffer on success exit.
  *
  * If there are no more matching records in the given direction, we drop all
  * locks and pins, set so->currPos.buf to InvalidBuffer, and return false.
@@ -2127,18 +2127,9 @@ _bt_readnextpage(IndexScanDesc scan, BlockNumber blkno, ScanDirection dir)
  *
  * It might be possible to rearrange this code to have less overhead
  * in pinning and locking, but that would require capturing the left
- * pointer when the page is initially read, and using it here, along
- * with big changes to _bt_walk_left() and the code below. It is not
- * clear whether this would be a win, since if the page immediately to
- * the left splits after we read this page and before we step left, we
- * would need to visit more pages than with the current code.
- *
- * Note that if we change the code so that we drop the pin for a scan
- * which uses a non-MVCC snapshot, we will need to modify the code for
- * walking left, to allow for the possibility that a referenced page
- * has been deleted. As long as the buffer is pinned or the snapshot
- * is MVCC the page cannot move past the half-dead state to fully
- * deleted.
+ * sibling block number when the page is initially read, and then
+ * optimistically starting there (rather than pinning the page twice).
+ * It is not clear that this would be worth the complexity.
  */
 if (BTScanPosIsPinned(so->currPos))
 	_bt_lockbuf(rel, so->currPos.buf, BT_READ);
@@ -2243,9 +2234,8 @@ _bt_parallel_readpage(IndexScanDesc scan, BlockNumber blkno, ScanDirection dir)
  * Returns InvalidBuffer if there is no page to the left (no lock is held
  * in that case).
  *
- * When working on a non-leaf level, it is possible for the returned page
- * to be half-dead; the caller should check that condition and step left
- * again if it's important.
+ * It is possible for the returned leaf page to be half-dead; caller must
+ * check that condition and step left again when required.
  */
 static Buffer
 _bt_walk_left(Relation rel, Buffer buf)
@@ -2288,8 +2278,7 @@ _bt_walk_left(Relation rel, Buffer buf)
  * anymore, not that its left sibling got split more than four times.
  *
  * Note that it is correct to test P_ISDELETED not P_IGNORE here,
- * because half-dead pages are still in the sibling chain. Caller
- * must reject half-dead pages if wanted.
+ * because half-dead pages are still in the sibling chain.
  */
 tries = 0;
 for (;;)
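
The distinction the last two hunks rely on (half-dead pages are still in the sibling chain, fully deleted pages are not) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the nbtree code: ToyPage, toy_walk_left() and toy_step_left() are hypothetical names, pages are plain array slots, and there is no locking, pinning, or real concurrency. The bound of four moves to the right is taken from the comment in the hunk above.

```c
#include <assert.h>
#include <stdbool.h>

#define NONE (-1)

/* Hypothetical toy page: sibling links plus two state flags. */
typedef struct ToyPage {
    int  left;       /* left sibling slot, or NONE */
    int  right;      /* right sibling slot, or NONE */
    bool half_dead;  /* marked for deletion, still in the sibling chain */
    bool deleted;    /* fully deleted, no longer in the sibling chain */
} ToyPage;

/*
 * Starting from a possibly stale guess at the left sibling of 'orig',
 * move right a bounded number of times looking for the live chain member
 * whose right link points at 'orig'.  Only the 'deleted' flag is tested,
 * so a half-dead page can be returned.  Returns NONE when the search
 * fails; this toy does not retry, and does not distinguish that from
 * "there is no left sibling".
 */
static int toy_walk_left(const ToyPage *pages, int orig, int guess)
{
    assert(orig != NONE);
    int blk = guess;
    for (int tries = 0; blk != NONE && tries < 4; tries++) {
        const ToyPage *p = &pages[blk];
        if (!p->deleted && p->right == orig)
            return blk;          /* found the current left sibling */
        blk = p->right;          /* guess was split or deleted: move right */
    }
    return NONE;
}

/*
 * Caller-side loop: keep stepping left until the page found is not
 * half-dead (or there is nothing further to the left).
 */
static int toy_step_left(const ToyPage *pages, int cur)
{
    for (;;) {
        int left = toy_walk_left(pages, cur, pages[cur].left);
        if (left == NONE || !pages[left].half_dead)
            return left;
        cur = left;              /* skip the half-dead page, go left again */
    }
}
```

With a four-page chain 0, 1, 2, 3 in which page 2 is half-dead, toy_walk_left() started from a stale guess of page 1 for page 3 lands on page 2, and it is the caller-side loop in toy_step_left() that rejects page 2 and continues to page 1. That mirrors the division of responsibility the updated comments describe.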