/*-------------------------------------------------------------------------
 *
 * visibilitymap.c
 *	  bitmap for tracking visibility of heap tuples
 *
 * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 *
 * IDENTIFICATION
 *	  src/backend/access/heap/visibilitymap.c
 *
 * INTERFACE ROUTINES
 *		visibilitymap_clear	 - clear a bit in the visibility map
 *		visibilitymap_pin	 - pin a map page for setting a bit
 *		visibilitymap_pin_ok - check whether correct map page is already pinned
 *		visibilitymap_set	 - set a bit in a previously pinned page
 *		visibilitymap_test	 - test if a bit is set
 *		visibilitymap_count	 - count number of bits set in visibility map
 *		visibilitymap_truncate	- truncate the visibility map
 *
 * NOTES
 *
 * The visibility map is a bitmap with one bit per heap page. A set bit means
 * that all tuples on the page are known visible to all transactions, and
 * therefore the page doesn't need to be vacuumed. The map is conservative in
 * the sense that we make sure that whenever a bit is set, we know the
 * condition is true, but if a bit is not set, it might or might not be true.
 *
 * Clearing a visibility map bit is not separately WAL-logged. The callers
 * must make sure that whenever a bit is cleared, the bit is cleared on WAL
 * replay of the updating operation as well.
 *
 * When we *set* a visibility map bit during VACUUM, we must write WAL. This
 * may seem counterintuitive, since the bit is basically a hint: if it is
 * clear, it may still be the case that every tuple on the page is visible to
 * all transactions; we just don't know that for certain. The difficulty is
 * that there are two bits which are typically set together: the
 * PD_ALL_VISIBLE bit on the page itself, and the visibility map bit. If a
 * crash occurs after the visibility map page makes it to disk and before the
 * updated heap page makes it to disk, redo must set the bit on the heap page.
 * Otherwise, the next insert, update, or delete on the heap page will fail to
 * realize that the visibility map bit must be cleared, possibly causing
 * index-only scans to return wrong answers.
 *
 * VACUUM will normally skip pages for which the visibility map bit is set;
 * such pages can't contain any dead tuples and therefore don't need
 * vacuuming. The visibility map is not used for anti-wraparound vacuums,
 * because an anti-wraparound vacuum needs to freeze tuples and observe the
 * latest xid present in the table, even on pages that don't have any dead
 * tuples.
 *
 * LOCKING
 *
 * In heapam.c, whenever a page is modified so that not all tuples on the
 * page are visible to everyone anymore, the corresponding bit in the
 * visibility map is cleared. In order to be crash-safe, we need to do this
 * while still holding a lock on the heap page and in the same critical
 * section that logs the page modification. However, we don't want to hold
 * the buffer lock over any I/O that may be required to read in the visibility
 * map page. To avoid this, we examine the heap page before locking it;
 * if the page-level PD_ALL_VISIBLE bit is set, we pin the visibility map
 * page. Then, we lock the buffer. But this creates a race condition: there
 * is a possibility that in the time it takes to lock the buffer, the
 * PD_ALL_VISIBLE bit gets set. If that happens, we have to unlock the
 * buffer, pin the visibility map page, and relock the buffer. This shouldn't
 * happen often, because only VACUUM currently sets visibility map bits,
 * and the race will only occur if VACUUM processes a given page at almost
 * exactly the same time that someone tries to further modify it.
 *
 * To set a bit, you need to hold a lock on the heap page. That prevents
 * the race condition where VACUUM sees that all tuples on the page are
 * visible to everyone, but another backend modifies the page before VACUUM
 * sets the bit in the visibility map.
 *
 * When a bit is set, the LSN of the visibility map page is updated to make
 * sure that the visibility map update doesn't get written to disk before the
 * WAL record of the changes that made it possible to set the bit is flushed.
 * But when a bit is cleared, we don't have to do that because it's always
 * safe to clear a bit in the map from a correctness point of view.
 *
 *-------------------------------------------------------------------------
 */
#include "postgres.h"

#include "access/heapam_xlog.h"
#include "access/visibilitymap.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
#include "storage/smgr.h"
#include "utils/inval.h"


/*#define TRACE_VISIBILITYMAP */

/*
 * Size of the bitmap on each visibility map page, in bytes. There are no
 * extra headers, so the whole page minus the standard page header is
 * used for the bitmap.
 */
#define MAPSIZE (BLCKSZ - MAXALIGN(SizeOfPageHeaderData))

/* Number of bits allocated for each heap block. */
#define BITS_PER_HEAPBLOCK 1

/* Number of heap blocks we can represent in one byte. */
#define HEAPBLOCKS_PER_BYTE 8

/* Number of heap blocks we can represent in one visibility map page. */
#define HEAPBLOCKS_PER_PAGE (MAPSIZE * HEAPBLOCKS_PER_BYTE)

/* Mapping from heap block number to the right bit in the visibility map */
#define HEAPBLK_TO_MAPBLOCK(x) ((x) / HEAPBLOCKS_PER_PAGE)
#define HEAPBLK_TO_MAPBYTE(x) (((x) % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE)
#define HEAPBLK_TO_MAPBIT(x) ((x) % HEAPBLOCKS_PER_BYTE)
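
/*
 * As a worked example of the mapping above (assuming the default 8192-byte
 * BLCKSZ and a 24-byte MAXALIGN'd page header, so MAPSIZE = 8168 and
 * HEAPBLOCKS_PER_PAGE = 65344): heap block 70000 lands in map block
 * 70000 / 65344 = 1, map byte (70000 % 65344) / 8 = 582, and map bit
 * 70000 % 8 = 0.
 */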

/* table for fast counting of set bits */
static const uint8 number_of_ones[256] = {
	0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
	1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
	1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
	2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
	1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
	2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
	2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
	3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
	1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
	2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
	2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
	3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
	2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
	3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
	3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
	4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8
};

/* prototypes for internal routines */
static Buffer vm_readbuf(Relation rel, BlockNumber blkno, bool extend);
static void vm_extend(Relation rel, BlockNumber nvmblocks);


/*
 *	visibilitymap_clear - clear a bit in the visibility map
 *
 * You must pass a buffer containing the correct map page to this function.
 * Call visibilitymap_pin first to pin the right one. This function doesn't do
 * any I/O.
 */
void
visibilitymap_clear(Relation rel, BlockNumber heapBlk, Buffer buf)
{
	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
	int			mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
	int			mapBit = HEAPBLK_TO_MAPBIT(heapBlk);
	uint8		mask = 1 << mapBit;
	char	   *map;

#ifdef TRACE_VISIBILITYMAP
	elog(DEBUG1, "vm_clear %s %d", RelationGetRelationName(rel), heapBlk);
#endif

	if (!BufferIsValid(buf) || BufferGetBlockNumber(buf) != mapBlock)
		elog(ERROR, "wrong buffer passed to visibilitymap_clear");

	LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
	map = PageGetContents(BufferGetPage(buf));

	if (map[mapByte] & mask)
	{
		map[mapByte] &= ~mask;

		MarkBufferDirty(buf);
	}

	LockBuffer(buf, BUFFER_LOCK_UNLOCK);
}

/*
 *	visibilitymap_pin - pin a map page for setting a bit
 *
 * Setting a bit in the visibility map is a two-phase operation. First, call
 * visibilitymap_pin to pin the visibility map page containing the bit for
 * the heap page. Because that can require I/O to read the map page, you
 * shouldn't hold a lock on the heap page while doing that. Then, call
 * visibilitymap_set to actually set the bit.
 *
 * On entry, *buf should be InvalidBuffer or a valid buffer returned by
 * an earlier call to visibilitymap_pin or visibilitymap_test on the same
 * relation. On return, *buf is a valid buffer with the map page containing
 * the bit for heapBlk.
 *
 * If the page doesn't exist in the map file yet, it is extended.
 */
void
visibilitymap_pin(Relation rel, BlockNumber heapBlk, Buffer *buf)
{
	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);

	/* Reuse the old pinned buffer if possible */
	if (BufferIsValid(*buf))
	{
		if (BufferGetBlockNumber(*buf) == mapBlock)
			return;

		ReleaseBuffer(*buf);
	}
	*buf = vm_readbuf(rel, mapBlock, true);
}
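
/*
 * A sketch of the two-phase protocol (illustrative only; the real callers
 * live elsewhere, and their exact locking sequence may differ):
 *
 *		visibilitymap_pin(rel, blkno, &vmbuffer);	-- may do I/O; no heap lock
 *		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
 *		... verify all tuples visible, set PD_ALL_VISIBLE, log the change ...
 *		visibilitymap_set(rel, blkno, buf, InvalidXLogRecPtr, vmbuffer,
 *						  cutoff_xid);
 *		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 */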

/*
 *	visibilitymap_pin_ok - do we already have the correct page pinned?
 *
 * On entry, buf should be InvalidBuffer or a valid buffer returned by
 * an earlier call to visibilitymap_pin or visibilitymap_test on the same
 * relation. The return value indicates whether the buffer covers the
 * given heapBlk.
 */
bool
visibilitymap_pin_ok(BlockNumber heapBlk, Buffer buf)
{
	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);

	return BufferIsValid(buf) && BufferGetBlockNumber(buf) == mapBlock;
}
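
/*
 * visibilitymap_pin_ok supports the clear-side dance described under LOCKING
 * above. A sketch of that pattern (illustrative only; see heapam.c for the
 * real code):
 *
 *		if (PageIsAllVisible(page))
 *			visibilitymap_pin(rel, blkno, &vmbuffer);	-- pin before locking
 *		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 *		if (PageIsAllVisible(page) && !visibilitymap_pin_ok(blkno, vmbuffer))
 *		{
 *			-- lost the race: unlock, pin, and relock
 *			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 *			visibilitymap_pin(rel, blkno, &vmbuffer);
 *			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 *		}
 */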

/*
 *	visibilitymap_set - set a bit in a previously pinned page
 *
 * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
 * or InvalidXLogRecPtr in normal running. The page LSN is advanced to the
 * one provided; in normal running, we generate a new XLOG record and set the
 * page LSN to that value. cutoff_xid is the largest xmin on the page being
 * marked all-visible; it is needed for Hot Standby, and can be
 * InvalidTransactionId if the page contains no tuples.
 *
 * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
 * this function. Except in recovery, caller should also pass the heap
 * buffer. When checksums are enabled and we're not in recovery, we must add
 * the heap buffer to the WAL chain to protect it from being torn.
 *
 * You must pass a buffer containing the correct map page to this function.
 * Call visibilitymap_pin first to pin the right one. This function doesn't do
 * any I/O.
 */
void
visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid)
{
	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
	uint8		mapBit = HEAPBLK_TO_MAPBIT(heapBlk);
	Page		page;
	char	   *map;

#ifdef TRACE_VISIBILITYMAP
	elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
#endif

	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
	Assert(InRecovery || BufferIsValid(heapBuf));

	/* Check that we have the right heap page pinned, if present */
	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");

	/* Check that we have the right VM page pinned */
	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");

	page = BufferGetPage(vmBuf);
	map = PageGetContents(page);
	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);

	if (!(map[mapByte] & (1 << mapBit)))
	{
		START_CRIT_SECTION();

		map[mapByte] |= (1 << mapBit);
		MarkBufferDirty(vmBuf);

		if (RelationNeedsWAL(rel))
		{
			if (XLogRecPtrIsInvalid(recptr))
			{
				Assert(!InRecovery);
				recptr = log_heap_visible(rel->rd_node, heapBuf, vmBuf,
										  cutoff_xid);

				/*
				 * If data checksums are enabled, we need to protect the heap
				 * page from being torn.
				 */
				if (DataChecksumsEnabled())
				{
					Page		heapPage = BufferGetPage(heapBuf);

					/* caller is expected to set PD_ALL_VISIBLE first */
					Assert(PageIsAllVisible(heapPage));
					PageSetLSN(heapPage, recptr);
				}
			}
			PageSetLSN(page, recptr);
		}

		END_CRIT_SECTION();
	}

	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
}

/*
 *	visibilitymap_test - test if a bit is set
 *
 * Are all tuples on heapBlk visible to all, according to the visibility map?
 *
 * On entry, *buf should be InvalidBuffer or a valid buffer returned by an
 * earlier call to visibilitymap_pin or visibilitymap_test on the same
 * relation.  On return, *buf is a valid buffer with the map page containing
 * the bit for heapBlk, or InvalidBuffer.  The caller is responsible for
 * releasing *buf after it's done testing and setting bits.
 *
 * NOTE: This function is typically called without a lock on the heap page,
 * so somebody else could change the bit just after we look at it.  In fact,
 * since we don't lock the visibility map page either, it's even possible that
 * someone else could have changed the bit just before we look at it, but yet
 * we might see the old value.  It is the caller's responsibility to deal with
 * all concurrency issues!
 */
bool
visibilitymap_test(Relation rel, BlockNumber heapBlk, Buffer *buf)
{
	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
	uint8		mapBit = HEAPBLK_TO_MAPBIT(heapBlk);
	bool		result;
	char	   *map;

#ifdef TRACE_VISIBILITYMAP
	elog(DEBUG1, "vm_test %s %d", RelationGetRelationName(rel), heapBlk);
#endif

	/* Reuse the old pinned buffer if possible */
	if (BufferIsValid(*buf))
	{
		if (BufferGetBlockNumber(*buf) != mapBlock)
		{
			ReleaseBuffer(*buf);
			*buf = InvalidBuffer;
		}
	}

	if (!BufferIsValid(*buf))
	{
		*buf = vm_readbuf(rel, mapBlock, false);
		if (!BufferIsValid(*buf))
			return false;
	}

	map = PageGetContents(BufferGetPage(*buf));

	/*
	 * A single-bit read is atomic.  There could be memory-ordering effects
	 * here, but for performance reasons we make it the caller's job to worry
	 * about that.
	 */
	result = (map[mapByte] & (1 << mapBit)) ? true : false;

	return result;
}

/*
 *	visibilitymap_count - count number of bits set in visibility map
 *
 * Note: we ignore the possibility of race conditions when the table is being
 * extended concurrently with the call.  New pages added to the table aren't
 * going to be marked all-visible, so they won't affect the result.
 */
BlockNumber
visibilitymap_count(Relation rel)
{
	BlockNumber result = 0;
	BlockNumber mapBlock;

	for (mapBlock = 0;; mapBlock++)
	{
		Buffer		mapBuffer;
		unsigned char *map;
		int			i;

		/*
		 * Read till we fall off the end of the map.  We assume that any
		 * extra bytes in the last page are zeroed, so we don't bother
		 * excluding them from the count.
		 */
		mapBuffer = vm_readbuf(rel, mapBlock, false);
		if (!BufferIsValid(mapBuffer))
			break;

		/*
		 * We choose not to lock the page, since the result is going to be
		 * immediately stale anyway if anyone is concurrently setting or
		 * clearing bits, and we only really need an approximate value.
		 */
		map = (unsigned char *) PageGetContents(BufferGetPage(mapBuffer));

		for (i = 0; i < MAPSIZE; i++)
		{
			result += number_of_ones[map[i]];
		}

		ReleaseBuffer(mapBuffer);
	}

	return result;
}

/*
 *	visibilitymap_truncate - truncate the visibility map
 *
 * The caller must hold AccessExclusiveLock on the relation, to ensure that
 * other backends receive the smgr invalidation event that this function sends
 * before they access the VM again.
 *
 * nheapblocks is the new size of the heap.
 */
void
visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
{
	BlockNumber newnblocks;

	/* last remaining block, byte, and bit */
	BlockNumber truncBlock = HEAPBLK_TO_MAPBLOCK(nheapblocks);
	uint32		truncByte = HEAPBLK_TO_MAPBYTE(nheapblocks);
	uint8		truncBit = HEAPBLK_TO_MAPBIT(nheapblocks);

#ifdef TRACE_VISIBILITYMAP
	elog(DEBUG1, "vm_truncate %s %d", RelationGetRelationName(rel), nheapblocks);
#endif

	RelationOpenSmgr(rel);

	/*
	 * If no visibility map has been created yet for this relation, there's
	 * nothing to truncate.
	 */
	if (!smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM))
		return;

	/*
	 * Unless the new size is exactly at a visibility map page boundary, the
	 * tail bits in the last remaining map page, representing truncated heap
	 * blocks, need to be cleared.  This is not only tidy, but also necessary
	 * because we don't get a chance to clear the bits if the heap is extended
	 * again.
	 */
	if (truncByte != 0 || truncBit != 0)
	{
		Buffer		mapBuffer;
		Page		page;
		char	   *map;

		newnblocks = truncBlock + 1;

		mapBuffer = vm_readbuf(rel, truncBlock, false);
		if (!BufferIsValid(mapBuffer))
		{
			/* nothing to do, the file was already smaller */
			return;
		}

		page = BufferGetPage(mapBuffer);
		map = PageGetContents(page);

		LockBuffer(mapBuffer, BUFFER_LOCK_EXCLUSIVE);

		/* Clear out the unwanted bytes. */
		MemSet(&map[truncByte + 1], 0, MAPSIZE - (truncByte + 1));

		/*
		 * Mask out the unwanted bits of the last remaining byte.
		 *
		 * ((1 << 0) - 1) = 00000000
		 * ((1 << 1) - 1) = 00000001
		 * ...
		 * ((1 << 6) - 1) = 00111111
		 * ((1 << 7) - 1) = 01111111
		 */
		map[truncByte] &= (1 << truncBit) - 1;

		MarkBufferDirty(mapBuffer);
		UnlockReleaseBuffer(mapBuffer);
	}
	else
		newnblocks = truncBlock;

	if (smgrnblocks(rel->rd_smgr, VISIBILITYMAP_FORKNUM) <= newnblocks)
	{
		/* nothing to do, the file was already smaller than requested size */
		return;
	}

	/* Truncate the unused VM pages, and send smgr inval message */
	smgrtruncate(rel->rd_smgr, VISIBILITYMAP_FORKNUM, newnblocks);

	/*
	 * We might as well update the local smgr_vm_nblocks setting. smgrtruncate
	 * sent an smgr cache inval message, which will cause other backends to
	 * invalidate their copy of smgr_vm_nblocks, and this one too at the next
	 * command boundary.  But this ensures it isn't outright wrong until then.
	 */
	if (rel->rd_smgr)
		rel->rd_smgr->smgr_vm_nblocks = newnblocks;
}

/*
 * Read a visibility map page.
 *
 * If the page doesn't exist, InvalidBuffer is returned, unless 'extend' is
 * true, in which case the visibility map file is extended first.
 */
static Buffer
vm_readbuf(Relation rel, BlockNumber blkno, bool extend)
{
	Buffer		buf;

	/*
	 * We might not have opened the relation at the smgr level yet, or we
	 * might have been forced to close it by a sinval message.  The code below
	 * won't necessarily notice relation extension immediately when extend =
	 * false, so we rely on sinval messages to ensure that our ideas about the
	 * size of the map aren't too far out of date.
	 */
	RelationOpenSmgr(rel);

	/*
	 * If we haven't cached the size of the visibility map fork yet, check it
	 * first.
	 */
	if (rel->rd_smgr->smgr_vm_nblocks == InvalidBlockNumber)
	{
		if (smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM))
			rel->rd_smgr->smgr_vm_nblocks = smgrnblocks(rel->rd_smgr,
													  VISIBILITYMAP_FORKNUM);
		else
			rel->rd_smgr->smgr_vm_nblocks = 0;
	}

	/* Handle requests beyond EOF */
	if (blkno >= rel->rd_smgr->smgr_vm_nblocks)
	{
		if (extend)
			vm_extend(rel, blkno + 1);
		else
			return InvalidBuffer;
	}

	/*
	 * Use ZERO_ON_ERROR mode, and initialize the page if necessary.  It's
	 * always safe to clear bits, so it's better to clear corrupt pages than
	 * error out.
	 */
	buf = ReadBufferExtended(rel, VISIBILITYMAP_FORKNUM, blkno,
							 RBM_ZERO_ON_ERROR, NULL);
	if (PageIsNew(BufferGetPage(buf)))
		PageInit(BufferGetPage(buf), BLCKSZ, 0);
	return buf;
}

/*
 * Ensure that the visibility map fork is at least vm_nblocks long, extending
 * it if necessary with zeroed pages.
 */
static void
vm_extend(Relation rel, BlockNumber vm_nblocks)
{
	BlockNumber vm_nblocks_now;
	Page		pg;

	pg = (Page) palloc(BLCKSZ);
	PageInit(pg, BLCKSZ, 0);

	/*
	 * We use the relation extension lock to lock out other backends trying to
	 * extend the visibility map at the same time.  It also locks out
	 * extension of the main fork, unnecessarily, but extending the
	 * visibility map happens seldom enough that it doesn't seem worthwhile to
	 * have a separate lock tag type for it.
	 *
	 * Note that another backend might have extended or created the relation
	 * by the time we get the lock.
	 */
	LockRelationForExtension(rel, ExclusiveLock);

	/* Might have to re-open if a cache flush happened */
	RelationOpenSmgr(rel);

	/*
	 * Create the file first if it doesn't exist.  If smgr_vm_nblocks is
	 * positive then it must exist, no need for an smgrexists call.
	 */
	if ((rel->rd_smgr->smgr_vm_nblocks == 0 ||
		 rel->rd_smgr->smgr_vm_nblocks == InvalidBlockNumber) &&
		!smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM))
		smgrcreate(rel->rd_smgr, VISIBILITYMAP_FORKNUM, false);

	vm_nblocks_now = smgrnblocks(rel->rd_smgr, VISIBILITYMAP_FORKNUM);

	/* Now extend the file */
	while (vm_nblocks_now < vm_nblocks)
	{
		PageSetChecksumInplace(pg, vm_nblocks_now);

		smgrextend(rel->rd_smgr, VISIBILITYMAP_FORKNUM, vm_nblocks_now,
				   (char *) pg, false);
		vm_nblocks_now++;
	}

	/*
	 * Send a shared-inval message to force other backends to close any smgr
	 * references they may have for this rel, which we are about to change.
	 * This is a useful optimization because it means that backends don't have
	 * to keep checking for creation or extension of the file, which happens
	 * infrequently.
	 */
	CacheInvalidateSmgr(rel->rd_smgr->smgr_rnode);

	/* Update local cache with the up-to-date size */
	rel->rd_smgr->smgr_vm_nblocks = vm_nblocks_now;

	UnlockRelationForExtension(rel, ExclusiveLock);

	pfree(pg);
}