/*-------------------------------------------------------------------------
 *
 * nodeBitmapHeapscan.c
 *    Routines to support bitmapped scans of relations
 *
 * NOTE: it is critical that this plan type only be used with MVCC-compliant
 * snapshots (ie, regular snapshots, not SnapshotAny or one of the other
 * special snapshots).  The reason is that since index and heap scans are
 * decoupled, there can be no assurance that the index tuple prompting a
 * visit to a particular heap TID still exists when the visit is made.
 * Therefore the tuple might not exist anymore either (which is OK because
 * heap_fetch will cope) --- but worse, the tuple slot could have been
 * re-used for a newer tuple.  With an MVCC snapshot the newer tuple is
 * certain to fail the time qual and so it will not be mistakenly returned,
 * but with anything else we might return a tuple that doesn't meet the
 * required index qual conditions.
 *
 *
 * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 *
 * IDENTIFICATION
 *    src/backend/executor/nodeBitmapHeapscan.c
 *
 *-------------------------------------------------------------------------
 */

/*
 * INTERFACE ROUTINES
 *    ExecBitmapHeapScan        scans a relation using bitmap info
 *    ExecBitmapHeapNext        workhorse for above
 *    ExecInitBitmapHeapScan    creates and initializes state info.
 *    ExecReScanBitmapHeapScan  prepares to rescan the plan.
 *    ExecEndBitmapHeapScan     releases all storage.
 */
#include "postgres.h"

#include <math.h>

#include "access/relscan.h"
#include "access/tableam.h"
#include "access/transam.h"
#include "access/visibilitymap.h"
#include "executor/execdebug.h"
#include "executor/nodeBitmapHeapscan.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/bufmgr.h"
#include "storage/predicate.h"
#include "utils/memutils.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
#include "utils/spccache.h"

static TupleTableSlot *BitmapHeapNext(BitmapHeapScanState *node);
static inline void BitmapDoneInitializingSharedState(ParallelBitmapHeapState *pstate);
static inline void BitmapAdjustPrefetchIterator(BitmapHeapScanState *node,
                                                TBMIterateResult *tbmres);
static inline void BitmapAdjustPrefetchTarget(BitmapHeapScanState *node);
static inline void BitmapPrefetch(BitmapHeapScanState *node,
                                  TableScanDesc scan);
static bool BitmapShouldInitializeSharedState(ParallelBitmapHeapState *pstate);


/* ----------------------------------------------------------------
 *    BitmapHeapNext
 *
 *    Retrieve next tuple from the BitmapHeapScan node's currentRelation
 * ----------------------------------------------------------------
 */
static TupleTableSlot *
BitmapHeapNext(BitmapHeapScanState *node)
{
    ExprContext *econtext;
    TableScanDesc scan;
    TIDBitmap  *tbm;
    TBMIterator *tbmiterator = NULL;
    TBMSharedIterator *shared_tbmiterator = NULL;
    TBMIterateResult *tbmres;
    TupleTableSlot *slot;
    ParallelBitmapHeapState *pstate = node->pstate;
    dsa_area   *dsa = node->ss.ps.state->es_query_dsa;
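
    /*
     * pstate is non-NULL only when this node takes part in a parallel bitmap
     * heap scan; dsa is the per-query dynamic shared memory area, which is
     * what the shared bitmap iterators below are attached through.
     */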

    /*
     * extract necessary information from index scan node
     */
    econtext = node->ss.ps.ps_ExprContext;
    slot = node->ss.ss_ScanTupleSlot;
    scan = node->ss.ss_currentScanDesc;
    tbm = node->tbm;
    if (pstate == NULL)
        tbmiterator = node->tbmiterator;
    else
        shared_tbmiterator = node->shared_tbmiterator;
    tbmres = node->tbmres;

    /*
     * If we haven't yet performed the underlying index scan, do it, and begin
     * the iteration over the bitmap.
     *
     * For prefetching, we use *two* iterators, one for the pages we are
     * actually scanning and another that runs ahead of the first for
     * prefetching.  node->prefetch_pages tracks exactly how many pages ahead
     * the prefetch iterator is.  Also, node->prefetch_target tracks the
     * desired prefetch distance, which starts small and increases up to the
     * node->prefetch_maximum.  This is to avoid doing a lot of prefetching in
     * a scan that stops after a few tuples because of a LIMIT.
     */
    if (!node->initialized)
    {
        if (!pstate)
        {
            tbm = (TIDBitmap *) MultiExecProcNode(outerPlanState(node));

            if (!tbm || !IsA(tbm, TIDBitmap))
                elog(ERROR, "unrecognized result from subplan");

            node->tbm = tbm;
            node->tbmiterator = tbmiterator = tbm_begin_iterate(tbm);
            node->tbmres = tbmres = NULL;

#ifdef USE_PREFETCH
            if (node->prefetch_maximum > 0)
            {
                node->prefetch_iterator = tbm_begin_iterate(tbm);
                node->prefetch_pages = 0;
                node->prefetch_target = -1;
            }
#endif                          /* USE_PREFETCH */
        }
        else
        {
            /*
             * The leader will immediately come out of the function, but
             * others will be blocked until the leader populates the TBM and
             * wakes them up.
             */
            if (BitmapShouldInitializeSharedState(pstate))
            {
                tbm = (TIDBitmap *) MultiExecProcNode(outerPlanState(node));
                if (!tbm || !IsA(tbm, TIDBitmap))
                    elog(ERROR, "unrecognized result from subplan");

                node->tbm = tbm;

                /*
                 * Prepare to iterate over the TBM. This will return the
                 * dsa_pointer of the iterator state which will be used by
                 * multiple processes to iterate jointly.
                 */
                pstate->tbmiterator = tbm_prepare_shared_iterate(tbm);
#ifdef USE_PREFETCH
                if (node->prefetch_maximum > 0)
                {
                    pstate->prefetch_iterator =
                        tbm_prepare_shared_iterate(tbm);

                    /*
                     * We don't need the mutex here as we haven't yet woken up
                     * others.
                     */
                    pstate->prefetch_pages = 0;
                    pstate->prefetch_target = -1;
                }
#endif

                /* We have initialized the shared state so wake up others. */
                BitmapDoneInitializingSharedState(pstate);
            }

            /* Allocate a private iterator and attach the shared state to it */
            node->shared_tbmiterator = shared_tbmiterator =
                tbm_attach_shared_iterate(dsa, pstate->tbmiterator);
            node->tbmres = tbmres = NULL;

#ifdef USE_PREFETCH
            if (node->prefetch_maximum > 0)
            {
                node->shared_prefetch_iterator =
                    tbm_attach_shared_iterate(dsa, pstate->prefetch_iterator);
            }
#endif                          /* USE_PREFETCH */
        }
        node->initialized = true;
    }

    for (;;)
    {
        bool        skip_fetch;

        CHECK_FOR_INTERRUPTS();

        /*
         * Get next page of results if needed
         */
        if (tbmres == NULL)
        {
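            /*
             * Advance the private (serial) or shared (parallel) bitmap
             * iterator to obtain the next page's worth of TIDs.
             */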
            if (!pstate)
                node->tbmres = tbmres = tbm_iterate(tbmiterator);
            else
                node->tbmres = tbmres = tbm_shared_iterate(shared_tbmiterator);
            if (tbmres == NULL)
            {
                /* no more entries in the bitmap */
                break;
            }

            BitmapAdjustPrefetchIterator(node, tbmres);

            /*
             * We can skip fetching the heap page if we don't need any fields
             * from the heap, and the bitmap entries don't need rechecking,
             * and all tuples on the page are visible to our transaction.
             *
             * XXX: It's a layering violation that we do these checks above
             * tableam; they should probably be moved below it at some point.
             */
            skip_fetch = (node->can_skip_fetch &&
                          !tbmres->recheck &&
                          VM_ALL_VISIBLE(node->ss.ss_currentRelation,
                                         tbmres->blockno,
                                         &node->vmbuffer));

            if (skip_fetch)
            {
                /* can't be lossy in the skip_fetch case */
                Assert(tbmres->ntuples >= 0);

                /*
                 * The number of tuples on this page is put into
                 * node->return_empty_tuples.
                 */
                node->return_empty_tuples = tbmres->ntuples;
            }
            else if (!table_scan_bitmap_next_block(scan, tbmres))
            {
                /* AM doesn't think this block is valid, skip */
                continue;
            }

            if (tbmres->ntuples >= 0)
                node->exact_pages++;
            else
                node->lossy_pages++;

            /* Adjust the prefetch target */
            BitmapAdjustPrefetchTarget(node);
        }
        else
        {
            /*
             * Continuing in previously obtained page.
             */

#ifdef USE_PREFETCH

            /*
             * Try to prefetch at least a few pages even before we get to the
             * second page if we don't stop reading after the first tuple.
             */
            if (!pstate)
            {
                if (node->prefetch_target < node->prefetch_maximum)
                    node->prefetch_target++;
            }
            else if (pstate->prefetch_target < node->prefetch_maximum)
            {
                /* take spinlock while updating shared state */
                SpinLockAcquire(&pstate->mutex);
                if (pstate->prefetch_target < node->prefetch_maximum)
                    pstate->prefetch_target++;
                SpinLockRelease(&pstate->mutex);
            }
#endif                          /* USE_PREFETCH */
        }

        /*
         * We issue prefetch requests *after* fetching the current page to try
         * to avoid having prefetching interfere with the main I/O. Also, this
         * should happen only when we have determined there is still something
         * to do on the current page, else we may uselessly prefetch the same
         * page we are just about to request for real.
         *
         * XXX: It's a layering violation that we do these checks above
         * tableam; they should probably be moved below it at some point.
         */
        BitmapPrefetch(node, scan);

        if (node->return_empty_tuples > 0)
        {
            /*
             * If we don't have to fetch the tuple, just return nulls.
             */
            ExecStoreAllNullTuple(slot);

            if (--node->return_empty_tuples == 0)
            {
                /* no more tuples to return in the next round */
                node->tbmres = tbmres = NULL;
            }
        }
        else
        {
            /*
             * Attempt to fetch tuple from AM.
             */
            if (!table_scan_bitmap_next_tuple(scan, tbmres, slot))
            {
                /* nothing more to look at on this page */
                node->tbmres = tbmres = NULL;
                continue;
            }

            /*
             * If we are using lossy info, we have to recheck the qual
             * conditions at every tuple.
             */
            if (tbmres->recheck)
            {
                econtext->ecxt_scantuple = slot;
                if (!ExecQualAndReset(node->bitmapqualorig, econtext))
                {
                    /* Fails recheck, so drop it and loop back for another */
                    InstrCountFiltered2(node, 1);
                    ExecClearTuple(slot);
                    continue;
                }
            }
        }

        /* OK to return this tuple */
        return slot;
    }

    /*
     * if we get here it means we are at the end of the scan.
     */
    return ExecClearTuple(slot);
}

Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 18:05:43 +01:00
|
|
|
/*
|
|
|
|
* BitmapDoneInitializingSharedState - Shared state is initialized
|
|
|
|
*
|
|
|
|
* By this time the leader has already populated the TBM and initialized the
|
|
|
|
* shared state so wake up other processes.
|
|
|
|
*/
|
|
|
|
static inline void
|
|
|
|
BitmapDoneInitializingSharedState(ParallelBitmapHeapState *pstate)
|
|
|
|
{
|
|
|
|
SpinLockAcquire(&pstate->mutex);
|
|
|
|
pstate->state = BM_FINISHED;
|
|
|
|
SpinLockRelease(&pstate->mutex);
|
|
|
|
ConditionVariableBroadcast(&pstate->cv);
|
|
|
|
}
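BitmapDoneInitializingSharedState is the leader's half of the startup handshake from the parallel bitmap heap scan commit above: publish the state change under the spinlock, then broadcast on the condition variable so that waiting workers re-check it. The sketch below is an editor's illustration of the same publish-then-broadcast pattern, using pthread primitives as stand-ins for PostgreSQL's SpinLock and ConditionVariable; the worker-side wait loop shown here is an assumption about the other half of the handshake, not a copy of it.

#include <pthread.h>
#include <stdbool.h>

typedef struct SharedScanState
{
    pthread_mutex_t mutex;      /* plays the role of pstate->mutex */
    pthread_cond_t  cv;         /* plays the role of pstate->cv */
    bool            tbm_ready;  /* "leader has populated the TBM" */
} SharedScanState;

/* Leader: publish the state change, then wake everyone who is waiting. */
void
leader_done_initializing(SharedScanState *s)
{
    pthread_mutex_lock(&s->mutex);
    s->tbm_ready = true;
    pthread_mutex_unlock(&s->mutex);
    pthread_cond_broadcast(&s->cv);
}

/* Worker: sleep until the leader has published the shared bitmap. */
void
worker_wait_for_bitmap(SharedScanState *s)
{
    pthread_mutex_lock(&s->mutex);
    while (!s->tbm_ready)
        pthread_cond_wait(&s->cv, &s->mutex);
    pthread_mutex_unlock(&s->mutex);
}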
|
|
|
|
|
2017-03-02 14:17:40 +01:00
|
|
|
/*
|
|
|
|
* BitmapAdjustPrefetchIterator - Adjust the prefetch iterator
|
|
|
|
*/
|
|
|
|
static inline void
|
|
|
|
BitmapAdjustPrefetchIterator(BitmapHeapScanState *node,
|
|
|
|
TBMIterateResult *tbmres)
|
|
|
|
{
|
|
|
|
#ifdef USE_PREFETCH
|
Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 18:05:43 +01:00
|
|
|
ParallelBitmapHeapState *pstate = node->pstate;
|
2017-03-02 14:17:40 +01:00
|
|
|
|
Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 18:05:43 +01:00
|
|
|
if (pstate == NULL)
|
2017-03-02 14:17:40 +01:00
|
|
|
{
|
Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 18:05:43 +01:00
|
|
|
TBMIterator *prefetch_iterator = node->prefetch_iterator;
|
|
|
|
|
|
|
|
if (node->prefetch_pages > 0)
|
|
|
|
{
|
|
|
|
/* The main iterator has closed the distance by one page */
|
|
|
|
node->prefetch_pages--;
|
|
|
|
}
|
|
|
|
else if (prefetch_iterator)
|
|
|
|
{
|
|
|
|
/* Do not let the prefetch iterator get behind the main one */
|
|
|
|
TBMIterateResult *tbmpre = tbm_iterate(prefetch_iterator);
|
|
|
|
|
|
|
|
if (tbmpre == NULL || tbmpre->blockno != tbmres->blockno)
|
|
|
|
elog(ERROR, "prefetch and main iterators are out of sync");
|
|
|
|
}
|
|
|
|
return;
|
2017-03-02 14:17:40 +01:00
|
|
|
}
|
Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 18:05:43 +01:00
|
|
|
|
|
|
|
if (node->prefetch_maximum > 0)
|
2017-03-02 14:17:40 +01:00
|
|
|
{
|
Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 18:05:43 +01:00
|
|
|
TBMSharedIterator *prefetch_iterator = node->shared_prefetch_iterator;
|
|
|
|
|
|
|
|
SpinLockAcquire(&pstate->mutex);
|
|
|
|
if (pstate->prefetch_pages > 0)
|
|
|
|
{
|
2017-04-04 15:03:41 +02:00
|
|
|
pstate->prefetch_pages--;
|
Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 18:05:43 +01:00
|
|
|
SpinLockRelease(&pstate->mutex);
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/* Release the mutex before iterating */
|
|
|
|
SpinLockRelease(&pstate->mutex);
|
2017-03-02 14:17:40 +01:00
|
|
|
|
Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 18:05:43 +01:00
|
|
|
/*
|
|
|
|
* In case of shared mode, we cannot ensure that the current
|
|
|
|
* blockno of the main iterator and that of the prefetch iterator
|
|
|
|
* are the same. It's possible that whatever blockno we are
|
|
|
|
* prefetching will be processed by another process. Therefore,
|
|
|
|
* we don't validate the blockno here as we do in the non-parallel
|
|
|
|
* case.
|
|
|
|
*/
|
|
|
|
if (prefetch_iterator)
|
|
|
|
tbm_shared_iterate(prefetch_iterator);
|
|
|
|
}
|
2017-03-02 14:17:40 +01:00
|
|
|
}
|
|
|
|
#endif /* USE_PREFETCH */
|
|
|
|
}
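In the serial branch of BitmapAdjustPrefetchIterator above, the prefetch iterator holds a lead of prefetch_pages pages over the main iterator; advancing the main iterator either consumes one page of that lead or, when no lead is left, forces the prefetch iterator to step in lockstep so it can never fall behind. A minimal standalone simulation of that bookkeeping follows (editor's illustration; plain array indices stand in for the two TIDBitmap iterators):

#include <stdio.h>

/* Toy model: blocks[] is the ordered page list a TIDBitmap iterator would return. */
#define NBLOCKS 8

int
main(void)
{
    int     blocks[NBLOCKS] = {2, 3, 5, 8, 13, 21, 34, 55};
    int     main_pos = -1;      /* last page returned by the "main" iterator */
    int     prefetch_pos = -1;  /* last page consumed by the "prefetch" iterator */
    int     prefetch_pages = 0; /* how far the prefetch iterator is ahead */

    for (int i = 0; i < NBLOCKS; i++)
    {
        /* The main iterator advances to the next page of the bitmap. */
        main_pos++;

        if (prefetch_pages > 0)
        {
            /* The main iterator has closed the distance by one page. */
            prefetch_pages--;
        }
        else
        {
            /* Keep the prefetch iterator from falling behind the main one. */
            prefetch_pos++;
            if (blocks[prefetch_pos] != blocks[main_pos])
                printf("iterators out of sync!\n");
        }

        /* Pretend we issued a prefetch here, growing the lead again. */
        if (i % 2 == 0 && prefetch_pos + 1 < NBLOCKS)
        {
            prefetch_pos++;
            prefetch_pages++;
        }

        printf("page %d: lead is %d page(s)\n", blocks[main_pos], prefetch_pages);
    }
    return 0;
}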
|
|
|
|
|
|
|
|
/*
|
|
|
|
* BitmapAdjustPrefetchTarget - Adjust the prefetch target
|
|
|
|
*
|
|
|
|
* Increase prefetch target if it's not yet at the max. Note that
|
|
|
|
* we will increase it to zero after fetching the very first
|
|
|
|
* page/tuple, then to one after the second tuple is fetched, then
|
|
|
|
* it doubles as later pages are fetched.
|
|
|
|
*/
|
|
|
|
static inline void
|
|
|
|
BitmapAdjustPrefetchTarget(BitmapHeapScanState *node)
|
|
|
|
{
|
|
|
|
#ifdef USE_PREFETCH
|
Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 18:05:43 +01:00
|
|
|
ParallelBitmapHeapState *pstate = node->pstate;
|
|
|
|
|
|
|
|
if (pstate == NULL)
|
|
|
|
{
|
|
|
|
if (node->prefetch_target >= node->prefetch_maximum)
|
|
|
|
/* don't increase any further */ ;
|
|
|
|
else if (node->prefetch_target >= node->prefetch_maximum / 2)
|
|
|
|
node->prefetch_target = node->prefetch_maximum;
|
|
|
|
else if (node->prefetch_target > 0)
|
|
|
|
node->prefetch_target *= 2;
|
|
|
|
else
|
|
|
|
node->prefetch_target++;
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Do an unlocked check first to save spinlock acquisitions. */
|
|
|
|
if (pstate->prefetch_target < node->prefetch_maximum)
|
|
|
|
{
|
|
|
|
SpinLockAcquire(&pstate->mutex);
|
|
|
|
if (pstate->prefetch_target >= node->prefetch_maximum)
|
|
|
|
/* don't increase any further */ ;
|
|
|
|
else if (pstate->prefetch_target >= node->prefetch_maximum / 2)
|
|
|
|
pstate->prefetch_target = node->prefetch_maximum;
|
|
|
|
else if (pstate->prefetch_target > 0)
|
|
|
|
pstate->prefetch_target *= 2;
|
|
|
|
else
|
|
|
|
pstate->prefetch_target++;
|
|
|
|
SpinLockRelease(&pstate->mutex);
|
|
|
|
}
|
2017-03-02 14:17:40 +01:00
|
|
|
#endif /* USE_PREFETCH */
|
|
|
|
}
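The ramp described in the comment before BitmapAdjustPrefetchTarget (reach zero after the first page, one after the second, then double until the maximum is hit) is easy to see in isolation. The standalone sketch below replays that arithmetic; the starting value of -1 is an assumption about how the counter is seeded before the first page, inferred from the comment rather than shown in this excerpt.

#include <stdio.h>

/*
 * Mirror of the serial-branch ramp above: starting from -1 (assumed seed),
 * the target becomes 0 after the first page, 1 after the second, and then
 * doubles until it is clamped to the maximum.
 */
static int
advance_prefetch_target(int target, int maximum)
{
    if (target >= maximum)
        return target;              /* don't increase any further */
    else if (target >= maximum / 2)
        return maximum;             /* jump straight to the cap */
    else if (target > 0)
        return target * 2;          /* exponential growth phase */
    else
        return target + 1;          /* -1 -> 0 -> 1 warm-up */
}

int
main(void)
{
    int     maximum = 64;           /* stands in for prefetch_maximum */
    int     target = -1;

    for (int page = 1; page <= 10; page++)
    {
        target = advance_prefetch_target(target, maximum);
        printf("after page %2d: prefetch target = %d\n", page, target);
    }
    return 0;
}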
|
|
|
|
|
|
|
|
/*
|
|
|
|
* BitmapPrefetch - Prefetch, if prefetch_pages is behind prefetch_target
|
|
|
|
*/
|
|
|
|
static inline void
|
tableam: Add and use scan APIs.
To allow table accesses to not be directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends needs
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
initialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensibly adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AM's
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
would also have needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will do so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
BitmapPrefetch(BitmapHeapScanState *node, TableScanDesc scan)
|
2017-03-02 14:17:40 +01:00
|
|
|
{
|
|
|
|
#ifdef USE_PREFETCH
|
Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 18:05:43 +01:00
|
|
|
ParallelBitmapHeapState *pstate = node->pstate;
|
2017-03-02 14:17:40 +01:00
|
|
|
|
Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 18:05:43 +01:00
|
|
|
if (pstate == NULL)
|
2017-03-02 14:17:40 +01:00
|
|
|
{
|
Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 18:05:43 +01:00
|
|
|
TBMIterator *prefetch_iterator = node->prefetch_iterator;
|
|
|
|
|
|
|
|
if (prefetch_iterator)
|
2017-03-02 14:17:40 +01:00
|
|
|
{
|
Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 18:05:43 +01:00
|
|
|
while (node->prefetch_pages < node->prefetch_target)
|
|
|
|
{
|
|
|
|
TBMIterateResult *tbmpre = tbm_iterate(prefetch_iterator);
|
Allow bitmap scans to operate as index-only scans when possible.
If we don't have to return any columns from heap tuples, and there's
no need to recheck qual conditions, and the heap page is all-visible,
then we can skip fetching the heap page altogether.
Skip prefetching pages too, when possible, on the assumption that the
recheck flag will remain the same from one page to the next. While that
assumption is hardly bulletproof, it seems like a good bet most of the
time, and better than prefetching pages we don't need.
This commit installs the executor infrastructure, but doesn't change
any planner cost estimates, thus possibly causing bitmap scans to
not be chosen in cases where this change renders them the best choice.
I (tgl) am not entirely convinced that we need to account for this
behavior in the planner, because I think typically the bitmap scan would
get chosen anyway if it's the best bet. In any case the submitted patch
took way too many shortcuts, resulting in too many clearly-bad choices,
to be committable.
Alexander Kuzmenkov, reviewed by Alexey Chernyshov, and whacked around
rather heavily by me.
Discussion: https://postgr.es/m/239a8955-c0fc-f506-026d-c837e86c827b@postgrespro.ru
2017-11-01 22:38:12 +01:00
|
|
|
bool skip_fetch;
|
Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 18:05:43 +01:00
|
|
|
|
|
|
|
if (tbmpre == NULL)
|
|
|
|
{
|
|
|
|
/* No more pages to prefetch */
|
|
|
|
tbm_end_iterate(prefetch_iterator);
|
|
|
|
node->prefetch_iterator = NULL;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
node->prefetch_pages++;
|
Allow bitmap scans to operate as index-only scans when possible.
If we don't have to return any columns from heap tuples, and there's
no need to recheck qual conditions, and the heap page is all-visible,
then we can skip fetching the heap page altogether.
Skip prefetching pages too, when possible, on the assumption that the
recheck flag will remain the same from one page to the next. While that
assumption is hardly bulletproof, it seems like a good bet most of the
time, and better than prefetching pages we don't need.
This commit installs the executor infrastructure, but doesn't change
any planner cost estimates, thus possibly causing bitmap scans to
not be chosen in cases where this change renders them the best choice.
I (tgl) am not entirely convinced that we need to account for this
behavior in the planner, because I think typically the bitmap scan would
get chosen anyway if it's the best bet. In any case the submitted patch
took way too many shortcuts, resulting in too many clearly-bad choices,
to be committable.
Alexander Kuzmenkov, reviewed by Alexey Chernyshov, and whacked around
rather heavily by me.
Discussion: https://postgr.es/m/239a8955-c0fc-f506-026d-c837e86c827b@postgrespro.ru
2017-11-01 22:38:12 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If we expect not to have to actually read this heap page,
|
|
|
|
* skip this prefetch call, but continue to run the prefetch
|
|
|
|
* logic normally. (Would it be better not to increment
|
|
|
|
* prefetch_pages?)
|
|
|
|
*
|
|
|
|
* This depends on the assumption that the index AM will
|
|
|
|
* report the same recheck flag for this future heap page as
|
|
|
|
* it did for the current heap page; which is not a certainty
|
|
|
|
* but is true in many cases.
|
|
|
|
*/
|
|
|
|
skip_fetch = (node->can_skip_fetch &&
|
|
|
|
(node->tbmres ? !node->tbmres->recheck : false) &&
|
|
|
|
VM_ALL_VISIBLE(node->ss.ss_currentRelation,
|
|
|
|
tbmpre->blockno,
|
|
|
|
&node->pvmbuffer));
|
|
|
|
|
|
|
|
if (!skip_fetch)
|
|
|
|
PrefetchBuffer(scan->rs_rd, MAIN_FORKNUM, tbmpre->blockno);
|
Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 18:05:43 +01:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return;
|
|
|
|
}
|
2017-03-02 14:17:40 +01:00
|
|
|
|
Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 18:05:43 +01:00
|
|
|
if (pstate->prefetch_pages < pstate->prefetch_target)
|
|
|
|
{
|
|
|
|
TBMSharedIterator *prefetch_iterator = node->shared_prefetch_iterator;
|
|
|
|
|
|
|
|
if (prefetch_iterator)
|
|
|
|
{
|
|
|
|
while (1)
|
2017-03-02 14:17:40 +01:00
|
|
|
{
|
Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 18:05:43 +01:00
|
|
|
TBMIterateResult *tbmpre;
|
|
|
|
bool do_prefetch = false;
|
Allow bitmap scans to operate as index-only scans when possible.
If we don't have to return any columns from heap tuples, and there's
no need to recheck qual conditions, and the heap page is all-visible,
then we can skip fetching the heap page altogether.
Skip prefetching pages too, when possible, on the assumption that the
recheck flag will remain the same from one page to the next. While that
assumption is hardly bulletproof, it seems like a good bet most of the
time, and better than prefetching pages we don't need.
This commit installs the executor infrastructure, but doesn't change
any planner cost estimates, thus possibly causing bitmap scans to
not be chosen in cases where this change renders them the best choice.
I (tgl) am not entirely convinced that we need to account for this
behavior in the planner, because I think typically the bitmap scan would
get chosen anyway if it's the best bet. In any case the submitted patch
took way too many shortcuts, resulting in too many clearly-bad choices,
to be committable.
Alexander Kuzmenkov, reviewed by Alexey Chernyshov, and whacked around
rather heavily by me.
Discussion: https://postgr.es/m/239a8955-c0fc-f506-026d-c837e86c827b@postgrespro.ru
2017-11-01 22:38:12 +01:00
|
|
|
bool skip_fetch;
|
Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 18:05:43 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Recheck under the mutex. If some other process has already
|
|
|
|
* done enough prefetching then we need not do anything.
|
|
|
|
*/
|
|
|
|
SpinLockAcquire(&pstate->mutex);
|
|
|
|
if (pstate->prefetch_pages < pstate->prefetch_target)
|
|
|
|
{
|
|
|
|
pstate->prefetch_pages++;
|
|
|
|
do_prefetch = true;
|
|
|
|
}
|
|
|
|
SpinLockRelease(&pstate->mutex);
|
|
|
|
|
|
|
|
if (!do_prefetch)
|
|
|
|
return;
|
|
|
|
|
|
|
|
tbmpre = tbm_shared_iterate(prefetch_iterator);
|
|
|
|
if (tbmpre == NULL)
|
|
|
|
{
|
|
|
|
/* No more pages to prefetch */
|
|
|
|
tbm_end_shared_iterate(prefetch_iterator);
|
|
|
|
node->shared_prefetch_iterator = NULL;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
Allow bitmap scans to operate as index-only scans when possible.
If we don't have to return any columns from heap tuples, and there's
no need to recheck qual conditions, and the heap page is all-visible,
then we can skip fetching the heap page altogether.
Skip prefetching pages too, when possible, on the assumption that the
recheck flag will remain the same from one page to the next. While that
assumption is hardly bulletproof, it seems like a good bet most of the
time, and better than prefetching pages we don't need.
This commit installs the executor infrastructure, but doesn't change
any planner cost estimates, thus possibly causing bitmap scans to
not be chosen in cases where this change renders them the best choice.
I (tgl) am not entirely convinced that we need to account for this
behavior in the planner, because I think typically the bitmap scan would
get chosen anyway if it's the best bet. In any case the submitted patch
took way too many shortcuts, resulting in too many clearly-bad choices,
to be committable.
Alexander Kuzmenkov, reviewed by Alexey Chernyshov, and whacked around
rather heavily by me.
Discussion: https://postgr.es/m/239a8955-c0fc-f506-026d-c837e86c827b@postgrespro.ru
2017-11-01 22:38:12 +01:00
|
|
|
/* As above, skip prefetch if we expect not to need the page */
|
|
|
|
skip_fetch = (node->can_skip_fetch &&
|
|
|
|
(node->tbmres ? !node->tbmres->recheck : false) &&
|
|
|
|
VM_ALL_VISIBLE(node->ss.ss_currentRelation,
|
|
|
|
tbmpre->blockno,
|
|
|
|
&node->pvmbuffer));
|
|
|
|
|
|
|
|
if (!skip_fetch)
|
|
|
|
PrefetchBuffer(scan->rs_rd, MAIN_FORKNUM, tbmpre->blockno);
|
2017-03-02 14:17:40 +01:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
#endif /* USE_PREFETCH */
|
|
|
|
}
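In the parallel branch of BitmapPrefetch above, a worker first claims a page by bumping pstate->prefetch_pages under the mutex and only then drives the shared iterator and issues the prefetch with the lock released; a worker that finds the target already met simply returns. A condensed standalone sketch of that claim-then-work pattern follows (editor's illustration; a pthread mutex stands in for the spinlock and a printf for the actual prefetch call):

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct PrefetchState
{
    pthread_mutex_t mutex;
    int             prefetch_pages;   /* pages already claimed for prefetch */
    int             prefetch_target;  /* how far ahead we want to be */
} PrefetchState;

/*
 * Each cooperating worker calls this; the mutex only covers the cheap
 * claim, and the (potentially slow) prefetch work happens unlocked.
 */
void
prefetch_one_round(PrefetchState *ps, int worker_id)
{
    while (true)
    {
        bool    do_prefetch = false;

        pthread_mutex_lock(&ps->mutex);
        if (ps->prefetch_pages < ps->prefetch_target)
        {
            ps->prefetch_pages++;
            do_prefetch = true;
        }
        pthread_mutex_unlock(&ps->mutex);

        if (!do_prefetch)
            return;                 /* someone else already did enough */

        /* Unlocked: pull the next block from the shared iterator and prefetch it. */
        printf("worker %d prefetches one block\n", worker_id);
    }
}

int
main(void)
{
    PrefetchState ps;

    pthread_mutex_init(&ps.mutex, NULL);
    ps.prefetch_pages = 0;
    ps.prefetch_target = 4;

    prefetch_one_round(&ps, 1);
    return 0;
}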
|
|
|
|
|
Re-implement EvalPlanQual processing to improve its performance and eliminate
a lot of strange behaviors that occurred in join cases. We now identify the
"current" row for every joined relation in UPDATE, DELETE, and SELECT FOR
UPDATE/SHARE queries. If an EvalPlanQual recheck is necessary, we jam the
appropriate row into each scan node in the rechecking plan, forcing it to emit
only that one row. The former behavior could rescan the whole of each joined
relation for each recheck, which was terrible for performance, and what's much
worse could result in duplicated output tuples.
Also, the original implementation of EvalPlanQual could not re-use the recheck
execution tree --- it had to go through a full executor init and shutdown for
every row to be tested. To avoid this overhead, I've associated a special
runtime Param with each LockRows or ModifyTable plan node, and arranged to
make every scan node below such a node depend on that Param. Thus, by
signaling a change in that Param, the EPQ machinery can just rescan the
already-built test plan.
This patch also adds a prohibition on set-returning functions in the
targetlist of SELECT FOR UPDATE/SHARE. This is needed to avoid the
duplicate-output-tuple problem. It seems fairly reasonable since the
other restrictions on SELECT FOR UPDATE are meant to ensure that there
is a unique correspondence between source tuples and result tuples,
which an output SRF destroys as much as anything else does.
2009-10-26 03:26:45 +01:00
|
|
|
/*
|
|
|
|
* BitmapHeapRecheck -- access method routine to recheck a tuple in EvalPlanQual
|
|
|
|
*/
|
|
|
|
static bool
|
|
|
|
BitmapHeapRecheck(BitmapHeapScanState *node, TupleTableSlot *slot)
|
|
|
|
{
|
|
|
|
ExprContext *econtext;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* extract necessary information from the bitmap heap scan node
|
|
|
|
*/
|
|
|
|
econtext = node->ss.ps.ps_ExprContext;
|
|
|
|
|
|
|
|
/* Does the tuple meet the original qual conditions? */
|
|
|
|
econtext->ecxt_scantuple = slot;
|
2018-01-29 21:16:53 +01:00
|
|
|
return ExecQualAndReset(node->bitmapqualorig, econtext);
|
Re-implement EvalPlanQual processing to improve its performance and eliminate
a lot of strange behaviors that occurred in join cases. We now identify the
"current" row for every joined relation in UPDATE, DELETE, and SELECT FOR
UPDATE/SHARE queries. If an EvalPlanQual recheck is necessary, we jam the
appropriate row into each scan node in the rechecking plan, forcing it to emit
only that one row. The former behavior could rescan the whole of each joined
relation for each recheck, which was terrible for performance, and what's much
worse could result in duplicated output tuples.
Also, the original implementation of EvalPlanQual could not re-use the recheck
execution tree --- it had to go through a full executor init and shutdown for
every row to be tested. To avoid this overhead, I've associated a special
runtime Param with each LockRows or ModifyTable plan node, and arranged to
make every scan node below such a node depend on that Param. Thus, by
signaling a change in that Param, the EPQ machinery can just rescan the
already-built test plan.
This patch also adds a prohibition on set-returning functions in the
targetlist of SELECT FOR UPDATE/SHARE. This is needed to avoid the
duplicate-output-tuple problem. It seems fairly reasonable since the
other restrictions on SELECT FOR UPDATE are meant to ensure that there
is a unique correspondence between source tuples and result tuples,
which an output SRF destroys as much as anything else does.
2009-10-26 03:26:45 +01:00
|
|
|
}
|
|
|
|
|
2005-04-20 00:35:18 +02:00
|
|
|
/* ----------------------------------------------------------------
|
|
|
|
* ExecBitmapHeapScan(node)
|
|
|
|
* ----------------------------------------------------------------
|
|
|
|
*/
|
2017-07-17 09:33:49 +02:00
|
|
|
static TupleTableSlot *
|
|
|
|
ExecBitmapHeapScan(PlanState *pstate)
|
2005-04-20 00:35:18 +02:00
|
|
|
{
|
2017-07-17 09:33:49 +02:00
|
|
|
BitmapHeapScanState *node = castNode(BitmapHeapScanState, pstate);
|
|
|
|
|
Re-implement EvalPlanQual processing to improve its performance and eliminate
a lot of strange behaviors that occurred in join cases. We now identify the
"current" row for every joined relation in UPDATE, DELETE, and SELECT FOR
UPDATE/SHARE queries. If an EvalPlanQual recheck is necessary, we jam the
appropriate row into each scan node in the rechecking plan, forcing it to emit
only that one row. The former behavior could rescan the whole of each joined
relation for each recheck, which was terrible for performance, and what's much
worse could result in duplicated output tuples.
Also, the original implementation of EvalPlanQual could not re-use the recheck
execution tree --- it had to go through a full executor init and shutdown for
every row to be tested. To avoid this overhead, I've associated a special
runtime Param with each LockRows or ModifyTable plan node, and arranged to
make every scan node below such a node depend on that Param. Thus, by
signaling a change in that Param, the EPQ machinery can just rescan the
already-built test plan.
This patch also adds a prohibition on set-returning functions in the
targetlist of SELECT FOR UPDATE/SHARE. This is needed to avoid the
duplicate-output-tuple problem. It seems fairly reasonable since the
other restrictions on SELECT FOR UPDATE are meant to ensure that there
is a unique correspondence between source tuples and result tuples,
which an output SRF destroys as much as anything else does.
2009-10-26 03:26:45 +01:00
|
|
|
return ExecScan(&node->ss,
|
|
|
|
(ExecScanAccessMtd) BitmapHeapNext,
|
|
|
|
(ExecScanRecheckMtd) BitmapHeapRecheck);
|
2005-04-20 00:35:18 +02:00
|
|
|
}
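ExecBitmapHeapScan itself is only a thin wrapper: it hands ExecScan one callback that produces the next candidate tuple (BitmapHeapNext) and one that re-tests a substituted tuple during an EvalPlanQual recheck (BitmapHeapRecheck). The sketch below is an editor's illustration of that callback-driven driver shape in self-contained C; the generic function-pointer types are stand-ins, not the ExecScanAccessMtd/ExecScanRecheckMtd signatures, and the way the recheck gate is applied here is deliberately simplified.

#include <stdio.h>
#include <stdbool.h>

typedef struct ScanState ScanState;
typedef int  (*AccessMtd) (ScanState *state);    /* produce next item, -1 at end */
typedef bool (*RecheckMtd) (ScanState *state, int item);

struct ScanState
{
    int     next;               /* toy scan position */
    bool    epq_active;         /* pretend an EvalPlanQual recheck is wanted */
};

/* Generic driver: fetch items from the access method, gate them via recheck. */
static void
run_scan(ScanState *state, AccessMtd access, RecheckMtd recheck)
{
    int     item;

    while ((item = access(state)) != -1)
    {
        if (state->epq_active && !recheck(state, item))
            continue;           /* substituted row failed the original quals */
        printf("emit %d\n", item);
    }
}

static int
toy_next(ScanState *state)
{
    return (state->next < 5) ? state->next++ : -1;
}

static bool
toy_recheck(ScanState *state, int item)
{
    (void) state;
    return item != 3;           /* arbitrarily reject one row */
}

int
main(void)
{
    ScanState state = {0, true};

    run_scan(&state, toy_next, toy_recheck);
    return 0;
}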
|
|
|
|
|
|
|
|
/* ----------------------------------------------------------------
|
2010-07-12 19:01:06 +02:00
|
|
|
* ExecReScanBitmapHeapScan(node)
|
2005-04-20 00:35:18 +02:00
|
|
|
* ----------------------------------------------------------------
|
|
|
|
*/
|
|
|
|
void
|
2010-07-12 19:01:06 +02:00
|
|
|
ExecReScanBitmapHeapScan(BitmapHeapScanState *node)
|
2005-04-20 00:35:18 +02:00
|
|
|
{
|
2015-05-04 22:13:07 +02:00
|
|
|
PlanState *outerPlan = outerPlanState(node);
|
|
|
|
|
2005-04-20 00:35:18 +02:00
|
|
|
/* rescan to release any page pin */
|
tableam: Add and use scan APIs.
To allow table accesses to not be directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends needs
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
initialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensibly adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AM's
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
would also have needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will do so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
table_rescan(node->ss.ss_currentScanDesc, NULL);
|
2005-04-20 00:35:18 +02:00
|
|
|
|
Allow bitmap scans to operate as index-only scans when possible.
If we don't have to return any columns from heap tuples, and there's
no need to recheck qual conditions, and the heap page is all-visible,
then we can skip fetching the heap page altogether.
Skip prefetching pages too, when possible, on the assumption that the
recheck flag will remain the same from one page to the next. While that
assumption is hardly bulletproof, it seems like a good bet most of the
time, and better than prefetching pages we don't need.
This commit installs the executor infrastructure, but doesn't change
any planner cost estimates, thus possibly causing bitmap scans to
not be chosen in cases where this change renders them the best choice.
I (tgl) am not entirely convinced that we need to account for this
behavior in the planner, because I think typically the bitmap scan would
get chosen anyway if it's the best bet. In any case the submitted patch
took way too many shortcuts, resulting in too many clearly-bad choices,
to be committable.
Alexander Kuzmenkov, reviewed by Alexey Chernyshov, and whacked around
rather heavily by me.
Discussion: https://postgr.es/m/239a8955-c0fc-f506-026d-c837e86c827b@postgrespro.ru
2017-11-01 22:38:12 +01:00
|
|
|
/* release bitmaps and buffers if any */
|
2009-01-10 22:08:36 +01:00
|
|
|
if (node->tbmiterator)
|
|
|
|
tbm_end_iterate(node->tbmiterator);
|
2009-01-12 06:10:45 +01:00
|
|
|
if (node->prefetch_iterator)
|
|
|
|
tbm_end_iterate(node->prefetch_iterator);
|
Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 18:05:43 +01:00
|
|
|
if (node->shared_tbmiterator)
|
|
|
|
tbm_end_shared_iterate(node->shared_tbmiterator);
|
|
|
|
if (node->shared_prefetch_iterator)
|
|
|
|
tbm_end_shared_iterate(node->shared_prefetch_iterator);
|
2005-04-20 00:35:18 +02:00
|
|
|
if (node->tbm)
|
|
|
|
tbm_free(node->tbm);
|
Allow bitmap scans to operate as index-only scans when possible.
If we don't have to return any columns from heap tuples, and there's
no need to recheck qual conditions, and the heap page is all-visible,
then we can skip fetching the heap page altogether.
Skip prefetching pages too, when possible, on the assumption that the
recheck flag will remain the same from one page to the next. While that
assumption is hardly bulletproof, it seems like a good bet most of the
time, and better than prefetching pages we don't need.
This commit installs the executor infrastructure, but doesn't change
any planner cost estimates, thus possibly causing bitmap scans to
not be chosen in cases where this change renders them the best choice.
I (tgl) am not entirely convinced that we need to account for this
behavior in the planner, because I think typically the bitmap scan would
get chosen anyway if it's the best bet. In any case the submitted patch
took way too many shortcuts, resulting in too many clearly-bad choices,
to be committable.
Alexander Kuzmenkov, reviewed by Alexey Chernyshov, and whacked around
rather heavily by me.
Discussion: https://postgr.es/m/239a8955-c0fc-f506-026d-c837e86c827b@postgrespro.ru
2017-11-01 22:38:12 +01:00
|
|
|
if (node->vmbuffer != InvalidBuffer)
|
|
|
|
ReleaseBuffer(node->vmbuffer);
|
|
|
|
if (node->pvmbuffer != InvalidBuffer)
|
|
|
|
ReleaseBuffer(node->pvmbuffer);
|
2005-04-20 00:35:18 +02:00
|
|
|
node->tbm = NULL;
|
2009-01-10 22:08:36 +01:00
|
|
|
node->tbmiterator = NULL;
|
2005-04-20 00:35:18 +02:00
|
|
|
node->tbmres = NULL;
|
2009-01-12 06:10:45 +01:00
|
|
|
node->prefetch_iterator = NULL;
|
Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 18:05:43 +01:00
|
|
|
node->initialized = false;
|
|
|
|
node->shared_tbmiterator = NULL;
|
|
|
|
node->shared_prefetch_iterator = NULL;
|
Allow bitmap scans to operate as index-only scans when possible.
If we don't have to return any columns from heap tuples, and there's
no need to recheck qual conditions, and the heap page is all-visible,
then we can skip fetching the heap page altogether.
Skip prefetching pages too, when possible, on the assumption that the
recheck flag will remain the same from one page to the next. While that
assumption is hardly bulletproof, it seems like a good bet most of the
time, and better than prefetching pages we don't need.
This commit installs the executor infrastructure, but doesn't change
any planner cost estimates, thus possibly causing bitmap scans to
not be chosen in cases where this change renders them the best choice.
I (tgl) am not entirely convinced that we need to account for this
behavior in the planner, because I think typically the bitmap scan would
get chosen anyway if it's the best bet. In any case the submitted patch
took way too many shortcuts, resulting in too many clearly-bad choices,
to be committable.
Alexander Kuzmenkov, reviewed by Alexey Chernyshov, and whacked around
rather heavily by me.
Discussion: https://postgr.es/m/239a8955-c0fc-f506-026d-c837e86c827b@postgrespro.ru
2017-11-01 22:38:12 +01:00
|
|
|
node->vmbuffer = InvalidBuffer;
|
|
|
|
node->pvmbuffer = InvalidBuffer;
|
Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 18:05:43 +01:00
|
|
|
|
Re-implement EvalPlanQual processing to improve its performance and eliminate
a lot of strange behaviors that occurred in join cases. We now identify the
"current" row for every joined relation in UPDATE, DELETE, and SELECT FOR
UPDATE/SHARE queries. If an EvalPlanQual recheck is necessary, we jam the
appropriate row into each scan node in the rechecking plan, forcing it to emit
only that one row. The former behavior could rescan the whole of each joined
relation for each recheck, which was terrible for performance, and what's much
worse could result in duplicated output tuples.
Also, the original implementation of EvalPlanQual could not re-use the recheck
execution tree --- it had to go through a full executor init and shutdown for
every row to be tested. To avoid this overhead, I've associated a special
runtime Param with each LockRows or ModifyTable plan node, and arranged to
make every scan node below such a node depend on that Param. Thus, by
signaling a change in that Param, the EPQ machinery can just rescan the
already-built test plan.
This patch also adds a prohibition on set-returning functions in the
targetlist of SELECT FOR UPDATE/SHARE. This is needed to avoid the
duplicate-output-tuple problem. It seems fairly reasonable since the
other restrictions on SELECT FOR UPDATE are meant to ensure that there
is a unique correspondence between source tuples and result tuples,
which an output SRF destroys as much as anything else does.
2009-10-26 03:26:45 +01:00
|
|
|
ExecScanReScan(&node->ss);
|
|
|
|
|
2005-04-20 00:35:18 +02:00
|
|
|
/*
|
2010-07-12 19:01:06 +02:00
|
|
|
* if chgParam of subnode is not null then the plan will be re-scanned by
|
|
|
|
* first ExecProcNode.
|
2005-04-20 00:35:18 +02:00
|
|
|
*/
|
2015-05-04 22:13:07 +02:00
|
|
|
if (outerPlan->chgParam == NULL)
|
|
|
|
ExecReScan(outerPlan);
|
2005-04-20 00:35:18 +02:00
|
|
|
}
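ExecReScanBitmapHeapScan frees every piece of per-scan state (iterators, the bitmap itself, visibility-map buffer pins), nulls the pointers, clears node->initialized, and immediately rescans the outer plan only if changed parameters will not already force a rescan at the next ExecProcNode; the next BitmapHeapNext call then rebuilds everything lazily. A compact standalone sketch of that free-reset-rebuild-lazily idiom follows (editor's illustration; the toy struct is not the PostgreSQL node):

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

typedef struct ToyScan
{
    int    *bitmap;             /* stands in for node->tbm */
    bool    initialized;        /* stands in for node->initialized */
} ToyScan;

/* Build state lazily on first use, as BitmapHeapNext does. */
static void
toy_scan_fetch(ToyScan *scan)
{
    if (!scan->initialized)
    {
        scan->bitmap = malloc(4 * sizeof(int));
        scan->initialized = true;
        printf("built scan state\n");
    }
    printf("fetched from scan\n");
}

/* Rescan: release everything, clear flags, let the next fetch rebuild. */
static void
toy_scan_rescan(ToyScan *scan)
{
    free(scan->bitmap);
    scan->bitmap = NULL;
    scan->initialized = false;
    printf("reset scan state\n");
}

int
main(void)
{
    ToyScan scan = {NULL, false};

    toy_scan_fetch(&scan);
    toy_scan_rescan(&scan);
    toy_scan_fetch(&scan);
    free(scan.bitmap);
    return 0;
}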
|
|
|
|
|
|
|
|
/* ----------------------------------------------------------------
|
|
|
|
* ExecEndBitmapHeapScan
|
|
|
|
* ----------------------------------------------------------------
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
ExecEndBitmapHeapScan(BitmapHeapScanState *node)
|
|
|
|
{
|
tableam: Add and use scan APIs.
To allow table accesses to not be directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends needs
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
initialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensibly adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AM's
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
would also have needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will do so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
TableScanDesc scanDesc;
|
2005-04-20 00:35:18 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* extract information from the node
|
|
|
|
*/
|
|
|
|
scanDesc = node->ss.ss_currentScanDesc;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Free the exprcontext
|
|
|
|
*/
|
|
|
|
ExecFreeExprContext(&node->ss.ps);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* clear out tuple table slots
|
|
|
|
*/
|
Don't require return slots for nodes without projection.
In a lot of nodes the return slot is not required. That can either be
because the node doesn't do any projection (say an Append node), or
because the node does perform projections but the projection is
optimized away because the projection would yield an identical row.
Slots aren't that small, especially for wide rows, so it's worthwhile
to avoid creating them. It's not possible to just skip creating the
slot - it's currently used to determine the tuple descriptor returned
by ExecGetResultType(). So separate the determination of the result
type from the slot creation. The work previously done internally in
ExecInitResultTupleSlotTL() can now also be done separately with
ExecInitResultTypeTL() and ExecInitResultSlot(). That way nodes that
aren't guaranteed to need a result slot can use
ExecInitResultTypeTL() to determine the result type of the node, and,
when ExecAssignScanProjectionInfo() (via
ExecConditionalAssignProjectionInfo()) determines that a result slot
is needed, it is created with ExecInitResultSlot().
Besides the advantage of avoiding the creation of slots that are then
unused, this is necessary preparation for later patches around tuple
table slot abstraction. In particular separating the return descriptor
and slot is a prerequisite to allow JITing of tuple deforming with
knowledge of the underlying tuple format, and to avoid unnecessarily
creating JITed tuple deforming for virtual slots.
This commit removes a redundant argument from
ExecInitResultTupleSlotTL(). While this commit touches a lot of the
relevant lines anyway, it'd normally still not be worthwhile to cause
breakage, except that the aforementioned later commits will touch *all*
ExecInitResultTupleSlotTL() callers anyway (but fits worse
thematically).
Author: Andres Freund
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
2018-11-10 02:19:39 +01:00
|
|
|
if (node->ss.ps.ps_ResultTupleSlot)
|
|
|
|
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
|
2005-04-20 00:35:18 +02:00
|
|
|
ExecClearTuple(node->ss.ss_ScanTupleSlot);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* close down subplans
|
|
|
|
*/
|
|
|
|
ExecEndNode(outerPlanState(node));
|
|
|
|
|
|
|
|
/*
|
Allow bitmap scans to operate as index-only scans when possible.
If we don't have to return any columns from heap tuples, and there's
no need to recheck qual conditions, and the heap page is all-visible,
then we can skip fetching the heap page altogether.
Skip prefetching pages too, when possible, on the assumption that the
recheck flag will remain the same from one page to the next. While that
assumption is hardly bulletproof, it seems like a good bet most of the
time, and better than prefetching pages we don't need.
This commit installs the executor infrastructure, but doesn't change
any planner cost estimates, thus possibly causing bitmap scans to
not be chosen in cases where this change renders them the best choice.
I (tgl) am not entirely convinced that we need to account for this
behavior in the planner, because I think typically the bitmap scan would
get chosen anyway if it's the best bet. In any case the submitted patch
took way too many shortcuts, resulting in too many clearly-bad choices,
to be committable.
Alexander Kuzmenkov, reviewed by Alexey Chernyshov, and whacked around
rather heavily by me.
Discussion: https://postgr.es/m/239a8955-c0fc-f506-026d-c837e86c827b@postgrespro.ru
2017-11-01 22:38:12 +01:00
|
|
|
* release bitmaps and buffers if any
|
2005-04-20 00:35:18 +02:00
|
|
|
*/
|
2009-01-10 22:08:36 +01:00
|
|
|
if (node->tbmiterator)
|
|
|
|
tbm_end_iterate(node->tbmiterator);
|
2009-01-12 06:10:45 +01:00
|
|
|
if (node->prefetch_iterator)
|
|
|
|
tbm_end_iterate(node->prefetch_iterator);
|
2005-04-20 00:35:18 +02:00
|
|
|
if (node->tbm)
|
|
|
|
tbm_free(node->tbm);
|
Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 18:05:43 +01:00
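As a hedged sketch of how the joint iteration described above fits together (simplified from the flow this commit adds; pstate is the ParallelBitmapHeapState, dsa the per-query DSA area, tbmres a TBMIterateResult pointer, and the error handling present in the real code is omitted):

/* Sketch, not verbatim from this file. */
if (BitmapShouldInitializeSharedState(pstate))
{
    /* Leader: build the bitmap, then publish a shared iterator for it. */
    TIDBitmap  *tbm = (TIDBitmap *) MultiExecProcNode(outerPlanState(node));

    pstate->tbmiterator = tbm_prepare_shared_iterate(tbm);
    /* ... mark the shared state BM_FINISHED and wake any waiting workers ... */
}

/* Every participant, leader and workers alike, attaches and iterates jointly. */
node->shared_tbmiterator = tbm_attach_shared_iterate(dsa, pstate->tbmiterator);

while ((tbmres = tbm_shared_iterate(node->shared_tbmiterator)) != NULL)
{
    /* fetch and return tuples from block tbmres->blockno */
}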
|
|
|
if (node->shared_tbmiterator)
|
|
|
|
tbm_end_shared_iterate(node->shared_tbmiterator);
|
|
|
|
if (node->shared_prefetch_iterator)
|
|
|
|
tbm_end_shared_iterate(node->shared_prefetch_iterator);
|
Allow bitmap scans to operate as index-only scans when possible.
2017-11-01 22:38:12 +01:00
|
|
|
if (node->vmbuffer != InvalidBuffer)
|
|
|
|
ReleaseBuffer(node->vmbuffer);
|
|
|
|
if (node->pvmbuffer != InvalidBuffer)
|
|
|
|
ReleaseBuffer(node->pvmbuffer);
|
2005-04-20 00:35:18 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* close heap scan
|
|
|
|
*/
|
tableam: Add and use scan APIs.
To allow table accesses to not be directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends needs
to be set up without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
initialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensibly adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AM's
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have been needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will do so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
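A hedged sketch of the generic scan API the commit message describes, as a caller might use it (rel and snapshot are placeholders for an already-opened relation and an active snapshot; this is not code from this file):

/* Sketch: a sequential scan written against the table AM API. */
TupleTableSlot *slot = table_slot_create(rel, NULL);            /* slot of the AM's native type */
TableScanDesc scan = table_beginscan(rel, snapshot, 0, NULL);   /* no scan keys */

while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
    /* process the tuple currently stored in 'slot' */
}

table_endscan(scan);
ExecDropSingleTupleTableSlot(slot);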
|
|
|
table_endscan(scanDesc);
|
2005-04-20 00:35:18 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/* ----------------------------------------------------------------
|
|
|
|
* ExecInitBitmapHeapScan
|
|
|
|
*
|
|
|
|
* Initializes the scan's state information.
|
|
|
|
* ----------------------------------------------------------------
|
|
|
|
*/
|
|
|
|
BitmapHeapScanState *
|
2006-02-28 05:10:28 +01:00
|
|
|
ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
|
2005-04-20 00:35:18 +02:00
|
|
|
{
|
|
|
|
BitmapHeapScanState *scanstate;
|
|
|
|
Relation currentRelation;
|
|
|
|
|
2006-02-28 05:10:28 +01:00
|
|
|
/* check for unsupported flags */
|
|
|
|
Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
|
|
|
|
|
2005-11-26 04:03:07 +01:00
|
|
|
/*
|
|
|
|
* Assert caller didn't ask for an unsafe snapshot --- see comments at
|
|
|
|
* head of file.
|
|
|
|
*/
|
|
|
|
Assert(IsMVCCSnapshot(estate->es_snapshot));
|
|
|
|
|
2005-04-20 00:35:18 +02:00
|
|
|
/*
|
|
|
|
* create state structure
|
|
|
|
*/
|
|
|
|
scanstate = makeNode(BitmapHeapScanState);
|
|
|
|
scanstate->ss.ps.plan = (Plan *) node;
|
|
|
|
scanstate->ss.ps.state = estate;
|
2017-07-17 09:33:49 +02:00
|
|
|
scanstate->ss.ps.ExecProcNode = ExecBitmapHeapScan;
|
2005-04-20 00:35:18 +02:00
|
|
|
|
|
|
|
scanstate->tbm = NULL;
|
2009-01-10 22:08:36 +01:00
|
|
|
scanstate->tbmiterator = NULL;
|
2005-04-20 00:35:18 +02:00
|
|
|
scanstate->tbmres = NULL;
|
2019-04-01 02:51:49 +02:00
|
|
|
scanstate->return_empty_tuples = 0;
|
Allow bitmap scans to operate as index-only scans when possible.
2017-11-01 22:38:12 +01:00
|
|
|
scanstate->vmbuffer = InvalidBuffer;
|
|
|
|
scanstate->pvmbuffer = InvalidBuffer;
|
2014-01-13 20:42:16 +01:00
|
|
|
scanstate->exact_pages = 0;
|
|
|
|
scanstate->lossy_pages = 0;
|
2009-01-12 06:10:45 +01:00
|
|
|
scanstate->prefetch_iterator = NULL;
|
|
|
|
scanstate->prefetch_pages = 0;
|
|
|
|
scanstate->prefetch_target = 0;
|
Support parallel bitmap heap scans.
2017-03-08 18:05:43 +01:00
|
|
|
scanstate->pscan_len = 0;
|
|
|
|
scanstate->initialized = false;
|
|
|
|
scanstate->shared_tbmiterator = NULL;
|
Allow bitmap scans to operate as index-only scans when possible.
2017-11-01 22:38:12 +01:00
|
|
|
scanstate->shared_prefetch_iterator = NULL;
|
Support parallel bitmap heap scans.
2017-03-08 18:05:43 +01:00
|
|
|
scanstate->pstate = NULL;
|
2005-04-20 00:35:18 +02:00
|
|
|
|
Allow bitmap scans to operate as index-only scans when possible.
2017-11-01 22:38:12 +01:00
|
|
|
/*
|
|
|
|
* We can potentially skip fetching heap pages if we do not need any
|
|
|
|
* columns of the table, either for checking non-indexable quals or for
|
|
|
|
* returning data. This test is a bit simplistic, as it checks the
|
|
|
|
* stronger condition that there's no qual or return tlist at all. But in
|
|
|
|
* most cases it's probably not worth working harder than that.
|
|
|
|
*/
|
|
|
|
scanstate->can_skip_fetch = (node->scan.plan.qual == NIL &&
|
|
|
|
node->scan.plan.targetlist == NIL);
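For context, a hedged sketch of how this flag is later consulted per page in the fetch path (closely modelled on the skip-fetch logic elsewhere in this file; tbmres stands for the TBMIterateResult of the current page and is not declared here):

/* Sketch of the per-page decision; not verbatim from this function. */
bool        skip_fetch;

skip_fetch = (scanstate->can_skip_fetch &&
              !tbmres->recheck &&
              VM_ALL_VISIBLE(scanstate->ss.ss_currentRelation,
                             tbmres->blockno,
                             &scanstate->vmbuffer));

if (skip_fetch)
{
    /*
     * A lossy page always has recheck set, so ntuples is valid here, and
     * every tuple on the page is visible: return that many empty virtual
     * tuples instead of reading the heap page at all.
     */
    scanstate->return_empty_tuples = tbmres->ntuples;
}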
|
|
|
|
|
2005-04-20 00:35:18 +02:00
|
|
|
/*
|
|
|
|
* Miscellaneous initialization
|
|
|
|
*
|
|
|
|
* create expression context for node
|
|
|
|
*/
|
|
|
|
ExecAssignExprContext(estate, &scanstate->ss.ps);
|
|
|
|
|
|
|
|
/*
|
2018-10-06 21:49:37 +02:00
|
|
|
* open the scan relation
|
2005-04-20 00:35:18 +02:00
|
|
|
*/
|
2018-02-17 06:17:38 +01:00
|
|
|
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* initialize child nodes
|
|
|
|
*/
|
|
|
|
outerPlanState(scanstate) = ExecInitNode(outerPlan(node), estate, eflags);
|
2005-04-20 00:35:18 +02:00
|
|
|
|
|
|
|
/*
|
2018-02-17 06:17:38 +01:00
|
|
|
* get the scan type from the relation descriptor.
|
2005-04-20 00:35:18 +02:00
|
|
|
*/
|
2018-02-17 06:17:38 +01:00
|
|
|
ExecInitScanTupleSlot(estate, &scanstate->ss,
|
Introduce notion of different types of slots (without implementing them).
Upcoming work intends to allow pluggable ways to introduce new ways of
storing table data. Accessing those table access methods from the
executor requires TupleTableSlots to carry tuples in the native
format of such storage methods; otherwise there'll be a significant
conversion overhead.
Different access methods will require different data to store tuples
efficiently (just like virtual, minimal, heap already require fields
in TupleTableSlot). To allow that without requiring additional pointer
indirections, we want to have different structs (embedding
TupleTableSlot) for different types of slots. Thus different types of
slots are needed, which requires adapting creators of slots.
The slot that most efficiently can represent a type of tuple in an
executor node will often depend on the type of slot a child node
uses. Therefore we need to track the type of slot that is returned by
nodes, so parent nodes can create slots based on that.
Relatedly, JIT compilation of tuple deforming needs to know which type
of slot a certain expression refers to, so it can create an
appropriate deforming function for the type of tuple in the slot.
But not all nodes will only return one type of slot, e.g. an append
node will potentially return different types of slots for each of its
subplans.
Therefore add a function that allows querying the type of a node's
result slot, and whether it'll always be the same type (whether it's
fixed). This can be queried using ExecGetResultSlotOps().
The scan, result, inner, and outer slot types are automatically
inferred from ExecInitScanTupleSlot(), ExecInitResultSlot(), and the
left/right subtrees, respectively. If that's not correct for a node,
that can be overwritten using new fields in PlanState.
This commit does not introduce the actually abstracted implementation
of different kind of TupleTableSlots, that will be left for a followup
commit. The different types of slots introduced will, for now, still
use the same backing implementation.
While this already partially invalidates the big comment in
tuptable.h, it seems to make more sense to update it later, when the
different TupleTableSlot implementations actually exist.
Author: Ashutosh Bapat and Andres Freund, with changes by Amit Khandekar
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
2018-11-16 07:00:30 +01:00
|
|
|
RelationGetDescr(currentRelation),
|
tableam: Add and use scan APIs.
2019-03-11 20:46:41 +01:00
|
|
|
table_slot_callbacks(currentRelation));
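Relatedly, a hedged sketch of the slot-type query interface added by the "Introduce notion of different types of slots" commit quoted earlier: a parent node can ask which slot type its child will return and whether that type is fixed (parentstate is a placeholder PlanState, not a variable from this file):

/* Sketch: inspecting the slot type returned by a child node. */
bool        fixed;
const TupleTableSlotOps *childOps;

childOps = ExecGetResultSlotOps(outerPlanState(parentstate), &fixed);

if (fixed && childOps == &TTSOpsVirtual)
{
    /* Child always returns virtual slots; deforming can be specialized. */
}
else
{
    /* The slot type may vary (e.g. under an Append node); stay generic. */
}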
|
2005-04-20 00:35:18 +02:00
|
|
|
|
|
|
|
/*
|
Don't require return slots for nodes without projection.
In a lot of nodes the return slot is not required. That can either be
because the node doesn't do any projection (say an Append node), or
because the node does perform projections but the projection is
optimized away because the projection would yield an identical row.
Slots aren't that small, especially for wide rows, so it's worthwhile
to avoid creating them. It's not possible to just skip creating the
slot - it's currently used to determine the tuple descriptor returned
by ExecGetResultType(). So separate the determination of the result
type from the slot creation. The work previously done internally in
ExecInitResultTupleSlotTL() can now also be done separately with
ExecInitResultTypeTL() and ExecInitResultSlot(). That way, nodes that
aren't guaranteed to need a result slot can use
ExecInitResultTypeTL() to determine the result type of the node; if
ExecAssignScanProjectionInfo() (via
ExecConditionalAssignProjectionInfo()) then determines that a result
slot is needed, it is created with ExecInitResultSlot().
Besides the advantage of not creating slots that then go unused, this
is necessary preparation for later patches around tuple table slot
abstraction. In particular, separating the return descriptor and slot
is a prerequisite to allow JITing of tuple deforming with knowledge of
the underlying tuple format, and to avoid unnecessarily creating JITed
tuple deforming for virtual slots.
This commit removes a redundant argument from
ExecInitResultTupleSlotTL(). While this commit touches a lot of the
relevant lines anyway, it'd normally still not be worthwhile to cause
breakage, except that the aforementioned later commits will touch *all*
ExecInitResultTupleSlotTL() callers anyway (but the change fits worse
thematically there).
Author: Andres Freund
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
2018-11-10 02:19:39 +01:00
|
|
|
* Initialize result type and projection.
|
2005-04-20 00:35:18 +02:00
|
|
|
*/
|
Don't require return slots for nodes without projection.
2018-11-10 02:19:39 +01:00
|
|
|
ExecInitResultTypeTL(&scanstate->ss.ps);
|
2018-02-17 06:17:38 +01:00
|
|
|
ExecAssignScanProjectionInfo(&scanstate->ss);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* initialize child expressions
|
|
|
|
*/
|
|
|
|
scanstate->ss.ps.qual =
|
|
|
|
ExecInitQual(node->scan.plan.qual, (PlanState *) scanstate);
|
|
|
|
scanstate->bitmapqualorig =
|
|
|
|
ExecInitQual(node->bitmapqualorig, (PlanState *) scanstate);
|
2005-04-20 00:35:18 +02:00
|
|
|
|
2015-09-08 17:51:42 +02:00
|
|
|
/*
|
2020-03-15 23:43:18 +01:00
|
|
|
* Maximum number of prefetches for the tablespace if configured,
|
|
|
|
* otherwise the current value of the effective_io_concurrency GUC.
|
2015-09-08 17:51:42 +02:00
|
|
|
*/
|
2020-03-15 23:43:18 +01:00
|
|
|
scanstate->prefetch_maximum =
|
2015-09-08 17:51:42 +02:00
|
|
|
get_tablespace_io_concurrency(currentRelation->rd_rel->reltablespace);
|
|
|
|
|
2005-04-20 00:35:18 +02:00
|
|
|
scanstate->ss.ss_currentRelation = currentRelation;
|
|
|
|
|
tableam: Add and use scan APIs.
2019-03-11 20:46:41 +01:00
|
|
|
scanstate->ss.ss_currentScanDesc = table_beginscan_bm(currentRelation,
|
|
|
|
estate->es_snapshot,
|
|
|
|
0,
|
|
|
|
NULL);
|
2005-10-06 04:29:23 +02:00
|
|
|
|
2005-04-20 00:35:18 +02:00
|
|
|
/*
|
|
|
|
* all done.
|
|
|
|
*/
|
|
|
|
return scanstate;
|
|
|
|
}
|
Support parallel bitmap heap scans.
2017-03-08 18:05:43 +01:00
|
|
|
|
|
|
|
/*----------------
|
|
|
|
* BitmapShouldInitializeSharedState
|
|
|
|
*
|
|
|
|
* The first process to come here and see the state set to BM_INITIAL
|
|
|
|
* will become the leader for the parallel bitmap scan and will be
|
|
|
|
* responsible for populating the TIDBitmap. The other processes will
|
|
|
|
* be blocked by the condition variable until the leader wakes them up.
|
|
|
|
* ---------------
|
|
|
|
*/
|
|
|
|
static bool
|
|
|
|
BitmapShouldInitializeSharedState(ParallelBitmapHeapState *pstate)
|
|
|
|
{
|
|
|
|
SharedBitmapState state;
|
|
|
|
|
|
|
|
while (1)
|
|
|
|
{
|
|
|
|
SpinLockAcquire(&pstate->mutex);
|
|
|
|
state = pstate->state;
|
|
|
|
if (pstate->state == BM_INITIAL)
|
|
|
|
pstate->state = BM_INPROGRESS;
|
|
|
|
SpinLockRelease(&pstate->mutex);
|
|
|
|
|
|
|
|
/* Exit if bitmap is done, or if we're the leader. */
|
|
|
|
if (state != BM_INPROGRESS)
|
|
|
|
break;
|
|
|
|
|
|
|
|
/* Wait for the leader to wake us up. */
|
|
|
|
ConditionVariableSleep(&pstate->cv, WAIT_EVENT_PARALLEL_BITMAP_SCAN);
|
|
|
|
}
|
|
|
|
|
|
|
|
ConditionVariableCancelSleep();
|
|
|
|
|
|
|
|
return (state == BM_INITIAL);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* ----------------------------------------------------------------
|
|
|
|
* ExecBitmapHeapEstimate
|
|
|
|
*
|
2017-10-28 11:50:22 +02:00
|
|
|
* Compute the amount of space we'll need in the parallel
|
|
|
|
* query DSM, and inform pcxt->estimator about our needs.
|
Support parallel bitmap heap scans.
2017-03-08 18:05:43 +01:00
|
|
|
* ----------------------------------------------------------------
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
ExecBitmapHeapEstimate(BitmapHeapScanState *node,
|
|
|
|
ParallelContext *pcxt)
|
|
|
|
{
|
|
|
|
EState *estate = node->ss.ps.state;
|
|
|
|
|
|
|
|
node->pscan_len = add_size(offsetof(ParallelBitmapHeapState,
|
|
|
|
phs_snapshot_data),
|
|
|
|
EstimateSnapshotSpace(estate->es_snapshot));
|
|
|
|
|
|
|
|
shm_toc_estimate_chunk(&pcxt->estimator, node->pscan_len);
|
|
|
|
shm_toc_estimate_keys(&pcxt->estimator, 1);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* ----------------------------------------------------------------
|
|
|
|
* ExecBitmapHeapInitializeDSM
|
|
|
|
*
|
|
|
|
* Set up a parallel bitmap heap scan descriptor.
|
|
|
|
* ----------------------------------------------------------------
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
ExecBitmapHeapInitializeDSM(BitmapHeapScanState *node,
|
|
|
|
ParallelContext *pcxt)
|
|
|
|
{
|
|
|
|
ParallelBitmapHeapState *pstate;
|
|
|
|
EState *estate = node->ss.ps.state;
|
2017-11-28 17:39:16 +01:00
|
|
|
dsa_area *dsa = node->ss.ps.state->es_query_dsa;
|
|
|
|
|
|
|
|
/* If there's no DSA, there are no workers; initialize nothing. */
|
|
|
|
if (dsa == NULL)
|
|
|
|
return;
|
Support parallel bitmap heap scans.
2017-03-08 18:05:43 +01:00
|
|
|
|
|
|
|
pstate = shm_toc_allocate(pcxt->toc, node->pscan_len);
|
|
|
|
|
|
|
|
pstate->tbmiterator = 0;
|
|
|
|
pstate->prefetch_iterator = 0;
|
|
|
|
|
|
|
|
/* Initialize the mutex */
|
|
|
|
SpinLockInit(&pstate->mutex);
|
|
|
|
pstate->prefetch_pages = 0;
|
|
|
|
pstate->prefetch_target = 0;
|
|
|
|
pstate->state = BM_INITIAL;
|
|
|
|
|
|
|
|
ConditionVariableInit(&pstate->cv);
|
|
|
|
SerializeSnapshot(estate->es_snapshot, pstate->phs_snapshot_data);
|
|
|
|
|
|
|
|
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pstate);
|
|
|
|
node->pstate = pstate;
|
|
|
|
}
|
|
|
|
|
Separate reinitialization of shared parallel-scan state from ExecReScan.
Previously, the parallel executor logic did reinitialization of shared
state within the ExecReScan code for parallel-aware scan nodes. This is
problematic, because it means that the ExecReScan call has to occur
synchronously (ie, during the parent Gather node's ReScan call). That is
swimming very much against the tide so far as the ExecReScan machinery is
concerned; the fact that it works at all today depends on a lot of fragile
assumptions, such as that no plan node between Gather and a parallel-aware
scan node is parameterized. Another objection is that because ExecReScan
might be called in workers as well as the leader, hacky extra tests are
needed in some places to prevent unwanted shared-state resets.
Hence, let's separate this code into two functions, a ReInitializeDSM
call and the ReScan call proper. ReInitializeDSM is called only in
the leader and is guaranteed to run before we start new workers.
ReScan is returned to its traditional function of resetting only local
state, which means that ExecReScan's usual habits of delaying or
eliminating child rescan calls are safe again.
As with the preceding commit 7df2c1f8d, it doesn't seem to be necessary
to make these changes in 9.6, which is a good thing because the FDW and
CustomScan APIs are impacted.
Discussion: https://postgr.es/m/CAA4eK1JkByysFJNh9M349u_nNjqETuEnY_y1VUc_kJiU0bxtaQ@mail.gmail.com
2017-08-30 19:18:16 +02:00
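A hedged sketch of the resulting division of labor at rescan time, from the leader's perspective (the ordering is the point of the commit; the fragment itself is illustrative and not taken from this file):

/* Sketch: leader-side handling of a rescan of a parallel bitmap heap scan. */

/* 1. Reset shared state in the DSM before any new workers are launched. */
ExecBitmapHeapReInitializeDSM(node, pcxt);

/* 2. Launch (or relaunch) the workers against the reset shared state. */

/*
 * 3. ExecReScan now touches only backend-local state, so the usual ReScan
 *    machinery may call it lazily, in leader or workers, without clobbering
 *    the shared bitmap state.
 */
ExecReScan((PlanState *) node);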
|
|
|
/* ----------------------------------------------------------------
|
|
|
|
* ExecBitmapHeapReInitializeDSM
|
|
|
|
*
|
|
|
|
* Reset shared state before beginning a fresh scan.
|
|
|
|
* ----------------------------------------------------------------
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
ExecBitmapHeapReInitializeDSM(BitmapHeapScanState *node,
|
|
|
|
ParallelContext *pcxt)
|
|
|
|
{
|
|
|
|
ParallelBitmapHeapState *pstate = node->pstate;
|
|
|
|
dsa_area *dsa = node->ss.ps.state->es_query_dsa;
|
|
|
|
|
2017-11-28 17:39:16 +01:00
|
|
|
/* If there's no DSA, there are no workers; do nothing. */
|
|
|
|
if (dsa == NULL)
|
|
|
|
return;
|
|
|
|
|
Separate reinitialization of shared parallel-scan state from ExecReScan.
2017-08-30 19:18:16 +02:00
|
|
|
pstate->state = BM_INITIAL;
|
|
|
|
|
|
|
|
if (DsaPointerIsValid(pstate->tbmiterator))
|
|
|
|
tbm_free_shared_area(dsa, pstate->tbmiterator);
|
|
|
|
|
|
|
|
if (DsaPointerIsValid(pstate->prefetch_iterator))
|
|
|
|
tbm_free_shared_area(dsa, pstate->prefetch_iterator);
|
|
|
|
|
|
|
|
pstate->tbmiterator = InvalidDsaPointer;
|
|
|
|
pstate->prefetch_iterator = InvalidDsaPointer;
|
|
|
|
}
|
|
|
|
|
Support parallel bitmap heap scans.
2017-03-08 18:05:43 +01:00
|
|
|
/* ----------------------------------------------------------------
|
|
|
|
* ExecBitmapHeapInitializeWorker
|
|
|
|
*
|
|
|
|
* Copy relevant information from TOC into planstate.
|
|
|
|
* ----------------------------------------------------------------
|
|
|
|
*/
|
|
|
|
void
|
2017-11-17 02:28:11 +01:00
|
|
|
ExecBitmapHeapInitializeWorker(BitmapHeapScanState *node,
|
|
|
|
ParallelWorkerContext *pwcxt)
|
Support parallel bitmap heap scans.
2017-03-08 18:05:43 +01:00
|
|
|
{
|
|
|
|
ParallelBitmapHeapState *pstate;
|
|
|
|
Snapshot snapshot;
|
|
|
|
|
2017-11-28 17:39:16 +01:00
|
|
|
Assert(node->ss.ps.state->es_query_dsa != NULL);
|
|
|
|
|
2017-11-17 02:28:11 +01:00
|
|
|
pstate = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
|
Support parallel bitmap heap scans.
2017-03-08 18:05:43 +01:00
|
|
|
node->pstate = pstate;
|
|
|
|
|
|
|
|
snapshot = RestoreSnapshot(pstate->phs_snapshot_data);
|
tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for a other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
intiialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensible adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AMs
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have been needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
table_scan_update_snapshot(node->ss.ss_currentScanDesc, snapshot);
|
Support parallel bitmap heap scans.
2017-03-08 18:05:43 +01:00
|
|
|
}
|