tableam: introduce table AM infrastructure.
This introduces the concept of table access methods, i.e. CREATE
ACCESS METHOD ... TYPE TABLE and
CREATE TABLE ... USING (storage-engine).
No table access functionality is delegated to table AMs as of this
commit; that'll be done in following commits.
Subsequent commits will incrementally abstract table access
functionality to be routed through table access methods. That change
is too large to be reviewed & committed at once, so it'll be done
incrementally.
Docs will be updated at the end, as adding them incrementally would
likely make them less coherent, and is definitely a lot more work
without much benefit.
Table access methods are specified similarly to index access methods,
i.e. pg_am.amhandler returns, as INTERNAL, a pointer to a struct with
callbacks. In contrast to index AMs, that struct needs to live as long
as the backend; typically that's achieved by just returning a pointer
to a constant struct.
psql's \d+ now displays a table's access method. That can be disabled
with HIDE_TABLEAM=true, which is mainly useful so regression tests can
be run against different AMs. It's quite possible that this behaviour
still needs to be fine-tuned.
For now it's not allowed to set a table AM for a partitioned table, as
we've not resolved how partitions would inherit it. Disallowing this
lets us introduce such behaviour later, if we decide that's the way
forward, without a compatibility break.
Catversion bumped, to add the heap table AM and references to it.
Author: Haribabu Kommi, Andres Freund, Alvaro Herrera, Dimitri Golgov and others
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
https://postgr.es/m/20190107235616.6lur25ph22u5u5av@alap3.anarazel.de
https://postgr.es/m/20190304234700.w5tmhducs5wxgzls@alap3.anarazel.de
2019-03-06 18:54:38 +01:00
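The handler convention the commit message describes (pg_am.amhandler returns a pointer to a callback struct that must outlive the backend, typically a pointer to a constant) can be sketched in plain C. This is a toy stand-in, not the real TableAmRoutine: the struct contents and the `name` callback here are invented for illustration, and the real routine carries many more callbacks.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, pared-down stand-in for the real TableAmRoutine; only the
 * ownership pattern is the point here.
 */
typedef struct TableAmRoutine
{
	const char *(*name) (void);
} TableAmRoutine;

static const char *
heapam_name(void)
{
	return "heap";
}

/*
 * The callback struct must live as long as the backend, so the handler just
 * returns a pointer to a constant: no allocation, no lifetime management,
 * and every call yields the same pointer.
 */
static const TableAmRoutine heapam_routine = {
	.name = heapam_name,
};

const TableAmRoutine *
heap_tableam_handler(void)
{
	return &heapam_routine;
}
```

Because the struct is constant, callers can cache the returned pointer indefinitely, which is exactly what makes the "return a pointer to a constant" idiom sufficient.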
/*-------------------------------------------------------------------------
 *
 * heapam_handler.c
 *	  heap table access method code
 *
 * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 *
 * IDENTIFICATION
 *	  src/backend/access/heap/heapam_handler.c
 *
 *
 * NOTES
 *	  This file wires up the lower-level heapam.c et al. routines with the
 *	  tableam abstraction.
 *
 *-------------------------------------------------------------------------
 */
#include "postgres.h"
tableam: Add and use scan APIs.
To allow table accesses to not depend directly on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM-independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with table_ versions.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on;
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends needs
to be set up without the user doing per-AM work. To achieve
that, new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block-oriented callbacks that can be shared between such AMs are
provided and used by heap: table_block_parallelscan_{estimate,
initialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap, IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore, that seems cleaner.
To be able to sensibly adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AM's
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
would also have needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al., now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will do so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
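The "base class" layout point 1) describes - AM-independent fields collected in TableScanDescData, which each AM embeds as the first member of its own descriptor so pointers can be cast in both directions - can be sketched as follows. The field names and allocation logic here are invented, heavily simplified miniatures of the real structs.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical miniature of TableScanDescData: the AM-independent part of a
 * scan descriptor, shared by every table AM.
 */
typedef struct TableScanDescData
{
	int			rs_nkeys;		/* AM-independent field */
} TableScanDescData;
typedef TableScanDescData *TableScanDesc;

/*
 * Hypothetical miniature of HeapScanDescData: the heap AM "subclasses" the
 * base descriptor by embedding it as its FIRST member, so a pointer to the
 * whole struct and a pointer to the base are interchangeable via casts.
 */
typedef struct HeapScanDescData
{
	TableScanDescData rs_base;	/* must be first: enables up/down casts */
	int			rs_cblock;		/* AM-specific field */
} HeapScanDescData;

/* An AM's scan_begin allocates its own struct but hands back the base. */
static TableScanDesc
heap_scan_begin(int nkeys)
{
	HeapScanDescData *scan = malloc(sizeof(HeapScanDescData));

	scan->rs_base.rs_nkeys = nkeys;
	scan->rs_cblock = 0;
	return &scan->rs_base;
}
```

Generic table-scan code touches only the base fields, while heap-internal code casts the base pointer back to `HeapScanDescData *` to reach its private state.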
#include "access/heapam.h"
#include "access/tableam.h"
tableam: Add tuple_{insert, delete, update, lock} and use.
This adds new, required, table AM callbacks for insert/delete/update
and lock_tuple. To be able to reasonably use those, the EvalPlanQual
mechanism had to be adapted, moving more logic into the AM.
Previously both delete/update/lock call-sites and the EPQ mechanism had
to have awareness of the specific tuple format to be able to fetch the
latest version of a tuple. Obviously that needs to be abstracted
away. To do so, move the logic that finds the latest row version into
the AM. lock_tuple has a new flag argument,
TUPLE_LOCK_FLAG_FIND_LAST_VERSION, that forces it to lock the last
version, rather than the current one. It'd have been possible to do
so via a separate callback as well, but finding the last version
usually also necessitates locking the newest version, making it
sensible to combine the two. This replaces the previous use of
EvalPlanQualFetch(). Additionally HeapTupleUpdated, which previously
signaled either a concurrent update or delete, is now split into two,
to avoid callers needing AM-specific knowledge to differentiate.
The move of finding the latest row version into tuple_lock means that
encountering a row concurrently moved into another partition will now
raise an error about "tuple to be locked" rather than "tuple to be
updated/deleted" - which is accurate, as that always happens when
locking rows. While possibly slightly less helpful for users, it seems
like an acceptable trade-off.
As part of this commit HTSU_Result has been renamed to TM_Result, and
its members have been expanded to differentiate between updating and
deleting. HeapUpdateFailureData has been renamed to TM_FailureData.
The interface to speculative insertion is changed so nodeModifyTable.c
does not have to set the speculative token itself anymore. Instead
there's a version of tuple_insert, tuple_insert_speculative, that
performs the speculative insertion (without requiring a flag to signal
that fact), and the speculative insertion is either made permanent
with table_complete_speculative(succeeded = true) or aborted with
succeeded = false.
Note that multi_insert is not yet routed through tableam, nor is
COPY. Changing multi_insert requires changes to copy.c that are large
enough that they are better done separately.
Similarly, although simpler, CREATE TABLE AS and CREATE MATERIALIZED
VIEW are also only going to be adjusted in a later commit.
Author: Andres Freund and Haribabu Kommi
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20190313003903.nwvrxi7rw3ywhdel@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-24 03:55:57 +01:00
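The updated-vs-deleted split described above can be illustrated with a small sketch: instead of one status that callers must disambiguate using AM-specific knowledge, the result enum itself distinguishes the two cases. The enumerator names mirror the commit's TM_Result, but the member set is abridged and the `describe_failure` helper is invented for illustration.

```c
#include <assert.h>

/*
 * Abridged sketch of the TM_Result idea: the old single HeapTupleUpdated
 * status is split so callers can tell a concurrent update from a concurrent
 * delete without inspecting heap-specific tuple state.
 */
typedef enum TM_Result
{
	TM_Ok,						/* operation succeeded */
	TM_Updated,					/* concurrently updated: a newer version exists */
	TM_Deleted					/* concurrently deleted: no newer version */
} TM_Result;

/* A caller can now branch on the result alone, with no AM knowledge. */
static const char *
describe_failure(TM_Result res)
{
	switch (res)
	{
		case TM_Updated:
			return "tuple concurrently updated";
		case TM_Deleted:
			return "tuple concurrently deleted";
		default:
			return "no conflict";
	}
}
```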
#include "access/xact.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
#include "utils/builtins.h"

static const TableAmRoutine heapam_methods;

/* ------------------------------------------------------------------------
 * Slot related callbacks for heap AM
 * ------------------------------------------------------------------------
 */

static const TupleTableSlotOps *
heapam_slot_callbacks(Relation relation)
{
	return &TTSOpsBufferHeapTuple;
}

/* ------------------------------------------------------------------------
 * Index Scan Callbacks for heap AM
 * ------------------------------------------------------------------------
 */

static IndexFetchTableData *
heapam_index_fetch_begin(Relation rel)
{
	IndexFetchHeapData *hscan = palloc0(sizeof(IndexFetchHeapData));

	hscan->xs_base.rel = rel;
	hscan->xs_cbuf = InvalidBuffer;

	return &hscan->xs_base;
}

static void
heapam_index_fetch_reset(IndexFetchTableData *scan)
{
	IndexFetchHeapData *hscan = (IndexFetchHeapData *) scan;

	if (BufferIsValid(hscan->xs_cbuf))
	{
		ReleaseBuffer(hscan->xs_cbuf);
		hscan->xs_cbuf = InvalidBuffer;
	}
}

static void
heapam_index_fetch_end(IndexFetchTableData *scan)
{
	IndexFetchHeapData *hscan = (IndexFetchHeapData *) scan;

	heapam_index_fetch_reset(scan);

	pfree(hscan);
}

static bool
heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
						 ItemPointer tid,
						 Snapshot snapshot,
						 TupleTableSlot *slot,
						 bool *call_again, bool *all_dead)
{
	IndexFetchHeapData *hscan = (IndexFetchHeapData *) scan;
	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
	bool		got_heap_tuple;

	Assert(TTS_IS_BUFFERTUPLE(slot));

	/* We can skip the buffer-switching logic if we're in mid-HOT chain. */
	if (!*call_again)
	{
		/* Switch to correct buffer if we don't have it already */
		Buffer		prev_buf = hscan->xs_cbuf;

		hscan->xs_cbuf = ReleaseAndReadBuffer(hscan->xs_cbuf,
											  hscan->xs_base.rel,
											  ItemPointerGetBlockNumber(tid));

		/*
		 * Prune page, but only if we weren't already on this page
		 */
		if (prev_buf != hscan->xs_cbuf)
			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
	}

	/* Obtain share-lock on the buffer so we can examine visibility */
	LockBuffer(hscan->xs_cbuf, BUFFER_LOCK_SHARE);
	got_heap_tuple = heap_hot_search_buffer(tid,
											hscan->xs_base.rel,
											hscan->xs_cbuf,
											snapshot,
											&bslot->base.tupdata,
											all_dead,
											!*call_again);
	bslot->base.tupdata.t_self = *tid;
	LockBuffer(hscan->xs_cbuf, BUFFER_LOCK_UNLOCK);

	if (got_heap_tuple)
	{
		/*
		 * Only in a non-MVCC snapshot can more than one member of the HOT
		 * chain be visible.
		 */
		*call_again = !IsMVCCSnapshot(snapshot);

		slot->tts_tableOid = RelationGetRelid(scan->rel);
		ExecStoreBufferHeapTuple(&bslot->base.tupdata, slot, hscan->xs_cbuf);
	}
	else
	{
		/* We've reached the end of the HOT chain. */
		*call_again = false;
	}

	return got_heap_tuple;
}
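The `call_again` protocol used by heapam_index_fetch_tuple - the caller re-invokes the callback with the same TID while the AM reports that further HOT-chain members may be visible - can be modeled with a self-contained toy. The chain data, function names, and counting logic below are invented; only the call/re-call shape mirrors the real API.

```c
#include <assert.h>
#include <stdbool.h>

/* Invented stand-in for a HOT chain: visibility of each chain member. */
static const bool chain_visible[] = {true, true, false};

/*
 * Toy fetch callback: returns the next visible chain member, and sets
 * *call_again when the caller should re-invoke it for the same TID
 * (as can happen under a non-MVCC snapshot).
 */
static bool
toy_index_fetch_tuple(int *next, bool *call_again)
{
	if (*next < 3 && chain_visible[*next])
	{
		(*next)++;
		*call_again = (*next < 3);	/* more chain members may follow */
		return true;
	}
	*call_again = false;
	return false;
}

/* Caller side: first call, then keep calling while asked to. */
static int
count_visible_versions(void)
{
	int			next = 0;
	int			found = 0;
	bool		call_again = false;

	do
	{
		if (toy_index_fetch_tuple(&next, &call_again))
			found++;
	} while (call_again);
	return found;
}
```

Under an MVCC snapshot the real callback clears `*call_again` after the first visible member, so the loop runs exactly once; the multi-iteration case above corresponds to non-MVCC snapshots.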

/* ------------------------------------------------------------------------
 * Callbacks for non-modifying operations on individual tuples for heap AM
 * ------------------------------------------------------------------------
 */

static bool
heapam_fetch_row_version(Relation relation,
						 ItemPointer tid,
						 Snapshot snapshot,
						 TupleTableSlot *slot)
{
	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
	Buffer		buffer;

	Assert(TTS_IS_BUFFERTUPLE(slot));

	bslot->base.tupdata.t_self = *tid;
	if (heap_fetch(relation, snapshot, &bslot->base.tupdata, &buffer))
	{
		/* store in slot, transferring existing pin */
		ExecStorePinnedBufferHeapTuple(&bslot->base.tupdata, slot, buffer);
		slot->tts_tableOid = RelationGetRelid(relation);

		return true;
	}

	return false;
}
|
|
|
|
|
tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for a other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
intiialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensible adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AMs
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have been needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will do so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
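The "base class" subclassing described in 1) above works by embedding the AM-independent struct as the first member of the AM-specific one, so pointers can be cast in both directions. A minimal toy sketch of that pattern, outside PostgreSQL (all `Toy*` names are hypothetical, not PostgreSQL APIs):

```c
#include <assert.h>
#include <stdlib.h>

/* Toy analogue of TableScanDesc: the AM-independent fields. */
typedef struct ToyTableScanDesc
{
	int			rs_nkeys;		/* number of scan keys */
} ToyTableScanDesc;

/*
 * Toy analogue of HeapScanDesc: embeds the base struct as its first
 * member, so a pointer to it can be cast to ToyTableScanDesc * and back.
 */
typedef struct ToyHeapScanDesc
{
	ToyTableScanDesc rs_base;	/* AM-independent part, must be first */
	int			rs_cblock;		/* AM-specific: current block number */
} ToyHeapScanDesc;

/*
 * Toy scan_begin callback: allocates the derived struct but hands back
 * the base pointer, as an AM's scan_begin callback would.
 */
static ToyTableScanDesc *
toy_heap_scan_begin(int nkeys)
{
	ToyHeapScanDesc *scan = malloc(sizeof(ToyHeapScanDesc));

	scan->rs_base.rs_nkeys = nkeys;
	scan->rs_cblock = 0;
	return &scan->rs_base;
}

/* Toy scan_end callback: frees the derived struct via the base pointer. */
static void
toy_heap_scan_end(ToyTableScanDesc *sscan)
{
	free((ToyHeapScanDesc *) sscan);
}
```

Callers only see `ToyTableScanDesc *`; the AM downcasts internally, which is exactly how TableScanDesc/HeapScanDesc relate.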
static bool
heapam_tuple_satisfies_snapshot(Relation rel, TupleTableSlot *slot,
								Snapshot snapshot)
{
	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
	bool		res;

	Assert(TTS_IS_BUFFERTUPLE(slot));
	Assert(BufferIsValid(bslot->buffer));

	/*
	 * We need buffer pin and lock to call HeapTupleSatisfiesVisibility.
	 * Caller should be holding pin, but not lock.
	 */
	LockBuffer(bslot->buffer, BUFFER_LOCK_SHARE);
	res = HeapTupleSatisfiesVisibility(bslot->base.tuple, snapshot,
									   bslot->buffer);
	LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK);

	return res;
}

tableam: Add tuple_{insert, delete, update, lock} and use.
This adds new, required, table AM callbacks for insert/delete/update
and lock_tuple. To be able to reasonably use those, the EvalPlanQual
mechanism had to be adapted, moving more logic into the AM.
Previously both delete/update/lock call-sites and the EPQ mechanism had
to have awareness of the specific tuple format to be able to fetch the
latest version of a tuple. Obviously that needs to be abstracted
away. To do so, move the logic that finds the latest row version into
the AM. lock_tuple has a new flag argument,
TUPLE_LOCK_FLAG_FIND_LAST_VERSION, that forces it to lock the last
version, rather than the current one. It'd have been possible to do
so via a separate callback as well, but finding the last version
usually also necessitates locking the newest version, making it
sensible to combine the two. This replaces the previous use of
EvalPlanQualFetch(). Additionally HeapTupleUpdated, which previously
signaled either a concurrent update or delete, is now split into two,
to avoid callers needing AM specific knowledge to differentiate.
The move of finding the latest row version into tuple_lock means that
encountering a row concurrently moved into another partition will now
raise an error about "tuple to be locked" rather than "tuple to be
updated/deleted" - which is accurate, as that always happens when
locking rows. While possibly slightly less helpful for users, it seems
like an acceptable trade-off.
As part of this commit HTSU_Result has been renamed to TM_Result, and
its members have been expanded to differentiate between updating and
deleting. HeapUpdateFailureData has been renamed to TM_FailureData.
The interface to speculative insertion is changed so nodeModifyTable.c
does not have to set the speculative token itself anymore. Instead
there's a version of tuple_insert, tuple_insert_speculative, that
performs the speculative insertion (without requiring a flag to signal
that fact), and the speculative insertion is either made permanent
with table_complete_speculative(succeeded = true) or aborted with
succeeded = false.
Note that multi_insert is not yet routed through tableam, nor is
COPY. Changing multi_insert requires changes to copy.c that are large
enough to better be done separately.
Similarly, although simpler, CREATE TABLE AS and CREATE MATERIALIZED
VIEW are also only going to be adjusted in a later commit.
Author: Andres Freund and Haribabu Kommi
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20190313003903.nwvrxi7rw3ywhdel@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-24 03:55:57 +01:00
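The speculative-insertion protocol the message describes — insert with a token, then either confirm or abort via table_complete_speculative() — can be sketched as a toy state machine (all `Toy*` names are hypothetical, not PostgreSQL APIs):

```c
#include <assert.h>

/*
 * Toy tuple states: a speculative insertion only becomes a live,
 * permanent tuple once completed with succeeded = true.
 */
typedef enum { TOY_EMPTY, TOY_SPECULATIVE, TOY_LIVE } ToyState;

typedef struct ToyTuple
{
	ToyState	state;
	unsigned	spec_token;		/* analogue of the speculative token */
} ToyTuple;

/* Analogue of tuple_insert_speculative: tuple exists, but only tentatively. */
static void
toy_insert_speculative(ToyTuple *tup, unsigned token)
{
	tup->state = TOY_SPECULATIVE;
	tup->spec_token = token;
}

/*
 * Analogue of table_complete_speculative: make the insertion permanent
 * on success, or remove it again on a conflict.
 */
static void
toy_complete_speculative(ToyTuple *tup, int succeeded)
{
	tup->state = succeeded ? TOY_LIVE : TOY_EMPTY;
	tup->spec_token = 0;
}
```

The point of the interface change is that the caller (nodeModifyTable.c) no longer manages the token itself; the AM callback pair owns the whole tentative-then-confirm cycle.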
/* ----------------------------------------------------------------------------
 *  Functions for manipulations of physical tuples for heap AM.
 * ----------------------------------------------------------------------------
 */

static void
heapam_tuple_insert(Relation relation, TupleTableSlot *slot, CommandId cid,
					int options, BulkInsertState bistate)
{
	bool		shouldFree = true;
	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);

	/* Update the tuple with table oid */
	slot->tts_tableOid = RelationGetRelid(relation);
	tuple->t_tableOid = slot->tts_tableOid;

	/* Perform the insertion, and copy the resulting ItemPointer */
	heap_insert(relation, tuple, cid, options, bistate);
	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);

	if (shouldFree)
		pfree(tuple);
}

static void
heapam_tuple_insert_speculative(Relation relation, TupleTableSlot *slot,
								CommandId cid, int options,
								BulkInsertState bistate, uint32 specToken)
{
	bool		shouldFree = true;
	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);

	/* Update the tuple with table oid */
	slot->tts_tableOid = RelationGetRelid(relation);
	tuple->t_tableOid = slot->tts_tableOid;

	HeapTupleHeaderSetSpeculativeToken(tuple->t_data, specToken);
	options |= HEAP_INSERT_SPECULATIVE;

	/* Perform the insertion, and copy the resulting ItemPointer */
	heap_insert(relation, tuple, cid, options, bistate);
	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);

	if (shouldFree)
		pfree(tuple);
}

static void
heapam_tuple_complete_speculative(Relation relation, TupleTableSlot *slot,
								  uint32 spekToken, bool succeeded)
{
	bool		shouldFree = true;
	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);

	/* adjust the tuple's state accordingly */
	if (succeeded)
		heap_finish_speculative(relation, &slot->tts_tid);
	else
		heap_abort_speculative(relation, &slot->tts_tid);

	if (shouldFree)
		pfree(tuple);
}

static TM_Result
heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
					Snapshot snapshot, Snapshot crosscheck, bool wait,
					TM_FailureData *tmfd, bool changingPart)
{
	/*
	 * Currently, deletion of index tuples is handled during VACUUM. If the
	 * storage cleans up dead tuples by itself, this would also be the place
	 * to remove the corresponding index entries.
	 */
	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
}

static TM_Result
heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
					CommandId cid, Snapshot snapshot, Snapshot crosscheck,
					bool wait, TM_FailureData *tmfd,
					LockTupleMode *lockmode, bool *update_indexes)
{
	bool		shouldFree = true;
	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
	TM_Result	result;

	/* Update the tuple with table oid */
	slot->tts_tableOid = RelationGetRelid(relation);
	tuple->t_tableOid = slot->tts_tableOid;

	result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
						 tmfd, lockmode);
	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);

	/*
	 * Decide whether new index entries are needed for the tuple
	 *
	 * Note: heap_update returns the tid (location) of the new tuple in the
	 * t_self field.
	 *
	 * If it's a HOT update, we mustn't insert new index entries.
	 */
	*update_indexes = result == TM_Ok && !HeapTupleIsHeapOnly(tuple);

	if (shouldFree)
		pfree(tuple);

	return result;
}

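The `*update_indexes` decision above is a small but important piece of the HOT optimization: a heap-only tuple stays reachable through the old index entries, so no new ones are made. A toy sketch of just that predicate (the `Toy*` names are hypothetical, not PostgreSQL APIs):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy analogue of TM_Result, reduced to the cases needed here. */
typedef enum { TOY_TM_OK, TOY_TM_UPDATED, TOY_TM_DELETED } ToyTMResult;

/*
 * New index entries are needed only when the update succeeded and the
 * new tuple is not heap-only (HOT): a HOT tuple is found by following
 * the update chain from the old index entries, so indexing it again
 * would be wrong.
 */
static bool
toy_update_needs_index_entries(ToyTMResult result, bool heap_only)
{
	return result == TOY_TM_OK && !heap_only;
}
```

A failed update (concurrent delete/update) never inserts index entries; a successful non-HOT update always does.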
static TM_Result
heapam_tuple_lock(Relation relation, ItemPointer tid, Snapshot snapshot,
				  TupleTableSlot *slot, CommandId cid, LockTupleMode mode,
				  LockWaitPolicy wait_policy, uint8 flags,
				  TM_FailureData *tmfd)
{
	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
	TM_Result	result;
	Buffer		buffer;
	HeapTuple	tuple = &bslot->base.tupdata;
	bool		follow_updates;

	follow_updates = (flags & TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS) != 0;
	tmfd->traversed = false;

	Assert(TTS_IS_BUFFERTUPLE(slot));

tuple_lock_retry:
	tuple->t_self = *tid;
	result = heap_lock_tuple(relation, tuple, cid, mode, wait_policy,
							 follow_updates, &buffer, tmfd);

	if (result == TM_Updated &&
		(flags & TUPLE_LOCK_FLAG_FIND_LAST_VERSION))
	{
		ReleaseBuffer(buffer);
		/* Should not encounter speculative tuple on recheck */
		Assert(!HeapTupleHeaderIsSpeculative(tuple->t_data));

		if (!ItemPointerEquals(&tmfd->ctid, &tuple->t_self))
		{
			SnapshotData SnapshotDirty;
			TransactionId priorXmax;

			/* it was updated, so look at the updated version */
			*tid = tmfd->ctid;
			/* updated row should have xmin matching this xmax */
			priorXmax = tmfd->xmax;

			/* signal that a tuple later in the chain is getting locked */
			tmfd->traversed = true;

			/*
			 * fetch target tuple
			 *
			 * Loop here to deal with updated or busy tuples
			 */
			InitDirtySnapshot(SnapshotDirty);
			for (;;)
			{
				if (ItemPointerIndicatesMovedPartitions(tid))
					ereport(ERROR,
							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
							 errmsg("tuple to be locked was already moved to another partition due to concurrent update")));

				tuple->t_self = *tid;
				if (heap_fetch(relation, &SnapshotDirty, tuple, &buffer))
				{
					/*
					 * If xmin isn't what we're expecting, the slot must have
					 * been recycled and reused for an unrelated tuple. This
					 * implies that the latest version of the row was deleted,
					 * so we need do nothing. (Should be safe to examine xmin
					 * without getting buffer's content lock. We assume
					 * reading a TransactionId to be atomic, and Xmin never
					 * changes in an existing tuple, except to invalid or
					 * frozen, and neither of those can match priorXmax.)
					 */
					if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple->t_data),
											 priorXmax))
					{
						ReleaseBuffer(buffer);
						return TM_Deleted;
					}

					/* otherwise xmin should not be dirty... */
					if (TransactionIdIsValid(SnapshotDirty.xmin))
						elog(ERROR, "t_xmin is uncommitted in tuple to be updated");

					/*
					 * If tuple is being updated by other transaction then we
					 * have to wait for its commit/abort, or die trying.
					 */
					if (TransactionIdIsValid(SnapshotDirty.xmax))
					{
						ReleaseBuffer(buffer);
						switch (wait_policy)
						{
							case LockWaitBlock:
								XactLockTableWait(SnapshotDirty.xmax,
												  relation, &tuple->t_self,
												  XLTW_FetchUpdated);
								break;
							case LockWaitSkip:
								if (!ConditionalXactLockTableWait(SnapshotDirty.xmax))
									/* skip instead of waiting */
									return TM_WouldBlock;
								break;
							case LockWaitError:
								if (!ConditionalXactLockTableWait(SnapshotDirty.xmax))
									ereport(ERROR,
											(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
											 errmsg("could not obtain lock on row in relation \"%s\"",
													RelationGetRelationName(relation))));
								break;
						}
						continue;	/* loop back to repeat heap_fetch */
					}

					/*
					 * If tuple was inserted by our own transaction, we have
					 * to check cmin against cid: cmin >= current CID means
					 * our command cannot see the tuple, so we should ignore
					 * it. Otherwise heap_lock_tuple() will throw an error,
					 * and so would any later attempt to update or delete the
					 * tuple. (We need not check cmax because
					 * HeapTupleSatisfiesDirty will consider a tuple deleted
					 * by our transaction dead, regardless of cmax.) We just
					 * checked that priorXmax == xmin, so we can test that
					 * variable instead of doing HeapTupleHeaderGetXmin again.
					 */
					if (TransactionIdIsCurrentTransactionId(priorXmax) &&
						HeapTupleHeaderGetCmin(tuple->t_data) >= cid)
					{
						ReleaseBuffer(buffer);
						return TM_Invisible;
					}

					/*
					 * This is a live tuple, so try to lock it again.
					 */
					ReleaseBuffer(buffer);
					goto tuple_lock_retry;
				}

				/*
				 * If the referenced slot was actually empty, the latest
				 * version of the row must have been deleted, so we need do
				 * nothing.
				 */
				if (tuple->t_data == NULL)
				{
					return TM_Deleted;
				}

				/*
				 * As above, if xmin isn't what we're expecting, do nothing.
				 */
				if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple->t_data),
										 priorXmax))
				{
					if (BufferIsValid(buffer))
						ReleaseBuffer(buffer);
					return TM_Deleted;
				}

				/*
				 * If we get here, the tuple was found but failed
				 * SnapshotDirty. Assuming the xmin is either a committed xact
				 * or our own xact (as it certainly should be if we're trying
				 * to modify the tuple), this must mean that the row was
				 * updated or deleted by either a committed xact or our own
				 * xact. If it was deleted, we can ignore it; if it was
				 * updated then chain up to the next version and repeat the
				 * whole process.
				 *
				 * As above, it should be safe to examine xmax and t_ctid
				 * without the buffer content lock, because they can't be
				 * changing.
				 */
				if (ItemPointerEquals(&tuple->t_self, &tuple->t_data->t_ctid))
				{
					/* deleted, so forget about it */
					if (BufferIsValid(buffer))
						ReleaseBuffer(buffer);
					return TM_Deleted;
				}

				/* updated, so look at the updated row */
				*tid = tuple->t_data->t_ctid;
				/* updated row should have xmin matching this xmax */
				priorXmax = HeapTupleHeaderGetUpdateXid(tuple->t_data);
				if (BufferIsValid(buffer))
					ReleaseBuffer(buffer);
				/* loop back to fetch next in chain */
			}
		}
		else
		{
			/* tuple was deleted, so give up */
			return TM_Deleted;
		}
	}

	slot->tts_tableOid = RelationGetRelid(relation);
	tuple->t_tableOid = slot->tts_tableOid;

	/* store in slot, transferring existing pin */
	ExecStorePinnedBufferHeapTuple(tuple, slot, buffer);

	return result;
}

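The chain-following loop in heapam_tuple_lock() walks t_ctid links while checking that each successor's xmin matches the prior version's xmax, which is how it detects that a line pointer was recycled for an unrelated row. A toy sketch of that walk over an in-memory version array (all `Toy*` names are hypothetical, not PostgreSQL APIs):

```c
#include <assert.h>

#define TOY_INVALID_XID 0

/*
 * Toy row version: xmin/xmax plus a "ctid"-like link to the successor
 * version. A self-link means this is the latest version, mirroring
 * heap's convention that t_ctid points to itself when not updated.
 */
typedef struct ToyVersion
{
	unsigned	xmin;
	unsigned	xmax;			/* TOY_INVALID_XID if never updated */
	int			next;			/* index of successor; == own index if none */
} ToyVersion;

/*
 * Walk the update chain from 'start', verifying at each hop that the
 * successor's xmin matches the prior version's xmax (as the real code
 * compares priorXmax against HeapTupleHeaderGetXmin). Returns the index
 * of the latest version, or -1 if the chain is broken, meaning the row
 * was deleted and the slot reused for an unrelated tuple.
 */
static int
toy_find_latest_version(ToyVersion *v, int start)
{
	int			cur = start;
	unsigned	priorXmax = TOY_INVALID_XID;

	for (;;)
	{
		if (priorXmax != TOY_INVALID_XID && v[cur].xmin != priorXmax)
			return -1;			/* chain broken: slot recycled */
		if (v[cur].next == cur)
			return cur;			/* latest version found */
		priorXmax = v[cur].xmax;
		cur = v[cur].next;
	}
}
```

The real code additionally has to cope with in-progress updaters (waiting per wait_policy) and buffer pins, but the xmin/xmax handshake above is the core invariant that makes chain traversal safe without holding locks across hops.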
/* ------------------------------------------------------------------------
 * Definition of the heap table access method.
 * ------------------------------------------------------------------------
 */

static const TableAmRoutine heapam_methods = {
	.type = T_TableAmRoutine,

	.slot_callbacks = heapam_slot_callbacks,

	.scan_begin = heap_beginscan,
	.scan_end = heap_endscan,
	.scan_rescan = heap_rescan,
	.scan_getnextslot = heap_getnextslot,

	.parallelscan_estimate = table_block_parallelscan_estimate,
	.parallelscan_initialize = table_block_parallelscan_initialize,
	.parallelscan_reinitialize = table_block_parallelscan_reinitialize,

	.index_fetch_begin = heapam_index_fetch_begin,
	.index_fetch_reset = heapam_index_fetch_reset,
	.index_fetch_end = heapam_index_fetch_end,
	.index_fetch_tuple = heapam_index_fetch_tuple,

tableam: Add tuple_{insert, delete, update, lock} and use.
This adds new, required, table AM callbacks for insert/delete/update
and lock_tuple. To be able to reasonably use those, the EvalPlanQual
mechanism had to be adapted, moving more logic into the AM.
Previously both delete/update/lock call-sites and the EPQ mechanism had
to have awareness of the specific tuple format to be able to fetch the
latest version of a tuple. Obviously that needs to be abstracted
away. To do so, move the logic that find the latest row version into
the AM. lock_tuple has a new flag argument,
TUPLE_LOCK_FLAG_FIND_LAST_VERSION, that forces it to lock the last
version, rather than the current one. It'd have been possible to do
so via a separate callback as well, but finding the last version
usually also necessitates locking the newest version, making it
sensible to combine the two. This replaces the previous use of
EvalPlanQualFetch(). Additionally HeapTupleUpdated, which previously
signaled either a concurrent update or delete, is now split into two,
to avoid callers needing AM specific knowledge to differentiate.
The move of finding the latest row version into tuple_lock means that
encountering a row concurrently moved into another partition will now
raise an error about "tuple to be locked" rather than "tuple to be
updated/deleted" - which is accurate, as that always happens when
locking rows. While possible slightly less helpful for users, it seems
like an acceptable trade-off.
As part of this commit HTSU_Result has been renamed to TM_Result, and
its members been expanded to differentiated between updating and
deleting. HeapUpdateFailureData has been renamed to TM_FailureData.
The interface to speculative insertion is changed so nodeModifyTable.c
does not have to set the speculative token itself anymore. Instead
there's a version of tuple_insert, tuple_insert_speculative, that
performs the speculative insertion (without requiring a flag to signal
that fact), and the speculative insertion is either made permanent
with table_complete_speculative(succeeded = true) or aborted with
succeeded = false.
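The resulting control flow can be modeled with a toy sketch. The
function names echo the tuple_insert_speculative /
tuple_complete_speculative callbacks, but the struct and bookkeeping
here are invented for illustration; the real callbacks operate on
relations and slots:

```c
#include <assert.h>
#include <stdbool.h>

/* Invented stand-in for a table, just to show the two-phase protocol. */
typedef struct ToyTable
{
	int		nrows;			/* permanently inserted rows */
	bool	has_pending;	/* a speculative insertion in flight? */
} ToyTable;

static void
toy_insert_speculative(ToyTable *tab)
{
	/* the insertion exists, but only as "pending" until completed */
	tab->has_pending = true;
}

static void
toy_complete_speculative(ToyTable *tab, bool succeeded)
{
	if (succeeded)
		tab->nrows++;		/* make the speculative insertion permanent */
	/* on failure the pending tuple is killed, never counted */
	tab->has_pending = false;
}
```

The point of the interface change is exactly this shape: the executor
calls the speculative variant up front instead of passing a flag, and
then reports the outcome once, with succeeded = true or false.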
Note that multi_insert is not yet routed through tableam, nor is
COPY. Changing multi_insert requires changes to copy.c that are large
enough to better be done separately.
Similarly, although simpler, CREATE TABLE AS and CREATE MATERIALIZED
VIEW are also only going to be adjusted in a later commit.
Author: Andres Freund and Haribabu Kommi
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20190313003903.nwvrxi7rw3ywhdel@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-24 03:55:57 +01:00
	.tuple_insert = heapam_tuple_insert,
	.tuple_insert_speculative = heapam_tuple_insert_speculative,
	.tuple_complete_speculative = heapam_tuple_complete_speculative,
	.tuple_delete = heapam_tuple_delete,
	.tuple_update = heapam_tuple_update,
	.tuple_lock = heapam_tuple_lock,

	.tuple_fetch_row_version = heapam_fetch_row_version,

	.tuple_get_latest_tid = heap_get_latest_tid,

tableam: Add and use scan APIs.
To allow table accesses to not be directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on;
it's still widely used to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
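The "base class" arrangement for scan descriptors can be sketched in
plain C via struct embedding. The names below are illustrative
stand-ins for TableScanDesc / HeapScanDesc and the scan_getnextslot
callback, not the real definitions:

```c
#include <assert.h>
#include <stdbool.h>

/* AM-independent part, analogous in role to TableScanDesc. */
typedef struct ToyScanDescData
{
	bool		(*scan_getnextslot) (struct ToyScanDescData *scan, int *value);
} ToyScanDescData;

/* AM-specific subclass: the base must come first so casts are valid. */
typedef struct ToyArrayScanDescData
{
	ToyScanDescData base;
	const int  *data;
	int			pos;
	int			len;
} ToyArrayScanDescData;

static bool
toy_array_getnextslot(ToyScanDescData *scan, int *value)
{
	ToyArrayScanDescData *ascan = (ToyArrayScanDescData *) scan;

	if (ascan->pos >= ascan->len)
		return false;			/* end of scan */
	*value = ascan->data[ascan->pos++];
	return true;
}

/* Callers loop purely through the AM-independent interface. */
static int
toy_scan_sum(ToyScanDescData *scan)
{
	int			sum = 0;
	int			val;

	while (scan->scan_getnextslot(scan, &val))
		sum += val;
	return sum;
}
```

This is the same downcast-from-a-common-prefix idiom the commit
describes for both TableScanDesc and ParallelTableScanDesc.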
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap: table_block_parallelscan_{estimate,
initialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
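The chain-following requirement can be illustrated with a toy update
chain. The representation below (an array with explicit next links) is
invented for illustration; heap does this via line pointers and ctid
chains:

```c
#include <assert.h>
#include <stdbool.h>

/* Invented stand-in for one tuple version in an update chain. */
typedef struct ToyVersion
{
	bool		live;		/* is this version the currently-alive one? */
	int			next;		/* index of the next version, -1 if none */
	int			value;
} ToyVersion;

/*
 * Sketch of a chain-aware fetch: the index points at the chain's root,
 * and the fetch must walk forward to the live version rather than
 * blindly returning the root.  Returns false if the chain is dead.
 */
static bool
toy_index_fetch_tuple(const ToyVersion *versions, int root, int *value)
{
	for (int i = root; i != -1; i = versions[i].next)
	{
		if (versions[i].live)
		{
			*value = versions[i].value;
			return true;
		}
	}
	return false;
}
```

A naive implementation that returned the root version would hand back a
dead tuple for any HOT-updated row, which is why the commit calls out
that index_fetch_tuple implementations "need to be smarter than just
blindly fetching".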
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensibly adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AM's
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have been needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
	.tuple_satisfies_snapshot = heapam_tuple_satisfies_snapshot,

Compute XID horizon for page level index vacuum on primary.
Previously the xid horizon was only computed during WAL replay. That
had two major problems:
1) It relied on knowing what the table pointed to by the index looks
like. That was
easy enough before the introduction of tableam (we knew it had to be
heap, although some trickery around logging the heap relfilenodes
was required). But to properly handle table AMs we need
per-database catalog access to look up the AM handler, which
recovery doesn't allow.
2) Not knowing the xid horizon also makes it hard to support logical
decoding on standbys. When on a catalog table, we need to be able
to conflict with slots that have an xid horizon that's too old. But
computing the horizon by visiting the heap only works once
consistency is reached, whereas we always need to be able to detect
conflicts.
There's also a secondary problem, in that the current method performs
redundant work on every standby. But that's counterbalanced by
potentially computing the value when not necessary (either because
there's no standby, or because there's no connected backends).
Solve 1) and 2) by moving computation of the xid horizon to the
primary and by involving tableam in the computation of the horizon.
To address the potentially increased overhead, increase the efficiency
of the xid horizon computation for heap by sorting the tids, and
eliminating redundant buffer accesses. When prefetching is available,
additionally perform prefetching of buffers. As this is more of a
maintenance task, rather than something routinely done in every read
only query, we add an arbitrary 10 to the effective concurrency -
thereby using IO concurrency even when it is not globally enabled. That's
possibly not the perfect formula, but seems good enough for now.
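The buffer-access part of that optimization can be sketched
self-containedly: sorting the to-be-checked TIDs by block number means
each heap page is read once rather than once per TID. ToyTid below is
an invented stand-in for ItemPointerData:

```c
#include <assert.h>
#include <stdlib.h>

/* Invented stand-in for a TID: (page, line pointer) pair. */
typedef struct ToyTid
{
	unsigned	block;		/* heap page number */
	unsigned	offset;		/* line pointer within the page */
} ToyTid;

static int
toy_tid_cmp(const void *a, const void *b)
{
	const ToyTid *ta = (const ToyTid *) a;
	const ToyTid *tb = (const ToyTid *) b;

	if (ta->block != tb->block)
		return ta->block < tb->block ? -1 : 1;
	if (ta->offset != tb->offset)
		return ta->offset < tb->offset ? -1 : 1;
	return 0;
}

/*
 * After sorting, consecutive TIDs on the same page share one buffer
 * access.  Returns the number of page reads needed, instead of ntids
 * reads for the unsorted input.
 */
static int
toy_count_page_reads(ToyTid *tids, int ntids)
{
	int			nreads = 0;

	qsort(tids, ntids, sizeof(ToyTid), toy_tid_cmp);
	for (int i = 0; i < ntids; i++)
	{
		if (i == 0 || tids[i].block != tids[i - 1].block)
			nreads++;		/* page boundary: one new buffer access */
	}
	return nreads;
}
```

The same sorted ordering is also what makes sequential prefetching of
the upcoming blocks worthwhile.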
Bumps WAL format, as latestRemovedXid is now part of the records, and
the heap's relfilenode isn't anymore.
Author: Andres Freund, Amit Khandekar, Robert Haas
Reviewed-By: Robert Haas
Discussion:
https://postgr.es/m/20181212204154.nsxf3gzqv3gesl32@alap3.anarazel.de
https://postgr.es/m/20181214014235.dal5ogljs3bmlq44@alap3.anarazel.de
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-26 22:41:46 +01:00
	.compute_xid_horizon_for_tuples = heap_compute_xid_horizon_for_tuples,

};


const TableAmRoutine *
GetHeapamTableAmRoutine(void)
{
	return &heapam_methods;
}


Datum
heap_tableam_handler(PG_FUNCTION_ARGS)
{
	PG_RETURN_POINTER(&heapam_methods);
}