1998-01-25 06:04:21 +01:00
|
|
|
/*-------------------------------------------------------------------------
|
1996-08-27 23:50:29 +02:00
|
|
|
*
|
1999-02-14 00:22:53 +01:00
|
|
|
* heapam.h
|
1997-09-07 07:04:48 +02:00
|
|
|
* POSTGRES heap access method definitions.
|
1996-08-27 23:50:29 +02:00
|
|
|
*
|
|
|
|
*
|
2019-01-02 18:44:25 +01:00
|
|
|
* Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
|
2000-01-26 06:58:53 +01:00
|
|
|
* Portions Copyright (c) 1994, Regents of the University of California
|
1996-08-27 23:50:29 +02:00
|
|
|
*
|
2010-09-20 22:08:53 +02:00
|
|
|
* src/include/access/heapam.h
|
1996-08-27 23:50:29 +02:00
|
|
|
*
|
|
|
|
*-------------------------------------------------------------------------
|
|
|
|
*/
|
1997-09-07 07:04:48 +02:00
|
|
|
#ifndef HEAPAM_H
|
1996-08-27 23:50:29 +02:00
|
|
|
#define HEAPAM_H
|
|
|
|
|
Introduce access/{table.h, relation.h}, for generic functions from heapam.h.
access/heapam contains functions that are very storage specific (say
heap_insert() and a lot of lower level functions), and fairly generic
infrastructure like relation_open(), heap_open() etc. In the upcoming
pluggable storage work we're introducing a layer between table
accesses in general and heapam, to allow for different storage
methods. For a bit cleaner separation it thus seems advantageous to
move generic functions like the aforementioned to their own headers.
access/relation.h will contain relation_open() etc, and access/table.h
will contain table_open() (formerly known as heap_open()). I've decided
for table.h not to include relation.h, but we might change that at a
later stage.
relation.h already exists in another directory, but the other
plausible name (rel.h) also conflicts. It'd be nice if there were a
non-conflicting name, but nobody came up with a suggestion. It's
possible that the appropriate way to address the naming conflict would
be to rename nodes/relation.h, which isn't particularly well named.
To avoid breaking a lot of extensions that just use heap_open() etc,
table.h has macros mapping the old names to the new ones, and heapam.h
includes relation, table.h. That also allows to keep the
bulk renaming of existing callers in a separate commit.
Author: Andres Freund
Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de
2019-01-21 19:14:09 +01:00
|
|
|
#include "access/relation.h" /* for backward compatibility */
|
2002-05-21 01:51:44 +02:00
|
|
|
#include "access/sdir.h"
|
2008-06-19 02:46:06 +02:00
|
|
|
#include "access/skey.h"
|
Introduce access/{table.h, relation.h}, for generic functions from heapam.h.
access/heapam contains functions that are very storage specific (say
heap_insert() and a lot of lower level functions), and fairly generic
infrastructure like relation_open(), heap_open() etc. In the upcoming
pluggable storage work we're introducing a layer between table
accesses in general and heapam, to allow for different storage
methods. For a bit cleaner separation it thus seems advantageous to
move generic functions like the aforementioned to their own headers.
access/relation.h will contain relation_open() etc, and access/table.h
will contain table_open() (formerly known as heap_open()). I've decided
for table.h not to include relation.h, but we might change that at a
later stage.
relation.h already exists in another directory, but the other
plausible name (rel.h) also conflicts. It'd be nice if there were a
non-conflicting name, but nobody came up with a suggestion. It's
possible that the appropriate way to address the naming conflict would
be to rename nodes/relation.h, which isn't particularly well named.
To avoid breaking a lot of extensions that just use heap_open() etc,
table.h has macros mapping the old names to the new ones, and heapam.h
includes relation, table.h. That also allows to keep the
bulk renaming of existing callers in a separate commit.
Author: Andres Freund
Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de
2019-01-21 19:14:09 +01:00
|
|
|
#include "access/table.h" /* for backward compatibility */
|
2015-03-15 20:19:04 +01:00
|
|
|
#include "nodes/lockoptions.h"
|
2011-09-04 07:13:16 +02:00
|
|
|
#include "nodes/primnodes.h"
|
2012-08-30 22:15:44 +02:00
|
|
|
#include "storage/bufpage.h"
|
2016-09-13 15:21:35 +02:00
|
|
|
#include "storage/lockdefs.h"
|
2008-06-19 02:46:06 +02:00
|
|
|
#include "utils/relcache.h"
|
2008-03-26 22:10:39 +01:00
|
|
|
#include "utils/snapshot.h"
|
1996-08-27 23:50:29 +02:00
|
|
|
|
|
|
|
|
2008-11-06 21:51:15 +01:00
|
|
|
/* "options" flag bits for heap_insert */
|
|
|
|
#define HEAP_INSERT_SKIP_WAL 0x0001
|
|
|
|
#define HEAP_INSERT_SKIP_FSM 0x0002
|
2012-12-02 21:52:52 +01:00
|
|
|
#define HEAP_INSERT_FROZEN 0x0004
|
Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE.
The newly added ON CONFLICT clause allows to specify an alternative to
raising a unique or exclusion constraint violation error when inserting.
ON CONFLICT refers to constraints that can either be specified using a
inference clause (by specifying the columns of a unique constraint) or
by naming a unique or exclusion constraint. DO NOTHING avoids the
constraint violation, without touching the pre-existing row. DO UPDATE
SET ... [WHERE ...] updates the pre-existing tuple, and has access to
both the tuple proposed for insertion and the existing tuple; the
optional WHERE clause can be used to prevent an update from being
executed. The UPDATE SET and WHERE clauses have access to the tuple
proposed for insertion using the "magic" EXCLUDED alias, and to the
pre-existing tuple using the table name or its alias.
This feature is often referred to as upsert.
This is implemented using a new infrastructure called "speculative
insertion". It is an optimistic variant of regular insertion that first
does a pre-check for existing tuples and then attempts an insert. If a
violating tuple was inserted concurrently, the speculatively inserted
tuple is deleted and a new attempt is made. If the pre-check finds a
matching tuple the alternative DO NOTHING or DO UPDATE action is taken.
If the insertion succeeds without detecting a conflict, the tuple is
deemed inserted.
To handle the possible ambiguity between the excluded alias and a table
named excluded, and for convenience with long relation names, INSERT
INTO now can alias its target table.
Bumps catversion as stored rules change.
Author: Peter Geoghegan, with significant contributions from Heikki
Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs,
Dean Rasheed, Stephen Frost and many others.
2015-05-08 05:31:36 +02:00
|
|
|
#define HEAP_INSERT_SPECULATIVE 0x0008
|
Fix logical decoding error when system table w/ toast is repeatedly rewritten.
Repeatedly rewriting a mapped catalog table with VACUUM FULL or
CLUSTER could cause logical decoding to fail with:
ERROR, "could not map filenode \"%s\" to relation OID"
To trigger the problem the rewritten catalog had to have live tuples
with toasted columns.
The problem was triggered as during catalog table rewrites the
heap_insert() check that prevents logical decoding information to be
emitted for system catalogs, failed to treat the new heap's toast table
as a system catalog (because the new heap is not recognized as a
catalog table via RelationIsLogicallyLogged()). The relmapper, in
contrast to the normal catalog contents, does not contain historical
information. After a single rewrite of a mapped table the new relation
is known to the relmapper, but if the table is rewritten twice before
logical decoding occurs, the relfilenode cannot be mapped to a
relation anymore. Which then leads us to error out. This only
happens for toast tables, because the main table contents aren't
re-inserted with heap_insert().
The fix is simple, add a new heap_insert() flag that prevents logical
decoding information from being emitted, and accept during decoding
that there might not be tuple data for toast tables.
Unfortunately that does not fix pre-existing logical decoding
errors. Doing so would require not throwing an error when a filenode
cannot be mapped to a relation during decoding, and that seems too
likely to hide bugs. If it's crucial to fix decoding for an existing
slot, temporarily changing the ERROR in ReorderBufferCommit() to a
WARNING appears to be the best fix.
Author: Andres Freund
Discussion: https://postgr.es/m/20180914021046.oi7dm4ra3ot2g2kt@alap3.anarazel.de
Backpatch: 9.4-, where logical decoding was introduced
2018-10-10 22:53:02 +02:00
|
|
|
#define HEAP_INSERT_NO_LOGICAL 0x0010
|
2008-11-06 21:51:15 +01:00
|
|
|
|
|
|
|
typedef struct BulkInsertStateData *BulkInsertState;
|
|
|
|
|
Improve concurrency of foreign key locking
This patch introduces two additional lock modes for tuples: "SELECT FOR
KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each
other, in contrast with already existing "SELECT FOR SHARE" and "SELECT
FOR UPDATE". UPDATE commands that do not modify the values stored in
the columns that are part of the key of the tuple now grab a SELECT FOR
NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently
with tuple locks of the FOR KEY SHARE variety.
Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this
means the concurrency improvement applies to them, which is the whole
point of this patch.
The added tuple lock semantics require some rejiggering of the multixact
module, so that the locking level that each transaction is holding can
be stored alongside its Xid. Also, multixacts now need to persist
across server restarts and crashes, because they can now represent not
only tuple locks, but also tuple updates. This means we need more
careful tracking of lifetime of pg_multixact SLRU files; since they now
persist longer, we require more infrastructure to figure out when they
can be removed. pg_upgrade also needs to be careful to copy
pg_multixact files over from the old server to the new, or at least part
of multixact.c state, depending on the versions of the old and new
servers.
Tuple time qualification rules (HeapTupleSatisfies routines) need to be
careful not to consider tuples with the "is multi" infomask bit set as
being only locked; they might need to look up MultiXact values (i.e.
possibly do pg_multixact I/O) to find out the Xid that updated a tuple,
whereas they previously were assured to only use information readily
available from the tuple header. This is considered acceptable, because
the extra I/O would involve cases that would previously cause some
commands to block waiting for concurrent transactions to finish.
Another important change is the fact that locking tuples that have
previously been updated causes the future versions to be marked as
locked, too; this is essential for correctness of foreign key checks.
This causes additional WAL-logging, also (there was previously a single
WAL record for a locked tuple; now there are as many as updated copies
of the tuple there exist.)
With all this in place, contention related to tuples being checked by
foreign key rules should be much reduced.
As a bonus, the old behavior that a subtransaction grabbing a stronger
tuple lock than the parent (sub)transaction held on a given tuple and
later aborting caused the weaker lock to be lost, has been fixed.
Many new spec files were added for isolation tester framework, to ensure
overall behavior is sane. There's probably room for several more tests.
There were several reviewers of this patch; in particular, Noah Misch
and Andres Freund spent considerable time in it. Original idea for the
patch came from Simon Riggs, after a problem report by Joel Jacobson.
Most code is from me, with contributions from Marti Raudsepp, Alexander
Shulgin, Noah Misch and Andres Freund.
This patch was discussed in several pgsql-hackers threads; the most
important start at the following message-ids:
AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
1290721684-sup-3951@alvh.no-ip.org
1294953201-sup-2099@alvh.no-ip.org
1320343602-sup-2290@alvh.no-ip.org
1339690386-sup-8927@alvh.no-ip.org
4FE5FF020200002500048A3D@gw.wicourts.gov
4FEAB90A0200002500048B7D@gw.wicourts.gov
2013-01-23 16:04:59 +01:00
|
|
|
#define MaxLockTupleMode LockTupleExclusive
|
|
|
|
|
2012-10-26 21:55:36 +02:00
|
|
|
/*
|
|
|
|
* When heap_update, heap_delete, or heap_lock_tuple fail because the target
|
|
|
|
* tuple is already outdated, they fill in this struct to provide information
|
|
|
|
* to the caller about what happened.
|
|
|
|
* ctid is the target's ctid link: it is the same as the target's TID if the
|
|
|
|
* target was deleted, or the location of the replacement tuple if the target
|
|
|
|
* was updated.
|
|
|
|
* xmax is the outdating transaction's XID. If the caller wants to visit the
|
|
|
|
* replacement tuple, it must check that this matches before believing the
|
|
|
|
* replacement is really a match.
|
|
|
|
* cmax is the outdating command's CID, but only when the failure code is
|
|
|
|
* HeapTupleSelfUpdated (i.e., something in the current transaction outdated
|
2014-05-06 18:12:18 +02:00
|
|
|
* the tuple); otherwise cmax is zero. (We make this restriction because
|
2012-10-26 21:55:36 +02:00
|
|
|
* HeapTupleHeaderGetCmax doesn't work for tuples outdated in other
|
|
|
|
* transactions.)
|
|
|
|
*/
|
|
|
|
typedef struct HeapUpdateFailureData
|
|
|
|
{
|
2013-05-29 22:58:43 +02:00
|
|
|
ItemPointerData ctid;
|
|
|
|
TransactionId xmax;
|
|
|
|
CommandId cmax;
|
2012-10-26 21:55:36 +02:00
|
|
|
} HeapUpdateFailureData;
|
|
|
|
|
2019-01-22 02:03:15 +01:00
|
|
|
/* Result codes for HeapTupleSatisfiesVacuum */
|
|
|
|
typedef enum
|
|
|
|
{
|
|
|
|
HEAPTUPLE_DEAD, /* tuple is dead and deletable */
|
|
|
|
HEAPTUPLE_LIVE, /* tuple is live (committed, no deleter) */
|
|
|
|
HEAPTUPLE_RECENTLY_DEAD, /* tuple is dead, but not deletable yet */
|
|
|
|
HEAPTUPLE_INSERT_IN_PROGRESS, /* inserting xact is still in progress */
|
|
|
|
HEAPTUPLE_DELETE_IN_PROGRESS /* deleting xact is still in progress */
|
|
|
|
} HTSV_Result;
|
2007-06-08 20:23:53 +02:00
|
|
|
|
1996-08-27 23:50:29 +02:00
|
|
|
/* ----------------
|
1997-09-07 07:04:48 +02:00
|
|
|
* function prototypes for heap access method
|
2000-06-19 00:44:35 +02:00
|
|
|
*
|
|
|
|
* heap_create, heap_create_with_catalog, and heap_drop_with_catalog
|
|
|
|
* are declared in catalog/heap.h
|
1996-08-27 23:50:29 +02:00
|
|
|
* ----------------
|
|
|
|
*/
|
|
|
|
|
2001-11-02 17:30:29 +01:00
|
|
|
|
2015-10-16 23:25:02 +02:00
|
|
|
/* struct definitions appear in relscan.h */
|
2008-06-19 02:46:06 +02:00
|
|
|
typedef struct HeapScanDescData *HeapScanDesc;
|
2015-10-16 23:25:02 +02:00
|
|
|
typedef struct ParallelHeapScanDescData *ParallelHeapScanDesc;
|
2008-06-19 02:46:06 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* HeapScanIsValid
|
|
|
|
* True iff the heap scan is valid.
|
|
|
|
*/
|
|
|
|
#define HeapScanIsValid(scan) PointerIsValid(scan)
|
|
|
|
|
2002-05-21 01:51:44 +02:00
|
|
|
extern HeapScanDesc heap_beginscan(Relation relation, Snapshot snapshot,
|
2002-09-04 22:31:48 +02:00
|
|
|
int nkeys, ScanKey key);
|
Use an MVCC snapshot, rather than SnapshotNow, for catalog scans.
SnapshotNow scans have the undesirable property that, in the face of
concurrent updates, the scan can fail to see either the old or the new
versions of the row. In many cases, we work around this by requiring
DDL operations to hold AccessExclusiveLock on the object being
modified; in some cases, the existing locking is inadequate and random
failures occur as a result. This commit doesn't change anything
related to locking, but will hopefully pave the way to allowing lock
strength reductions in the future.
The major issue has held us back from making this change in the past
is that taking an MVCC snapshot is significantly more expensive than
using a static special snapshot such as SnapshotNow. However, testing
of various worst-case scenarios reveals that this problem is not
severe except under fairly extreme workloads. To mitigate those
problems, we avoid retaking the MVCC snapshot for each new scan;
instead, we take a new snapshot only when invalidation messages have
been processed. The catcache machinery already requires that
invalidation messages be sent before releasing the related heavyweight
lock; else other backends might rely on locally-cached data rather
than scanning the catalog at all. Thus, making snapshot reuse
dependent on the same guarantees shouldn't break anything that wasn't
already subtly broken.
Patch by me. Review by Michael Paquier and Andres Freund.
2013-07-02 15:47:01 +02:00
|
|
|
extern HeapScanDesc heap_beginscan_catalog(Relation relation, int nkeys,
|
2014-05-06 18:12:18 +02:00
|
|
|
ScanKey key);
|
2008-01-14 02:39:09 +01:00
|
|
|
extern HeapScanDesc heap_beginscan_strat(Relation relation, Snapshot snapshot,
|
|
|
|
int nkeys, ScanKey key,
|
|
|
|
bool allow_strat, bool allow_sync);
|
2007-06-09 20:49:55 +02:00
|
|
|
extern HeapScanDesc heap_beginscan_bm(Relation relation, Snapshot snapshot,
|
2007-11-15 22:14:46 +01:00
|
|
|
int nkeys, ScanKey key);
|
2015-05-15 20:37:10 +02:00
|
|
|
extern HeapScanDesc heap_beginscan_sampling(Relation relation,
|
|
|
|
Snapshot snapshot, int nkeys, ScanKey key,
|
Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 21:35:54 +02:00
|
|
|
bool allow_strat, bool allow_sync, bool allow_pagemode);
|
BRIN: Block Range Indexes
BRIN is a new index access method intended to accelerate scans of very
large tables, without the maintenance overhead of btrees or other
traditional indexes. They work by maintaining "summary" data about
block ranges. Bitmap index scans work by reading each summary tuple and
comparing them with the query quals; all pages in the range are returned
in a lossy TID bitmap if the quals are consistent with the values in the
summary tuple, otherwise not. Normal index scans are not supported
because these indexes do not store TIDs.
As new tuples are added into the index, the summary information is
updated (if the block range in which the tuple is added is already
summarized) or not; in the latter case, a subsequent pass of VACUUM or
the brin_summarize_new_values() function will create the summary
information.
For data types with natural 1-D sort orders, the summary info consists
of the maximum and the minimum values of each indexed column within each
page range. This type of operator class we call "Minmax", and we
supply a bunch of them for most data types with B-tree opclasses.
Since the BRIN code is generalized, other approaches are possible for
things such as arrays, geometric types, ranges, etc; even for things
such as enum types we could do something different than minmax with
better results. In this commit I only include minmax.
Catalog version bumped due to new builtin catalog entries.
There's more that could be done here, but this is a good step forwards.
Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera,
with contribution by Heikki Linnakangas.
Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas.
Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo.
PS:
The research leading to these results has received funding from the
European Union's Seventh Framework Programme (FP7/2007-2013) under
grant agreement n° 318633.
2014-11-07 20:38:14 +01:00
|
|
|
extern void heap_setscanlimits(HeapScanDesc scan, BlockNumber startBlk,
|
2015-05-24 03:35:49 +02:00
|
|
|
BlockNumber endBlk);
|
2015-05-15 20:37:10 +02:00
|
|
|
extern void heapgetpage(HeapScanDesc scan, BlockNumber page);
|
2002-05-21 01:51:44 +02:00
|
|
|
extern void heap_rescan(HeapScanDesc scan, ScanKey key);
|
Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 20:39:00 +02:00
|
|
|
extern void heap_rescan_set_params(HeapScanDesc scan, ScanKey key,
|
Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 21:35:54 +02:00
|
|
|
bool allow_strat, bool allow_sync, bool allow_pagemode);
|
1998-08-19 04:04:17 +02:00
|
|
|
extern void heap_endscan(HeapScanDesc scan);
|
2002-05-21 01:51:44 +02:00
|
|
|
extern HeapTuple heap_getnext(HeapScanDesc scan, ScanDirection direction);
|
|
|
|
|
2015-10-16 23:25:02 +02:00
|
|
|
extern Size heap_parallelscan_estimate(Snapshot snapshot);
|
|
|
|
extern void heap_parallelscan_initialize(ParallelHeapScanDesc target,
|
|
|
|
Relation relation, Snapshot snapshot);
|
Separate reinitialization of shared parallel-scan state from ExecReScan.
Previously, the parallel executor logic did reinitialization of shared
state within the ExecReScan code for parallel-aware scan nodes. This is
problematic, because it means that the ExecReScan call has to occur
synchronously (ie, during the parent Gather node's ReScan call). That is
swimming very much against the tide so far as the ExecReScan machinery is
concerned; the fact that it works at all today depends on a lot of fragile
assumptions, such as that no plan node between Gather and a parallel-aware
scan node is parameterized. Another objection is that because ExecReScan
might be called in workers as well as the leader, hacky extra tests are
needed in some places to prevent unwanted shared-state resets.
Hence, let's separate this code into two functions, a ReInitializeDSM
call and the ReScan call proper. ReInitializeDSM is called only in
the leader and is guaranteed to run before we start new workers.
ReScan is returned to its traditional function of resetting only local
state, which means that ExecReScan's usual habits of delaying or
eliminating child rescan calls are safe again.
As with the preceding commit 7df2c1f8d, it doesn't seem to be necessary
to make these changes in 9.6, which is a good thing because the FDW and
CustomScan APIs are impacted.
Discussion: https://postgr.es/m/CAA4eK1JkByysFJNh9M349u_nNjqETuEnY_y1VUc_kJiU0bxtaQ@mail.gmail.com
2017-08-30 19:18:16 +02:00
|
|
|
extern void heap_parallelscan_reinitialize(ParallelHeapScanDesc parallel_scan);
|
2015-10-16 23:25:02 +02:00
|
|
|
extern HeapScanDesc heap_beginscan_parallel(Relation, ParallelHeapScanDesc);
|
|
|
|
|
2002-05-24 20:57:57 +02:00
|
|
|
extern bool heap_fetch(Relation relation, Snapshot snapshot,
|
2002-09-04 22:31:48 +02:00
|
|
|
HeapTuple tuple, Buffer *userbuf, bool keep_buf,
|
2007-05-27 05:50:39 +02:00
|
|
|
Relation stats_relation);
|
Implement genuine serializable isolation level.
Until now, our Serializable mode has in fact been what's called Snapshot
Isolation, which allows some anomalies that could not occur in any
serialized ordering of the transactions. This patch fixes that using a
method called Serializable Snapshot Isolation, based on research papers by
Michael J. Cahill (see README-SSI for full references). In Serializable
Snapshot Isolation, transactions run like they do in Snapshot Isolation,
but a predicate lock manager observes the reads and writes performed and
aborts transactions if it detects that an anomaly might occur. This method
produces some false positives, ie. it sometimes aborts transactions even
though there is no anomaly.
To track reads we implement predicate locking, see storage/lmgr/predicate.c.
Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared
memory is finite, so when a transaction takes many tuple-level locks on a
page, the locks are promoted to a single page-level lock, and further to a
single relation level lock if necessary. To lock key values with no matching
tuple, a sequential scan always takes a relation-level lock, and an index
scan acquires a page-level lock that covers the search key, whether or not
there are any matching keys at the moment.
A predicate lock doesn't conflict with any regular locks or with another
predicate locks in the normal sense. They're only used by the predicate lock
manager to detect the danger of anomalies. Only serializable transactions
participate in predicate locking, so there should be no extra overhead for
for other transactions.
Predicate locks can't be released at commit, but must be remembered until
all the transactions that overlapped with it have completed. That means that
we need to remember an unbounded amount of predicate locks, so we apply a
lossy but conservative method of tracking locks for committed transactions.
If we run short of shared memory, we overflow to a new "pg_serial" SLRU
pool.
We don't currently allow Serializable transactions in Hot Standby mode.
That would be hard, because even read-only transactions can cause anomalies
that wouldn't otherwise occur.
Serializable isolation mode now means the new fully serializable level.
Repeatable Read gives you the old Snapshot Isolation level that we have
always had.
Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and
Anssi Kääriäinen
2011-02-07 22:46:51 +01:00
|
|
|
extern bool heap_hot_search_buffer(ItemPointer tid, Relation relation,
|
2011-06-27 16:27:17 +02:00
|
|
|
Buffer buffer, Snapshot snapshot, HeapTuple heapTuple,
|
|
|
|
bool *all_dead, bool first_call);
|
2007-09-20 19:56:33 +02:00
|
|
|
extern bool heap_hot_search(ItemPointer tid, Relation relation,
|
2007-11-15 22:14:46 +01:00
|
|
|
Snapshot snapshot, bool *all_dead);
|
2002-05-21 01:51:44 +02:00
|
|
|
|
2005-08-20 02:40:32 +02:00
|
|
|
extern void heap_get_latest_tid(Relation relation, Snapshot snapshot,
|
2002-09-04 22:31:48 +02:00
|
|
|
ItemPointer tid);
|
2001-09-17 02:29:10 +02:00
|
|
|
extern void setLastTid(const ItemPointer tid);
|
2002-05-22 00:05:55 +02:00
|
|
|
|
2008-11-06 21:51:15 +01:00
|
|
|
extern BulkInsertState GetBulkInsertState(void);
|
|
|
|
extern void FreeBulkInsertState(BulkInsertState);
|
2017-01-24 14:50:16 +01:00
|
|
|
extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
|
2008-11-06 21:51:15 +01:00
|
|
|
|
Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
|
|
|
extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
|
2008-11-06 21:51:15 +01:00
|
|
|
int options, BulkInsertState bistate);
|
2011-11-09 09:54:41 +01:00
|
|
|
extern void heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
|
|
|
|
CommandId cid, int options, BulkInsertState bistate);
|
2005-08-20 02:40:32 +02:00
|
|
|
extern HTSU_Result heap_delete(Relation relation, ItemPointer tid,
|
2012-10-26 21:55:36 +02:00
|
|
|
CommandId cid, Snapshot crosscheck, bool wait,
|
Raise error when affecting tuple moved into different partition.
When an update moves a row between partitions (supported since
2f178441044b), our normal logic for following update chains in READ
COMMITTED mode doesn't work anymore. Cross partition updates are
modeled as an delete from the old and insert into the new
partition. No ctid chain exists across partitions, and there's no
convenient space to introduce that link.
Not throwing an error in a partitioned context when one would have
been thrown without partitioning is obviously problematic. This commit
introduces infrastructure to detect when a tuple has been moved, not
just plainly deleted. That allows to throw an error when encountering
a deletion that's actually a move, while attempting to following a
ctid chain.
The row deleted as part of a cross partition update is marked by
pointing it's t_ctid to an invalid block, instead of self as a normal
update would. That was deemed to be the least invasive and most
future proof way to represent the knowledge, given how few infomask
bits are there to be recycled (there's also some locking issues with
using infomask bits).
External code following ctid chains should be updated to check for
moved tuples. The most likely consequence of not doing so is a missed
error.
Author: Amul Sul, editorialized by me
Reviewed-By: Amit Kapila, Pavan Deolasee, Andres Freund, Robert Haas
Discussion: http://postgr.es/m/CAAJ_b95PkwojoYfz0bzXU8OokcTVGzN6vYGCNVUukeUDrnF3dw@mail.gmail.com
2018-04-07 22:24:10 +02:00
|
|
|
HeapUpdateFailureData *hufd, bool changingPart);
|
Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE.
The newly added ON CONFLICT clause allows to specify an alternative to
raising a unique or exclusion constraint violation error when inserting.
ON CONFLICT refers to constraints that can either be specified using a
inference clause (by specifying the columns of a unique constraint) or
by naming a unique or exclusion constraint. DO NOTHING avoids the
constraint violation, without touching the pre-existing row. DO UPDATE
SET ... [WHERE ...] updates the pre-existing tuple, and has access to
both the tuple proposed for insertion and the existing tuple; the
optional WHERE clause can be used to prevent an update from being
executed. The UPDATE SET and WHERE clauses have access to the tuple
proposed for insertion using the "magic" EXCLUDED alias, and to the
pre-existing tuple using the table name or its alias.
This feature is often referred to as upsert.
This is implemented using a new infrastructure called "speculative
insertion". It is an optimistic variant of regular insertion that first
does a pre-check for existing tuples and then attempts an insert. If a
violating tuple was inserted concurrently, the speculatively inserted
tuple is deleted and a new attempt is made. If the pre-check finds a
matching tuple the alternative DO NOTHING or DO UPDATE action is taken.
If the insertion succeeds without detecting a conflict, the tuple is
deemed inserted.
To handle the possible ambiguity between the excluded alias and a table
named excluded, and for convenience with long relation names, INSERT
INTO now can alias its target table.
Bumps catversion as stored rules change.
Author: Peter Geoghegan, with significant contributions from Heikki
Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs,
Dean Rasheed, Stephen Frost and many others.
2015-05-08 05:31:36 +02:00
|
|
|
extern void heap_finish_speculative(Relation relation, HeapTuple tuple);
|
|
|
|
extern void heap_abort_speculative(Relation relation, HeapTuple tuple);
|
2005-08-20 02:40:32 +02:00
|
|
|
extern HTSU_Result heap_update(Relation relation, ItemPointer otid,
|
2005-10-15 04:49:52 +02:00
|
|
|
HeapTuple newtup,
|
2012-10-26 21:55:36 +02:00
|
|
|
CommandId cid, Snapshot crosscheck, bool wait,
|
2018-04-12 12:22:56 +02:00
|
|
|
HeapUpdateFailureData *hufd, LockTupleMode *lockmode);
|
2005-08-20 02:40:32 +02:00
|
|
|
extern HTSU_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
|
2014-10-07 22:23:34 +02:00
|
|
|
CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
|
Improve concurrency of foreign key locking
This patch introduces two additional lock modes for tuples: "SELECT FOR
KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each
other, in contrast with already existing "SELECT FOR SHARE" and "SELECT
FOR UPDATE". UPDATE commands that do not modify the values stored in
the columns that are part of the key of the tuple now grab a SELECT FOR
NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently
with tuple locks of the FOR KEY SHARE variety.
Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this
means the concurrency improvement applies to them, which is the whole
point of this patch.
The added tuple lock semantics require some rejiggering of the multixact
module, so that the locking level that each transaction is holding can
be stored alongside its Xid. Also, multixacts now need to persist
across server restarts and crashes, because they can now represent not
only tuple locks, but also tuple updates. This means we need more
careful tracking of lifetime of pg_multixact SLRU files; since they now
persist longer, we require more infrastructure to figure out when they
can be removed. pg_upgrade also needs to be careful to copy
pg_multixact files over from the old server to the new, or at least part
of multixact.c state, depending on the versions of the old and new
servers.
Tuple time qualification rules (HeapTupleSatisfies routines) need to be
careful not to consider tuples with the "is multi" infomask bit set as
being only locked; they might need to look up MultiXact values (i.e.
possibly do pg_multixact I/O) to find out the Xid that updated a tuple,
whereas they previously were assured to only use information readily
available from the tuple header. This is considered acceptable, because
the extra I/O would involve cases that would previously cause some
commands to block waiting for concurrent transactions to finish.
Another important change is the fact that locking tuples that have
previously been updated causes the future versions to be marked as
locked, too; this is essential for correctness of foreign key checks.
This causes additional WAL-logging, also (there was previously a single
WAL record for a locked tuple; now there are as many as updated copies
of the tuple there exist.)
With all this in place, contention related to tuples being checked by
foreign key rules should be much reduced.
As a bonus, the old behavior that a subtransaction grabbing a stronger
tuple lock than the parent (sub)transaction held on a given tuple and
later aborting caused the weaker lock to be lost, has been fixed.
Many new spec files were added for isolation tester framework, to ensure
overall behavior is sane. There's probably room for several more tests.
There were several reviewers of this patch; in particular, Noah Misch
and Andres Freund spent considerable time in it. Original idea for the
patch came from Simon Riggs, after a problem report by Joel Jacobson.
Most code is from me, with contributions from Marti Raudsepp, Alexander
Shulgin, Noah Misch and Andres Freund.
This patch was discussed in several pgsql-hackers threads; the most
important start at the following message-ids:
AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
1290721684-sup-3951@alvh.no-ip.org
1294953201-sup-2099@alvh.no-ip.org
1320343602-sup-2290@alvh.no-ip.org
1339690386-sup-8927@alvh.no-ip.org
4FE5FF020200002500048A3D@gw.wicourts.gov
4FEAB90A0200002500048B7D@gw.wicourts.gov
2013-01-23 16:04:59 +01:00
|
|
|
bool follow_update,
|
2012-10-26 21:55:36 +02:00
|
|
|
Buffer *buffer, HeapUpdateFailureData *hufd);
|
2006-05-11 01:18:39 +02:00
|
|
|
extern void heap_inplace_update(Relation relation, HeapTuple tuple);
|
2017-11-14 03:45:47 +01:00
|
|
|
extern bool heap_freeze_tuple(HeapTupleHeader tuple,
|
|
|
|
TransactionId relfrozenxid, TransactionId relminmxid,
|
|
|
|
TransactionId cutoff_xid, TransactionId cutoff_multi);
|
2011-11-08 03:39:40 +01:00
|
|
|
extern bool heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid,
|
Improve concurrency of foreign key locking
This patch introduces two additional lock modes for tuples: "SELECT FOR
KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each
other, in contrast with already existing "SELECT FOR SHARE" and "SELECT
FOR UPDATE". UPDATE commands that do not modify the values stored in
the columns that are part of the key of the tuple now grab a SELECT FOR
NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently
with tuple locks of the FOR KEY SHARE variety.
Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this
means the concurrency improvement applies to them, which is the whole
point of this patch.
The added tuple lock semantics require some rejiggering of the multixact
module, so that the locking level that each transaction is holding can
be stored alongside its Xid. Also, multixacts now need to persist
across server restarts and crashes, because they can now represent not
only tuple locks, but also tuple updates. This means we need more
careful tracking of lifetime of pg_multixact SLRU files; since they now
persist longer, we require more infrastructure to figure out when they
can be removed. pg_upgrade also needs to be careful to copy
pg_multixact files over from the old server to the new, or at least part
of multixact.c state, depending on the versions of the old and new
servers.
Tuple time qualification rules (HeapTupleSatisfies routines) need to be
careful not to consider tuples with the "is multi" infomask bit set as
being only locked; they might need to look up MultiXact values (i.e.
possibly do pg_multixact I/O) to find out the Xid that updated a tuple,
whereas they previously were assured to only use information readily
available from the tuple header. This is considered acceptable, because
the extra I/O would involve cases that would previously cause some
commands to block waiting for concurrent transactions to finish.
Another important change is the fact that locking tuples that have
previously been updated causes the future versions to be marked as
locked, too; this is essential for correctness of foreign key checks.
This causes additional WAL-logging, also (there was previously a single
WAL record for a locked tuple; now there are as many as updated copies
of the tuple there exist.)
With all this in place, contention related to tuples being checked by
foreign key rules should be much reduced.
As a bonus, the old behavior that a subtransaction grabbing a stronger
tuple lock than the parent (sub)transaction held on a given tuple and
later aborting caused the weaker lock to be lost, has been fixed.
Many new spec files were added for isolation tester framework, to ensure
overall behavior is sane. There's probably room for several more tests.
There were several reviewers of this patch; in particular, Noah Misch
and Andres Freund spent considerable time in it. Original idea for the
patch came from Simon Riggs, after a problem report by Joel Jacobson.
Most code is from me, with contributions from Marti Raudsepp, Alexander
Shulgin, Noah Misch and Andres Freund.
This patch was discussed in several pgsql-hackers threads; the most
important start at the following message-ids:
AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
1290721684-sup-3951@alvh.no-ip.org
1294953201-sup-2099@alvh.no-ip.org
1320343602-sup-2290@alvh.no-ip.org
1339690386-sup-8927@alvh.no-ip.org
4FE5FF020200002500048A3D@gw.wicourts.gov
4FEAB90A0200002500048B7D@gw.wicourts.gov
2013-01-23 16:04:59 +01:00
|
|
|
MultiXactId cutoff_multi, Buffer buf);
|
Change the format of the VM fork to add a second bit per page.
The new bit indicates whether every tuple on the page is already frozen.
It is cleared only when the all-visible bit is cleared, and it can be
set only when we vacuum a page and find that every tuple on that page is
both visible to every transaction and in no need of any future
vacuuming.
A future commit will use this new bit to optimize away full-table scans
that would otherwise be triggered by XID wraparound considerations. A
page which is merely all-visible must still be scanned in that case, but
a page which is all-frozen need not be. This commit does not attempt
that optimization, although that optimization is the goal here. It
seems better to get the basic infrastructure in place first.
Per discussion, it's very desirable for pg_upgrade to automatically
migrate existing VM forks from the old format to the new format. That,
too, will be handled in a follow-on patch.
Masahiko Sawada, reviewed by Kyotaro Horiguchi, Fujii Masao, Amit
Kapila, Simon Riggs, Andres Freund, and others, and substantially
revised by me.
2016-03-02 03:49:41 +01:00
|
|
|
extern bool heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple);
|
2002-05-22 00:05:55 +02:00
|
|
|
|
Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
|
|
|
extern void simple_heap_insert(Relation relation, HeapTuple tup);
|
2001-01-23 05:32:23 +01:00
|
|
|
extern void simple_heap_delete(Relation relation, ItemPointer tid);
|
|
|
|
extern void simple_heap_update(Relation relation, ItemPointer otid,
|
2001-03-22 05:01:46 +01:00
|
|
|
HeapTuple tup);
|
2002-05-22 00:05:55 +02:00
|
|
|
|
2007-09-20 19:56:33 +02:00
|
|
|
extern void heap_sync(Relation relation);
|
Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 18:05:43 +01:00
|
|
|
extern void heap_update_snapshot(HeapScanDesc scan, Snapshot snapshot);
|
2007-09-20 19:56:33 +02:00
|
|
|
|
|
|
|
/* in heap/pruneheap.c */
|
Introduce logical decoding.
This feature, building on previous commits, allows the write-ahead log
stream to be decoded into a series of logical changes; that is,
inserts, updates, and deletes and the transactions which contain them.
It is capable of handling decoding even across changes to the schema
of the effected tables. The output format is controlled by a
so-called "output plugin"; an example is included. To make use of
this in a real replication system, the output plugin will need to be
modified to produce output in the format appropriate to that system,
and to perform filtering.
Currently, information can be extracted from the logical decoding
system only via SQL; future commits will add the ability to stream
changes via walsender.
Andres Freund, with review and other contributions from many other
people, including Álvaro Herrera, Abhijit Menon-Sen, Peter Gheogegan,
Kevin Grittner, Robert Haas, Heikki Linnakangas, Fujii Masao, Abhijit
Menon-Sen, Michael Paquier, Simon Riggs, Craig Ringer, and Steve
Singer.
2014-03-03 22:32:18 +01:00
|
|
|
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
|
2007-11-15 22:14:46 +01:00
|
|
|
extern int heap_page_prune(Relation relation, Buffer buffer,
|
|
|
|
TransactionId OldestXmin,
|
2010-04-21 19:20:56 +02:00
|
|
|
bool report_stats, TransactionId *latestRemovedXid);
|
2008-06-12 11:12:31 +02:00
|
|
|
extern void heap_page_prune_execute(Buffer buffer,
|
2008-03-08 22:57:59 +01:00
|
|
|
OffsetNumber *redirected, int nredirected,
|
|
|
|
OffsetNumber *nowdead, int ndead,
|
2010-02-08 05:33:55 +01:00
|
|
|
OffsetNumber *nowunused, int nunused);
|
2007-09-20 19:56:33 +02:00
|
|
|
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
|
2007-01-25 03:17:26 +01:00
|
|
|
|
2007-06-08 20:23:53 +02:00
|
|
|
/* in heap/syncscan.c */
|
|
|
|
extern void ss_report_location(Relation rel, BlockNumber location);
|
|
|
|
extern BlockNumber ss_get_location(Relation rel, BlockNumber relnblocks);
|
|
|
|
extern void SyncScanShmemInit(void);
|
|
|
|
extern Size SyncScanShmemSize(void);
|
|
|
|
|
2019-01-15 21:06:19 +01:00
|
|
|
/* in heap/vacuumlazy.c */
|
|
|
|
struct VacuumParams;
|
|
|
|
extern void heap_vacuum_rel(Relation onerel, int options,
|
|
|
|
struct VacuumParams *params, BufferAccessStrategy bstrategy);
|
2019-01-22 02:03:15 +01:00
|
|
|
|
|
|
|
/* in heap/heapam_visibility.c */
|
|
|
|
extern bool HeapTupleSatisfiesVisibility(HeapTuple stup, Snapshot snapshot,
|
|
|
|
Buffer buffer);
|
|
|
|
extern HTSU_Result HeapTupleSatisfiesUpdate(HeapTuple stup, CommandId curcid,
|
|
|
|
Buffer buffer);
|
|
|
|
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple stup, TransactionId OldestXmin,
|
|
|
|
Buffer buffer);
|
|
|
|
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
|
|
|
|
uint16 infomask, TransactionId xid);
|
|
|
|
extern bool HeapTupleHeaderIsOnlyLocked(HeapTupleHeader tuple);
|
|
|
|
extern bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
|
|
|
|
extern bool HeapTupleIsSurelyDead(HeapTuple htup, TransactionId OldestXmin);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* To avoid leaking too much knowledge about reorderbuffer implementation
|
|
|
|
* details this is implemented in reorderbuffer.c not heapam_visibility.c
|
|
|
|
*/
|
|
|
|
struct HTAB;
|
|
|
|
extern bool ResolveCminCmaxDuringDecoding(struct HTAB *tuplecid_data,
|
|
|
|
Snapshot snapshot,
|
|
|
|
HeapTuple htup,
|
|
|
|
Buffer buffer,
|
|
|
|
CommandId *cmin, CommandId *cmax);
|
|
|
|
|
Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 21:18:54 +02:00
|
|
|
#endif /* HEAPAM_H */
|