/*-------------------------------------------------------------------------
 *
 * nbtree.c
 *	  Implementation of Lehman and Yao's btree management algorithm for
 *	  Postgres.
 *
 * NOTES
 *	  This file contains only the public interface routines.
 *
 *
 * Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * IDENTIFICATION
 *	  $PostgreSQL: pgsql/src/backend/access/nbtree/nbtree.c,v 1.134 2005/11/22 18:17:06 momjian Exp $
 *
 *-------------------------------------------------------------------------
 */
#include "postgres.h"

#include "access/genam.h"
#include "access/heapam.h"
#include "access/nbtree.h"
#include "catalog/index.h"
#include "commands/vacuum.h"
#include "miscadmin.h"
#include "storage/freespace.h"
#include "storage/smgr.h"
#include "utils/memutils.h"

Restructure index AM interface for index building and index tuple deletion,
per previous discussion on pghackers. Most of the duplicate code in
different AMs' ambuild routines has been moved out to a common routine
in index.c; this means that all index types now do the right things about
inserting recently-dead tuples, etc. (I also removed support for EXTEND
INDEX in the ambuild routines, since that's about to go away anyway, and
it cluttered the code a lot.) The retail indextuple deletion routines have
been replaced by a "bulk delete" routine in which the indexscan is inside
the access method. I haven't pushed this change as far as it should go yet,
but it should allow considerable simplification of the internal bookkeeping
for deletions. Also, add flag columns to pg_am to eliminate various
hardcoded tests on AM OIDs, and remove unused pg_am columns.
Fix rtree and gist index types to not attempt to store NULLs; before this,
gist usually crashed, while rtree managed not to crash but computed wacko
bounding boxes for NULL entries (which might have had something to do with
the performance problems we've heard about occasionally).
Add AtEOXact routines to hash, rtree, and gist, all of which have static
state that needs to be reset after an error. We discovered this need long
ago for btree, but missed the other guys.
Oh, one more thing: concurrent VACUUM is now the default.
2001-07-16 00:48:19 +02:00
/* Working state for btbuild and its callback */
typedef struct
{
	bool		usefast;
	bool		isUnique;
	bool		haveDead;
	Relation	heapRel;
	BTSpool    *spool;

	/*
	 * spool2 is needed only when the index is a unique index. Dead tuples
	 * are put into spool2 instead of spool in order to keep them out of
	 * the uniqueness check.
	 */
	BTSpool    *spool2;
	double		indtuples;
} BTBuildState;

bool		FastBuild = true;	/* use SORT instead of insertion build */

static void _bt_restscan(IndexScanDesc scan);
static void btbuildCallback(Relation index,
				HeapTuple htup,
				Datum *values,
				bool *isnull,
				bool tupleIsAlive,
				void *state);

/*
 *	btbuild() -- build a new btree index.
 */
Datum
btbuild(PG_FUNCTION_ARGS)
{
	Relation	heap = (Relation) PG_GETARG_POINTER(0);
	Relation	index = (Relation) PG_GETARG_POINTER(1);
	IndexInfo  *indexInfo = (IndexInfo *) PG_GETARG_POINTER(2);
	double		reltuples;
	BTBuildState buildstate;

What looks like some *major* improvements to btree indexing...
Patches from: aoki@CS.Berkeley.EDU (Paul M. Aoki)
i gave jolly my btree bulkload code a long, long time ago but never
gave him a bunch of my bugfixes. here's a diff against the 6.0
baseline.
for some reason, this code has slowed down somewhat relative to the
insertion-build code on very small tables. don't know why -- it used
to be within about 10%. anyway, here are some (highly unscientific!)
timings on a dec 3000/300 for synthetic tables with 10k, 100k and
1000k tuples (basically, 1mb, 10mb and 100mb heaps). 'c' means
clustered (pre-sorted) inputs and 'u' means unclustered (randomly
ordered) inputs. the 10k table basically fits in the buffer pool, but
the 100k and 1000k tables don't. as you can see, insertion build is
fine if you've sorted your heaps on your index key or if your heap
fits in core, but is absolutely horrible on unordered data (yes,
that's 7.5 hours to index 100mb of data...) because of the zillions of
random i/os.
if it doesn't work for you for whatever reason, you can always turn it
back off by flipping the FastBuild flag in nbtree.c. i don't have
time to maintain it.
good luck!
baseline code:
time psql -c 'create index c10 on k10 using btree (c int4_ops)' bttest
real 8.6
time psql -c 'create index u10 on k10 using btree (b int4_ops)' bttest
real 9.1
time psql -c 'create index c100 on k100 using btree (c int4_ops)' bttest
real 59.2
time psql -c 'create index u100 on k100 using btree (b int4_ops)' bttest
real 652.4
time psql -c 'create index c1000 on k1000 using btree (c int4_ops)' bttest
real 636.1
time psql -c 'create index u1000 on k1000 using btree (b int4_ops)' bttest
real 26772.9
bulkloading code:
time psql -c 'create index c10 on k10 using btree (c int4_ops)' bttest
real 11.3
time psql -c 'create index u10 on k10 using btree (b int4_ops)' bttest
real 10.4
time psql -c 'create index c100 on k100 using btree (c int4_ops)' bttest
real 59.5
time psql -c 'create index u100 on k100 using btree (b int4_ops)' bttest
real 63.5
time psql -c 'create index c1000 on k1000 using btree (c int4_ops)' bttest
real 636.9
time psql -c 'create index u1000 on k1000 using btree (b int4_ops)' bttest
real 701.0
1997-02-12 06:04:52 +01:00
	/*
	 * bootstrap processing does something strange, so don't use sort/build
	 * for initial catalog indices.  at some point i need to look harder at
	 * this.  (there is some kind of incremental processing going on there.)
	 * -- pma 08/29/95
	 */
	buildstate.usefast = (FastBuild && IsNormalProcessingMode());
	buildstate.isUnique = indexInfo->ii_Unique;
	buildstate.haveDead = false;
	buildstate.heapRel = heap;
	buildstate.spool = NULL;
	buildstate.spool2 = NULL;
	buildstate.indtuples = 0;

#ifdef BTREE_BUILD_STATS
	if (log_btree_build_stats)
		ResetUsage();
#endif   /* BTREE_BUILD_STATS */

	/*
	 * We expect to be called exactly once for any index relation. If that's
	 * not the case, big trouble's what we have.
	 */
	if (RelationGetNumberOfBlocks(index) != 0)
		elog(ERROR, "index \"%s\" already contains data",
			 RelationGetRelationName(index));

	if (buildstate.usefast)
	{
		buildstate.spool = _bt_spoolinit(index, indexInfo->ii_Unique, false);

		/*
		 * If building a unique index, put dead tuples in a second spool to
		 * keep them out of the uniqueness check.
		 */
		if (indexInfo->ii_Unique)
			buildstate.spool2 = _bt_spoolinit(index, false, true);
	}
	else
	{
		/* if using slow build, initialize the btree index metadata page */
		_bt_metapinit(index);
	}

	/* do the heap scan */
	reltuples = IndexBuildHeapScan(heap, index, indexInfo,
								   btbuildCallback, (void *) &buildstate);

	/* okay, all heap tuples are indexed */
	if (buildstate.spool2 && !buildstate.haveDead)
	{
		/* spool2 turns out to be unnecessary */
		_bt_spooldestroy(buildstate.spool2);
		buildstate.spool2 = NULL;
	}

	/*
	 * if we are doing bottom-up btree build, finish the build by (1)
	 * completing the sort of the spool file, (2) inserting the sorted tuples
	 * into btree pages and (3) building the upper levels.
	 */
if (buildstate.usefast)
|
1997-09-07 07:04:48 +02:00
|
|
|
{
|
2001-07-16 00:48:19 +02:00
|
|
|
_bt_leafbuild(buildstate.spool, buildstate.spool2);
|
|
|
|
_bt_spooldestroy(buildstate.spool);
|
|
|
|
if (buildstate.spool2)
|
|
|
|
_bt_spooldestroy(buildstate.spool2);
|
1996-07-09 08:22:35 +02:00
|
|
|
}
|
1997-09-07 07:04:48 +02:00
|
|
|
|
|
|
|
#ifdef BTREE_BUILD_STATS
|
2002-11-15 02:26:09 +01:00
|
|
|
if (log_btree_build_stats)
|
1997-09-07 07:04:48 +02:00
|
|
|
{
|
2001-11-11 00:51:14 +01:00
|
|
|
ShowUsage("BTREE BUILD STATS");
|
1997-09-07 07:04:48 +02:00
|
|
|
ResetUsage();
|
1996-07-09 08:22:35 +02:00
|
|
|
}
|
2001-11-05 18:46:40 +01:00
|
|
|
#endif /* BTREE_BUILD_STATS */
|
1996-07-09 08:22:35 +02:00
|
|
|
|
2005-05-11 08:24:55 +02:00
|
|
|
/* since we just counted the # of tuples, may as well update stats */
|
|
|
|
IndexCloseAndUpdateStats(heap, reltuples, index, buildstate.indtuples);
|
2001-07-16 00:48:19 +02:00
|
|
|
|
|
|
|
PG_RETURN_VOID();
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Per-tuple callback from IndexBuildHeapScan
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
btbuildCallback(Relation index,
|
|
|
|
HeapTuple htup,
|
2005-03-21 02:24:04 +01:00
|
|
|
Datum *values,
|
|
|
|
bool *isnull,
|
2001-07-16 00:48:19 +02:00
|
|
|
bool tupleIsAlive,
|
|
|
|
void *state)
|
|
|
|
{
|
2001-10-25 07:50:21 +02:00
|
|
|
BTBuildState *buildstate = (BTBuildState *) state;
|
2001-07-16 00:48:19 +02:00
|
|
|
IndexTuple itup;
|
|
|
|
BTItem btitem;
|
|
|
|
|
|
|
|
/* form an index tuple and point it at the heap tuple */
|
2005-03-21 02:24:04 +01:00
|
|
|
itup = index_form_tuple(RelationGetDescr(index), values, isnull);
|
2001-07-16 00:48:19 +02:00
|
|
|
itup->t_tid = htup->t_self;
|
|
|
|
|
|
|
|
btitem = _bt_formitem(itup);
|
|
|
|
|
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* If we are doing a bottom-up btree build, we add the index tuple to a spool
|
|
|
|
* file for subsequent processing. Otherwise, we insert the tuple into the btree.
|
2001-07-16 00:48:19 +02:00
|
|
|
*/
|
|
|
|
if (buildstate->usefast)
|
|
|
|
{
|
|
|
|
if (tupleIsAlive || buildstate->spool2 == NULL)
|
|
|
|
_bt_spool(btitem, buildstate->spool);
|
|
|
|
else
|
1997-09-07 07:04:48 +02:00
|
|
|
{
|
2001-07-16 00:48:19 +02:00
|
|
|
/* dead tuples are put into spool2 */
|
|
|
|
buildstate->haveDead = true;
|
|
|
|
_bt_spool(btitem, buildstate->spool2);
|
1997-09-07 07:04:48 +02:00
|
|
|
}
|
1996-07-09 08:22:35 +02:00
|
|
|
}
|
2001-07-16 00:48:19 +02:00
|
|
|
else
|
|
|
|
{
|
2005-03-21 02:24:04 +01:00
|
|
|
_bt_doinsert(index, btitem,
|
|
|
|
buildstate->isUnique, buildstate->heapRel);
|
2001-07-16 00:48:19 +02:00
|
|
|
}
|
1996-07-09 08:22:35 +02:00
|
|
|
|
2001-07-16 00:48:19 +02:00
|
|
|
buildstate->indtuples += 1;
|
2000-06-13 09:35:40 +02:00
|
|
|
|
2001-07-16 00:48:19 +02:00
|
|
|
pfree(btitem);
|
|
|
|
pfree(itup);
|
1996-07-09 08:22:35 +02:00
|
|
|
}
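The routing decision made by the callback above can be sketched in isolation. This is only an illustration under stated assumptions: `DemoSpool`, `DemoBuildState`, `demo_spool_add`, and `demo_build_callback` are hypothetical stand-ins invented here, not part of nbtree, and the spool is reduced to a counter so the example stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a spool: it only counts what it receives. */
typedef struct DemoSpool
{
	size_t		ntuples;
} DemoSpool;

/* Hypothetical stand-in for the build state passed to the callback. */
typedef struct DemoBuildState
{
	bool		haveDead;	/* set once any dead tuple has been seen */
	DemoSpool  *spool;		/* live tuples, or everything if spool2 is NULL */
	DemoSpool  *spool2;		/* dead tuples only; may be NULL */
	double		indtuples;	/* running count of index tuples */
} DemoBuildState;

void
demo_spool_add(DemoSpool *spool)
{
	assert(spool != NULL);
	spool->ntuples++;
}

/*
 * Mirror of the branch structure in the callback: live tuples always go to
 * the main spool; dead tuples go to the second spool only if it exists.
 */
void
demo_build_callback(DemoBuildState *state, bool tupleIsAlive)
{
	assert(state != NULL);

	if (tupleIsAlive || state->spool2 == NULL)
		demo_spool_add(state->spool);
	else
	{
		state->haveDead = true;
		demo_spool_add(state->spool2);
	}

	state->indtuples += 1;
}
```

Note that when the second spool is absent, dead tuples fall through to the main spool, and `haveDead` stays false because nothing was diverted.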
|
|
|
|
|
|
|
|
/*
|
1997-09-07 07:04:48 +02:00
|
|
|
* btinsert() -- insert an index tuple into a btree.
|
1996-07-09 08:22:35 +02:00
|
|
|
*
|
1997-09-07 07:04:48 +02:00
|
|
|
* Descend the tree recursively, find the appropriate location for our
|
2005-03-21 02:24:04 +01:00
|
|
|
* new tuple, and put it there.
|
1996-07-09 08:22:35 +02:00
|
|
|
*/
|
2000-06-13 09:35:40 +02:00
|
|
|
Datum
|
|
|
|
btinsert(PG_FUNCTION_ARGS)
|
1996-07-09 08:22:35 +02:00
|
|
|
{
|
2001-03-22 05:01:46 +01:00
|
|
|
Relation rel = (Relation) PG_GETARG_POINTER(0);
|
2005-03-21 02:24:04 +01:00
|
|
|
Datum *values = (Datum *) PG_GETARG_POINTER(1);
|
|
|
|
bool *isnull = (bool *) PG_GETARG_POINTER(2);
|
2001-03-22 05:01:46 +01:00
|
|
|
ItemPointer ht_ctid = (ItemPointer) PG_GETARG_POINTER(3);
|
|
|
|
Relation heapRel = (Relation) PG_GETARG_POINTER(4);
|
2002-05-24 20:57:57 +02:00
|
|
|
bool checkUnique = PG_GETARG_BOOL(5);
|
1997-09-08 04:41:22 +02:00
|
|
|
BTItem btitem;
|
|
|
|
IndexTuple itup;
|
1997-09-07 07:04:48 +02:00
|
|
|
|
|
|
|
/* generate an index tuple */
|
2005-03-21 02:24:04 +01:00
|
|
|
itup = index_form_tuple(RelationGetDescr(rel), values, isnull);
|
1997-09-07 07:04:48 +02:00
|
|
|
itup->t_tid = *ht_ctid;
|
|
|
|
btitem = _bt_formitem(itup);
|
|
|
|
|
2005-03-21 02:24:04 +01:00
|
|
|
_bt_doinsert(rel, btitem, checkUnique, heapRel);
|
1997-09-07 07:04:48 +02:00
|
|
|
|
|
|
|
pfree(btitem);
|
|
|
|
pfree(itup);
|
|
|
|
|
2005-03-21 02:24:04 +01:00
|
|
|
PG_RETURN_BOOL(true);
|
1996-07-09 08:22:35 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
1997-09-07 07:04:48 +02:00
|
|
|
* btgettuple() -- Get the next tuple in the scan.
|
1996-07-09 08:22:35 +02:00
|
|
|
*/
|
2000-06-13 09:35:40 +02:00
|
|
|
Datum
|
|
|
|
btgettuple(PG_FUNCTION_ARGS)
|
1996-07-09 08:22:35 +02:00
|
|
|
{
|
2001-03-22 05:01:46 +01:00
|
|
|
IndexScanDesc scan = (IndexScanDesc) PG_GETARG_POINTER(0);
|
|
|
|
ScanDirection dir = (ScanDirection) PG_GETARG_INT32(1);
|
2002-05-24 20:57:57 +02:00
|
|
|
BTScanOpaque so = (BTScanOpaque) scan->opaque;
|
|
|
|
Page page;
|
|
|
|
OffsetNumber offnum;
|
|
|
|
bool res;
|
1997-09-07 07:04:48 +02:00
|
|
|
|
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* If we've already initialized this scan, we can just advance it in the
|
|
|
|
* appropriate direction. If we haven't done so yet, we call a routine to
|
|
|
|
* get the first item in the scan.
|
1997-09-07 07:04:48 +02:00
|
|
|
*/
|
|
|
|
if (ItemPointerIsValid(&(scan->currentItemData)))
|
1998-07-30 07:05:05 +02:00
|
|
|
{
|
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* Restore scan position using heap TID returned by previous call to
|
|
|
|
* btgettuple(). _bt_restscan() re-grabs the read lock on the buffer,
|
|
|
|
* too.
|
1998-07-30 07:05:05 +02:00
|
|
|
*/
|
|
|
|
_bt_restscan(scan);
|
2002-09-04 22:31:48 +02:00
|
|
|
|
2002-05-24 20:57:57 +02:00
|
|
|
/*
|
|
|
|
* Check to see if we should kill the previously-fetched tuple.
|
|
|
|
*/
|
|
|
|
if (scan->kill_prior_tuple)
|
|
|
|
{
|
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* Yes, so mark it by setting the LP_DELETE bit in the item flags.
|
2002-05-24 20:57:57 +02:00
|
|
|
*/
|
|
|
|
offnum = ItemPointerGetOffsetNumber(&(scan->currentItemData));
|
|
|
|
page = BufferGetPage(so->btso_curbuf);
|
|
|
|
PageGetItemId(page, offnum)->lp_flags |= LP_DELETE;
|
2002-09-04 22:31:48 +02:00
|
|
|
|
2002-05-24 20:57:57 +02:00
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* Since this can be redone later if needed, it's treated the same
|
|
|
|
* as a commit-hint-bit status update for heap tuples: we mark the
|
|
|
|
* buffer dirty but don't make a WAL log entry.
|
2002-05-24 20:57:57 +02:00
|
|
|
*/
|
|
|
|
SetBufferCommitInfoNeedsSave(so->btso_curbuf);
|
|
|
|
}
|
2002-09-04 22:31:48 +02:00
|
|
|
|
2002-05-24 20:57:57 +02:00
|
|
|
/*
|
|
|
|
* Now continue the scan.
|
|
|
|
*/
|
1997-09-07 07:04:48 +02:00
|
|
|
res = _bt_next(scan, dir);
|
1998-07-30 07:05:05 +02:00
|
|
|
}
|
1997-09-07 07:04:48 +02:00
|
|
|
else
|
|
|
|
res = _bt_first(scan, dir);
|
1998-09-01 06:40:42 +02:00
|
|
|
|
2002-05-24 20:57:57 +02:00
|
|
|
/*
|
|
|
|
* Skip killed tuples if asked to.
|
|
|
|
*/
|
|
|
|
if (scan->ignore_killed_tuples)
|
|
|
|
{
|
|
|
|
while (res)
|
|
|
|
{
|
|
|
|
offnum = ItemPointerGetOffsetNumber(&(scan->currentItemData));
|
|
|
|
page = BufferGetPage(so->btso_curbuf);
|
|
|
|
if (!ItemIdDeleted(PageGetItemId(page, offnum)))
|
|
|
|
break;
|
|
|
|
res = _bt_next(scan, dir);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
1999-05-26 00:04:56 +02:00
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* Save heap TID to use it in _bt_restscan. Then release the read lock on
|
|
|
|
* the buffer so that we aren't blocking other backends.
|
2001-07-16 00:48:19 +02:00
|
|
|
*
|
2005-11-22 19:17:34 +01:00
|
|
|
* NOTE: we do keep the pin on the buffer! This is essential to ensure
|
|
|
|
* that someone else doesn't delete the index entry we are stopped on.
|
1999-05-25 20:20:31 +02:00
|
|
|
*/
|
1998-07-30 07:05:05 +02:00
|
|
|
if (res)
|
1999-05-25 20:20:31 +02:00
|
|
|
{
|
2002-05-21 01:51:44 +02:00
|
|
|
so->curHeapIptr = scan->xs_ctup.t_self;
|
2000-06-13 09:35:40 +02:00
|
|
|
LockBuffer(so->btso_curbuf,
|
|
|
|
BUFFER_LOCK_UNLOCK);
|
1999-05-25 20:20:31 +02:00
|
|
|
}
|
1998-09-01 06:40:42 +02:00
|
|
|
|
2002-05-21 01:51:44 +02:00
|
|
|
PG_RETURN_BOOL(res);
|
1996-07-09 08:22:35 +02:00
|
|
|
}
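The two hint-related steps in btgettuple(), marking the previously fetched item as killed and then skipping killed items when asked to, can be sketched on a toy page. Everything here is an assumption made for illustration: `DemoPage`, `DEMO_ITEM_KILLED`, `demo_kill_item`, and `demo_next_visible` are hypothetical names, the flag value is arbitrary, and no buffer locking or dirty-marking is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_ITEMS		8
#define DEMO_ITEM_KILLED	0x01	/* arbitrary bit, illustration only */

/* Hypothetical page: one flag word per item slot. */
typedef struct DemoPage
{
	unsigned	flags[DEMO_MAX_ITEMS];
	size_t		nitems;
} DemoPage;

/* Set the "killed" hint on one item; other flag bits are preserved. */
void
demo_kill_item(DemoPage *page, size_t off)
{
	assert(off < page->nitems);
	page->flags[off] |= DEMO_ITEM_KILLED;
}

/*
 * Return the first offset at or after 'start' that the scan should return.
 * With ignore_killed set, items carrying the hint are stepped over; the
 * result equals page->nitems when nothing visible remains.
 */
size_t
demo_next_visible(const DemoPage *page, size_t start, bool ignore_killed)
{
	size_t		off = start;

	while (off < page->nitems)
	{
		if (!ignore_killed || !(page->flags[off] & DEMO_ITEM_KILLED))
			break;
		off++;
	}
	return off;
}
```

Because the hint is only an optimization, a scan that does not set `ignore_killed` still sees every item, which matches the comment's point that the marking can be redone later if it is lost.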
|
|
|
|
|
2005-03-28 01:53:05 +02:00
|
|
|
/*
|
|
|
|
* btgetmulti() -- get multiple tuples at once
|
|
|
|
*
|
|
|
|
* This is a somewhat generic implementation: it avoids the _bt_restscan
|
|
|
|
* overhead, but makes no attempt to pick especially good stopping
|
|
|
|
* points such as index page boundaries.
|
|
|
|
*/
|
|
|
|
Datum
|
|
|
|
btgetmulti(PG_FUNCTION_ARGS)
|
|
|
|
{
|
|
|
|
IndexScanDesc scan = (IndexScanDesc) PG_GETARG_POINTER(0);
|
2005-10-15 04:49:52 +02:00
|
|
|
ItemPointer tids = (ItemPointer) PG_GETARG_POINTER(1);
|
2005-03-28 01:53:05 +02:00
|
|
|
int32 max_tids = PG_GETARG_INT32(2);
|
|
|
|
int32 *returned_tids = (int32 *) PG_GETARG_POINTER(3);
|
|
|
|
BTScanOpaque so = (BTScanOpaque) scan->opaque;
|
|
|
|
bool res = true;
|
|
|
|
int32 ntids = 0;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Restore prior state if we were already called at least once.
|
|
|
|
*/
|
|
|
|
if (ItemPointerIsValid(&(scan->currentItemData)))
|
|
|
|
_bt_restscan(scan);
|
|
|
|
|
|
|
|
while (ntids < max_tids)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Start scan, or advance to next tuple.
|
|
|
|
*/
|
|
|
|
if (ItemPointerIsValid(&(scan->currentItemData)))
|
|
|
|
res = _bt_next(scan, ForwardScanDirection);
|
|
|
|
else
|
|
|
|
res = _bt_first(scan, ForwardScanDirection);
|
2005-10-15 04:49:52 +02:00
|
|
|
|
2005-03-28 01:53:05 +02:00
|
|
|
/*
|
|
|
|
* Skip killed tuples if asked to.
|
|
|
|
*/
|
|
|
|
if (scan->ignore_killed_tuples)
|
|
|
|
{
|
|
|
|
while (res)
|
|
|
|
{
|
|
|
|
Page page;
|
|
|
|
OffsetNumber offnum;
|
|
|
|
|
|
|
|
offnum = ItemPointerGetOffsetNumber(&(scan->currentItemData));
|
|
|
|
page = BufferGetPage(so->btso_curbuf);
|
|
|
|
if (!ItemIdDeleted(PageGetItemId(page, offnum)))
|
|
|
|
break;
|
|
|
|
res = _bt_next(scan, ForwardScanDirection);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!res)
|
|
|
|
break;
|
|
|
|
/* Save tuple ID, and continue scanning */
|
|
|
|
tids[ntids] = scan->xs_ctup.t_self;
|
|
|
|
ntids++;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* Save heap TID to use it in _bt_restscan. Then release the read lock on
|
|
|
|
* the buffer so that we aren't blocking other backends.
|
2005-03-28 01:53:05 +02:00
|
|
|
*/
|
|
|
|
if (res)
|
|
|
|
{
|
|
|
|
so->curHeapIptr = scan->xs_ctup.t_self;
|
|
|
|
LockBuffer(so->btso_curbuf,
|
|
|
|
BUFFER_LOCK_UNLOCK);
|
|
|
|
}
|
|
|
|
|
|
|
|
*returned_tids = ntids;
|
|
|
|
PG_RETURN_BOOL(res);
|
|
|
|
}
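The batching contract of btgetmulti() is easy to misread: the return value says whether the scan may have more to give, while the count of items actually stored comes back through a pointer. A minimal sketch of that loop shape follows, assuming a hypothetical array-backed scan (`DemoScan`, `demo_next`, and `demo_getmulti` are invented for this example and plain `int` values stand in for TIDs).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scan over an in-memory array of item identifiers. */
typedef struct DemoScan
{
	const int  *items;
	size_t		nitems;
	size_t		pos;		/* next item to return */
} DemoScan;

/* Fetch one item; returns false once the scan is exhausted. */
bool
demo_next(DemoScan *scan, int *out)
{
	assert(scan != NULL && out != NULL);
	if (scan->pos >= scan->nitems)
		return false;
	*out = scan->items[scan->pos++];
	return true;
}

/*
 * Fill 'tids' with up to max_tids items.  The result is true if the batch
 * was filled without hitting the end of the scan, false if the scan ran
 * out; *returned_tids holds the number of entries stored either way.
 */
bool
demo_getmulti(DemoScan *scan, int *tids, int max_tids, int *returned_tids)
{
	bool		res = true;
	int			ntids = 0;

	while (ntids < max_tids)
	{
		int			item;

		res = demo_next(scan, &item);
		if (!res)
			break;
		tids[ntids] = item;
		ntids++;
	}

	*returned_tids = ntids;
	return res;
}
```

A caller therefore has to consume `*returned_tids` entries even when the function returns false, since the final, partial batch arrives together with the end-of-scan signal.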
|
|
|
|
|
1996-07-09 08:22:35 +02:00
|
|
|
/*
|
1997-09-07 07:04:48 +02:00
|
|
|
* btbeginscan() -- start a scan on a btree index
|
1996-07-09 08:22:35 +02:00
|
|
|
*/
|
2000-06-13 09:35:40 +02:00
|
|
|
Datum
|
|
|
|
btbeginscan(PG_FUNCTION_ARGS)
|
1996-07-09 08:22:35 +02:00
|
|
|
{
|
2000-06-13 09:35:40 +02:00
|
|
|
Relation rel = (Relation) PG_GETARG_POINTER(0);
|
2002-05-21 01:51:44 +02:00
|
|
|
int keysz = PG_GETARG_INT32(1);
|
|
|
|
ScanKey scankey = (ScanKey) PG_GETARG_POINTER(2);
|
1997-09-08 04:41:22 +02:00
|
|
|
IndexScanDesc scan;
|
1997-09-07 07:04:48 +02:00
|
|
|
|
|
|
|
/* get the scan */
|
2002-05-21 01:51:44 +02:00
|
|
|
scan = RelationGetIndexScan(rel, keysz, scankey);
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2000-06-13 09:35:40 +02:00
|
|
|
PG_RETURN_POINTER(scan);
|
1996-07-09 08:22:35 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
1997-09-07 07:04:48 +02:00
|
|
|
* btrescan() -- rescan an index relation
|
1996-07-09 08:22:35 +02:00
|
|
|
*/
|
2000-06-13 09:35:40 +02:00
|
|
|
Datum
|
|
|
|
btrescan(PG_FUNCTION_ARGS)
|
1996-07-09 08:22:35 +02:00
|
|
|
{
|
2001-03-22 05:01:46 +01:00
|
|
|
IndexScanDesc scan = (IndexScanDesc) PG_GETARG_POINTER(0);
|
2002-05-21 01:51:44 +02:00
|
|
|
ScanKey scankey = (ScanKey) PG_GETARG_POINTER(1);
|
1997-09-08 04:41:22 +02:00
|
|
|
ItemPointer iptr;
|
|
|
|
BTScanOpaque so;
|
1997-09-07 07:04:48 +02:00
|
|
|
|
|
|
|
so = (BTScanOpaque) scan->opaque;
|
|
|
|
|
2000-07-21 08:42:39 +02:00
|
|
|
if (so == NULL) /* if called from btbeginscan */
|
|
|
|
{
|
|
|
|
so = (BTScanOpaque) palloc(sizeof(BTScanOpaqueData));
|
|
|
|
so->btso_curbuf = so->btso_mrkbuf = InvalidBuffer;
|
2002-05-21 01:51:44 +02:00
|
|
|
ItemPointerSetInvalid(&(so->curHeapIptr));
|
|
|
|
ItemPointerSetInvalid(&(so->mrkHeapIptr));
|
2000-07-21 08:42:39 +02:00
|
|
|
if (scan->numberOfKeys > 0)
|
|
|
|
so->keyData = (ScanKey) palloc(scan->numberOfKeys * sizeof(ScanKeyData));
|
2002-05-21 01:51:44 +02:00
|
|
|
else
|
2004-01-07 19:56:30 +01:00
|
|
|
so->keyData = NULL;
|
2000-07-21 08:42:39 +02:00
|
|
|
scan->opaque = so;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* we aren't holding any read locks, but we must still drop the pins */
|
1997-09-07 07:04:48 +02:00
|
|
|
if (ItemPointerIsValid(iptr = &(scan->currentItemData)))
|
|
|
|
{
|
1999-05-25 20:20:31 +02:00
|
|
|
ReleaseBuffer(so->btso_curbuf);
|
1997-09-07 07:04:48 +02:00
|
|
|
so->btso_curbuf = InvalidBuffer;
|
2002-05-21 01:51:44 +02:00
|
|
|
ItemPointerSetInvalid(&(so->curHeapIptr));
|
1997-09-07 07:04:48 +02:00
|
|
|
ItemPointerSetInvalid(iptr);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (ItemPointerIsValid(iptr = &(scan->currentMarkData)))
|
|
|
|
{
|
1999-05-25 20:20:31 +02:00
|
|
|
ReleaseBuffer(so->btso_mrkbuf);
|
1997-09-07 07:04:48 +02:00
|
|
|
so->btso_mrkbuf = InvalidBuffer;
|
2002-05-21 01:51:44 +02:00
|
|
|
ItemPointerSetInvalid(&(so->mrkHeapIptr));
|
1997-09-07 07:04:48 +02:00
|
|
|
ItemPointerSetInvalid(iptr);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* Reset the scan keys. Note that the key-ordering logic has moved to _bt_first.
|
|
|
|
* - vadim 05/05/97
|
1997-09-07 07:04:48 +02:00
|
|
|
*/
|
2003-03-24 00:01:03 +01:00
|
|
|
if (scankey && scan->numberOfKeys > 0)
|
1997-09-07 07:04:48 +02:00
|
|
|
memmove(scan->keyData,
|
|
|
|
scankey,
|
|
|
|
scan->numberOfKeys * sizeof(ScanKeyData));
|
2003-11-12 22:15:59 +01:00
|
|
|
so->numberOfKeys = 0; /* until _bt_preprocess_keys sets it */
|
1996-07-30 09:56:04 +02:00
|
|
|
|
2000-06-14 07:24:50 +02:00
|
|
|
PG_RETURN_VOID();
|
1996-07-09 08:22:35 +02:00
|
|
|
}
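btrescan() does double duty: on the first call it allocates the scan's private state, and on every call it resets the position and copies in the new keys. A rough sketch of that pattern, under the assumption of a much simplified scan descriptor, is shown here. `DemoOpaque`, `DemoScanDesc`, `DEMO_INVALID_BUFFER`, and `demo_rescan` are hypothetical, `malloc` stands in for `palloc`, and integer keys stand in for `ScanKeyData`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_INVALID_BUFFER 0	/* hypothetical "no buffer" marker */

/* Hypothetical private scan state. */
typedef struct DemoOpaque
{
	int			curbuf;			/* currently held buffer, if any */
	int			numberOfKeys;	/* preprocessed key count, reset per rescan */
	int		   *keyData;		/* workspace for preprocessed keys, or NULL */
} DemoOpaque;

/* Hypothetical caller-visible scan descriptor. */
typedef struct DemoScanDesc
{
	int			numberOfKeys;
	int		   *keyData;		/* caller-visible scan keys */
	DemoOpaque *opaque;			/* NULL until the first rescan */
} DemoScanDesc;

void
demo_rescan(DemoScanDesc *scan, const int *newkeys)
{
	DemoOpaque *so = scan->opaque;

	if (so == NULL)				/* first call: create the private state */
	{
		so = malloc(sizeof(DemoOpaque));
		assert(so != NULL);
		so->curbuf = DEMO_INVALID_BUFFER;
		so->keyData = NULL;
		if (scan->numberOfKeys > 0)
		{
			so->keyData = malloc((size_t) scan->numberOfKeys * sizeof(int));
			assert(so->keyData != NULL);
		}
		scan->opaque = so;
	}

	/* forget the old position (the real code releases a buffer pin here) */
	so->curbuf = DEMO_INVALID_BUFFER;

	/* copy in the new keys, if the caller supplied any */
	if (newkeys != NULL && scan->numberOfKeys > 0)
		memmove(scan->keyData, newkeys,
				(size_t) scan->numberOfKeys * sizeof(int));

	so->numberOfKeys = 0;		/* key preprocessing is deferred */
}
```

Allocating on first use keeps the begin-scan path trivial, at the cost of a NULL check on every rescan.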
|
|
|
|
|
|
|
|
/*
|
1997-09-07 07:04:48 +02:00
|
|
|
* btendscan() -- close down a scan
|
1996-07-09 08:22:35 +02:00
|
|
|
*/
|
2000-06-13 09:35:40 +02:00
|
|
|
Datum
|
|
|
|
btendscan(PG_FUNCTION_ARGS)
|
1996-07-09 08:22:35 +02:00
|
|
|
{
|
2001-03-22 05:01:46 +01:00
|
|
|
IndexScanDesc scan = (IndexScanDesc) PG_GETARG_POINTER(0);
|
1997-09-08 04:41:22 +02:00
|
|
|
ItemPointer iptr;
|
|
|
|
BTScanOpaque so;
|
1997-09-07 07:04:48 +02:00
|
|
|
|
|
|
|
so = (BTScanOpaque) scan->opaque;
|
|
|
|
|
2000-07-21 08:42:39 +02:00
|
|
|
/* we aren't holding any read locks, but we must still drop the pins */
|
1997-09-07 07:04:48 +02:00
|
|
|
if (ItemPointerIsValid(iptr = &(scan->currentItemData)))
|
|
|
|
{
|
|
|
|
if (BufferIsValid(so->btso_curbuf))
|
1999-05-25 20:20:31 +02:00
|
|
|
ReleaseBuffer(so->btso_curbuf);
|
1997-09-07 07:04:48 +02:00
|
|
|
so->btso_curbuf = InvalidBuffer;
|
|
|
|
ItemPointerSetInvalid(iptr);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (ItemPointerIsValid(iptr = &(scan->currentMarkData)))
|
|
|
|
{
|
|
|
|
if (BufferIsValid(so->btso_mrkbuf))
|
1999-05-25 20:20:31 +02:00
|
|
|
ReleaseBuffer(so->btso_mrkbuf);
|
1997-09-07 07:04:48 +02:00
|
|
|
so->btso_mrkbuf = InvalidBuffer;
|
|
|
|
ItemPointerSetInvalid(iptr);
|
|
|
|
}
|
|
|
|
|
2004-01-07 19:56:30 +01:00
|
|
|
if (so->keyData != NULL)
|
1997-09-07 07:04:48 +02:00
|
|
|
pfree(so->keyData);
|
|
|
|
pfree(so);
|
|
|
|
|
2000-06-14 07:24:50 +02:00
|
|
|
PG_RETURN_VOID();
|
1996-07-09 08:22:35 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
1997-09-07 07:04:48 +02:00
|
|
|
* btmarkpos() -- save current scan position
|
1996-07-09 08:22:35 +02:00
|
|
|
*/
|
2000-06-13 09:35:40 +02:00
|
|
|
Datum
|
|
|
|
btmarkpos(PG_FUNCTION_ARGS)
|
1996-07-09 08:22:35 +02:00
|
|
|
{
|
2001-03-22 05:01:46 +01:00
|
|
|
IndexScanDesc scan = (IndexScanDesc) PG_GETARG_POINTER(0);
|
1997-09-08 04:41:22 +02:00
|
|
|
ItemPointer iptr;
|
|
|
|
BTScanOpaque so;
|
1997-09-07 07:04:48 +02:00
|
|
|
|
|
|
|
so = (BTScanOpaque) scan->opaque;
|
|
|
|
|
2000-07-21 08:42:39 +02:00
|
|
|
/* we aren't holding any read locks, but we must drop the pin */
|
1997-09-07 07:04:48 +02:00
|
|
|
if (ItemPointerIsValid(iptr = &(scan->currentMarkData)))
|
|
|
|
{
|
1999-05-25 20:20:31 +02:00
|
|
|
ReleaseBuffer(so->btso_mrkbuf);
|
1997-09-07 07:04:48 +02:00
|
|
|
so->btso_mrkbuf = InvalidBuffer;
|
|
|
|
ItemPointerSetInvalid(iptr);
|
|
|
|
}
|
|
|
|
|
2000-07-21 08:42:39 +02:00
|
|
|
/* bump pin on current buffer for assignment to mark buffer */
|
1997-09-07 07:04:48 +02:00
|
|
|
if (ItemPointerIsValid(&(scan->currentItemData)))
|
|
|
|
{
|
2004-11-17 04:13:38 +01:00
|
|
|
IncrBufferRefCount(so->btso_curbuf);
|
|
|
|
so->btso_mrkbuf = so->btso_curbuf;
|
1997-09-07 07:04:48 +02:00
|
|
|
scan->currentMarkData = scan->currentItemData;
|
1998-07-30 07:05:05 +02:00
|
|
|
so->mrkHeapIptr = so->curHeapIptr;
|
1997-09-07 07:04:48 +02:00
|
|
|
}
|
2000-06-13 09:35:40 +02:00
|
|
|
|
2000-06-14 07:24:50 +02:00
|
|
|
PG_RETURN_VOID();
|
1996-07-09 08:22:35 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
1997-09-07 07:04:48 +02:00
|
|
|
* btrestrpos() -- restore scan to last saved position
|
1996-07-09 08:22:35 +02:00
|
|
|
*/
|
2000-06-13 09:35:40 +02:00
|
|
|
Datum
|
|
|
|
btrestrpos(PG_FUNCTION_ARGS)
|
1996-07-09 08:22:35 +02:00
|
|
|
{
|
2001-03-22 05:01:46 +01:00
|
|
|
IndexScanDesc scan = (IndexScanDesc) PG_GETARG_POINTER(0);
|
1997-09-08 04:41:22 +02:00
|
|
|
ItemPointer iptr;
|
|
|
|
BTScanOpaque so;
|
1997-09-07 07:04:48 +02:00
|
|
|
|
|
|
|
so = (BTScanOpaque) scan->opaque;
|
|
|
|
|
2000-07-21 08:42:39 +02:00
|
|
|
/* we aren't holding any read locks, but we must drop the pin */
|
1997-09-07 07:04:48 +02:00
|
|
|
if (ItemPointerIsValid(iptr = &(scan->currentItemData)))
|
|
|
|
{
|
1999-05-25 20:20:31 +02:00
|
|
|
ReleaseBuffer(so->btso_curbuf);
|
1997-09-07 07:04:48 +02:00
|
|
|
so->btso_curbuf = InvalidBuffer;
|
|
|
|
ItemPointerSetInvalid(iptr);
|
|
|
|
}
|
|
|
|
|
1999-05-25 20:20:31 +02:00
|
|
|
/* bump pin on marked buffer */
|
1997-09-07 07:04:48 +02:00
|
|
|
if (ItemPointerIsValid(&(scan->currentMarkData)))
|
|
|
|
{
|
2004-11-17 04:13:38 +01:00
|
|
|
IncrBufferRefCount(so->btso_mrkbuf);
|
|
|
|
so->btso_curbuf = so->btso_mrkbuf;
|
1997-09-07 07:04:48 +02:00
|
|
|
scan->currentItemData = scan->currentMarkData;
|
1998-07-30 07:05:05 +02:00
|
|
|
so->curHeapIptr = so->mrkHeapIptr;
|
1997-09-07 07:04:48 +02:00
|
|
|
}
|
2000-06-13 09:35:40 +02:00
|
|
|
|
2000-06-14 07:24:50 +02:00
|
|
|
PG_RETURN_VOID();
|
1996-07-09 08:22:35 +02:00
|
|
|
}
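/*
 * Illustrative sketch only, not part of nbtree.c: a toy model of the pin
 * bookkeeping that btmarkpos() and btrestrpos() perform above. Everything
 * prefixed toy_/Toy/TOY_ is hypothetical; the real code uses ReleaseBuffer()
 * and IncrBufferRefCount() on buffer-manager pins and tests the item
 * pointers rather than the buffer numbers. The point shown is that the
 * saved and current positions each own one pin, so copying a position must
 * drop the pin being overwritten and take a new pin on the copied buffer.
 */

```c
#include <assert.h>

#define TOY_INVALID_BUFFER (-1)
#define TOY_NBUFFERS 4

/* Pin count per toy buffer; stands in for the buffer manager's refcounts. */
static int toy_pins[TOY_NBUFFERS];

typedef struct ToyScan
{
	int			curbuf;		/* buffer holding the current position */
	int			mrkbuf;		/* buffer holding the marked position */
} ToyScan;

static void
toy_pin(int buf)
{
	toy_pins[buf]++;
}

static void
toy_unpin(int buf)
{
	assert(toy_pins[buf] > 0);
	toy_pins[buf]--;
}

/* Save the current position: drop the old mark's pin, then pin again. */
static void
toy_markpos(ToyScan *scan)
{
	if (scan->mrkbuf != TOY_INVALID_BUFFER)
	{
		toy_unpin(scan->mrkbuf);
		scan->mrkbuf = TOY_INVALID_BUFFER;
	}
	if (scan->curbuf != TOY_INVALID_BUFFER)
	{
		toy_pin(scan->curbuf);
		scan->mrkbuf = scan->curbuf;
	}
}

/* Restore the marked position: the mirror image of toy_markpos(). */
static void
toy_restrpos(ToyScan *scan)
{
	if (scan->curbuf != TOY_INVALID_BUFFER)
	{
		toy_unpin(scan->curbuf);
		scan->curbuf = TOY_INVALID_BUFFER;
	}
	if (scan->mrkbuf != TOY_INVALID_BUFFER)
	{
		toy_pin(scan->mrkbuf);
		scan->curbuf = scan->mrkbuf;
	}
}
```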
|
|
|
|
|
2001-07-16 00:48:19 +02:00
|
|
|
/*
|
|
|
|
* Bulk deletion of all index entries pointing to a set of heap tuples.
|
|
|
|
* The set of target tuples is specified via a callback routine that tells
|
|
|
|
* whether any given heap tuple (identified by ItemPointer) is being deleted.
|
|
|
|
*
|
|
|
|
* Result: a palloc'd struct containing statistical info for VACUUM displays.
|
|
|
|
*/
|
2000-06-13 09:35:40 +02:00
|
|
|
Datum
|
2001-07-16 00:48:19 +02:00
|
|
|
btbulkdelete(PG_FUNCTION_ARGS)
|
1996-07-09 08:22:35 +02:00
|
|
|
{
|
2001-03-22 05:01:46 +01:00
|
|
|
Relation rel = (Relation) PG_GETARG_POINTER(0);
|
2001-07-16 00:48:19 +02:00
|
|
|
IndexBulkDeleteCallback callback = (IndexBulkDeleteCallback) PG_GETARG_POINTER(1);
|
|
|
|
void *callback_state = (void *) PG_GETARG_POINTER(2);
|
|
|
|
IndexBulkDeleteResult *result;
|
|
|
|
double tuples_removed;
|
|
|
|
double num_index_tuples;
|
2005-09-02 21:02:20 +02:00
|
|
|
OffsetNumber deletable[MaxOffsetNumber];
|
2003-02-23 23:43:09 +01:00
|
|
|
int ndeletable;
|
|
|
|
Buffer buf;
|
|
|
|
BlockNumber num_pages;
|
2001-07-16 00:48:19 +02:00
|
|
|
|
|
|
|
tuples_removed = 0;
|
|
|
|
num_index_tuples = 0;
|
|
|
|
|
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* The outer loop iterates over index leaf pages, the inner over items on
|
|
|
|
* a leaf page. We issue just one _bt_delitems() call per page, so as to
|
|
|
|
* minimize WAL traffic.
|
2002-10-20 22:47:31 +02:00
|
|
|
*
|
2003-08-04 02:43:34 +02:00
|
|
|
* Note that we exclusive-lock every leaf page containing data items, in
|
2005-10-15 04:49:52 +02:00
|
|
|
* sequence left to right. It sounds attractive to only exclusive-lock
|
|
|
|
* those containing items we need to delete, but unfortunately that is not
|
|
|
|
* safe: we could then pass a stopped indexscan, which could in rare cases
|
|
|
|
* lead to deleting the item it needs to find when it resumes. (See
|
|
|
|
* _bt_restscan --- this could only happen if an indexscan stops on a
|
|
|
|
* deletable item and then a page split moves that item into a page
|
|
|
|
* further to its right, which the indexscan will have no pin on.) We can
|
|
|
|
* skip obtaining exclusive lock on empty pages though, since no indexscan
|
|
|
|
* could be stopped on those.
|
2001-07-16 00:48:19 +02:00
|
|
|
*/
|
2003-02-23 23:43:09 +01:00
|
|
|
buf = _bt_get_endpoint(rel, 0, false);
|
|
|
|
if (BufferIsValid(buf)) /* check for empty index */
|
2001-07-16 00:48:19 +02:00
|
|
|
{
|
2003-02-23 23:43:09 +01:00
|
|
|
for (;;)
|
2001-07-16 00:48:19 +02:00
|
|
|
{
|
|
|
|
Page page;
|
2001-11-24 00:41:54 +01:00
|
|
|
BTPageOpaque opaque;
|
2003-02-23 23:43:09 +01:00
|
|
|
OffsetNumber offnum,
|
|
|
|
minoff,
|
|
|
|
maxoff;
|
2003-08-04 02:43:34 +02:00
|
|
|
BlockNumber nextpage;
|
2001-07-16 00:48:19 +02:00
|
|
|
|
2004-02-10 04:42:45 +01:00
|
|
|
vacuum_delay_point();
|
2004-02-06 20:36:18 +01:00
|
|
|
|
2003-02-23 23:43:09 +01:00
|
|
|
ndeletable = 0;
|
2002-10-20 22:47:31 +02:00
|
|
|
page = BufferGetPage(buf);
|
2003-02-23 23:43:09 +01:00
|
|
|
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
|
|
|
|
minoff = P_FIRSTDATAKEY(opaque);
|
|
|
|
maxoff = PageGetMaxOffsetNumber(page);
|
|
|
|
/* We probably cannot see deleted pages, but skip them if so */
|
|
|
|
if (minoff <= maxoff && !P_ISDELETED(opaque))
|
2002-10-20 22:47:31 +02:00
|
|
|
{
|
2003-02-23 23:43:09 +01:00
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* Trade in the initial read lock for a super-exclusive write
|
|
|
|
* lock on this page.
|
2003-02-23 23:43:09 +01:00
|
|
|
*/
|
2002-10-20 22:47:31 +02:00
|
|
|
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
|
|
|
|
LockBufferForCleanup(buf);
|
2003-08-04 02:43:34 +02:00
|
|
|
|
2002-10-20 22:47:31 +02:00
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* Recompute minoff/maxoff, both of which could have changed
|
|
|
|
* while we weren't holding the lock.
|
2002-10-20 22:47:31 +02:00
|
|
|
*/
|
2003-02-23 23:43:09 +01:00
|
|
|
minoff = P_FIRSTDATAKEY(opaque);
|
|
|
|
maxoff = PageGetMaxOffsetNumber(page);
|
2003-08-04 02:43:34 +02:00
|
|
|
|
2001-07-16 00:48:19 +02:00
|
|
|
/*
|
2003-02-23 23:43:09 +01:00
|
|
|
* Scan over all items to see which ones need to be deleted
|
|
|
|
* according to the callback function.
|
2001-07-16 00:48:19 +02:00
|
|
|
*/
|
2003-02-23 23:43:09 +01:00
|
|
|
for (offnum = minoff;
|
|
|
|
offnum <= maxoff;
|
|
|
|
offnum = OffsetNumberNext(offnum))
|
|
|
|
{
|
|
|
|
BTItem btitem;
|
|
|
|
ItemPointer htup;
|
|
|
|
|
|
|
|
btitem = (BTItem) PageGetItem(page,
|
2005-10-15 04:49:52 +02:00
|
|
|
PageGetItemId(page, offnum));
|
2003-02-23 23:43:09 +01:00
|
|
|
htup = &(btitem->bti_itup.t_tid);
|
|
|
|
if (callback(htup, callback_state))
|
|
|
|
{
|
|
|
|
deletable[ndeletable++] = offnum;
|
|
|
|
tuples_removed += 1;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
num_index_tuples += 1;
|
|
|
|
}
|
|
|
|
}
|
2003-08-04 02:43:34 +02:00
|
|
|
|
2003-02-23 23:43:09 +01:00
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* If we need to delete anything, do it and write the buffer; else
|
|
|
|
* just release the buffer.
|
2003-02-23 23:43:09 +01:00
|
|
|
*/
|
|
|
|
nextpage = opaque->btpo_next;
|
|
|
|
if (ndeletable > 0)
|
|
|
|
{
|
|
|
|
_bt_delitems(rel, buf, deletable, ndeletable);
|
|
|
|
_bt_wrtbuf(rel, buf);
|
2001-07-16 00:48:19 +02:00
|
|
|
}
|
|
|
|
else
|
2003-02-23 23:43:09 +01:00
|
|
|
_bt_relbuf(rel, buf);
|
|
|
|
/* And advance to next page, if any */
|
|
|
|
if (nextpage == P_NONE)
|
|
|
|
break;
|
|
|
|
buf = _bt_getbuf(rel, nextpage, BT_READ);
|
|
|
|
}
|
2001-07-16 00:48:19 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/* return statistics */
|
|
|
|
num_pages = RelationGetNumberOfBlocks(rel);
|
|
|
|
|
2003-02-24 01:57:17 +01:00
|
|
|
result = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
|
2001-07-16 00:48:19 +02:00
|
|
|
result->num_pages = num_pages;
|
|
|
|
result->num_index_tuples = num_index_tuples;
|
2003-02-22 01:45:05 +01:00
|
|
|
result->tuples_removed = tuples_removed;
|
2001-07-16 00:48:19 +02:00
|
|
|
|
|
|
|
PG_RETURN_POINTER(result);
|
1996-07-09 08:22:35 +02:00
|
|
|
}
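/*
 * Illustrative sketch only, not part of nbtree.c: a toy version of the
 * callback-driven loop in btbulkdelete() above. All toy_/Toy names are
 * hypothetical, pages are plain arrays, and locking and WAL are omitted
 * entirely. It shows the two ideas the comments describe: the caller's
 * callback decides which heap tuples are dead, and the offsets to delete
 * are collected per page so that each page gets at most one delete call.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ITEMS 8

/* Returns true if the index entry pointing at heap_tid should be removed. */
typedef bool (*ToyDeleteCallback) (int heap_tid, void *state);

typedef struct ToyPage
{
	int			tids[TOY_MAX_ITEMS];	/* heap TIDs the entries point to */
	int			nitems;
} ToyPage;

/* Remove the items at the given ascending offsets in a single pass. */
static void
toy_delitems(ToyPage *page, const int *offsets, int noffsets)
{
	int			keep = 0;
	int			next = 0;

	for (int i = 0; i < page->nitems; i++)
	{
		if (next < noffsets && offsets[next] == i)
		{
			next++;				/* this item is being deleted */
			continue;
		}
		page->tids[keep++] = page->tids[i];
	}
	page->nitems = keep;
}

/* Returns the number of entries removed; *remaining gets the survivors. */
static int
toy_bulkdelete(ToyPage *pages, int npages,
			   ToyDeleteCallback callback, void *state, int *remaining)
{
	int			removed = 0;

	*remaining = 0;
	for (int p = 0; p < npages; p++)
	{
		int			deletable[TOY_MAX_ITEMS];
		int			ndeletable = 0;

		for (int i = 0; i < pages[p].nitems; i++)
		{
			if (callback(pages[p].tids[i], state))
				deletable[ndeletable++] = i;
			else
				(*remaining)++;
		}
		/* One batched delete per page, and only if something matched. */
		if (ndeletable > 0)
			toy_delitems(&pages[p], deletable, ndeletable);
		removed += ndeletable;
	}
	return removed;
}

/* Example callback: treat even-numbered TIDs as dead. */
static bool
toy_is_even(int heap_tid, void *state)
{
	(void) state;
	return heap_tid % 2 == 0;
}
```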
|
1998-07-30 07:05:05 +02:00
|
|
|
|
2003-02-22 01:45:05 +01:00
|
|
|
/*
|
|
|
|
* Post-VACUUM cleanup.
|
|
|
|
*
|
|
|
|
* Here, we scan looking for pages we can delete or return to the freelist.
|
|
|
|
*
|
|
|
|
* Result: a palloc'd struct containing statistical info for VACUUM displays.
|
|
|
|
*/
|
|
|
|
Datum
|
|
|
|
btvacuumcleanup(PG_FUNCTION_ARGS)
|
|
|
|
{
|
|
|
|
Relation rel = (Relation) PG_GETARG_POINTER(0);
|
|
|
|
IndexVacuumCleanupInfo *info = (IndexVacuumCleanupInfo *) PG_GETARG_POINTER(1);
|
|
|
|
IndexBulkDeleteResult *stats = (IndexBulkDeleteResult *) PG_GETARG_POINTER(2);
|
|
|
|
BlockNumber num_pages;
|
|
|
|
BlockNumber blkno;
|
2003-03-04 22:51:22 +01:00
|
|
|
BlockNumber *freePages;
|
2003-02-22 01:45:05 +01:00
|
|
|
int nFreePages,
|
|
|
|
maxFreePages;
|
2003-02-23 07:17:13 +01:00
|
|
|
BlockNumber pages_deleted = 0;
|
|
|
|
MemoryContext mycontext;
|
|
|
|
MemoryContext oldcontext;
|
2005-05-07 23:32:24 +02:00
|
|
|
bool needLock;
|
2003-02-22 01:45:05 +01:00
|
|
|
|
|
|
|
Assert(stats != NULL);
|
|
|
|
|
2005-05-07 23:32:24 +02:00
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* First find out the number of pages in the index. We must acquire the
|
|
|
|
* relation-extension lock while doing this to avoid a race condition: if
|
|
|
|
* someone else is extending the relation, there is a window where
|
|
|
|
* bufmgr/smgr have created a new all-zero page but it hasn't yet been
|
|
|
|
* write-locked by _bt_getbuf(). If we manage to scan such a page here,
|
|
|
|
* we'll improperly assume it can be recycled. Taking the lock
|
|
|
|
* synchronizes things enough to prevent a problem: either num_pages won't
|
|
|
|
* include the new page, or _bt_getbuf already has write lock on the
|
|
|
|
* buffer and it will be fully initialized before we can examine it. (See
|
|
|
|
* also vacuumlazy.c, which has the same issue.)
|
2005-05-07 23:32:24 +02:00
|
|
|
*
|
2005-11-06 20:29:01 +01:00
|
|
|
* We can skip locking for new or temp relations, however, since no one
|
|
|
|
* else could be accessing them.
|
2005-05-07 23:32:24 +02:00
|
|
|
*/
|
|
|
|
needLock = !RELATION_IS_LOCAL(rel);
|
|
|
|
|
|
|
|
if (needLock)
|
|
|
|
LockRelationForExtension(rel, ExclusiveLock);
|
|
|
|
|
2003-02-22 01:45:05 +01:00
|
|
|
num_pages = RelationGetNumberOfBlocks(rel);
|
|
|
|
|
2005-05-07 23:32:24 +02:00
|
|
|
if (needLock)
|
|
|
|
UnlockRelationForExtension(rel, ExclusiveLock);
|
|
|
|
|
2003-02-22 01:45:05 +01:00
|
|
|
/* No point in remembering more than MaxFSMPages pages */
|
|
|
|
maxFreePages = MaxFSMPages;
|
|
|
|
if ((BlockNumber) maxFreePages > num_pages)
|
2004-06-05 21:48:09 +02:00
|
|
|
maxFreePages = (int) num_pages;
|
2003-03-04 22:51:22 +01:00
|
|
|
freePages = (BlockNumber *) palloc(maxFreePages * sizeof(BlockNumber));
|
2003-02-22 01:45:05 +01:00
|
|
|
nFreePages = 0;
|
|
|
|
|
2003-02-23 07:17:13 +01:00
|
|
|
/* Create a temporary memory context to run _bt_pagedel in */
|
|
|
|
mycontext = AllocSetContextCreate(CurrentMemoryContext,
|
|
|
|
"_bt_pagedel",
|
|
|
|
ALLOCSET_DEFAULT_MINSIZE,
|
|
|
|
ALLOCSET_DEFAULT_INITSIZE,
|
|
|
|
ALLOCSET_DEFAULT_MAXSIZE);
|
|
|
|
|
2003-02-22 01:45:05 +01:00
|
|
|
/*
|
|
|
|
* Scan through all pages of index, except metapage. (Any pages added
|
|
|
|
* after we start the scan will not be examined; this should be fine,
|
|
|
|
* since they can't possibly be empty.)
|
|
|
|
*/
|
2003-08-04 02:43:34 +02:00
|
|
|
for (blkno = BTREE_METAPAGE + 1; blkno < num_pages; blkno++)
|
2003-02-22 01:45:05 +01:00
|
|
|
{
|
2003-08-04 02:43:34 +02:00
|
|
|
Buffer buf;
|
|
|
|
Page page;
|
2003-02-22 01:45:05 +01:00
|
|
|
BTPageOpaque opaque;
|
|
|
|
|
2005-11-06 20:29:01 +01:00
|
|
|
/*
|
|
|
|
* We can't use _bt_getbuf() here because it always applies
|
2005-11-22 19:17:34 +01:00
|
|
|
* _bt_checkpage(), which will barf on an all-zero page. We want to
|
|
|
|
* recycle all-zero pages, not fail.
|
2005-11-06 20:29:01 +01:00
|
|
|
*/
|
|
|
|
buf = ReadBuffer(rel, blkno);
|
|
|
|
LockBuffer(buf, BT_READ);
|
2003-02-22 01:45:05 +01:00
|
|
|
page = BufferGetPage(buf);
|
|
|
|
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
|
2005-11-06 20:29:01 +01:00
|
|
|
if (!PageIsNew(page))
|
|
|
|
_bt_checkpage(rel, buf);
|
2003-02-23 07:17:13 +01:00
|
|
|
if (_bt_page_recyclable(page))
|
2003-02-22 01:45:05 +01:00
|
|
|
{
|
2003-02-23 07:17:13 +01:00
|
|
|
/* Okay to recycle this page */
|
2003-02-22 01:45:05 +01:00
|
|
|
if (nFreePages < maxFreePages)
|
2003-03-04 22:51:22 +01:00
|
|
|
freePages[nFreePages++] = blkno;
|
2003-02-24 01:57:17 +01:00
|
|
|
pages_deleted++;
|
|
|
|
}
|
|
|
|
else if (P_ISDELETED(opaque))
|
|
|
|
{
|
|
|
|
/* Already deleted, but can't recycle yet */
|
|
|
|
pages_deleted++;
|
2003-02-22 01:45:05 +01:00
|
|
|
}
|
2003-02-23 07:17:13 +01:00
|
|
|
else if ((opaque->btpo_flags & BTP_HALF_DEAD) ||
|
2003-02-23 23:43:09 +01:00
|
|
|
P_FIRSTDATAKEY(opaque) > PageGetMaxOffsetNumber(page))
|
2003-02-23 07:17:13 +01:00
|
|
|
{
|
|
|
|
/* Empty, try to delete */
|
2003-08-04 02:43:34 +02:00
|
|
|
int ndel;
|
2003-02-23 07:17:13 +01:00
|
|
|
|
|
|
|
/* Run pagedel in a temp context to avoid memory leakage */
|
|
|
|
MemoryContextReset(mycontext);
|
|
|
|
oldcontext = MemoryContextSwitchTo(mycontext);
|
|
|
|
|
|
|
|
ndel = _bt_pagedel(rel, buf, info->vacuum_full);
|
2003-02-24 01:57:17 +01:00
|
|
|
|
|
|
|
/* count only this page, else may double-count parent */
|
|
|
|
if (ndel)
|
|
|
|
pages_deleted++;
|
2003-02-23 07:17:13 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* During VACUUM FULL it's okay to recycle deleted pages
|
2005-10-15 04:49:52 +02:00
|
|
|
* immediately, since there can be no other transactions scanning
|
|
|
|
* the index. Note that we will only recycle the current page and
|
|
|
|
* not any parent pages that _bt_pagedel might have recursed to;
|
|
|
|
* this seems reasonable in the name of simplicity. (Trying to do
|
|
|
|
* otherwise would mean we'd have to sort the list of recyclable
|
|
|
|
* pages we're building.)
|
2003-02-23 07:17:13 +01:00
|
|
|
*/
|
|
|
|
if (ndel && info->vacuum_full)
|
|
|
|
{
|
|
|
|
if (nFreePages < maxFreePages)
|
2003-03-04 22:51:22 +01:00
|
|
|
freePages[nFreePages++] = blkno;
|
2003-02-23 07:17:13 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
MemoryContextSwitchTo(oldcontext);
|
|
|
|
continue; /* pagedel released buffer */
|
|
|
|
}
|
2003-02-22 01:45:05 +01:00
|
|
|
_bt_relbuf(rel, buf);
|
|
|
|
}
|
|
|
|
|
2003-02-24 01:57:17 +01:00
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* During VACUUM FULL, we truncate off any recyclable pages at the end of
|
|
|
|
* the index. In a normal vacuum it'd be unsafe to do this except by
|
|
|
|
* acquiring exclusive lock on the index and then rechecking all the
|
|
|
|
* pages; that doesn't seem worth it.
|
2003-02-24 01:57:17 +01:00
|
|
|
*/
|
|
|
|
if (info->vacuum_full && nFreePages > 0)
|
|
|
|
{
|
2003-08-04 02:43:34 +02:00
|
|
|
BlockNumber new_pages = num_pages;
|
2003-02-24 01:57:17 +01:00
|
|
|
|
2003-08-04 02:43:34 +02:00
|
|
|
while (nFreePages > 0 && freePages[nFreePages - 1] == new_pages - 1)
|
2003-02-24 01:57:17 +01:00
|
|
|
{
|
|
|
|
new_pages--;
|
|
|
|
pages_deleted--;
|
|
|
|
nFreePages--;
|
|
|
|
}
|
|
|
|
if (new_pages != num_pages)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Okay to truncate.
|
|
|
|
*/
|
2004-05-08 21:09:25 +02:00
|
|
|
RelationTruncate(rel, new_pages);
|
2004-12-01 20:00:56 +01:00
|
|
|
|
|
|
|
/* update statistics */
|
|
|
|
stats->pages_removed = num_pages - new_pages;
|
|
|
|
|
2003-02-24 01:57:17 +01:00
|
|
|
num_pages = new_pages;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2003-02-22 01:45:05 +01:00
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* Update the shared Free Space Map with the info we now have about free
|
|
|
|
* pages in the index, discarding any old info the map may have. We do not
|
|
|
|
* need to sort the page numbers; they're in order already.
|
2003-02-22 01:45:05 +01:00
|
|
|
*/
|
2003-03-04 22:51:22 +01:00
|
|
|
RecordIndexFreeSpace(&rel->rd_node, nFreePages, freePages);
|
2003-02-22 01:45:05 +01:00
|
|
|
|
2003-03-04 22:51:22 +01:00
|
|
|
pfree(freePages);
|
2003-02-22 01:45:05 +01:00
|
|
|
|
2003-02-23 07:17:13 +01:00
|
|
|
MemoryContextDelete(mycontext);
|
|
|
|
|
2003-02-22 01:45:05 +01:00
|
|
|
/* update statistics */
|
|
|
|
stats->num_pages = num_pages;
|
2003-02-24 01:57:17 +01:00
|
|
|
stats->pages_deleted = pages_deleted;
|
2003-02-22 01:45:05 +01:00
|
|
|
stats->pages_free = nFreePages;
|
|
|
|
|
|
|
|
PG_RETURN_POINTER(stats);
|
|
|
|
}
|
|
|
|
|
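/*
 * Illustrative sketch only, not part of nbtree.c.  It isolates the
 * tail-trimming loop that btvacuumcleanup runs during VACUUM FULL: given
 * the ascending list of recyclable block numbers, it works out how many
 * trailing blocks could be truncated away.  The names SketchBlockNumber
 * and sketch_trim_tail are hypothetical; the real code additionally
 * calls RelationTruncate() and adjusts pages_deleted, which is not
 * modelled here.
 */
#include <assert.h>
#include <stddef.h>

typedef unsigned int SketchBlockNumber;

/*
 * free_pages must be sorted ascending, as it is when filled by a forward
 * scan.  On return, *n_free counts only the entries that survive
 * truncation, and the result is the new relation length in blocks.
 */
static SketchBlockNumber
sketch_trim_tail(const SketchBlockNumber *free_pages, int *n_free,
                 SketchBlockNumber num_pages)
{
    SketchBlockNumber new_pages = num_pages;

    assert(free_pages != NULL && n_free != NULL);

    /* Peel off free pages only while they form a contiguous tail. */
    while (*n_free > 0 && free_pages[*n_free - 1] == new_pages - 1)
    {
        new_pages--;
        (*n_free)--;
    }
    return new_pages;
}

/*
 * Usage note: with free pages {2, 5, 8, 9} in a 10-block index, blocks 8
 * and 9 form the tail, so the sketch reports a new length of 8 and leaves
 * two entries (2 and 5) to be recorded as free space.
 */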
2000-07-21 08:42:39 +02:00
|
|
|
/*
|
|
|
|
* Restore scan position when btgettuple is called to continue a scan.
|
2002-10-20 22:47:31 +02:00
|
|
|
*
|
|
|
|
* This is nontrivial because concurrent insertions might have moved the
|
|
|
|
* index tuple we stopped on. We assume the tuple can only have moved to
|
|
|
|
* the right from our stop point, because we kept a pin on the buffer,
|
|
|
|
* and so no deletion can have occurred on that page.
|
|
|
|
*
|
|
|
|
* On entry, we have a pin but no read lock on the buffer that contained
|
2003-08-04 02:43:34 +02:00
|
|
|
* the index tuple we stopped the scan on. On exit, we have pin and read
|
2002-10-20 22:47:31 +02:00
|
|
|
* lock on the buffer that now contains that index tuple, and the scandesc's
|
|
|
|
* current position is updated to point at it.
|
2000-07-21 08:42:39 +02:00
|
|
|
*/
|
1998-07-30 07:05:05 +02:00
|
|
|
static void
|
|
|
|
_bt_restscan(IndexScanDesc scan)
|
|
|
|
{
|
2002-05-21 01:51:44 +02:00
|
|
|
Relation rel = scan->indexRelation;
|
1998-09-01 06:40:42 +02:00
|
|
|
BTScanOpaque so = (BTScanOpaque) scan->opaque;
|
|
|
|
Buffer buf = so->btso_curbuf;
|
1999-06-07 17:14:54 +02:00
|
|
|
Page page;
|
1998-09-01 06:40:42 +02:00
|
|
|
ItemPointer current = &(scan->currentItemData);
|
|
|
|
OffsetNumber offnum = ItemPointerGetOffsetNumber(current),
|
1999-06-07 17:14:54 +02:00
|
|
|
maxoff;
|
|
|
|
BTPageOpaque opaque;
|
2002-10-20 22:47:31 +02:00
|
|
|
Buffer nextbuf;
|
2003-02-22 01:45:05 +01:00
|
|
|
ItemPointer target = &(so->curHeapIptr);
|
1998-09-01 06:40:42 +02:00
|
|
|
BTItem item;
|
|
|
|
BlockNumber blkno;
|
1998-07-30 07:05:05 +02:00
|
|
|
|
2000-07-21 08:42:39 +02:00
|
|
|
/*
|
2003-08-04 02:43:34 +02:00
|
|
|
* Reacquire read lock on the buffer. (We should still have a
|
|
|
|
* reference-count pin on it, so need not get that.)
|
2000-07-21 08:42:39 +02:00
|
|
|
*/
|
|
|
|
LockBuffer(buf, BT_READ);
|
|
|
|
|
1999-06-07 17:14:54 +02:00
|
|
|
page = BufferGetPage(buf);
|
|
|
|
maxoff = PageGetMaxOffsetNumber(page);
|
|
|
|
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
|
1999-05-26 00:04:56 +02:00
|
|
|
|
1999-03-28 22:32:42 +02:00
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* We use this as a flag when the first index tuple on the page is deleted but we do
|
|
|
|
* not move left (this would slow down vacuum) - so we set
|
1999-05-25 18:15:34 +02:00
|
|
|
* current->ip_posid before first index tuple on the current page
|
2002-10-20 22:47:31 +02:00
|
|
|
* (_bt_step will move it right)... XXX still needed?
|
1999-03-28 22:32:42 +02:00
|
|
|
*/
|
2003-02-22 01:45:05 +01:00
|
|
|
if (!ItemPointerIsValid(target))
|
1999-03-28 22:32:42 +02:00
|
|
|
{
|
2000-07-21 08:42:39 +02:00
|
|
|
ItemPointerSetOffsetNumber(current,
|
2005-10-15 04:49:52 +02:00
|
|
|
OffsetNumberPrev(P_FIRSTDATAKEY(opaque)));
|
1999-03-28 22:32:42 +02:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2000-07-21 08:42:39 +02:00
|
|
|
/*
|
2001-03-22 05:01:46 +01:00
|
|
|
* The item we were on may have moved right due to insertions. Find it
|
2002-10-20 22:47:31 +02:00
|
|
|
* again. We use the heap TID to identify the item uniquely.
|
2000-07-21 08:42:39 +02:00
|
|
|
*/
|
|
|
|
for (;;)
|
1998-07-30 07:05:05 +02:00
|
|
|
{
|
2000-07-21 08:42:39 +02:00
|
|
|
/* Check for item on this page */
|
1998-09-01 06:40:42 +02:00
|
|
|
for (;
|
1998-07-30 07:05:05 +02:00
|
|
|
offnum <= maxoff;
|
|
|
|
offnum = OffsetNumberNext(offnum))
|
|
|
|
{
|
|
|
|
item = (BTItem) PageGetItem(page, PageGetItemId(page, offnum));
|
2003-02-22 01:45:05 +01:00
|
|
|
if (BTTidSame(item->bti_itup.t_tid, *target))
|
1998-07-30 07:05:05 +02:00
|
|
|
{
|
2002-10-20 22:47:31 +02:00
|
|
|
/* Found it */
|
1998-07-30 07:05:05 +02:00
|
|
|
current->ip_posid = offnum;
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2000-07-21 08:42:39 +02:00
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* The item we're looking for moved right at least one page, so move
|
|
|
|
* right. We are careful here to pin and read-lock the next non-dead
|
|
|
|
* page before releasing the current one. This ensures that a
|
|
|
|
* concurrent btbulkdelete scan cannot pass our position --- if it
|
|
|
|
* did, it might be able to reach and delete our target item before we
|
|
|
|
* can find it again.
|
2000-07-21 08:42:39 +02:00
|
|
|
*/
|
1998-07-30 07:05:05 +02:00
|
|
|
if (P_RIGHTMOST(opaque))
|
2003-07-21 22:29:40 +02:00
|
|
|
elog(ERROR, "failed to re-find previous key in \"%s\"",
|
|
|
|
RelationGetRelationName(rel));
|
2003-02-22 01:45:05 +01:00
|
|
|
/* Advance to next non-dead page --- there must be one */
|
|
|
|
nextbuf = InvalidBuffer;
|
|
|
|
for (;;)
|
|
|
|
{
|
|
|
|
blkno = opaque->btpo_next;
|
2004-04-21 20:24:26 +02:00
|
|
|
nextbuf = _bt_relandgetbuf(rel, nextbuf, blkno, BT_READ);
|
2003-02-22 01:45:05 +01:00
|
|
|
page = BufferGetPage(nextbuf);
|
|
|
|
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
|
|
|
|
if (!P_IGNORE(opaque))
|
|
|
|
break;
|
|
|
|
if (P_RIGHTMOST(opaque))
|
2003-07-21 22:29:40 +02:00
|
|
|
elog(ERROR, "fell off the end of \"%s\"",
|
2003-02-22 01:45:05 +01:00
|
|
|
RelationGetRelationName(rel));
|
|
|
|
}
|
2001-07-16 00:48:19 +02:00
|
|
|
_bt_relbuf(rel, buf);
|
2002-10-20 22:47:31 +02:00
|
|
|
so->btso_curbuf = buf = nextbuf;
|
1998-07-30 07:05:05 +02:00
|
|
|
maxoff = PageGetMaxOffsetNumber(page);
|
2000-07-21 08:42:39 +02:00
|
|
|
offnum = P_FIRSTDATAKEY(opaque);
|
|
|
|
ItemPointerSet(current, blkno, offnum);
|
1998-07-30 07:05:05 +02:00
|
|
|
}
|
|
|
|
}
|
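/*
 * Illustrative sketch only, not part of nbtree.c.  It models the search
 * strategy of _bt_restscan on a simplified in-memory structure: look for
 * the remembered heap TID on the current page starting at the old offset,
 * and if it is not there, step right across sibling links, skipping pages
 * marked dead.  SketchPage and sketch_refind are hypothetical names, heap
 * TIDs are modelled as plain ints, and buffer pins and locks are not
 * modelled at all.  Where the real code raises an error on reaching the
 * rightmost page, this sketch simply returns 0.
 */
#include <assert.h>
#include <stddef.h>

typedef struct SketchPage
{
    const int  *tids;           /* heap TIDs stored on this page */
    int         ntids;          /* number of entries in tids */
    int         dead;           /* nonzero: skip this page (cf. P_IGNORE) */
    const struct SketchPage *right; /* right sibling, NULL if rightmost */
} SketchPage;

/*
 * Returns 1 and fills *page_out and *off_out when the target is found;
 * returns 0 if the search runs off the right end of the chain.
 */
static int
sketch_refind(const SketchPage *start, int start_off, int target,
              const SketchPage **page_out, int *off_out)
{
    const SketchPage *page = start;
    int         off = start_off;

    assert(start != NULL && page_out != NULL && off_out != NULL);

    for (;;)
    {
        /* Check the remainder of the current page. */
        for (; off < page->ntids; off++)
        {
            if (page->tids[off] == target)
            {
                *page_out = page;
                *off_out = off;
                return 1;
            }
        }

        /* Not here: advance to the next page that is not dead. */
        do
        {
            page = page->right;
            if (page == NULL)
                return 0;
        } while (page->dead);
        off = 0;
    }
}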