postgresql/src/backend/access/gist/gistsplit.c

/*-------------------------------------------------------------------------
*
* gistsplit.c
* Multi-column page splitting algorithm
*
* This file is concerned with making good page-split decisions in multi-column
* GiST indexes. The opclass-specific picksplit functions can only be expected
* to produce answers based on a single column. We first run the picksplit
* function for column 1; then, if there are more columns, we check if any of
* the tuples are "don't cares" so far as the column 1 split is concerned
* (that is, they could go to either side for no additional penalty). If so,
* we try to redistribute those tuples on the basis of the next column.
* Repeat till we're out of columns.
*
* gistSplitByKey() is the entry point to this file.
*
*
* Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* src/backend/access/gist/gistsplit.c
*
*-------------------------------------------------------------------------
*/
#include "postgres.h"
#include "access/gist_private.h"
#include "utils/rel.h"
typedef struct
{
OffsetNumber *entries;
int len;
Datum *attr;
bool *isnull;
bool *dontcare;
} GistSplitUnion;
/*
* Form unions of subkeys in itvec[] entries listed in gsvp->entries[],
* ignoring any tuples that are marked in gsvp->dontcare[]. Subroutine for
* gistunionsubkey.
*/
static void
gistunionsubkeyvec(GISTSTATE *giststate, IndexTuple *itvec,
GistSplitUnion *gsvp)
{
IndexTuple *cleanedItVec;
int i,
cleanedLen = 0;
cleanedItVec = (IndexTuple *) palloc(sizeof(IndexTuple) * gsvp->len);
for (i = 0; i < gsvp->len; i++)
{
if (gsvp->dontcare && gsvp->dontcare[gsvp->entries[i]])
continue;
cleanedItVec[cleanedLen++] = itvec[gsvp->entries[i] - 1];
}
gistMakeUnionItVec(giststate, cleanedItVec, cleanedLen,
gsvp->attr, gsvp->isnull);
pfree(cleanedItVec);
}
/*
* Recompute unions of left- and right-side subkeys after a page split,
* ignoring any tuples that are marked in spl->spl_dontcare[].
*
* Note: we always recompute union keys for all index columns. In some cases
* this might represent duplicate work for the leftmost column(s), but it's
* not safe to assume that "zero penalty to move a tuple" means "the union
* key doesn't change at all". Penalty functions aren't 100% accurate.
*/
static void
gistunionsubkey(GISTSTATE *giststate, IndexTuple *itvec, GistSplitVector *spl)
{
GistSplitUnion gsvp;
gsvp.dontcare = spl->spl_dontcare;
gsvp.entries = spl->splitVector.spl_left;
gsvp.len = spl->splitVector.spl_nleft;
gsvp.attr = spl->spl_lattr;
gsvp.isnull = spl->spl_lisnull;
gistunionsubkeyvec(giststate, itvec, &gsvp);
gsvp.entries = spl->splitVector.spl_right;
gsvp.len = spl->splitVector.spl_nright;
gsvp.attr = spl->spl_rattr;
gsvp.isnull = spl->spl_risnull;
gistunionsubkeyvec(giststate, itvec, &gsvp);
}
/*
* Find tuples that are "don't cares", that is, tuples that could be moved to
* the other side of the split with zero penalty, so far as the attno column
* is concerned.
*
* Don't-care tuples are marked by setting the corresponding entry in
* spl->spl_dontcare[] to "true". Caller must have initialized that array
* to zeroes.
*
* Returns number of don't-cares found.
*/
static int
findDontCares(Relation r, GISTSTATE *giststate, GISTENTRY *valvec,
GistSplitVector *spl, int attno)
{
int i;
GISTENTRY entry;
int NumDontCare = 0;
/*
* First, search the left-side tuples to see if any have zero penalty to
* be added to the right-side union key.
*
* attno column is known all-not-null (see gistSplitByKey), so we need not
* check for nulls
*/
gistentryinit(entry, spl->splitVector.spl_rdatum, r, NULL,
(OffsetNumber) 0, FALSE);
for (i = 0; i < spl->splitVector.spl_nleft; i++)
{
int j = spl->splitVector.spl_left[i];
float penalty = gistpenalty(giststate, attno, &entry, false,
&valvec[j], false);
if (penalty == 0.0)
{
spl->spl_dontcare[j] = true;
NumDontCare++;
}
}
/* And conversely for the right-side tuples */
gistentryinit(entry, spl->splitVector.spl_ldatum, r, NULL,
(OffsetNumber) 0, FALSE);
for (i = 0; i < spl->splitVector.spl_nright; i++)
{
int j = spl->splitVector.spl_right[i];
float penalty = gistpenalty(giststate, attno, &entry, false,
&valvec[j], false);
if (penalty == 0.0)
{
spl->spl_dontcare[j] = true;
NumDontCare++;
}
}
return NumDontCare;
}
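/*
 * Illustrative sketch, not part of the PostgreSQL source: the don't-care test
 * above, restated with a toy one-dimensional key. The Interval type,
 * toy_penalty() and toy_find_dont_cares() are hypothetical names, and the
 * "growth of the interval" penalty is an assumption standing in for a real
 * opclass penalty function. A key is a don't-care when adding it to the
 * other side's union key costs nothing.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical 1-D key: a closed interval [lo, hi]. */
typedef struct
{
	double		lo;
	double		hi;
} Interval;

/* Toy penalty (assumption): how far 'u' must grow to cover 'k'. */
static double
toy_penalty(Interval u, Interval k)
{
	double		grow = 0.0;

	if (k.lo < u.lo)
		grow += u.lo - k.lo;
	if (k.hi > u.hi)
		grow += k.hi - u.hi;
	return grow;
}

/*
 * Mark keys[side[i]] in dontcare[] when merging it into the other side's
 * union has zero penalty. Caller must have zeroed dontcare[]. Returns the
 * number of entries marked.
 */
static int
toy_find_dont_cares(const Interval *keys, const int *side, int nside,
					Interval other_union, bool *dontcare)
{
	int			found = 0;

	assert(keys != NULL && side != NULL && dontcare != NULL);

	for (int i = 0; i < nside; i++)
	{
		int			j = side[i];

		if (toy_penalty(other_union, keys[j]) == 0.0)
		{
			dontcare[j] = true;
			found++;
		}
	}
	return found;
}
```

/*
 * As in findDontCares(), the sketch would be called once per side: left-side
 * keys are tested against the right union, then right-side keys against the
 * left union.
 */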
/*
* Remove tuples that are marked don't-cares from the tuple index array a[]
* of length *len. This is applied separately to the spl_left and spl_right
* arrays.
*
* Corner case: we do not wish to reduce the index array to zero length.
* (If we did, then the union key for this side would be null, and having just
* one of spl_ldatum_exists and spl_rdatum_exists be TRUE might confuse
* user-defined PickSplit methods.) To avoid that, we'll forcibly redefine
* one tuple as non-don't-care if necessary. Hence, we must be able to adjust
* caller's NumDontCare count.
*/
static void
removeDontCares(OffsetNumber *a, int *len, bool *dontcare, int *NumDontCare)
{
int origlen,
curlen,
i;
OffsetNumber *curwpos;
origlen = curlen = *len;
curwpos = a;
for (i = 0; i < origlen; i++)
{
OffsetNumber ai = a[i];
if (dontcare[ai] == FALSE)
{
/* re-emit item into a[] */
*curwpos = ai;
curwpos++;
}
else if (curlen == 1)
{
/* corner case: don't let array become empty */
dontcare[ai] = FALSE; /* mark item as non-dont-care */
*NumDontCare -= 1;
i--; /* reprocess item on next iteration */
}
else
curlen--;
}
*len = curlen;
}
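/*
 * Illustrative sketch, not part of the PostgreSQL source: the in-place
 * compaction above, written against plain C types so the corner case can be
 * exercised in isolation. toy_remove_dont_cares() is a hypothetical name and
 * unsigned short is an assumed stand-in for OffsetNumber. dontcare[] is
 * indexed by offset value, so it must be at least (largest offset + 1) long.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Drop entries of a[] whose dontcare[] flag is set, but never let the array
 * become empty: the last remaining entry is forcibly un-marked and kept,
 * and the caller's don't-care count is reduced to match.
 */
static void
toy_remove_dont_cares(unsigned short *a, int *len, bool *dontcare,
					  int *num_dont_care)
{
	int			origlen;
	int			curlen;
	unsigned short *wpos = a;	/* next write position, never ahead of a[i] */

	assert(a != NULL && len != NULL && dontcare != NULL);
	assert(num_dont_care != NULL);

	origlen = curlen = *len;
	for (int i = 0; i < origlen; i++)
	{
		unsigned short ai = a[i];

		if (!dontcare[ai])
			*wpos++ = ai;		/* keep the item */
		else if (curlen == 1)
		{
			/* corner case: this is the only item left, so keep it */
			dontcare[ai] = false;
			*num_dont_care -= 1;
			i--;				/* reprocess the same item as a keeper */
		}
		else
			curlen--;			/* drop the item */
	}
	*len = curlen;
}
```

/*
 * With a[] = {1, 2, 3} and all three offsets marked, the first two entries
 * are dropped and offset 3 is un-marked and kept, so the side ends up with
 * exactly one tuple and the don't-care count falls from 3 to 2.
 */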
/*
* Place a single don't-care tuple into either the left or right side of the
* split, according to which has least penalty for merging the tuple into
* the previously-computed union keys. We need consider only columns starting
* at attno.
*/
static void
placeOne(Relation r, GISTSTATE *giststate, GistSplitVector *v,
IndexTuple itup, OffsetNumber off, int attno)
{
GISTENTRY identry[INDEX_MAX_KEYS];
bool isnull[INDEX_MAX_KEYS];
bool toLeft = true;
gistDeCompressAtt(giststate, r, itup, NULL, (OffsetNumber) 0,
identry, isnull);
for (; attno < giststate->tupdesc->natts; attno++)
{
float lpenalty,
rpenalty;
GISTENTRY entry;
gistentryinit(entry, v->spl_lattr[attno], r, NULL, 0, FALSE);
lpenalty = gistpenalty(giststate, attno, &entry, v->spl_lisnull[attno],
identry + attno, isnull[attno]);
gistentryinit(entry, v->spl_rattr[attno], r, NULL, 0, FALSE);
rpenalty = gistpenalty(giststate, attno, &entry, v->spl_risnull[attno],
identry + attno, isnull[attno]);
if (lpenalty != rpenalty)
{
if (lpenalty > rpenalty)
toLeft = false;
break;
}
}
if (toLeft)
v->splitVector.spl_left[v->splitVector.spl_nleft++] = off;
else
v->splitVector.spl_right[v->splitVector.spl_nright++] = off;
}
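/*
 * Illustrative sketch, not part of the PostgreSQL source: the side-selection
 * rule of placeOne(), reduced to precomputed per-column penalties. The name
 * toy_place_left() and the penalty arrays are hypothetical; in the real
 * function each penalty comes from gistpenalty() against the left or right
 * union key. The first column, starting at attno, whose penalties differ
 * decides the side; if every column ties, the tuple goes left.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if the tuple should be placed on the left side.
 * lpenalty[c] / rpenalty[c] are the costs of merging the tuple's column c
 * into the left / right union key.
 */
static bool
toy_place_left(const double *lpenalty, const double *rpenalty,
			   int ncols, int attno)
{
	assert(lpenalty != NULL && rpenalty != NULL);
	assert(attno >= 0 && attno <= ncols);

	for (; attno < ncols; attno++)
	{
		if (lpenalty[attno] != rpenalty[attno])
			return lpenalty[attno] < rpenalty[attno];
	}
	return true;				/* all columns tie: default to the left */
}
```

/*
 * For penalties left = {0, 2, 1} and right = {0, 1, 5}, starting at column 0
 * the tie on column 0 is skipped and column 1 sends the tuple right;
 * starting at column 2 the tuple goes left.
 */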
#define SWAPVAR( s, d, t ) \
do { \
(t) = (s); \
(s) = (d); \
(d) = (t); \
} while(0)
/*
* Clean up when we did a secondary split but the user-defined PickSplit
* method didn't support it (leaving spl_ldatum_exists or spl_rdatum_exists
* true).
*
* We consider whether to swap the left and right outputs of the secondary
* split; this can be worthwhile if the penalty for merging those tuples into
* the previously chosen sets is less that way.
*
* In any case we must update the union datums for the current column by
* adding in the previous union keys (oldL/oldR), since the user-defined
* PickSplit method didn't do so.
*/
static void
supportSecondarySplit(Relation r, GISTSTATE *giststate, int attno,
GIST_SPLITVEC *sv, Datum oldL, Datum oldR)
{
bool leaveOnLeft = true,
tmpBool;
GISTENTRY entryL,
entryR,
entrySL,
entrySR;
gistentryinit(entryL, oldL, r, NULL, 0, FALSE);
gistentryinit(entryR, oldR, r, NULL, 0, FALSE);
gistentryinit(entrySL, sv->spl_ldatum, r, NULL, 0, FALSE);
gistentryinit(entrySR, sv->spl_rdatum, r, NULL, 0, FALSE);
if (sv->spl_ldatum_exists && sv->spl_rdatum_exists)
{
float penalty1,
penalty2;
penalty1 = gistpenalty(giststate, attno, &entryL, false, &entrySL, false) +
gistpenalty(giststate, attno, &entryR, false, &entrySR, false);
penalty2 = gistpenalty(giststate, attno, &entryL, false, &entrySR, false) +
gistpenalty(giststate, attno, &entryR, false, &entrySL, false);
if (penalty1 > penalty2)
leaveOnLeft = false;
}
else
{
GISTENTRY *entry1 = (sv->spl_ldatum_exists) ? &entryL : &entryR;
float penalty1,
penalty2;
/*
* there is only one previously defined union, so we just choose swap
* or not by lowest penalty
*/
penalty1 = gistpenalty(giststate, attno, entry1, false, &entrySL, false);
penalty2 = gistpenalty(giststate, attno, entry1, false, &entrySR, false);
if (penalty1 < penalty2)
leaveOnLeft = (sv->spl_ldatum_exists) ? true : false;
else
leaveOnLeft = (sv->spl_rdatum_exists) ? true : false;
}
if (leaveOnLeft == false)
{
/*
* swap left and right
*/
OffsetNumber *off,
noff;
Datum datum;
SWAPVAR(sv->spl_left, sv->spl_right, off);
SWAPVAR(sv->spl_nleft, sv->spl_nright, noff);
SWAPVAR(sv->spl_ldatum, sv->spl_rdatum, datum);
gistentryinit(entrySL, sv->spl_ldatum, r, NULL, 0, FALSE);
gistentryinit(entrySR, sv->spl_rdatum, r, NULL, 0, FALSE);
}
if (sv->spl_ldatum_exists)
gistMakeUnionKey(giststate, attno, &entryL, false, &entrySL, false,
&sv->spl_ldatum, &tmpBool);
if (sv->spl_rdatum_exists)
gistMakeUnionKey(giststate, attno, &entryR, false, &entrySR, false,
&sv->spl_rdatum, &tmpBool);
sv->spl_ldatum_exists = sv->spl_rdatum_exists = false;
}
/*
* Trivial picksplit implementation, called only when the user-defined
* picksplit puts all keys on the same side of the split. That is a bug in
* the user-defined picksplit, but we don't want to fail.
*/
static void
genericPickSplit(GISTSTATE *giststate, GistEntryVector *entryvec, GIST_SPLITVEC *v, int attno)
{
OffsetNumber i,
maxoff;
int nbytes;
GistEntryVector *evec;
maxoff = entryvec->n - 1;
nbytes = (maxoff + 2) * sizeof(OffsetNumber);
v->spl_left = (OffsetNumber *) palloc(nbytes);
v->spl_right = (OffsetNumber *) palloc(nbytes);
v->spl_nleft = v->spl_nright = 0;
for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
{
if (i <= (maxoff - FirstOffsetNumber + 1) / 2)
{
v->spl_left[v->spl_nleft] = i;
v->spl_nleft++;
}
else
{
v->spl_right[v->spl_nright] = i;
v->spl_nright++;
}
}
/*
* Form union datums for each side
*/
evec = palloc(sizeof(GISTENTRY) * entryvec->n + GEVHDRSZ);
evec->n = v->spl_nleft;
memcpy(evec->vector, entryvec->vector + FirstOffsetNumber,
sizeof(GISTENTRY) * evec->n);
v->spl_ldatum = FunctionCall2Coll(&giststate->unionFn[attno],
giststate->supportCollation[attno],
PointerGetDatum(evec),
PointerGetDatum(&nbytes));
evec->n = v->spl_nright;
memcpy(evec->vector, entryvec->vector + FirstOffsetNumber + v->spl_nleft,
sizeof(GISTENTRY) * evec->n);
v->spl_rdatum = FunctionCall2Coll(&giststate->unionFn[attno],
giststate->supportCollation[attno],
PointerGetDatum(evec),
PointerGetDatum(&nbytes));
}
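/*
 * Illustrative sketch, not part of the PostgreSQL source: the fallback
 * assignment used by genericPickSplit(), without the union-key computation.
 * toy_half_split() is a hypothetical name, and plain int is an assumed
 * stand-in for OffsetNumber. Offsets run from 1 to maxoff; the lower half
 * goes left and the rest goes right, so an odd count leaves the extra tuple
 * on the right.
 */

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assign offsets 1..maxoff to left[] and right[], which must each have room
 * for maxoff entries. Counts are returned through nleft and nright.
 */
static void
toy_half_split(int maxoff, int *left, int *nleft, int *right, int *nright)
{
	assert(maxoff >= 2);
	assert(left != NULL && right != NULL);
	assert(nleft != NULL && nright != NULL);

	*nleft = *nright = 0;
	for (int i = 1; i <= maxoff; i++)
	{
		if (i <= maxoff / 2)
			left[(*nleft)++] = i;
		else
			right[(*nright)++] = i;
	}
}
```

/*
 * Because the two halves are contiguous runs of offsets, genericPickSplit()
 * can build each side's entry vector with a single memcpy() from entryvec.
 */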
/*
* Calls user picksplit method for attno column to split tuples into
* two vectors.
*
* Returns FALSE if split is complete (there are no more index columns, or
* there is no need to consider them). Note that in this case the union
* keys for all columns must be computed here.
* Returns TRUE and v->spl_dontcare = NULL if left and right unions of attno
* column are the same, so we should split on next column instead.
* Returns TRUE and v->spl_dontcare != NULL if there are don't-care tuples
* that could be relocated based on the next column(s). The don't-care
* tuples have been removed from the split and must be reinserted by caller.
*/
static bool
gistUserPicksplit(Relation r, GistEntryVector *entryvec, int attno, GistSplitVector *v,
IndexTuple *itup, int len, GISTSTATE *giststate)
{
GIST_SPLITVEC *sv = &v->splitVector;
/*
* Prepare spl_ldatum/spl_rdatum/spl_ldatum_exists/spl_rdatum_exists in
* case we are doing a secondary split (see comments in gist.h).
*/
sv->spl_ldatum_exists = (v->spl_lisnull[attno]) ? false : true;
sv->spl_rdatum_exists = (v->spl_risnull[attno]) ? false : true;
sv->spl_ldatum = v->spl_lattr[attno];
sv->spl_rdatum = v->spl_rattr[attno];
/*
* Let the opclass-specific PickSplit method do its thing. Note that at
* this point we know there are no null keys in the entryvec.
*/
FunctionCall2Coll(&giststate->picksplitFn[attno],
giststate->supportCollation[attno],
PointerGetDatum(entryvec),
PointerGetDatum(sv));
if (sv->spl_nleft == 0 || sv->spl_nright == 0)
{
/*
* User-defined picksplit failed to create an actual split, ie it put
* everything on the same side. Complain but cope.
*/
ereport(DEBUG1,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg("picksplit method for column %d of index \"%s\" failed",
attno + 1, RelationGetRelationName(r)),
errhint("The index is not optimal. To optimize it, contact a developer, or try to use the column as the second one in the CREATE INDEX command.")));
/*
* Reinit GIST_SPLITVEC. Although these fields are not used by
* genericPickSplit(), set them up for further processing
*/
sv->spl_ldatum_exists = (v->spl_lisnull[attno]) ? false : true;
sv->spl_rdatum_exists = (v->spl_risnull[attno]) ? false : true;
sv->spl_ldatum = v->spl_lattr[attno];
sv->spl_rdatum = v->spl_rattr[attno];
/* Do a generic split */
genericPickSplit(giststate, entryvec, sv, attno);
/* Clean up if we're in a secondary split */
if (sv->spl_ldatum_exists || sv->spl_rdatum_exists)
supportSecondarySplit(r, giststate, attno, sv,
v->spl_lattr[attno], v->spl_rattr[attno]);
}
else
{
/* hack for compatibility with old picksplit API */
if (sv->spl_left[sv->spl_nleft - 1] == InvalidOffsetNumber)
sv->spl_left[sv->spl_nleft - 1] = (OffsetNumber) (entryvec->n - 1);
if (sv->spl_right[sv->spl_nright - 1] == InvalidOffsetNumber)
sv->spl_right[sv->spl_nright - 1] = (OffsetNumber) (entryvec->n - 1);
/* Clean up if we're in a secondary split */
if (sv->spl_ldatum_exists || sv->spl_rdatum_exists)
{
elog(DEBUG1, "picksplit method for column %d of index \"%s\" doesn't support secondary split",
attno + 1, RelationGetRelationName(r));
supportSecondarySplit(r, giststate, attno, sv,
v->spl_lattr[attno], v->spl_rattr[attno]);
}
}
/* emit union datums computed by PickSplit back to v arrays */
v->spl_lattr[attno] = sv->spl_ldatum;
v->spl_rattr[attno] = sv->spl_rdatum;
v->spl_lisnull[attno] = false;
v->spl_risnull[attno] = false;
/*
* If index columns remain, then consider whether we can improve the split
* by using them. Even if we can't, we must compute union keys for those
* columns before we can return FALSE.
*/
v->spl_dontcare = NULL;
if (attno + 1 < giststate->tupdesc->natts)
{
int NumDontCare;
if (gistKeyIsEQ(giststate, attno, sv->spl_ldatum, sv->spl_rdatum))
{
/*
* Left and right union keys are equal, so we can get better split
* by considering next column.
*/
return true;
}
/*
* Locate don't-care tuples, if any
*/
v->spl_dontcare = (bool *) palloc0(sizeof(bool) * (entryvec->n + 1));
NumDontCare = findDontCares(r, giststate, entryvec->vector, v, attno);
if (NumDontCare == 0)
{
/*
* There are no don't-cares, so just compute the union keys for
* remaining columns and we're done.
*/
gistunionsubkey(giststate, itup, v);
}
else
{
/*
* Remove don't-cares from spl_left[] and spl_right[]. NOTE: this
* could reduce NumDontCare to zero.
*/
removeDontCares(sv->spl_left, &sv->spl_nleft,
v->spl_dontcare, &NumDontCare);
removeDontCares(sv->spl_right, &sv->spl_nright,
v->spl_dontcare, &NumDontCare);
/*
* Recompute union keys, considering only non-don't-care tuples.
* NOTE: this will set union keys for remaining index columns,
* which will cause later calls of gistUserPicksplit to pass those
* values down to user-defined PickSplit methods with
* spl_ldatum_exists/spl_rdatum_exists set true.
*/
gistunionsubkey(giststate, itup, v);
if (NumDontCare == 1)
{
/*
* If there's only one don't-care tuple then we can't do a
* PickSplit on it, so just choose whether to send it left or
* right by comparing penalties.
*/
OffsetNumber toMove;
/* find it ... */
for (toMove = FirstOffsetNumber; toMove < entryvec->n; toMove++)
{
if (v->spl_dontcare[toMove])
break;
}
Assert(toMove < entryvec->n);
/* ... and assign it to cheaper side */
placeOne(r, giststate, v, itup[toMove - 1], toMove, attno + 1);
/* recompute the union keys including this tuple */
v->spl_dontcare = NULL;
gistunionsubkey(giststate, itup, v);
}
else if (NumDontCare > 1)
return true;
/* else NumDontCare is now zero; handle same as above */
}
}
return false;
}
/*
* simply split page in half
*/
static void
gistSplitHalf(GIST_SPLITVEC *v, int len)
{
int i;
v->spl_nright = v->spl_nleft = 0;
v->spl_left = (OffsetNumber *) palloc(len * sizeof(OffsetNumber));
v->spl_right = (OffsetNumber *) palloc(len * sizeof(OffsetNumber));
for (i = 1; i <= len; i++)
if (i < len / 2)
v->spl_right[v->spl_nright++] = i;
else
v->spl_left[v->spl_nleft++] = i;
/* we need not compute union keys, caller took care of it */
}
/*
* gistSplitByKey: main entry point for page-splitting algorithm
*
* r: index relation
* page: page being split
* itup: array of IndexTuples to be processed
* len: number of IndexTuples to be processed (must be at least 2)
* giststate: additional info about index
* v: working state and output area
* attno: column we are working on (zero-based index)
*
* Outside caller must initialize v->spl_lisnull and v->spl_risnull arrays
* to all-TRUE. On return, spl_left/spl_nleft contain indexes of tuples
* to go left, spl_right/spl_nright contain indexes of tuples to go right,
* spl_lattr/spl_lisnull contain left-side union key values, and
* spl_rattr/spl_risnull contain right-side union key values. Other fields
* in this struct are workspace for this file.
*
* Outside caller must pass zero for attno. The function may internally
* recurse to the next column by passing attno+1.
*/
void
gistSplitByKey(Relation r, Page page, IndexTuple *itup, int len,
GISTSTATE *giststate, GistSplitVector *v, int attno)
{
GistEntryVector *entryvec;
OffsetNumber *offNullTuples;
int nOffNullTuples = 0;
int i;
/* generate the item array, and identify tuples with null keys */
/* note that entryvec->vector[0] goes unused in this code */
entryvec = palloc(GEVHDRSZ + (len + 1) * sizeof(GISTENTRY));
entryvec->n = len + 1;
offNullTuples = (OffsetNumber *) palloc(len * sizeof(OffsetNumber));
for (i = 1; i <= len; i++)
{
Datum datum;
bool IsNull;
datum = index_getattr(itup[i - 1], attno + 1, giststate->tupdesc,
&IsNull);
gistdentryinit(giststate, attno, &(entryvec->vector[i]),
datum, r, page, i,
FALSE, IsNull);
if (IsNull)
offNullTuples[nOffNullTuples++] = i;
}
if (nOffNullTuples == len)
{
/*
* Corner case: All keys in attno column are null, so just transfer
* our attention to the next column. If there's no next column, just
* split page in half.
*/
v->spl_risnull[attno] = v->spl_lisnull[attno] = TRUE;
if (attno + 1 < r->rd_att->natts)
gistSplitByKey(r, page, itup, len, giststate, v, attno + 1);
else
gistSplitHalf(&v->splitVector, len);
}
else if (nOffNullTuples > 0)
{
int j = 0;
/*
* We don't want to mix NULL and not-NULL keys on one page, so split
* nulls to right page and not-nulls to left.
*/
v->splitVector.spl_right = offNullTuples;
v->splitVector.spl_nright = nOffNullTuples;
v->spl_risnull[attno] = TRUE;
v->splitVector.spl_left = (OffsetNumber *) palloc(len * sizeof(OffsetNumber));
v->splitVector.spl_nleft = 0;
for (i = 1; i <= len; i++)
if (j < v->splitVector.spl_nright && offNullTuples[j] == i)
j++;
else
v->splitVector.spl_left[v->splitVector.spl_nleft++] = i;
/* Must compute union keys for this and any following columns */
v->spl_dontcare = NULL;
gistunionsubkey(giststate, itup, v);
}
else
{
/*
* all keys are not-null, so apply user-defined PickSplit method
*/
if (gistUserPicksplit(r, entryvec, attno, v, itup, len, giststate))
{
/*
* Splitting on attno column is not optimal, so consider
* redistributing don't-care tuples according to the next column
*/
Assert(attno + 1 < r->rd_att->natts);
if (v->spl_dontcare == NULL)
{
/*
* Simple case: left and right keys for attno column are
* equal, so just split according to the next column.
*/
gistSplitByKey(r, page, itup, len, giststate, v, attno + 1);
}
else
{
/*
* Form an array of just the don't-care tuples to pass to a
* recursive invocation of this function for the next column.
*/
IndexTuple *newitup = (IndexTuple *) palloc(len * sizeof(IndexTuple));
OffsetNumber *map = (OffsetNumber *) palloc(len * sizeof(OffsetNumber));
int newlen = 0;
GIST_SPLITVEC backupSplit;
for (i = 0; i < len; i++)
{
if (v->spl_dontcare[i + 1])
{
newitup[newlen] = itup[i];
map[newlen] = i + 1;
newlen++;
}
}
Assert(newlen > 0);
/*
* Make a backup copy of v->splitVector, since the recursive
* call will overwrite that with its own result.
*/
backupSplit = v->splitVector;
backupSplit.spl_left = (OffsetNumber *) palloc(sizeof(OffsetNumber) * len);
memcpy(backupSplit.spl_left, v->splitVector.spl_left, sizeof(OffsetNumber) * v->splitVector.spl_nleft);
backupSplit.spl_right = (OffsetNumber *) palloc(sizeof(OffsetNumber) * len);
memcpy(backupSplit.spl_right, v->splitVector.spl_right, sizeof(OffsetNumber) * v->splitVector.spl_nright);
/* Recursively decide how to split the don't-care tuples */
gistSplitByKey(r, page, newitup, newlen, giststate, v, attno + 1);
/* Merge result of subsplit with non-don't-care tuples */
for (i = 0; i < v->splitVector.spl_nleft; i++)
backupSplit.spl_left[backupSplit.spl_nleft++] = map[v->splitVector.spl_left[i] - 1];
for (i = 0; i < v->splitVector.spl_nright; i++)
backupSplit.spl_right[backupSplit.spl_nright++] = map[v->splitVector.spl_right[i] - 1];
v->splitVector = backupSplit;
/* recompute left and right union datums */
gistunionsubkey(giststate, itup, v);
}
}
}
}