Add some more defenses against silly estimates to gincostestimate().

A report from Andy Colson showed that gincostestimate() was not being
nearly paranoid enough about whether to believe the statistics it finds in
the index metapage.  The problem is that the metapage stats (other than the
pending-pages count) are only updated by VACUUM, and in the worst case
could still reflect the index's original empty state even when it has grown
to many entries.  We attempted to deal with that by scaling up the stats to
match the current index size, but if nEntries is zero then scaling it up
still gives zero.  Moreover, the proportion of pages that are entry pages
vs. data pages vs. pending pages is unlikely to be estimated very well by
scaling if the index is now orders of magnitude larger than before.
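
As a concrete illustration (hypothetical numbers, with stand-alone variable
names standing in for the metapage fields), the pre-patch scaling approach
can never recover from a stale zero:

#include <math.h>
#include <stdio.h>

int
main(void)
{
	/* Stale metapage stats recorded back when the index was empty */
	double		statsTotalPages = 10;
	double		statsEntries = 0;
	double		numPages = 40000;	/* current, accurate index size */

	/* Pre-patch approach: scale the stale stats up to the current size */
	double		scale = numPages / statsTotalPages;
	double		numEntries = ceil(statsEntries * scale);

	/* prints "scale = 4000, numEntries = 0": scaling zero is still zero */
	printf("scale = %.0f, numEntries = %.0f\n", scale, numEntries);
	return 0;
}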

We can improve matters by expanding the use of the rule-of-thumb estimates
I introduced in commit 7fb008c5ee: if the index has grown by more
than a cutoff amount (here set at 4X growth) since VACUUM, then use the
rule-of-thumb numbers instead of scaling.  This might not be exactly right
but it seems much less likely to produce insane estimates.

I also improved both the scaling estimate and the rule-of-thumb estimate
to account for numPendingPages, since it's reasonable to expect that value
to be accurate in any case, and pages sitting in the pending list are
certainly neither entry pages nor data pages.
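
To make the overall flow of the new logic easier to follow than the diff
below, here is a condensed, stand-alone sketch: the stats* variables are
hypothetical stand-ins for the ginStats fields, and Min/Max are redefined
locally so the example compiles outside the backend.

#include <math.h>
#include <stdio.h>

#define Min(x, y)	((x) < (y) ? (x) : (y))
#define Max(x, y)	((x) > (y) ? (x) : (y))

int
main(void)
{
	/* Hypothetical inputs: metapage stats from the last VACUUM ... */
	double		statsTotalPages = 100;
	double		statsEntryPages = 90;
	double		statsDataPages = 9;
	double		statsPendingPages = 1;
	double		statsEntries = 9000;
	/* ... versus the current, accurate index size (10X growth) */
	double		numPages = 1000;

	double		numPendingPages,
				numEntryPages,
				numDataPages,
				numEntries;

	/* Trust the pending-pages count unless it exceeds the index size */
	numPendingPages = (statsPendingPages < numPages) ? statsPendingPages : 0;

	/* Scale the VACUUM-time stats only if the index has grown at most 4X */
	if (numPages > 0 && statsTotalPages <= numPages &&
		statsTotalPages > numPages / 4 &&
		statsEntryPages > 0 && statsEntries > 0)
	{
		double		scale = numPages / statsTotalPages;

		numEntryPages = ceil(statsEntryPages * scale);
		numDataPages = ceil(statsDataPages * scale);
		numEntries = ceil(statsEntries * scale);
		/* ensure we didn't round up too much */
		numEntryPages = Min(numEntryPages, numPages - numPendingPages);
		numDataPages = Min(numDataPages,
						   numPages - numPendingPages - numEntryPages);
	}
	else
	{
		/* Rule of thumb: ~90% entry pages, ~100 entries per entry page */
		numPages = Max(numPages, 10);
		numEntryPages = floor((numPages - numPendingPages) * 0.90);
		numDataPages = numPages - numPendingPages - numEntryPages;
		numEntries = floor(numEntryPages * 100);
	}

	/* With 10X growth the rule-of-thumb branch is taken:
	 * prints "entry=899 data=100 pending=1 entries=89900" */
	printf("entry=%.0f data=%.0f pending=%.0f entries=%.0f\n",
		   numEntryPages, numDataPages, numPendingPages, numEntries);
	return 0;
}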

As a somewhat separate issue, adjust the estimation equations that are
concerned with extra fetches for partial-match searches.  These equations
suppose that a fraction partialEntries / numEntries of the entry and data
pages will be visited as a consequence of a partial-match search.  Now,
it's physically impossible for that fraction to exceed one, but our
estimate of partialEntries is mostly bunk, and our estimate of numEntries
isn't exactly gospel either, so we could arrive at a silly value.  In the
example presented by Andy we were coming out with a value of 100, leading
to insane cost estimates.  Clamp the fraction to one to avoid that.
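
For instance (made-up numbers, with variable names mirroring the patch),
the clamp turns a nonsensical 100X "fraction" back into "all of the entry
pages" at most:

#include <math.h>
#include <stdio.h>

#define Min(x, y)	((x) < (y) ? (x) : (y))

int
main(void)
{
	/* Hypothetical estimates; partialEntries is a rough guess and, as in
	 * the reported case, can greatly exceed numEntries */
	double		partialEntries = 10000.0;
	double		numEntries = 100.0;
	double		numEntryPages = 500.0;

	/* Unclamped, the "fraction" of entry pages visited would be 100 */
	double		partialScale = partialEntries / numEntries;

	/* Clamp to 1.0: we cannot visit more than all of the entry pages */
	partialScale = Min(partialScale, 1.0);

	/* prints "entry pages fetched: 500" rather than 50000 */
	printf("entry pages fetched: %.0f\n",
		   ceil(numEntryPages * partialScale));
	return 0;
}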

Like the previous patch, back-patch to all supported branches; this
problem can be demonstrated in one form or another in all of them.
Tom Lane 2016-01-01 13:42:21 -05:00
parent 2d774aaf18
commit d932391fd8


@@ -7240,6 +7240,7 @@ gincostestimate(PG_FUNCTION_ARGS)
 				numEntries;
 	GinQualCounts counts;
 	bool		matchPossible;
+	double		partialScale;
 	double		entryPagesFetched,
 				dataPagesFetched,
 				dataPagesFetchedBySel;
@@ -7268,42 +7269,59 @@ gincostestimate(PG_FUNCTION_ARGS)
 		memset(&ginStats, 0, sizeof(ginStats));
 	}
 
-	if (ginStats.nTotalPages > 0 && ginStats.nEntryPages > 0 && numPages > 0)
+	/*
+	 * Assuming we got valid (nonzero) stats at all, nPendingPages can be
+	 * trusted, but the other fields are data as of the last VACUUM.  We can
+	 * scale them up to account for growth since then, but that method only
+	 * goes so far; in the worst case, the stats might be for a completely
+	 * empty index, and scaling them will produce pretty bogus numbers.
+	 * Somewhat arbitrarily, set the cutoff for doing scaling at 4X growth; if
+	 * it's grown more than that, fall back to estimating things only from the
+	 * assumed-accurate index size.  But we'll trust nPendingPages in any case
+	 * so long as it's not clearly insane, ie, more than the index size.
+	 */
+	if (ginStats.nPendingPages < numPages)
+		numPendingPages = ginStats.nPendingPages;
+	else
+		numPendingPages = 0;
+
+	if (numPages > 0 && ginStats.nTotalPages <= numPages &&
+		ginStats.nTotalPages > numPages / 4 &&
+		ginStats.nEntryPages > 0 && ginStats.nEntries > 0)
 	{
 		/*
-		 * We got valid stats.  nPendingPages can be trusted, but the other
-		 * fields are data as of the last VACUUM.  Scale them by the ratio
-		 * numPages / nTotalPages to account for growth since then.
+		 * OK, the stats seem close enough to sane to be trusted.  But we
+		 * still need to scale them by the ratio numPages / nTotalPages to
+		 * account for growth since the last VACUUM.
 		 */
 		double		scale = numPages / ginStats.nTotalPages;
 
-		numEntryPages = ginStats.nEntryPages;
-		numDataPages = ginStats.nDataPages;
-		numPendingPages = ginStats.nPendingPages;
-		numEntries = ginStats.nEntries;
-
-		numEntryPages = ceil(numEntryPages * scale);
-		numDataPages = ceil(numDataPages * scale);
-		numEntries = ceil(numEntries * scale);
+		numEntryPages = ceil(ginStats.nEntryPages * scale);
+		numDataPages = ceil(ginStats.nDataPages * scale);
+		numEntries = ceil(ginStats.nEntries * scale);
 		/* ensure we didn't round up too much */
-		numEntryPages = Min(numEntryPages, numPages);
-		numDataPages = Min(numDataPages, numPages - numEntryPages);
+		numEntryPages = Min(numEntryPages, numPages - numPendingPages);
+		numDataPages = Min(numDataPages,
+						   numPages - numPendingPages - numEntryPages);
 	}
 	else
 	{
 		/*
-		 * It's a hypothetical index, or perhaps an index created pre-9.1 and
-		 * never vacuumed since upgrading.  Invent some plausible internal
-		 * statistics based on the index page count.  We estimate that 90% of
-		 * the index is entry pages, and the rest is data pages.  Estimate 100
-		 * entries per entry page; this is rather bogus since it'll depend on
-		 * the size of the keys, but it's more robust than trying to predict
-		 * the number of entries per heap tuple.
+		 * We might get here because it's a hypothetical index, or an index
+		 * created pre-9.1 and never vacuumed since upgrading (in which case
+		 * its stats would read as zeroes), or just because it's grown too
+		 * much since the last VACUUM for us to put our faith in scaling.
+		 *
+		 * Invent some plausible internal statistics based on the index page
+		 * count (and clamp that to at least 10 pages, just in case).  We
+		 * estimate that 90% of the index is entry pages, and the rest is data
+		 * pages.  Estimate 100 entries per entry page; this is rather bogus
+		 * since it'll depend on the size of the keys, but it's more robust
+		 * than trying to predict the number of entries per heap tuple.
 		 */
 		numPages = Max(numPages, 10);
-		numEntryPages = floor(numPages * 0.90);
-		numDataPages = numPages - numEntryPages;
-		numPendingPages = 0;
+		numEntryPages = floor((numPages - numPendingPages) * 0.90);
+		numDataPages = numPages - numPendingPages - numEntryPages;
 		numEntries = floor(numEntryPages * 100);
 	}
 
@@ -7429,16 +7447,21 @@ gincostestimate(PG_FUNCTION_ARGS)
 	/*
 	 * Add an estimate of entry pages read by partial match algorithm.  It's a
 	 * scan over leaf pages in entry tree.  We haven't any useful stats here,
-	 * so estimate it as proportion.
+	 * so estimate it as proportion.  Because counts.partialEntries is really
+	 * pretty bogus (see code above), it's possible that it is more than
+	 * numEntries; clamp the proportion to ensure sanity.
 	 */
-	entryPagesFetched += ceil(numEntryPages * counts.partialEntries / numEntries);
+	partialScale = counts.partialEntries / numEntries;
+	partialScale = Min(partialScale, 1.0);
+
+	entryPagesFetched += ceil(numEntryPages * partialScale);
 
 	/*
 	 * Partial match algorithm reads all data pages before doing actual scan,
-	 * so it's a startup cost. Again, we haven't any useful stats here, so,
-	 * estimate it as proportion
+	 * so it's a startup cost.  Again, we haven't any useful stats here, so
+	 * estimate it as proportion.
 	 */
-	dataPagesFetched = ceil(numDataPages * counts.partialEntries / numEntries);
+	dataPagesFetched = ceil(numDataPages * partialScale);
 
 	/*
 	 * Calculate cache effects if more than one scan due to nestloops or array