Collect and use histograms of lower and upper bounds for range types.
This enables selectivity estimation of the <<, >>, &<, &> and && operators,
as well as the normal inequality operators: <, <=, >=, >. "range @> element"
is also supported, but the range-variant @> and <@ operators are not,
because they cannot be sensibly estimated with lower and upper bound
histograms alone. We would need to make some assumption about the lengths of
the ranges for that. Alexander's patch included a separate histogram of
lengths for that, but I left that out of the patch for simplicity. Hopefully
that will be added as a followup patch.
The fraction of empty ranges is also calculated and used in estimation.
Alexander Korotkov, heavily modified by me.
2012-08-27 14:48:46 +02:00
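To illustrate the histogram construction described above, here is a minimal standalone sketch in C. It is not the PostgreSQL implementation: IntRange, int_cmp and build_bounds_hist are hypothetical names, plain ints stand in for range bounds, and NULL, empty and infinite ranges are left out. It sorts the lower and upper bounds independently, then pairs up evenly spaced entries, stepping through the sorted arrays with separate integral and fractional parts so that no large product is ever formed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a range: plain integer bounds, no infinities. */
typedef struct IntRange
{
    int lower;
    int upper;
} IntRange;

/* qsort comparator for ints, written to avoid overflow from subtraction. */
static int
int_cmp(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/*
 * Fill hist[0..num_hist-1] from nvals sampled bounds.  lowers[] and uppers[]
 * are sorted in place.  Requires 2 <= num_hist <= nvals.
 */
static void
build_bounds_hist(int *lowers, int *uppers, int nvals,
                  IntRange *hist, int num_hist)
{
    int delta;
    int deltafrac;
    int pos = 0;
    int posfrac = 0;
    int i;

    assert(num_hist >= 2 && num_hist <= nvals);

    /* Sort each bound array on its own; the pairing by tuple is dropped. */
    qsort(lowers, nvals, sizeof(int), int_cmp);
    qsort(uppers, nvals, sizeof(int), int_cmp);

    /* Step size (nvals - 1) / (num_hist - 1), split into quotient and remainder. */
    delta = (nvals - 1) / (num_hist - 1);
    deltafrac = (nvals - 1) % (num_hist - 1);

    for (i = 0; i < num_hist; i++)
    {
        hist[i].lower = lowers[pos];
        hist[i].upper = uppers[pos];

        pos += delta;
        posfrac += deltafrac;
        if (posfrac >= num_hist - 1)
        {
            /* accumulated remainder reached a whole step, carry it over */
            pos++;
            posfrac -= num_hist - 1;
        }
    }
}
```

For the five sample ranges [1,10], [2,3], [5,6], [0,4] and [7,8] with num_hist = 3, this yields [0,3], [2,6] and [7,10]. Each entry is a valid range even though its bounds come from different inputs, because the i'th smallest lower bound can never exceed the i'th smallest upper bound.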
/*-------------------------------------------------------------------------
 *
 * rangetypes_typanalyze.c
 *	  Functions for gathering statistics from range columns
 *
 * For a range type column, histograms of lower and upper bounds, and
 * the fraction of NULL and empty ranges are collected.
 *
 * Both histograms have the same length, and they are combined into a
 * single array of ranges. This has the same shape as the histogram that
 * std_typanalyze would collect, but the values are different. Each range
 * in the array is a valid range, even though the lower and upper bounds
 * come from different tuples. In theory, the standard scalar selectivity
 * functions could be used with the combined histogram.
 *
 * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 *
 * IDENTIFICATION
 *	  src/backend/utils/adt/rangetypes_typanalyze.c
 *
 *-------------------------------------------------------------------------
 */
#include "postgres.h"

#include "catalog/pg_operator.h"
#include "commands/vacuum.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/rangetypes.h"

static int	float8_qsort_cmp(const void *a1, const void *a2);
static int	range_bound_qsort_cmp(const void *a1, const void *a2, void *arg);
static void compute_range_stats(VacAttrStats *stats,
Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 21:35:54 +02:00
								AnalyzeAttrFetchFunc fetchfunc, int samplerows, double totalrows);

/*
 * range_typanalyze -- typanalyze function for range columns
 */
Datum
range_typanalyze(PG_FUNCTION_ARGS)
{
	VacAttrStats *stats = (VacAttrStats *) PG_GETARG_POINTER(0);
	TypeCacheEntry *typcache;
	Form_pg_attribute attr = stats->attr;

	/* Get information about range type; note column might be a domain */
	typcache = range_get_typcache(fcinfo, getBaseType(stats->attrtypid));

	if (attr->attstattarget < 0)
		attr->attstattarget = default_statistics_target;

	stats->compute_stats = compute_range_stats;
	stats->extra_data = typcache;
	/* same as in std_typanalyze */
	stats->minrows = 300 * attr->attstattarget;

	PG_RETURN_BOOL(true);
}

/*
 * Comparison function for sorting float8s, used for range lengths.
 */
static int
float8_qsort_cmp(const void *a1, const void *a2)
{
	const float8 *f1 = (const float8 *) a1;
	const float8 *f2 = (const float8 *) a2;

	if (*f1 < *f2)
		return -1;
	else if (*f1 == *f2)
		return 0;
	else
		return 1;
}

/*
 * Comparison function for sorting RangeBounds.
 */
static int
range_bound_qsort_cmp(const void *a1, const void *a2, void *arg)
{
	RangeBound *b1 = (RangeBound *) a1;
	RangeBound *b2 = (RangeBound *) a2;
	TypeCacheEntry *typcache = (TypeCacheEntry *) arg;

	return range_cmp_bounds(typcache, b1, b2);
}

/*
 * compute_range_stats() -- compute statistics for a range column
 */
static void
compute_range_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
					int samplerows, double totalrows)
{
	TypeCacheEntry *typcache = (TypeCacheEntry *) stats->extra_data;
	bool		has_subdiff = OidIsValid(typcache->rng_subdiff_finfo.fn_oid);
	int			null_cnt = 0;
	int			non_null_cnt = 0;
	int			non_empty_cnt = 0;
	int			empty_cnt = 0;
	int			range_no;
	int			slot_idx;
	int			num_bins = stats->attr->attstattarget;
	int			num_hist;
	float8	   *lengths;
	RangeBound *lowers,
			   *uppers;
	double		total_width = 0;

	/* Allocate memory to hold range bounds and lengths of the sample ranges. */
	lowers = (RangeBound *) palloc(sizeof(RangeBound) * samplerows);
	uppers = (RangeBound *) palloc(sizeof(RangeBound) * samplerows);
	lengths = (float8 *) palloc(sizeof(float8) * samplerows);

	/* Loop over the sample ranges. */
	for (range_no = 0; range_no < samplerows; range_no++)
	{
		Datum		value;
		bool		isnull,
					empty;
		RangeType  *range;
		RangeBound	lower,
					upper;
		float8		length;

		vacuum_delay_point();

		value = fetchfunc(stats, range_no, &isnull);
		if (isnull)
		{
			/* range is null, just count that */
			null_cnt++;
			continue;
		}

		/*
		 * XXX: should we ignore wide values, like std_typanalyze does, to
		 * avoid bloating the statistics table?
		 */
		total_width += VARSIZE_ANY(DatumGetPointer(value));

		/* Get range and deserialize it for further analysis. */
		range = DatumGetRangeType(value);
		range_deserialize(typcache, range, &lower, &upper, &empty);

		if (!empty)
		{
			/* Remember bounds and length for further usage in histograms */
			lowers[non_empty_cnt] = lower;
			uppers[non_empty_cnt] = upper;

			if (lower.infinite || upper.infinite)
			{
				/* Length of any kind of an infinite range is infinite */
				length = get_float8_infinity();
			}
			else if (has_subdiff)
			{
				/*
				 * For an ordinary range, use subdiff function between upper
				 * and lower bound values.
				 */
				length = DatumGetFloat8(FunctionCall2Coll(
														  &typcache->rng_subdiff_finfo,
														  typcache->rng_collation,
														  upper.val, lower.val));
			}
			else
			{
				/* Use default value of 1.0 if no subdiff is available. */
				length = 1.0;
			}
			lengths[non_empty_cnt] = length;

			non_empty_cnt++;
		}
		else
			empty_cnt++;

		non_null_cnt++;
	}

	slot_idx = 0;

	/* We can only compute real stats if we found some non-null values. */
	if (non_null_cnt > 0)
	{
		Datum	   *bound_hist_values;
		Datum	   *length_hist_values;
		int			pos,
					posfrac,
					delta,
					deltafrac,
					i;
		MemoryContext old_cxt;
		float4	   *emptyfrac;

		stats->stats_valid = true;
		/* Do the simple null-frac and width stats */
		stats->stanullfrac = (double) null_cnt / (double) samplerows;
		stats->stawidth = total_width / (double) non_null_cnt;
Fix misestimation of n_distinct for a nearly-unique column with many nulls.
If ANALYZE found no repeated non-null entries in its sample, it set the
column's stadistinct value to -1.0, intending to indicate that the entries
are all distinct. But what this value actually means is that the number
of distinct values is 100% of the table's rowcount, and thus it was
overestimating the number of distinct values by however many nulls there
are. This could lead to very poor selectivity estimates, as for example
in a recent report from Andreas Joseph Krogh. We should discount the
stadistinct value by whatever we've estimated the nulls fraction to be.
(That is what will happen if we choose to use a negative stadistinct for
a column that does have repeated entries, so this code path was just
inconsistent.)
In addition to fixing the stadistinct entries stored by several different
ANALYZE code paths, adjust the logic where get_variable_numdistinct()
forces an "all distinct" estimate on the basis of finding a relevant unique
index. Unique indexes don't reject nulls, so there's no reason to assume
that the null fraction doesn't apply.
Back-patch to all supported branches. Back-patching is a bit of a judgment
call, but this problem seems to affect only a few users (else we'd have
identified it long ago), and it's bad enough when it does happen that
destabilizing plan choices in a worse direction seems unlikely.
Patch by me, with documentation wording suggested by Dean Rasheed
Report: <VisenaEmail.26.df42f82acae38a58.156463942b8@tc7-visena>
Discussion: <16143.1470350371@sss.pgh.pa.us>
2016-08-08 00:52:02 +02:00
		/* Estimate that non-null values are unique */
		stats->stadistinct = -1.0 * (1.0 - stats->stanullfrac);

		/* Must copy the target values into anl_context */
		old_cxt = MemoryContextSwitchTo(stats->anl_context);

		/*
		 * Generate a bounds histogram slot entry if there are at least two
		 * values.
		 */
		if (non_empty_cnt >= 2)
		{
			/* Sort bound values */
			qsort_arg(lowers, non_empty_cnt, sizeof(RangeBound),
					  range_bound_qsort_cmp, typcache);
			qsort_arg(uppers, non_empty_cnt, sizeof(RangeBound),
					  range_bound_qsort_cmp, typcache);

			num_hist = non_empty_cnt;
			if (num_hist > num_bins)
				num_hist = num_bins + 1;

			bound_hist_values = (Datum *) palloc(num_hist * sizeof(Datum));

			/*
			 * The object of this loop is to construct ranges from first and
			 * last entries in lowers[] and uppers[] along with evenly-spaced
			 * values in between. So the i'th value is a range of lowers[(i *
			 * (non_empty_cnt - 1)) / (num_hist - 1)] and uppers[(i *
			 * (non_empty_cnt - 1)) / (num_hist - 1)]. But computing that
			 * subscript directly risks integer overflow when the stats target
			 * is more than a couple thousand. Instead we add
			 * (non_empty_cnt - 1) / (num_hist - 1) to pos at each step,
			 * tracking the integral and fractional parts of the sum
			 * separately.
			 */
			delta = (non_empty_cnt - 1) / (num_hist - 1);
			deltafrac = (non_empty_cnt - 1) % (num_hist - 1);
			pos = posfrac = 0;

			for (i = 0; i < num_hist; i++)
			{
				bound_hist_values[i] = PointerGetDatum(range_serialize(
																	   typcache, &lowers[pos], &uppers[pos], false));
				pos += delta;
				posfrac += deltafrac;
				if (posfrac >= (num_hist - 1))
				{
					/* fractional part exceeds 1, carry to integer part */
					pos++;
					posfrac -= (num_hist - 1);
				}
			}

			stats->stakind[slot_idx] = STATISTIC_KIND_BOUNDS_HISTOGRAM;
			stats->stavalues[slot_idx] = bound_hist_values;
			stats->numvalues[slot_idx] = num_hist;
			slot_idx++;
		}

		/*
		 * Generate a length histogram slot entry if there are at least two
		 * values.
		 */
		if (non_empty_cnt >= 2)
		{
			/*
			 * Ascending sort of range lengths for further filling of
			 * histogram
			 */
			qsort(lengths, non_empty_cnt, sizeof(float8), float8_qsort_cmp);

			num_hist = non_empty_cnt;
			if (num_hist > num_bins)
				num_hist = num_bins + 1;

			length_hist_values = (Datum *) palloc(num_hist * sizeof(Datum));

			/*
			 * The object of this loop is to copy the first and last lengths[]
			 * entries along with evenly-spaced values in between. So the i'th
			 * value is lengths[(i * (non_empty_cnt - 1)) / (num_hist - 1)].
			 * But computing that subscript directly risks integer overflow
			 * when the stats target is more than a couple thousand. Instead
			 * we add (non_empty_cnt - 1) / (num_hist - 1) to pos at each
			 * step, tracking the integral and fractional parts of the sum
			 * separately.
			 */
			delta = (non_empty_cnt - 1) / (num_hist - 1);
			deltafrac = (non_empty_cnt - 1) % (num_hist - 1);
			pos = posfrac = 0;

			for (i = 0; i < num_hist; i++)
			{
				length_hist_values[i] = Float8GetDatum(lengths[pos]);
				pos += delta;
				posfrac += deltafrac;
				if (posfrac >= (num_hist - 1))
				{
					/* fractional part exceeds 1, carry to integer part */
					pos++;
					posfrac -= (num_hist - 1);
				}
			}
		}
else
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Even when we don't create the histogram, store an empty array
|
|
|
|
* to mean "no histogram". We can't just leave stavalues NULL,
|
|
|
|
* because get_attstatsslot() errors if you ask for stavalues, and
|
|
|
|
* it's NULL. We'll still store the empty fraction in stanumbers.
|
|
|
|
*/
|
|
|
|
length_hist_values = palloc(0);
|
|
|
|
num_hist = 0;
|
|
|
|
}
|
|
|
|
stats->staop[slot_idx] = Float8LessOperator;
|
|
|
|
stats->stavalues[slot_idx] = length_hist_values;
|
|
|
|
stats->numvalues[slot_idx] = num_hist;
|
|
|
|
stats->statypid[slot_idx] = FLOAT8OID;
|
|
|
|
stats->statyplen[slot_idx] = sizeof(float8);
|
|
|
|
#ifdef USE_FLOAT8_BYVAL
|
|
|
|
stats->statypbyval[slot_idx] = true;
|
|
|
|
#else
|
|
|
|
stats->statypbyval[slot_idx] = false;
|
|
|
|
#endif
|
|
|
|
stats->statypalign[slot_idx] = 'd';
|
|
|
|
|
		/* Store the fraction of empty ranges */
		emptyfrac = (float4 *) palloc(sizeof(float4));
		*emptyfrac = ((double) empty_cnt) / ((double) non_null_cnt);
		stats->stanumbers[slot_idx] = emptyfrac;
		stats->numnumbers[slot_idx] = 1;

		stats->stakind[slot_idx] = STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM;
		slot_idx++;

		MemoryContextSwitchTo(old_cxt);
	}
	else if (null_cnt > 0)
	{
		/* We found only nulls; assume the column is entirely null */
		stats->stats_valid = true;
		stats->stanullfrac = 1.0;
		stats->stawidth = 0;	/* "unknown" */
		stats->stadistinct = 0.0;	/* "unknown" */
	}

	/*
	 * We don't need to bother cleaning up any of our temporary palloc's. The
	 * hashtable should also go away, as it used a child memory context.
	 */
}
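
/*
 * Illustrative sketch only, not part of this file: the histogram-filling
 * loops above pick num_hist evenly spaced subscripts out of nvals sorted
 * values without ever computing i * (nvals - 1) directly.  The helper name
 * pick_histogram_positions() and its out[] parameter are hypothetical; the
 * stepping logic mirrors the delta / deltafrac / pos / posfrac variables
 * used above.  It assumes num_hist >= 2, which the callers above guarantee
 * by testing non_empty_cnt >= 2 first.
 */

```c
#include <assert.h>

/*
 * Fill out[0 .. num_hist-1] with the subscripts
 * (i * (nvals - 1)) / (num_hist - 1), computed incrementally so that no
 * intermediate product can overflow an int.
 */
static void
pick_histogram_positions(int nvals, int num_hist, int *out)
{
	int			delta;
	int			deltafrac;
	int			pos = 0;
	int			posfrac = 0;
	int			i;

	/* With fewer than two entries the divisor below would be zero. */
	assert(num_hist >= 2);
	assert(nvals >= num_hist);

	/* Integral and fractional parts of the per-step increment. */
	delta = (nvals - 1) / (num_hist - 1);
	deltafrac = (nvals - 1) % (num_hist - 1);

	for (i = 0; i < num_hist; i++)
	{
		out[i] = pos;
		pos += delta;
		posfrac += deltafrac;
		if (posfrac >= (num_hist - 1))
		{
			/* accumulated fraction reached one whole step: carry it */
			pos++;
			posfrac -= (num_hist - 1);
		}
	}
}
```

/*
 * For example, with nvals = 11 and num_hist = 5 the helper yields the
 * subscripts 0, 2, 5, 7, 10: the first and last entries are always included,
 * and the ones in between match the direct formula.
 */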