Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
/*-------------------------------------------------------------------------
|
|
|
|
*
|
|
|
|
* extended_stats.c
|
|
|
|
* POSTGRES extended statistics
|
|
|
|
*
|
2017-05-14 16:54:47 +02:00
|
|
|
* Generic code supporting statistics objects created via CREATE STATISTICS.
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
*
|
|
|
|
*
|
2019-01-02 18:44:25 +01:00
|
|
|
* Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
* Portions Copyright (c) 1994, Regents of the University of California
|
|
|
|
*
|
|
|
|
* IDENTIFICATION
|
|
|
|
* src/backend/statistics/extended_stats.c
|
|
|
|
*
|
|
|
|
*-------------------------------------------------------------------------
|
|
|
|
*/
|
|
|
|
#include "postgres.h"
|
|
|
|
|
2019-07-08 17:58:05 +02:00
|
|
|
#include "access/detoast.h"
|
2019-12-27 00:09:00 +01:00
|
|
|
#include "access/genam.h"
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
#include "access/htup_details.h"
|
2019-01-21 19:18:20 +01:00
|
|
|
#include "access/table.h"
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
#include "catalog/indexing.h"
|
|
|
|
#include "catalog/pg_collation.h"
|
|
|
|
#include "catalog/pg_statistic_ext.h"
|
Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic. That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.
Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics. That would be a surprising behavior.
Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly. This changed with introduction
of MCV lists, which do include most common combinations of values.
The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.
Bumped CATVERSION due to changing system catalog definitions.
Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-13 17:19:21 +02:00
|
|
|
#include "catalog/pg_statistic_ext_data.h"
|
2019-06-23 19:50:08 +02:00
|
|
|
#include "miscadmin.h"
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
#include "nodes/nodeFuncs.h"
|
|
|
|
#include "optimizer/clauses.h"
|
|
|
|
#include "optimizer/optimizer.h"
|
2017-04-17 19:00:47 +02:00
|
|
|
#include "postmaster/autovacuum.h"
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
#include "statistics/extended_stats_internal.h"
|
|
|
|
#include "statistics/statistics.h"
|
2019-08-16 19:33:30 +02:00
|
|
|
#include "utils/array.h"
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
#include "utils/builtins.h"
|
|
|
|
#include "utils/fmgroids.h"
|
|
|
|
#include "utils/lsyscache.h"
|
2017-04-17 19:00:47 +02:00
|
|
|
#include "utils/memutils.h"
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
#include "utils/rel.h"
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
#include "utils/selfuncs.h"
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
#include "utils/syscache.h"
|
|
|
|
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
/*
|
|
|
|
* To avoid consuming too much memory during analysis and/or too much space
|
|
|
|
* in the resulting pg_statistic rows, we ignore varlena datums that are wider
|
|
|
|
* than WIDTH_THRESHOLD (after detoasting!). This is legitimate for MCV
|
|
|
|
* and distinct-value calculations since a wide value is unlikely to be
|
|
|
|
* duplicated at all, much less be a most-common value. For the same reason,
|
|
|
|
* ignoring wide values will not affect our estimates of histogram bin
|
|
|
|
* boundaries very much.
|
|
|
|
*/
|
|
|
|
#define WIDTH_THRESHOLD 1024
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
|
|
|
|
/*
|
2017-05-14 16:54:47 +02:00
|
|
|
* Used internally to refer to an individual statistics object, i.e.,
|
|
|
|
* a pg_statistic_ext entry.
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
*/
|
|
|
|
typedef struct StatExtEntry
|
|
|
|
{
|
2017-05-17 22:31:56 +02:00
|
|
|
Oid statOid; /* OID of pg_statistic_ext entry */
|
|
|
|
char *schema; /* statistics object's schema */
|
|
|
|
char *name; /* statistics object's name */
|
|
|
|
Bitmapset *columns; /* attribute numbers covered by the object */
|
|
|
|
List *types; /* 'char' list of enabled statistic kinds */
|
Allow setting statistics target for extended statistics
When building statistics, we need to decide how many rows to sample and
how accurate the resulting statistics should be. Until now, it was not
possible to explicitly define statistics target for extended statistics
objects, the value was always computed from the per-attribute targets
with a fallback to the system-wide default statistics target.
That's a bit inconvenient, as it ties together the statistics target set
for per-column and extended statistics. In some cases it may be useful
to require larger sample / higher accuracy for extended statics (or the
other way around), but with this approach that's not possible.
So this commit introduces a new command, allowing to specify statistics
target for individual extended statistics objects, overriding the value
derived from per-attribute targets (and the system default).
ALTER STATISTICS stat_name SET STATISTICS target_value;
When determining statistics target for an extended statistics object we
first look at this explicitly set value. When this value is -1, we fall
back to the old formula, looking at the per-attribute targets first and
then the system default. This means the behavior is backwards compatible
with older PostgreSQL releases.
Author: Tomas Vondra
Discussion: https://postgr.es/m/20190618213357.vli3i23vpkset2xd@development
Reviewed-by: Kirk Jamison, Dean Rasheed
2019-09-10 20:09:27 +02:00
|
|
|
int stattarget; /* statistics target (-1 for default) */
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
} StatExtEntry;
|
|
|
|
|
|
|
|
|
|
|
|
static List *fetch_statentries_for_relation(Relation pg_statext, Oid relid);
|
|
|
|
static VacAttrStats **lookup_var_attr_stats(Relation rel, Bitmapset *attrs,
|
2019-05-22 19:04:48 +02:00
|
|
|
int nvacatts, VacAttrStats **vacatts);
|
Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic. That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.
Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics. That would be a surprising behavior.
Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly. This changed with introduction
of MCV lists, which do include most common combinations of values.
The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.
Bumped CATVERSION due to changing system catalog definitions.
Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-13 17:19:21 +02:00
|
|
|
static void statext_store(Oid relid,
|
2019-05-22 19:04:48 +02:00
|
|
|
MVNDistinct *ndistinct, MVDependencies *dependencies,
|
Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic. That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.
Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics. That would be a surprising behavior.
Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly. This changed with introduction
of MCV lists, which do include most common combinations of values.
The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.
Bumped CATVERSION due to changing system catalog definitions.
Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-13 17:19:21 +02:00
|
|
|
MCVList *mcv, VacAttrStats **stats);
|
Allow setting statistics target for extended statistics
When building statistics, we need to decide how many rows to sample and
how accurate the resulting statistics should be. Until now, it was not
possible to explicitly define statistics target for extended statistics
objects, the value was always computed from the per-attribute targets
with a fallback to the system-wide default statistics target.
That's a bit inconvenient, as it ties together the statistics target set
for per-column and extended statistics. In some cases it may be useful
to require larger sample / higher accuracy for extended statics (or the
other way around), but with this approach that's not possible.
So this commit introduces a new command, allowing to specify statistics
target for individual extended statistics objects, overriding the value
derived from per-attribute targets (and the system default).
ALTER STATISTICS stat_name SET STATISTICS target_value;
When determining statistics target for an extended statistics object we
first look at this explicitly set value. When this value is -1, we fall
back to the old formula, looking at the per-attribute targets first and
then the system default. This means the behavior is backwards compatible
with older PostgreSQL releases.
Author: Tomas Vondra
Discussion: https://postgr.es/m/20190618213357.vli3i23vpkset2xd@development
Reviewed-by: Kirk Jamison, Dean Rasheed
2019-09-10 20:09:27 +02:00
|
|
|
static int statext_compute_stattarget(int stattarget,
|
|
|
|
int natts, VacAttrStats **stats);
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Compute requested extended stats, using the rows sampled for the plain
|
|
|
|
* (single-column) stats.
|
|
|
|
*
|
2017-05-14 16:54:47 +02:00
|
|
|
* This fetches a list of stats types from pg_statistic_ext, computes the
|
|
|
|
* requested stats, and serializes them back into the catalog.
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
*/
|
|
|
|
void
|
|
|
|
BuildRelationExtStatistics(Relation onerel, double totalrows,
|
|
|
|
int numrows, HeapTuple *rows,
|
|
|
|
int natts, VacAttrStats **vacattrstats)
|
|
|
|
{
|
|
|
|
Relation pg_stext;
|
|
|
|
ListCell *lc;
|
|
|
|
List *stats;
|
2017-04-17 19:00:47 +02:00
|
|
|
MemoryContext cxt;
|
|
|
|
MemoryContext oldcxt;
|
|
|
|
|
Allow memory contexts to have both fixed and variable ident strings.
Originally, we treated memory context names as potentially variable in
all cases, and therefore always copied them into the context header.
Commit 9fa6f00b1 rethought this a little bit and invented a distinction
between fixed and variable names, skipping the copy step for the former.
But we can make things both simpler and more useful by instead allowing
there to be two parts to a context's identification, a fixed "name" and
an optional, variable "ident". The name supplied in the context create
call is now required to be a compile-time-constant string in all cases,
as it is never copied but just pointed to. The "ident" string, if
wanted, is supplied later. This is needed because typically we want
the ident to be stored inside the context so that it's cleaned up
automatically on context deletion; that means it has to be copied into
the context before we can set the pointer.
The cost of this approach is basically just an additional pointer field
in struct MemoryContextData, which isn't much overhead, and is bought
back entirely in the AllocSet case by not needing a headerSize field
anymore, since we no longer have to cope with variable header length.
In addition, we can simplify the internal interfaces for memory context
creation still further, saving a few cycles there. And it's no longer
true that a custom identifier disqualifies a context from participating
in aset.c's freelist scheme, so possibly there's some win on that end.
All the places that were using non-compile-time-constant context names
are adjusted to put the variable info into the "ident" instead. This
allows more effective identification of those contexts in many cases;
for example, subsidary contexts of relcache entries are now identified
by both type (e.g. "index info") and relname, where before you got only
one or the other. Contexts associated with PL function cache entries
are now identified more fully and uniformly, too.
I also arranged for plancache contexts to use the query source string
as their identifier. This is basically free for CachedPlanSources, as
they contained a copy of that string already. We pay an extra pstrdup
to do it for CachedPlans. That could perhaps be avoided, but it would
make things more fragile (since the CachedPlanSource is sometimes
destroyed first). I suspect future improvements in error reporting will
require CachedPlans to have a copy of that string anyway, so it's not
clear that it's worth moving mountains to avoid it now.
This also changes the APIs for context statistics routines so that the
context-specific routines no longer assume that output goes straight
to stderr, nor do they know all details of the output format. This
is useful immediately to reduce code duplication, and it also allows
for external code to do something with stats output that's different
from printing to stderr.
The reason for pushing this now rather than waiting for v12 is that
it rethinks some of the API changes made by commit 9fa6f00b1. Seems
better for extension authors to endure just one round of API changes
not two.
Discussion: https://postgr.es/m/CAB=Je-FdtmFZ9y9REHD7VsSrnCkiBhsA4mdsLKSPauwXtQBeNA@mail.gmail.com
2018-03-27 22:46:47 +02:00
|
|
|
cxt = AllocSetContextCreate(CurrentMemoryContext,
|
|
|
|
"BuildRelationExtStatistics",
|
2017-04-17 19:00:47 +02:00
|
|
|
ALLOCSET_DEFAULT_SIZES);
|
|
|
|
oldcxt = MemoryContextSwitchTo(cxt);
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
|
2019-01-21 19:32:19 +01:00
|
|
|
pg_stext = table_open(StatisticExtRelationId, RowExclusiveLock);
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
stats = fetch_statentries_for_relation(pg_stext, RelationGetRelid(onerel));
|
|
|
|
|
|
|
|
foreach(lc, stats)
|
|
|
|
{
|
2017-05-17 22:31:56 +02:00
|
|
|
StatExtEntry *stat = (StatExtEntry *) lfirst(lc);
|
|
|
|
MVNDistinct *ndistinct = NULL;
|
2017-04-06 00:00:42 +02:00
|
|
|
MVDependencies *dependencies = NULL;
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
MCVList *mcv = NULL;
|
2017-05-17 22:31:56 +02:00
|
|
|
VacAttrStats **stats;
|
|
|
|
ListCell *lc2;
|
Allow setting statistics target for extended statistics
When building statistics, we need to decide how many rows to sample and
how accurate the resulting statistics should be. Until now, it was not
possible to explicitly define statistics target for extended statistics
objects, the value was always computed from the per-attribute targets
with a fallback to the system-wide default statistics target.
That's a bit inconvenient, as it ties together the statistics target set
for per-column and extended statistics. In some cases it may be useful
to require larger sample / higher accuracy for extended statics (or the
other way around), but with this approach that's not possible.
So this commit introduces a new command, allowing to specify statistics
target for individual extended statistics objects, overriding the value
derived from per-attribute targets (and the system default).
ALTER STATISTICS stat_name SET STATISTICS target_value;
When determining statistics target for an extended statistics object we
first look at this explicitly set value. When this value is -1, we fall
back to the old formula, looking at the per-attribute targets first and
then the system default. This means the behavior is backwards compatible
with older PostgreSQL releases.
Author: Tomas Vondra
Discussion: https://postgr.es/m/20190618213357.vli3i23vpkset2xd@development
Reviewed-by: Kirk Jamison, Dean Rasheed
2019-09-10 20:09:27 +02:00
|
|
|
int stattarget;
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
|
2017-04-17 19:00:47 +02:00
|
|
|
/*
|
2017-05-17 22:31:56 +02:00
|
|
|
* Check if we can build these stats based on the column analyzed. If
|
|
|
|
* not, report this fact (except in autovacuum) and move on.
|
2017-04-17 19:00:47 +02:00
|
|
|
*/
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
stats = lookup_var_attr_stats(onerel, stat->columns,
|
|
|
|
natts, vacattrstats);
|
2017-11-29 03:25:47 +01:00
|
|
|
if (!stats)
|
2017-04-17 19:00:47 +02:00
|
|
|
{
|
2017-11-29 03:25:47 +01:00
|
|
|
if (!IsAutoVacuumWorkerProcess())
|
|
|
|
ereport(WARNING,
|
|
|
|
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
|
|
|
|
errmsg("statistics object \"%s.%s\" could not be computed for relation \"%s.%s\"",
|
|
|
|
stat->schema, stat->name,
|
|
|
|
get_namespace_name(onerel->rd_rel->relnamespace),
|
|
|
|
RelationGetRelationName(onerel)),
|
|
|
|
errtable(onerel)));
|
2017-04-17 19:00:47 +02:00
|
|
|
continue;
|
|
|
|
}
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
|
|
|
|
/* check allowed number of dimensions */
|
|
|
|
Assert(bms_num_members(stat->columns) >= 2 &&
|
|
|
|
bms_num_members(stat->columns) <= STATS_MAX_DIMENSIONS);
|
|
|
|
|
Allow setting statistics target for extended statistics
When building statistics, we need to decide how many rows to sample and
how accurate the resulting statistics should be. Until now, it was not
possible to explicitly define statistics target for extended statistics
objects, the value was always computed from the per-attribute targets
with a fallback to the system-wide default statistics target.
That's a bit inconvenient, as it ties together the statistics target set
for per-column and extended statistics. In some cases it may be useful
to require larger sample / higher accuracy for extended statics (or the
other way around), but with this approach that's not possible.
So this commit introduces a new command, allowing to specify statistics
target for individual extended statistics objects, overriding the value
derived from per-attribute targets (and the system default).
ALTER STATISTICS stat_name SET STATISTICS target_value;
When determining statistics target for an extended statistics object we
first look at this explicitly set value. When this value is -1, we fall
back to the old formula, looking at the per-attribute targets first and
then the system default. This means the behavior is backwards compatible
with older PostgreSQL releases.
Author: Tomas Vondra
Discussion: https://postgr.es/m/20190618213357.vli3i23vpkset2xd@development
Reviewed-by: Kirk Jamison, Dean Rasheed
2019-09-10 20:09:27 +02:00
|
|
|
/* compute statistics target for this statistics */
|
|
|
|
stattarget = statext_compute_stattarget(stat->stattarget,
|
|
|
|
bms_num_members(stat->columns),
|
|
|
|
stats);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Don't rebuild statistics objects with statistics target set to 0 (we
|
|
|
|
* just leave the existing values around, just like we do for regular
|
|
|
|
* per-column statistics).
|
|
|
|
*/
|
|
|
|
if (stattarget == 0)
|
|
|
|
continue;
|
|
|
|
|
2017-05-14 16:54:47 +02:00
|
|
|
/* compute statistic of each requested type */
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
foreach(lc2, stat->types)
|
|
|
|
{
|
2017-05-17 22:31:56 +02:00
|
|
|
char t = (char) lfirst_int(lc2);
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
|
|
|
|
if (t == STATS_EXT_NDISTINCT)
|
|
|
|
ndistinct = statext_ndistinct_build(totalrows, numrows, rows,
|
|
|
|
stat->columns, stats);
|
2017-04-06 00:00:42 +02:00
|
|
|
else if (t == STATS_EXT_DEPENDENCIES)
|
|
|
|
dependencies = statext_dependencies_build(numrows, rows,
|
Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 21:35:54 +02:00
|
|
|
stat->columns, stats);
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
else if (t == STATS_EXT_MCV)
|
|
|
|
mcv = statext_mcv_build(numrows, rows, stat->columns, stats,
|
Allow setting statistics target for extended statistics
When building statistics, we need to decide how many rows to sample and
how accurate the resulting statistics should be. Until now, it was not
possible to explicitly define statistics target for extended statistics
objects, the value was always computed from the per-attribute targets
with a fallback to the system-wide default statistics target.
That's a bit inconvenient, as it ties together the statistics target set
for per-column and extended statistics. In some cases it may be useful
to require larger sample / higher accuracy for extended statics (or the
other way around), but with this approach that's not possible.
So this commit introduces a new command, allowing to specify statistics
target for individual extended statistics objects, overriding the value
derived from per-attribute targets (and the system default).
ALTER STATISTICS stat_name SET STATISTICS target_value;
When determining statistics target for an extended statistics object we
first look at this explicitly set value. When this value is -1, we fall
back to the old formula, looking at the per-attribute targets first and
then the system default. This means the behavior is backwards compatible
with older PostgreSQL releases.
Author: Tomas Vondra
Discussion: https://postgr.es/m/20190618213357.vli3i23vpkset2xd@development
Reviewed-by: Kirk Jamison, Dean Rasheed
2019-09-10 20:09:27 +02:00
|
|
|
totalrows, stattarget);
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/* store the statistics in the catalog */
|
Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic. That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.
Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics. That would be a surprising behavior.
Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly. This changed with introduction
of MCV lists, which do include most common combinations of values.
The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.
Bumped CATVERSION due to changing system catalog definitions.
Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-13 17:19:21 +02:00
|
|
|
statext_store(stat->statOid, ndistinct, dependencies, mcv, stats);
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
}
|
|
|
|
|
2019-01-21 19:32:19 +01:00
|
|
|
table_close(pg_stext, RowExclusiveLock);
|
2017-04-17 19:00:47 +02:00
|
|
|
|
|
|
|
MemoryContextSwitchTo(oldcxt);
|
|
|
|
MemoryContextDelete(cxt);
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
}
|
|
|
|
|
Allow setting statistics target for extended statistics
When building statistics, we need to decide how many rows to sample and
how accurate the resulting statistics should be. Until now, it was not
possible to explicitly define statistics target for extended statistics
objects, the value was always computed from the per-attribute targets
with a fallback to the system-wide default statistics target.
That's a bit inconvenient, as it ties together the statistics target set
for per-column and extended statistics. In some cases it may be useful
to require larger sample / higher accuracy for extended statics (or the
other way around), but with this approach that's not possible.
So this commit introduces a new command, allowing to specify statistics
target for individual extended statistics objects, overriding the value
derived from per-attribute targets (and the system default).
ALTER STATISTICS stat_name SET STATISTICS target_value;
When determining statistics target for an extended statistics object we
first look at this explicitly set value. When this value is -1, we fall
back to the old formula, looking at the per-attribute targets first and
then the system default. This means the behavior is backwards compatible
with older PostgreSQL releases.
Author: Tomas Vondra
Discussion: https://postgr.es/m/20190618213357.vli3i23vpkset2xd@development
Reviewed-by: Kirk Jamison, Dean Rasheed
2019-09-10 20:09:27 +02:00
|
|
|
/*
|
|
|
|
* ComputeExtStatisticsRows
|
|
|
|
* Compute number of rows required by extended statistics on a table.
|
|
|
|
*
|
|
|
|
* Computes number of rows we need to sample to build extended statistics on a
|
|
|
|
* table. This only looks at statistics we can actually build - for example
|
|
|
|
* when analyzing only some of the columns, this will skip statistics objects
|
|
|
|
* that would require additional columns.
|
|
|
|
*
|
|
|
|
* See statext_compute_stattarget for details about how we compute statistics
|
|
|
|
* target for a statistics objects (from the object target, attribute targets
|
|
|
|
* and default statistics target).
|
|
|
|
*/
|
|
|
|
int
|
|
|
|
ComputeExtStatisticsRows(Relation onerel,
|
|
|
|
int natts, VacAttrStats **vacattrstats)
|
|
|
|
{
|
|
|
|
Relation pg_stext;
|
|
|
|
ListCell *lc;
|
|
|
|
List *lstats;
|
|
|
|
MemoryContext cxt;
|
|
|
|
MemoryContext oldcxt;
|
|
|
|
int result = 0;
|
|
|
|
|
|
|
|
cxt = AllocSetContextCreate(CurrentMemoryContext,
|
|
|
|
"ComputeExtStatisticsRows",
|
|
|
|
ALLOCSET_DEFAULT_SIZES);
|
|
|
|
oldcxt = MemoryContextSwitchTo(cxt);
|
|
|
|
|
|
|
|
pg_stext = table_open(StatisticExtRelationId, RowExclusiveLock);
|
|
|
|
lstats = fetch_statentries_for_relation(pg_stext, RelationGetRelid(onerel));
|
|
|
|
|
|
|
|
foreach(lc, lstats)
|
|
|
|
{
|
|
|
|
StatExtEntry *stat = (StatExtEntry *) lfirst(lc);
|
|
|
|
int stattarget = stat->stattarget;
|
|
|
|
VacAttrStats **stats;
|
|
|
|
int nattrs = bms_num_members(stat->columns);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Check if we can build this statistics object based on the columns
|
|
|
|
* analyzed. If not, ignore it (don't report anything, we'll do that
|
|
|
|
* during the actual build BuildRelationExtStatistics).
|
|
|
|
*/
|
|
|
|
stats = lookup_var_attr_stats(onerel, stat->columns,
|
|
|
|
natts, vacattrstats);
|
|
|
|
|
|
|
|
if (!stats)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Compute statistics target, based on what's set for the statistic
|
|
|
|
* object itself, and for its attributes.
|
|
|
|
*/
|
|
|
|
stattarget = statext_compute_stattarget(stat->stattarget,
|
|
|
|
nattrs, stats);
|
|
|
|
|
|
|
|
/* Use the largest value for all statistics objects. */
|
|
|
|
if (stattarget > result)
|
|
|
|
result = stattarget;
|
|
|
|
}
|
|
|
|
|
|
|
|
table_close(pg_stext, RowExclusiveLock);
|
|
|
|
|
|
|
|
MemoryContextSwitchTo(oldcxt);
|
|
|
|
MemoryContextDelete(cxt);
|
|
|
|
|
|
|
|
/* compute sample size based on the statistics target */
|
|
|
|
return (300 * result);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* statext_compute_stattarget
|
|
|
|
* compute statistics target for an extended statistic
|
|
|
|
*
|
|
|
|
* When computing target for extended statistics objects, we consider three
|
|
|
|
* places where the target may be set - the statistics object itself,
|
|
|
|
* attributes the statistics is defined on, and then the default statistics
|
|
|
|
* target.
|
|
|
|
*
|
|
|
|
* First we look at what's set for the statistics object itself, using the
|
|
|
|
* ALTER STATISTICS ... SET STATISTICS command. If we find a valid value
|
|
|
|
* there (i.e. not -1) we're done. Otherwise we look at targets set for any
|
|
|
|
* of the attributes the statistic is defined on, and if there are columns
|
|
|
|
* with defined target, we use the maximum value. We do this mostly for
|
|
|
|
* backwards compatibility, because this is what we did before having
|
|
|
|
* statistics target for extended statistics.
|
|
|
|
*
|
|
|
|
* And finally, if we still don't have a statistics target, we use the value
|
|
|
|
* set in default_statistics_target.
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
statext_compute_stattarget(int stattarget, int nattrs, VacAttrStats **stats)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If there's statistics target set for the statistics object, use it.
|
|
|
|
* It may be set to 0 which disables building of that statistic.
|
|
|
|
*/
|
|
|
|
if (stattarget >= 0)
|
|
|
|
return stattarget;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The target for the statistics object is set to -1, in which case we
|
|
|
|
* look at the maximum target set for any of the attributes the object
|
|
|
|
* is defined on.
|
|
|
|
*/
|
|
|
|
for (i = 0; i < nattrs; i++)
|
|
|
|
{
|
|
|
|
/* keep the maximmum statistics target */
|
|
|
|
if (stats[i]->attr->attstattarget > stattarget)
|
|
|
|
stattarget = stats[i]->attr->attstattarget;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If the value is still negative (so neither the statistics object nor
|
|
|
|
* any of the columns have custom statistics target set), use the global
|
|
|
|
* default target.
|
|
|
|
*/
|
|
|
|
if (stattarget < 0)
|
|
|
|
stattarget = default_statistics_target;
|
|
|
|
|
|
|
|
/* As this point we should have a valid statistics target. */
|
|
|
|
Assert((stattarget >= 0) && (stattarget <= 10000));
|
|
|
|
|
|
|
|
return stattarget;
|
|
|
|
}
|
|
|
|
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
/*
|
|
|
|
* statext_is_kind_built
|
Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic. That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.
Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics. That would be a surprising behavior.
Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly. This changed with introduction
of MCV lists, which do include most common combinations of values.
The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.
Bumped CATVERSION due to changing system catalog definitions.
Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-13 17:19:21 +02:00
|
|
|
* Is this stat kind built in the given pg_statistic_ext_data tuple?
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
*/
|
|
|
|
bool
|
|
|
|
statext_is_kind_built(HeapTuple htup, char type)
|
|
|
|
{
|
2017-05-17 22:31:56 +02:00
|
|
|
AttrNumber attnum;
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
|
|
|
|
switch (type)
|
|
|
|
{
|
|
|
|
case STATS_EXT_NDISTINCT:
|
Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic. That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.
Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics. That would be a surprising behavior.
Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly. This changed with introduction
of MCV lists, which do include most common combinations of values.
The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.
Bumped CATVERSION due to changing system catalog definitions.
Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-13 17:19:21 +02:00
|
|
|
attnum = Anum_pg_statistic_ext_data_stxdndistinct;
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
break;
|
|
|
|
|
2017-04-06 00:00:42 +02:00
|
|
|
case STATS_EXT_DEPENDENCIES:
|
Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic. That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.
Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics. That would be a surprising behavior.
Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly. This changed with introduction
of MCV lists, which do include most common combinations of values.
The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.
Bumped CATVERSION due to changing system catalog definitions.
Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-13 17:19:21 +02:00
|
|
|
attnum = Anum_pg_statistic_ext_data_stxddependencies;
|
2017-04-06 00:00:42 +02:00
|
|
|
break;
|
|
|
|
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
case STATS_EXT_MCV:
|
Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic. That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.
Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics. That would be a surprising behavior.
Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly. This changed with introduction
of MCV lists, which do include most common combinations of values.
The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.
Bumped CATVERSION due to changing system catalog definitions.
Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-13 17:19:21 +02:00
|
|
|
attnum = Anum_pg_statistic_ext_data_stxdmcv;
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
break;
|
|
|
|
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
default:
|
|
|
|
elog(ERROR, "unexpected statistics type requested: %d", type);
|
|
|
|
}
|
|
|
|
|
2018-03-28 02:13:52 +02:00
|
|
|
return !heap_attisnull(htup, attnum, NULL);
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2017-05-14 16:54:47 +02:00
|
|
|
* Return a list (of StatExtEntry) of statistics objects for the given relation.
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
*/
|
|
|
|
static List *
|
|
|
|
fetch_statentries_for_relation(Relation pg_statext, Oid relid)
|
|
|
|
{
|
|
|
|
SysScanDesc scan;
|
|
|
|
ScanKeyData skey;
|
2017-05-17 22:31:56 +02:00
|
|
|
HeapTuple htup;
|
|
|
|
List *result = NIL;
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
|
|
|
|
/*
|
2017-05-14 16:54:47 +02:00
|
|
|
* Prepare to scan pg_statistic_ext for entries having stxrelid = this
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
* rel.
|
|
|
|
*/
|
|
|
|
ScanKeyInit(&skey,
|
2017-04-17 23:34:29 +02:00
|
|
|
Anum_pg_statistic_ext_stxrelid,
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
BTEqualStrategyNumber, F_OIDEQ,
|
|
|
|
ObjectIdGetDatum(relid));
|
|
|
|
|
|
|
|
scan = systable_beginscan(pg_statext, StatisticExtRelidIndexId, true,
|
|
|
|
NULL, 1, &skey);
|
|
|
|
|
|
|
|
while (HeapTupleIsValid(htup = systable_getnext(scan)))
|
|
|
|
{
|
|
|
|
StatExtEntry *entry;
|
|
|
|
Datum datum;
|
|
|
|
bool isnull;
|
|
|
|
int i;
|
|
|
|
ArrayType *arr;
|
|
|
|
char *enabled;
|
|
|
|
Form_pg_statistic_ext staForm;
|
|
|
|
|
|
|
|
entry = palloc0(sizeof(StatExtEntry));
|
|
|
|
staForm = (Form_pg_statistic_ext) GETSTRUCT(htup);
|
Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
|
|
|
entry->statOid = staForm->oid;
|
2017-04-17 23:34:29 +02:00
|
|
|
entry->schema = get_namespace_name(staForm->stxnamespace);
|
|
|
|
entry->name = pstrdup(NameStr(staForm->stxname));
|
Allow setting statistics target for extended statistics
When building statistics, we need to decide how many rows to sample and
how accurate the resulting statistics should be. Until now, it was not
possible to explicitly define statistics target for extended statistics
objects, the value was always computed from the per-attribute targets
with a fallback to the system-wide default statistics target.
That's a bit inconvenient, as it ties together the statistics target set
for per-column and extended statistics. In some cases it may be useful
to require larger sample / higher accuracy for extended statics (or the
other way around), but with this approach that's not possible.
So this commit introduces a new command, allowing to specify statistics
target for individual extended statistics objects, overriding the value
derived from per-attribute targets (and the system default).
ALTER STATISTICS stat_name SET STATISTICS target_value;
When determining statistics target for an extended statistics object we
first look at this explicitly set value. When this value is -1, we fall
back to the old formula, looking at the per-attribute targets first and
then the system default. This means the behavior is backwards compatible
with older PostgreSQL releases.
Author: Tomas Vondra
Discussion: https://postgr.es/m/20190618213357.vli3i23vpkset2xd@development
Reviewed-by: Kirk Jamison, Dean Rasheed
2019-09-10 20:09:27 +02:00
|
|
|
entry->stattarget = staForm->stxstattarget;
|
2017-04-17 23:34:29 +02:00
|
|
|
for (i = 0; i < staForm->stxkeys.dim1; i++)
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
{
|
|
|
|
entry->columns = bms_add_member(entry->columns,
|
2017-04-17 23:34:29 +02:00
|
|
|
staForm->stxkeys.values[i]);
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
}
|
|
|
|
|
2017-04-17 23:34:29 +02:00
|
|
|
/* decode the stxkind char array into a list of chars */
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
datum = SysCacheGetAttr(STATEXTOID, htup,
|
2017-04-17 23:34:29 +02:00
|
|
|
Anum_pg_statistic_ext_stxkind, &isnull);
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
Assert(!isnull);
|
|
|
|
arr = DatumGetArrayTypeP(datum);
|
|
|
|
if (ARR_NDIM(arr) != 1 ||
|
|
|
|
ARR_HASNULL(arr) ||
|
|
|
|
ARR_ELEMTYPE(arr) != CHAROID)
|
2017-04-17 23:34:29 +02:00
|
|
|
elog(ERROR, "stxkind is not a 1-D char array");
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
enabled = (char *) ARR_DATA_PTR(arr);
|
|
|
|
for (i = 0; i < ARR_DIMS(arr)[0]; i++)
|
|
|
|
{
|
2017-04-06 00:00:42 +02:00
|
|
|
Assert((enabled[i] == STATS_EXT_NDISTINCT) ||
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
(enabled[i] == STATS_EXT_DEPENDENCIES) ||
|
|
|
|
(enabled[i] == STATS_EXT_MCV));
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
entry->types = lappend_int(entry->types, (int) enabled[i]);
|
|
|
|
}
|
|
|
|
|
|
|
|
result = lappend(result, entry);
|
|
|
|
}
|
|
|
|
|
|
|
|
systable_endscan(scan);
|
|
|
|
|
|
|
|
return result;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2017-04-17 19:00:47 +02:00
|
|
|
* Using 'vacatts' of size 'nvacatts' as input data, return a newly built
|
|
|
|
* VacAttrStats array which includes only the items corresponding to
|
2017-04-17 23:34:29 +02:00
|
|
|
* attributes indicated by 'stxkeys'. If we don't have all of the per column
|
2017-04-17 19:00:47 +02:00
|
|
|
* stats available to compute the extended stats, then we return NULL to indicate
|
|
|
|
* to the caller that the stats should not be built.
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
*/
|
|
|
|
static VacAttrStats **
|
2017-04-17 19:00:47 +02:00
|
|
|
lookup_var_attr_stats(Relation rel, Bitmapset *attrs,
|
|
|
|
int nvacatts, VacAttrStats **vacatts)
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
{
|
|
|
|
int i = 0;
|
|
|
|
int x = -1;
|
|
|
|
VacAttrStats **stats;
|
|
|
|
|
|
|
|
stats = (VacAttrStats **)
|
|
|
|
palloc(bms_num_members(attrs) * sizeof(VacAttrStats *));
|
|
|
|
|
|
|
|
/* lookup VacAttrStats info for the requested columns (same attnum) */
|
|
|
|
while ((x = bms_next_member(attrs, x)) >= 0)
|
|
|
|
{
|
2017-05-17 22:31:56 +02:00
|
|
|
int j;
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
|
|
|
|
stats[i] = NULL;
|
2017-04-17 19:00:47 +02:00
|
|
|
for (j = 0; j < nvacatts; j++)
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
{
|
2017-04-17 19:00:47 +02:00
|
|
|
if (x == vacatts[j]->tupattnum)
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
{
|
2017-04-17 19:00:47 +02:00
|
|
|
stats[i] = vacatts[j];
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!stats[i])
|
2017-04-17 19:00:47 +02:00
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Looks like stats were not gathered for one of the columns
|
|
|
|
* required. We'll be unable to build the extended stats without
|
|
|
|
* this column.
|
|
|
|
*/
|
|
|
|
pfree(stats);
|
|
|
|
return NULL;
|
|
|
|
}
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
|
2017-05-17 22:31:56 +02:00
|
|
|
/*
|
|
|
|
* Sanity check that the column is not dropped - stats should have
|
|
|
|
* been removed in this case.
|
|
|
|
*/
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
Assert(!stats[i]->attr->attisdropped);
|
|
|
|
|
|
|
|
i++;
|
|
|
|
}
|
|
|
|
|
|
|
|
return stats;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* statext_store
|
Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic. That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.
Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics. That would be a surprising behavior.
Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly. This changed with introduction
of MCV lists, which do include most common combinations of values.
The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.
Bumped CATVERSION due to changing system catalog definitions.
Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-13 17:19:21 +02:00
|
|
|
* Serializes the statistics and stores them into the pg_statistic_ext_data
|
|
|
|
* tuple.
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
*/
|
|
|
|
static void
|
Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic. That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.
Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics. That would be a surprising behavior.
Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly. This changed with introduction
of MCV lists, which do include most common combinations of values.
The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.
Bumped CATVERSION due to changing system catalog definitions.
Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-13 17:19:21 +02:00
|
|
|
statext_store(Oid statOid,
|
2017-04-06 00:00:42 +02:00
|
|
|
MVNDistinct *ndistinct, MVDependencies *dependencies,
|
2019-05-22 18:55:34 +02:00
|
|
|
MCVList *mcv, VacAttrStats **stats)
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
{
|
|
|
|
HeapTuple stup,
|
|
|
|
oldtup;
|
Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic. That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.
Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics. That would be a surprising behavior.
Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly. This changed with introduction
of MCV lists, which do include most common combinations of values.
The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.
Bumped CATVERSION due to changing system catalog definitions.
Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-13 17:19:21 +02:00
|
|
|
Datum values[Natts_pg_statistic_ext_data];
|
|
|
|
bool nulls[Natts_pg_statistic_ext_data];
|
|
|
|
bool replaces[Natts_pg_statistic_ext_data];
|
|
|
|
Relation pg_stextdata;
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
|
2017-11-29 04:09:17 +01:00
|
|
|
memset(nulls, true, sizeof(nulls));
|
|
|
|
memset(replaces, false, sizeof(replaces));
|
|
|
|
memset(values, 0, sizeof(values));
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
|
|
|
|
/*
|
Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic. That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.
Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics. That would be a surprising behavior.
Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly. This changed with introduction
of MCV lists, which do include most common combinations of values.
The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.
Bumped CATVERSION due to changing system catalog definitions.
Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-13 17:19:21 +02:00
|
|
|
* Construct a new pg_statistic_ext_data tuple, replacing the calculated
|
|
|
|
* stats.
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
*/
|
|
|
|
if (ndistinct != NULL)
|
|
|
|
{
|
|
|
|
bytea *data = statext_ndistinct_serialize(ndistinct);
|
|
|
|
|
Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic. That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.
Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics. That would be a surprising behavior.
Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly. This changed with introduction
of MCV lists, which do include most common combinations of values.
The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.
Bumped CATVERSION due to changing system catalog definitions.
Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-13 17:19:21 +02:00
|
|
|
nulls[Anum_pg_statistic_ext_data_stxdndistinct - 1] = (data == NULL);
|
|
|
|
values[Anum_pg_statistic_ext_data_stxdndistinct - 1] = PointerGetDatum(data);
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
}
|
|
|
|
|
2017-04-06 00:00:42 +02:00
|
|
|
if (dependencies != NULL)
|
|
|
|
{
|
|
|
|
bytea *data = statext_dependencies_serialize(dependencies);
|
|
|
|
|
Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic. That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.
Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics. That would be a surprising behavior.
Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly. This changed with introduction
of MCV lists, which do include most common combinations of values.
The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.
Bumped CATVERSION due to changing system catalog definitions.
Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-13 17:19:21 +02:00
|
|
|
nulls[Anum_pg_statistic_ext_data_stxddependencies - 1] = (data == NULL);
|
|
|
|
values[Anum_pg_statistic_ext_data_stxddependencies - 1] = PointerGetDatum(data);
|
2017-04-06 00:00:42 +02:00
|
|
|
}
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
if (mcv != NULL)
|
|
|
|
{
|
|
|
|
bytea *data = statext_mcv_serialize(mcv, stats);
|
|
|
|
|
Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic. That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.
Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics. That would be a surprising behavior.
Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly. This changed with introduction
of MCV lists, which do include most common combinations of values.
The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.
Bumped CATVERSION due to changing system catalog definitions.
Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-13 17:19:21 +02:00
|
|
|
nulls[Anum_pg_statistic_ext_data_stxdmcv - 1] = (data == NULL);
|
|
|
|
values[Anum_pg_statistic_ext_data_stxdmcv - 1] = PointerGetDatum(data);
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
}
|
|
|
|
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
/* always replace the value (either by bytea or NULL) */
|
Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic. That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.
Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics. That would be a surprising behavior.
Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly. This changed with introduction
of MCV lists, which do include most common combinations of values.
The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.
Bumped CATVERSION due to changing system catalog definitions.
Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-13 17:19:21 +02:00
|
|
|
replaces[Anum_pg_statistic_ext_data_stxdndistinct - 1] = true;
|
|
|
|
replaces[Anum_pg_statistic_ext_data_stxddependencies - 1] = true;
|
|
|
|
replaces[Anum_pg_statistic_ext_data_stxdmcv - 1] = true;
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
|
Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic. That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.
Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics. That would be a surprising behavior.
Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly. This changed with introduction
of MCV lists, which do include most common combinations of values.
The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.
Bumped CATVERSION due to changing system catalog definitions.
Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-13 17:19:21 +02:00
|
|
|
/* there should already be a pg_statistic_ext_data tuple */
|
|
|
|
oldtup = SearchSysCache1(STATEXTDATASTXOID, ObjectIdGetDatum(statOid));
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
if (!HeapTupleIsValid(oldtup))
|
2017-05-14 16:54:47 +02:00
|
|
|
elog(ERROR, "cache lookup failed for statistics object %u", statOid);
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
|
|
|
|
/* replace it */
|
Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic. That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.
Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics. That would be a surprising behavior.
Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly. This changed with introduction
of MCV lists, which do include most common combinations of values.
The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.
Bumped CATVERSION due to changing system catalog definitions.
Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-13 17:19:21 +02:00
|
|
|
pg_stextdata = table_open(StatisticExtDataRelationId, RowExclusiveLock);
|
|
|
|
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
stup = heap_modify_tuple(oldtup,
|
Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic. That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.
Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics. That would be a surprising behavior.
Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly. This changed with introduction
of MCV lists, which do include most common combinations of values.
The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.
Bumped CATVERSION due to changing system catalog definitions.
Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-13 17:19:21 +02:00
|
|
|
RelationGetDescr(pg_stextdata),
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
values,
|
|
|
|
nulls,
|
|
|
|
replaces);
|
|
|
|
ReleaseSysCache(oldtup);
|
Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic. That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.
Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics. That would be a surprising behavior.
Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly. This changed with introduction
of MCV lists, which do include most common combinations of values.
The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.
Bumped CATVERSION due to changing system catalog definitions.
Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-13 17:19:21 +02:00
|
|
|
CatalogTupleUpdate(pg_stextdata, &stup->t_self, stup);
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
|
|
|
|
heap_freetuple(stup);
|
Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic. That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.
Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics. That would be a surprising behavior.
Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly. This changed with introduction
of MCV lists, which do include most common combinations of values.
The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.
Bumped CATVERSION due to changing system catalog definitions.
Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-13 17:19:21 +02:00
|
|
|
|
|
|
|
table_close(pg_stextdata, RowExclusiveLock);
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/* initialize multi-dimensional sort */
|
|
|
|
MultiSortSupport
|
|
|
|
multi_sort_init(int ndims)
|
|
|
|
{
|
|
|
|
MultiSortSupport mss;
|
|
|
|
|
|
|
|
Assert(ndims >= 2);
|
|
|
|
|
|
|
|
mss = (MultiSortSupport) palloc0(offsetof(MultiSortSupportData, ssup)
|
2017-06-21 20:39:04 +02:00
|
|
|
+ sizeof(SortSupportData) * ndims);
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
|
|
|
|
mss->ndims = ndims;
|
|
|
|
|
|
|
|
return mss;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
Make pg_statistic and related code account more honestly for collations.
When we first put in collations support, we basically punted on teaching
pg_statistic, ANALYZE, and the planner selectivity functions about that.
They've just used DEFAULT_COLLATION_OID independently of the actual
collation of the data. It's time to improve that, so:
* Add columns to pg_statistic that record the specific collation associated
with each statistics slot.
* Teach ANALYZE to use the column's actual collation when comparing values
for statistical purposes, and record this in the appropriate slot. (Note
that type-specific typanalyze functions are now expected to fill
stats->stacoll with the appropriate collation, too.)
* Teach assorted selectivity functions to use the actual collation of
the stats they are looking at, instead of just assuming it's
DEFAULT_COLLATION_OID.
This should give noticeably better results in selectivity estimates for
columns with nondefault collations, at least for query clauses that use
that same collation (which would be the default behavior in most cases).
It's still true that comparisons with explicit COLLATE clauses different
from the stored data's collation won't be well-estimated, but that's no
worse than before. Also, this patch does make the first step towards
doing better with that, which is that it's now theoretically possible to
collect stats for a collation other than the column's own collation.
Patch by me; thanks to Peter Eisentraut for review.
Discussion: https://postgr.es/m/14706.1544630227@sss.pgh.pa.us
2018-12-14 18:52:49 +01:00
|
|
|
* Prepare sort support info using the given sort operator and collation
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
* at the position 'sortdim'
|
|
|
|
*/
|
|
|
|
void
|
Make pg_statistic and related code account more honestly for collations.
When we first put in collations support, we basically punted on teaching
pg_statistic, ANALYZE, and the planner selectivity functions about that.
They've just used DEFAULT_COLLATION_OID independently of the actual
collation of the data. It's time to improve that, so:
* Add columns to pg_statistic that record the specific collation associated
with each statistics slot.
* Teach ANALYZE to use the column's actual collation when comparing values
for statistical purposes, and record this in the appropriate slot. (Note
that type-specific typanalyze functions are now expected to fill
stats->stacoll with the appropriate collation, too.)
* Teach assorted selectivity functions to use the actual collation of
the stats they are looking at, instead of just assuming it's
DEFAULT_COLLATION_OID.
This should give noticeably better results in selectivity estimates for
columns with nondefault collations, at least for query clauses that use
that same collation (which would be the default behavior in most cases).
It's still true that comparisons with explicit COLLATE clauses different
from the stored data's collation won't be well-estimated, but that's no
worse than before. Also, this patch does make the first step towards
doing better with that, which is that it's now theoretically possible to
collect stats for a collation other than the column's own collation.
Patch by me; thanks to Peter Eisentraut for review.
Discussion: https://postgr.es/m/14706.1544630227@sss.pgh.pa.us
2018-12-14 18:52:49 +01:00
|
|
|
multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
|
|
|
|
Oid oper, Oid collation)
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
{
|
2017-05-17 22:31:56 +02:00
|
|
|
SortSupport ssup = &mss->ssup[sortdim];
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
|
|
|
|
ssup->ssup_cxt = CurrentMemoryContext;
|
Make pg_statistic and related code account more honestly for collations.
When we first put in collations support, we basically punted on teaching
pg_statistic, ANALYZE, and the planner selectivity functions about that.
They've just used DEFAULT_COLLATION_OID independently of the actual
collation of the data. It's time to improve that, so:
* Add columns to pg_statistic that record the specific collation associated
with each statistics slot.
* Teach ANALYZE to use the column's actual collation when comparing values
for statistical purposes, and record this in the appropriate slot. (Note
that type-specific typanalyze functions are now expected to fill
stats->stacoll with the appropriate collation, too.)
* Teach assorted selectivity functions to use the actual collation of
the stats they are looking at, instead of just assuming it's
DEFAULT_COLLATION_OID.
This should give noticeably better results in selectivity estimates for
columns with nondefault collations, at least for query clauses that use
that same collation (which would be the default behavior in most cases).
It's still true that comparisons with explicit COLLATE clauses different
from the stored data's collation won't be well-estimated, but that's no
worse than before. Also, this patch does make the first step towards
doing better with that, which is that it's now theoretically possible to
collect stats for a collation other than the column's own collation.
Patch by me; thanks to Peter Eisentraut for review.
Discussion: https://postgr.es/m/14706.1544630227@sss.pgh.pa.us
2018-12-14 18:52:49 +01:00
|
|
|
ssup->ssup_collation = collation;
|
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 18:06:10 +01:00
|
|
|
ssup->ssup_nulls_first = false;
|
|
|
|
|
|
|
|
PrepareSortSupportFromOrderingOp(oper, ssup);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* compare all the dimensions in the selected order */
|
|
|
|
int
|
|
|
|
multi_sort_compare(const void *a, const void *b, void *arg)
|
|
|
|
{
|
|
|
|
MultiSortSupport mss = (MultiSortSupport) arg;
|
|
|
|
SortItem *ia = (SortItem *) a;
|
|
|
|
SortItem *ib = (SortItem *) b;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = 0; i < mss->ndims; i++)
|
|
|
|
{
|
|
|
|
int compare;
|
|
|
|
|
|
|
|
compare = ApplySortComparator(ia->values[i], ia->isnull[i],
|
|
|
|
ib->values[i], ib->isnull[i],
|
|
|
|
&mss->ssup[i]);
|
|
|
|
|
|
|
|
if (compare != 0)
|
|
|
|
return compare;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* equal by default */
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* compare selected dimension */
|
|
|
|
int
|
|
|
|
multi_sort_compare_dim(int dim, const SortItem *a, const SortItem *b,
|
|
|
|
MultiSortSupport mss)
|
|
|
|
{
|
|
|
|
return ApplySortComparator(a->values[dim], a->isnull[dim],
|
|
|
|
b->values[dim], b->isnull[dim],
|
|
|
|
&mss->ssup[dim]);
|
|
|
|
}
|
|
|
|
|
|
|
|
int
|
|
|
|
multi_sort_compare_dims(int start, int end,
|
|
|
|
const SortItem *a, const SortItem *b,
|
|
|
|
MultiSortSupport mss)
|
|
|
|
{
|
|
|
|
int dim;
|
|
|
|
|
|
|
|
for (dim = start; dim <= end; dim++)
|
|
|
|
{
|
|
|
|
int r = ApplySortComparator(a->values[dim], a->isnull[dim],
|
|
|
|
b->values[dim], b->isnull[dim],
|
|
|
|
&mss->ssup[dim]);
|
|
|
|
|
|
|
|
if (r != 0)
|
|
|
|
return r;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
2017-04-06 00:00:42 +02:00
|
|
|
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
int
|
|
|
|
compare_scalars_simple(const void *a, const void *b, void *arg)
|
|
|
|
{
|
|
|
|
return compare_datums_simple(*(Datum *) a,
|
|
|
|
*(Datum *) b,
|
|
|
|
(SortSupport) arg);
|
|
|
|
}
|
|
|
|
|
|
|
|
int
|
|
|
|
compare_datums_simple(Datum a, Datum b, SortSupport ssup)
|
|
|
|
{
|
|
|
|
return ApplySortComparator(a, false, b, false, ssup);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* simple counterpart to qsort_arg */
|
|
|
|
void *
|
|
|
|
bsearch_arg(const void *key, const void *base, size_t nmemb, size_t size,
|
|
|
|
int (*compar) (const void *, const void *, void *),
|
|
|
|
void *arg)
|
|
|
|
{
|
|
|
|
size_t l,
|
|
|
|
u,
|
|
|
|
idx;
|
|
|
|
const void *p;
|
|
|
|
int comparison;
|
|
|
|
|
|
|
|
l = 0;
|
|
|
|
u = nmemb;
|
|
|
|
while (l < u)
|
|
|
|
{
|
|
|
|
idx = (l + u) / 2;
|
|
|
|
p = (void *) (((const char *) base) + (idx * size));
|
|
|
|
comparison = (*compar) (key, p, arg);
|
|
|
|
|
|
|
|
if (comparison < 0)
|
|
|
|
u = idx;
|
|
|
|
else if (comparison > 0)
|
|
|
|
l = idx + 1;
|
|
|
|
else
|
|
|
|
return (void *) p;
|
|
|
|
}
|
|
|
|
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* build_attnums_array
|
|
|
|
* Transforms a bitmap into an array of AttrNumber values.
|
|
|
|
*
|
|
|
|
* This is used for extended statistics only, so all the attribute must be
|
|
|
|
* user-defined. That means offsetting by FirstLowInvalidHeapAttributeNumber
|
|
|
|
* is not necessary here (and when querying the bitmap).
|
|
|
|
*/
|
|
|
|
AttrNumber *
|
|
|
|
build_attnums_array(Bitmapset *attrs, int *numattrs)
|
|
|
|
{
|
|
|
|
int i,
|
|
|
|
j;
|
|
|
|
AttrNumber *attnums;
|
|
|
|
int num = bms_num_members(attrs);
|
|
|
|
|
|
|
|
if (numattrs)
|
|
|
|
*numattrs = num;
|
|
|
|
|
|
|
|
/* build attnums from the bitmapset */
|
|
|
|
attnums = (AttrNumber *) palloc(sizeof(AttrNumber) * num);
|
|
|
|
i = 0;
|
|
|
|
j = -1;
|
|
|
|
while ((j = bms_next_member(attrs, j)) >= 0)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Make sure the bitmap contains only user-defined attributes. As
|
2019-05-22 18:55:34 +02:00
|
|
|
* bitmaps can't contain negative values, this can be violated in two
|
|
|
|
* ways. Firstly, the bitmap might contain 0 as a member, and secondly
|
|
|
|
* the integer value might be larger than MaxAttrNumber.
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
*/
|
|
|
|
Assert(AttrNumberIsForUserDefinedAttr(j));
|
|
|
|
Assert(j <= MaxAttrNumber);
|
|
|
|
|
|
|
|
attnums[i++] = (AttrNumber) j;
|
|
|
|
|
|
|
|
/* protect against overflows */
|
|
|
|
Assert(i <= num);
|
|
|
|
}
|
|
|
|
|
|
|
|
return attnums;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* build_sorted_items
|
|
|
|
* build a sorted array of SortItem with values from rows
|
|
|
|
*
|
|
|
|
* Note: All the memory is allocated in a single chunk, so that the caller
|
|
|
|
* can simply pfree the return value to release all of it.
|
|
|
|
*/
|
|
|
|
SortItem *
|
|
|
|
build_sorted_items(int numrows, int *nitems, HeapTuple *rows, TupleDesc tdesc,
|
|
|
|
MultiSortSupport mss, int numattrs, AttrNumber *attnums)
|
|
|
|
{
|
|
|
|
int i,
|
|
|
|
j,
|
|
|
|
len,
|
|
|
|
idx;
|
|
|
|
int nvalues = numrows * numattrs;
|
|
|
|
|
|
|
|
SortItem *items;
|
|
|
|
Datum *values;
|
|
|
|
bool *isnull;
|
|
|
|
char *ptr;
|
|
|
|
|
|
|
|
/* Compute the total amount of memory we need (both items and values). */
|
|
|
|
len = numrows * sizeof(SortItem) + nvalues * (sizeof(Datum) + sizeof(bool));
|
|
|
|
|
|
|
|
/* Allocate the memory and split it into the pieces. */
|
|
|
|
ptr = palloc0(len);
|
|
|
|
|
|
|
|
/* items to sort */
|
|
|
|
items = (SortItem *) ptr;
|
|
|
|
ptr += numrows * sizeof(SortItem);
|
|
|
|
|
|
|
|
/* values and null flags */
|
|
|
|
values = (Datum *) ptr;
|
|
|
|
ptr += nvalues * sizeof(Datum);
|
|
|
|
|
|
|
|
isnull = (bool *) ptr;
|
|
|
|
ptr += nvalues * sizeof(bool);
|
|
|
|
|
|
|
|
/* make sure we consumed the whole buffer exactly */
|
|
|
|
Assert((ptr - (char *) items) == len);
|
|
|
|
|
|
|
|
/* fix the pointers to Datum and bool arrays */
|
|
|
|
idx = 0;
|
|
|
|
for (i = 0; i < numrows; i++)
|
|
|
|
{
|
2019-05-22 18:55:34 +02:00
|
|
|
bool toowide = false;
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
|
|
|
|
items[idx].values = &values[idx * numattrs];
|
|
|
|
items[idx].isnull = &isnull[idx * numattrs];
|
|
|
|
|
|
|
|
/* load the values/null flags from sample rows */
|
|
|
|
for (j = 0; j < numattrs; j++)
|
|
|
|
{
|
2019-05-22 18:55:34 +02:00
|
|
|
Datum value;
|
|
|
|
bool isnull;
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
|
|
|
|
value = heap_getattr(rows[i], attnums[j], tdesc, &isnull);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If this is a varlena value, check if it's too wide and if yes
|
|
|
|
* then skip the whole item. Otherwise detoast the value.
|
|
|
|
*
|
|
|
|
* XXX It may happen that we've already detoasted some preceding
|
|
|
|
* values for the current item. We don't bother to cleanup those
|
|
|
|
* on the assumption that those are small (below WIDTH_THRESHOLD)
|
|
|
|
* and will be discarded at the end of analyze.
|
|
|
|
*/
|
|
|
|
if ((!isnull) &&
|
|
|
|
(TupleDescAttr(tdesc, attnums[j] - 1)->attlen == -1))
|
|
|
|
{
|
|
|
|
if (toast_raw_datum_size(value) > WIDTH_THRESHOLD)
|
|
|
|
{
|
|
|
|
toowide = true;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
value = PointerGetDatum(PG_DETOAST_DATUM(value));
|
|
|
|
}
|
|
|
|
|
|
|
|
items[idx].values[j] = value;
|
|
|
|
items[idx].isnull[j] = isnull;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (toowide)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
idx++;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* store the actual number of items (ignoring the too-wide ones) */
|
|
|
|
*nitems = idx;
|
|
|
|
|
|
|
|
/* all items were too wide */
|
|
|
|
if (idx == 0)
|
|
|
|
{
|
|
|
|
/* everything is allocated as a single chunk */
|
|
|
|
pfree(items);
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* do the sort, using the multi-sort */
|
|
|
|
qsort_arg((void *) items, idx, sizeof(SortItem),
|
|
|
|
multi_sort_compare, mss);
|
|
|
|
|
|
|
|
return items;
|
|
|
|
}
|
|
|
|
|
2017-04-06 00:00:42 +02:00
|
|
|
/*
|
|
|
|
* has_stats_of_kind
|
2017-05-14 16:54:47 +02:00
|
|
|
* Check whether the list contains statistic of a given kind
|
2017-04-06 00:00:42 +02:00
|
|
|
*/
|
|
|
|
bool
|
|
|
|
has_stats_of_kind(List *stats, char requiredkind)
|
|
|
|
{
|
|
|
|
ListCell *l;
|
|
|
|
|
|
|
|
foreach(l, stats)
|
|
|
|
{
|
|
|
|
StatisticExtInfo *stat = (StatisticExtInfo *) lfirst(l);
|
|
|
|
|
|
|
|
if (stat->kind == requiredkind)
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* choose_best_statistics
|
2017-04-06 17:27:15 +02:00
|
|
|
* Look for and return statistics with the specified 'requiredkind' which
|
|
|
|
* have keys that match at least two of the given attnums. Return NULL if
|
|
|
|
* there's no match.
|
2017-04-06 00:00:42 +02:00
|
|
|
*
|
|
|
|
* The current selection criteria is very simple - we choose the statistics
|
2019-11-28 22:20:28 +01:00
|
|
|
* object referencing the most attributes in covered (and still unestimated
|
|
|
|
* clauses), breaking ties in favor of objects with fewer keys overall.
|
|
|
|
*
|
|
|
|
* The clause_attnums is an array of bitmaps, storing attnums for individual
|
|
|
|
* clauses. A NULL element means the clause is either incompatible or already
|
|
|
|
* estimated.
|
2017-04-06 00:00:42 +02:00
|
|
|
*
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
* XXX If multiple statistics objects tie on both criteria, then which object
|
2017-05-14 16:54:47 +02:00
|
|
|
* is chosen depends on the order that they appear in the stats list. Perhaps
|
|
|
|
* further tiebreakers are needed.
|
2017-04-06 00:00:42 +02:00
|
|
|
*/
|
|
|
|
StatisticExtInfo *
|
2019-11-28 22:20:28 +01:00
|
|
|
choose_best_statistics(List *stats, char requiredkind,
|
|
|
|
Bitmapset **clause_attnums, int nclauses)
|
2017-04-06 00:00:42 +02:00
|
|
|
{
|
|
|
|
ListCell *lc;
|
|
|
|
StatisticExtInfo *best_match = NULL;
|
|
|
|
int best_num_matched = 2; /* goal #1: maximize */
|
|
|
|
int best_match_keys = (STATS_MAX_DIMENSIONS + 1); /* goal #2: minimize */
|
|
|
|
|
|
|
|
foreach(lc, stats)
|
|
|
|
{
|
2019-11-28 22:20:28 +01:00
|
|
|
int i;
|
2017-04-06 00:00:42 +02:00
|
|
|
StatisticExtInfo *info = (StatisticExtInfo *) lfirst(lc);
|
2019-11-28 22:20:28 +01:00
|
|
|
Bitmapset *matched = NULL;
|
2017-04-06 00:00:42 +02:00
|
|
|
int num_matched;
|
|
|
|
int numkeys;
|
|
|
|
|
2017-05-14 16:54:47 +02:00
|
|
|
/* skip statistics that are not of the correct type */
|
2017-04-06 00:00:42 +02:00
|
|
|
if (info->kind != requiredkind)
|
|
|
|
continue;
|
|
|
|
|
2019-11-28 22:20:28 +01:00
|
|
|
/*
|
|
|
|
* Collect attributes in remaining (unestimated) clauses fully covered
|
|
|
|
* by this statistic object.
|
|
|
|
*/
|
|
|
|
for (i = 0; i < nclauses; i++)
|
|
|
|
{
|
|
|
|
/* ignore incompatible/estimated clauses */
|
|
|
|
if (!clause_attnums[i])
|
|
|
|
continue;
|
|
|
|
|
|
|
|
/* ignore clauses that are not covered by this object */
|
|
|
|
if (!bms_is_subset(clause_attnums[i], info->keys))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
matched = bms_add_members(matched, clause_attnums[i]);
|
|
|
|
}
|
|
|
|
|
2017-04-06 00:00:42 +02:00
|
|
|
num_matched = bms_num_members(matched);
|
|
|
|
bms_free(matched);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* save the actual number of keys in the stats so that we can choose
|
|
|
|
* the narrowest stats with the most matching keys.
|
|
|
|
*/
|
|
|
|
numkeys = bms_num_members(info->keys);
|
|
|
|
|
|
|
|
/*
|
2017-05-14 16:54:47 +02:00
|
|
|
* Use this object when it increases the number of matched clauses or
|
|
|
|
* when it matches the same number of attributes but these stats have
|
|
|
|
* fewer keys than any previous match.
|
2017-04-06 00:00:42 +02:00
|
|
|
*/
|
|
|
|
if (num_matched > best_num_matched ||
|
|
|
|
(num_matched == best_num_matched && numkeys < best_match_keys))
|
|
|
|
{
|
|
|
|
best_match = info;
|
|
|
|
best_num_matched = num_matched;
|
|
|
|
best_match_keys = numkeys;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return best_match;
|
|
|
|
}
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* statext_is_compatible_clause_internal
|
|
|
|
* Determines if the clause is compatible with MCV lists.
|
|
|
|
*
|
|
|
|
* Does the heavy lifting of actually inspecting the clauses for
|
|
|
|
* statext_is_compatible_clause. It needs to be split like this because
|
|
|
|
* of recursion. The attnums bitmap is an input/output parameter collecting
|
|
|
|
* attribute numbers from all compatible clauses (recursively).
|
|
|
|
*/
|
|
|
|
static bool
|
2019-06-23 19:50:08 +02:00
|
|
|
statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
|
|
|
|
Index relid, Bitmapset **attnums)
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
{
|
|
|
|
/* Look inside any binary-compatible relabeling (as in examine_variable) */
|
|
|
|
if (IsA(clause, RelabelType))
|
|
|
|
clause = (Node *) ((RelabelType *) clause)->arg;
|
|
|
|
|
|
|
|
/* plain Var references (boolean Vars or recursive checks) */
|
|
|
|
if (IsA(clause, Var))
|
|
|
|
{
|
|
|
|
Var *var = (Var *) clause;
|
|
|
|
|
|
|
|
/* Ensure var is from the correct relation */
|
|
|
|
if (var->varno != relid)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
/* we also better ensure the Var is from the current level */
|
|
|
|
if (var->varlevelsup > 0)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
/* Also skip system attributes (we don't allow stats on those). */
|
|
|
|
if (!AttrNumberIsForUserDefinedAttr(var->varattno))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
*attnums = bms_add_member(*attnums, var->varattno);
|
|
|
|
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* (Var op Const) or (Const op Var) */
|
|
|
|
if (is_opclause(clause))
|
|
|
|
{
|
2019-06-23 19:50:08 +02:00
|
|
|
RangeTblEntry *rte = root->simple_rte_array[relid];
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
OpExpr *expr = (OpExpr *) clause;
|
|
|
|
Var *var;
|
|
|
|
|
|
|
|
/* Only expressions with two arguments are considered compatible. */
|
|
|
|
if (list_length(expr->args) != 2)
|
|
|
|
return false;
|
|
|
|
|
2019-07-13 00:12:16 +02:00
|
|
|
/* Check if the expression the right shape (one Var, one Const) */
|
|
|
|
if (!examine_opclause_expression(expr, &var, NULL, NULL))
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
return false;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If it's not one of the supported operators ("=", "<", ">", etc.),
|
|
|
|
* just ignore the clause, as it's not compatible with MCV lists.
|
|
|
|
*
|
|
|
|
* This uses the function for estimating selectivity, not the operator
|
|
|
|
* directly (a bit awkward, but well ...).
|
|
|
|
*/
|
|
|
|
switch (get_oprrest(expr->opno))
|
|
|
|
{
|
|
|
|
case F_EQSEL:
|
|
|
|
case F_NEQSEL:
|
|
|
|
case F_SCALARLTSEL:
|
|
|
|
case F_SCALARLESEL:
|
|
|
|
case F_SCALARGTSEL:
|
|
|
|
case F_SCALARGESEL:
|
|
|
|
/* supported, will continue with inspection of the Var */
|
|
|
|
break;
|
|
|
|
|
|
|
|
default:
|
|
|
|
/* other estimators are considered unknown/unsupported */
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2019-06-23 19:50:08 +02:00
|
|
|
/*
|
|
|
|
* If there are any securityQuals on the RTE from security barrier
|
|
|
|
* views or RLS policies, then the user may not have access to all the
|
|
|
|
* table's data, and we must check that the operator is leak-proof.
|
|
|
|
*
|
|
|
|
* If the operator is leaky, then we must ignore this clause for the
|
|
|
|
* purposes of estimating with MCV lists, otherwise the operator might
|
|
|
|
* reveal values from the MCV list that the user doesn't have
|
|
|
|
* permission to see.
|
|
|
|
*/
|
|
|
|
if (rte->securityQuals != NIL &&
|
|
|
|
!get_func_leakproof(get_opcode(expr->opno)))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
return statext_is_compatible_clause_internal(root, (Node *) var,
|
|
|
|
relid, attnums);
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/* AND/OR/NOT clause */
|
|
|
|
if (is_andclause(clause) ||
|
|
|
|
is_orclause(clause) ||
|
|
|
|
is_notclause(clause))
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* AND/OR/NOT-clauses are supported if all sub-clauses are supported
|
|
|
|
*
|
|
|
|
* Perhaps we could improve this by handling mixed cases, when some of
|
|
|
|
* the clauses are supported and some are not. Selectivity for the
|
|
|
|
* supported subclauses would be computed using extended statistics,
|
|
|
|
* and the remaining clauses would be estimated using the traditional
|
|
|
|
* algorithm (product of selectivities).
|
|
|
|
*
|
|
|
|
* It however seems overly complex, and in a way we already do that
|
|
|
|
* because if we reject the whole clause as unsupported here, it will
|
|
|
|
* be eventually passed to clauselist_selectivity() which does exactly
|
|
|
|
* this (split into supported/unsupported clauses etc).
|
|
|
|
*/
|
|
|
|
BoolExpr *expr = (BoolExpr *) clause;
|
|
|
|
ListCell *lc;
|
|
|
|
|
|
|
|
foreach(lc, expr->args)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Had we found incompatible clause in the arguments, treat the
|
|
|
|
* whole clause as incompatible.
|
|
|
|
*/
|
2019-06-23 19:50:08 +02:00
|
|
|
if (!statext_is_compatible_clause_internal(root,
|
|
|
|
(Node *) lfirst(lc),
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
relid, attnums))
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Var IS NULL */
|
|
|
|
if (IsA(clause, NullTest))
|
|
|
|
{
|
|
|
|
NullTest *nt = (NullTest *) clause;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Only simple (Var IS NULL) expressions supported for now. Maybe we
|
|
|
|
* could use examine_variable to fix this?
|
|
|
|
*/
|
|
|
|
if (!IsA(nt->arg, Var))
|
|
|
|
return false;
|
|
|
|
|
2019-06-23 19:50:08 +02:00
|
|
|
return statext_is_compatible_clause_internal(root, (Node *) (nt->arg),
|
|
|
|
relid, attnums);
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* statext_is_compatible_clause
|
|
|
|
* Determines if the clause is compatible with MCV lists.
|
|
|
|
*
|
|
|
|
* Currently, we only support three types of clauses:
|
|
|
|
*
|
|
|
|
* (a) OpExprs of the form (Var op Const), or (Const op Var), where the op
|
|
|
|
* is one of ("=", "<", ">", ">=", "<=")
|
|
|
|
*
|
|
|
|
* (b) (Var IS [NOT] NULL)
|
|
|
|
*
|
|
|
|
* (c) combinations using AND/OR/NOT
|
|
|
|
*
|
|
|
|
* In the future, the range of supported clauses may be expanded to more
|
|
|
|
* complex cases, for example (Var op Var).
|
|
|
|
*/
|
|
|
|
static bool
|
2019-06-23 19:50:08 +02:00
|
|
|
statext_is_compatible_clause(PlannerInfo *root, Node *clause, Index relid,
|
|
|
|
Bitmapset **attnums)
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
{
|
2019-06-23 19:50:08 +02:00
|
|
|
RangeTblEntry *rte = root->simple_rte_array[relid];
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
RestrictInfo *rinfo = (RestrictInfo *) clause;
|
2019-06-23 19:50:08 +02:00
|
|
|
Oid userid;
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
|
|
|
|
if (!IsA(rinfo, RestrictInfo))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
/* Pseudoconstants are not really interesting here. */
|
|
|
|
if (rinfo->pseudoconstant)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
/* clauses referencing multiple varnos are incompatible */
|
|
|
|
if (bms_membership(rinfo->clause_relids) != BMS_SINGLETON)
|
|
|
|
return false;
|
|
|
|
|
2019-06-23 19:50:08 +02:00
|
|
|
/* Check the clause and determine what attributes it references. */
|
|
|
|
if (!statext_is_compatible_clause_internal(root, (Node *) rinfo->clause,
|
|
|
|
relid, attnums))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Check that the user has permission to read all these attributes. Use
|
|
|
|
* checkAsUser if it's set, in case we're accessing the table via a view.
|
|
|
|
*/
|
|
|
|
userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
|
|
|
|
|
|
|
|
if (pg_class_aclcheck(rte->relid, userid, ACL_SELECT) != ACLCHECK_OK)
|
|
|
|
{
|
|
|
|
/* Don't have table privilege, must check individual columns */
|
|
|
|
if (bms_is_member(InvalidAttrNumber, *attnums))
|
|
|
|
{
|
|
|
|
/* Have a whole-row reference, must have access to all columns */
|
|
|
|
if (pg_attribute_aclcheck_all(rte->relid, userid, ACL_SELECT,
|
|
|
|
ACLMASK_ALL) != ACLCHECK_OK)
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/* Check the columns referenced by the clause */
|
|
|
|
int attnum = -1;
|
|
|
|
|
|
|
|
while ((attnum = bms_next_member(*attnums, attnum)) >= 0)
|
|
|
|
{
|
|
|
|
if (pg_attribute_aclcheck(rte->relid, attnum, userid,
|
|
|
|
ACL_SELECT) != ACLCHECK_OK)
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* If we reach here, the clause is OK */
|
|
|
|
return true;
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* statext_mcv_clauselist_selectivity
|
|
|
|
* Estimate clauses using the best multi-column statistics.
|
|
|
|
*
|
|
|
|
* Selects the best extended (multi-column) statistic on a table (measured by
|
|
|
|
* the number of attributes extracted from the clauses and covered by it), and
|
|
|
|
* computes the selectivity for the supplied clauses.
|
|
|
|
*
|
|
|
|
* One of the main challenges with using MCV lists is how to extrapolate the
|
|
|
|
* estimate to the data not covered by the MCV list. To do that, we compute
|
|
|
|
* not only the "MCV selectivity" (selectivities for MCV items matching the
|
|
|
|
* supplied clauses), but also a couple of derived selectivities:
|
|
|
|
*
|
|
|
|
* - simple selectivity: Computed without extended statistic, i.e. as if the
|
|
|
|
* columns/clauses were independent
|
|
|
|
*
|
|
|
|
* - base selectivity: Similar to simple selectivity, but is computed using
|
|
|
|
* the extended statistic by adding up the base frequencies (that we compute
|
|
|
|
* and store for each MCV item) of matching MCV items.
|
|
|
|
*
|
|
|
|
* - total selectivity: Selectivity covered by the whole MCV list.
|
|
|
|
*
|
|
|
|
* - other selectivity: A selectivity estimate for data not covered by the MCV
|
|
|
|
* list (i.e. satisfying the clauses, but not common enough to make it into
|
|
|
|
* the MCV list)
|
|
|
|
*
|
|
|
|
* Note: While simple and base selectivities are defined in a quite similar
|
|
|
|
* way, the values are computed differently and are not therefore equal. The
|
|
|
|
* simple selectivity is computed as a product of per-clause estimates, while
|
|
|
|
* the base selectivity is computed by adding up base frequencies of matching
|
|
|
|
* items of the multi-column MCV list. So the values may differ for two main
|
|
|
|
* reasons - (a) the MCV list may not cover 100% of the data and (b) some of
|
|
|
|
* the MCV items did not match the estimated clauses.
|
|
|
|
*
|
|
|
|
* As both (a) and (b) reduce the base selectivity value, it generally holds
|
|
|
|
* that (simple_selectivity >= base_selectivity). If the MCV list covers all
|
|
|
|
* the data, the values may be equal.
|
|
|
|
*
|
|
|
|
* So, (simple_selectivity - base_selectivity) is an estimate for the part
|
|
|
|
* not covered by the MCV list, and (mcv_selectivity - base_selectivity) may
|
|
|
|
* be seen as a correction for the part covered by the MCV list. Those two
|
|
|
|
* statements are actually equivalent.
|
|
|
|
*
|
|
|
|
* Note: Due to rounding errors and minor differences in how the estimates
|
|
|
|
* are computed, the inequality may not always hold. Which is why we clamp
|
|
|
|
* the selectivities to prevent strange estimate (negative etc.).
|
|
|
|
*
|
|
|
|
* 'estimatedclauses' is an input/output parameter. We set bits for the
|
|
|
|
* 0-based 'clauses' indexes we estimate for and also skip clause items that
|
|
|
|
* already have a bit set.
|
|
|
|
*
|
|
|
|
* XXX If we were to use multiple statistics, this is where it would happen.
|
|
|
|
* We would simply repeat this on a loop on the "remaining" clauses, possibly
|
|
|
|
* using the already estimated clauses as conditions (and combining the values
|
|
|
|
* using conditional probability formula).
|
|
|
|
*/
|
|
|
|
static Selectivity
|
|
|
|
statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varRelid,
|
|
|
|
JoinType jointype, SpecialJoinInfo *sjinfo,
|
|
|
|
RelOptInfo *rel, Bitmapset **estimatedclauses)
|
|
|
|
{
|
|
|
|
ListCell *l;
|
|
|
|
Bitmapset **list_attnums;
|
|
|
|
int listidx;
|
|
|
|
StatisticExtInfo *stat;
|
|
|
|
List *stat_clauses;
|
2019-05-22 18:55:34 +02:00
|
|
|
Selectivity simple_sel,
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
mcv_sel,
|
|
|
|
mcv_basesel,
|
|
|
|
mcv_totalsel,
|
|
|
|
other_sel,
|
|
|
|
sel;
|
|
|
|
|
|
|
|
/* check if there's any stats that might be useful for us. */
|
|
|
|
if (!has_stats_of_kind(rel->statlist, STATS_EXT_MCV))
|
|
|
|
return 1.0;
|
|
|
|
|
|
|
|
list_attnums = (Bitmapset **) palloc(sizeof(Bitmapset *) *
|
|
|
|
list_length(clauses));
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Pre-process the clauses list to extract the attnums seen in each item.
|
|
|
|
* We need to determine if there's any clauses which will be useful for
|
|
|
|
* selectivity estimations with extended stats. Along the way we'll record
|
2019-05-22 18:55:34 +02:00
|
|
|
* all of the attnums for each clause in a list which we'll reference
|
|
|
|
* later so we don't need to repeat the same work again. We'll also keep
|
|
|
|
* track of all attnums seen.
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
*
|
|
|
|
* We also skip clauses that we already estimated using different types of
|
|
|
|
* statistics (we treat them as incompatible).
|
|
|
|
*/
|
|
|
|
listidx = 0;
|
|
|
|
foreach(l, clauses)
|
|
|
|
{
|
|
|
|
Node *clause = (Node *) lfirst(l);
|
|
|
|
Bitmapset *attnums = NULL;
|
|
|
|
|
|
|
|
if (!bms_is_member(listidx, *estimatedclauses) &&
|
2019-06-23 19:50:08 +02:00
|
|
|
statext_is_compatible_clause(root, clause, rel->relid, &attnums))
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
list_attnums[listidx] = attnums;
|
|
|
|
else
|
|
|
|
list_attnums[listidx] = NULL;
|
|
|
|
|
|
|
|
listidx++;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* find the best suited statistics object for these attnums */
|
2019-11-28 22:20:28 +01:00
|
|
|
stat = choose_best_statistics(rel->statlist, STATS_EXT_MCV,
|
|
|
|
list_attnums, list_length(clauses));
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
|
|
|
|
/* if no matching stats could be found then we've nothing to do */
|
|
|
|
if (!stat)
|
|
|
|
return 1.0;
|
|
|
|
|
|
|
|
/* Ensure choose_best_statistics produced an expected stats type. */
|
|
|
|
Assert(stat->kind == STATS_EXT_MCV);
|
|
|
|
|
|
|
|
/* now filter the clauses to be estimated using the selected MCV */
|
|
|
|
stat_clauses = NIL;
|
|
|
|
|
|
|
|
listidx = 0;
|
|
|
|
foreach(l, clauses)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* If the clause is compatible with the selected statistics, mark it
|
|
|
|
* as estimated and add it to the list to estimate.
|
|
|
|
*/
|
|
|
|
if (list_attnums[listidx] != NULL &&
|
|
|
|
bms_is_subset(list_attnums[listidx], stat->keys))
|
|
|
|
{
|
|
|
|
stat_clauses = lappend(stat_clauses, (Node *) lfirst(l));
|
|
|
|
*estimatedclauses = bms_add_member(*estimatedclauses, listidx);
|
|
|
|
}
|
|
|
|
|
|
|
|
listidx++;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2019-05-22 18:55:34 +02:00
|
|
|
* First compute "simple" selectivity, i.e. without the extended
|
|
|
|
* statistics, and essentially assuming independence of the
|
|
|
|
* columns/clauses. We'll then use the various selectivities computed from
|
|
|
|
* MCV list to improve it.
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
*/
|
|
|
|
simple_sel = clauselist_selectivity_simple(root, stat_clauses, varRelid,
|
|
|
|
jointype, sjinfo, NULL);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Now compute the multi-column estimate from the MCV list, along with the
|
|
|
|
* other selectivities (base & total selectivity).
|
|
|
|
*/
|
|
|
|
mcv_sel = mcv_clauselist_selectivity(root, stat, stat_clauses, varRelid,
|
|
|
|
jointype, sjinfo, rel,
|
|
|
|
&mcv_basesel, &mcv_totalsel);
|
|
|
|
|
|
|
|
/* Estimated selectivity of values not covered by MCV matches */
|
|
|
|
other_sel = simple_sel - mcv_basesel;
|
|
|
|
CLAMP_PROBABILITY(other_sel);
|
|
|
|
|
|
|
|
/* The non-MCV selectivity can't exceed the 1 - mcv_totalsel. */
|
|
|
|
if (other_sel > 1.0 - mcv_totalsel)
|
|
|
|
other_sel = 1.0 - mcv_totalsel;
|
|
|
|
|
|
|
|
/* Overall selectivity is the combination of MCV and non-MCV estimates. */
|
|
|
|
sel = mcv_sel + other_sel;
|
|
|
|
CLAMP_PROBABILITY(sel);
|
|
|
|
|
|
|
|
return sel;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* statext_clauselist_selectivity
|
|
|
|
* Estimate clauses using the best multi-column statistics.
|
|
|
|
*/
|
|
|
|
Selectivity
|
|
|
|
statext_clauselist_selectivity(PlannerInfo *root, List *clauses, int varRelid,
|
|
|
|
JoinType jointype, SpecialJoinInfo *sjinfo,
|
|
|
|
RelOptInfo *rel, Bitmapset **estimatedclauses)
|
|
|
|
{
|
2019-05-22 18:55:34 +02:00
|
|
|
Selectivity sel;
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
|
|
|
|
/* First, try estimating clauses using a multivariate MCV list. */
|
|
|
|
sel = statext_mcv_clauselist_selectivity(root, clauses, varRelid, jointype,
|
|
|
|
sjinfo, rel, estimatedclauses);
|
|
|
|
|
|
|
|
/*
|
2019-05-22 18:55:34 +02:00
|
|
|
* Then, apply functional dependencies on the remaining clauses by calling
|
|
|
|
* dependencies_clauselist_selectivity. Pass 'estimatedclauses' so the
|
|
|
|
* function can properly skip clauses already estimated above.
|
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
|
|
|
*
|
|
|
|
* The reasoning for applying dependencies last is that the more complex
|
|
|
|
* stats can track more complex correlations between the attributes, and
|
|
|
|
* so may be considered more reliable.
|
|
|
|
*
|
|
|
|
* For example, MCV list can give us an exact selectivity for values in
|
|
|
|
* two columns, while functional dependencies can only provide information
|
|
|
|
* about the overall strength of the dependency.
|
|
|
|
*/
|
|
|
|
sel *= dependencies_clauselist_selectivity(root, clauses, varRelid,
|
|
|
|
jointype, sjinfo, rel,
|
|
|
|
estimatedclauses);
|
|
|
|
|
|
|
|
return sel;
|
|
|
|
}
|
2019-07-13 00:12:16 +02:00
|
|
|
|
|
|
|
/*
|
2019-08-19 09:21:39 +02:00
|
|
|
* examine_opclause_expression
|
2019-07-13 00:12:16 +02:00
|
|
|
* Split expression into Var and Const parts.
|
|
|
|
*
|
|
|
|
* Attempts to match the arguments to either (Var op Const) or (Const op Var),
|
|
|
|
* possibly with a RelabelType on top. When the expression matches this form,
|
|
|
|
* returns true, otherwise returns false.
|
|
|
|
*
|
|
|
|
* Optionally returns pointers to the extracted Var/Const nodes, when passed
|
2019-07-19 16:28:28 +02:00
|
|
|
* non-null pointers (varp, cstp and varonleftp). The varonleftp flag specifies
|
|
|
|
* on which side of the operator we found the Var node.
|
2019-07-13 00:12:16 +02:00
|
|
|
*/
|
|
|
|
bool
|
2019-07-19 16:28:28 +02:00
|
|
|
examine_opclause_expression(OpExpr *expr, Var **varp, Const **cstp, bool *varonleftp)
|
2019-07-13 00:12:16 +02:00
|
|
|
{
|
|
|
|
Var *var;
|
|
|
|
Const *cst;
|
2019-07-19 16:28:28 +02:00
|
|
|
bool varonleft;
|
2019-07-13 00:12:16 +02:00
|
|
|
Node *leftop,
|
|
|
|
*rightop;
|
|
|
|
|
|
|
|
/* enforced by statext_is_compatible_clause_internal */
|
|
|
|
Assert(list_length(expr->args) == 2);
|
|
|
|
|
|
|
|
leftop = linitial(expr->args);
|
|
|
|
rightop = lsecond(expr->args);
|
|
|
|
|
|
|
|
/* strip RelabelType from either side of the expression */
|
|
|
|
if (IsA(leftop, RelabelType))
|
|
|
|
leftop = (Node *) ((RelabelType *) leftop)->arg;
|
|
|
|
|
|
|
|
if (IsA(rightop, RelabelType))
|
|
|
|
rightop = (Node *) ((RelabelType *) rightop)->arg;
|
|
|
|
|
|
|
|
if (IsA(leftop, Var) && IsA(rightop, Const))
|
|
|
|
{
|
|
|
|
var = (Var *) leftop;
|
|
|
|
cst = (Const *) rightop;
|
2019-07-19 16:28:28 +02:00
|
|
|
varonleft = true;
|
2019-07-13 00:12:16 +02:00
|
|
|
}
|
|
|
|
else if (IsA(leftop, Const) && IsA(rightop, Var))
|
|
|
|
{
|
|
|
|
var = (Var *) rightop;
|
|
|
|
cst = (Const *) leftop;
|
2019-07-19 16:28:28 +02:00
|
|
|
varonleft = false;
|
2019-07-13 00:12:16 +02:00
|
|
|
}
|
|
|
|
else
|
|
|
|
return false;
|
|
|
|
|
|
|
|
/* return pointers to the extracted parts if requested */
|
|
|
|
if (varp)
|
|
|
|
*varp = var;
|
|
|
|
|
|
|
|
if (cstp)
|
|
|
|
*cstp = cst;
|
|
|
|
|
2019-07-19 16:28:28 +02:00
|
|
|
if (varonleftp)
|
|
|
|
*varonleftp = varonleft;
|
2019-07-13 00:12:16 +02:00
|
|
|
|
|
|
|
return true;
|
|
|
|
}
|