4 Commits

Author SHA1 Message Date
Simon Riggs
2686ee1b7c Collect and use multi-column dependency stats
Follow-on patch in the multivariate statistics patch series.

CREATE STATISTICS s1 WITH (dependencies) ON (a, b) FROM t;
ANALYZE;
will collect dependency stats on (a, b) and then use the measured
dependency in subsequent query planning.
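
As an illustration of what a measured dependency could look like, here is a minimal Python sketch. It is not PostgreSQL's implementation: the function names, the sample data, and the blending formula are assumptions chosen to show the idea. The degree is taken to be the fraction of rows whose value of a maps to exactly one value of b, and the combined selectivity is interpolated between the fully dependent and the fully independent case.

```python
from collections import defaultdict


def dependency_degree(rows):
    """Fraction of rows whose `a` value maps to exactly one `b` value.

    Hypothetical helper: 1.0 means a fully determines b, 0.0 means no
    group of `a` has a single consistent `b`.
    """
    if not rows:
        return 0.0
    b_values = defaultdict(set)   # a -> set of b values seen with it
    group_size = defaultdict(int)  # a -> number of rows with that a
    for a, b in rows:
        b_values[a].add(b)
        group_size[a] += 1
    supporting = sum(n for a, n in group_size.items() if len(b_values[a]) == 1)
    return supporting / len(rows)


def combined_selectivity(sel_a, sel_b, degree):
    """Blend the dependent and independent estimates (assumed formula).

    degree == 1.0 -> sel_a (the clause on b is implied by the clause on a)
    degree == 0.0 -> sel_a * sel_b (the usual independence assumption)
    """
    return sel_a * (degree + (1.0 - degree) * sel_b)


# Sample data: 20 distinct values of a, each determining b = a // 5.
rows = [(i % 20, (i % 20) // 5) for i in range(200)]
degree = dependency_degree(rows)
independent = 0.05 * 0.25
with_dependency = combined_selectivity(0.05, 0.25, degree)
```

In this sample, a predicate such as a = 7 AND b = 1 matches 10 of 200 rows. The independence assumption underestimates that by a factor of four, while the dependency-aware estimate matches it.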

Commit 7b504eb282 added
CREATE STATISTICS with n-distinct coefficients. These are now
specified using the mutually exclusive option WITH (ndistinct).

Author: Tomas Vondra, David Rowley
Reviewed-by: Kyotaro HORIGUCHI, Álvaro Herrera, Dean Rasheed, Robert Haas
and many other comments and contributions
Discussion: https://postgr.es/m/56f40b20-c464-fad2-ff39-06b668fac47c@2ndquadrant.com
2017-04-05 18:00:42 -04:00
Alvaro Herrera
bed9ef5a16 Rework the stats_ext test
As suggested by Tom Lane, avoid printing specific estimated cost values,
because they vary across architectures; instead, verify plan shapes (in
this case, HashAggregate vs. GroupAggregate), as we do in other planner
tests.

We can now remove expected/stats_ext_1.out.

Author: Tomas Vondra
2017-03-27 12:43:04 -03:00
Alvaro Herrera
2c3e47527a Fix a couple of problems in pg_get_statisticsextdef
There was a thinko whereby we tested the wrong tuple after fetching it
from cache; avoid that by using generate_relation_name instead, which is
simpler.  Also, the statistics name was not qualified, so add that.  (It
could be argued that qualification should be conditional on the schema
not being on search path.  We can add that later, but at least this form
is correct.)

Author: David Rowley, Álvaro Herrera
Discussion: https://postgr.es/m/CAKJS1f8RjLeVZJ2+93pdQGuZJeBF-ifsHaFMR-q-6-Z0qxA8cA@mail.gmail.com
2017-03-27 01:03:50 -03:00
Alvaro Herrera
7b504eb282 Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations than individual table columns.  Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too.  All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.

This commit only adds support for collecting n-distinct coefficients
on user-specified sets of columns in a single table.  This is useful for
estimating the number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve.  A new
special pseudo-type pg_ndistinct is used.
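
To make the estimation problem concrete, the following Python sketch compares a group-count estimate built from per-column distinct counts with the number of distinct combinations actually present. The function names and sample data are hypothetical, and the per-column estimate is a simplification for illustration, not the planner's actual formula.

```python
def per_column_estimate(rows):
    """Estimate the group count as the product of per-column distinct
    counts, clamped to the row count (a simplified stand-in for an
    estimate that treats the columns as independent)."""
    estimate = 1
    for i in range(len(rows[0])):
        estimate *= len({row[i] for row in rows})
    return min(estimate, len(rows))


def actual_groups(rows):
    """Number of distinct column combinations actually present."""
    return len(set(rows))


# Sample data: column b is fully determined by column a (b = a // 10).
rows = [(i % 100, (i % 100) // 10) for i in range(10000)]

estimated = per_column_estimate(rows)  # 100 * 10 distinct values
actual = actual_groups(rows)           # only 100 real combinations
```

With correlated columns, the per-column estimate is ten times too high in this sample. An overestimate of this kind is what can lead to over-allocated memory for a hashed aggregate.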

(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)

Author: Tomas Vondra.  Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
    Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
    https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 14:06:10 -03:00