Commit Graph

13 Commits

Author SHA1 Message Date
Tom Lane b5b0db19b8 Fix handling of extended statistics during ALTER COLUMN TYPE.
ALTER COLUMN TYPE on a column used by a statistics object fails since
commit 928c4de30, because the relevant switch in ATExecAlterColumnType
is unprepared for columns to have dependencies from OCLASS_STATISTIC_EXT
objects.

Although the existing types of extended statistics don't actually need us
to do any work for a column type change, it seems completely indefensible
that that assumption is hidden behind the failure of an unrelated module
to contain any code for the case.  Hence, create and call an API function
in statscmds.c where the assumption can be explained, and where we could
add code to deal with the problem when it inevitably becomes real.

Also, the reason this wasn't handled before, neither for extended stats
nor for the last half-dozen new OCLASS kinds :-(, is that the default:
in that switch suppresses compiler warnings, allowing people to miss the
need to consider it when adding an OCLASS.  We don't really need a default
because surely getObjectClass should only return valid values of the enum;
so remove it, and add the missed OCLASS entries where they should be.

Discussion: https://postgr.es/m/20170512221010.nglatgt5azzdxjlj@alvherre.pgsql
2017-05-14 12:22:25 -04:00
Tom Lane f04c9a6146 Standardize terminology for pg_statistic_ext entries.
Consistently refer to such an entry as a "statistics object", not just
"statistics" or "extended statistics".  Previously we had a mismash of
terms, accompanied by utter confusion as to whether the term was
singular or plural.  That's not only grating (at least to the ear of
a native English speaker) but could be outright misleading, eg in error
messages that seemed to be referring to multiple objects where only one
could be meant.

This commit fixes the code and a lot of comments (though I may have
missed a few).  I also renamed two new SQL functions,
pg_get_statisticsextdef -> pg_get_statisticsobjdef
pg_statistic_ext_is_visible -> pg_statistics_obj_is_visible
to conform better with this terminology.

I have not touched the SGML docs other than fixing those function
names; the docs certainly need work but it seems like a separable task.

Discussion: https://postgr.es/m/22676.1494557205@sss.pgh.pa.us
2017-05-14 10:55:01 -04:00
Tom Lane 928c4de309 Fix dependencies for extended statistics objects.
A stats object ought to have a dependency on each individual column
it reads, not the entire table.  Doing this honestly lets us get rid
of the hard-wired logic in RemoveStatisticsExt, which seems to have
been misguidedly modeled on RemoveStatistics; and it will be far easier
to extend to multiple tables later.

Also, add overlooked dependency on owner, and make the dependency on
schema be NORMAL like every other such dependency.

There remains some unfinished work here, which is to allow statistics
objects to be extension members.  That takes more effort than just
adding the dependency call, though, so I left it out for now.

initdb forced because this changes the set of pg_depend records that
should exist for a statistics object.

Discussion: https://postgr.es/m/22676.1494557205@sss.pgh.pa.us
2017-05-12 16:26:31 -04:00
Alvaro Herrera bc085205c8 Change CREATE STATISTICS syntax
Previously, we had the WITH clause in the middle of the command, where
you'd specify both generic options as well as statistic types.  Few
people liked this, so this commit changes it to remove the WITH keyword
from that clause and makes it accept statistic types only.  (We
currently don't have any generic options, but if we invent in the
future, we will gain a new WITH clause, probably at the end of the
command).

Also, the column list is now specified without parens, which makes the
whole command look more similar to a SELECT command.  This change will
let us expand the command to supporting expressions (not just columns
names) as well as multiple tables and their join conditions.

Tom added lots of code comments and fixed some parts of the CREATE
STATISTICS reference page, too; more changes in this area are
forthcoming.  He also fixed a potential problem in the alter_generic
regression test, reducing verbosity on a cascaded drop to avoid
dependency on message ordering, as we do in other tests.

Tom also closed a security bug: we documented that table ownership was
required in order to create a statistics object on it, but didn't
actually implement it.

Implement tab-completion for statistics objects.  This can stand some
more improvement.

Authors: Alvaro Herrera, with lots of cleanup by Tom Lane
Discussion: https://postgr.es/m/20170420212426.ltvgyhnefvhixm6i@alvherre.pgsql
2017-05-12 14:59:35 -03:00
Alvaro Herrera 93bbeec6a2 extstats: change output functions to emit valid JSON
Manipulating extended statistics is more convenient as JSON than the
current ad-hoc format, so let's change before it's too late.

Discussion: https://postgr.es/m/20170420193828.k3fliiock5hdnehn@alvherre.pgsql
2017-05-02 18:49:32 -03:00
Tom Lane 4b34624daa Code review for commands/statscmds.c.
Fix machine-dependent sorting of column numbers.  (Odd behavior
would only materialize for column numbers above 255, but that's
certainly legal.)

Fix poor choice of SQLSTATE for some errors, and improve error message
wording.  (Notably, "is not a scalar type" is a totally misleading way
to explain "does not have a default btree opclass".)

Avoid taking AccessExclusiveLock on the associated relation during DROP
STATISTICS.  That's neither necessary nor desirable, and it could easily
have put us into situations where DROP fails (compare commit 68ea2b7f9).

Adjust/improve comments.

David Rowley and Tom Lane

Discussion: https://postgr.es/m/CAKJS1f-GmCfPvBbAEaM5xoVOaYdVgVN1gicALSoYQ77z-+vLbw@mail.gmail.com
2017-04-24 11:15:15 -04:00
Alvaro Herrera ee6922112e Rename columns in new pg_statistic_ext catalog
The new catalog reused a column prefix "sta" from pg_statistic, but this
is undesirable, so change the catalog to use prefix "stx" instead.
Also, rename the column that lists enabled statistic kinds as "stxkind"
rather than "enabled".

Discussion: https://postgr.es/m/CAKJS1f_2t5jhSN7huYRFH3w3rrHfG2QU7hiUHsu-Vdjd1rYT3w@mail.gmail.com
2017-04-17 18:34:29 -03:00
Alvaro Herrera 8c5cdb7f4f Tighten up relation kind checks for extended statistics
We were accepting creation of extended statistics only for regular
tables, but they can usefully be created for foreign tables, partitioned
tables, and materialized views, too.  Allow those cases.

While at it, make sure all the rejected cases throw a consistent error
message, and add regression tests for the whole thing.

Author: David Rowley, Álvaro Herrera
Discussion: https://postgr.es/m/CAKJS1f-BmGo410bh5RSPZUvOO0LhmHL2NYmdrC_Jm8pk_FfyCA@mail.gmail.com
2017-04-17 17:55:55 -03:00
Alvaro Herrera bf2a691e02 Fix extended statistics with partial analyzes
Either because of a previous ALTER TABLE .. SET STATISTICS 0 or because
of being invoked with a partial column list, ANALYZE could fail to
acquire sufficient data to build extended statistics.  Previously, this
would draw an ERROR and fail to collect any statistics at all (extended
and regular).  Change things so that we raise a WARNING instead, and
remove a hint that was wrong in half the cases.

Reported by: David Rowley
Discussion: https://postgr.es/m/CAKJS1f9Kk0NF6Fg7TA=JUXsjpS9kX6NVu27pb5QDCpOYAvb-Og@mail.gmail.com
2017-04-17 14:00:47 -03:00
Simon Riggs 2686ee1b7c Collect and use multi-column dependency stats
Follow on patch in the multi-variate statistics patch series.

CREATE STATISTICS s1 WITH (dependencies) ON (a, b) FROM t;
ANALYZE;
will collect dependency stats on (a, b) and then use the measured
dependency in subsequent query planning.

Commit 7b504eb282 added
CREATE STATISTICS with n-distinct coefficients. These are now
specified using the mutually exclusive option WITH (ndistinct).

Author: Tomas Vondra, David Rowley
Reviewed-by: Kyotaro HORIGUCHI, Álvaro Herrera, Dean Rasheed, Robert Haas
and many other comments and contributions
Discussion: https://postgr.es/m/56f40b20-c464-fad2-ff39-06b668fac47c@2ndquadrant.com
2017-04-05 18:00:42 -04:00
Alvaro Herrera bed9ef5a16 Rework the stats_ext test
As suggested by Tom Lane, avoid printing specific estimated cost values,
because they vary across architectures; instead, verify plan shapes (in
this case, HashAggregate vs. GroupAggregate), as we do in other planner
tests.

We can now remove expected/stats_ext_1.out.

Author: Tomas Vondra
2017-03-27 12:43:04 -03:00
Alvaro Herrera 2c3e47527a Fix a couple of problems in pg_get_statisticsextdef
There was a thinko whereby we tested the wrong tuple after fetching it
from cache; avoid that by using generate_relation_name instead, which is
simpler.  Also, the statistics name was not qualified, so add that.  (It
could be argued that qualification should be conditional on the schema
not being on search path.  We can add that later, but at least this form
is correct.)

Author: David Rowley, Álvaro Herrera
Discussion: https://postgr.es/m/CAKJS1f8RjLeVZJ2+93pdQGuZJeBF-ifsHaFMR-q-6-Z0qxA8cA@mail.gmail.com
2017-03-27 01:03:50 -03:00
Alvaro Herrera 7b504eb282 Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns.  Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too.  All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.

This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table.  This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve.  A new
special pseudo-type pg_ndistinct is used.

(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)

Author: Tomas Vondra.  Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
    Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
    https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 14:06:10 -03:00