Commit Graph

18 Commits

Author SHA1 Message Date
Bruce Momjian 953c64e0f6 doc: add commas after 'i.e.' and 'e.g.'
This follows the American format,
https://jakubmarian.com/comma-after-i-e-and-e-g/. There is no intention
of requiring this format for future text, but making existing text
consistent every few years makes sense.

Discussion: https://postgr.es/m/20200825183619.GA22369@momjian.us

Backpatch-through: 9.5
2020-08-31 18:33:37 -04:00
Bruce Momjian 7e0fb165dd docs: adjust multi-column most-common-value statistics
This commit adds a mention that the order of columns specified during
multi-column most-common-value statistics is insignificant, and tries to
simplify examples.

Discussion: https://postgr.es/m/20190828162238.GA8360@momjian.us

Backpatch-through: 12
2019-09-30 13:44:22 -04:00
Tomas Vondra 7f44efa10b Fix incorrect CREATE STATISTICS example in docs
The example was incorrectly using parantheses around the list of columns, so
just drop them.

Reported-By: Robert Haas
Discussion: https://postgr.es/m/CA%2BTgmoZZEMAqWMAfvLHZnK57SoxOutgvE-ALO94WsRA7zZ7wyQ%40mail.gmail.com
2019-06-16 01:20:42 +02:00
Tomas Vondra 7300a69950 Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.

Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.

Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
Peter Eisentraut f7481d2c3c Documentation spell checking and markup improvements 2018-06-29 21:26:41 +02:00
Peter Eisentraut 3c49c6facb Convert documentation to DocBook XML
Since some preparation work had already been done, the only source
changes left were changing empty-element tags like <xref linkend="foo">
to <xref linkend="foo"/>, and changing the DOCTYPE.

The source files are still named *.sgml, but they are actually XML files
now.  Renaming could be considered later.

In the build system, the intermediate step to convert from SGML to XML
is removed.  Everything is build straight from the source files again.
The OpenSP (or the old SP) package is no longer needed.

The documentation toolchain instructions are updated and are much
simpler now.

Peter Eisentraut, Alexander Lakhin, Jürgen Purtz
2017-11-23 09:44:28 -05:00
Peter Eisentraut 1ff01b3902 Convert SGML IDs to lower case
IDs in SGML are case insensitive, and we have accumulated a mix of upper
and lower case IDs, including different variants of the same ID.  In
XML, these will be case sensitive, so we need to fix up those
differences.  Going to all lower case seems most straightforward, and
the current build process already makes all anchors and lower case
anyway during the SGML->XML conversion, so this doesn't create any
difference in the output right now.  A future XML-only build process
would, however, maintain any mixed case ID spellings in the output, so
that is another reason to clean this up beforehand.

Author: Alexander Lakhin <exclusion@gmail.com>
2017-10-20 19:26:10 -04:00
Peter Eisentraut c29c578908 Don't use SGML empty tags
For DocBook XML compatibility, don't use SGML empty tags (</>) anymore,
replace by the full tag name.  Add a warning option to catch future
occurrences.

Alexander Lakhin, Jürgen Purtz
2017-10-17 15:10:33 -04:00
Peter Eisentraut 44b3230e82 Use lower-case SGML attribute values
for DocBook XML compatibility
2017-10-10 10:15:57 -04:00
Peter Eisentraut 821fb8cdbf Message style fixes 2017-09-11 11:21:27 -04:00
Peter Eisentraut bbaf9e8f84 Documentation spell checking and markup improvements 2017-06-18 14:02:12 -04:00
Alvaro Herrera eb67e2e35a extended stats: Clarify behavior of omitting stat type clause
Pointed out by Jeff Janes
Discussion: https://postgr.es/m/CAMkU=1zGhK-nW10RAXhokcT3MM=YBg=j5LkG9RMDwmu3i0H0Og@mail.gmail.com
2017-05-25 13:22:25 -04:00
Tom Lane 93ece9cc88 Edit SGML documentation related to extended statistics.
Use the "statistics object" terminology uniformly here too.  Assorted
copy-editing.  Put new catalogs.sgml sections into alphabetical order.
2017-05-14 19:15:52 -04:00
Alvaro Herrera bc085205c8 Change CREATE STATISTICS syntax
Previously, we had the WITH clause in the middle of the command, where
you'd specify both generic options as well as statistic types.  Few
people liked this, so this commit changes it to remove the WITH keyword
from that clause and makes it accept statistic types only.  (We
currently don't have any generic options, but if we invent in the
future, we will gain a new WITH clause, probably at the end of the
command).

Also, the column list is now specified without parens, which makes the
whole command look more similar to a SELECT command.  This change will
let us expand the command to supporting expressions (not just columns
names) as well as multiple tables and their join conditions.

Tom added lots of code comments and fixed some parts of the CREATE
STATISTICS reference page, too; more changes in this area are
forthcoming.  He also fixed a potential problem in the alter_generic
regression test, reducing verbosity on a cascaded drop to avoid
dependency on message ordering, as we do in other tests.

Tom also closed a security bug: we documented that table ownership was
required in order to create a statistics object on it, but didn't
actually implement it.

Implement tab-completion for statistics objects.  This can stand some
more improvement.

Authors: Alvaro Herrera, with lots of cleanup by Tom Lane
Discussion: https://postgr.es/m/20170420212426.ltvgyhnefvhixm6i@alvherre.pgsql
2017-05-12 14:59:35 -03:00
Alvaro Herrera 8c5cdb7f4f Tighten up relation kind checks for extended statistics
We were accepting creation of extended statistics only for regular
tables, but they can usefully be created for foreign tables, partitioned
tables, and materialized views, too.  Allow those cases.

While at it, make sure all the rejected cases throw a consistent error
message, and add regression tests for the whole thing.

Author: David Rowley, Álvaro Herrera
Discussion: https://postgr.es/m/CAKJS1f-BmGo410bh5RSPZUvOO0LhmHL2NYmdrC_Jm8pk_FfyCA@mail.gmail.com
2017-04-17 17:55:55 -03:00
Peter Eisentraut f0e44021df doc: Add some markup 2017-04-07 22:45:39 -04:00
Simon Riggs 2686ee1b7c Collect and use multi-column dependency stats
Follow on patch in the multi-variate statistics patch series.

CREATE STATISTICS s1 WITH (dependencies) ON (a, b) FROM t;
ANALYZE;
will collect dependency stats on (a, b) and then use the measured
dependency in subsequent query planning.

Commit 7b504eb282 added
CREATE STATISTICS with n-distinct coefficients. These are now
specified using the mutually exclusive option WITH (ndistinct).

Author: Tomas Vondra, David Rowley
Reviewed-by: Kyotaro HORIGUCHI, Álvaro Herrera, Dean Rasheed, Robert Haas
and many other comments and contributions
Discussion: https://postgr.es/m/56f40b20-c464-fad2-ff39-06b668fac47c@2ndquadrant.com
2017-04-05 18:00:42 -04:00
Alvaro Herrera 7b504eb282 Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns.  Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too.  All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.

This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table.  This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve.  A new
special pseudo-type pg_ndistinct is used.

(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)

Author: Tomas Vondra.  Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
    Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
    https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 14:06:10 -03:00