postgresql

Commit Graph

Author	SHA1	Message	Date
Tomas Vondra	5be8ce82e8	Fix lookup error in extended stats ownership check When an ownership check on extended statistics object failed, the code was calling aclcheck_error_type to report the failure, which is clearly wrong, resulting in cache lookup errors. Fix by calling aclcheck_error. This issue exists since the introduction of extended statistics, so backpatch all the way back to PostgreSQL 10. It went unnoticed because there were no tests triggering the error, so add one. Reported-by: Mark Dilger Backpatch-through: 10, where extended stats were introduced Discussion: https://postgr.es/m/1F238937-7CC2-4703-A1B1-6DC225B8978A%40enterprisedb.com	2021-08-31 18:33:38 +02:00
Alvaro Herrera	a397109114	psql: Fix name quoting on extended statistics Per our message style guidelines, for human consumption we quote qualified names as a whole rather than each part separately; but commits `bc085205c8` introduced a deviation for extended statistics and `a4d75c86bf` copied it. I don't agree with this policy applying to names shown by psql, but that's a poor reason to deviate from the practice only in two obscure corners, so make said corners use the same style as everywhere else. Backpatch to 14. The first of these is older, but I'm not sure we want to destabilize the psql output in the older branches for such a small thing. Discussion: https://postgr.es/m/20210828181618.GS26465@telsasoft.com	2021-08-30 14:01:29 -04:00
Tomas Vondra	f68b609230	psql \dX: check schema when listing statistics objects Commit `ad600bba04` added psql command \dX listing extended statistics objects, but it failed to consider search_path when selecting the elements so some of the returned elements might be invisible. The visibility was already considered for tab completion (added by commit `d99d58cdc8`), so adding it to the query is fairly simple. Reported and fix by Justin Pryzby, regression tests by me. Backpatch to PostgreSQL 14, where \dX was introduced. Batchpatch-through: 14 Author: Justin Pryzby Reviewed-by: Tatsuro Yamada Discussion: https://postgr.es/m/c027a541-5856-75a5-0868-341301e1624b%40nttcom.co.jp_1	2021-07-26 17:30:39 +02:00
Peter Eisentraut	2ed532ee8c	Improve error messages about mismatching relkind Most error messages about a relkind that was not supported or appropriate for the command was of the pattern "relation \"%s\" is not a table, foreign table, or materialized view" This style can become verbose and tedious to maintain. Moreover, it's not very helpful: If I'm trying to create a comment on a TOAST table, which is not supported, then the information that I could have created a comment on a materialized view is pointless. Instead, write the primary error message shorter and saying more directly that what was attempted is not possible. Then, in the detail message, explain that the operation is not supported for the relkind the object was. To simplify that, add a new function errdetail_relkind_not_supported() that does this. In passing, make use of RELKIND_HAS_STORAGE() where appropriate, instead of listing out the relkinds individually. Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://www.postgresql.org/message-id/flat/dc35a398-37d0-75ce-07ea-1dd71d98f8ec@2ndquadrant.com	2021-07-08 09:44:51 +02:00
Tomas Vondra	518442c7f3	Fix handling of clauses incompatible with extended statistics Handling of incompatible clauses while applying extended statistics was a bit confused - while handling a mix of compatible and incompatible clauses it sometimes incorrectly treated the incompatible clauses as compatible, resulting in a crash. Fixed by reworking the code applying the selected statistics object to make it easier to understand, and adding a proper compatibility check. Reported-by: David Rowley Discussion: https://postgr.es/m/CAApHDvpYT10-nkSp8xXe-nbO3jmoaRyRFHbzh-RWMfAJynqgpQ%40mail.gmail.com	2021-04-06 16:56:06 +02:00
Tomas Vondra	2a058e938c	Stabilize stats_ext test with other collations The tests used string concatenation to test statistics on expressions, but that made the tests locale-dependent, e.g. because the ordering of '11' and '1X' depends on the collation. This affected both the estimated and actual row couts, breaking some of the tests. Fixed by replacing the string concatenation with upper() function call, so that the text values contain only digits. Discussion: https://postgr.es/m/b650920b-2767-fbc3-c87a-cb8b5d693cbf%40enterprisedb.com	2021-03-27 18:26:56 +01:00
Tomas Vondra	a4d75c86bf	Extended statistics on expressions Allow defining extended statistics on expressions, not just just on simple column references. With this commit, expressions are supported by all existing extended statistics kinds, improving the same types of estimates. A simple example may look like this: CREATE TABLE t (a int); CREATE STATISTICS s ON mod(a,10), mod(a,20) FROM t; ANALYZE t; The collected statistics are useful e.g. to estimate queries with those expressions in WHERE or GROUP BY clauses: SELECT * FROM t WHERE mod(a,10) = 0 AND mod(a,20) = 0; SELECT 1 FROM t GROUP BY mod(a,10), mod(a,20); This introduces new internal statistics kind 'e' (expressions) which is built automatically when the statistics object definition includes any expressions. This represents single-expression statistics, as if there was an expression index (but without the index maintenance overhead). The statistics is stored in pg_statistics_ext_data as an array of composite types, which is possible thanks to `79f6a942bd`. CREATE STATISTICS allows building statistics on a single expression, in which case in which case it's not possible to specify statistics kinds. A new system view pg_stats_ext_exprs can be used to display expression statistics, similarly to pg_stats and pg_stats_ext views. ALTER TABLE ... ALTER COLUMN ... TYPE now treats indexes the same way it treats indexes, i.e. it drops and recreates the statistics. This means all statistics are reset, and we no longer try to preserve at least the functional dependencies. This should not be a major issue in practice, as the functional dependencies actually rely on per-column statistics, which were always reset anyway. Author: Tomas Vondra Reviewed-by: Justin Pryzby, Dean Rasheed, Zhihong Yu Discussion: https://postgr.es/m/ad7891d2-e90c-b446-9fe2-7419143847d7%40enterprisedb.com	2021-03-27 00:01:11 +01:00
Tomas Vondra	98376c18f1	Reduce duration of stats_ext regression tests The regression tests of extended statistics were taking a fair amount of time, due to using fairly large data sets with a couple thousand rows. So far this was fine, but with tests for statistics on expressions the duration would get a bit excessive. So reduce the size of some of the tests that will be used to test expressions, to keep the duration under control. Done in a separate commit before adding the statistics on expressions, to make it clear which estimates are expected to change. Author: Tomas Vondra Discussion: https://postgr.es/m/ad7891d2-e90c-b446-9fe2-7419143847d7%40enterprisedb.com	2021-03-26 23:00:47 +01:00
Tomas Vondra	33e52ad9a3	Fix ndistinct estimates with system attributes When estimating the number of groups using extended statistics, the code was discarding information about system attributes. This led to strange situation that SELECT 1 FROM t GROUP BY ctid; could have produced higher estimate (equal to pg_class.reltuples) than SELECT 1 FROM t GROUP BY a, b, ctid; with extended statistics on (a,b). Fixed by retaining information about the system attribute. Backpatch all the way to 10, where extended statistics were introduced. Author: Tomas Vondra Backpatch-through: 10	2021-03-26 22:34:58 +01:00
Tomas Vondra	ad600bba04	psql \dX: list extended statistics objects The new command lists extended statistics objects. All past releases with extended statistics are supported. This is a simplified version of commit `891a1d0bca`, which had to be reverted due to not considering pg_statistic_ext_data is not accessible by regular users. Fields requiring access to this catalog were removed. It's possible to add them, but it'll require changes to core. Author: Tatsuro Yamada Reviewed-by: Julien Rouhaud, Alvaro Herrera, Tomas Vondra, Noriyoshi Shinoda Discussion: https://postgr.es/m/c027a541-5856-75a5-0868-341301e1624b%40nttcom.co.jp_1	2021-01-20 22:57:21 +01:00
Tomas Vondra	1db0d173a2	Revert "psql \dX: list extended statistics objects" Reverts `891a1d0bca`, because the new psql command \dX only worked for users users who can read pg_statistic_ext_data catalog, and most regular users lack that privilege (the catalog may contain sensitive user data). Reported-by: Noriyoshi Shinoda Discussion: https://postgr.es/m/c027a541-5856-75a5-0868-341301e1624b%40nttcom.co.jp_1	2021-01-17 15:11:14 +01:00
Tomas Vondra	891a1d0bca	psql \dX: list extended statistics objects The new command lists extended statistics objects, possibly with their sizes. All past releases with extended statistics are supported. Author: Tatsuro Yamada Reviewed-by: Julien Rouhaud, Alvaro Herrera, Tomas Vondra Discussion: https://postgr.es/m/c027a541-5856-75a5-0868-341301e1624b%40nttcom.co.jp_1	2021-01-17 00:16:45 +01:00
Tomas Vondra	c9a0dc3486	Disallow CREATE STATISTICS on system catalogs Add a check that CREATE STATISTICS does not add extended statistics on system catalogs, similarly to indexes etc. It can be overriden using the allow_system_table_mods GUC. This bug exists since `7b504eb282`, adding the extended statistics, so backpatch all the way back to PostgreSQL 10. Author: Tomas Vondra Reported-by: Dean Rasheed Backpatch-through: 10 Discussion: https://postgr.es/m/CAEZATCXAPrrOKwEsyZKQ4uzzJQWBCt6QAvOcgqRGdWwT1zb%2BrQ%40mail.gmail.com	2021-01-15 23:31:22 +01:00
Dean Rasheed	4f5760d4af	Improve estimation of ANDs under ORs using extended statistics. Formerly, extended statistics only handled clauses that were RestrictInfos. However, the restrictinfo machinery doesn't create sub-AND RestrictInfos for AND clauses underneath OR clauses. Therefore teach extended statistics to handle bare AND clauses, looking for compatible RestrictInfo clauses underneath them. Dean Rasheed, reviewed by Tomas Vondra. Discussion: https://postgr.es/m/CAEZATCW=J65GUFm50RcPv-iASnS2mTXQbr=CfBvWRVhFLJ_fWA@mail.gmail.com	2020-12-08 20:10:11 +00:00
Dean Rasheed	88b0898fe3	Improve estimation of OR clauses using multiple extended statistics. When estimating an OR clause using multiple extended statistics objects, treat the estimates for each set of clauses for each statistics object as independent of one another. The overlap estimates produced for each statistics object do not apply to clauses covered by other statistics objects. Dean Rasheed, reviewed by Tomas Vondra. Discussion: https://postgr.es/m/CAEZATCW=J65GUFm50RcPv-iASnS2mTXQbr=CfBvWRVhFLJ_fWA@mail.gmail.com	2020-12-08 19:39:24 +00:00
Dean Rasheed	25a9e54d2d	Improve estimation of OR clauses using extended statistics. Formerly we only applied extended statistics to an OR clause as part of the clauselist_selectivity() code path for an OR clause appearing in an implicitly-ANDed list of clauses. This meant that it could only use extended statistics if all sub-clauses of the OR clause were covered by a single extended statistics object. Instead, teach clause_selectivity() how to apply extended statistics to an OR clause by handling its ORed list of sub-clauses in a similar manner to an implicitly-ANDed list of sub-clauses, but with different combination rules. This allows one or more extended statistics objects to be used to estimate all or part of the list of sub-clauses. Any remaining sub-clauses are then treated as if they are independent. Additionally, to avoid double-application of extended statistics, this introduces "extended" versions of clause_selectivity() and clauselist_selectivity(), which include an option to ignore extended statistics. This replaces the old clauselist_selectivity_simple() function which failed to completely ignore extended statistics when called from the extended statistics code. A known limitation of the current infrastructure is that an AND clause under an OR clause is not treated as compatible with extended statistics (because we don't build RestrictInfos for such sub-AND clauses). Thus, for example, "(a=1 AND b=1) OR (a=2 AND b=2)" will currently be treated as two independent AND clauses (each of which may be estimated using extended statistics), but extended statistics will not currently be used to account for any possible overlap between those clauses. Improving that is left as a task for the future. Original patch by Tomas Vondra, with additional improvements by me. Discussion: https://postgr.es/m/20200113230008.g67iyk4cs3xbnjju@development	2020-12-03 10:03:49 +00:00
Alvaro Herrera	3c99230b4f	psql: Display stats target of extended statistics The stats target can be set since commit `d06215d03`, but wasn't shown by psql. Author: Justin Pryzby <justin@telsasoft.com> Discussion: https://postgr.es/m/20200831050047.GG5450@telsasoft.com Reviewed-by: Georgios Kokolatos <gkokolatos@protonmail.com> Reviewed-by: Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp>	2020-09-11 16:15:47 -03:00
Heikki Linnakangas	5bdf694568	Fix typo in test comment.	2020-08-14 10:40:50 +03:00
Tom Lane	0936d1b6ff	Still another try at stabilizing stats_ext test results. The stats_ext test is not expecting that autovacuum will touch any of its tables; an expectation falsified by commit `b07642dbc`. Although I'm suspicious that there's something else going on that makes extended stats estimates not 100% reproducible, it's pretty easy to demonstrate that there are places in this test that fail if an autovacuum updates the table's stats unexpectedly. Hence, revert the band-aid changes made by `2dc16efed` and `24566b359` in favor of summarily disabling autovacuum for all the tables that this test checks estimated rowcounts for. Also remove an evidently obsolete comment at the head of the test. Discussion: https://postgr.es/m/15012.1585623298@sss.pgh.pa.us	2020-03-31 16:09:25 -04:00
David Rowley	24566b359d	Attempt to fix unstable regression tests, take 2 Following up on `2dc16efed`, petalura has suffered some additional failures in stats_ext which again appear to be around the timing of an autovacuum during the test, causing instability in the row estimates. Again, let's fix this by explicitly performing a VACUUM on the table and not leave it to happen by chance of an autovacuum pass. Discussion: https://postgr.es/m/CAApHDvok5hmXr%2BbUbJe7%2B2sQzWo4B_QzSk7RKFR9fP6BjYXx5g%40mail.gmail.com	2020-03-30 23:41:11 +13:00
David Rowley	2dc16efedc	Attempt to fix unstable regression tests `b07642dbc` added code to trigger autovacuums based on the number of inserts into a table. This seems to have caused some regression test results to destabilize. I suspect this is due to autovacuum triggering a vacuum sometime after the test's ANALYZE run and perhaps reltuples is ending up being set to a slightly different value as a result. Attempt to resolve this by running a VACUUM ANALYZE on the affected table instead of just ANALYZE. pg_class.reltuples will still get set to whatever ANALYZE chooses but we should no longer get the proceeding autovacuum overriding that. The overhead this adds to each test's runtime seems small enough not to worry about. I measure 3-4% on stats_ext and can't measure any change in partition_aggregate. I'm unable to recreate the issue locally, so this is a bit of a blind fix. Discussion: https://postgr.es/m/CAApHDvpWmpqYrKwwDQyeDq8dAyK7GMNaxDhrG69CkSuXoEg%2BVg%40mail.gmail.com	2020-03-29 19:36:20 +13:00
Dean Rasheed	87779aa474	Prevent functional dependency estimates from exceeding column estimates. Formerly we applied a functional dependency "a => b with dependency degree f" using the formula P(a,b) = P(a) * [f + (1-f)P(b)] This leads to the possibility that the combined selectivity P(a,b) could exceed P(b), which is not ideal. The addition of support for IN and OR clauses (commits `8f321bd16c` and `ccaa3569f5`) would seem to make this more likely, since the user-supplied values in such clauses are not necessarily compatible with the functional dependency. Mitigate this by using the formula P(a,b) = f Min(P(a), P(b)) + (1-f) * P(a) * P(b) instead, which guarantees that the combined selectivity is less than each column's individual selectivity. Logically, this is modifies the part of the formula that accounts for dependent rows to handle cases where P(a) > P(b), whilst not changing the second term which accounts for independent rows. Additionally, this refactors the way that functional dependencies are applied, so now dependencies_clauselist_selectivity() estimates both the implying clauses and the implied clauses for each functional dependency (formerly only the implied clauses were estimated), and now all clauses for each attribute are taken into account (formerly only one clause for each implied attribute was estimated). This removes the previously built-in assumption that only equality clauses will be seen, which is no longer true, and opens up the possibility of applying functional dependencies to more general clauses. Patch by me, reviewed by Tomas Vondra. Discussion: https://postgr.es/m/CAEZATCXaNFZyOhR4XXAfkvj1tibRBEjje6ZbXwqWUB_tqbH%3Drw%40mail.gmail.com Discussion: https://postgr.es/m/20200318002946.6dvblukm3cfmgir2%40development	2020-03-28 12:48:34 +00:00
Tomas Vondra	ccaa3569f5	Recognize some OR clauses as compatible with functional dependencies Since commit `8f321bd16c` functional dependencies can handle IN clauses, which however introduced a possible (and surprising) inconsistency, because IN clauses may be expressed as an OR clause, which are still considered incompatible. For example a IN (1, 2, 3) may be rewritten as (a = 1 OR a = 2 OR a = 3) The IN clause will work fine with functional dependencies, but the OR clause will force the estimation to fall back to plain per-column estimates, possibly introducing significant estimation errors. This commit recognizes OR clauses equivalent to an IN clause (when all arugments are compatible and reference the same attribute) as a special case, compatible with functional dependencies. This allows applying functional dependencies, just like for IN clauses. This does not eliminate the difference in estimating the clause itself, i.e. IN clause and OR clause still use different formulas. It would be possible to change that (for these special OR clauses), but that's not really about extended statistics - it was always like this. Moreover the errors are usually much smaller compared to ignoring dependencies. Author: Tomas Vondra Reviewed-by: Dean Rasheed Discussion: https://www.postgresql.org/message-id/flat/13902317.Eha0YfKkKy%40pierred-pdoc	2020-03-18 16:41:49 +01:00
Tomas Vondra	d8cfa82d51	Improve test coverage for multi-column MCV lists The regression tests for extended statistics were not testing a couple of important cases for the MCV lists: * IS NOT NULL clauses - We did have queries with IS NULL clauses, but not the negative case. * clauses with variable on the right - All the clauses had the Var on the left, i.e. (Var op Const), so this adds (Const op Var) too. * columns with fixed-length types passed by reference - All columns were using either by-value or varlena types, so add a test with UUID columns too. This matters for (de)serialization. * NULL-only dimension - When one of the columns contains only NULL values, we treat it a a special case during (de)serialization. * arrays containing NULL - When the constant parameter contains NULL value, we need to handle it correctly during estimation, for all IN, ANY and ALL clauses. Discussion: https://www.postgresql.org/message-id/flat/20200113230008.g67iyk4cs3xbnjju@development Author: Tomas Vondra	2020-03-14 23:09:40 +01:00
Tomas Vondra	f9696782c7	Improve test coverage for functional dependencies The regression tests for functional dependencies were only using clauses of the form (Var op Const), i.e. with Var on the left side. This adds a couple of queries with Var on the right, to test other code paths. It also prints one of the functional dependencies, to test the data type output function. The functional dependencies are "perfect" with degree of 1.0 so this should be stable. Discussion: https://www.postgresql.org/message-id/flat/20200113230008.g67iyk4cs3xbnjju@development Author: Tomas Vondra	2020-03-14 23:06:23 +01:00
Tomas Vondra	e83daa7e33	Use multi-variate MCV lists to estimate ScalarArrayOpExpr Commit `8f321bd16c` added support for estimating ScalarArrayOpExpr clauses (IN/ANY) clauses using functional dependencies. There's no good reason not to support estimation of these clauses using multi-variate MCV lists too, so this commits implements that. That makes the behavior consistent and MCV lists can estimate all variants (ANY/ALL, inequalities, ...). Author: Tomas Vondra Review: Dean Rasheed Discussion: https://www.postgresql.org/message-id/flat/13902317.Eha0YfKkKy%40pierred-pdoc	2020-03-14 16:13:00 +01:00
Tomas Vondra	8f321bd16c	Use functional dependencies to estimate ScalarArrayOpExpr Until now functional dependencies supported only simple equality clauses and clauses that can be trivially translated to equalities. This commit allows estimation of some ScalarArrayOpExpr (IN/ANY) clauses. For IN clauses we can do this thanks to using operator with equality semantics, which means an IN clause WHERE c IN (1, 2, ..., N) can be translated to WHERE (c = 1 OR c = 2 OR ... OR c = N) IN clauses are now considered compatible with functional dependencies, and rely on the same assumption of consistency of queries with data (which is an assumption we already used for simple equality clauses). This applies also to ALL clauses with an equality operator, which can be considered equivalent to IN clause. ALL clauses are still considered incompatible, although there's some discussion about maybe relaxing this in the future. Author: Pierre Ducroquet Reviewed-by: Tomas Vondra, Dean Rasheed Discussion: https://www.postgresql.org/message-id/flat/13902317.Eha0YfKkKy%40pierred-pdoc	2020-03-14 16:12:41 +01:00
Tomas Vondra	eae056c19e	Apply multiple multivariate MCV lists when possible Until now we've only used a single multivariate MCV list per relation, covering the largest number of clauses. So for example given a query SELECT * FROM t WHERE a = 1 AND b =1 AND c = 1 AND d = 1 and extended statistics on (a,b) and (c,d), we'd only pick and use one of them. This commit improves this by repeatedly picking and applying the best statistics (matching the largest number of remaining clauses) until no additional statistics is applicable. This greedy algorithm is simple, but may not be optimal. A different choice of statistics may leave fewer clauses unestimated and/or give better estimates for some other reason. This can however happen only when there are overlapping statistics, and selecting one makes it impossible to use the other. E.g. with statistics on (a,b), (c,d), (b,c,d), we may pick either (a,b) and (c,d) or (b,c,d). But it's not clear which option is the best one. We however assume cases like this are rare, and the easiest solution is to define statistics covering the whole group of correlated columns. In the future we might support overlapping stats, using some of the clauses as conditions (in conditional probability sense). Author: Tomas Vondra Reviewed-by: Mark Dilger, Kyotaro Horiguchi Discussion: https://postgr.es/m/20191028152048.jc6pqv5hb7j77ocp@development	2020-01-13 01:21:17 +01:00
Tomas Vondra	aaa6761876	Apply all available functional dependencies When considering functional dependencies during selectivity estimation, it's not necessary to bother with selecting the best extended statistic object and then use just dependencies from it. We can simply consider all applicable functional dependencies at once. This means we need to deserialie all (applicable) dependencies before applying them to the clauses. This is a bit more expensive than picking the best statistics and deserializing dependencies for it. To minimize the additional cost, we ignore statistics that are not applicable. Author: Tomas Vondra Reviewed-by: Mark Dilger Discussion: https://postgr.es/m/20191028152048.jc6pqv5hb7j77ocp@development	2020-01-13 01:21:06 +01:00
Tomas Vondra	c676e659b2	Fix choose_best_statistics to check clauses individually When picking the best extended statistics object for a list of clauses, it's not enough to look at attnums extracted from the clause list as a whole. Consider for example this query with OR clauses: SELECT * FROM t WHERE (t.a = 1) OR (t.b = 1) OR (t.c = 1) with a statistics defined on columns (a,b). Relying on attnums extracted from the whole OR clause, we'd consider the statistics usable. That does not work, as we see the conditions as a single OR-clause, referencing an attribute not covered by the statistic, leading to empty list of clauses to be estimated using the statistics and an assert failure. This changes choose_best_statistics to check which clauses are actually covered, and only using attributes from the fully covered ones. For the previous example this means the statistics object will not be considered as compatible with the OR-clause. Backpatch to 12, where MCVs were introduced. The issue does not affect older versions because functional dependencies don't handle OR clauses. Author: Tomas Vondra Reviewed-by: Dean Rasheed Reported-By: Manuel Rigger Discussion: https://postgr.es/m/CA+u7OA7H5rcE2=8f263w4NZD6ipO_XOrYB816nuLXbmSTH9pQQ@mail.gmail.com Backpatch-through: 12	2019-11-28 22:20:45 +01:00
Tomas Vondra	d482f7f867	Skip system attributes when applying mvdistinct stats When estimating number of distinct groups, we failed to ignore system attributes when matching the group expressions to mvdistinct stats, causing failures like ERROR: negative bitmapset member not allowed Fix that by simply skipping anything that is not a regular attribute. Backpatch to PostgreSQL 10, where the extended stats were introduced. Bug: #16111 Reported-by: Tuomas Leikola Author: Tomas Vondra Backpatch-through: 10 Discussion: https://postgr.es/m/16111-687799584c3a7e73@postgresql.org	2019-11-16 01:17:15 +01:00
Dean Rasheed	3d9a3ef5cb	Fix intermittent self-test failures caused by the stats_ext test. Commit `d7f8d26d9` added new tests to the stats_ext regression test that included creating a view in the public schema, without realising that the stats_ext test runs in the same parallel group as the rules test, which makes doing that unsafe. This led to intermittent failures of the rules test on the buildfarm, although I wasn't able to reproduce that locally. Fix by creating the view in a different schema. Tomas Vondra and Dean Rasheed, report and diagnosis by Thomas Munro. Discussion: https://postgr.es/m/CA+hUKGKX9hFZrYA7rQzAMRE07L4hziCc-nO_b3taJpiuKyLLxg@mail.gmail.com	2019-09-15 13:13:59 +01:00
Tomas Vondra	d06215d03b	Allow setting statistics target for extended statistics When building statistics, we need to decide how many rows to sample and how accurate the resulting statistics should be. Until now, it was not possible to explicitly define statistics target for extended statistics objects, the value was always computed from the per-attribute targets with a fallback to the system-wide default statistics target. That's a bit inconvenient, as it ties together the statistics target set for per-column and extended statistics. In some cases it may be useful to require larger sample / higher accuracy for extended statics (or the other way around), but with this approach that's not possible. So this commit introduces a new command, allowing to specify statistics target for individual extended statistics objects, overriding the value derived from per-attribute targets (and the system default). ALTER STATISTICS stat_name SET STATISTICS target_value; When determining statistics target for an extended statistics object we first look at this explicitly set value. When this value is -1, we fall back to the old formula, looking at the per-attribute targets first and then the system default. This means the behavior is backwards compatible with older PostgreSQL releases. Author: Tomas Vondra Discussion: https://postgr.es/m/20190618213357.vli3i23vpkset2xd@development Reviewed-by: Kirk Jamison, Dean Rasheed	2019-09-11 00:25:51 +02:00
Tomas Vondra	14ef15a222	Don't build extended statistics on inheritance trees When performing ANALYZE on inheritance trees, we collect two samples for each relation - one for the relation alone, and one for the inheritance subtree (relation and its child relations). And then we build statistics on each sample, so for each relation we get two sets of statistics. For regular (per-column) statistics this works fine, because the catalog includes a flag differentiating statistics built from those two samples. But we don't have such flag in the extended statistics catalogs, and we ended up updating the same row twice, triggering this error: ERROR: tuple already updated by self The simplest solution is to disable extended statistics on inheritance trees, which is what this commit is doing. In the future we may need to do something similar to per-column statistics, but that requires adding a flag to the catalog - and that's not backpatchable. Moreover, the current selectivity estimation code only works with individual relations, so building statistics on inheritance trees would be pointless anyway. Author: Tomas Vondra Backpatch-to: 10- Discussion: https://postgr.es/m/20190618231233.GA27470@telsasoft.com Reported-by: Justin Pryzby	2019-07-30 19:47:33 +02:00
Tomas Vondra	e4deae7396	Fix handling of NULLs in MCV items and constants There were two issues in how the extended statistics handled NULL values in opclauses. Firstly, the code was oblivious to the possibility that Const may be NULL (constisnull=true) in which case the constvalue is undefined. We need to treat this as a mismatch, and not call the proc. Secondly, the MCV item itself may contain NULL values too - the code already did check that, and updated the match bitmap accordingly, but failed to ensure we won't call the operator procedure anyway. It did work for AND-clauses, because in that case false in the bitmap stops evaluation of further clauses. But for OR-clauses ir was not easy to get incorrect estimates or even trigger a crash. This fixes both issues by extending the existing check so that it looks at constisnull too, and making sure it skips calling the procedure. Discussion: https://postgr.es/m/8736jdhbhc.fsf%40ansel.ydns.eu	2019-07-18 11:29:38 +02:00
Tomas Vondra	4d66285adc	Fix pg_mcv_list_items() to produce text[] The function pg_mcv_list_items() returns values stored in MCV items. The items may contain columns with different data types, so the function was generating text array-like representation, but in an ad-hoc way without properly escaping various characters etc. Fixed by simply building a text[] array, which also makes it easier to use from queries etc. Requires changes to pg_proc entry, so bump catversion. Backpatch to 12, where multi-column MCV lists were introduced. Author: Tomas Vondra Reviewed-by: Dean Rasheed Discussion: https://postgr.es/m/20190618205920.qtlzcu73whfpfqne@development	2019-07-05 01:32:46 +02:00
Tom Lane	f31111bbe8	Drop test user when done with it. Commit `d7f8d26d9` added a test case that created a user, but forgot to drop it again. This is no good; for one thing, it causes repeated "make installcheck" runs to fail.	2019-06-24 12:36:51 -04:00
Dean Rasheed	d7f8d26d9f	Add security checks to the multivariate MCV estimation code. The multivariate MCV estimation code may run user-defined operators on the values in the MCV list, which means that those operators may potentially leak the values from the MCV list. Guard against leaking data to unprivileged users by checking that the user has SELECT privileges on the table or all of the columns referred to by the statistics. Additionally, if there are any securityQuals on the RTE (either due to RLS policies on the table, or accessing the table via a security barrier view), not all rows may be visible to the current user, even if they have table or column privileges. Thus we further insist that the operator be leakproof in this case. Dean Rasheed, reviewed by Tomas Vondra. Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui=Vdx4N==VV5XOK5dsXfnGgVOz_JhAicB=ZA@mail.gmail.com	2019-06-23 18:50:08 +01:00
Tomas Vondra	6cbfb784c3	Rework the pg_statistic_ext catalog Since extended statistic got introduced in PostgreSQL 10, there was a single catalog pg_statistic_ext storing both the definitions and built statistic. That's however problematic when a user is supposed to have access only to the definitions, but not to user data. Consider for example pg_dump on a database with RLS enabled - if the pg_statistic_ext catalog respects RLS (which it should, if it contains user data), pg_dump would not see any records and the result would not define any extended statistics. That would be a surprising behavior. Until now this was not a pressing issue, because the existing types of extended statistic (functional dependencies and ndistinct coefficients) do not include any user data directly. This changed with introduction of MCV lists, which do include most common combinations of values. The easiest way to fix this is to split the pg_statistic_ext catalog into two - one for definitions, one for the built statistic values. The new catalog is called pg_statistic_ext_data, and we're maintaining a 1:1 relationship with the old catalog - either there are matching records in both catalogs, or neither of them. Bumped CATVERSION due to changing system catalog definitions. Author: Dean Rasheed, with improvements by me Reviewed-by: Dean Rasheed, John Naylor Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com	2019-06-16 01:20:31 +02:00
Noah Misch	f2c71cb71f	Stop using spelling "nonexistant". The documentation used "nonexistent" exclusively, and the source tree used it three times as often as "nonexistant".	2019-06-08 10:12:26 -07:00
Tomas Vondra	dbb984128e	Convert pre-existing stats_ext tests to new style The regression tests added in commit `7300a69950` test cardinality estimates using a function that extracts the interesting pieces from the EXPLAIN output, instead of testing the whole plan. That seems both easier to understand and less fragile, so this applies the same approach to pre-existing tests of ndistinct coefficients and functional dependencies. Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com	2019-04-16 00:02:22 +02:00
Tomas Vondra	3824ca30d1	Fix pg_mcv_list deserialization The memcpy() was copying type OIDs in the wrong direction, so the deserialized MCV list always had them as 0. This is mostly harmless except when printing the data in pg_mcv_list_items(), in which case it reported ERROR: cache lookup failed for type 0 Also added a simple regression test for pg_mcv_list_items() function, printing a single-item MCV list. Reported-By: Dean Rasheed Discussion: https://postgr.es/m/CAEZATCX6T0iDTTZrqyec4Cd6b4yuL7euu4=rQRXaVBAVrUi1Cg@mail.gmail.com	2019-04-16 00:01:39 +02:00
Tomas Vondra	7300a69950	Add support for multivariate MCV lists Introduce a third extended statistic type, supported by the CREATE STATISTICS command - MCV lists, a generalization of the statistic already built and used for individual columns. Compared to the already supported types (n-distinct coefficients and functional dependencies), MCV lists are more complex, include column values and allow estimation of much wider range of common clauses (equality and inequality conditions, IS NULL, IS NOT NULL etc.). Similarly to the other types, a new pseudo-type (pg_mcv_list) is used. Author: Tomas Vondra Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com	2019-03-27 18:32:18 +01:00
Tom Lane	940311e4bb	Un-hide most cascaded-drop details in regression test results. Now that the ordering of DROP messages ought to be stable everywhere, we should not need these kluges of hiding DETAIL output just to avoid unstable ordering. Hiding it's not great for test coverage, so let's undo that where possible. In a small number of places, it's necessary to leave it in, for example because the output might include a variable pg_temp_nnn schema name. I also left things alone in places where the details would depend on other regression test scripts, e.g. plpython_drop.sql. Perhaps buildfarm experience will show this to be a bad idea, but if so I'd like to know why. Discussion: https://postgr.es/m/E1h6eep-0001Mw-Vd@gemulon.postgresql.org	2019-03-24 19:15:37 -04:00
Peter Eisentraut	821fb8cdbf	Message style fixes	2017-09-11 11:21:27 -04:00
Tom Lane	8e7537261c	Suppress less info in regression tests using DROP CASCADE. DROP CASCADE doesn't currently promise to visit dependent objects in a fixed order, so when the regression tests use it, we typically need to suppress the details of which objects get dropped in order to have predictable test output. Traditionally we've done that by setting client_min_messages higher than NOTICE, but there's a better way: we can "\set VERBOSITY terse" in psql. That suppresses the DETAIL message with the object list, but we still get the basic notice telling how many objects were dropped. So at least the test case can verify that the expected number of objects were dropped. The VERBOSITY method was already in use in a few places, but run around and use it wherever it makes sense. Discussion: https://postgr.es/m/10766.1501608885@sss.pgh.pa.us	2017-08-01 16:49:23 -04:00
Alvaro Herrera	5dfd564b10	Fix IF NOT EXISTS in CREATE STATISTICS I misplaced the IF NOT EXISTS clause in commit `7b504eb282`, before the word STATISTICS. Put it where it belongs. Patch written independently by Amit Langote and myself. I adopted his submitted test case with a slight edit also. Reported-by: Bruno Wolff III Discussion: https://postgr.es/m/20170621004237.GB8337@wolff.to	2017-06-22 13:17:08 -04:00
Tom Lane	b5b0db19b8	Fix handling of extended statistics during ALTER COLUMN TYPE. ALTER COLUMN TYPE on a column used by a statistics object fails since commit `928c4de30`, because the relevant switch in ATExecAlterColumnType is unprepared for columns to have dependencies from OCLASS_STATISTIC_EXT objects. Although the existing types of extended statistics don't actually need us to do any work for a column type change, it seems completely indefensible that that assumption is hidden behind the failure of an unrelated module to contain any code for the case. Hence, create and call an API function in statscmds.c where the assumption can be explained, and where we could add code to deal with the problem when it inevitably becomes real. Also, the reason this wasn't handled before, neither for extended stats nor for the last half-dozen new OCLASS kinds :-(, is that the default: in that switch suppresses compiler warnings, allowing people to miss the need to consider it when adding an OCLASS. We don't really need a default because surely getObjectClass should only return valid values of the enum; so remove it, and add the missed OCLASS entries where they should be. Discussion: https://postgr.es/m/20170512221010.nglatgt5azzdxjlj@alvherre.pgsql	2017-05-14 12:22:25 -04:00
Tom Lane	f04c9a6146	Standardize terminology for pg_statistic_ext entries. Consistently refer to such an entry as a "statistics object", not just "statistics" or "extended statistics". Previously we had a mismash of terms, accompanied by utter confusion as to whether the term was singular or plural. That's not only grating (at least to the ear of a native English speaker) but could be outright misleading, eg in error messages that seemed to be referring to multiple objects where only one could be meant. This commit fixes the code and a lot of comments (though I may have missed a few). I also renamed two new SQL functions, pg_get_statisticsextdef -> pg_get_statisticsobjdef pg_statistic_ext_is_visible -> pg_statistics_obj_is_visible to conform better with this terminology. I have not touched the SGML docs other than fixing those function names; the docs certainly need work but it seems like a separable task. Discussion: https://postgr.es/m/22676.1494557205@sss.pgh.pa.us	2017-05-14 10:55:01 -04:00
Tom Lane	928c4de309	Fix dependencies for extended statistics objects. A stats object ought to have a dependency on each individual column it reads, not the entire table. Doing this honestly lets us get rid of the hard-wired logic in RemoveStatisticsExt, which seems to have been misguidedly modeled on RemoveStatistics; and it will be far easier to extend to multiple tables later. Also, add overlooked dependency on owner, and make the dependency on schema be NORMAL like every other such dependency. There remains some unfinished work here, which is to allow statistics objects to be extension members. That takes more effort than just adding the dependency call, though, so I left it out for now. initdb forced because this changes the set of pg_depend records that should exist for a statistics object. Discussion: https://postgr.es/m/22676.1494557205@sss.pgh.pa.us	2017-05-12 16:26:31 -04:00

1 2

60 Commits