Commit Graph

96 Commits

Author SHA1 Message Date
David Rowley 421dfb41a8 Fix incorrectly reported stats kind in "can't happen" ERROR
The error message(s) were reporting the stats kind of 'f', which is not
correct as that's for the "dependencies" statistics kind.

Reported-by: Horst Reiterer
Reviewed-by: Richard Guo
Discussion: https://postgr.es/m/18375-ba99383eb9062d6a@postgresql.org
Backpatch-through: 12, where MCV extended stats were added.
2024-03-05 16:19:05 +13:00
Tom Lane 476f9d5330 Partially undo commit 94da73281.
On closer inspection, mcv.c isn't as broken for ScalarArrayOpExpr
as I thought.  The Var-on-right issue is real enough, but actually
it does cope fine with a NULL array constant --- I was misled by
an XXX comment suggesting it didn't.  Undo that part of the code
change, and replace the XXX comment with something less misleading.
2022-08-05 15:57:46 -04:00
Tom Lane c102d11067 Fix non-bulletproof ScalarArrayOpExpr code for extended statistics.
statext_is_compatible_clause_internal() checked that the arguments
of a ScalarArrayOpExpr are one Var and one Const, but it would allow
cases where the Const was on the left.  Subsequent uses of the clause
are not expecting that and would suffer assertion failures or core
dumps.  mcv.c also had not bothered to cope with the case of a NULL
array constant, which seems really unacceptably sloppy of somebody.
(Although our tools failed us there too, since AFAIK neither Coverity
nor any compiler warned of the obvious use-of-uninitialized-variable
condition.)  It seems best to handle that by having
statext_is_compatible_clause_internal() reject it.

Noted while fixing bug #17570.  Back-patch to v13 where the
extended stats code grew some awareness of ScalarArrayOpExpr.
2022-08-05 13:58:49 -04:00
Tomas Vondra d6817032d2 Build inherited extended stats on partitioned tables
Commit 859b3003de disabled building of extended stats for inheritance
trees, to prevent updating the same catalog row twice. While that
resolved the issue, it also means there are no extended stats for
declaratively partitioned tables, because there are no data in the
non-leaf relations.

That also means declaratively partitioned tables were not affected by
the issue 859b3003de addressed, which means this is a regression
affecting queries that calculate estimates for the whole inheritance
tree as a whole (which includes e.g. GROUP BY queries).

But because partitioned tables are empty, we can invert the condition
and build statistics only for the case with inheritance, without losing
anything. And we can consider them when calculating estimates.

It may be necessary to run ANALYZE on partitioned tables, to collect
proper statistics. For declarative partitioning there should no prior
statistics, and it might take time before autoanalyze is triggered. For
tables partitioned by inheritance the statistics may include data from
child relations (if built 859b3003de), contradicting the current code.

Report and patch by Justin Pryzby, minor fixes and cleanup by me.
Backpatch all the way back to PostgreSQL 10, where extended statistics
were introduced (same as 859b3003de).

Author: Justin Pryzby
Reported-by: Justin Pryzby
Backpatch-through: 10
Discussion: https://postgr.es/m/20210923212624.GI831%40telsasoft.com
2022-01-15 19:14:00 +01:00
Tomas Vondra acfde7c583 Ignore extended statistics for inheritance trees
Since commit 859b3003de we only build extended statistics for individual
relations, ignoring the child relations. This resolved the issue with
updating catalog tuple twice, but we still tried to use the statistics
when calculating estimates for the whole inheritance tree. When the
relations contain very distinct data, it may produce bogus estimates.

This is roughly the same issue 427c6b5b9 addressed ~15 years ago, and we
fix it the same way - by ignoring extended statistics when calculating
estimates for the inheritance tree as a whole. We still consider
extended statistics when calculating estimates for individual child
relations, of course.

This may result in plan changes due to different estimates, but if the
old statistics were not describing the inheritance tree particularly
well it's quite likely the new plans is actually better.

Report and patch by Justin Pryzby, minor fixes and cleanup by me.
Backpatch all the way back to PostgreSQL 10, where extended statistics
were introduced (same as 859b3003de).

Author: Justin Pryzby
Reported-by: Justin Pryzby
Backpatch-through: 10
Discussion: https://postgr.es/m/20210923212624.GI831%40telsasoft.com
2022-01-15 02:30:06 +01:00
Tomas Vondra c0386f403a Release memory allocated by dependency_degree
Calculating degree of a functional dependency may allocate a lot of
memory - we have released mot of the explicitly allocated memory, but
e.g. detoasted varlena values were left behind. That may be an issue,
because we consider a lot of dependencies (all combinations), and the
detoasting may happen for each one again.

Fixed by calling dependency_degree() in a dedicated context, and
resetting it after each call. We only need the calculated dependency
degree, so we don't need to copy anything.

Backpatch to PostgreSQL 10, where extended statistics were introduced.

Backpatch-through: 10
Discussion: https://www.postgresql.org/message-id/20210915200928.GP831%40telsasoft.com
2021-09-23 18:34:01 +02:00
Tomas Vondra b564eb0181 Free memory after building each statistics object
Until now, all extended statistics on a given relation were built in the
same memory context, without resetting. Some of the memory was released
explicitly, but not all of it - for example memory allocated while
detoasting values is hard to free. This is how it worked since extended
statistics were introduced in PostgreSQL 10, but adding support for
extended stats on expressions made the issue somewhat worse as it
increases the number of statistics to build.

Fixed by adding a memory context which gets reset after building each
statistics object (all the statistics kinds included in it). Resetting
it after building each statistics kind would be even better, but it
would require more invasive changes and copying of results, making it
harder to backpatch.

Backpatch to PostgreSQL 10, where extended statistics were introduced.

Author: Justin Pryzby
Reported-by: Justin Pryzby
Reviewed-by: Tomas Vondra
Backpatch-through: 10
Discussion: https://www.postgresql.org/message-id/20210915200928.GP831%40telsasoft.com
2021-09-23 18:33:59 +02:00
Tomas Vondra 1faaad260d Use correct spelling of statistics kind
A couple error messages and comments used 'statistic kind', not the
correct 'statistics kind'. Fix and backpatch all the way back to 10,
where extended statistics were introduced.

Backpatch-through: 10
2021-03-23 05:00:19 +01:00
Tom Lane fa27dd40d5 Run pgindent with new pg_bsd_indent version 2.1.1.
Thomas Munro fixed a longstanding annoyance in pg_bsd_indent, that
it would misformat lines containing IsA() macros on the assumption
that the IsA() call should be treated like a cast.  This improves
some other cases involving field/variable names that match typedefs,
too.  The only places that get worse are a couple of uses of the
OpenSSL macro STACK_OF(); we'll gladly take that trade-off.

Discussion: https://postgr.es/m/20200114221814.GA19630@alvherre.pgsql
2020-05-16 11:54:51 -04:00
Tom Lane 5cbfce562f Initial pgindent and pgperltidy run for v13.
Includes some manual cleanup of places that pgindent messed up,
most of which weren't per project style anyway.

Notably, it seems some people didn't absorb the style rules of
commit c9d297751, because there were a bunch of new occurrences
of function calls with a newline just after the left paren, all
with faulty expectations about how the rest of the call would get
indented.
2020-05-14 13:06:50 -04:00
David Rowley 02a2e8b442 Modify additional power 2 calculations to use new helper functions
2nd pass of modifying various places which obtain the next power
of 2 of a number and make them use the new functions added in
f0705bb62.

In passing, also modify num_combinations(). This can be implemented
using simple bitshifting rather than looping.

Reviewed-by: John Naylor
Discussion: https://postgr.es/m/20200114173553.GE32763%40fetter.org
2020-04-08 18:29:51 +12:00
Tom Lane fe3036527a Fix race condition in statext_store().
Must hold some lock on the pg_statistic_ext_data catalog *before*
we look up the tuple we aim to replace.  Otherwise a concurrent
VACUUM FULL or similar operation could move it to a different TID,
leaving us trying to replace the wrong tuple.

Back-patch to v12 where this got broken.

Credit goes to Dean Rasheed; I'm just doing the clerical work.

Discussion: https://postgr.es/m/CAEZATCU0zHMDiQV0g8P2U+YSP9C1idUPrn79DajsbonwkN0xvQ@mail.gmail.com
2020-03-31 17:06:22 -04:00
Dean Rasheed 87779aa474 Prevent functional dependency estimates from exceeding column estimates.
Formerly we applied a functional dependency "a => b with dependency
degree f" using the formula

  P(a,b) = P(a) * [f + (1-f)*P(b)]

This leads to the possibility that the combined selectivity P(a,b)
could exceed P(b), which is not ideal. The addition of support for IN
and OR clauses (commits 8f321bd16c and ccaa3569f5) would seem to make
this more likely, since the user-supplied values in such clauses are
not necessarily compatible with the functional dependency.

Mitigate this by using the formula

  P(a,b) = f * Min(P(a), P(b)) + (1-f) * P(a) * P(b)

instead, which guarantees that the combined selectivity is less than
each column's individual selectivity. Logically, this is modifies the
part of the formula that accounts for dependent rows to handle cases
where P(a) > P(b), whilst not changing the second term which accounts
for independent rows.

Additionally, this refactors the way that functional dependencies are
applied, so now dependencies_clauselist_selectivity() estimates both
the implying clauses and the implied clauses for each functional
dependency (formerly only the implied clauses were estimated), and now
all clauses for each attribute are taken into account (formerly only
one clause for each implied attribute was estimated). This removes the
previously built-in assumption that only equality clauses will be
seen, which is no longer true, and opens up the possibility of
applying functional dependencies to more general clauses.

Patch by me, reviewed by Tomas Vondra.

Discussion: https://postgr.es/m/CAEZATCXaNFZyOhR4XXAfkvj1tibRBEjje6ZbXwqWUB_tqbH%3Drw%40mail.gmail.com
Discussion: https://postgr.es/m/20200318002946.6dvblukm3cfmgir2%40development
2020-03-28 12:48:34 +00:00
Tomas Vondra ccaa3569f5 Recognize some OR clauses as compatible with functional dependencies
Since commit 8f321bd16c functional dependencies can handle IN clauses,
which however introduced a possible (and surprising) inconsistency,
because IN clauses may be expressed as an OR clause, which are still
considered incompatible. For example

  a IN (1, 2, 3)

may be rewritten as

  (a = 1 OR a = 2 OR a = 3)

The IN clause will work fine with functional dependencies, but the OR
clause will force the estimation to fall back to plain per-column
estimates, possibly introducing significant estimation errors.

This commit recognizes OR clauses equivalent to an IN clause (when all
arugments are compatible and reference the same attribute) as a special
case, compatible with functional dependencies. This allows applying
functional dependencies, just like for IN clauses.

This does not eliminate the difference in estimating the clause itself,
i.e. IN clause and OR clause still use different formulas. It would be
possible to change that (for these special OR clauses), but that's not
really about extended statistics - it was always like this. Moreover the
errors are usually much smaller compared to ignoring dependencies.

Author: Tomas Vondra
Reviewed-by: Dean Rasheed
Discussion: https://www.postgresql.org/message-id/flat/13902317.Eha0YfKkKy%40pierred-pdoc
2020-03-18 16:41:49 +01:00
Tomas Vondra 6f72dbc48b Fix wording of several extended stats comments
Reported-by: Thomas Munro
Discussion: https://www.postgresql.org/message-id/flat/20200113230008.g67iyk4cs3xbnjju@development
2020-03-18 13:40:13 +01:00
Tomas Vondra e83daa7e33 Use multi-variate MCV lists to estimate ScalarArrayOpExpr
Commit 8f321bd16c added support for estimating ScalarArrayOpExpr clauses
(IN/ANY) clauses using functional dependencies. There's no good reason
not to support estimation of these clauses using multi-variate MCV lists
too, so this commits implements that. That makes the behavior consistent
and MCV lists can estimate all variants (ANY/ALL, inequalities, ...).

Author: Tomas Vondra
Review: Dean Rasheed
Discussion: https://www.postgresql.org/message-id/flat/13902317.Eha0YfKkKy%40pierred-pdoc
2020-03-14 16:13:00 +01:00
Tomas Vondra 8f321bd16c Use functional dependencies to estimate ScalarArrayOpExpr
Until now functional dependencies supported only simple equality clauses
and clauses that can be trivially translated to equalities. This commit
allows estimation of some ScalarArrayOpExpr (IN/ANY) clauses.

For IN clauses we can do this thanks to using operator with equality
semantics, which means an IN clause

    WHERE c IN (1, 2, ..., N)

can be translated to

    WHERE (c = 1 OR c = 2 OR ... OR c = N)

IN clauses are now considered compatible with functional dependencies,
and rely on the same assumption of consistency of queries with data
(which is an assumption we already used for simple equality clauses).
This applies also to ALL clauses with an equality operator, which can be
considered equivalent to IN clause.

ALL clauses are still considered incompatible, although there's some
discussion about maybe relaxing this in the future.

Author: Pierre Ducroquet
Reviewed-by: Tomas Vondra, Dean Rasheed
Discussion: https://www.postgresql.org/message-id/flat/13902317.Eha0YfKkKy%40pierred-pdoc
2020-03-14 16:12:41 +01:00
Peter Eisentraut 3c173a53a8 Remove utils/acl.h from catalog/objectaddress.h
The need for this was removed by
8b9e9644dc.

A number of files now need to include utils/acl.h or
parser/parse_node.h explicitly where they previously got it indirectly
somehow.

Since parser/parse_node.h already includes nodes/parsenodes.h, the
latter is then removed where the former was added.  Also, remove
nodes/pg_list.h from objectaddress.h, since that's included via
nodes/parsenodes.h.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/7601e258-26b2-8481-36d0-dc9dca6f28f1%402ndquadrant.com
2020-03-10 10:27:00 +01:00
Alvaro Herrera a166d408eb Report progress of ANALYZE commands
This uses the progress reporting infrastructure added by c16dc1aca5,
adding support for ANALYZE.

Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Co-authored-by: Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp>
Reviewed-by: Julien Rouhaud, Robert Haas, Anthony Nowocien, Kyotaro Horiguchi,
	Vignesh C, Amit Langote
2020-01-15 11:14:39 -03:00
Tomas Vondra eae056c19e Apply multiple multivariate MCV lists when possible
Until now we've only used a single multivariate MCV list per relation,
covering the largest number of clauses. So for example given a query

    SELECT * FROM t WHERE a = 1 AND b =1 AND c = 1 AND d = 1

and extended statistics on (a,b) and (c,d), we'd only pick and use one
of them. This commit improves this by repeatedly picking and applying
the best statistics (matching the largest number of remaining clauses)
until no additional statistics is applicable.

This greedy algorithm is simple, but may not be optimal. A different
choice of statistics may leave fewer clauses unestimated and/or give
better estimates for some other reason.

This can however happen only when there are overlapping statistics, and
selecting one makes it impossible to use the other. E.g. with statistics
on (a,b), (c,d), (b,c,d), we may pick either (a,b) and (c,d) or (b,c,d).
But it's not clear which option is the best one.

We however assume cases like this are rare, and the easiest solution is
to define statistics covering the whole group of correlated columns. In
the future we might support overlapping stats, using some of the clauses
as conditions (in conditional probability sense).

Author: Tomas Vondra
Reviewed-by: Mark Dilger, Kyotaro Horiguchi
Discussion: https://postgr.es/m/20191028152048.jc6pqv5hb7j77ocp@development
2020-01-13 01:21:17 +01:00
Tomas Vondra aaa6761876 Apply all available functional dependencies
When considering functional dependencies during selectivity estimation,
it's not necessary to bother with selecting the best extended statistic
object and then use just dependencies from it. We can simply consider
all applicable functional dependencies at once.

This means we need to deserialie all (applicable) dependencies before
applying them to the clauses. This is a bit more expensive than picking
the best statistics and deserializing dependencies for it. To minimize
the additional cost, we ignore statistics that are not applicable.

Author: Tomas Vondra
Reviewed-by: Mark Dilger
Discussion: https://postgr.es/m/20191028152048.jc6pqv5hb7j77ocp@development
2020-01-13 01:21:06 +01:00
Bruce Momjian 7559d8ebfa Update copyrights for 2020
Backpatch-through: update all files in master, backpatch legal files through 9.4
2020-01-01 12:21:45 -05:00
Michael Paquier 7854e07f25 Revert "Rename files and headers related to index AM"
This follows multiple complains from Peter Geoghegan, Andres Freund and
Alvaro Herrera that this issue ought to be dug more before actually
happening, if it happens.

Discussion: https://postgr.es/m/20191226144606.GA5659@alvherre.pgsql
2019-12-27 08:09:00 +09:00
Michael Paquier 8ce3aa9b59 Rename files and headers related to index AM
The following renaming is done so as source files related to index
access methods are more consistent with table access methods (the
original names used for index AMs ware too generic, and could be
confused as including features related to table AMs):
- amapi.h -> indexam.h.
- amapi.c -> indexamapi.c.  Here we have an equivalent with
backend/access/table/tableamapi.c.
- amvalidate.c -> indexamvalidate.c.
- amvalidate.h -> indexamvalidate.h.
- genam.c -> indexgenam.c.
- genam.h -> indexgenam.h.

This has been discussed during the development of v12 when table AM was
worked on, but the renaming never happened.

Author: Michael Paquier
Reviewed-by: Fabien Coelho, Julien Rouhaud
Discussion: https://postgr.es/m/20191223053434.GF34339@paquier.xyz
2019-12-25 10:23:39 +09:00
Tomas Vondra 6d61c3f1cb Remove unnecessary clauses_attnums variable
Commit c676e659b2 reworked how choose_best_statistics() picks the best
extended statistics, but failed to remove clauses_attnums which is now
unnecessary. So get rid of it and backpatch to 12, same as c676e659b2.

Author: Tomas Vondra
Discussion: https://postgr.es/m/CA+u7OA7H5rcE2=8f263w4NZD6ipO_XOrYB816nuLXbmSTH9pQQ@mail.gmail.com
Backpatch-through: 12
2019-11-28 23:25:14 +01:00
Tomas Vondra c676e659b2 Fix choose_best_statistics to check clauses individually
When picking the best extended statistics object for a list of clauses,
it's not enough to look at attnums extracted from the clause list as a
whole. Consider for example this query with OR clauses:

   SELECT * FROM t WHERE (t.a = 1) OR (t.b = 1) OR (t.c = 1)

with a statistics defined on columns (a,b). Relying on attnums extracted
from the whole OR clause, we'd consider the statistics usable. That does
not work, as we see the conditions as a single OR-clause, referencing an
attribute not covered by the statistic, leading to empty list of clauses
to be estimated using the statistics and an assert failure.

This changes choose_best_statistics to check which clauses are actually
covered, and only using attributes from the fully covered ones. For the
previous example this means the statistics object will not be considered
as compatible with the OR-clause.

Backpatch to 12, where MCVs were introduced. The issue does not affect
older versions because functional dependencies don't handle OR clauses.

Author: Tomas Vondra
Reviewed-by: Dean Rasheed
Reported-By: Manuel Rigger
Discussion: https://postgr.es/m/CA+u7OA7H5rcE2=8f263w4NZD6ipO_XOrYB816nuLXbmSTH9pQQ@mail.gmail.com
Backpatch-through: 12
2019-11-28 22:20:45 +01:00
Amit Kapila 14aec03502 Make the order of the header file includes consistent in backend modules.
Similar to commits 7e735035f2 and dddf4cdc33, this commit makes the order
of header file inclusion consistent for backend modules.

In the passing, removed a couple of duplicate inclusions.

Author: Vignesh C
Reviewed-by: Kuntal Ghosh and Amit Kapila
Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com
2019-11-12 08:30:16 +05:30
Andres Freund 01368e5d9d Split all OBJS style lines in makefiles into one-line-per-entry style.
When maintaining or merging patches, one of the most common sources
for conflicts are the list of objects in makefiles. Especially when
the split across lines has been changed on both sides, which is
somewhat common due to attempting to stay below 80 columns, those
conflicts are unnecessarily laborious to resolve.

By splitting, and alphabetically sorting, OBJS style lines into one
object per line, conflicts should be less frequent, and easier to
resolve when they still occur.

Author: Andres Freund
Discussion: https://postgr.es/m/20191029200901.vww4idgcxv74cwes@alap3.anarazel.de
2019-11-05 14:41:07 -08:00
Michael Paquier 6ca86bb7e9 Fix typos in the code
Author: Vignesh C
Reviewed-by: Dilip Kumar, Michael Paquier
Discussion: https://postgr.es/m/CALDaNm0ni+GAOe4+fbXiOxNrVudajMYmhJFtXGX-zBPoN8ixhw@mail.gmail.com
2019-10-30 10:03:00 +09:00
Tomas Vondra d06215d03b Allow setting statistics target for extended statistics
When building statistics, we need to decide how many rows to sample and
how accurate the resulting statistics should be. Until now, it was not
possible to explicitly define statistics target for extended statistics
objects, the value was always computed from the per-attribute targets
with a fallback to the system-wide default statistics target.

That's a bit inconvenient, as it ties together the statistics target set
for per-column and extended statistics. In some cases it may be useful
to require larger sample / higher accuracy for extended statics (or the
other way around), but with this approach that's not possible.

So this commit introduces a new command, allowing to specify statistics
target for individual extended statistics objects, overriding the value
derived from per-attribute targets (and the system default).

  ALTER STATISTICS stat_name SET STATISTICS target_value;

When determining statistics target for an extended statistics object we
first look at this explicitly set value. When this value is -1, we fall
back to the old formula, looking at the per-attribute targets first and
then the system default. This means the behavior is backwards compatible
with older PostgreSQL releases.

Author: Tomas Vondra
Discussion: https://postgr.es/m/20190618213357.vli3i23vpkset2xd@development
Reviewed-by: Kirk Jamison, Dean Rasheed
2019-09-11 00:25:51 +02:00
Robert Haas 8b94dab066 Split tuptoaster.c into three separate files.
detoast.c/h contain functions required to detoast a datum, partially
or completely, plus a few other utility functions for examining the
size of toasted datums.

toast_internals.c/h contain functions that are used internally to the
TOAST subsystem but which (mostly) do not need to be accessed from
outside.

heaptoast.c/h contains code that is intrinsically specific to the
heap AM, either because it operates on HeapTuples or is based on the
layout of a heap page.

detoast.c and toast_internals.c are placed in
src/backend/access/common rather than src/backend/access/heap.  At
present, both files still have dependencies on the heap, but that will
be improved in a future commit.

Patch by me, reviewed and tested by Prabhat Sabu, Thomas Munro,
Andres Freund, and Álvaro Herrera.

Discussion: http://postgr.es/m/CA+TgmoZv-=2iWM4jcw5ZhJeL18HF96+W1yJeYrnGMYdkFFnEpQ@mail.gmail.com
2019-09-05 13:15:10 -04:00
Michael Paquier c96581abe4 Fix inconsistencies and typos in the tree, take 11
This fixes various typos in docs and comments, and removes some orphaned
definitions.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/5da8e325-c665-da95-21e0-c8a99ea61fbf@gmail.com
2019-08-19 16:21:39 +09:00
Andres Freund 6a04d345fd Don't include utils/array.h from acl.h.
For most uses of acl.h the details of how "Acl" internally looks like
are irrelevant. It might make sense to move a lot of the
implementation details into a separate header at a later point.

The main motivation of this change is to avoid including fmgr.h (via
array.h, which needs it for exposed structs) in a lot of files that
otherwise don't need it. A subsequent commit will remove the fmgr.h
include from a lot of files.

Directly include utils/array.h and utils/expandeddatum.h from the
files that need them, but previously included them indirectly, via
acl.h.

Author: Andres Freund
Discussion: https://postgr.es/m/20190803193733.g3l3x3o42uv4qj7l@alap3.anarazel.de
2019-08-16 10:33:30 -07:00
Tomas Vondra a63378a03e Use column collation for extended statistics
The current extended statistics code was a bit confused which collation
to use.  When building the statistics, the collations defined as default
for the data types were used (since commit 5e0928005).  The MCV code was
however using the column collations for MCV serialization, and then
DEFAULT_COLLATION_OID when computing estimates. So overall the code was
using all three possible options, inconsistently.

This uses the column colation everywhere - this makes it consistent with
what 5e0928005 did for regular stats.  We however do not track the
collations in a catalog, because we can derive them from column-level
information.  This may need to change in the future, e.g. after allowing
statistics on expressions.

Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/8736jdhbhc.fsf%40ansel.ydns.eu
Backpatch-to: 12
2019-07-20 16:37:37 +02:00
Tomas Vondra e38a55ba46 Rework examine_opclause_expression to use varonleft
The examine_opclause_expression function needs to return information on
which side of the operator we found the Var, but the variable was called
"isgt" which is rather misleading (it assumes the operator is either
less-than or greater-than, but it may be equality or something else).
Other places in the planner use a variable called "varonleft" for this
purpose, so just adopt the same convention here.

The code also assumed we don't care about this flag for equality, as
(Var = Const) and (Const = Var) should be the same thing. But that does
not work for cross-type operators, in which case we need to pass the
parameters to the procedure in the right order. So just use the same
code for all types of expressions.

This means we don't need to care about the selectivity estimation
function anymore, at least not in this code. We should only get the
supported cases here (thanks to statext_is_compatible_clause).

Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/8736jdhbhc.fsf%40ansel.ydns.eu
Backpatch-to: 12
2019-07-20 16:37:30 +02:00
Michael Paquier 70a33b2109 Fix typo in mvdistinct.c
Noticed while browsing the code.
2019-07-19 08:50:14 +09:00
Tomas Vondra 7d24f6a490 Simplify bitmap updates in multivariate MCV code
When evaluating clauses on a multivariate MCV list, we build a bitmap
tracking how the clauses match each item of the MCV list.  When updating
the bitmap we need to consider the current value (tracking how the item
matches preceding clauses), match for the current clause and whether the
clauses are connected by AND or OR.

Until now the logic was copied on every place updating the bitmap, which
was not quite readable.  So just move it to a separate function and call
it where needed.

Backpatch to 12, where the code was introduced. While not a bugfix, this
should make maintenance and future backpatches easier.

Discussion: https://postgr.es/m/8736jdhbhc.fsf%40ansel.ydns.eu
2019-07-18 11:29:38 +02:00
Tomas Vondra e4deae7396 Fix handling of NULLs in MCV items and constants
There were two issues in how the extended statistics handled NULL values
in opclauses. Firstly, the code was oblivious to the possibility that
Const may be NULL (constisnull=true) in which case the constvalue is
undefined. We need to treat this as a mismatch, and not call the proc.

Secondly, the MCV item itself may contain NULL values too - the code
already did check that, and updated the match bitmap accordingly, but
failed to ensure we won't call the operator procedure anyway. It did
work for AND-clauses, because in that case false in the bitmap stops
evaluation of further clauses. But for OR-clauses ir was not easy to
get incorrect estimates or even trigger a crash.

This fixes both issues by extending the existing check so that it looks
at constisnull too, and making sure it skips calling the procedure.

Discussion: https://postgr.es/m/8736jdhbhc.fsf%40ansel.ydns.eu
2019-07-18 11:29:38 +02:00
Tomas Vondra e8b6ae2130 Fix handling of opclauses in extended statistics
We expect opclauses to have exactly one Var and one Const, but the code
was checking the Const by calling is_pseudo_constant_clause() which is
incorrect - we need a proper constant.

Fixed by using plain IsA(x,Const) to check type of the node. We need to
do these checks in two places, so move it into a separate function that
can be called in both places.

Reported by Andreas Seltenreich, based on crash reported by sqlsmith.

Backpatch to v12, where this code was introduced.

Discussion: https://postgr.es/m/8736jdhbhc.fsf%40ansel.ydns.eu
Backpatch-to: 12
2019-07-18 11:29:38 +02:00
Tomas Vondra a4303a078c Remove unnecessary TYPECACHE_GT_OPR lookup
The TYPECACHE_GT_OPR is not needed (it used to be in older version of
the MCV code), but the compiler failed to detect this as the result was
used in a fmgr_info() call, populating a FmgrInfo entry.

Backpatch to v12, where this code was introduced.

Discussion: https://postgr.es/m/8736jdhbhc.fsf%40ansel.ydns.eu
Backpatch-to: 12
2019-07-18 11:29:38 +02:00
Tomas Vondra ef777cb093 Remove unused variable in statext_mcv_serialize()
The itemlen variable used to be referenced in multiple places, but since
reworking the serialization code it's used only in one assert. Fixed by
removing the variable and calling the macro from the assert directly.

Backpatch to 12, where this code was introduced.

Reported-by: Jeff Janes
Discussion: https://postgr.es/m/CAMkU=1zc_ovH9NZd_9ovuiEWkF9yX06URUDdXCmgDydf-bqB5A@mail.gmail.com
2019-07-05 18:51:56 +02:00
Tomas Vondra 08aa131c7a Simplify pg_mcv_list (de)serialization
The serialization format of multivariate MCV lists included alignment in
order to allow direct access to part of the serialized data, but despite
multiple fixes (see for example commits d85e0f366a and ea4e1c0e8f) this
proved to be problematic.

This commit abandons alignment in the serialized format, and just copies
everything during deserialization.  We now also track amount of memory
needed after deserialization (including alignment), which allows us to
deserialize the MCV list in a single pass.

Bump catversion, as this affects contents of pg_statistic_ext_data.

Backpatch to 12, where multi-column MCV lists were introduced.

Author: Tomas Vondra
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/2201.1561521148@sss.pgh.pa.us
2019-07-05 01:32:49 +02:00
Tomas Vondra 4d66285adc Fix pg_mcv_list_items() to produce text[]
The function pg_mcv_list_items() returns values stored in MCV items. The
items may contain columns with different data types, so the function was
generating text array-like representation, but in an ad-hoc way without
properly escaping various characters etc.

Fixed by simply building a text[] array, which also makes it easier to
use from queries etc.

Requires changes to pg_proc entry, so bump catversion.

Backpatch to 12, where multi-column MCV lists were introduced.

Author: Tomas Vondra
Reviewed-by: Dean Rasheed
Discussion: https://postgr.es/m/20190618205920.qtlzcu73whfpfqne@development
2019-07-05 01:32:46 +02:00
Tomas Vondra e365a581c2 Speed-up build of MCV lists with many distinct values
When building multi-column MCV lists, we compute base frequency for each
item, i.e. a product of per-column frequencies for values from the item.
As a value may be in multiple groups, the code was scanning the whole
array of groups while adding items to the MCV list.  This works fine as
long as the number of distinct groups is small, but it's easy to trigger
trigger O(N^2) behavior, especially after increasing statistics target.

This commit precomputes frequencies for values in all columns, so that
when computing the base frequency it's enough to make a simple bsearch
lookup in the array.

Backpatch to 12, where multi-column MCV lists were introduced.

Discussion: https://postgr.es/m/20190618205920.qtlzcu73whfpfqne@development
2019-07-05 01:32:33 +02:00
Dean Rasheed d7f8d26d9f Add security checks to the multivariate MCV estimation code.
The multivariate MCV estimation code may run user-defined operators on
the values in the MCV list, which means that those operators may
potentially leak the values from the MCV list. Guard against leaking
data to unprivileged users by checking that the user has SELECT
privileges on the table or all of the columns referred to by the
statistics.

Additionally, if there are any securityQuals on the RTE (either due to
RLS policies on the table, or accessing the table via a security
barrier view), not all rows may be visible to the current user, even
if they have table or column privileges. Thus we further insist that
the operator be leakproof in this case.

Dean Rasheed, reviewed by Tomas Vondra.

Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui=Vdx4N==VV5XOK5dsXfnGgVOz_JhAicB=ZA@mail.gmail.com
2019-06-23 18:50:08 +01:00
Michael Paquier 3412030205 Fix more typos and inconsistencies in the tree
Author: Alexander Lakhin
Discussion: https://postgr.es/m/0a5419ea-1452-a4e6-72ff-545b1a5a8076@gmail.com
2019-06-17 16:13:16 +09:00
Tomas Vondra 6cbfb784c3 Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic.  That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.

Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics.  That would be a surprising behavior.

Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly.  This changed with introduction
of MCV lists, which do include most common combinations of values.

The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.

Bumped CATVERSION due to changing system catalog definitions.

Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-16 01:20:31 +02:00
Tomas Vondra fe415ff104 Make error logging in extended statistics more consistent
Most errors reported in extended statistics are internal issues, and so
should use elog(). The MCV list code was already following this rule, but
the functional dependencies and ndistinct coefficients were using a mix
of elog() and ereport(). Fix this by changing most places to elog(), with
the exception of input functions.

This is a mostly cosmetic change, it makes the life a little bit easier
for translators, as elog() messages are not translated. So backpatch to
PostgreSQL 10, where extended statistics were introduced.

Author: Tomas Vondra
Backpatch-through: 10 where extended statistics were added
Discussion: https://postgr.es/m/20190503154404.GA7478@alvherre.pgsql
2019-05-30 17:03:36 +02:00
Amit Kapila 9679345f3c Fix typos.
Reported-by: Alexander Lakhin
Author: Alexander Lakhin
Reviewed-by: Amit Kapila and Tom Lane
Discussion: https://postgr.es/m/7208de98-add8-8537-91c0-f8b089e2928c@gmail.com
2019-05-26 18:28:18 +05:30
Tom Lane 8255c7a5ee Phase 2 pgindent run for v12.
Switch to 2.1 version of pg_bsd_indent.  This formats
multiline function declarations "correctly", that is with
additional lines of parameter declarations indented to match
where the first line's left parenthesis is.

Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
2019-05-22 13:04:48 -04:00