Commit Graph

68 Commits

Author SHA1 Message Date
Tomas Vondra 20b9fa308e Build inherited extended stats on partitioned tables
Commit 859b3003de disabled building of extended stats for inheritance
trees, to prevent updating the same catalog row twice. While that
resolved the issue, it also means there are no extended stats for
declaratively partitioned tables, because there are no data in the
non-leaf relations.

That also means declaratively partitioned tables were not affected by
the issue 859b3003de addressed, which means this is a regression
affecting queries that calculate estimates for the whole inheritance
tree as a whole (which includes e.g. GROUP BY queries).

But because partitioned tables are empty, we can invert the condition
and build statistics only for the case with inheritance, without losing
anything. And we can consider them when calculating estimates.

It may be necessary to run ANALYZE on partitioned tables, to collect
proper statistics. For declarative partitioning there should no prior
statistics, and it might take time before autoanalyze is triggered. For
tables partitioned by inheritance the statistics may include data from
child relations (if built 859b3003de), contradicting the current code.

Report and patch by Justin Pryzby, minor fixes and cleanup by me.
Backpatch all the way back to PostgreSQL 10, where extended statistics
were introduced (same as 859b3003de).

Author: Justin Pryzby
Reported-by: Justin Pryzby
Backpatch-through: 10
Discussion: https://postgr.es/m/20210923212624.GI831%40telsasoft.com
2022-01-15 19:06:48 +01:00
Tomas Vondra 36c4bc6e72 Ignore extended statistics for inheritance trees
Since commit 859b3003de we only build extended statistics for individual
relations, ignoring the child relations. This resolved the issue with
updating catalog tuple twice, but we still tried to use the statistics
when calculating estimates for the whole inheritance tree. When the
relations contain very distinct data, it may produce bogus estimates.

This is roughly the same issue 427c6b5b9 addressed ~15 years ago, and we
fix it the same way - by ignoring extended statistics when calculating
estimates for the inheritance tree as a whole. We still consider
extended statistics when calculating estimates for individual child
relations, of course.

This may result in plan changes due to different estimates, but if the
old statistics were not describing the inheritance tree particularly
well it's quite likely the new plans is actually better.

Report and patch by Justin Pryzby, minor fixes and cleanup by me.
Backpatch all the way back to PostgreSQL 10, where extended statistics
were introduced (same as 859b3003de).

Author: Justin Pryzby
Reported-by: Justin Pryzby
Backpatch-through: 10
Discussion: https://postgr.es/m/20210923212624.GI831%40telsasoft.com
2022-01-15 02:20:54 +01:00
Bruce Momjian 27b77ecf9f Update copyright for 2022
Backpatch-through: 10
2022-01-07 19:04:57 -05:00
Michael Paquier 070d2e19e4 Clarify use of "statistics objects" in the code
The code inconsistently used "statistic object" or "statistics" where
the correct term, as discussed, is actually "statistics object".  This
improves the state of the code to be more consistent.

While on it, fix an incorrect error message introduced in a4d75c8.  This
error should never happen, as the code states, but it would be
misleading.

Author: Justin Pryzby
Reviewed-by: Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/20210924215827.GS831@telsasoft.com
Backpatch-through: 14
2021-09-29 15:29:38 +09:00
Michael Paquier e767ddcd35 Fix typos and grammar in code comments
Several mistakes have piled in the code comments over the time,
including incorrect grammar, function names and simple typos.  This
commit takes care of a portion of these.

No backpatch is done as this is only cosmetic.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20210924215827.GS831@telsasoft.com
2021-09-27 14:21:28 +09:00
Tomas Vondra 83772cc78e Free memory after building each statistics object
Until now, all extended statistics on a given relation were built in the
same memory context, without resetting. Some of the memory was released
explicitly, but not all of it - for example memory allocated while
detoasting values is hard to free. This is how it worked since extended
statistics were introduced in PostgreSQL 10, but adding support for
extended stats on expressions made the issue somewhat worse as it
increases the number of statistics to build.

Fixed by adding a memory context which gets reset after building each
statistics object (all the statistics kinds included in it). Resetting
it after building each statistics kind would be even better, but it
would require more invasive changes and copying of results, making it
harder to backpatch.

Backpatch to PostgreSQL 10, where extended statistics were introduced.

Author: Justin Pryzby
Reported-by: Justin Pryzby
Reviewed-by: Tomas Vondra
Backpatch-through: 10
Discussion: https://www.postgresql.org/message-id/20210915200928.GP831%40telsasoft.com
2021-09-23 18:05:10 +02:00
Peter Eisentraut c302a61390 Error message refactoring
Take some untranslatable things out of the message and replace by
format placeholders, to reduce translatable strings and reduce
translation mistakes.
2021-06-27 09:41:16 +02:00
Tom Lane def5b065ff Initial pgindent and pgperltidy run for v14.
Also "make reformat-dat-files".

The only change worthy of note is that pgindent messed up the formatting
of launcher.c's struct LogicalRepWorkerId, which led me to notice that
that struct wasn't used at all anymore, so I just took it out.
2021-05-12 13:14:10 -04:00
Tomas Vondra 93f9af1387 Fix typos in comments about extended statistics
Reported-by: Justin Pryzby
Discussion: https://postgr.es/m/20210505210947.GA27406%40telsasoft.com
2021-05-07 14:40:36 +02:00
Michael Paquier 7ef8b52cf0 Fix typos and grammar in comments and docs
Author: Justin Pryzby
Discussion: https://postgr.es/m/20210416070310.GG3315@telsasoft.com
2021-04-19 11:32:30 +09:00
Tomas Vondra 20661c15db Initialize t_self and t_tableOid in statext_expressions_load
The function is building a fake heap tuple, but left some of the header
fields (tid and table OID) uninitialized. Per Coverity report.

Reported-by: Ranier Vilela
Discussion: https://postgr.es/m/CAEudQApj6h8tZ0-eP5Af5PKc5NG1YUc7=SdN_99YoHS51fKa0Q@mail.gmail.com
2021-04-14 00:46:12 +02:00
Tom Lane 9cb9233409 Fix uninitialized variable from commit a4d75c86b.
The path for *exprs != NIL would misbehave, and likely crash,
since pull_varattnos expects its last argument to be valid
at call.

Found by Coverity --- we have no coverage of this path in
the regression tests.
2021-04-11 11:46:46 -04:00
Michael Paquier 609b0652af Fix typos and grammar in documentation and code comments
Comment fixes are applied on HEAD, and documentation improvements are
applied on back-branches where needed.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20210408164008.GJ6592@telsasoft.com
Backpatch-through: 9.6
2021-04-09 13:53:07 +09:00
Tomas Vondra 518442c7f3 Fix handling of clauses incompatible with extended statistics
Handling of incompatible clauses while applying extended statistics was
a bit confused - while handling a mix of compatible and incompatible
clauses it sometimes incorrectly treated the incompatible clauses as
compatible, resulting in a crash.

Fixed by reworking the code applying the selected statistics object to
make it easier to understand, and adding a proper compatibility check.

Reported-by: David Rowley
Discussion: https://postgr.es/m/CAApHDvpYT10-nkSp8xXe-nbO3jmoaRyRFHbzh-RWMfAJynqgpQ%40mail.gmail.com
2021-04-06 16:56:06 +02:00
Tom Lane c545e9524d Don't prematurely cram a value into a short int.
Since a4d75c86b, some buildfarm members have been warning that
		Assert(attnum <= MaxAttrNumber);
is useless if attnum is an AttrNumber.  I'm not certain how plausible
it is that the value coming out of the bitmap could actually exceed
MaxAttrNumber, but we seem to have thought that that was possible back
in 7300a6995.  Revert the intermediate variable to int so that we have
the same overflow protection as before.
2021-03-31 16:45:24 -04:00
Tomas Vondra a4d75c86bf Extended statistics on expressions
Allow defining extended statistics on expressions, not just just on
simple column references.  With this commit, expressions are supported
by all existing extended statistics kinds, improving the same types of
estimates. A simple example may look like this:

  CREATE TABLE t (a int);
  CREATE STATISTICS s ON mod(a,10), mod(a,20) FROM t;
  ANALYZE t;

The collected statistics are useful e.g. to estimate queries with those
expressions in WHERE or GROUP BY clauses:

  SELECT * FROM t WHERE mod(a,10) = 0 AND mod(a,20) = 0;

  SELECT 1 FROM t GROUP BY mod(a,10), mod(a,20);

This introduces new internal statistics kind 'e' (expressions) which is
built automatically when the statistics object definition includes any
expressions. This represents single-expression statistics, as if there
was an expression index (but without the index maintenance overhead).
The statistics is stored in pg_statistics_ext_data as an array of
composite types, which is possible thanks to 79f6a942bd.

CREATE STATISTICS allows building statistics on a single expression, in
which case in which case it's not possible to specify statistics kinds.

A new system view pg_stats_ext_exprs can be used to display expression
statistics, similarly to pg_stats and pg_stats_ext views.

ALTER TABLE ... ALTER COLUMN ... TYPE now treats indexes the same way it
treats indexes, i.e. it drops and recreates the statistics. This means
all statistics are reset, and we no longer try to preserve at least the
functional dependencies. This should not be a major issue in practice,
as the functional dependencies actually rely on per-column statistics,
which were always reset anyway.

Author: Tomas Vondra
Reviewed-by: Justin Pryzby, Dean Rasheed, Zhihong Yu
Discussion: https://postgr.es/m/ad7891d2-e90c-b446-9fe2-7419143847d7%40enterprisedb.com
2021-03-27 00:01:11 +01:00
Tomas Vondra a5f002ad9a Use correct spelling of statistics kind
A couple error messages and comments used 'statistic kind', not the
correct 'statistics kind'. Fix and backpatch all the way back to 10,
where extended statistics were introduced.

Backpatch-through: 10
2021-03-23 05:01:35 +01:00
Tomas Vondra bfa2cee784 Move bsearch_arg to src/port
Until now the bsearch_arg function was used only in extended statistics
code, so it was defined in that code.  But we already have qsort_arg in
src/port, so let's move it next to it.
2021-03-23 00:11:22 +01:00
Bruce Momjian ca3b37487b Update copyright for 2021
Backpatch-through: 9.5
2021-01-02 13:06:25 -05:00
Dean Rasheed 4f5760d4af Improve estimation of ANDs under ORs using extended statistics.
Formerly, extended statistics only handled clauses that were
RestrictInfos. However, the restrictinfo machinery doesn't create
sub-AND RestrictInfos for AND clauses underneath OR clauses.
Therefore teach extended statistics to handle bare AND clauses,
looking for compatible RestrictInfo clauses underneath them.

Dean Rasheed, reviewed by Tomas Vondra.

Discussion: https://postgr.es/m/CAEZATCW=J65GUFm50RcPv-iASnS2mTXQbr=CfBvWRVhFLJ_fWA@mail.gmail.com
2020-12-08 20:10:11 +00:00
Dean Rasheed 88b0898fe3 Improve estimation of OR clauses using multiple extended statistics.
When estimating an OR clause using multiple extended statistics
objects, treat the estimates for each set of clauses for each
statistics object as independent of one another. The overlap estimates
produced for each statistics object do not apply to clauses covered by
other statistics objects.

Dean Rasheed, reviewed by Tomas Vondra.

Discussion: https://postgr.es/m/CAEZATCW=J65GUFm50RcPv-iASnS2mTXQbr=CfBvWRVhFLJ_fWA@mail.gmail.com
2020-12-08 19:39:24 +00:00
Dean Rasheed 25a9e54d2d Improve estimation of OR clauses using extended statistics.
Formerly we only applied extended statistics to an OR clause as part
of the clauselist_selectivity() code path for an OR clause appearing
in an implicitly-ANDed list of clauses. This meant that it could only
use extended statistics if all sub-clauses of the OR clause were
covered by a single extended statistics object.

Instead, teach clause_selectivity() how to apply extended statistics
to an OR clause by handling its ORed list of sub-clauses in a similar
manner to an implicitly-ANDed list of sub-clauses, but with different
combination rules. This allows one or more extended statistics objects
to be used to estimate all or part of the list of sub-clauses. Any
remaining sub-clauses are then treated as if they are independent.

Additionally, to avoid double-application of extended statistics, this
introduces "extended" versions of clause_selectivity() and
clauselist_selectivity(), which include an option to ignore extended
statistics. This replaces the old clauselist_selectivity_simple()
function which failed to completely ignore extended statistics when
called from the extended statistics code.

A known limitation of the current infrastructure is that an AND clause
under an OR clause is not treated as compatible with extended
statistics (because we don't build RestrictInfos for such sub-AND
clauses). Thus, for example, "(a=1 AND b=1) OR (a=2 AND b=2)" will
currently be treated as two independent AND clauses (each of which may
be estimated using extended statistics), but extended statistics will
not currently be used to account for any possible overlap between
those clauses. Improving that is left as a task for the future.

Original patch by Tomas Vondra, with additional improvements by me.

Discussion: https://postgr.es/m/20200113230008.g67iyk4cs3xbnjju@development
2020-12-03 10:03:49 +00:00
Michael Paquier 8a15e735be Fix some grammar and typos in comments and docs
The documentation fixes are backpatched down to where they apply.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20201031020801.GD3080@telsasoft.com
Backpatch-through: 9.6
2020-11-02 15:14:41 +09:00
Tom Lane 38a2d70329 Remove some more useless assignments.
Found with clang's scan-build tool.  It also whines about a lot of
other dead stores that we should *not* change IMO, either as a matter
of style or future-proofing.  But these places seem like clear
oversights.

Discussion: https://postgr.es/m/CAEudQAo1+AcGppxDSg8k+zF4+Kv+eJyqzEDdbpDg58-=MQcerQ@mail.gmail.com
2020-09-04 14:32:19 -04:00
Tom Lane fa27dd40d5 Run pgindent with new pg_bsd_indent version 2.1.1.
Thomas Munro fixed a longstanding annoyance in pg_bsd_indent, that
it would misformat lines containing IsA() macros on the assumption
that the IsA() call should be treated like a cast.  This improves
some other cases involving field/variable names that match typedefs,
too.  The only places that get worse are a couple of uses of the
OpenSSL macro STACK_OF(); we'll gladly take that trade-off.

Discussion: https://postgr.es/m/20200114221814.GA19630@alvherre.pgsql
2020-05-16 11:54:51 -04:00
Tom Lane 5cbfce562f Initial pgindent and pgperltidy run for v13.
Includes some manual cleanup of places that pgindent messed up,
most of which weren't per project style anyway.

Notably, it seems some people didn't absorb the style rules of
commit c9d297751, because there were a bunch of new occurrences
of function calls with a newline just after the left paren, all
with faulty expectations about how the rest of the call would get
indented.
2020-05-14 13:06:50 -04:00
Tom Lane fe3036527a Fix race condition in statext_store().
Must hold some lock on the pg_statistic_ext_data catalog *before*
we look up the tuple we aim to replace.  Otherwise a concurrent
VACUUM FULL or similar operation could move it to a different TID,
leaving us trying to replace the wrong tuple.

Back-patch to v12 where this got broken.

Credit goes to Dean Rasheed; I'm just doing the clerical work.

Discussion: https://postgr.es/m/CAEZATCU0zHMDiQV0g8P2U+YSP9C1idUPrn79DajsbonwkN0xvQ@mail.gmail.com
2020-03-31 17:06:22 -04:00
Tomas Vondra 6f72dbc48b Fix wording of several extended stats comments
Reported-by: Thomas Munro
Discussion: https://www.postgresql.org/message-id/flat/20200113230008.g67iyk4cs3xbnjju@development
2020-03-18 13:40:13 +01:00
Tomas Vondra e83daa7e33 Use multi-variate MCV lists to estimate ScalarArrayOpExpr
Commit 8f321bd16c added support for estimating ScalarArrayOpExpr clauses
(IN/ANY) clauses using functional dependencies. There's no good reason
not to support estimation of these clauses using multi-variate MCV lists
too, so this commits implements that. That makes the behavior consistent
and MCV lists can estimate all variants (ANY/ALL, inequalities, ...).

Author: Tomas Vondra
Review: Dean Rasheed
Discussion: https://www.postgresql.org/message-id/flat/13902317.Eha0YfKkKy%40pierred-pdoc
2020-03-14 16:13:00 +01:00
Peter Eisentraut 3c173a53a8 Remove utils/acl.h from catalog/objectaddress.h
The need for this was removed by
8b9e9644dc.

A number of files now need to include utils/acl.h or
parser/parse_node.h explicitly where they previously got it indirectly
somehow.

Since parser/parse_node.h already includes nodes/parsenodes.h, the
latter is then removed where the former was added.  Also, remove
nodes/pg_list.h from objectaddress.h, since that's included via
nodes/parsenodes.h.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/7601e258-26b2-8481-36d0-dc9dca6f28f1%402ndquadrant.com
2020-03-10 10:27:00 +01:00
Alvaro Herrera a166d408eb Report progress of ANALYZE commands
This uses the progress reporting infrastructure added by c16dc1aca5,
adding support for ANALYZE.

Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Co-authored-by: Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp>
Reviewed-by: Julien Rouhaud, Robert Haas, Anthony Nowocien, Kyotaro Horiguchi,
	Vignesh C, Amit Langote
2020-01-15 11:14:39 -03:00
Tomas Vondra eae056c19e Apply multiple multivariate MCV lists when possible
Until now we've only used a single multivariate MCV list per relation,
covering the largest number of clauses. So for example given a query

    SELECT * FROM t WHERE a = 1 AND b =1 AND c = 1 AND d = 1

and extended statistics on (a,b) and (c,d), we'd only pick and use one
of them. This commit improves this by repeatedly picking and applying
the best statistics (matching the largest number of remaining clauses)
until no additional statistics is applicable.

This greedy algorithm is simple, but may not be optimal. A different
choice of statistics may leave fewer clauses unestimated and/or give
better estimates for some other reason.

This can however happen only when there are overlapping statistics, and
selecting one makes it impossible to use the other. E.g. with statistics
on (a,b), (c,d), (b,c,d), we may pick either (a,b) and (c,d) or (b,c,d).
But it's not clear which option is the best one.

We however assume cases like this are rare, and the easiest solution is
to define statistics covering the whole group of correlated columns. In
the future we might support overlapping stats, using some of the clauses
as conditions (in conditional probability sense).

Author: Tomas Vondra
Reviewed-by: Mark Dilger, Kyotaro Horiguchi
Discussion: https://postgr.es/m/20191028152048.jc6pqv5hb7j77ocp@development
2020-01-13 01:21:17 +01:00
Bruce Momjian 7559d8ebfa Update copyrights for 2020
Backpatch-through: update all files in master, backpatch legal files through 9.4
2020-01-01 12:21:45 -05:00
Michael Paquier 7854e07f25 Revert "Rename files and headers related to index AM"
This follows multiple complains from Peter Geoghegan, Andres Freund and
Alvaro Herrera that this issue ought to be dug more before actually
happening, if it happens.

Discussion: https://postgr.es/m/20191226144606.GA5659@alvherre.pgsql
2019-12-27 08:09:00 +09:00
Michael Paquier 8ce3aa9b59 Rename files and headers related to index AM
The following renaming is done so as source files related to index
access methods are more consistent with table access methods (the
original names used for index AMs ware too generic, and could be
confused as including features related to table AMs):
- amapi.h -> indexam.h.
- amapi.c -> indexamapi.c.  Here we have an equivalent with
backend/access/table/tableamapi.c.
- amvalidate.c -> indexamvalidate.c.
- amvalidate.h -> indexamvalidate.h.
- genam.c -> indexgenam.c.
- genam.h -> indexgenam.h.

This has been discussed during the development of v12 when table AM was
worked on, but the renaming never happened.

Author: Michael Paquier
Reviewed-by: Fabien Coelho, Julien Rouhaud
Discussion: https://postgr.es/m/20191223053434.GF34339@paquier.xyz
2019-12-25 10:23:39 +09:00
Tomas Vondra 6d61c3f1cb Remove unnecessary clauses_attnums variable
Commit c676e659b2 reworked how choose_best_statistics() picks the best
extended statistics, but failed to remove clauses_attnums which is now
unnecessary. So get rid of it and backpatch to 12, same as c676e659b2.

Author: Tomas Vondra
Discussion: https://postgr.es/m/CA+u7OA7H5rcE2=8f263w4NZD6ipO_XOrYB816nuLXbmSTH9pQQ@mail.gmail.com
Backpatch-through: 12
2019-11-28 23:25:14 +01:00
Tomas Vondra c676e659b2 Fix choose_best_statistics to check clauses individually
When picking the best extended statistics object for a list of clauses,
it's not enough to look at attnums extracted from the clause list as a
whole. Consider for example this query with OR clauses:

   SELECT * FROM t WHERE (t.a = 1) OR (t.b = 1) OR (t.c = 1)

with a statistics defined on columns (a,b). Relying on attnums extracted
from the whole OR clause, we'd consider the statistics usable. That does
not work, as we see the conditions as a single OR-clause, referencing an
attribute not covered by the statistic, leading to empty list of clauses
to be estimated using the statistics and an assert failure.

This changes choose_best_statistics to check which clauses are actually
covered, and only using attributes from the fully covered ones. For the
previous example this means the statistics object will not be considered
as compatible with the OR-clause.

Backpatch to 12, where MCVs were introduced. The issue does not affect
older versions because functional dependencies don't handle OR clauses.

Author: Tomas Vondra
Reviewed-by: Dean Rasheed
Reported-By: Manuel Rigger
Discussion: https://postgr.es/m/CA+u7OA7H5rcE2=8f263w4NZD6ipO_XOrYB816nuLXbmSTH9pQQ@mail.gmail.com
Backpatch-through: 12
2019-11-28 22:20:45 +01:00
Tomas Vondra d06215d03b Allow setting statistics target for extended statistics
When building statistics, we need to decide how many rows to sample and
how accurate the resulting statistics should be. Until now, it was not
possible to explicitly define statistics target for extended statistics
objects, the value was always computed from the per-attribute targets
with a fallback to the system-wide default statistics target.

That's a bit inconvenient, as it ties together the statistics target set
for per-column and extended statistics. In some cases it may be useful
to require larger sample / higher accuracy for extended statics (or the
other way around), but with this approach that's not possible.

So this commit introduces a new command, allowing to specify statistics
target for individual extended statistics objects, overriding the value
derived from per-attribute targets (and the system default).

  ALTER STATISTICS stat_name SET STATISTICS target_value;

When determining statistics target for an extended statistics object we
first look at this explicitly set value. When this value is -1, we fall
back to the old formula, looking at the per-attribute targets first and
then the system default. This means the behavior is backwards compatible
with older PostgreSQL releases.

Author: Tomas Vondra
Discussion: https://postgr.es/m/20190618213357.vli3i23vpkset2xd@development
Reviewed-by: Kirk Jamison, Dean Rasheed
2019-09-11 00:25:51 +02:00
Robert Haas 8b94dab066 Split tuptoaster.c into three separate files.
detoast.c/h contain functions required to detoast a datum, partially
or completely, plus a few other utility functions for examining the
size of toasted datums.

toast_internals.c/h contain functions that are used internally to the
TOAST subsystem but which (mostly) do not need to be accessed from
outside.

heaptoast.c/h contains code that is intrinsically specific to the
heap AM, either because it operates on HeapTuples or is based on the
layout of a heap page.

detoast.c and toast_internals.c are placed in
src/backend/access/common rather than src/backend/access/heap.  At
present, both files still have dependencies on the heap, but that will
be improved in a future commit.

Patch by me, reviewed and tested by Prabhat Sabu, Thomas Munro,
Andres Freund, and Álvaro Herrera.

Discussion: http://postgr.es/m/CA+TgmoZv-=2iWM4jcw5ZhJeL18HF96+W1yJeYrnGMYdkFFnEpQ@mail.gmail.com
2019-09-05 13:15:10 -04:00
Michael Paquier c96581abe4 Fix inconsistencies and typos in the tree, take 11
This fixes various typos in docs and comments, and removes some orphaned
definitions.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/5da8e325-c665-da95-21e0-c8a99ea61fbf@gmail.com
2019-08-19 16:21:39 +09:00
Andres Freund 6a04d345fd Don't include utils/array.h from acl.h.
For most uses of acl.h the details of how "Acl" internally looks like
are irrelevant. It might make sense to move a lot of the
implementation details into a separate header at a later point.

The main motivation of this change is to avoid including fmgr.h (via
array.h, which needs it for exposed structs) in a lot of files that
otherwise don't need it. A subsequent commit will remove the fmgr.h
include from a lot of files.

Directly include utils/array.h and utils/expandeddatum.h from the
files that need them, but previously included them indirectly, via
acl.h.

Author: Andres Freund
Discussion: https://postgr.es/m/20190803193733.g3l3x3o42uv4qj7l@alap3.anarazel.de
2019-08-16 10:33:30 -07:00
Tomas Vondra e38a55ba46 Rework examine_opclause_expression to use varonleft
The examine_opclause_expression function needs to return information on
which side of the operator we found the Var, but the variable was called
"isgt" which is rather misleading (it assumes the operator is either
less-than or greater-than, but it may be equality or something else).
Other places in the planner use a variable called "varonleft" for this
purpose, so just adopt the same convention here.

The code also assumed we don't care about this flag for equality, as
(Var = Const) and (Const = Var) should be the same thing. But that does
not work for cross-type operators, in which case we need to pass the
parameters to the procedure in the right order. So just use the same
code for all types of expressions.

This means we don't need to care about the selectivity estimation
function anymore, at least not in this code. We should only get the
supported cases here (thanks to statext_is_compatible_clause).

Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/8736jdhbhc.fsf%40ansel.ydns.eu
Backpatch-to: 12
2019-07-20 16:37:30 +02:00
Tomas Vondra e8b6ae2130 Fix handling of opclauses in extended statistics
We expect opclauses to have exactly one Var and one Const, but the code
was checking the Const by calling is_pseudo_constant_clause() which is
incorrect - we need a proper constant.

Fixed by using plain IsA(x,Const) to check type of the node. We need to
do these checks in two places, so move it into a separate function that
can be called in both places.

Reported by Andreas Seltenreich, based on crash reported by sqlsmith.

Backpatch to v12, where this code was introduced.

Discussion: https://postgr.es/m/8736jdhbhc.fsf%40ansel.ydns.eu
Backpatch-to: 12
2019-07-18 11:29:38 +02:00
Dean Rasheed d7f8d26d9f Add security checks to the multivariate MCV estimation code.
The multivariate MCV estimation code may run user-defined operators on
the values in the MCV list, which means that those operators may
potentially leak the values from the MCV list. Guard against leaking
data to unprivileged users by checking that the user has SELECT
privileges on the table or all of the columns referred to by the
statistics.

Additionally, if there are any securityQuals on the RTE (either due to
RLS policies on the table, or accessing the table via a security
barrier view), not all rows may be visible to the current user, even
if they have table or column privileges. Thus we further insist that
the operator be leakproof in this case.

Dean Rasheed, reviewed by Tomas Vondra.

Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui=Vdx4N==VV5XOK5dsXfnGgVOz_JhAicB=ZA@mail.gmail.com
2019-06-23 18:50:08 +01:00
Tomas Vondra 6cbfb784c3 Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic.  That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.

Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics.  That would be a surprising behavior.

Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly.  This changed with introduction
of MCV lists, which do include most common combinations of values.

The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.

Bumped CATVERSION due to changing system catalog definitions.

Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-16 01:20:31 +02:00
Tom Lane 8255c7a5ee Phase 2 pgindent run for v12.
Switch to 2.1 version of pg_bsd_indent.  This formats
multiline function declarations "correctly", that is with
additional lines of parameter declarations indented to match
where the first line's left parenthesis is.

Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
2019-05-22 13:04:48 -04:00
Tom Lane be76af171c Initial pgindent run for v12.
This is still using the 2.0 version of pg_bsd_indent.
I thought it would be good to commit this separately,
so as to document the differences between 2.0 and 2.1 behavior.

Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
2019-05-22 12:55:34 -04:00
Tomas Vondra 7300a69950 Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.

Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.

Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
Tom Lane fa2cf164aa Rename nodes/relation.h to nodes/pathnodes.h.
The old name of this file was never a very good indication of what it
was for.  Now that there's also access/relation.h, we have a potential
confusion hazard as well, so let's rename it to something more apropos.
Per discussion, "pathnodes.h" is reasonable, since a good fraction of
the file is Path node definitions.

While at it, tweak a couple of other headers that were gratuitously
importing relation.h into modules that don't need it.

Discussion: https://postgr.es/m/7719.1548688728@sss.pgh.pa.us
2019-01-29 16:49:25 -05:00
Andres Freund e0c4ec0728 Replace uses of heap_open et al with the corresponding table_* function.
Author: Andres Freund
Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de
2019-01-21 10:51:37 -08:00