Fix ndistinct estimates with system attributes

When estimating the number of groups using extended statistics, the code
was discarding information about system attributes. This led to the
strange situation that

    SELECT 1 FROM t GROUP BY ctid;

could have produced a higher estimate (equal to pg_class.reltuples) than

    SELECT 1 FROM t GROUP BY a, b, ctid;

with extended statistics on (a,b). Fixed by retaining information about
the system attribute.

Backpatch all the way to 10, where extended statistics were introduced.

Author: Tomas Vondra
Backpatch-through: 10
Tomas Vondra 2021-03-26 22:34:53 +01:00
parent a14a0118a1
commit 33e52ad9a3
2 changed files with 4 additions and 4 deletions

@@ -3987,11 +3987,11 @@ estimate_multivariate_ndistinct(PlannerInfo *root, RelOptInfo *rel,
 		attnum = ((Var *) varinfo->var)->varattno;
-		if (!AttrNumberIsForUserDefinedAttr(attnum))
+		if (AttrNumberIsForUserDefinedAttr(attnum) &&
+			bms_is_member(attnum, matched))
 			continue;
-		if (!bms_is_member(attnum, matched))
-			newlist = lappend(newlist, varinfo);
+		newlist = lappend(newlist, varinfo);
 	}
 	*varinfos = newlist;
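
The effect of the changed condition can be illustrated outside the planner. The following is a minimal standalone sketch, not PostgreSQL code: `is_user_attr`, `is_matched`, `filter_old` and `filter_new` are hypothetical stand-ins for `AttrNumberIsForUserDefinedAttr`, `bms_is_member` and the loop above, and plain `int` arrays replace the `varinfos` list and the `matched` bitmapset. It assumes that user columns have positive attribute numbers and that system columns such as ctid have negative ones.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for AttrNumberIsForUserDefinedAttr():
 * assumes user columns are numbered from 1, system columns below 0. */
static bool is_user_attr(int attnum)
{
    return attnum > 0;
}

/* Hypothetical stand-in for bms_is_member(): linear search in an array. */
static bool is_matched(int attnum, const int *matched, size_t nmatched)
{
    for (size_t i = 0; i < nmatched; i++)
        if (matched[i] == attnum)
            return true;
    return false;
}

/* Old logic: system attributes are dropped unconditionally, so a
 * GROUP BY ctid term vanishes from the remaining list. */
size_t filter_old(const int *attrs, size_t n,
                  const int *matched, size_t nmatched, int *out)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (!is_user_attr(attrs[i]))
            continue;
        if (!is_matched(attrs[i], matched, nmatched))
            out[kept++] = attrs[i];
    }
    return kept;
}

/* New logic: only user attributes covered by the statistic are dropped;
 * everything else, including system attributes, is retained. */
size_t filter_new(const int *attrs, size_t n,
                  const int *matched, size_t nmatched, int *out)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (is_user_attr(attrs[i]) &&
            is_matched(attrs[i], matched, nmatched))
            continue;
        out[kept++] = attrs[i];
    }
    return kept;
}
```

With attributes `{1, 2, -1}` (standing for a, b, ctid) and a statistic matching `{1, 2}`, the old filter keeps nothing, while the new one keeps `-1`, so the system attribute still contributes to the group estimate.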

@@ -260,7 +260,7 @@ SELECT s.stxkind, d.stxdndistinct
 SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct GROUP BY ctid, a, b');
  estimated | actual 
 -----------+--------
-        11 |   1000
+      1000 |   1000
 (1 row)
 -- Hash Aggregate, thanks to estimates improved by the statistic