Consider fillfactor when estimating relation size

When table_block_relation_estimate_size() estimated the number of tuples
in a relation without statistics (e.g. right after load), it did not
consider fillfactor when calculating density. With non-default
fillfactor values, this may result in a significant overestimate of the
number of tuples, up to 10x with the minimum 10% fillfactor. This may
have unexpected consequences, e.g. when creating hash indexes.

This considers the current fillfactor value in the "no statistics" code
path.  If the fillfactor changes after loading data into the table, the
estimate may be off. But that seems much less likely than changing the
fillfactor before the data load.

Reviewed-by: Corey Huinker, Peter Eisentraut
Discussion: https://postgr.es/m/cf154ef9-6bac-d268-b735-67a3443debba@enterprisedb.com
Tomas Vondra 2023-07-03 18:55:31 +02:00
parent 087a933b21
commit 29cf61ade3
1 changed file with 9 additions and 1 deletion


@@ -737,11 +737,19 @@ table_block_relation_estimate_size(Relation rel, int32 *attr_widths,
  * and (c) different table AMs might use different padding schemes.
  */
 int32 tuple_width;
+int fillfactor;
+/*
+ * Without reltuples/relpages, we also need to consider fillfactor.
+ * The other branch considers it implicitly by calculating density
+ * from actual relpages/reltuples statistics.
+ */
+fillfactor = RelationGetFillFactor(rel, HEAP_DEFAULT_FILLFACTOR);
 tuple_width = get_rel_data_width(rel, attr_widths);
 tuple_width += overhead_bytes_per_tuple;
 /* note: integer division is intentional here */
-density = usable_bytes_per_page / tuple_width;
+density = (usable_bytes_per_page * fillfactor / 100) / tuple_width;
 }
 *tuples = rint(density * (double) curpages);
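
To make the arithmetic concrete, the following is a minimal standalone sketch of the density calculation in the "no statistics" path. It is not the PostgreSQL source: the helper name estimate_density is hypothetical, and the 8168 usable bytes per page and 64-byte tuple width mentioned afterwards are assumed values chosen only for illustration.

```c
#include <assert.h>

/*
 * Hypothetical helper mirroring the patched expression. Passing
 * fillfactor = 100 reproduces the old behaviour, which ignored the
 * setting entirely.
 */
static double
estimate_density(int usable_bytes_per_page, int tuple_width, int fillfactor)
{
    /* Guard against division by zero and out-of-range percentages. */
    assert(tuple_width > 0);
    assert(fillfactor >= 10 && fillfactor <= 100);

    /* Integer division is intentional, as in the patched code. */
    return (usable_bytes_per_page * fillfactor / 100) / tuple_width;
}
```

Under these assumptions, estimate_density(8168, 64, 100) yields 127 tuples per page while estimate_density(8168, 64, 10) yields 12, so ignoring a 10% fillfactor would inflate the estimate roughly tenfold, in line with the commit message.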