Invent recursive_worktable_factor GUC to replace hard-wired constant.

Up to now, the planner estimated the size of a recursive query's
worktable as 10 times the size of the non-recursive term.  It's hard
to see how to do significantly better than that automatically, but
we can give users control over the multiplier to allow tuning for
specific use-cases.  The default behavior remains the same.

Simon Riggs

Discussion: https://postgr.es/m/CANbhV-EuaLm4H3g0+BSTYHEGxJj3Kht0R+rJ8vT57Dejnh=_nA@mail.gmail.com
Tom Lane 2022-03-24 11:47:41 -04:00
parent a47651447f
commit 0bd7af082a
6 changed files with 44 additions and 3 deletions

@ -5919,6 +5919,29 @@ SELECT * FROM parent WHERE key = 2400;
</listitem>
</varlistentry>
<varlistentry id="guc-recursive-worktable-factor" xreflabel="recursive_worktable_factor">
<term><varname>recursive_worktable_factor</varname> (<type>floating point</type>)
<indexterm>
<primary><varname>recursive_worktable_factor</varname> configuration parameter</primary>
</indexterm>
</term>
<listitem>
<para>
Sets the planner's estimate of the average size of the working
table of a <link linkend="queries-with-recursive">recursive
query</link>, as a multiple of the estimated size of the initial
non-recursive term of the query. This helps the planner choose
the most appropriate method for joining the working table to the
query's other tables.
The default value is <literal>10.0</literal>. A smaller value
such as <literal>1.0</literal> can be helpful when the recursion
has low <quote>fan-out</quote> from one step to the next, as for
example in shortest-path queries. Graph analytics queries may
benefit from larger-than-default values.
</para>
</listitem>
</varlistentry>
</variablelist>
</sect2>
</sect1>

@ -123,6 +123,7 @@ double cpu_index_tuple_cost = DEFAULT_CPU_INDEX_TUPLE_COST;
double cpu_operator_cost = DEFAULT_CPU_OPERATOR_COST;
double parallel_tuple_cost = DEFAULT_PARALLEL_TUPLE_COST;
double parallel_setup_cost = DEFAULT_PARALLEL_SETUP_COST;
double recursive_worktable_factor = DEFAULT_RECURSIVE_WORKTABLE_FACTOR;
int effective_cache_size = DEFAULT_EFFECTIVE_CACHE_SIZE;
@ -5665,10 +5666,11 @@ set_cte_size_estimates(PlannerInfo *root, RelOptInfo *rel, double cte_rows)
 if (rte->self_reference)
 {
 /*
- * In a self-reference, arbitrarily assume the average worktable size
- * is about 10 times the nonrecursive term's size.
+ * In a self-reference, we assume the average worktable size is a
+ * multiple of the nonrecursive term's size. The best multiplier will
+ * vary depending on query "fan-out", so make its value adjustable.
 */
- rel->tuples = 10 * cte_rows;
+ rel->tuples = clamp_row_est(recursive_worktable_factor * cte_rows);
 }
 else
 {

@ -3740,6 +3740,18 @@ static struct config_real ConfigureNamesReal[] =
NULL, NULL, NULL
},
{
{"recursive_worktable_factor", PGC_USERSET, QUERY_TUNING_OTHER,
gettext_noop("Sets the planner's estimate of the average size "
"of a recursive query's working table."),
NULL,
GUC_EXPLAIN
},
&recursive_worktable_factor,
DEFAULT_RECURSIVE_WORKTABLE_FACTOR, 0.001, 1000000.0,
NULL, NULL, NULL
},
{
{"geqo_selection_bias", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("GEQO: selective pressure within the population."),

@ -426,6 +426,7 @@
# JOIN clauses
#plan_cache_mode = auto # auto, force_generic_plan or
# force_custom_plan
#recursive_worktable_factor = 10.0 # range 0.001-1000000
#------------------------------------------------------------------------------

@ -29,6 +29,8 @@
#define DEFAULT_PARALLEL_TUPLE_COST 0.1
#define DEFAULT_PARALLEL_SETUP_COST 1000.0
/* defaults for non-Cost parameters */
#define DEFAULT_RECURSIVE_WORKTABLE_FACTOR 10.0
#define DEFAULT_EFFECTIVE_CACHE_SIZE 524288 /* measured in pages */
typedef enum

@ -91,6 +91,7 @@ extern PGDLLIMPORT double cpu_index_tuple_cost;
extern PGDLLIMPORT double cpu_operator_cost;
extern PGDLLIMPORT double parallel_tuple_cost;
extern PGDLLIMPORT double parallel_setup_cost;
extern PGDLLIMPORT double recursive_worktable_factor;
extern PGDLLIMPORT int effective_cache_size;
extern double clamp_row_est(double nrows);
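
The arithmetic changed in set_cte_size_estimates() can be tried out in isolation with a small standalone sketch. This is not the planner code: sketch_clamp_row_est() is a simplified stand-in for clamp_row_est() (whose body is not part of this diff), SKETCH_MAX_ROWS is an assumed cap chosen for illustration, and sketch_worktable_rows() is a hypothetical helper name. The rounding is done with an integer cast rather than a libm call so that the file compiles without extra link flags.

```c
#include <assert.h>

/* Same default value as the macro added by the commit. */
#define DEFAULT_RECURSIVE_WORKTABLE_FACTOR 10.0

/* Assumed cap on row estimates; a value picked for this sketch only. */
#define SKETCH_MAX_ROWS 1e100

/*
 * Simplified stand-in for clamp_row_est(): replace NaN or oversized
 * estimates with the cap, force at least one row, and round the rest
 * to a whole number.
 */
double
sketch_clamp_row_est(double nrows)
{
    /* (nrows != nrows) is true only for NaN. */
    if (nrows != nrows || nrows > SKETCH_MAX_ROWS)
        return SKETCH_MAX_ROWS;
    if (nrows <= 1.0)
        return 1.0;
    /* Doubles at or above 2^53 are already whole numbers. */
    if (nrows >= 9007199254740992.0)
        return nrows;
    /* Round half up via an integer cast; safe because nrows < 2^53 here. */
    return (double) (long long) (nrows + 0.5);
}

/*
 * Hypothetical helper mirroring the new assignment to rel->tuples:
 * the worktable estimate is the factor times the non-recursive term's
 * row estimate, passed through the clamp.
 */
double
sketch_worktable_rows(double factor, double cte_rows)
{
    /* The GUC's minimum is 0.001, so a valid factor is always positive. */
    assert(factor > 0.0);
    return sketch_clamp_row_est(factor * cte_rows);
}
```

With the default factor of 10.0 and a non-recursive term estimated at 100 rows, the sketch yields 1000 rows, matching the old hard-wired behavior. A factor of 1.0 yields 100 rows, and the minimum factor of 0.001 yields 0.1 before clamping, which the clamp raises to 1 row. The old expression 10 * cte_rows was not passed through clamp_row_est().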