Commit Graph

12 Commits

Author SHA1 Message Date
Alexander Korotkov ff9f72c68f revert: Transform OR clauses to ANY expression
This commit reverts 72bd38cc99 due to implementation and design issues.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/3604469.1712628736%40sss.pgh.pa.us
2024-04-10 02:28:09 +03:00
Alexander Korotkov 72bd38cc99 Transform OR clauses to ANY expression
Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...])
on the preliminary stage of optimization when we are still working with the
expression tree.

Here Cn is a n-th constant expression, 'expr' is non-constant expression, 'op'
is an operator which returns boolean result and has a commuter (for the case
of reverse order of constant and non-constant parts of the expression,
like 'Cn op expr').

Sometimes it can lead to not optimal plan.  This is why there is a
or_to_any_transform_limit GUC.  It specifies a threshold value of length of
arguments in an OR expression that triggers the OR-to-ANY transformation.
Generally, more groupable OR arguments mean that transformation will be more
likely to win than to lose.

Discussion: https://postgr.es/m/567ED6CA.2040504%40sigaev.ru
Author: Alena Rybakina <lena.ribackina@yandex.ru>
Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Ranier Vilela <ranier.vf@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
2024-04-08 01:27:52 +03:00
Peter Eisentraut 367c989cd8 Remove custom _jumbleRangeTblEntry()
This is part of an effort to reduce the number of special cases in the
automatically generated node support functions.

This patch removes _jumbleRangeTblEntry() and instead adds per-field
query_jumble_ignore annotations to match the behavior of the previous
custom code.  The pg_stat_statements test suite has some coverage of
this.  It gets rid of the switch on rtekind; this should be
technically correct, since we do the equal and copy functions like
this also.

The list of fields to jumble has been checked and is considered
correct as of 8b29a119fd.

Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://www.postgresql.org/message-id/flat/4b27fc50-8cd6-46f5-ab20-88dbaadca645@eisentraut.org
2024-03-22 07:23:47 +01:00
Peter Eisentraut 6d470211e5 Fix description and grouping of RangeTblEntry.inh
The inh field of RangeTblEntry was doubly confusingly documented.
Some parts of the code insisted that it was only valid for
RTE_RELATION entries, other parts said the field was valid for all
entries.  Neither was quite correct.  More correctly, the field is
valid for RTE_RELATION entries but is also used in the planner for
RTE_SUBQUERY entries.  So it makes more sense to group it with other
fields that are primarily for RTE_RELATION but borrowed by
RTE_SUBQUERY.  (The exact position was chosen so that it is next to
relkind for better struct packing, and next to relid, since relid and
inh are sort of the input fields and the others are filled in later.)
Also add documentation for the planner's use at the struct definition.

Discussion: https://www.postgresql.org/message-id/6c1fbccc-85c8-40d3-b08b-4f47f2093711@eisentraut.org
2024-03-07 12:13:09 +01:00
Peter Eisentraut 8b29a119fd Add missing RangeTblEntry field to jumble
RangeTblEntry.funcordinality should be jumbled, because the WITH
ORDINALITY clause changes the query result.

This was apparently an oversight in the past.

Discussion: https://www.postgresql.org/message-id/flat/d7f421f8-fd6d-4759-adc3-247090a5d44b%40eisentraut.org
2024-02-29 14:09:09 +01:00
Michael Paquier f854dae888 Improve comment about query_id_enabled in queryjumblefuncs.c
The comment was inexact because query_id_enabled will not be switched to
"true" even if compute_query_id is "on", unless a module requests for
it.

While on it, this adds a comment to mention that IsQueryIdEnabled()
should be used to check if query ID computation is enabled or not.

Author: Yugo Nagata
Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/20240209153823.e29a68cadb14225f1362a2cf@sraoss.co.jp
2024-02-14 07:20:15 +09:00
Bruce Momjian 29275b1d17 Update copyright for 2024
Reported-by: Michael Paquier

Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz

Backpatch-through: 12
2024-01-03 20:49:05 -05:00
Michael Paquier 2ecbb0a493 Remove dependency to query text in JumbleQuery()
Since 3db72eb, the query ID of utilities is generated using the Query
structure, making the use of the query string in JumbleQuery()
unnecessary.  This commit removes the argument "querytext" from
JumbleQuery().

Reported-by: Joe Conway
Reviewed-by: Nathan Bossart
Discussion: https://postgr.es/m/ZJlQAWE4COFqHuAV@paquier.xyz
2023-06-28 08:59:36 +09:00
Michael Paquier 9ba37b2cb6 Include values of A_Const nodes in query jumbling
Like the implementation for node copy, write and read, this node
requires a custom implementation so as the query jumbling is able to
consider the correct value assigned to it, depending on its type (int,
float, bool, string, bitstring).

Based on a dump of pg_stat_statements from the regression database, this
would confuse the query jumbling of the following queries:
- SET.
- COPY TO with SELECT queries.
- START TRANSACTION with different isolation levels.
- ALTER TABLE with default expressions.
- CREATE TABLE with partition bounds.

Note that there may be a long-term argument in tracking the location of
such nodes so as query strings holding such nodes could be normalized,
but this is left as a separate discussion.

Oversight in 3db72eb.

Discussion: https://postgr.es/m/Y9+HuYslMAP6yyPb@paquier.xyz
2023-02-07 09:03:54 +09:00
Michael Paquier 3db72ebcbe Generate code for query jumbling through gen_node_support.pl
This commit changes the query jumbling code in queryjumblefuncs.c to be
generated automatically based on the information of the nodes in the
headers of src/include/nodes/ by using gen_node_support.pl.  This
approach offers many advantages:
- Support for query jumbling for all the utility statements, based on the
state of their parsed Nodes and not only their query string.  This will
greatly ease the switch to normalize the information of some DDLs, like
SET or CALL for example (this is left unchanged and should be part of a
separate discussion).  With this feature, the number of entries stored
for utilities in pg_stat_statements is reduced (for example now
"CHECKPOINT" and "checkpoint" mean the same thing with the same query
ID).
- Documentation of query jumbling directly in the structure definition
of the nodes.  Since this code has been introduced in pg_stat_statements
and then moved to code, the reasons behind the choices of what should be
included in the jumble are rather sparse.  Note that some explanation is
added for the most relevant parts, as a start.
- Overall code reduction and more consistency with the other parts
generating read, write and copy depending on the nodes.

The query jumbling is controlled by a couple of new node attributes,
documented in nodes/nodes.h:
- custom_query_jumble, to mark a Node as having a custom
implementation.
- no_query_jumble, to ignore entirely a Node.
- query_jumble_ignore, to ignore a field in a Node.
- query_jumble_location, to mark a location in a Node, for
normalization.  This can apply only to int fields, with "location" in
their name (only Const as of this commit).

There should be no compatibility impact on pg_stat_statements, as the
new code applies the jumbling to the same fields for each node (its
regression tests have no modification, for one).

Some benchmark of the query jumbling between HEAD and this commit for
SELECT and DMLs has proved that this new code does not cause a
performance regression, with computation times close for both methods.
For utility queries, the new method is slower than the previous method
of calculating a hash of the query string, though we are talking about
extra ns-level changes based on what I measured, which is unnoticeable
even for OLTP workloads as a query ID is calculated once per query
post-parse analysis.

Author: Michael Paquier
Reviewed-by: Peter Eisentraut
Discussion: https://postgr.es/m/Y5BHOUhX3zTH/ig6@paquier.xyz
2023-01-31 15:24:05 +09:00
Tom Lane 2489d76c49 Make Vars be outer-join-aware.
Traditionally we used the same Var struct to represent the value
of a table column everywhere in parse and plan trees.  This choice
predates our support for SQL outer joins, and it's really a pretty
bad idea with outer joins, because the Var's value can depend on
where it is in the tree: it might go to NULL above an outer join.
So expression nodes that are equal() per equalfuncs.c might not
represent the same value, which is a huge correctness hazard for
the planner.

To improve this, decorate Var nodes with a bitmapset showing
which outer joins (identified by RTE indexes) may have nulled
them at the point in the parse tree where the Var appears.
This allows us to trust that equal() Vars represent the same value.
A certain amount of klugery is still needed to cope with cases
where we re-order two outer joins, but it's possible to make it
work without sacrificing that core principle.  PlaceHolderVars
receive similar decoration for the same reason.

In the planner, we include these outer join bitmapsets into the relids
that an expression is considered to depend on, and in consequence also
add outer-join relids to the relids of join RelOptInfos.  This allows
us to correctly perceive whether an expression can be calculated above
or below a particular outer join.

This change affects FDWs that want to plan foreign joins.  They *must*
follow suit when labeling foreign joins in order to match with the
core planner, but for many purposes (if postgres_fdw is any guide)
they'd prefer to consider only base relations within the join.
To support both requirements, redefine ForeignScan.fs_relids as
base+OJ relids, and add a new field fs_base_relids that's set up by
the core planner.

Large though it is, this commit just does the minimum necessary to
install the new mechanisms and get check-world passing again.
Follow-up patches will perform some cleanup.  (The README additions
and comments mention some stuff that will appear in the follow-up.)

Patch by me; thanks to Richard Guo for review.

Discussion: https://postgr.es/m/830269.1656693747@sss.pgh.pa.us
2023-01-30 13:16:20 -05:00
Michael Paquier 8eba3e3f02 Move queryjumble.c code to src/backend/nodes/
This will ease a follow-up move that will generate automatically this
code.  The C file is renamed, for consistency with the node-related
files whose code are generated by gen_node_support.pl:
- queryjumble.c -> queryjumblefuncs.c
- utils/queryjumble.h -> nodes/queryjumble.h

Per a suggestion from Peter Eisentraut.

Reviewed-by: Peter Eisentraut
Discussion: https://postgr.es/m/Y5BHOUhX3zTH/ig6@paquier.xyz
2023-01-21 11:48:37 +09:00