Commit Graph

17 Commits

Author SHA1 Message Date
Tom Lane 1d33858406 Fix handling of targetlist SRFs when scan/join relation is known empty.
When we introduced separate ProjectSetPath nodes for application of
set-returning functions in v10, we inadvertently broke some cases where
we're supposed to recognize that the result of a subquery is known to be
empty (contain zero rows).  That's because IS_DUMMY_REL was just looking
for a childless AppendPath without allowing for a ProjectSetPath being
possibly stuck on top.  In itself, this didn't do anything much worse
than produce slightly worse plans for some corner cases.

Then in v11, commit 11cf92f6e rearranged things to allow the scan/join
targetlist to be applied directly to partial paths before they get
gathered.  But it inserted a short-circuit path for dummy relations
that was a little too short: it failed to insert a ProjectSetPath node
at all for a targetlist containing set-returning functions, resulting in
bogus "set-valued function called in context that cannot accept a set"
errors, as reported in bug #15669 from Madelaine Thibaut.

The best way to fix this mess seems to be to reimplement IS_DUMMY_REL
so that it drills down through any ProjectSetPath nodes that might be
there (and it seems like we'd better allow for ProjectionPath as well).

While we're at it, make it look at rel->pathlist not cheapest_total_path,
so that it gives the right answer independently of whether set_cheapest
has been done lately.  That dependency looks pretty shaky in the context
of code like apply_scanjoin_target_to_paths, and even if it's not broken
today it'd certainly bite us at some point.  (Nastily, unsafe use of the
old coding would almost always work; the hazard comes down to possibly
looking through a dangling pointer, and only once in a blue moon would
you find something there that resulted in the wrong answer.)

It now looks like it was a mistake for IS_DUMMY_REL to be a macro: if
there are any extensions using it, they'll continue to use the old
inadequate logic until they're recompiled, after which they'll fail
to load into server versions predating this fix.  Hopefully there are
few such extensions.

Having fixed IS_DUMMY_REL, the special path for dummy rels in
apply_scanjoin_target_to_paths is unnecessary as well as being wrong,
so we can just drop it.

Also change a few places that were testing for partitioned-ness of a
planner relation but not using IS_PARTITIONED_REL for the purpose; that
seems unsafe as well as inconsistent, plus it required an ugly hack in
apply_scanjoin_target_to_paths.

In passing, save a few cycles in apply_scanjoin_target_to_paths by
skipping processing of pre-existing paths for partitioned rels,
and do some cosmetic cleanup and comment adjustment in that function.

I renamed IS_DUMMY_PATH to IS_DUMMY_APPEND with the intention of breaking
any code that might be using it, since in almost every case that would
be wrong; IS_DUMMY_REL is what to be using instead.

In HEAD, also make set_dummy_rel_pathlist static (since it's no longer
used from outside allpaths.c), and delete is_dummy_plan, since it's no
longer used anywhere.

Back-patch as appropriate into v11 and v10.

Tom Lane and Julien Rouhaud

Discussion: https://postgr.es/m/15669-02fb3296cca26203@postgresql.org
2019-03-07 14:22:13 -05:00
Tom Lane 57cd2b6e6d Fix create_scan_plan's handling of sortgrouprefs for physical tlists.
We should only run apply_pathtarget_labeling_to_tlist if CP_LABEL_TLIST
was specified, because only in that case has use_physical_tlist checked
that the labeling will succeed; otherwise we may get an "ORDER/GROUP BY
expression not found in targetlist" error.  (This subsumes the previous
test about gating_clauses, because we reset "flags" to zero earlier
if there are gating clauses to apply.)

The only known case in which a failure can occur is with a ProjectSet
path directly atop a table scan path, although it seems likely that there
are other cases or will be such in future.  This means that the failure
is currently only visible in the v10 branch: 9.6 didn't have ProjectSet,
while in v11 and HEAD, apply_scanjoin_target_to_paths for some weird
reason is using create_projection_path not apply_projection_to_path,
masking the problem because there's a ProjectionPath in between.

Nonetheless this code is clearly wrong on its own terms, so back-patch
to 9.6 where this logic was introduced.

Per report from Regina Obe.

Discussion: https://postgr.es/m/001501d40f88$75186950$5f493bf0$@pcorp.us
2018-07-11 15:25:28 -04:00
Tom Lane 9c7dc89282 Re-allow SRFs and window functions within sub-selects within aggregates.
check_agg_arguments_walker threw an error upon seeing a SRF or window
function, but that is too aggressive: if the function is within a
sub-select then it's perfectly fine.  I broke the SRF case in commit
0436f6bde by copying the logic for window functions ... but that was
broken too, and had been since commit eaccfded9.

Repair both cases in HEAD, and the window function case back to 9.3.
9.2 gets this right.
2017-06-27 17:51:11 -04:00
Tom Lane 8e72239e9d Fix no-longer-valid shortcuts in expression_returns_set().
expression_returns_set() used to short-circuit its recursion upon
seeing certain node types, such as DistinctExpr, that it knew the
executor did not support set-valued arguments for.  That was never
inherent, though, just a reflection of laziness in execQual.c.
With the new implementation of SRFs there is no reason to think
that any scalar-valued expression node could not have a set-valued
subexpression, except for AggRefs and WindowFuncs where we know there
is a parser check rejecting it.  And indeed, the shortcut causes
unexpected failures for cases such as a SRF underneath DistinctExpr,
because the planner stops looking for SRFs too soon.

Discussion: https://postgr.es/m/5259.1497044025@sss.pgh.pa.us
2017-06-14 11:10:05 -04:00
Tom Lane 0436f6bde8 Disallow set-returning functions inside CASE or COALESCE.
When we reimplemented SRFs in commit 69f4b9c85, our initial choice was
to allow the behavior to vary from historical practice in cases where a
SRF call appeared within a conditional-execution construct (currently,
only CASE or COALESCE).  But that was controversial to begin with, and
subsequent discussion has resulted in a consensus that it's better to
throw an error instead of executing the query differently from before,
so long as we can provide a reasonably clear error message and a way to
rewrite the query.

Hence, add a parser mechanism to allow detection of such cases during
parse analysis.  The mechanism just requires storing, in the ParseState,
a pointer to the set-returning FuncExpr or OpExpr most recently emitted
by parse analysis.  Then the parsing functions for CASE and COALESCE can
detect the presence of a SRF in their arguments by noting whether this
pointer changes while analyzing their arguments.  Furthermore, if it does,
it provides a suitable error cursor location for the complaint.  (This
means that if there's more than one SRF in the arguments, the error will
point at the last one to be analyzed not the first.  While connoisseurs of
parsing behavior might find that odd, it's unlikely the average user would
ever notice.)

While at it, we can also provide more specific error messages than before
about some pre-existing restrictions, such as no-SRFs-within-aggregates.
Also, reject at parse time cases where a NULLIF or IS DISTINCT FROM
construct would need to return a set.  We've never supported that, but the
restriction is depended on in more subtle ways now, so it seems wise to
detect it at the start.

Also, provide some documentation about how to rewrite a SRF-within-CASE
query using a custom wrapper SRF.

It turns out that the information_schema.user_mapping_options view
contained an instance of exactly the behavior we're now forbidding; but
rewriting it makes it more clear and safer too.

initdb forced because of user_mapping_options change.

Patch by me, with error message suggestions from Alvaro Herrera and
Andres Freund, pursuant to a complaint from Regina Obe.

Discussion: https://postgr.es/m/000001d2d5de$d8d66170$8a832450$@pcorp.us
2017-06-13 23:46:39 -04:00
Tom Lane e240a65c7d Provide an error cursor for "can't call an SRF here" errors.
Since it appears that v10 is going to move the goalposts by some amount
in terms of where you can and can't invoke set-returning functions,
arrange for the executor's "set-valued function called in context that
cannot accept a set" errors to include a syntax position if possible,
pointing to the specific SRF that can't be called where it's located.

The main bit of infrastructure needed for this is to make the query source
text accessible in the executor; but it turns out that commit 4c728f382
already did that.  We just need a new function executor_errposition()
modeled on parser_errposition(), and we're ready to rock.

While experimenting with this, I noted that the error position wasn't
properly reported if it occurred in a plpgsql FOR-over-query loop,
which turned out to be because SPI_cursor_open_internal wasn't providing
an error context callback during PortalStart.  Fix that.

There's a whole lot more that could be done with this infrastructure
now that it's there, but this is not the right time in the development
cycle for that sort of work.  Hence, resist the temptation to plaster
executor_errposition() calls everywhere ... for the moment.

Discussion: https://postgr.es/m/5263.1492471571@sss.pgh.pa.us
2017-04-18 13:21:08 -04:00
Andrew Gierth b5635948ab Support hashed aggregation with grouping sets.
This extends the Aggregate node with two new features: HashAggregate
can now run multiple hashtables concurrently, and a new strategy
MixedAggregate populates hashtables while doing sorted grouping.

The planner will now attempt to save as many sorts as possible when
planning grouping sets queries, while not exceeding work_mem for the
estimated combined sizes of all hashtables used.  No SQL-level changes
are required.  There should be no user-visible impact other than the
new EXPLAIN output and possible changes to result ordering when ORDER
BY was not used (which affected a few regression tests).  The
enable_hashagg option is respected.

Author: Andrew Gierth
Reviewers: Mark Dilger, Andres Freund
Discussion: https://postgr.es/m/87vatszyhj.fsf@news-spur.riddles.org.uk
2017-03-27 04:20:54 +01:00
Tom Lane c82d4e658e Fix mishandling of tSRFs at different nesting levels.
Given a targetlist like "srf(x), f(srf(x))", split_pathtarget_at_srfs()
decided that it needed two levels of ProjectSet nodes, failing to notice
that the two SRF calls are textually equal().  Because of that, setrefs.c
would convert the upper ProjectSet's tlist to "Var1, f(Var1)" (where Var1
represents a reference to the srf(x) output of the lower ProjectSet).
This triggered an assertion in nodeProjectSet.c complaining that it found
no SRFs to evaluate, as reported by Erik Rijkers.

What we want in such a case is to evaluate srf(x) only once and use a plain
Result node to compute "Var1, f(Var1)"; that gives results similar to what
previous versions produced, whereas allowing srf(x) to be evaluated again
in an upper ProjectSet would square the number of rows emitted.

Furthermore, even if the SRF calls aren't textually identical, we want them
to be evaluated in lockstep, because that's what happened in the old
implementation.  But split_pathtarget_at_srfs() got this completely wrong,
using two levels of ProjectSet for a case like "srf(x), f(srf(y))".

Hence, rewrite split_pathtarget_at_srfs() from the ground up so that it
groups SRFs according to the depth of nesting of SRFs in their arguments.
This is pretty much how we envisioned that working originally, but I blew
it when it came to implementation.

In passing, optimize the case of target == input_target, which I noticed
is not only possible but quite common.

Discussion: https://postgr.es/m/dcbd2853c05d22088766553d60dc78c6@xs4all.nl
2017-02-02 16:38:18 -05:00
Andres Freund 182200531a Fix platform dependant regression output triggered by 69f4b9c85f.
Due to the changed costing in that commit hash-aggregates started to
be used, which results in big-endian vs. little-endian output
differences.  Disable hash-aggs for those tests.

Author: Andres Freund, with input from Tom Lane
Discussion: https://postgr.es/m/22891.1484791792@sss.pgh.pa.us
2017-01-19 14:42:02 -08:00
Andres Freund 69f4b9c85f Move targetlist SRF handling from expression evaluation to new executor node.
Evaluation of set returning functions (SRFs_ in the targetlist (like SELECT
generate_series(1,5)) so far was done in the expression evaluation (i.e.
ExecEvalExpr()) and projection (i.e. ExecProject/ExecTargetList) code.

This meant that most executor nodes performing projection, and most
expression evaluation functions, had to deal with the possibility that an
evaluated expression could return a set of return values.

That's bad because it leads to repeated code in a lot of places. It also,
and that's my (Andres's) motivation, made it a lot harder to implement a
more efficient way of doing expression evaluation.

To fix this, introduce a new executor node (ProjectSet) that can evaluate
targetlists containing one or more SRFs. To avoid the complexity of the old
way of handling nested expressions returning sets (e.g. having to pass up
ExprDoneCond, and dealing with arguments to functions returning sets etc.),
those SRFs can only be at the top level of the node's targetlist.  The
planner makes sure (via split_pathtarget_at_srfs()) that SRF evaluation is
only necessary in ProjectSet nodes and that SRFs are only present at the
top level of the node's targetlist. If there are nested SRFs the planner
creates multiple stacked ProjectSet nodes.  The ProjectSet nodes always get
input from an underlying node.

We also discussed and prototyped evaluating targetlist SRFs using ROWS
FROM(), but that turned out to be more complicated than we'd hoped.

While moving SRF evaluation to ProjectSet would allow to retain the old
"least common multiple" behavior when multiple SRFs are present in one
targetlist (i.e.  continue returning rows until all SRFs are at the end of
their input at the same time), we decided to instead only return rows till
all SRFs are exhausted, returning NULL for already exhausted ones.  We
deemed the previous behavior to be too confusing, unexpected and actually
not particularly useful.

As a side effect, the previously prohibited case of multiple set returning
arguments to a function, is now allowed. Not because it's particularly
desirable, but because it ends up working and there seems to be no argument
for adding code to prohibit it.

Currently the behavior for COALESCE and CASE containing SRFs has changed,
returning multiple rows from the expression, even when the SRF containing
"arm" of the expression is not evaluated. That's because the SRFs are
evaluated in a separate ProjectSet node.  As that's quite confusing, we're
likely to instead prohibit SRFs in those places.  But that's still being
discussed, and the code would reside in places not touched here, so that's
a task for later.

There's a lot of, now superfluous, code dealing with set return expressions
around. But as the changes to get rid of those are verbose largely boring,
it seems better for readability to keep the cleanup as a separate commit.

Author: Tom Lane and Andres Freund
Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
2017-01-18 13:40:27 -08:00
Tom Lane d43a619c60 Fix check_srf_call_placement() to handle VALUES cases correctly.
INSERT ... VALUES with a single VALUES row is implemented quite differently
from the general VALUES case.  A user-visible implication of that is that
we accept SRFs in the single-row case, but not in the multi-row case.
That's a historical artifact no doubt, but in view of the lack of field
complaints, I'm not excited about fixing it right now.

However, check_srf_call_placement() needs to know about this, first because
it should throw an error in the unsupported case, and second because it
should set p_hasTargetSRFs in the single-row case (because we treat that
like a SELECT tlist).  That's an oversight in commit a4c35ea1c.

To fix, split EXPR_KIND_VALUES into two values.  So far as I can see,
this is the only place where we need to distinguish the two cases at
present; but there might be more later.

Patch by me, per report from Andres Freund.

Discussion: https://postgr.es/m/20170116081548.zg63zltblwimpfgp@alap3.anarazel.de
2017-01-16 15:23:11 -05:00
Andres Freund 0137caf273 Make regression tests less dependent on hash table order.
Upcoming changes to the hash table code used, among others, for grouping
and set operations will change the output order for a few queries. To
make it less likely that actual bugs are hidden between regression test
ordering changes, and to make the tests robust against platform
dependant ordering, add ORDER BYs guaranteeing the output order.

As it's possible that some of the changes expose platform dependant
ordering, push this earlier, to let the buildfarm shake out potentially
unstable results.

Discussion: <20160727004333.r3e2k2y6fvk2ntup@alap3.anarazel.de>
2016-10-10 13:41:57 -07:00
Tom Lane 0dac5b5174 Tweak targetlist-SRF tests some more.
Seems like it would be good to have a test case documenting the
existing behavior for non-top-level SRFs.
2016-09-14 19:48:50 -04:00
Tom Lane a163c006ca Tweak targetlist-SRF tests.
Add a test case showing that we don't support SRFs in window-function
arguments.  Remove a duplicate test case for SRFs in aggregate arguments.
2016-09-14 14:30:40 -04:00
Tom Lane a4c35ea1c2 Improve parser's and planner's handling of set-returning functions.
Teach the parser to reject misplaced set-returning functions during parse
analysis using p_expr_kind, in much the same way as we do for aggregates
and window functions (cf commit eaccfded9).  While this isn't complete
(it misses nesting-based restrictions), it's much better than the previous
error reporting for such cases, and it allows elimination of assorted
ad-hoc expression_returns_set() error checks.  We could add nesting checks
later if it seems important to catch all cases at parse time.

There is one case the parser will now throw error for although previous
versions allowed it, which is SRFs in the tlist of an UPDATE.  That never
behaved sensibly (since it's ill-defined which generated row should be
used to perform the update) and it's hard to see why it should not be
treated as an error.  It's a release-note-worthy change though.

Also, add a new Query field hasTargetSRFs reporting whether there are
any SRFs in the targetlist (including GROUP BY/ORDER BY expressions).
The parser can now set that basically for free during parse analysis,
and we can use it in a number of places to avoid expression_returns_set
searches.  (There will be more such checks soon.)  In some places, this
allows decontorting the logic since it's no longer expensive to check for
SRFs in the tlist --- so I made the checks parallel to the handling of
hasAggs/hasWindowFuncs wherever it seemed appropriate.

catversion bump because adding a Query field changes stored rules.

Andres Freund and Tom Lane

Discussion: <24639.1473782855@sss.pgh.pa.us>
2016-09-13 13:54:24 -04:00
Andres Freund 9f478b4f19 Address portability issues in bfe16d1a5 test output. 2016-09-12 18:15:10 -07:00
Andres Freund bfe16d1a5d Add more tests for targetlist SRFs.
We're considering changing the implementation of targetlist SRFs
considerably, and a lot of the current behaviour isn't tested in our
regression tests. Thus it seems useful to increase coverage to avoid
accidental behaviour changes.

It's quite possible that some of the plans here will require adjustments
to avoid falling afoul of ordering differences (e.g. hashed group
bys). The buildfarm will tell us.

Reviewed-By: Tom Lane
Discussion: <20160827214829.zo2dfb5jaikii5nw@alap3.anarazel.de>
2016-09-12 17:27:47 -07:00