Commit Graph

2184 Commits

Author SHA1 Message Date
David Rowley 0a34bcd0c2 Ensure we allocate NAMEDATALEN bytes for names in Index Only Scans
As an optimization, we store "name" columns as cstrings in btree
indexes.

Here we modify it so that Index Only Scans convert these cstrings back
to names with NAMEDATALEN bytes rather than storing the cstring in the
tuple slot, as was happening previously.

Bug: #17855
Reported-by: Alexander Lakhin
Reviewed-by: Alexander Lakhin, Tom Lane
Discussion: https://postgr.es/m/17855-5f523e0f9769a566@postgresql.org
Backpatch-through: 12, all supported versions
2024-05-01 13:23:05 +12:00
Tom Lane de3c5b1872 Fix unnecessary use of moving-aggregate mode with non-moving frame.
When a plain aggregate is used as a window function, and the window
frame start is specified as UNBOUNDED PRECEDING, the frame's head
cannot move so we do not need to use moving-aggregate mode.  The check
for that was put into initialize_peragg(), failing to notice that
ExecInitWindowAgg() calls that function before it's filled in
winstate->frameOptions.  Since makeNode() would have zeroed the field,
this didn't provoke uninitialized-value complaints, nor would the
erroneous decision have resulted in more than a little inefficiency.
Still, it's wrong, so move the initialization of
winstate->frameOptions earlier to make it work properly.

While here, also fix a thinko in a comment.  Both errors crept in in
commit a9d9acbf2 which introduced the moving-aggregate mode.

Spotted by Vallimaharajan G.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/18e7f2a5167.fe36253866818.977923893562469143@zohocorp.com
2024-03-27 13:39:03 -04:00
Heikki Linnakangas 992189a3e9 Fix EXPLAIN Bitmap heap scan to count pages with no visible tuples
Previously, bitmap heap scans only counted lossy and exact pages for
explain when there was at least one visible tuple on the page.

heapam_scan_bitmap_next_block() returned true only if there was a
"valid" page with tuples to be processed. However, the lossy and exact
page counters in EXPLAIN should count the number of pages represented
in a lossy or non-lossy way in the constructed bitmap, regardless of
whether or not the pages ultimately contained visible tuples.

Backpatch to all supported versions.

Author: Melanie Plageman
Discussion: https://www.postgresql.org/message-id/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAAKRu_bxrXeZ2rCnY8LyeC2Ls88KpjWrQ%2BopUrXDRXdcfwFZGA@mail.gmail.com
2024-03-18 14:04:24 +02:00
Tom Lane 28184f039e Fix confusion about the return rowtype of SQL-language procedures.
There is a very ancient hack in check_sql_fn_retval that allows a
single SELECT targetlist entry of composite type to be taken as
supplying all the output columns of a function returning composite.
(This is grotty and fundamentally ambiguous, but it's really hard
to do nested composite-returning functions without it.)

As far as I know, that doesn't cause any problems in ordinary
functions.  It's disastrous for procedures however.  All procedures
that have any output parameters are labeled with prorettype RECORD,
and the CALL code expects it will get back a record with one column
per output parameter, regardless of whether any of those parameters
is composite.  Doing something else leads to an assertion failure
or core dump.

This is simple enough to fix: we just need to not apply that rule
when considering procedures.  However, that requires adding another
argument to check_sql_fn_retval, which at least in principle might be
getting called by external callers.  Therefore, in the back branches
convert check_sql_fn_retval into an ABI-preserving wrapper around a
new function check_sql_fn_retval_ext.

Per report from Yahor Yuzefovich.  This has been broken since we
implemented procedures, so back-patch to all supported branches.

Discussion: https://postgr.es/m/CABz5gWHSjj2df6uG0NRiDhZ_Uz=Y8t0FJP-_SVSsRsnrQT76Gg@mail.gmail.com
2024-03-12 18:16:10 -04:00
Tom Lane d769f9d97f Fix type-checking of RECORD-returning functions in FROM.
In the corner case where a function returning RECORD has been
simplified to a RECORD constant or an inlined ROW() expression,
ExecInitFunctionScan failed to cross-check the function's result
rowtype against the coldeflist provided by the calling query.
That happened because get_expr_result_type is able to extract a
tupdesc from such expressions, which led ExecInitFunctionScan to
ignore the coldeflist.  (Instead, it used the extracted tupdesc
to check the function's output, which of course always succeeds.)

I have not been able to demonstrate any really serious consequences
from this, because if some column of the result is of the wrong
type and is directly referenced by a Var of the calling query,
CheckVarSlotCompatibility will catch it.  However, we definitely do
fail to report the case where the function returns more columns than
the coldeflist expects, and in the converse case where it returns
fewer columns, we get an assert failure (but, seemingly, no worse
results in non-assert builds).

To fix, always build the expected tupdesc from the coldeflist if there
is one, and consult get_expr_result_type only when there isn't one.

Also remove the failing Assert, even though it is no longer reached
after this fix.  It doesn't seem to be adding anything useful, since
later checking will deal with cases with the wrong number of columns.

The only other place I could find that is doing something similar
is inline_set_returning_function.  There's no live bug there because
we cannot be looking at a Const or RowExpr, but for consistency
change that code to agree with ExecInitFunctionScan.

Per report from PetSerAl.  After some debate I've concluded that
this should be back-patched.  There is a small risk that somebody
has been relying on such a case not throwing an error, but I judge
this outweighed by the risk that I've missed some way in which the
failure to cross-check has worse consequences than sketched above.

Discussion: https://postgr.es/m/CAKygsHSerA1eXsJHR9wft3Gn3wfHQ5RfP8XHBzF70_qcrrRvEg@mail.gmail.com
2024-03-06 14:41:13 -05:00
Alexander Korotkov 17b6f39f19 Fix indentation in ExecParallelHashIncreaseNumBatches()
Backpatch-through: 12
2024-01-08 19:58:38 +02:00
Alexander Korotkov 60de25c6ef Fix oversized memory allocation in Parallel Hash Join
During the calculations of the maximum for the number of buckets, take into
account that later we round that to the next power of 2.

Reported-by: Karen Talarico
Bug: #16925
Discussion: https://postgr.es/m/16925-ec96d83529d0d629%40postgresql.org
Author: Thomas Munro, Andrei Lepikhov, Alexander Korotkov
Reviewed-by: Alena Rybakina
Backpatch-through: 12
2024-01-07 09:11:00 +02:00
Tom Lane b7684473d7 Fix intra-query memory leak when a SRF returns zero rows.
When looping around after finding that the set-returning function
returned zero rows for the current input tuple, ExecProjectSet
neglected to reset either of the two memory contexts it's
responsible for cleaning out.  Typically this wouldn't cause much
problem, because once the SRF does return at least one row, the
contexts would get reset on the next call.  However, if the SRF
returns no rows for many input tuples in succession, quite a lot
of memory could be transiently consumed.

To fix, make sure we reset both contexts while looping around.

Per bug #18172 from Sergei Kornilov.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/18172-9b8c5fc1d676ded3@postgresql.org
2023-10-28 14:04:43 -04:00
David Rowley 890a73ba3a Fix runtime partition pruning for HASH partitioned tables
This could only affect HASH partitioned tables with at least 2 partition
key columns.

If partition pruning was delayed until execution and the query contained
an IS NULL qual on one of the partitioned keys, and some subsequent
partitioned key was being compared to a non-Const, then this could result
in a crash due to the incorrect keyno being used to calculate the
stateidx for the expression evaluation code.

Here we fix this by properly skipping partitioned keys which have a
nullkey set.  Effectively, this must be the same as what's going on
inside perform_pruning_base_step().

Sergei Glukhov also provided a patch, but that's not what's being used
here.

Reported-by: Sergei Glukhov
Reviewed-by: tender wang, Sergei Glukhov
Discussion: https://postgr.es/m/d05b26fa-af54-27e1-f693-6c31590802fa@postgrespro.ru
Backpatch-through: 11, where runtime partition pruning was added.
2023-10-13 01:14:22 +13:00
Tomas Vondra 3ce761d5ca Fix memory leak in Incremental Sort rescans
The Incremental Sort had a couple issues, resulting in leaking memory
during rescans, possibly triggering OOM. The code had a couple of
related flaws:

1. During rescans, the sort states were reset but then also set to NULL
   (despite the comment saying otherwise). ExecIncrementalSort then
   sees NULL and initializes a new sort state, leaking the memory used
   by the old one.

2. Initializing the sort state also automatically rebuilt the info about
   presorted keys, leaking the already initialized info. presorted_keys
   was also unnecessarily reset to NULL.

Patch by James Coleman, based on patches by Laurenz Albe and Tom Lane.
Backpatch to 13, where Incremental Sort was introduced.

Author: James Coleman, Laurenz Albe, Tom Lane
Reported-by: Laurenz Albe, Zu-Ming Jiang
Backpatch-through: 13
Discussion: https://postgr.es/m/b2bd02dff61af15e3526293e2771f874cf2a3be7.camel%40cybertec.at
Discussion: https://postgr.es/m/db03c582-086d-e7cd-d4a1-3bc722f81765%40inf.ethz.ch
2023-07-02 20:05:35 +02:00
Tom Lane d0ab203bc1 Fix order of operations in ExecEvalFieldStoreDeForm().
If the given composite datum is toasted out-of-line,
DatumGetHeapTupleHeader will perform database accesses to detoast it.
That can invalidate the result of get_cached_rowtype, as documented
(perhaps not plainly enough) in that function's API spec; which leads
to strange errors or crashes when we try to use the TupleDesc to read
the tuple.  In short then, trying to update a field of a composite
column could fail intermittently if the overall column value is wide
enough to require toasting.

We can fix the bug at no cost by just changing the order of
operations, since we don't need the TupleDesc until after detoasting.
(Other callers of get_cached_rowtype appear to get this right already,
so there's only one bug.)

Note that the added regression test case reveals this bug reliably
only with debug_discard_caches/CLOBBER_CACHE_ALWAYS.

Per bug #17994 from Alexander Lakhin.  Sadly, this patch does not fix
the missing-values issue revealed in the bug discussion; we'll need
some more work to cover that.

Discussion: https://postgr.es/m/17994-5c7100b51b4790e9@postgresql.org
2023-06-29 10:19:10 -04:00
Tomas Vondra c504aa8570 Use per-tuple context in ExecGetAllUpdatedCols
Commit fc22b6623b (generated columns) replaced ExecGetUpdatedCols() with
ExecGetAllUpdatedCols() in a couple places handling UPDATE (triggers and
lock mode). However, ExecGetUpdatedCols() did exec_rt_fetch() while
ExecGetAllUpdatedCols() also allocates memory through bms_union()
without paying attention to the memory context and happened to use the
long-lived ExecutorState, leaking the memory until the end of the query.

The amount of leaked memory is proportional to the number of (updated)
attributes, types of UPDATE triggers, and the number of processed rows
(which for UPDATE ... FROM ... may be much higher than updated rows).

Fixed by switching to the per-tuple context in GetAllUpdatedColumns().
This is fine for all in-core callers, but external callers may need to
copy the result. But we're not aware of any such callers.

Note the issue was introduced by fc22b6623b, but the macros were later
renamed by f50e888990.

Backpatch to 12, where the issue was introduced.

Reported-by: Tomas Vondra
Reviewed-by: Andres Freund, Tom Lane, Jakub Wartak
Backpatch-through: 12
Discussion: https://postgr.es/m/222a3442-7f7d-246c-ed9b-a76209d19239@enterprisedb.com
2023-06-07 18:53:16 +02:00
Tom Lane 7428aecddf Fix assignment to array of domain over composite, redux.
Commit 3e310d837 taught isAssignmentIndirectionExpr() to look through
CoerceToDomain nodes.  That's not sufficient, because since commit
04fe805a1 it's been possible for the planner to simplify
CoerceToDomain to RelabelType when the domain has no constraints
to enforce.  So we need to look through RelabelType too.

Per bug #17897 from Alexander Lakhin.  Although 3e310d837 was
back-patched to v11, it seems sufficient to apply this change
to v12 and later, since 04fe805a1 came in in v12.

Dmitry Dolgov

Discussion: https://postgr.es/m/17897-4216c546c3874044@postgresql.org
2023-04-15 12:01:39 -04:00
Tom Lane 1bbbe14607 Fix oversights in array manipulation.
The nested-arrays code path in ExecEvalArrayExpr() used palloc to
allocate the result array, whereas every other array-creating function
has used palloc0 since 18c0b4ecc.  This mostly works, but unused bits
past the end of the nulls bitmap may end up undefined.  That causes
valgrind complaints with -DWRITE_READ_PARSE_PLAN_TREES, and could
cause planner misbehavior as cited in 18c0b4ecc.  There seems no very
good reason why we should strive to avoid palloc0 in just this one case,
so fix it the easy way with s/palloc/palloc0/.

While looking at that I noted that we also failed to check for overflow
of "nbytes" and "nitems" while summing the sizes of the sub-arrays,
potentially allowing a crash due to undersized output allocation.
For "nbytes", follow the policy used by other array-munging code of
checking for overflow after each addition.  (As elsewhere, the last
addition of the array's overhead space doesn't need an extra check,
since palloc itself will catch a value between 1Gb and 2Gb.)
For "nitems", there's no very good reason to sum the inputs at all,
since we can perfectly well use ArrayGetNItems' result instead of
ignoring it.

Per discussion of this bug, also remove redundant zeroing of the
nulls bitmap in array_set_element and array_set_slice.

Patch by Alexander Lakhin and myself, per bug #17858 from Alexander
Lakhin; thanks also to Richard Guo.  These bugs are a dozen years old,
so back-patch to all supported branches.

Discussion: https://postgr.es/m/17858-8fd287fd3663d051@postgresql.org
2023-03-26 13:41:06 -04:00
Amit Kapila be52fff91d Ignore generated columns during apply of update/delete.
We fail to apply updates and deletes when the REPLICA IDENTITY FULL is
used for the table having generated columns. We didn't use to ignore
generated columns while doing tuple comparison among the tuples from
the publisher and subscriber during apply of updates and deletes.

Author: Onder Kalaci
Reviewed-by: Shi yu, Amit Kapila
Backpatch-through: 12
Discussion: https://postgr.es/m/CACawEhVQC9WoofunvXg12aXtbqKnEgWxoRx3+v8q32AWYsdpGg@mail.gmail.com
2023-03-23 11:21:13 +05:30
Amit Kapila 751d6676da Ignore dropped columns during apply of update/delete.
We fail to apply updates and deletes when the REPLICA IDENTITY FULL is
used for the table having dropped columns. We didn't use to ignore dropped
columns while doing tuple comparison among the tuples from the publisher
and subscriber during apply of updates and deletes.

Author: Onder Kalaci, Shi yu
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CACawEhVQC9WoofunvXg12aXtbqKnEgWxoRx3+v8q32AWYsdpGg@mail.gmail.com
2023-03-21 09:07:37 +05:30
Thomas Munro 6e94d62e3f Fix race in parallel hash join batch cleanup, take II.
With unlucky timing and parallel_leader_participation=off (not the
default), PHJ could attempt to access per-batch shared state just as it
was being freed.  There was code intended to prevent that by checking
for a cleared pointer, but it was racy.  Fix, by introducing an extra
barrier phase.  The new phase PHJ_BUILD_RUNNING means that it's safe to
access the per-batch state to find a batch to help with, and
PHJ_BUILD_DONE means that it is too late.  The last to detach will free
the array of per-batch state as before, but now it will also atomically
advance the phase, so that late attachers can avoid the hazard.  This
mirrors the way per-batch hash tables are freed (see phases
PHJ_BATCH_PROBING and PHJ_BATCH_DONE).

An earlier attempt to fix this (commit 3b8981b6, later reverted) missed
one special case.  When the inner side is empty (the "empty inner
optimization), the build barrier would only make it to
PHJ_BUILD_HASHING_INNER phase before workers attempted to detach from
the hashtable.  In that case, fast-forward the build barrier to
PHJ_BUILD_RUNNING before proceeding, so that our later assertions hold
and we can still negotiate who is cleaning up.

Revealed by build farm failures, where BarrierAttach() failed a sanity
check assertion, because the memory had been clobbered by dsa_free().
In non-assert builds, the result could be a segmentation fault.

Back-patch to all supported releases.

Author: Thomas Munro <thomas.munro@gmail.com>
Author: Melanie Plageman <melanieplageman@gmail.com>
Reported-by: Michael Paquier <michael@paquier.xyz>
Reported-by: David Geier <geidav.pg@gmail.com>
Tested-by: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/20200929061142.GA29096%40paquier.xyz
2023-03-21 14:38:59 +13:00
Tom Lane 4a94cbd02b Fix some more cases of missed GENERATED-column updates.
If UPDATE is forced to retry after an EvalPlanQual check, it neglected
to repeat GENERATED-column computations, even though those might well
have changed since we're dealing with a different tuple than before.
Fixing this is mostly a matter of looping back a bit further when
we retry.  In v15 and HEAD that's most easily done by altering the API
of ExecUpdateAct so that it includes computing GENERATED expressions.

Also, if an UPDATE in a partitioned table turns into a cross-partition
INSERT operation, we failed to recompute GENERATED columns.  That's a
bug since 8bf6ec3ba allowed partitions to have different generation
expressions; although it seems to have no ill effects before that.
Fixing this is messier because we can now have situations where the same
query needs both the UPDATE-aligned set of GENERATED columns and the
INSERT-aligned set, and it's unclear which set will be generated first
(else we could hack things by forcing the INSERT-aligned set to be
generated, which is indeed how fe9e658f4 made it work for MERGE).
The best fix seems to be to build and store separate sets of expressions
for the INSERT and UPDATE cases.  That would create ABI issues in the
back branches, but so far it seems we can leave this alone in the back
branches.

Per bug #17823 from Hisahiro Kauchi.  The first part of this affects all
branches back to v12 where GENERATED columns were added.

Discussion: https://postgr.es/m/17823-b64909cf7d63de84@postgresql.org
2023-03-06 18:31:16 -05:00
Tom Lane 1e199c2599 Fix MULTIEXPR_SUBLINK with partitioned target tables, yet again.
We already tried to fix this in commits 3f7323cbb et al (and follow-on
fixes), but now it emerges that there are still unfixed cases;
moreover, these cases affect all branches not only pre-v14.  I thought
we had eliminated all cases of making multiple clones of an UPDATE's
target list when we nuked inheritance_planner.  But it turns out we
still do that in some partitioned-UPDATE cases, notably including
INSERT ... ON CONFLICT UPDATE, because ExecInitPartitionInfo thinks
it's okay to clone and modify the parent's targetlist.

This fix is based on a suggestion from Andres Freund: let's stop
abusing the ParamExecData.execPlan mechanism, which was only ever
meant to handle initplans, and instead solve the execution timing
problem by having the expression compiler move MULTIEXPR_SUBLINK steps
to the front of their expression step lists.  This is feasible because
(a) all branches still in support compile the entire targetlist of
an UPDATE into a single ExprState, and (b) we know that all
MULTIEXPR_SUBLINKs do need to be evaluated --- none could be buried
inside a CASE, for example.  There is a minor semantics change
concerning the order of execution of the MULTIEXPR's subquery versus
other parts of the parent targetlist, but that seems like something
we can get away with.  By doing that, we no longer need to worry
about whether different clones of a MULTIEXPR_SUBLINK share output
Params; their usage of that data structure won't overlap.

Per bug #17800 from Alexander Lakhin.  Back-patch to all supported
branches.  In v13 and earlier, we can revert 3f7323cbb and follow-on
fixes; however, I chose to keep the SubPlan.subLinkId field added
in ccbb54c72.  We don't need that anymore in the core code, but it's
cheap enough to fill, and removing a plan node field in a minor
release seems like it'd be asking for trouble.

Andres Freund and Tom Lane

Discussion: https://postgr.es/m/17800-ff90866b3906c964@postgresql.org
2023-02-25 14:44:14 -05:00
Dean Rasheed 906356cf61 Add missing support for the latest SPI status codes.
SPI_result_code_string() was missing support for SPI_OK_TD_REGISTER,
and in v15 and later, it was missing support for SPI_OK_MERGE, as was
pltcl_process_SPI_result().

The last of those would trigger an error if a MERGE was executed from
PL/Tcl. The others seem fairly innocuous, but worth fixing.

Back-patch to all supported branches. Before v15, this is just adding
SPI_OK_TD_REGISTER to SPI_result_code_string(), which is unlikely to
be seen by anyone, but seems worth doing for completeness.

Reviewed by Tom Lane.

Discussion:
  https://postgr.es/m/CAEZATCUg8V%2BK%2BGcafOPqymxk84Y_prXgfe64PDoopjLFH6Z0Aw%40mail.gmail.com
  https://postgr.es/m/CAEZATCUMe%2B_KedPMM9AxKqm%3DSZogSxjUcrMe%2BsakusZh3BFcQw%40mail.gmail.com
2023-02-22 13:27:29 +00:00
David Rowley 301eb3ee4e Disable WindowAgg inverse transitions when subplans are present
When an aggregate function is used as a WindowFunc and a tuple transitions
out of the window frame, we ordinarily try to make use of the aggregate
function's inverse transition function to "unaggregate" the exiting tuple.

This optimization is disabled for various cases, including when the
aggregate contains a volatile function.  In such a case we'd be unable to
ensure that the transition value was calculated to the same value during
transitions and inverse transitions.  Unfortunately, we did this check by
calling contain_volatile_functions() which does not recursively search
SubPlans for volatile functions.  If the aggregate function's arguments or
its FILTER clause contained a subplan with volatile functions then we'd
fail to notice this.

Here we fix this by just disabling the optimization when the WindowFunc
contains any subplans.  Volatile functions are not the only reason that a
subplan may have nonrepeatable results.

Bug: #17777
Reported-by: Anban Company
Discussion: https://postgr.es/m/17777-860b739b6efde977%40postgresql.org
Reviewed-by: Tom Lane
Backpatch-through: 11
2023-02-13 17:09:26 +13:00
Tom Lane 787db4be94 Make new GENERATED-expressions code more bulletproof.
In commit 8bf6ec3ba I assumed that no code path could reach
ExecGetExtraUpdatedCols without having gone through
ExecInitStoredGenerated.  That turns out not to be the case in
logical replication: if there's an ON UPDATE trigger on the target
table, trigger.c will call this code before anybody has set up its
generated columns.  Having seen that, I don't have a lot of faith in
there not being other such paths.  ExecGetExtraUpdatedCols can call
ExecInitStoredGenerated for itself, as long as we are willing to
assume that it is only called in CMD_UPDATE operations, which on
the whole seems like a safer leap of faith.

Per report from Vitaly Davydov.

Discussion: https://postgr.es/m/d259d69652b8c2ff50e14cda3c236c7f@postgrespro.ru
2023-01-15 14:06:46 -05:00
Tom Lane ad38e2f891 Fix calculation of which GENERATED columns need to be updated.
We were identifying the updatable generated columns of inheritance
children by transposing the calculation made for their parent.
However, there's nothing that says a traditional-inheritance child
can't have generated columns that aren't there in its parent, or that
have different dependencies than are in the parent's expression.
(At present it seems that we don't enforce that for partitioning
either, which is likely wrong to some degree or other; but the case
clearly needs to be handled with traditional inheritance.)

Hence, drop the very-klugy-anyway "extraUpdatedCols" RTE field
in favor of identifying which generated columns depend on updated
columns during executor startup.  In HEAD we can remove
extraUpdatedCols altogether; in back branches, it's still there but
always empty.  Another difference between the HEAD and back-branch
versions of this patch is that in HEAD we can add the new bitmap field
to ResultRelInfo, but that would cause an ABI break in back branches.
Like 4b3e37993, add a List field at the end of struct EState instead.

Back-patch to v13.  The bogus calculation is also being made in v12,
but it doesn't have the same visible effect because we don't use it
to decide which generated columns to recalculate; as a consequence of
which the patch doesn't apply easily.  I think that there might still
be a demonstrable bug associated with trigger firing conditions, but
that's such a weird corner-case usage that I'm content to leave it
unfixed in v12.

Amit Langote and Tom Lane

Discussion: https://postgr.es/m/CA+HiwqFshLKNvQUd1DgwJ-7tsTp=dwv7KZqXC4j2wYBV1aCDUA@mail.gmail.com
Discussion: https://postgr.es/m/2793383.1672944799@sss.pgh.pa.us
2023-01-05 14:12:17 -05:00
Tom Lane 19adcaf00f Avoid reference to nonexistent array element in ExecInitAgg().
When considering an empty grouping set, we fetched
phasedata->eqfunctions[-1].  Because the eqfunctions array is
palloc'd, that would always be an aset pointer in released versions,
and thus the code accidentally failed to malfunction (since it would
do nothing unless it found a null pointer).  Nonetheless this seems
like trouble waiting to happen, so add a check for length == 0.

It's depressing that our valgrind testing did not catch this.
Maybe we should reconsider the choice to not mark that word NOACCESS?

Richard Guo

Discussion: https://postgr.es/m/CAMbWs4-vZuuPOZsKOYnSAaPYGKhmacxhki+vpOKk0O7rymccXQ@mail.gmail.com
2023-01-02 16:17:00 -05:00
Etsuro Fujita 197cbca6bc Fix copy-and-pasteo in comment. 2022-11-02 18:15:05 +09:00
Tom Lane c58513f095 Future-proof the recursion inside ExecShutdownNode().
The API contract for planstate_tree_walker() callbacks is that they
take a PlanState pointer and a context pointer.  Somebody figured
they could save a couple lines of code by ignoring that, and passing
ExecShutdownNode itself as the walker even though it has but one
argument.  Somewhat remarkably, we've gotten away with that so far.
However, it seems clear that the upcoming C2x standard means to
forbid such cases, and compilers that actively break such code
likely won't be far behind.  So spend the extra few lines of code
to do it honestly with a separate walker function.

In HEAD, we might as well go further and remove ExecShutdownNode's
useless return value.  I left that as-is in back branches though,
to forestall complaints about ABI breakage.

Back-patch, with the thought that this might become of practical
importance before our stable branches are all out of service.
It doesn't seem to be fixing any live bug on any currently known
platform, however.

Discussion: https://postgr.es/m/208054.1663534665@sss.pgh.pa.us
2022-09-19 12:16:02 -04:00
Tom Lane 3f7323cbbd Repair rare failure of MULTIEXPR_SUBLINK subplans in inherited updates.
Prior to v14, if we have a MULTIEXPR SubPlan (that is, use of the syntax
UPDATE ... SET (c1, ...) = (SELECT ...)) in an UPDATE with an inherited
or partitioned target table, inheritance_planner() will clone the
targetlist and therefore also the MULTIEXPR SubPlan and the Param nodes
referencing it for each child target table.  Up to now, we've allowed
all the clones to share the underlying subplan as well as the output
parameter IDs -- that is, the runtime ParamExecData slots.  That
technique is borrowed from the far older code that supports initplans,
and it works okay in that case because the cloned SubPlan nodes are
essentially identical.  So it doesn't matter which one of the clones
the shared ParamExecData.execPlan field might point to.

However, this fails to hold for MULTIEXPR SubPlans, because they can
have nonempty "args" lists (values to be passed into the subplan), and
those lists could get mutated to different states in the various clones.
In the submitted reproducer, as well as the test case added here, one
clone contains Vars with varno OUTER_VAR where another has INNER_VAR,
because the child tables are respectively on the outer or inner side of
the join.  Sharing the execPlan pointer can result in trying to evaluate
an args list that doesn't match the local execution state, with mayhem
ensuing.  The result often is to trigger consistency checks in the
executor, but I believe this could end in a crash or incorrect updates.

To fix, assign new Param IDs to each of the cloned SubPlans, so that
they don't share ParamExecData slots at runtime.  It still seems fine
for the clones to share the underlying subplan, and extra ParamExecData
slots are cheap enough that this fix shouldn't cost much.

This has been busted since we invented MULTIEXPR SubPlans in 9.5.
Probably the lack of previous reports is because query plans in which
the different clones of a MULTIEXPR mutate to effectively-different
states are pretty rare.  There's no issue in v14 and later, because
without inheritance_planner() there's never a reason to clone
MULTIEXPR SubPlans.

Per bug #17596 from Andre Lin.  Patch v10-v13 only.

Discussion: https://postgr.es/m/17596-c5357f61427a81dc@postgresql.org
2022-08-27 12:11:20 -04:00
Amit Kapila 1df86aac51 Fix replica identity check for a partitioned table.
The current publisher code checks if UPDATE or DELETE can be executed with
the replica identity of the table even if it's a partitioned table. We can
skip checking the replica identity for partitioned tables because the
operations are actually performed on the leaf partitions (not the
partitioned table).

Reported-by: Brad Nicholson
Author: Hou Zhijie
Reviewed-by: Peter Smith, Amit Kapila
Backpatch-through: 13
Discussion: https://postgr.es/m/CAMMnM%3D8i5DohH%3DYKzV0_wYuYSYvuOJoL9F5nzXTc%2ByzsG1f6rg%40mail.gmail.com
2022-08-16 14:30:27 +05:30
Tom Lane 4878ea717c Avoid misbehavior when hash_table_bytes < bucket_size.
It's possible to reach this case when work_mem is very small and tupsize
is (relatively) very large.  In that case ExecChooseHashTableSize would
get an assertion failure, or with asserts off it'd compute nbuckets = 0,
which'd likely cause misbehavior later (I've not checked).  To fix,
clamp the number of buckets to be at least 1.

This is due to faulty conversion of old my_log2() coding in 28d936031.
Back-patch to v13, as that was.

Zhang Mingli

Discussion: https://postgr.es/m/beb64ca0-91e2-44ac-bf4a-7ea36275ec02@Spark
2022-08-13 16:59:58 -04:00
Tom Lane 71caf3c4da Fix handling of R/W expanded datums that are passed to SQL functions.
fmgr_sql must make expanded-datum arguments read-only, because
it's possible that the function body will pass the argument to
more than one callee function.  If one of those functions takes
the datum's R/W property as license to scribble on it, then later
callees will see an unexpected value, leading to wrong answers.

From a performance standpoint, it'd be nice to skip this in the
common case that the argument value is passed to only one callee.
However, detecting that seems fairly hard, and certainly not
something that I care to attempt in a back-patched bug fix.

Per report from Adam Mackler.  This has been broken since we
invented expanded datums, so back-patch to all supported branches.

Discussion: https://postgr.es/m/WScDU5qfoZ7PB2gXwNqwGGgDPmWzz08VdydcPFLhOwUKZcdWbblbo-0Lku-qhuEiZoXJ82jpiQU4hOjOcrevYEDeoAvz6nR0IU4IHhXnaCA=@mackler.email
Discussion: https://postgr.es/m/187436.1660143060@sss.pgh.pa.us
2022-08-10 13:37:25 -04:00
Tom Lane 23edf0e8b4 Add CHECK_FOR_INTERRUPTS in ExecInsert's speculative insertion loop.
Ordinarily the functions called in this loop ought to have plenty
of CFIs themselves; but we've now seen a case where no such CFI is
reached, making the loop uninterruptible.  Even though that's from
a recently-introduced bug, it seems prudent to install a CFI at
the loop level in all branches.

Per discussion of bug #17558 from Andrew Kesper (an actual fix for
that bug will follow).

Discussion: https://postgr.es/m/17558-3f6599ffcf52fd4a@postgresql.org
2022-08-04 14:10:06 -04:00
Peter Eisentraut b2c8d56618 Re-add SPICleanup for ABI compatibility in stable branch
This fixes an ABI break introduced by
cfc86f9873.

Author: Markus Wanner <markus.wanner@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/defd749a-8410-841d-1126-21398686d63d@enterprisedb.com
2022-07-18 19:20:07 +02:00
Tom Lane cfc86f9873 Fix SPI's handling of errors during transaction commit.
SPI_commit previously left it up to the caller to recover from any error
occurring during commit.  Since that's complicated and requires use of
low-level xact.c facilities, it's not too surprising that no caller got
it right.  Let's move the responsibility for cleanup into spi.c.  Doing
that requires redefining SPI_commit as starting a new transaction, so
that it becomes equivalent to SPI_commit_and_chain except that you get
default transaction characteristics instead of preserving the prior
transaction's characteristics.  We can make this pretty transparent
API-wise by redefining SPI_start_transaction() as a no-op.  Callers
that expect to do something in between might be surprised, but
available evidence is that no callers do so.

Having made that API redefinition, we can fix this mess by having
SPI_commit[_and_chain] trap errors and start a new, clean transaction
before re-throwing the error.  Likewise for SPI_rollback[_and_chain].
Some cleanup is also needed in AtEOXact_SPI, which was nowhere near
smart enough to deal with SPI contexts nested inside a committing
context.

While plperl and pltcl need no changes beyond removing their now-useless
SPI_start_transaction() calls, plpython needs some more work because it
hadn't gotten the memo about catching commit/rollback errors in the
first place.  Such an error resulted in longjmp'ing out of the Python
interpreter, which leaks Python stack entries at present and is reported
to crash Python 3.11 altogether.  Add the missing logic to catch such
errors and convert them into Python exceptions.

This is a back-patch of commit 2e517818f.  That's now aged long enough
to reduce the concerns about whether it will break something, and we
do need to ensure that supported branches will work with Python 3.11.

Peter Eisentraut and Tom Lane

Discussion: https://postgr.es/m/3375ffd8-d71c-2565-e348-a597d6e739e3@enterprisedb.com
Discussion: https://postgr.es/m/17416-ed8fe5d7213d6c25@postgresql.org
2022-06-22 12:12:00 -04:00
Tom Lane 254cd7f31f Un-break whole-row Vars referencing domain-over-composite types.
In commit ec62cb0aa, I foolishly replaced ExecEvalWholeRowVar's
lookup_rowtype_tupdesc_domain call with just lookup_rowtype_tupdesc,
because I didn't see how a domain could be involved there, and
there were no regression test cases to jog my memory.  But the
existing code was correct, so revert that change and add a test
case showing why it's necessary.  (Note: per comment in struct
DatumTupleFields, it is correct to produce an output tuple that's
labeled with the base composite type, not the domain; hence just
blindly looking through the domain is correct here.)

Per bug #17515 from Dan Kubb.  Back-patch to v11 where domains over
composites became a thing.

Discussion: https://postgr.es/m/17515-a24737438363aca0@postgresql.org
2022-06-10 10:35:57 -04:00
Tom Lane 5e144cc89b Revert applying column aliases to the output of whole-row Vars.
In commit bf7ca1587, I had the bright idea that we could make the
result of a whole-row Var (that is, foo.*) track any column aliases
that had been applied to the FROM entry the Var refers to.  However,
that's not terribly logically consistent, because now the output of
the Var is no longer of the named composite type that the Var claims
to emit.  bf7ca1587 tried to handle that by changing the output
tuple values to be labeled with a blessed RECORD type, but that's
really pretty disastrous: we can wind up storing such tuples onto
disk, whereupon they're not readable by other sessions.

The only practical fix I can see is to give up on what bf7ca1587
tried to do, and say that the column names of tuples produced by
a whole-row Var are always those of the underlying named composite
type, query aliases or no.  While this introduces some inconsistencies,
it removes others, so it's not that awful in the abstract.  What *is*
kind of awful is to make such a behavioral change in a back-patched
bug fix.  But corrupt data is worse, so back-patched it will be.

(A workaround available to anyone who's unhappy about this is to
introduce an extra level of sub-SELECT, so that the whole-row Var is
referring to the sub-SELECT's output and not to a named table type.
Then the Var is of type RECORD to begin with and there's no issue.)

Per report from Miles Delahunty.  The faulty commit dates to 9.5,
so back-patch to all supported branches.

Discussion: https://postgr.es/m/2950001.1638729947@sss.pgh.pa.us
2022-03-17 18:18:05 -04:00
Alexander Korotkov ac2303aa03 Fix memory leak in IndexScan node with reordering
Fix ExecReScanIndexScan() to free the referenced tuples while emptying the
priority queue.  Backpatch to all supported versions.

Discussion: https://postgr.es/m/CAHqSB9gECMENBQmpbv5rvmT3HTaORmMK3Ukg73DsX5H7EJV7jw%40mail.gmail.com
Author: Aliaksandr Kalenik
Reviewed-by: Tom Lane, Alexander Korotkov
Backpatch-through: 10
2022-02-14 03:32:34 +03:00
Tom Lane 9c7cf083f2 Test, don't just Assert, that mergejoin's inputs are in order.
There are two Asserts in nodeMergejoin.c that are reachable if
the input data is not in the expected order.  This seems way too
fragile.  Alexander Lakhin reported a case where the assertions
could be triggered with misconfigured foreign-table partitions,
and bitter experience with unstable operating system collation
definitions suggests another easy route to hitting them.  Neither
Assert is in a place where we can't afford one more test-and-branch,
so replace 'em with plain test-and-elog logic.

Per bug #17395.  While the reported symptom is relatively recent,
collation changes could happen anytime, so back-patch to all
supported branches.

Discussion: https://postgr.es/m/17395-8c326292078d1a57@postgresql.org
2022-02-05 11:59:30 -05:00
Tom Lane 20d08b2c61 Fix index-only scan plans, take 2.
Commit 4ace45677 failed to fix the problem fully, because the
same issue of attempting to fetch a non-returnable index column
can occur when rechecking the indexqual after using a lossy index
operator.  Moreover, it broke EXPLAIN for such indexquals (which
indicates a gap in our test cases :-().

Revert the code changes of 4ace45677 in favor of adding a new field
to struct IndexOnlyScan, containing a version of the indexqual that
can be executed against the index-returned tuple without using any
non-returnable columns.  (The restrictions imposed by check_index_only
guarantee this is possible, although we may have to recompute indexed
expressions.)  Support construction of that during setrefs.c
processing by marking IndexOnlyScan.indextlist entries as resjunk
if they can't be returned, rather than removing them entirely.
(We could alternatively require setrefs.c to look up the IndexOptInfo
again, but abusing resjunk this way seems like a reasonably safe way
to avoid needing to do that.)

This solution isn't great from an API-stability standpoint: if there
are any extensions out there that build IndexOnlyScan structs directly,
they'll be broken in the next minor releases.  However, only a very
invasive extension would be likely to do such a thing.  There's no
change in the Path representation, so typical planner extensions
shouldn't have a problem.

As before, back-patch to all supported branches.

Discussion: https://postgr.es/m/3179992.1641150853@sss.pgh.pa.us
Discussion: https://postgr.es/m/17350-b5bdcf476e5badbb@postgresql.org
2022-01-03 15:42:27 -05:00
Tom Lane ada667b454 Fix variable lifespan in ExecInitCoerceToDomain().
This undoes a mistake in 1ec7679f1: domainval and domainnull were
meant to live across loop iterations, but they were incorrectly
moved inside the loop.  The effect was only to emit useless extra
EEOP_MAKE_READONLY steps, so it's not a big deal; nonetheless,
back-patch to v13 where the mistake was introduced.

Ranier Vilela

Discussion: https://postgr.es/m/CAEudQAqXuhbkaAp-sGH6dR6Nsq7v28_0TPexHOm6FiDYqwQD-w@mail.gmail.com
2021-11-02 13:36:57 -04:00
Tom Lane e477642a1b Avoid some other O(N^2) hazards in list manipulation.
In the same spirit as 6301c3ada, fix some more places where we were
using list_delete_first() in a loop and thereby risking O(N^2)
behavior.  It's not clear that the lists manipulated in these spots
can get long enough to be really problematic ... but it's not clear
that they can't, either, and the fixes are simple enough.

As before, back-patch to v13.

Discussion: https://postgr.es/m/CD2F0E7F-9822-45EC-A411-AE56F14DEA9F@amazon.com
2021-11-01 16:24:40 -04:00
Tom Lane 30e61a8cdc Fix assignment to array of domain over composite.
An update such as "UPDATE ... SET fld[n].subfld = whatever"
failed if the array elements were domains rather than plain
composites.  That's because isAssignmentIndirectionExpr()
failed to cope with the CoerceToDomain node that would appear
in the expression tree in this case.  The result would typically
be a crash, and even if we accidentally didn't crash, we'd not
correctly preserve other fields of the same array element.

Per report from Onder Kalaci.  Back-patch to v11 where arrays of
domains came in.

Discussion: https://postgr.es/m/PH0PR21MB132823A46AA36F0685B7A29AD8BD9@PH0PR21MB1328.namprd21.prod.outlook.com
2021-10-19 13:54:46 -04:00
Tom Lane 2b8f3f5a7c Get rid of artificial restriction on hash table sizes on Windows.
The point of introducing the hash_mem_multiplier GUC was to let users
reproduce the old behavior of hash aggregation, i.e. that it could use
more than work_mem at need.  However, the implementation failed to get
the job done on Win64, where work_mem is clamped to 2GB to protect
various places that calculate memory sizes using "long int".  As
written, the same clamp was applied to hash_mem.  This resulted in
severe performance regressions for queries requiring a bit more than
2GB for hash aggregation, as they now spill to disk and there's no
way to stop that.

Getting rid of the work_mem restriction seems like a good idea, but
it's a big job and could not conceivably be back-patched.  However,
there's only a fairly small number of places that are concerned with
the hash_mem value, and it turns out to be possible to remove the
restriction there without too much code churn or any ABI breaks.
So, let's do that for now to fix the regression, and leave the
larger task for another day.

This patch does introduce a bit more infrastructure that should help
with the larger task, namely pg_bitutils.h support for working with
size_t values.

Per gripe from Laurent Hasson.  Back-patch to v13 where the
behavior change came in.

Discussion: https://postgr.es/m/997817.1627074924@sss.pgh.pa.us
Discussion: https://postgr.es/m/MN2PR15MB25601E80A9B6D1BA6F592B1985E39@MN2PR15MB2560.namprd15.prod.outlook.com
2021-07-25 14:02:27 -04:00
Tom Lane 77e3204ecb Fix usage of "tableoid" in GENERATED expressions.
We consider this supported (though I've got my doubts that it's a
good idea, because tableoid is not immutable).  However, several
code paths failed to fill the field in soon enough, causing such
a GENERATED expression to see zero or the wrong value.  This
occurred when ALTER TABLE adds a new GENERATED column to a table
with existing rows, and during regular INSERT or UPDATE on a
foreign table with GENERATED columns.

Noted during investigation of a report from Vitaly Ustinov.
Back-patch to v12 where GENERATED came in.

Discussion: https://postgr.es/m/CAM_DEiWR2DPT6U4xb-Ehigozzd3n3G37ZB1+867zbsEVtYoJww@mail.gmail.com
2021-05-21 15:02:07 -04:00
Tom Lane d18ee6f92d Restore the portal-level snapshot after procedure COMMIT/ROLLBACK.
COMMIT/ROLLBACK necessarily destroys all snapshots within the session.
The original implementation of intra-procedure transactions just
cavalierly did that, ignoring the fact that this left us executing in
a rather different environment than normal.  In particular, it turns
out that handling of toasted datums depends rather critically on there
being an outer ActiveSnapshot: otherwise, when SPI or the core
executor pop whatever snapshot they used and return, it's unsafe to
dereference any toasted datums that may appear in the query result.
It's possible to demonstrate "no known snapshots" and "missing chunk
number N for toast value" errors as a result of this oversight.

Historically this outer snapshot has been held by the Portal code,
and that seems like a good plan to preserve.  So add infrastructure
to pquery.c to allow re-establishing the Portal-owned snapshot if it's
not there anymore, and add enough bookkeeping support that we can tell
whether it is or not.

We can't, however, just re-establish the Portal snapshot as part of
COMMIT/ROLLBACK.  As in normal transaction start, acquiring the first
snapshot should wait until after SET and LOCK commands.  Hence, teach
spi.c about doing this at the right time.  (Note that this patch
doesn't fix the problem for any PLs that try to run intra-procedure
transactions without using SPI to execute SQL commands.)

This makes SPI's no_snapshots parameter rather a misnomer, so in HEAD,
rename that to allow_nonatomic.

replication/logical/worker.c also needs some fixes, because it wasn't
careful to hold a snapshot open around AFTER trigger execution.
That code doesn't use a Portal, which I suspect someday we're gonna
have to fix.  But for now, just rearrange the order of operations.
This includes back-patching the recent addition of finish_estate()
to centralize the cleanup logic there.

This also back-patches commit 2ecfeda3e into v13, to improve the
test coverage for worker.c (it was that test that exposed that
worker.c's snapshot management is wrong).

Per bug #15990 from Andreas Wicht.  Back-patch to v11 where
intra-procedure COMMIT was added.

Discussion: https://postgr.es/m/15990-eee2ac466b11293d@postgresql.org
2021-05-21 14:03:53 -04:00
Tom Lane 4a8656a7ee Fix mishandling of resjunk columns in ON CONFLICT ... UPDATE tlists.
It's unusual to have any resjunk columns in an ON CONFLICT ... UPDATE
list, but it can happen when MULTIEXPR_SUBLINK SubPlans are present.
If it happens, the ON CONFLICT UPDATE code path would end up storing
tuples that include the values of the extra resjunk columns.  That's
fairly harmless in the short run, but if new columns are added to
the table then the values would become accessible, possibly leading
to malfunctions if they don't match the datatypes of the new columns.

This had escaped notice through a confluence of missing sanity checks,
including

* There's no cross-check that a tuple presented to heap_insert or
heap_update matches the table rowtype.  While it's difficult to
check that fully at reasonable cost, we can easily add assertions
that there aren't too many columns.

* The output-column-assignment cases in execExprInterp.c lacked
any sanity checks on the output column numbers, which seems like
an oversight considering there are plenty of assertion checks on
input column numbers.  Add assertions there too.

* We failed to apply nodeModifyTable's ExecCheckPlanOutput() to
the ON CONFLICT UPDATE tlist.  That wouldn't have caught this
specific error, since that function is chartered to ignore resjunk
columns; but it sure seems like a bad omission now that we've seen
this bug.

In HEAD, the right way to fix this is to make the processing of
ON CONFLICT UPDATE tlists work the same as regular UPDATE tlists
now do, that is don't add "SET x = x" entries, and use
ExecBuildUpdateProjection to evaluate the tlist and combine it with
old values of the not-set columns.  This adds a little complication
to ExecBuildUpdateProjection, but allows removal of a comparable
amount of now-dead code from the planner.

In the back branches, the most expedient solution seems to be to
(a) use an output slot for the ON CONFLICT UPDATE projection that
actually matches the target table, and then (b) invent a variant of
ExecBuildProjectionInfo that can be told to not store values resulting
from resjunk columns, so it doesn't try to store into nonexistent
columns of the output slot.  (We can't simply ignore the resjunk columns
altogether; they have to be evaluated for MULTIEXPR_SUBLINK to work.)
This works back to v10.  In 9.6, projections work much differently and
we can't cheaply give them such an option.  The 9.6 version of this
patch works by inserting a JunkFilter when it's necessary to get rid
of resjunk columns.

In addition, v11 and up have the reverse problem when trying to
perform ON CONFLICT UPDATE on a partitioned table.  Through a
further oversight, adjust_partition_tlist() discarded resjunk columns
when re-ordering the ON CONFLICT UPDATE tlist to match a partition.
This accidentally prevented the storing-bogus-tuples problem, but
at the cost that MULTIEXPR_SUBLINK cases didn't work, typically
crashing if more than one row has to be updated.  Fix by preserving
resjunk columns in that routine.  (I failed to resist the temptation
to add more assertions there too, and to do some minor code
beautification.)

Per report from Andres Freund.  Back-patch to all supported branches.

Security: CVE-2021-32028
2021-05-10 11:02:29 -04:00
Tom Lane 467395bfdf Prevent integer overflows in array subscripting calculations.
While we were (mostly) careful about ensuring that the dimensions of
arrays aren't large enough to cause integer overflow, the lower bound
values were generally not checked.  This allows situations where
lower_bound + dimension overflows an integer.  It seems that that's
harmless so far as array reading is concerned, except that array
elements with subscripts notionally exceeding INT_MAX are inaccessible.
However, it confuses various array-assignment logic, resulting in a
potential for memory stomps.

Fix by adding checks that array lower bounds aren't large enough to
cause lower_bound + dimension to overflow.  (Note: this results in
disallowing cases where the last subscript position would be exactly
INT_MAX.  In principle we could probably allow that, but there's a lot
of code that computes lower_bound + dimension and would need adjustment.
It seems doubtful that it's worth the trouble/risk to allow it.)

Somewhat independently of that, array_set_element() was careless
about possible overflow when checking the subscript of a fixed-length
array, creating a different route to memory stomps.  Fix that too.

Security: CVE-2021-32027
2021-05-10 10:44:38 -04:00
Tom Lane 2602ee4689 Don't crash on reference to an un-available system column.
Adopt a more consistent policy about what slot-type-specific
getsysattr functions should do when system attributes are not
available.  To wit, they should all throw the same user-oriented
error, rather than variously crashing or emitting developer-oriented
messages.

This closes a identifiable problem in commits a71cfc56b and
3fb93103a (in v13 and v12), so back-patch into those branches,
along with a test case to try to ensure we don't break it again.
It is not known that any of the former crash cases are reachable
in HEAD, but this seems like a good safety improvement in any case.

Discussion: https://postgr.es/m/141051591267657@mail.yandex.ru
2021-04-22 17:30:42 -04:00
Tom Lane a71cfc56bf Fix bugs in RETURNING in cross-partition UPDATE cases.
If the source and destination partitions don't have identical
rowtypes (for example, one has dropped columns the other lacks),
then the planSlot contents will be different because of that.
If the query has a RETURNING list that tries to return resjunk
columns out of the planSlot, that is columns from tables that
were joined to the target table, we'd get errors or wrong answers.
That's because we used the RETURNING list generated for the
destination partition, which expects a planSlot matching that
partition's subplan.

The most practical fix seems to be to convert the updated destination
tuple back to the source partition's rowtype, and then apply the
RETURNING list generated for the source partition.  This avoids making
fragile assumptions about whether the per-subpartition subplans
generated all the resjunk columns in the same order.

This has been broken since v11 introduced cross-partition UPDATE.
The lack of field complaints shows that non-identical partitions
aren't a common case; therefore, don't stress too hard about
making the conversion efficient.

There's no such bug in HEAD, because commit 86dc90056 got rid of
per-target-relation variance in the contents of the planSlot.
Hence, patch v11-v13 only.

Amit Langote and Etsuro Fujita, small changes by me

Discussion: https://postgr.es/m/CA+HiwqE_UK1jTSNrjb8mpTdivzd3dum6mK--xqKq0Y9VmfwWQA@mail.gmail.com
2021-04-22 11:46:41 -04:00
Tom Lane 97b7ad4688 Redesign the caching done by get_cached_rowtype().
Previously, get_cached_rowtype() cached a pointer to a reference-counted
tuple descriptor from the typcache, relying on the ExprContextCallback
mechanism to release the tupdesc refcount when the expression tree
using the tupdesc was destroyed.  This worked fine when it was designed,
but the introduction of within-DO-block COMMITs broke it.  The refcount
is logged in a transaction-lifespan resource owner, but plpgsql won't
destroy simple expressions made within the DO block (before its first
commit) until the DO block is exited.  That results in a warning about
a leaked tupdesc refcount when the COMMIT destroys the original resource
owner, and then an error about the active resource owner not holding a
matching refcount when the expression is destroyed.

To fix, get rid of the need to have a shutdown callback at all, by
instead caching a pointer to the relevant typcache entry.  Those
survive for the life of the backend, so we needn't worry about the
pointer becoming stale.  (For registered RECORD types, we can still
cache a pointer to the tupdesc, knowing that it won't change for the
life of the backend.)  This mechanism has been in use in plpgsql
and expandedrecord.c since commit 4b93f5799, and seems to work well.

This change requires modifying the ExprEvalStep structs used by the
relevant expression step types, which is slightly worrisome for
back-patching.  However, there seems no good reason for extensions
to be familiar with the details of these particular sub-structs.

Per report from Rohit Bhogate.  Back-patch to v11 where within-DO-block
COMMITs became a thing.

Discussion: https://postgr.es/m/CAAV6ZkQRCVBh8qAY+SZiHnz+U+FqAGBBDaDTjF2yiKa2nJSLKg@mail.gmail.com
2021-04-13 13:37:07 -04:00
Thomas Munro 69739ec317 Revert "Fix race in Parallel Hash Join batch cleanup."
This reverts commit 4e0f0995e9.

Discussion: https://postgr.es/m/CA%2BhUKGJmcqAE3MZeDCLLXa62cWM0AJbKmp2JrJYaJ86bz36LFA%40mail.gmail.com
2021-03-18 01:00:56 +13:00