Commit Graph

2235 Commits

Author SHA1 Message Date
Michael Paquier 8a15e735be Fix some grammar and typos in comments and docs
The documentation fixes are backpatched down to where they apply.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20201031020801.GD3080@telsasoft.com
Backpatch-through: 9.6
2020-11-02 15:14:41 +09:00
Tom Lane 20d3fe9009 In INSERT/UPDATE, use the table's real tuple descriptor as target.
Previously, ExecInitModifyTable relied on ExecInitJunkFilter,
and thence ExecCleanTypeFromTL, to build the target descriptor from
the query tlist.  While we just checked (in ExecCheckPlanOutput)
that the tlist produces compatible output, this is not a great
substitute for the relation's actual tuple descriptor that's
available from the relcache.  For one thing, dropped columns will
not be correctly marked attisdropped; it's a bit surprising that
we've gotten away with that this long.  But the real reason for
being concerned with this is that using the table's descriptor means
that the slot will have correct attrmissing data, allowing us to
revert the klugy fix of commit ba9f18abd.  (This commit undoes
that one's changes in trigger.c, but keeps the new test case.)
Thus we can solve the bogus-trigger-tuple problem with fewer cycles
rather than more.

No back-patch, since this doesn't fix any additional bug, and it
seems somewhat more likely to have unforeseen side effects than
ba9f18abd's narrow fix.

Discussion: https://postgr.es/m/16644-5da7ef98a7ac4545@postgresql.org
2020-10-26 11:36:53 -04:00
Heikki Linnakangas 22b73d3cb0 Fix initialization of es_result_relations in EvalPlanQualStart().
Thinko in commit 1375422c78. EvalPlanQualStart() was mistakenly
resetting the parent EState's es_result_relations, when it should
initialize the field in the child EPQ EState it just created.

That was clearly wrong, but it didn't cause any ill effects, because
es_result_relations is currently not used after the ExecInit* phase.

Author: Amit Langote
Discussion: https://www.postgresql.org/message-id/CA%2BHiwqFEuq8AAAmxXsTDVZ1r38cHbfYuiPQx_%3DYyKe2DC-6q4A%40mail.gmail.com
2020-10-23 09:30:08 +03:00
Tom Lane c8ab970179 Fix list-munging bug that broke SQL function result coercions.
Since commit 913bbd88d, check_sql_fn_retval() can either insert type
coercion steps in-line in the Query that produces the SQL function's
results, or generate a new top-level Query to perform the coercions,
if modifying the Query's output in-place wouldn't be safe.  However,
it appears that the latter case has never actually worked, because
the code tried to inject the new Query back into the query list it was
passed ... which is not the list that will be used for later processing
when we execute the SQL function "normally" (without inlining it).
So we ended up with no coercion happening at run-time, leading to
wrong results or crashes depending on the datatypes involved.

While the regression tests look like they cover this area well enough,
through a huge bit of bad luck all the test cases that exercise the
separate-Query path were checking either inline-able cases (which
accidentally didn't have the bug) or cases that are no-ops at runtime
(e.g., varchar to text), so that the failure to perform the coercion
wasn't obvious.  The fact that the cases that don't work weren't
allowed at all before v13 probably contributed to not noticing the
problem sooner, too.

To fix, get rid of the separate "flat" list of Query nodes and instead
pass the real two-level list that is going to be used later.  I chose
to make the same change in check_sql_fn_statements(), although that has
no actual bug, just so that we don't need that data structure at all.

This is an API change, as evidenced by the adjustments needed to
callers outside functions.c.  That's a bit scary to be doing in a
released branch, but so far as I can tell from a quick search,
there are no outside callers of these functions (and they are
sufficiently specific to our semantics for SQL-language functions that
it's not apparent why any extension would need to call them).  In any
case, v13 already changed the API of check_sql_fn_retval() compared to
prior branches.

Per report from pinker.  Back-patch to v13 where this code came in.

Discussion: https://postgr.es/m/1603050466566-0.post@n3.nabble.com
2020-10-19 14:33:09 -04:00
Heikki Linnakangas fb5883da86 Remove PartitionRoutingInfo struct.
The extra indirection neeeded to access its members via its enclosing
ResultRelInfo seems pointless. Move all the fields from
PartitionRoutingInfo to ResultRelInfo.

Author: Amit Langote
Reviewed-by: Alvaro Herrera
Discussion: https://www.postgresql.org/message-id/CA%2BHiwqFViT47Zbr_ASBejiK7iDG8%3DQ1swQ-tjM6caRPQ67pT%3Dw%40mail.gmail.com
2020-10-19 14:42:55 +03:00
Heikki Linnakangas 6973533650 Revise child-to-root tuple conversion map management.
Store the tuple conversion map to convert a tuple from a child table's
format to the root format in a new ri_ChildToRootMap field in
ResultRelInfo. It is initialized if transition tuple capture for FOR
STATEMENT triggers or INSERT tuple routing on a partitioned table is
needed. Previously, ModifyTable kept the maps in the per-subplan
ModifyTableState->mt_per_subplan_tupconv_maps array, or when tuple
routing was used, in
ResultRelInfo->ri_Partitioninfo->pi_PartitionToRootMap. The new field
replaces both of those.

Now that the child-to-root tuple conversion map is always available in
ResultRelInfo (when needed), remove the TransitionCaptureState.tcs_map
field. The callers of Exec*Trigger() functions no longer need to set or
save it, which is much less confusing and bug-prone. Also, as a future
optimization, this will allow us to delay creating the map for a given
result relation until the relation is actually processed during
execution.

Author: Amit Langote
Discussion: https://www.postgresql.org/message-id/CA%2BHiwqHtCWLdK-LO%3DNEsvOdHx%2B7yv4mE_zYK0i3BH7dXb-wxog%40mail.gmail.com
2020-10-19 14:42:55 +03:00
Heikki Linnakangas f49b85d783 Clean up code to resolve the "root target relation" in nodeModifyTable.c
When executing DDL on a partitioned table or on a table with inheritance
children, statement-level triggers must be fired against the table given
in the original statement. The code to look that up was a bit messy and
duplicative. Commit 501ed02cf6 added a helper function,
getASTriggerResultRelInfo() (later renamed to getTargetResultRelInfo())
for it, but for some reason it was only used when firing AFTER STATEMENT
triggers and the code to fire BEFORE STATEMENT triggers duplicated the
logic.

Determine the target relation in ExecInitModifyTable(), and set it always
in ModifyTableState. Code that used to call getTargetResultRelInfo() can
now use ModifyTableState->rootResultRelInfo directly.

Discussion: https://www.postgresql.org/message-id/CA%2BHiwqFViT47Zbr_ASBejiK7iDG8%3DQ1swQ-tjM6caRPQ67pT%3Dw%40mail.gmail.com
2020-10-19 14:42:40 +03:00
Heikki Linnakangas c5b097f8fa Refactor code for cross-partition updates to a separate function.
ExecUpdate() is very long, so extract the part of it that deals with
cross-partition updates to a separate function to make it more readable.
Per Andres Freund's suggestion.

Author: Amit Langote
Discussion: https://www.postgresql.org/message-id/CA%2BHiwqEUgb5RdUgxR7Sqco4S09jzJstHiaT2vnCRPGR4JCAPqA%40mail.gmail.com
2020-10-15 17:08:10 +03:00
Heikki Linnakangas a04daa97a4 Remove es_result_relation_info from EState.
Maintaining 'es_result_relation_info' correctly at all times has become
cumbersome, especially with partitioning where each partition gets its
own result relation info. Having to set and reset it across arbitrary
operations has caused bugs in the past.

This changes all the places that used 'es_result_relation_info', to
receive the currently active ResultRelInfo via function parameters
instead.

Author: Amit Langote
Discussion: https://www.postgresql.org/message-id/CA%2BHiwqGEmiib8FLiHMhKB%2BCH5dRgHSLc5N5wnvc4kym%2BZYpQEQ%40mail.gmail.com
2020-10-14 11:41:40 +03:00
Heikki Linnakangas 178f2d560d Include result relation info in direct modify ForeignScan nodes.
FDWs that can perform an UPDATE/DELETE remotely using the "direct modify"
set of APIs need to access the ResultRelInfo of the target table. That's
currently available in EState.es_result_relation_info, but the next
commit will remove that field.

This commit adds a new resultRelation field in ForeignScan, to store the
target relation's RT index, and the corresponding ResultRelInfo in
ForeignScanState. The FDW's PlanDirectModify callback is expected to set
'resultRelation' along with 'operation'. The core code doesn't need them
for anything, they are for the convenience of FDW's Begin- and
IterateDirectModify callbacks.

Authors: Amit Langote, Etsuro Fujita
Discussion: https://www.postgresql.org/message-id/CA%2BHiwqGEmiib8FLiHMhKB%2BCH5dRgHSLc5N5wnvc4kym%2BZYpQEQ%40mail.gmail.com
2020-10-14 10:58:38 +03:00
Heikki Linnakangas 1375422c78 Create ResultRelInfos later in InitPlan, index them by RT index.
Instead of allocating all the ResultRelInfos upfront in one big array,
allocate them in ExecInitModifyTable(). es_result_relations is now an
array of ResultRelInfo pointers, rather than an array of structs, and it
is indexed by the RT index.

This simplifies things: we get rid of the separate concept of a "result
rel index", and don't need to set it in setrefs.c anymore. This also
allows follow-up optimizations (not included in this commit yet) to skip
initializing ResultRelInfos for target relations that were not needed at
runtime, and removal of the es_result_relation_info pointer.

The EState arrays of regular result rels and root result rels are merged
into one array. Similarly, the resultRelations and rootResultRelations
lists in PlannedStmt are merged into one. It's not actually clear to me
why they were kept separate in the first place, but now that the
es_result_relations array is indexed by RT index, it certainly seems
pointless.

The PlannedStmt->resultRelations list is now only needed for
ExecRelationIsTargetRelation(). One visible effect of this change is that
ExecRelationIsTargetRelation() will now return 'true' also for the
partition root, if a partitioned table is updated. That seems like a good
thing, although the function isn't used in core code, and I don't see any
reason for an FDW to call it on a partition root.

Author: Amit Langote
Discussion: https://www.postgresql.org/message-id/CA%2BHiwqGEmiib8FLiHMhKB%2BCH5dRgHSLc5N5wnvc4kym%2BZYpQEQ%40mail.gmail.com
2020-10-13 12:57:02 +03:00
Peter Eisentraut 2453ea1422 Support for OUT parameters in procedures
Unlike for functions, OUT parameters for procedures are part of the
signature.  Therefore, they have to be listed in pg_proc.proargtypes
as well as mentioned in ALTER PROCEDURE and DROP PROCEDURE.

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/2b8490fe-51af-e671-c504-47359dc453c5@2ndquadrant.com
2020-10-05 09:21:43 +02:00
Tom Lane 41efb83408 Move resolution of AlternativeSubPlan choices to the planner.
When commit bd3daddaf introduced AlternativeSubPlans, I had some
ambitions towards allowing the choice of subplan to change during
execution.  That has not happened, or even been thought about, in the
ensuing twelve years; so it seems like a failed experiment.  So let's
rip that out and resolve the choice of subplan at the end of planning
(in setrefs.c) rather than during executor startup.  This has a number
of positive benefits:

* Removal of a few hundred lines of executor code, since
AlternativeSubPlans need no longer be supported there.

* Removal of executor-startup overhead (particularly, initialization
of subplans that won't be used).

* Removal of incidental costs of having a larger plan tree, such as
tree-scanning and copying costs in the plancache; not to mention
setrefs.c's own costs of processing the discarded subplans.

* EXPLAIN no longer has to print a weird (and undocumented)
representation of an AlternativeSubPlan choice; it sees only the
subplan actually used.  This should mean less confusion for users.

* Since setrefs.c knows which subexpression of a plan node it's
working on at any instant, it's possible to adjust the estimated
number of executions of the subplan based on that.  For example,
we should usually estimate more executions of a qual expression
than a targetlist expression.  The implementation used here is
pretty simplistic, because we don't want to expend a lot of cycles
on the issue; but it's better than ignoring the point entirely,
as the executor had to.

That last point might possibly result in shifting the choice
between hashed and non-hashed EXISTS subplans in a few cases,
but in general this patch isn't meant to change planner choices.
Since we're doing the resolution so late, it's really impossible
to change any plan choices outside the AlternativeSubPlan itself.

Patch by me; thanks to David Rowley for review.

Discussion: https://postgr.es/m/1992952.1592785225@sss.pgh.pa.us
2020-09-27 12:51:28 -04:00
Tom Lane 2000b6c10a Don't fetch partition check expression during InitResultRelInfo.
Since there is only one place that actually needs the partition check
expression, namely ExecPartitionCheck, it's better to fetch it from
the relcache there.  In this way we will never fetch it at all if
the query never has use for it, and we still fetch it just once when
we do need it.

The reason for taking an interest in this is that if the relcache
doesn't already have the check expression cached, fetching it
requires obtaining AccessShareLock on the partition root.  That
means that operations that look like they should only touch the
partition itself will also take a lock on the root.  In particular
we observed that TRUNCATE on a partition may take a lock on the
partition's root, contributing to a deadlock situation in parallel
pg_restore.

As written, this patch does have a small cost, which is that we
are microscopically reducing efficiency for the case where a partition
has an empty check expression.  ExecPartitionCheck will be called,
and will go through the motions of setting up and checking an empty
qual, where before it would not have been called at all.  We could
avoid that by adding a separate boolean flag to track whether there
is a partition expression to test.  However, this case only arises
for a default partition with no siblings, which surely is not an
interesting case in practice.  Hence adding complexity for it
does not seem like a good trade-off.

Amit Langote, per a suggestion by me

Discussion: https://postgr.es/m/VI1PR03MB31670CA1BD9625C3A8C5DD05EB230@VI1PR03MB3167.eurprd03.prod.outlook.com
2020-09-16 14:28:18 -04:00
Jeff Davis c8aeaf3ab3 Change LogicalTapeSetBlocks() to use nBlocksWritten.
Previously, it was based on nBlocksAllocated to account for tapes with
open write buffers that may not have made it to the BufFile yet.

That was unnecessary, because callers do not need to get the number of
blocks while a tape has an open write buffer; and it also conflicted
with the preallocation logic added for HashAgg.

Reviewed-by: Peter Geoghegan
Discussion: https://postgr.es/m/ce5af05900fdbd0e9185747825a7423c48501964.camel@j-davis.com
Backpatch-through: 13
2020-09-15 21:42:25 -07:00
Jeff Davis 3bd35d4f51 HashAgg: release write buffers sooner by rewinding tape.
This was an oversight. The purpose of 7fdd919ae7 was to avoid keeping
tape buffers around unnecessisarily, but HashAgg didn't rewind early
enough.

Reviewed-by: Peter Geoghegan
Discussion: https://postgr.es/m/1fb1151c2cddf8747d14e0532da283c3f97e2685.camel@j-davis.com
Backpatch-through: 13
2020-09-15 21:18:22 -07:00
Jeff Davis 0758964963 logtape.c: do not preallocate for tapes when sorting
The preallocation logic is only useful for HashAgg, so disable it when
sorting.

Also, adjust an out-of-date comment.

Reviewed-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wzn_o7tE2+hRVvwSFghRb75AJ5g-nqGzDUqLYMexjOAe=g@mail.gmail.com
Backpatch-through: 13
2020-09-11 17:10:02 -07:00
Alvaro Herrera f481d28232
Check default partitions constraints while descending
Partitioning tuple route code assumes that the partition chosen while
descending the partition hierarchy is always the correct one.  This is
true except when the partition is the default partition and another
partition has been added concurrently: the partition constraint changes
and we don't recheck it.  This can lead to tuples mistakenly being added
to the default partition that should have been rejected.

Fix by rechecking the default partition constraint while descending the
hierarchy.

An isolation test based on the reproduction steps described by Hao Wu
(with tweaks for extra coverage) is included.

Backpatch to 12, where this bug came in with 898e5e3290.

Reported by: Hao Wu <hawu@vmware.com>
Author: Amit Langote <amitlangote09@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CA+HiwqFqBmcSSap4sFnCBUEL_VfOMmEKaQ3gwUhyfa4c7J_-nA@mail.gmail.com
Discussion: https://postgr.es/m/DM5PR0501MB3910E97A9EDFB4C775CF3D75A42F0@DM5PR0501MB3910.namprd05.prod.outlook.com
2020-09-08 19:35:15 -03:00
Tom Lane 1e7629d2c9 Be more careful about the shape of hashable subplan clauses.
nodeSubplan.c expects that the testexpr for a hashable ANY SubPlan
has the form of one or more OpExprs whose LHS is an expression of the
outer query's, while the RHS is an expression over Params representing
output columns of the subquery.  However, the planner only went as far
as verifying that the clauses were all binary OpExprs.  This works
99.99% of the time, because the clauses have the right shape when
emitted by the parser --- but it's possible for function inlining to
break that, as reported by PegoraroF10.  To fix, teach the planner
to check that the LHS and RHS contain the right things, or more
accurately don't contain the wrong things.  Given that this has been
broken for years without anyone noticing, it seems sufficient to just
give up hashing when it happens, rather than go to the trouble of
commuting the clauses back again (which wouldn't necessarily work
anyway).

While poking at that, I also noticed that nodeSubplan.c had a baked-in
assumption that the number of hash clauses is identical to the number
of subquery output columns.  Again, that's fine as far as parser output
goes, but it's not hard to break it via function inlining.  There seems
little reason for that assumption though --- AFAICS, the only thing
it's buying us is not having to store the number of hash clauses
explicitly.  Adding code to the planner to reject such cases would take
more code than getting nodeSubplan.c to cope, so I fixed it that way.

This has been broken for as long as we've had hashable SubPlans,
so back-patch to all supported branches.

Discussion: https://postgr.es/m/1549209182255-0.post@n3.nabble.com
2020-08-14 22:14:03 -04:00
Tom Lane 7a980dfc6c Fix matching of sub-partitions when a partitioned plan is stale.
Since we no longer require AccessExclusiveLock to add a partition,
the executor may see that a partitioned table has more partitions
than the planner saw.  ExecCreatePartitionPruneState's code for
matching up the partition lists in such cases was faulty, and would
misbehave if the planner had successfully pruned any partitions from
the query.  (Thus, trouble would occur only if a partition addition
happens concurrently with a query that uses both static and dynamic
partition pruning.)  This led to an Assert failure in debug builds,
and probably to crashes or query misbehavior in production builds.

To repair the bug, just explicitly skip zeroes in the plan's
relid_map[] list.  I also made some cosmetic changes to make the code
more readable (IMO anyway).  Also, convert the cross-checking Assert
to a regular test-and-elog, since it's now apparent that this logic
is more fragile than one would like.

Currently, there's no way to repeatably exercise this code, except
with manual use of a debugger to stop the backend between planning
and execution.  Hence, no test case in this patch.  We oughta do
something about that testability gap, but that's for another day.

Amit Langote and Tom Lane, per report from Justin Pryzby.  Oversight
in commit 898e5e329; backpatch to v12 where that appeared.

Discussion: https://postgr.es/m/20200802181131.GA27754@telsasoft.com
2020-08-05 15:38:55 -04:00
Peter Geoghegan d6c08e29e7 Add hash_mem_multiplier GUC.
Add a GUC that acts as a multiplier on work_mem.  It gets applied when
sizing executor node hash tables that were previously size constrained
using work_mem alone.

The new GUC can be used to preferentially give hash-based nodes more
memory than the generic work_mem limit.  It is intended to enable admin
tuning of the executor's memory usage.  Overall system throughput and
system responsiveness can be improved by giving hash-based executor
nodes more memory (especially over sort-based alternatives, which are
often much less sensitive to being memory constrained).

The default value for hash_mem_multiplier is 1.0, which is also the
minimum valid value.  This means that hash-based nodes continue to apply
work_mem in the traditional way by default.

hash_mem_multiplier is generally useful.  However, it is being added now
due to concerns about hash aggregate performance stability for users
that upgrade to Postgres 13 (which added disk-based hash aggregation in
commit 1f39bce0).  While the old hash aggregate behavior risked
out-of-memory errors, it is nevertheless likely that many users actually
benefited.  Hash agg's previous indifference to work_mem during query
execution was not just faster; it also accidentally made aggregation
resilient to grouping estimate problems (at least in cases where this
didn't create destabilizing memory pressure).

hash_mem_multiplier can provide a certain kind of continuity with the
behavior of Postgres 12 hash aggregates in cases where the planner
incorrectly estimates that all groups (plus related allocations) will
fit in work_mem/hash_mem.  This seems necessary because hash-based
aggregation is usually much slower when only a small fraction of all
groups can fit.  Even when it isn't possible to totally avoid hash
aggregates that spill, giving hash aggregation more memory will reliably
improve performance (the same cannot be said for external sort
operations, which appear to be almost unaffected by memory availability
provided it's at least possible to get a single merge pass).

The PostgreSQL 13 release notes should advise users that increasing
hash_mem_multiplier can help with performance regressions associated
with hash aggregation.  That can be taken care of by a later commit.

Author: Peter Geoghegan
Reviewed-By: Álvaro Herrera, Jeff Davis
Discussion: https://postgr.es/m/20200625203629.7m6yvut7eqblgmfo@alap3.anarazel.de
Discussion: https://postgr.es/m/CAH2-WzmD%2Bi1pG6rc1%2BCjc4V6EaFJ_qSuKCCHVnH%3DoruqD-zqow%40mail.gmail.com
Backpatch: 13-, where disk-based hash aggregation was introduced.
2020-07-29 14:14:58 -07:00
Jeff Davis 9878b643f3 HashAgg: use better cardinality estimate for recursive spilling.
Use HyperLogLog to estimate the group cardinality in a spilled
partition. This estimate is used to choose the number of partitions if
we recurse.

The previous behavior was to use the number of tuples in a spilled
partition as the estimate for the number of groups, which lead to
overpartitioning. That could cause the number of batches to be much
higher than expected (with each batch being very small), which made it
harder to interpret EXPLAIN ANALYZE results.

Reviewed-by: Peter Geoghegan
Discussion: https://postgr.es/m/a856635f9284bc36f7a77d02f47bbb6aaf7b59b3.camel@j-davis.com
Backpatch-through: 13
2020-07-28 23:16:28 -07:00
Peter Geoghegan c49c74d192 Rename another "hash_mem" local variable.
Missed by my commit 564ce621.

Backpatch: 13-, where disk-based hash aggregation was introduced.
2020-07-28 17:59:16 -07:00
David Rowley 0e3e1c4e1c Make EXPLAIN ANALYZE of HashAgg more similar to Hash Join
There were various unnecessary differences between Hash Agg's EXPLAIN
ANALYZE output and Hash Join's.  Here we modify the Hash Agg output so
that it's better aligned to Hash Join's.

The following changes have been made:
1. Start batches counter at 1 instead of 0.
2. Always display the "Batches" property, even when we didn't spill to
   disk.
3. Use the text "Batches" instead of "HashAgg Batches" for text format.
4. Use the text "Memory Usage" instead of "Peak Memory Usage" for text
   format.
5. Include "Batches" before "Memory Usage" in both text and non-text
   formats.

In passing also modify the "Planned Partitions" property so that we show
it regardless of if the value is 0 or not for non-text EXPLAIN formats.
This was pointed out by Justin Pryzby and probably should have been part
of 40efbf870.

Reviewed-by: Justin Pryzby, Jeff Davis
Discussion: https://postgr.es/m/CAApHDvrshRnA6C0VFnu7Fb9TVvgGo80PUMm5+2DiaS1gEkPvtw@mail.gmail.com
Backpatch-through: 13, where HashAgg batching was introduced
2020-07-29 11:42:21 +12:00
Jeff Davis 200f6100a9 Fix LookupTupleHashEntryHash() pipeline-stall issue.
Refactor hash lookups in nodeAgg.c to improve performance.

Author: Andres Freund and Jeff Davis
Discussion: https://postgr.es/m/20200612213715.op4ye4q7gktqvpuo%40alap3.anarazel.de
Backpatch-through: 13
2020-07-26 15:09:46 -07:00
Amit Kapila 2a2494229a Fix buffer usage stats for nodes above Gather Merge.
Commit 85c9d347 addressed a similar problem for Gather and Gather
Merge nodes but forgot to account for nodes above parallel nodes.  This
still works for nodes above Gather node because we shut down the workers
for Gather node as soon as there are no more tuples.  We can do a similar
thing for Gather Merge as well but it seems better to account for stats
during nodes shutdown after completing the execution.

Reported-by: Stéphane Lorek, Jehan-Guillaume de Rorthais
Author: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
Reviewed-by: Amit Kapila
Backpatch-through: 10, where it was introduced
Discussion: https://postgr.es/m/20200718160206.584532a2@firost
2020-07-25 10:20:39 +05:30
Amit Kapila 044dc7b964 Fix minor typo in nodeIncrementalSort.c.
Author: Vignesh C
Reviewed-by: James Coleman
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/CALDaNm0WjZqRvdeL59ZfYH0o4mLbKQ23jm-bnjXcFzgpANx55g@mail.gmail.com
2020-07-20 07:45:26 +05:30
Peter Geoghegan 564ce62164 Rename "hash_mem" local variable.
The term "hash_mem" will take on new significance when pending work to
add a new hash_mem_multiplier GUC is committed.  Rename a local variable
that happens to have been called hash_mem now to avoid confusion.
2020-07-17 18:24:23 -07:00
Thomas Munro cdc7169509 Use MinimalTuple for tuple queues.
This representation saves 8 bytes per tuple compared to HeapTuple, and
avoids the need to allocate, copy and free on the receiving side.

Gather can emit the returned MinimalTuple directly, but GatherMerge now
needs to make an explicit copy because it buffers multiple tuples at a
time.  That should be no worse than before.

Reviewed-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2B8T_ggoUTAE-U%3DA%2BOcPc4%3DB0nPPHcSfffuQhvXXjML6w%40mail.gmail.com
2020-07-17 15:04:16 +12:00
Jeff Davis 2302302236 HashAgg: before spilling tuples, set unneeded columns to NULL.
This is a replacement for 4cad2534. Instead of projecting all tuples
going into a HashAgg, only remove unnecessary attributes when actually
spilling. This avoids the regression for the in-memory case.

Discussion: https://postgr.es/m/a2fb7dfeb4f50aa0a123e42151ee3013933cb802.camel%40j-davis.com
Backpatch-through: 13
2020-07-12 22:59:32 -07:00
Andres Freund e07633646a code: replace 'master' with 'leader' where appropriate.
Leader already is the more widely used terminology, but a few places
didn't get the message.

Author: Andres Freund
Reviewed-By: David Steele
Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue@alap3.anarazel.de
2020-07-08 12:58:32 -07:00
David Rowley 9bdb300ded Fix EXPLAIN ANALYZE for parallel HashAgg plans
Since 1f39bce02, HashAgg nodes have had the ability to spill to disk when
memory consumption exceeds work_mem. That commit added new properties to
EXPLAIN ANALYZE to show the maximum memory usage and disk usage, however,
it didn't quite go as far as showing that information for parallel
workers.  Since workers may have experienced something very different from
the main process, we should show this information per worker, as is done
in Sort.

Reviewed-by: Justin Pryzby
Reviewed-by: Jeff Davis
Discussion: https://postgr.es/m/CAApHDvpEKbfZa18mM1TD7qV6PG+w97pwCWq5tVD0dX7e11gRJw@mail.gmail.com
Backpatch-through: 13, where the hashagg spilling code was added.
2020-06-19 17:24:27 +12:00
Thomas Munro 7897e3bb90 Fix buffile.c error handling.
Convert buffile.c error handling to use ereport.  This fixes cases where
I/O errors were indistinguishable from EOF or not reported.  Also remove
"%m" from error messages where errno would be bogus.  While we're
modifying those strings, add block numbers and short read byte counts
where appropriate.

Back-patch to all supported releases.

Reported-by: Amit Khandekar <amitdkhan.pg@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Ibrar Ahmed <ibrar.ahmad@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGJE04G%3D8TLK0DLypT_27D9dR8F1RQgNp0jK6qR0tZGWOw%40mail.gmail.com
2020-06-16 16:59:07 +12:00
Tom Lane 2f48ede080 Avoid using a cursor in plpgsql's RETURN QUERY statement.
plpgsql has always executed the query given in a RETURN QUERY command
by opening it as a cursor and then fetching a few rows at a time,
which it turns around and dumps into the function's result tuplestore.
The point of this was to keep from blowing out memory with an oversized
SPITupleTable result (note that while a tuplestore can spill tuples
to disk, SPITupleTable cannot).  However, it's rather inefficient, both
because of extra data copying and because of executor entry/exit
overhead.  In recent versions, a new performance problem has emerged:
use of a cursor prevents use of a parallel plan for the executed query.

We can improve matters by skipping use of a cursor and having the
executor push result tuples directly into the function's result
tuplestore.  However, a moderate amount of new infrastructure is needed
to make that idea work:

* We can use the existing tstoreReceiver.c DestReceiver code to funnel
executor output to the tuplestore, but it has to be extended to support
plpgsql's requirement for possibly applying a tuple conversion map.

* SPI needs to be extended to allow use of a caller-supplied
DestReceiver instead of its usual receiver that puts tuples into
a SPITupleTable.  Two new API calls are needed to handle both the
RETURN QUERY and RETURN QUERY EXECUTE cases.

I also felt that I didn't want these new API calls to use the legacy
method of specifying query parameter values with "char" null flags
(the old ' '/'n' convention); rather they should accept ParamListInfo
objects containing the parameter type and value info.  This required
a bit of additional new infrastructure since we didn't yet have any
parse analysis callback that would interpret $N parameter symbols
according to type data supplied in a ParamListInfo.  There seems to be
no harm in letting makeParamList install that callback by default,
rather than leaving a new ParamListInfo's parserSetup hook as NULL.
(Indeed, as of HEAD, I couldn't find anyplace that was using the
parserSetup field at all; plpgsql was using parserSetupArg for its
own purposes, but parserSetup seemed to be write-only.)

We can actually get plpgsql out of the business of using legacy null
flags altogether, and using ParamListInfo instead of its ad-hoc
PreparedParamsData structure; but this requires inventing one more
SPI API call that can replace SPI_cursor_open_with_args.  That seems
worth doing, though.

SPI_execute_with_args and SPI_cursor_open_with_args are now unused
anywhere in the core PG distribution.  Perhaps someday we could
deprecate/remove them.  But cleaning up the crufty bits of the SPI
API is a task for a different patch.

Per bug #16040 from Jeremy Smith.  This is unfortunately too invasive to
consider back-patching.  Patch by me; thanks to Hamid Akhtar for review.

Discussion: https://postgr.es/m/16040-eaacad11fecfb198@postgresql.org
2020-06-12 12:14:32 -04:00
Jeff Davis 1b2c29469a Fix HashAgg regression from choosing too many initial buckets.
Diagnosis by Andres.

Reported-by: Pavel Stehule
Discussion: https://postgr.es/m/CAFj8pRDLVakD5Aagt3yZeEQeTeEWaS3YE5h8XC3Q3qJ6TYkc2Q%40mail.gmail.com
Backpatch-through: 13
2020-06-08 21:04:16 -07:00
Tom Lane 3048898e73 Mop-up for wait event naming issues.
Synchronize the event names for parallel hash join waits with other
event names, by getting rid of the slashes and dropping "-ing"
suffixes.  Rename ClogGroupUpdate to XactGroupUpdate, to match the
new SLRU name.  Move the ProcSignalBarrier event to the IPC category;
it doesn't belong under IO.

Also a bit more wordsmithing in the wait event documentation tables.

Discussion: https://postgr.es/m/4505.1589640417@sss.pgh.pa.us
2020-05-16 21:00:11 -04:00
Tom Lane fa27dd40d5 Run pgindent with new pg_bsd_indent version 2.1.1.
Thomas Munro fixed a longstanding annoyance in pg_bsd_indent, that
it would misformat lines containing IsA() macros on the assumption
that the IsA() call should be treated like a cast.  This improves
some other cases involving field/variable names that match typedefs,
too.  The only places that get worse are a couple of uses of the
OpenSSL macro STACK_OF(); we'll gladly take that trade-off.

Discussion: https://postgr.es/m/20200114221814.GA19630@alvherre.pgsql
2020-05-16 11:54:51 -04:00
Michael Paquier 7ccb2f54d9 Fix assertion with relation using REPLICA IDENTITY FULL in subscriber
In a logical replication subscriber, a table using REPLICA IDENTITY FULL
which has a primary key would try to use the primary key's index
available to scan for a tuple, but an assertion only assumed as correct
the case of an index associated to REPLICA IDENTITY USING INDEX.  This
commit corrects the assertion so as the use of a primary key index is a
valid case.

Reported-by: Dilip Kumar
Analyzed-by: Dilip Kumar
Author: Euler Taveira
Reviewed-by: Michael Paquier, Masahiko Sawada
Discussion: https://postgr.es/m/CAFiTN-u64S5bUiPL1q5kwpHNd0hRnf1OE-bzxNiOs5zo84i51w@mail.gmail.com
Backpatch-through: 10
2020-05-16 18:15:18 +09:00
Tom Lane 5cbfce562f Initial pgindent and pgperltidy run for v13.
Includes some manual cleanup of places that pgindent messed up,
most of which weren't per project style anyway.

Notably, it seems some people didn't absorb the style rules of
commit c9d297751, because there were a bunch of new occurrences
of function calls with a newline just after the left paren, all
with faulty expectations about how the rest of the call would get
indented.
2020-05-14 13:06:50 -04:00
Alvaro Herrera 17cc133f01
Dial back -Wimplicit-fallthrough to level 3
The additional pain from level 4 is excessive for the gain.

Also revert all the source annotation changes to their original
wordings, to avoid back-patching pain.

Discussion: https://postgr.es/m/31166.1589378554@sss.pgh.pa.us
2020-05-13 15:31:14 -04:00
Alvaro Herrera 3e9744465d
Add -Wimplicit-fallthrough to CFLAGS and CXXFLAGS
Use it at level 4, a bit more restrictive than the default level, and
tweak our commanding comments to FALLTHROUGH.

(However, leave zic.c alone, since it's external code; to avoid the
warnings that would appear there, change CFLAGS for that file in the
Makefile.)

Author: Julien Rouhaud <rjuju123@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20200412081825.qyo5vwwco3fv4gdo@nol
Discussion: https://postgr.es/m/flat/E1fDenm-0000C8-IJ@gemulon.postgresql.org
2020-05-12 16:07:30 -04:00
Tomas Vondra 1a40d37a9f Fix typos and improve incremental sort comments
Author: Justin Pryzby, James Coleman
Discussion: https://postgr.es/m/20200419023625.GP26953@telsasoft.com
2020-05-12 19:37:13 +02:00
Tomas Vondra 9155b4be9a Do no reset bounded before incremental sort rescan
ExecReScanIncrementalSort was resetting bounded=false, which means the
optimization would be disabled on all rescans. This happens because
ExecSetTupleBound is called before the rescan, not after it.

Author: James Coleman
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/20200414065336.GI1492@paquier.xyz
2020-05-09 19:41:36 +02:00
Tomas Vondra c442722648 Fix handling of REWIND/MARK/BACKWARD in incremental sort
The executor flags were not handled entirely correctly, although the
bugs were mostly harmless and it was mostly comment inaccuracy. We don't
need to strip any of the flags for child nodes.

Incremental sort does not support backward scans of mark/restore, so
MARK/BACKWARDS flags should not be possible. So we simply ensure this
using an assert, and we don't bother removing them when initializing
the child node.

With REWIND it's a bit less clear - incremental sort does not support
REWIND, but there is no way to signal this - it's legal to just ignore
the flag. We however continue passing the flag to child nodes, because
they might be useful to leverage that.

Reported-by: Michael Paquier
Author: James Coleman
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/20200414065336.GI1492@paquier.xyz
2020-05-09 19:41:18 +02:00
Amit Kapila 69bfaf2e1d Change the display of WAL usage statistics in Explain.
In commit 33e05f89c5, we have added the option to display WAL usage
statistics in Explain and auto_explain.  The display format used two spaces
between each field which is inconsistent with Buffer usage statistics which
is using one space between each field.  Change the format to make WAL usage
statistics consistent with Buffer usage statistics.

This commit also changed the usage of "full page writes" to
"full page images" for WAL usage statistics to make it consistent with
other parts of code and docs.

Author: Julien Rouhaud, Amit Kapila
Reviewed-by: Justin Pryzby, Kyotaro Horiguchi and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-05-05 08:00:53 +05:30
Andres Freund 299298bc87 Fix transient memory leak for SRFs in FROM.
In a9c35cf85c I changed ExecMakeTableFunctionResult() to dynamically
allocate the FunctionCallInfo used to call the SRF. Unfortunately I
did not account for the fact that the surrounding memory context has
query lifetime, leading to a leak till the end of the query.

In most cases the leak is fairly inconsequential, but if the
FunctionScan is done many times in the query, the leak can add
up. This happens e.g. if the function scan is on the inner side of a
nested loop, due to a lateral join.
EXPLAIN SELECT sum(f) FROM generate_series(1, 100000000) g(i), generate_series(i, i+1) f;
quickly shows the leak.

Instead of explicitly freeing the FunctionCallInfo it seems better to
make sure all the per-set temporary state in
ExecMakeTableFunctionResult() is cleaned up wholesale. Currently
that's probably just the FunctionCallInfo allocation, but since
there's some initialization work, and since there's already an
appropriate context, this seems like a more robust approach.

Bug: #16112
Reported-By: Ben Cornett
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/16112-4448bbf55a404189%40postgresql.org
Backpatch: 12, a9c35cf85c
2020-04-22 19:53:06 -07:00
Tom Lane 5836d32655 Fix minor violations of FunctionCallInvoke usage protocol.
Working on commit 1c455078b led me to check through FunctionCallInvoke
call sites to see if every one was being honest about (a) making sure
that fcinfo.isnull is initially false, and (b) checking its state after
the call.  Sure enough, I found some violations.

The main one is that finalize_partialaggregate re-used serialfn_fcinfo
without resetting isnull, even though it clearly intends to cater for
serialfns that return NULL.  There would only be an issue with a
non-strict serialfn, since it's unlikely that a serialfn would return
NULL for non-null input.  We have no non-strict serialfns in core, and
there may be none in the wild either, which would account for the lack
of complaints.  Still, it's clearly wrong, so back-patch that fix to
9.6 where finalize_partialaggregate was introduced.

Also, arrayfuncs.c and rowtypes.c contained various callers that were
not bothering to check for result nulls.  While what's being called is
a comparison or hash function that probably *shouldn't* return null,
that's a lousy excuse for not having any check at all.  There are
existing places that just Assert(!fcinfo->isnull) in comparable
situations, so I added that to the places that were calling btree
comparison or hash support functions.  In the places calling
boolean-returning equality functions, it's quite cheap to have them
treat isnull as FALSE, so make those places do that.  Also remove some
"locfcinfo->isnull = false" assignments that are unnecessary given the
assumption that no previous call returned null.  These changes seem like
mostly neatnik-ism or debugging support, so I didn't back-patch.
2020-04-21 14:23:53 -04:00
David Rowley 3cb02e307e Fix possible crash with GENERATED ALWAYS columns
In some corner cases, this could also lead to corrupted values being
included in the tuple.

Users who are concerned that they are affected by this should first
upgrade and then perform a base backup of their database and restore onto
an off-line server. They should then query each table with generated
columns to ensure there are no rows where the generated expression does
not match a newly calculated version of the GENERATED ALWAYS expression.
If no crashes occur and no rows are returned then you're not affected.

Fixes bug #16369.

Reported-by: Cameron Ezell
Discussion: https://postgr.es/m/16369-5845a6f1bef59884@postgresql.org
Backpatch-through: 12 (where GENERATED ALWAYS columns were added.)
2020-04-18 14:10:37 +12:00
Tom Lane 2d59643dbc Account for collation when coercing the output of a SQL function.
Commit 913bbd88d overlooked that the result of coerce_to_target_type
might need collation fixups.  Per report from Andreas Joseph Krogh.

Discussion: https://postgr.es/m/VisenaEmail.72.37d08ec2b8cb8fb5.17179940cd3@tc7-visena
2020-04-14 17:30:36 -04:00
Michael Paquier 8128b0c152 Fix collection of typos and grammar mistakes in the tree, volume 2
This fixes some comments and documentation new as of Postgres 13, and is
a follow-up of the work done in dd0f37e.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20200408165653.GF2228@telsasoft.com
2020-04-14 14:45:43 +09:00
Amit Kapila ef08ca113f Cosmetic fixups for WAL usage work.
Reported-by: Justin Pryzby and Euler Taveira
Author: Justin Pryzby and Julien Rouhaud
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-04-13 15:31:16 +05:30
Tom Lane 35cb574aa8 Suppress -Wimplicit-fallthrough warning in new LIMIT WITH TIES code.
The placement of the fall-through comment in this code appears not to
work to suppress the warning in recent gcc.  Move it to the bottom of
the case group, and add an assertion that we didn't get there through
some other code path.  Also improve wording of nearby comments.

Julien Rouhaud, comment hacking by me

Discussion: https://postgr.es/m/CAOBaU_aLdPGU5wCpaowNLF-Q8328iR7mj1yJAhMOVsdLwY+sdg@mail.gmail.com
2020-04-11 15:02:44 -04:00
Noah Misch 328c70997b Optimize RelationFindReplTupleSeq() for CLOBBER_CACHE_ALWAYS.
Specifically, remember lookup_type_cache() results instead of retrieving
them once per comparison.  Under CLOBBER_CACHE_ALWAYS, this reduced
src/test/subscription/t/001_rep_changes.pl elapsed time by an order of
magnitude, which reduced check-world elapsed time by 9%.

Discussion: https://postgr.es/m/20200406085420.GC162712@rfd.leadboat.com
2020-04-11 10:30:12 -07:00
Tom Lane 969f9d0b4b Make EXPLAIN report maximum hashtable usage across multiple rescans.
Before discarding the old hash table in ExecReScanHashJoin, capture
its statistics, ensuring that we report the maximum hashtable size
across repeated rescans of the hash input relation.  We can repurpose
the existing code for reporting hashtable size in parallel workers
to help with this, making the patch pretty small.  This also ensures
that if rescans happen within parallel workers, we get the correct
maximums across all instances.

Konstantin Knizhnik and Tom Lane, per diagnosis by Thomas Munro
of a trouble report from Alvaro Herrera.

Discussion: https://postgr.es/m/20200323165059.GA24950@alvherre.pgsql
2020-04-11 12:39:19 -04:00
Tom Lane 5c27bce7f3 Clear dangling pointer to avoid bogus EXPLAIN printout in a corner case.
ExecReScanHashJoin will destroy the join's hash table if it expects
that the inner relation will produce different rows on rescan.
Up to now it's not bothered to clear the additional pointer to that
hash table that exists in the child HashState node.  However, it's
possible for the query to terminate without building a fresh hash
table (this happens if the outer relation is found to be empty
during the final rescan).  So we can end with a dangling pointer
to a deleted hash table.  That was harmless originally, but since
9.0 EXPLAIN ANALYZE has used that pointer to print hash table
statistics.  In debug builds this reproducibly results in garbage
statistics.  In non-debug builds there's frequently no ill effects,
but in principle one could get wrong EXPLAIN ANALYZE output, or
perhaps even a crash if free() has released the hashtable memory
back to the OS.

To fix, just make sure we clear the additional pointer when destroying
the hash table.  In problematic cases, EXPLAIN ANALYZE will then print
no hashtable statistics (reverting to its pre-9.0 behavior).  This isn't
ideal, but since the problem manifests only in unusual corner cases,
it's hard to justify taking any risks to do better in the back
branches.  A follow-on patch will improve matters in HEAD.

Konstantin Knizhnik and Tom Lane, per diagnosis by Thomas Munro
of a trouble report from Alvaro Herrera.

Discussion: https://postgr.es/m/20200323165059.GA24950@alvherre.pgsql
2020-04-11 12:29:06 -04:00
Michael Paquier dd0f37ecce Fix collection of typos and grammar mistakes in the tree
This fixes some comments and documentation new as of Postgres 13.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20200408165653.GF2228@telsasoft.com
2020-04-10 11:18:39 +09:00
Amit Kapila 5c71362174 Allow parallel create index to accumulate buffer usage stats.
Currently, we don't account for buffer usage incurred by parallel workers
for parallel create index.  This commit allows each worker to record the
buffer usage stats and leader backend to accumulate that stats at the
end of the operation.  This will allow pg_stat_statements to display
correct buffer usage stats for (parallel) create index command.

Reported-by: Julien Rouhaud
Author: Sawada Masahiko
Reviewed-by: Dilip Kumar, Julien Rouhaud and Amit Kapila
Backpatch-through: 11, where this was introduced
Discussion: https://postgr.es/m/20200328151721.GB12854@nol
2020-04-09 09:49:30 +05:30
David Rowley 02a2e8b442 Modify additional power 2 calculations to use new helper functions
2nd pass of modifying various places which obtain the next power
of 2 of a number and make them use the new functions added in
f0705bb62.

In passing, also modify num_combinations(). This can be implemented
using simple bitshifting rather than looping.

Reviewed-by: John Naylor
Discussion: https://postgr.es/m/20200114173553.GE32763%40fetter.org
2020-04-08 18:29:51 +12:00
Jeff Davis 50a38f6517 Create memory context for HashAgg with a reasonable maxBlockSize.
If the memory context's maxBlockSize is too big, a single block
allocation can suddenly exceed work_mem. For Hash Aggregation, this
can mean spilling to disk too early or reporting a confusing memory
usage number for EXPLAN ANALYZE.

Introduce CreateWorkExprContext(), which is like CreateExprContext(),
except that it creates the AllocSet with a maxBlockSize that is
reasonable in proportion to work_mem.

Right now, CreateWorkExprContext() is only used by Hash Aggregation,
but it may be generally useful in the future.

Discussion: https://postgr.es/m/412a3fbf306f84d8d78c4009e11791867e62b87c.camel@j-davis.com
2020-04-07 21:25:28 -07:00
Alvaro Herrera 357889eb17
Support FETCH FIRST WITH TIES
WITH TIES is an option to the FETCH FIRST N ROWS clause (the SQL
standard's spelling of LIMIT), where you additionally get rows that
compare equal to the last of those N rows by the columns in the
mandatory ORDER BY clause.

There was a proposal by Andrew Gierth to implement this functionality in
a more powerful way that would yield more features, but the other patch
had not been finished at this time, so we decided to use this one for
now in the spirit of incremental development.

Author: Surafel Temesgen <surafel3000@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Discussion: https://postgr.es/m/CALAY4q9ky7rD_A4vf=FVQvCGngm3LOes-ky0J6euMrg=_Se+ag@mail.gmail.com
Discussion: https://postgr.es/m/87o8wvz253.fsf@news-spur.riddles.org.uk
2020-04-07 16:22:13 -04:00
Tomas Vondra d2d8a229bc Implement Incremental Sort
Incremental Sort is an optimized variant of multikey sort for cases when
the input is already sorted by a prefix of the requested sort keys. For
example when the relation is already sorted by (key1, key2) and we need
to sort it by (key1, key2, key3) we can simply split the input rows into
groups having equal values in (key1, key2), and only sort/compare the
remaining column key3.

This has a number of benefits:

- Reduced memory consumption, because only a single group (determined by
  values in the sorted prefix) needs to be kept in memory. This may also
  eliminate the need to spill to disk.

- Lower startup cost, because Incremental Sort produce results after each
  prefix group, which is beneficial for plans where startup cost matters
  (like for example queries with LIMIT clause).

We consider both Sort and Incremental Sort, and decide based on costing.

The implemented algorithm operates in two different modes:

- Fetching a minimum number of tuples without check of equality on the
  prefix keys, and sorting on all columns when safe.

- Fetching all tuples for a single prefix group and then sorting by
  comparing only the remaining (non-prefix) keys.

We always start in the first mode, and employ a heuristic to switch into
the second mode if we believe it's beneficial - the goal is to minimize
the number of unnecessary comparions while keeping memory consumption
below work_mem.

This is a very old patch series. The idea was originally proposed by
Alexander Korotkov back in 2013, and then revived in 2017. In 2018 the
patch was taken over by James Coleman, who wrote and rewrote most of the
current code.

There were many reviewers/contributors since 2013 - I've done my best to
pick the most active ones, and listed them in this commit message.

Author: James Coleman, Alexander Korotkov
Reviewed-by: Tomas Vondra, Andreas Karlsson, Marti Raudsepp, Peter Geoghegan, Robert Haas, Thomas Munro, Antonin Houska, Andres Freund, Alexander Kuzmenkov
Discussion: https://postgr.es/m/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.com
Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
2020-04-06 21:35:10 +02:00
Peter Eisentraut f1ac27bfda Add logical replication support to replicate into partitioned tables
Mainly, this adds support code in logical/worker.c for applying
replicated operations whose target is a partitioned table to its
relevant partitions.

Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Reviewed-by: Petr Jelinek <petr@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com
2020-04-06 15:15:52 +02:00
Amit Kapila df3b181499 Add infrastructure to track WAL usage.
This allows gathering the WAL generation statistics for each statement
execution.  The three statistics that we collect are the number of WAL
records, the number of full page writes and the amount of WAL bytes
generated.

This helps the users who have write-intensive workload to see the impact
of I/O due to WAL.  This further enables us to see approximately what
percentage of overall WAL is due to full page writes.

In the future, we can extend this functionality to allow us to compute the
the exact amount of WAL data due to full page writes.

This patch in itself is just an infrastructure to compute WAL usage data.
The upcoming patches will expose this data via explain, auto_explain,
pg_stat_statements and verbose (auto)vacuum output.

Author: Kirill Bychik, Julien Rouhaud
Reviewed-by: Dilip Kumar, Fujii Masao and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-04-04 10:02:08 +05:30
Jeff Davis 0588ee63aa Include chunk overhead in hash table entry size estimate.
Don't try to be precise about it, just use a constant 16 bytes of
chunk overhead. Being smarter would require knowing the memory context
where the chunk will be allocated, which is not known by all callers.

Discussion: https://postgr.es/m/20200325220936.il3ni2fj2j2b45y5@alap3.anarazel.de
2020-04-03 20:07:58 -07:00
Fujii Masao 6aba63ef3e Allow the planner-related functions and hook to accept the query string.
This commit adds query_string argument into the planner-related functions
and hook and allows us to pass the query string to them.

Currently there is no user of the query string passed. But the upcoming patch
for the planning counters will add the planning hook function into
pg_stat_statements and the function will need the query string. So this change
will be necessary for that patch.

Also this change is useful for some extensions that want to use the query
string in their planner hook function.

Author: Pascal Legrand, Julien Rouhaud
Reviewed-by: Yoshikazu Imai, Tom Lane, Fujii Masao
Discussion: https://postgr.es/m/CAOBaU_bU1m3_XF5qKYtSj1ua4dxd=FWDyh2SH4rSJAUUfsGmAQ@mail.gmail.com
Discussion: https://postgr.es/m/1583789487074-0.post@n3.nabble.com
2020-03-30 13:51:05 +09:00
Fujii Masao 4a539a25eb Expose BufferUsageAccumDiff().
Previously pg_stat_statements calculated the difference of buffer counters
by its own code even while BufferUsageAccumDiff() had the same code.
This commit expose BufferUsageAccumDiff() and makes pg_stat_statements
use it for the calculation, in order to simply the code.

This change also would be useful for the upcoming patch for the planning
counters in pg_stat_statements because the patch will add one more code
for the calculation of difference of buffer counters and that can easily be
done by using BufferUsageAccumDiff().

Author: Julien Rouhaud
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/bdfee4e0-a304-2498-8da5-3cb52c0a193e@oss.nttdata.com
2020-03-30 12:15:26 +09:00
Jeff Davis 7351bfeda3 Fix costing for disk-based hash aggregation.
Report and suggestions from Richard Guo and Tomas Vondra.

Discussion: https://postgr.es/m/CAMbWs4_W8fYbAn8KxgidAaZHON_Oo08OYn9ze=7remJymLqo5g@mail.gmail.com
2020-03-28 12:07:49 -07:00
Tom Lane bda6dedbea Go back to returning int from ereport auxiliary functions.
This reverts the parts of commit 17a28b0364
that changed ereport's auxiliary functions from returning dummy integer
values to returning void.  It turns out that a minority of compilers
complain (not entirely unreasonably) about constructs such as

	(condition) ? errdetail(...) : 0

if errdetail() returns void rather than int.  We could update those
call sites to say "(void) 0" perhaps, but the expectation for this
patch set was that ereport callers would not have to change anything.
And this aspect of the patch set was already the most invasive and
least compelling part of it, so let's just drop it.

Per buildfarm.

Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com
2020-03-25 11:57:36 -04:00
Jeff Davis 3649133b14 Avoid allocating unnecessary zero-sized array.
If there are no aggregates, there is no need to allocate an array of
zero AggStatePerGroupData elements.
2020-03-24 18:30:04 -07:00
Tom Lane 17a28b0364 Improve the internal implementation of ereport().
Change all the auxiliary error-reporting routines to return void,
now that we no longer need to pretend they are passing something
useful to errfinish().  While this probably doesn't save anything
significant at the machine-code level, it allows detection of some
additional types of mistakes.

Pass the error location details (__FILE__, __LINE__, PG_FUNCNAME_MACRO)
to errfinish not errstart.  This shaves a few cycles off the case where
errstart decides we're not going to emit anything.

Re-implement elog() as a trivial wrapper around ereport(), removing
the separate support infrastructure it used to have.  Aside from
getting rid of some now-surplus code, this means that elog() now
really does have exactly the same semantics as ereport(), in particular
that it can skip evaluation work if the message is not to be emitted.

Andres Freund and Tom Lane

Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com
2020-03-24 12:08:48 -04:00
Jeff Davis 64fe602279 Fixes for Disk-based Hash Aggregation.
Justin Pryzby raised a couple issues with commit 1f39bce0. Fixed.

Also, tweak the way the size of a hash entry is estimated and the
number of buckets is estimated when calling BuildTupleHashTableExt().

Discussion: https://www.postgresql.org/message-id/20200319064222.GR26184@telsasoft.com
2020-03-23 15:43:07 -07:00
Amit Kapila 33753ac9d7 Add object names to partition integrity violations.
All errors of SQLSTATE class 23 should include the name of an object
associated with the error in separate fields of the error report message.
We do this so that applications need not try to extract them from the
possibly-localized human-readable text of the message.

Reported-by: Chris Bandy
Author: Chris Bandy
Reviewed-by: Amit Kapila and Amit Langote
Discussion: https://postgr.es/m/0aa113a3-3c7f-db48-bcd8-f9290b2269ae@gmail.com
2020-03-23 08:09:15 +05:30
Jeff Davis 1f39bce021 Disk-based Hash Aggregation.
While performing hash aggregation, track memory usage when adding new
groups to a hash table. If the memory usage exceeds work_mem, enter
"spill mode".

In spill mode, new groups are not created in the hash table(s), but
existing groups continue to be advanced if input tuples match. Tuples
that would cause a new group to be created are instead spilled to a
logical tape to be processed later.

The tuples are spilled in a partitioned fashion. When all tuples from
the outer plan are processed (either by advancing the group or
spilling the tuple), finalize and emit the groups from the hash
table. Then, create new batches of work from the spilled partitions,
and select one of the saved batches and process it (possibly spilling
recursively).

Author: Jeff Davis
Reviewed-by: Tomas Vondra, Adam Lee, Justin Pryzby, Taylor Vesely, Melanie Plageman
Discussion: https://postgr.es/m/507ac540ec7c20136364b5272acbcd4574aa76ef.camel@j-davis.com
2020-03-18 15:42:02 -07:00
Tom Lane 9d9784c840 Remove bogus assertion about polymorphic SQL function result.
It is possible to reach check_sql_fn_retval() with an unresolved
polymorphic rettype, resulting in an assertion failure as demonstrated
by one of the added test cases.  However, the code following that
throws what seems an acceptable error message, so just remove the
Assert and adjust commentary.

While here, I thought it'd be a good idea to provide some parallel
tests of SQL-function and PL/pgSQL-function polymorphism behavior.
Some of these cases are perhaps duplicative of tests elsewhere,
but we hadn't any organized coverage of the topic AFAICS.

Although that assertion's been wrong all along, it won't have any
effect in production builds, so I'm not bothering to back-patch.

Discussion: https://postgr.es/m/21569.1584314271@sss.pgh.pa.us
2020-03-17 14:54:46 -04:00
Thomas Munro b09ff53667 Simplify the effective_io_concurrency setting.
The effective_io_concurrency GUC and equivalent tablespace option were
previously passed through a formula based on a theory about RAID
spindles and probabilities, to arrive at the number of pages to prefetch
in bitmap heap scans.  Tomas Vondra, Andres Freund and others argued
that it was anachronistic and hard to justify, and commit 558a9165e0
already started down the path of bypassing it in new code.  We agreed to
drop that logic and use the value directly.

For the default setting of 1, there is no change in effect.  Higher
settings can be converted from the old meaning to the new with:

  select round(sum(OLD / n::float)) from generate_series(1, OLD) s(n);

We might want to consider renaming the GUC before the next release given
the change in meaning, but it's not clear that many users had set it
very carefully anyway.  That decision is deferred for now.

Discussion: https://postgr.es/m/CA%2BhUKGJUw08dPs_3EUcdO6M90GnjofPYrWp4YSLaBkgYwS-AqA%40mail.gmail.com
2020-03-16 17:14:26 +13:00
Peter Eisentraut 3c173a53a8 Remove utils/acl.h from catalog/objectaddress.h
The need for this was removed by
8b9e9644dc.

A number of files now need to include utils/acl.h or
parser/parse_node.h explicitly where they previously got it indirectly
somehow.

Since parser/parse_node.h already includes nodes/parsenodes.h, the
latter is then removed where the former was added.  Also, remove
nodes/pg_list.h from objectaddress.h, since that's included via
nodes/parsenodes.h.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/7601e258-26b2-8481-36d0-dc9dca6f28f1%402ndquadrant.com
2020-03-10 10:27:00 +01:00
Jeff Davis c954d49046 Extend ExecBuildAggTrans() to support a NULL pointer check.
Optionally push a step to check for a NULL pointer to the pergroup
state.

This will be important for disk-based hash aggregation in combination
with grouping sets. When memory limits are reached, a given tuple may
find its per-group state for some grouping sets but not others. For
the former, it advances the per-group state as normal; for the latter,
it skips evaluation and the calling code will have to spill the tuple
and reprocess it in a later batch.

Add the NULL check as a separate expression step because in some
common cases it's not needed.

Discussion: https://postgr.es/m/20200221202212.ssb2qpmdgrnx52sj%40alap3.anarazel.de
2020-03-04 17:29:18 -08:00
Tom Lane 3ed2005ff5 Introduce macros for typalign and typstorage constants.
Our usual practice for "poor man's enum" catalog columns is to define
macros for the possible values and use those, not literal constants,
in C code.  But for some reason lost in the mists of time, this was
never done for typalign/attalign or typstorage/attstorage.  It's never
too late to make it better though, so let's do that.

The reason I got interested in this right now is the need to duplicate
some uses of the TYPSTORAGE constants in an upcoming ALTER TYPE patch.
But in general, this sort of change aids greppability and readability,
so it's a good idea even without any specific motivation.

I may have missed a few places that could be converted, and it's even
more likely that pending patches will re-introduce some hard-coded
references.  But that's not fatal --- there's no expectation that
we'd actually change any of these values.  We can clean up stragglers
over time.

Discussion: https://postgr.es/m/16457.1583189537@sss.pgh.pa.us
2020-03-04 10:34:25 -05:00
Alvaro Herrera 2f9661311b
Represent command completion tags as structs
The backend was using strings to represent command tags and doing string
comparisons in multiple places, but that's slow and unhelpful.  Create a
new command list with a supporting structure to use instead; this is
stored in a tag-list-file that can be tailored to specific purposes with
a caller-definable C macro, similar to what we do for WAL resource
managers.  The first first such uses are a new CommandTag enum and a
CommandTagBehavior struct.

Replace numerous occurrences of char *completionTag with a
QueryCompletion struct so that the code no longer stores information
about completed queries in a cstring.  Only at the last moment, in
EndCommand(), does this get converted to a string.

EventTriggerCacheItem no longer holds an array of palloc’d tag strings
in sorted order, but rather just a Bitmapset over the CommandTags.

Author: Mark Dilger, with unsolicited help from Álvaro Herrera
Reviewed-by: John Naylor, Tom Lane
Discussion: https://postgr.es/m/981A9DB4-3F0C-4DA5-88AD-CB9CFF4D6CAD@enterprisedb.com
2020-03-02 18:19:51 -03:00
Tom Lane 58c47ccfff Correctly re-use hash tables in buildSubPlanHash().
Commit 356687bd8 omitted to remove leftover code for destroying
a hashed subplan's hash tables, with the result that the tables
were always rebuilt not reused; this leads to severe memory
leakage if a hashed subplan is re-executed enough times.
Moreover, the code for reusing the hashnulls table had a typo
that would have made it do the wrong thing if it were reached.

Looking at the code coverage report shows severe under-coverage
of the potential callers of ResetTupleHashTable, so add some test
cases that exercise them.

Andreas Karlsson and Tom Lane, per reports from Ranier Vilela
and Justin Pryzby.

Backpatch to v11, as the faulty commit was.

Discussion: https://postgr.es/m/edb62547-c453-c35b-3ed6-a069e4d6b937@proxel.se
Discussion: https://postgr.es/m/CAEudQAo=DCebm1RXtig9OH+QivpS97sMkikt0A9qHmMUs+g6ZA@mail.gmail.com
Discussion: https://postgr.es/m/20200210032547.GA1412@telsasoft.com
2020-02-29 13:48:09 -05:00
Tom Lane 6afc8aefd3 Remove obsolete comment.
Noted while studying subplan hash issue.
2020-02-29 13:23:12 -05:00
Robert Haas 05d8449e73 Move src/backend/utils/hash/hashfn.c to src/common
This also involves renaming src/include/utils/hashutils.h, which
becomes src/include/common/hashfn.h. Perhaps an argument can be
made for keeping the hashutils.h name, but it seemed more
consistent to make it match the name of the file, and also more
descriptive of what is actually going on here.

Patch by me, reviewed by Suraj Kharage and Mark Dilger. Off-list
advice on how not to break the Windows build from Davinder Singh
and Amit Kapila.

Discussion: http://postgr.es/m/CA+TgmoaRiG4TXND8QuM6JXFRkM_1wL2ZNhzaUKsuec9-4yrkgw@mail.gmail.com
2020-02-27 09:25:41 +05:30
Andres Freund 2742c45080 expression eval: Reduce number of steps for agg transition invocations.
Do so by combining the various steps that are part of aggregate
transition function invocation into one larger step. As some of the
current steps are only necessary for some aggregates, have one variant
of the aggregate transition step for each possible combination.

To avoid further manual copies of code in the different transition
step implementations, move most of the code into helper functions
marked as "always inline".

The benefit of this change is an increase in performance when
aggregating lots of rows. This comes in part due to the reduced number
of indirect jumps due to the reduced number of steps, and in part by
reducing redundant setup code across steps. This mainly benefits
interpreted execution, but the code generated by JIT is also improved
a bit.

As a nice side-effect it also ends up making the code a bit simpler.

A small additional optimization is removing the need to set
aggstate->curaggcontext before calling ExecAggInitGroup, choosing to
instead passign curaggcontext as an argument. It was, in contrast to
other aggregate related functions, only needed to fetch a memory
context to copy the transition value into.

Author: Andres Freund
Discussion:
   https://postgr.es/m/20191023163849.sosqbfs5yenocez3@alap3.anarazel.de
   https://postgr.es/m/5c371df7cee903e8cd4c685f90c6c72086d3a2dc.camel@j-davis.com
2020-02-24 15:09:09 -08:00
Jeff Davis b7fabe80df Fixup for nodeAgg.c refactor.
Commit 5b618e1f made an unintended behavior change.
2020-02-21 09:34:20 -08:00
Jeff Davis 5b618e1f48 Minor refactor of nodeAgg.c.
* Separate calculation of hash value from the lookup.
  * Split build_hash_table() into two functions.
  * Change lookup_hash_entry() to return AggStatePerGroup. That's all
    the caller needed, anyway.

These changes are to support the upcoming Disk-based Hash Aggregation
work.

Discussion: https://postgr.es/m/31f5ab871a3ad5a1a91a7a797651f20e77ac7ce3.camel%40j-davis.com
2020-02-19 10:39:11 -08:00
Michael Paquier 958f9fb98d Remove duplicated words in comments
Author: Daniel Gustafsson
Reviewed-by: Vik Fearing
Discussion: https://postgr.es/m/EBC3BFEB-664C-4063-81ED-29F1227DB012@yesql.se
2020-02-18 12:20:55 +09:00
Peter Eisentraut c6679e4fca Optimize update of tables with generated columns
When updating a table row with generated columns, only recompute those
generated columns whose base columns have changed in this update and
keep the rest unchanged.  This can result in a significant performance
benefit.  The required information was already kept in
RangeTblEntry.extraUpdatedCols; we just have to make use of it.

Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/b05e781a-fa16-6b52-6738-761181204567@2ndquadrant.com
2020-02-17 15:20:58 +01:00
Jeff Davis 11de6c903d Change signature of TupleHashTableHash().
Commit 4eaea3db introduced TupleHashTableHash(), but the signature
didn't match the other exposed functions. Separate it into internal
and external versions. The external version hides the details behind
an API more consistent with the other external functions, and the
internal version is still suitable for simplehash.
2020-02-10 10:20:10 -08:00
Fujii Masao cb5b28613d Fix bug in Tid scan.
Commit 147e3722f7 changed Tid scan so that it calls table_beginscan()
and uses the scan option for seq scan. This change caused two issues.

(1) The change caused Tid scan to take a predicate lock on the entire
       relation in serializable transaction even when relation-level
       lock is not necessary. This could lead to an unexpected
       serialization error.

(2) The change caused Tid scan to increment the number of seq_scan
       in pg_stat_*_tables views even though it's not seq scan. This
       could confuse the users.

This commit adds the scan option for Tid scan and makes Tid scan
use it, to avoid those issues.

Back-patch to v12, where the bug was introduced.

Author: Tatsuhito Kasahara
Reviewed-by: Kyotaro Horiguchi, Masahiko Sawada, Fujii Masao
Discussion: https://postgr.es/m/CAP0=ZVKy+gTbFmB6X_UW0pP3WaeJ-fkUWHoD-pExS=at3CY76g@mail.gmail.com
2020-02-07 22:06:31 +09:00
Jeff Davis 4eaea3db15 Introduce TupleHashTableHash() and LookupTupleHashEntryHash().
Expose two new entry points: one for only calculating the hash value
of a tuple, and another for looking up a hash entry when the hash
value is already known. This will be useful for disk-based Hash
Aggregation to avoid recomputing the hash value for the same tuple
after saving and restoring it from disk.

Discussion: https://postgr.es/m/37091115219dd522fd9ed67333ee8ed1b7e09443.camel%40j-davis.com
2020-02-06 20:34:01 -08:00
Andres Freund 1fdb7f9789 expression eval: Don't redundantly keep track of AggState.
It's already tracked via ExprState->parent, so we don't need to also
include it in ExprEvalStep. When that code originally was written
ExprState->parent didn't exist, but it since has been introduced in
6719b238e8.

Author: Andres Freund
Discussion: https://postgr.es/m/20191023163849.sosqbfs5yenocez3@alap3.anarazel.de
2020-02-06 19:54:43 -08:00
Andres Freund 1ec7679f1b expression eval, jit: Minor code cleanups.
This mostly consists of using C99 style for loops, moving variables
into narrower scopes, and a smattering of other minor improvements.
Done separately to make it easier to review patches with actual
functional changes.

Author: Andres Freund
Discussion: https://postgr.es/m/20191023163849.sosqbfs5yenocez3@alap3.anarazel.de
2020-02-06 19:54:15 -08:00
Jeff Davis 7d4395d0a1 Refactor hash_agg_entry_size().
Consolidate the calculations for hash table size estimation. This will
help with upcoming Hash Aggregation work that will add additional call
sites.
2020-02-06 11:49:56 -08:00
Alvaro Herrera 1c7a0b387d Add missing break out seqscan loop in logical replication
When replica identity is FULL (an admittedly unusual case), the loop
that searches for tuples in execReplication.c didn't stop scanning the
table when once a matching tuple was found.  Add the missing 'break'.

Note slight behavior change: we now return the first matching tuple
rather than the last one.  They are supposed to be indistinguishable
anyway, so this shouldn't matter.

Author: Konstantin Knizhnik
Discussion: https://postgr.es/m/379743f6-ae91-b866-f7a2-5624e6d2b0a4@postgrespro.ru
2020-02-03 18:59:12 -03:00
Alvaro Herrera c9d2977519 Clean up newlines following left parentheses
We used to strategically place newlines after some function call left
parentheses to make pgindent move the argument list a few chars to the
left, so that the whole line would fit under 80 chars.  However,
pgindent no longer does that, so the newlines just made the code
vertically longer for no reason.  Remove those newlines, and reflow some
of those lines for some extra naturality.

Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/20200129200401.GA6303@alvherre.pgsql
2020-01-30 13:42:14 -03:00
Tom Lane 01d9676a53 Fix dangling pointer in EvalPlanQual machinery.
EvalPlanQualStart() supposed that it could re-use the relsubs_rowmark
and relsubs_done arrays from a prior instantiation.  But since they are
allocated in the es_query_cxt of the recheckestate, that's just wrong;
EvalPlanQualEnd() will blow away that storage.  Therefore we were using
storage that could have been reallocated to something else, causing all
sorts of havoc.

I think this was modeled on the old code's handling of es_epqTupleSlot,
but since the code was anyway clearing the arrays at re-use, there's
clearly no expectation of importing any outside state.  So it's just
a dubious savings of a couple of pallocs, which is negligible compared
to setting up a new planstate tree.  Therefore, just allocate the
arrays always.  (I moved the allocations slightly for readability.)

In principle this bug could cause a problem whenever EPQ rechecks are
needed in more than one target table of a ModifyTable plan node.
In practice it seems not quite so easy to trigger as that; I couldn't
readily duplicate a crash with a partitioned target table, for instance.
That's probably down to incidental choices about when to free or
reallocate stuff.  The added isolation test case does seem to reliably
show an assertion failure, though.

Per report from Oleksii Kliukin.  Back-patch to v12 where the bug was
introduced (evidently by commit 3fb307bc4).

Discussion: https://postgr.es/m/EEF05F66-2871-4786-992B-5F45C92FEE2E@hintbits.com
2020-01-28 17:26:37 -05:00
Amit Kapila 05f18c6b6b Added relation name in error messages for constraint checks.
This gives more information to the user about the error and it makes such
messages consistent with the other similar messages in the code.

Reported-by: Simon Riggs
Author: Mahendra Singh and Simon Riggs
Reviewed-by: Beena Emerson and Amit Kapila
Discussion: https://postgr.es/m/CANP8+j+7YUvQvGxTrCiw77R23enMJ7DFmyA3buR+fa2pKs4XhA@mail.gmail.com
2020-01-28 07:48:10 +05:30
Thomas Munro 3e4818e9dd Avoid unnecessary shm writes in Parallel Hash Join.
Currently, Parallel Hash Join cannot be used for full/right joins,
so there is no point in setting the match flag.  It turns out that
the cache coherence traffic generated by those writes slows down
large systems running many-core joins, so let's stop doing that.
In future, if we need to use match bits in parallel joins, we might
want to consider setting them only if not already set.

Back-patch to 11, where Parallel Hash Join arrived.

Reported-by: Deng, Gang
Discussion: https://postgr.es/m/0F44E799048C4849BAE4B91012DB910462E9897A%40SHSMSX103.ccr.corp.intel.com
2020-01-27 15:07:03 +13:00
Andres Freund affdde2e15 Fix edge case leading to agg transitions skipping ExecAggTransReparent() calls.
The code checking whether an aggregate transition value needs to be
reparented into the current context has always only compared the
transition return value with the previous transition value by datum,
i.e. without regard for NULLness.  This normally works, because when
the transition function returns NULL (via fcinfo->isnull), it'll
return a value that won't be the same as its input value.

But there's no hard requirement that that's the case. And it turns
out, it's possible to hit this case (see discussion or reproducers),
leading to a non-null transition value not being reparented, followed
by a crash caused by that.

Instead of adding another comparison of NULLness, instead have
ExecAggTransReparent() ensure that pergroup->transValue ends up as 0
when the new transition value is NULL. That avoids having to add an
additional branch to the much more common cases of the transition
function returning the old transition value (which is a pointer in
this case), and when the new value is different, but not NULL.

In branches since 69c3936a14, also deduplicate the reparenting code
between the expression evaluation based transitions, and the path for
ordered aggregates.

Reported-By: Teodor Sigaev, Nikita Glukhov
Author: Andres Freund
Discussion: https://postgr.es/m/bd34e930-cfec-ea9b-3827-a8bc50891393@sigaev.ru
Backpatch: 9.4-, this issue has existed since at least 7.4
2020-01-20 23:26:51 -08:00
Tom Lane 41c6f9db25 Repair more failures with SubPlans in multi-row VALUES lists.
Commit 9b63c13f0 turns out to have been fundamentally misguided:
the parent node's subPlan list is by no means the only way in which
a child SubPlan node can be hooked into the outer execution state.
As shown in bug #16213 from Matt Jibson, we can also get short-lived
tuple table slots added to the outer es_tupleTable list.  At this point
I have little faith that there aren't other possible connections as
well; the long time it took to notice this problem shows that this
isn't a heavily-exercised situation.

Therefore, revert that fix, returning to the coding that passed a
NULL parent plan pointer down to the transiently-built subexpressions.
That gives us a pretty good guarantee that they won't hook into the
outer executor state in any way.  But then we need some other solution
to make SubPlans work.  Adopt the solution speculated about in the
previous commit's log message: do expression initialization at plan
startup for just those VALUES rows containing SubPlans, abandoning the
goal of reclaiming memory intra-query for those rows.  In practice it
seems unlikely that queries containing a vast number of VALUES rows
would be using SubPlans in them, so this should not give up much.

(BTW, this test case also refutes my claim in connection with the prior
commit that the issue only arises with use of LATERAL.  That was just
wrong: some variants of SubLink always produce SubPlans.)

As with previous patch, back-patch to all supported branches.

Discussion: https://postgr.es/m/16213-871ac3bc208ecf23@postgresql.org
2020-01-17 16:17:31 -05:00
Robert Haas 2eb34ac369 Fix problems with "read only query" checks, and refactor the code.
Previously, check_xact_readonly() was responsible for determining
which types of queries could not be run in a read-only transaction,
standard_ProcessUtility() was responsibility for prohibiting things
which were allowed in read only transactions but not in recovery, and
utility commands were basically prohibited in bulk in parallel mode by
calls to CommandIsReadOnly() in functions.c and spi.c.  This situation
was confusing and error-prone. Accordingly, move all the checks to a
new function ClassifyUtilityCommandAsReadOnly(), which determines the
degree to which a given statement is read only.

In the old code, check_xact_readonly() inadvertently failed to handle
several statement types that actually should have been prohibited,
specifically T_CreatePolicyStmt, T_AlterPolicyStmt, T_CreateAmStmt,
T_CreateStatsStmt, T_AlterStatsStmt, and T_AlterCollationStmt.  As a
result, thes statements were erroneously allowed in read only
transactions, parallel queries, and standby operation. Generally, they
would fail anyway due to some lower-level error check, but we
shouldn't rely on that.  In the new code structure, future omissions
of this type should cause ClassifyUtilityCommandAsReadOnly() to
complain about an unrecognized node type.

As a fringe benefit, this means we can allow certain types of utility
commands in parallel mode, where it's safe to do so. This allows
ALTER SYSTEM, CALL, DO, CHECKPOINT, COPY FROM, EXPLAIN, and SHOW.
It might be possible to allow additional commands with more work
and thought.

Along the way, document the thinking process behind the current set
of checks, as per discussion especially with Peter Eisentraut. There
is some interest in revising some of these rules, but that seems
like a job for another patch.

Patch by me, reviewed by Tom Lane, Stephen Frost, and Peter
Eisentraut.

Discussion: http://postgr.es/m/CA+TgmoZ_rLqJt5sYkvh+JpQnfX0Y+B2R+qfi820xNih6x-FQOQ@mail.gmail.com
2020-01-16 12:11:31 -05:00
Dean Rasheed d751ba5235 Make rewriter prevent auto-updates on views with conditional INSTEAD rules.
A view with conditional INSTEAD rules and no unconditional INSTEAD
rules or INSTEAD OF triggers is not auto-updatable. Previously we
relied on a check in the executor to catch this, but that's
problematic since the planner may fail to properly handle such a query
and thus return a particularly unhelpful error to the user, before
reaching the executor check.

Instead, trap this in the rewriter and report the correct error there.
Doing so also allows us to include more useful error detail than the
executor check can provide. This doesn't change the existing behaviour
of updatable views; it merely ensures that useful error messages are
reported when a view isn't updatable.

Per report from Pengzhou Tang, though not adopting that suggested fix.
Back-patch to all supported branches.

Discussion: https://postgr.es/m/CAG4reAQn+4xB6xHJqWdtE0ve_WqJkdyCV4P=trYr4Kn8_3_PEA@mail.gmail.com
2020-01-14 09:52:21 +00:00
Tom Lane 913bbd88dc Improve the handling of result type coercions in SQL functions.
Use the parser's standard type coercion machinery to convert the
output column(s) of a SQL function's final SELECT or RETURNING
to the type(s) they should have according to the function's declared
result type.  We'll allow any case where an assignment-level
coercion is available.  Previously, we failed unless the required
coercion was a binary-compatible one (and the documentation ignored
this, falsely claiming that the types must match exactly).

Notably, the coercion now accounts for typmods, so that cases where
a SQL function is declared to return a composite type whose columns
are typmod-constrained now behave as one would expect.  Arguably
this aspect is a bug fix, but the overall behavioral change here
seems too large to consider back-patching.

A nice side-effect is that functions can now be inlined in a
few cases where we previously failed to do so because of type
mismatches.

Discussion: https://postgr.es/m/18929.1574895430@sss.pgh.pa.us
2020-01-08 11:07:59 -05:00
Bruce Momjian 7559d8ebfa Update copyrights for 2020
Backpatch-through: update all files in master, backpatch legal files through 9.4
2020-01-01 12:21:45 -05:00
Michael Paquier 7854e07f25 Revert "Rename files and headers related to index AM"
This follows multiple complains from Peter Geoghegan, Andres Freund and
Alvaro Herrera that this issue ought to be dug more before actually
happening, if it happens.

Discussion: https://postgr.es/m/20191226144606.GA5659@alvherre.pgsql
2019-12-27 08:09:00 +09:00
Tom Lane 5b9312378e Load relcache entries' partitioning data on-demand, not immediately.
Formerly the rd_partkey and rd_partdesc data structures were always
populated immediately when a relcache entry was built or rebuilt.
This patch changes things so that they are populated only when they
are first requested.  (Hence, callers *must* now always use
RelationGetPartitionKey or RelationGetPartitionDesc; just fetching
the pointer directly is no longer acceptable.)

This seems to have some performance benefits, but the main reason to do
it is that it eliminates a recursive-reload failure that occurs if the
partkey or partdesc expressions contain any references to the relation's
rowtype (as discovered by Amit Langote).  In retrospect, since loading
these data structures might result in execution of nearly-arbitrary code
via eval_const_expressions, it was a dumb idea to require that to happen
during relcache entry rebuild.

Also, fix things so that old copies of a relcache partition descriptor
will be dropped when the cache entry's refcount goes to zero.  In the
previous coding it was possible for such copies to survive for the
lifetime of the session, as I'd complained of in a previous discussion.
(This management technique still isn't perfect, but it's better than
before.)  Improve the commentary explaining how that works and why
it's safe to hand out direct pointers to these relcache substructures.

In passing, improve RelationBuildPartitionDesc by using the same
memory-context-parent-swap approach used by RelationBuildPartitionKey,
thereby making it less dependent on strong assumptions about what
partition_bounds_copy does.  Avoid doing get_rel_relkind in the
critical section, too.

Patch by Amit Langote and Tom Lane; Robert Haas deserves some credit
for prior work in the area, too.  Although this is a pre-existing
problem, no back-patch: the patch seems too invasive to be safe to
back-patch, and the bug it fixes is a corner case that seems
relatively unlikely to cause problems in the field.

Discussion: https://postgr.es/m/CA+HiwqFUzjfj9HEsJtYWcr1SgQ_=iCAvQ=O2Sx6aQxoDu4OiHw@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoY3bRmGB6-DUnoVy5fJoreiBJ43rwMrQRCdPXuKt4Ykaw@mail.gmail.com
2019-12-25 14:43:13 -05:00
Michael Paquier 8ce3aa9b59 Rename files and headers related to index AM
The following renaming is done so as source files related to index
access methods are more consistent with table access methods (the
original names used for index AMs ware too generic, and could be
confused as including features related to table AMs):
- amapi.h -> indexam.h.
- amapi.c -> indexamapi.c.  Here we have an equivalent with
backend/access/table/tableamapi.c.
- amvalidate.c -> indexamvalidate.c.
- amvalidate.h -> indexamvalidate.h.
- genam.c -> indexgenam.c.
- genam.h -> indexgenam.h.

This has been discussed during the development of v12 when table AM was
worked on, but the renaming never happened.

Author: Michael Paquier
Reviewed-by: Fabien Coelho, Julien Rouhaud
Discussion: https://postgr.es/m/20191223053434.GF34339@paquier.xyz
2019-12-25 10:23:39 +09:00
Alvaro Herrera c4dcd9144b Avoid splitting C string literals with \-newline
Using \ is unnecessary and ugly, so remove that.  While at it, stitch
the literals back into a single line: we've long discouraged splitting
error message literals even when they go past the 80 chars line limit,
to improve greppability.

Leave contrib/tablefunc alone.

Discussion: https://postgr.es/m/20191223195156.GA12271@alvherre.pgsql
2019-12-24 12:44:12 -03:00
Thomas Munro e69d644547 Rotate instead of shifting hash join batch number.
Our algorithm for choosing batch numbers turned out not to work
effectively for multi-billion key inner relations.  We would use
more hash bits than we have, and effectively concentrate all tuples
into a smaller number of batches than we intended.  While ideally
we should switch to wider hashes, for now, change the algorithm to
one that effectively gives up bits from the bucket number when we
don't have enough bits.  That means we'll finish up with longer
bucket chains than would be ideal, but that's better than having
batches that don't fit in work_mem and can't be divided.

Batch-patch to all supported releases.

Author: Thomas Munro
Reviewed-by: Tom Lane, thanks also to Tomas Vondra, Alvaro Herrera, Andres Freund for testing and discussion
Reported-by: James Coleman
Discussion: https://postgr.es/m/16104-dc11ed911f1ab9df%40postgresql.org
2019-12-24 13:05:43 +13:00
Michael Paquier e1551f96e6 Refactor attribute mappings used in logical tuple conversion
Tuple conversion support in tupconvert.c is able to convert rowtypes
between two relations, inner and outer, which are logically equivalent
but have a different ordering or even dropped columns (used mainly for
inheritance tree and partitions).  This makes use of attribute mappings,
which are simple arrays made of AttrNumber elements with a length
matching the number of attributes of the outer relation.  The length of
the attribute mapping has been treated as completely independent of the
mapping itself until now, making it easy to pass down an incorrect
mapping length.

This commit refactors the code related to attribute mappings and moves
it into an independent facility called attmap.c, extracted from
tupconvert.c.  This merges the attribute mapping with its length,
avoiding to try to guess what is the length of a mapping to use as this
is computed once, when the map is built.

This will avoid mistakes like what has been fixed in dc816e58, which has
used an incorrect mapping length by matching it with the number of
attributes of an inner relation (a child partition) instead of an outer
relation (a partitioned table).

Author: Michael Paquier
Reviewed-by: Amit Langote
Discussion: https://postgr.es/m/20191121042556.GD153437@paquier.xyz
2019-12-18 16:23:02 +09:00
Bruce Momjian 181932a032 Remove redundant not-null test
Reported-by: Ranier Vilela

Discussion: https://postgr.es/m/MN2PR18MB2927E73FADCA8967B2302469E3490@MN2PR18MB2927.namprd18.prod.outlook.com

Author: Ranier Vilela

Backpatch-through: master
2019-12-17 20:37:22 -05:00
Tom Lane 5935917ce5 Allow executor startup pruning to prune all child nodes.
Previously, if the startup pruning logic proved that all child nodes
of an Append or MergeAppend could be pruned, we still kept one, just
to keep EXPLAIN from failing.  The previous commit removed the
ruleutils.c limitation that required this kluge, so drop it.  That
results in less-confusing EXPLAIN output, as per a complaint from
Yuzuko Hosoya.

David Rowley

Discussion: https://postgr.es/m/001001d4f44b$2a2cca50$7e865ef0$@lab.ntt.co.jp
2019-12-11 17:05:30 -05:00
Tom Lane 6ef77cf46e Further adjust EXPLAIN's choices of table alias names.
This patch causes EXPLAIN to always assign a separate table alias to the
parent RTE of an append relation (inheritance set); before, such RTEs
were ignored if not actually scanned by the plan.  Since the child RTEs
now always have that same alias to start with (cf. commit 55a1954da),
the net effect is that the parent RTE usually gets the alias used or
implied by the query text, and the children all get that alias with "_N"
appended.  (The exception to "usually" is if there are duplicate aliases
in different subtrees of the original query; then some of those original
RTEs will also have "_N" appended.)

This results in more uniform output for partitioned-table plans than
we had before: the partitioned table itself gets the original alias,
and all child tables have aliases with "_N", rather than the previous
behavior where one of the children would get an alias without "_N".

The reason for giving the parent RTE an alias, even if it isn't scanned
by the plan, is that we now use the parent's alias to qualify Vars that
refer to an appendrel output column and appear above the Append or
MergeAppend that computes the appendrel.  But below the append, Vars
refer to some one of the child relations, and are displayed that way.
This seems clearer than the old behavior where a Var that could carry
values from any child relation was displayed as if it referred to only
one of them.

While at it, change ruleutils.c so that the code paths used by EXPLAIN
deal in Plan trees not PlanState trees.  This effectively reverts a
decision made in commit 1cc29fe7c, which seemed like a good idea at
the time to make ruleutils.c consistent with explain.c.  However,
it's problematic because we'd really like to allow executor startup
pruning to remove all the children of an append node when possible,
leaving no child PlanState to resolve Vars against.  (That's not done
here, but will be in the next patch.)  This requires different handling
of subplans and initplans than before, but is otherwise a pretty
straightforward change.

Discussion: https://postgr.es/m/001001d4f44b$2a2cca50$7e865ef0$@lab.ntt.co.jp
2019-12-11 17:05:18 -05:00
Jeff Davis 30d47723fd Fix comments in execGrouping.c
Commit 5dfc1981 missed updating some comments.

Also, fix a comment typo found in passing.

Author: Jeff Davis
Discussion: https://postgr.es/m/9723131d247b919f94699152647fa87ee0bc02c2.camel%40j-davis.com
2019-12-06 11:49:59 -08:00
Etsuro Fujita 4af77aa797 Fix whitespace. 2019-12-04 12:45:00 +09:00
Alvaro Herrera 3974c4a724 Remove useless "return;" lines
Discussion: https://postgr.es/m/20191128144653.GA27883@alvherre.pgsql
2019-11-28 16:48:37 -03:00
Amit Kapila 080313f829 Don't shut down Gather[Merge] early under Limit.
Revert part of commit 19df1702f5.

Early shutdown was added by that commit so that we could collect
statistics from workers, but unfortunately, it interacted badly with
rescans.  The problem is that we ended up destroying the parallel context
which is required for rescans.  This leads to rescans of a Limit node over
a Gather node to produce unpredictable results as it tries to access
destroyed parallel context.  By reverting the early shutdown code, we
might lose statistics in some cases of Limit over Gather [Merge], but that
will require further study to fix.

Reported-by: Jerry Sievers
Diagnosed-by: Thomas Munro
Author: Amit Kapila, testcase by Vignesh C
Backpatch-through: 9.6
Discussion: https://postgr.es/m/87ims2amh6.fsf@jsievers.enova.com
2019-11-26 08:30:24 +05:30
Thomas Munro 76cbfcdf3a Always call ExecShutdownNode() if appropriate.
Call ExecShutdownNode() after ExecutePlan()'s loop, rather than at each
break.  We had forgotten to do that in one case.  The omission caused
intermittent "temporary file leak" warnings from multi-batch parallel
hash joins with a LIMIT clause.

Back-patch to 11.  Though the problem exists in theory in earlier
parallel query releases, nothing really depended on it.

Author: Kyotaro Horiguchi
Reviewed-by: Thomas Munro, Amit Kapila
Discussion: https://postgr.es/m/20191111.212418.2222262873417235945.horikyota.ntt%40gmail.com
2019-11-16 10:11:30 +13:00
Amit Kapila 14aec03502 Make the order of the header file includes consistent in backend modules.
Similar to commits 7e735035f2 and dddf4cdc33, this commit makes the order
of header file inclusion consistent for backend modules.

In the passing, removed a couple of duplicate inclusions.

Author: Vignesh C
Reviewed-by: Kuntal Ghosh and Amit Kapila
Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com
2019-11-12 08:30:16 +05:30
Thomas Munro 7815e7efdb Add reusable routine for making arrays unique.
Introduce qunique() and qunique_arg(), which can be used after qsort()
and qsort_arg() respectively to remove duplicate values.  Use it where
appropriate.

Author: Thomas Munro
Reviewed-by: Tom Lane (in an earlier version)
Discussion: https://postgr.es/m/CAEepm%3D2vmFTNpAmwbGGD2WaryM6T3hSDVKQPfUwjdD_5XY6vAA%40mail.gmail.com
2019-11-07 17:00:48 +13:00
Tom Lane 22e44e8dbc Minor code review for tuple slot rewrite.
Avoid creating transiently-inconsistent slot states where possible,
by not setting TTS_FLAG_SHOULDFREE until after the slot actually has
a free'able tuple pointer, and by making sure that we reset tts_nvalid
and related derived state before we replace the tuple contents.  This
would only matter if something were to examine the slot after we'd
suffered some kind of error (e.g. out of memory) while manipulating
the slot.  We typically don't do that, so these changes might just be
cosmetic --- but even if so, it seems like good future-proofing.

Also remove some redundant Asserts, and add a couple for consistency.

Back-patch to v12 where all this code was rewritten.

Discussion: https://postgr.es/m/16095-c3ff2e5283b8dba5@postgresql.org
2019-11-06 12:00:17 -05:00
Andres Freund 01368e5d9d Split all OBJS style lines in makefiles into one-line-per-entry style.
When maintaining or merging patches, one of the most common sources
for conflicts are the list of objects in makefiles. Especially when
the split across lines has been changed on both sides, which is
somewhat common due to attempting to stay below 80 columns, those
conflicts are unnecessarily laborious to resolve.

By splitting, and alphabetically sorting, OBJS style lines into one
object per line, conflicts should be less frequent, and easier to
resolve when they still occur.

Author: Andres Freund
Discussion: https://postgr.es/m/20191029200901.vww4idgcxv74cwes@alap3.anarazel.de
2019-11-05 14:41:07 -08:00
Robert Haas 2e8b6bfa90 Rename some toasting functions based on whether they are heap-specific.
The old names for the attribute-detoasting functions names included
the word "heap," which seems outdated now that the heap is only one of
potentially many table access methods.

On the other hand, toast_insert_or_update and toast_delete are
heap-specific, so rename them by adding "heap_" as a prefix.

Not all of the work of making the TOAST system fully accessible to AMs
other than the heap is done yet, but there seems to be little harm in
getting this renaming out of the way now. Commit
8b94dab066 already divided up the
functions among various files partially according to whether it was
intended that they should be heap-specific or AM-agnostic, so this is
just clarifying the division contemplated by that commit.

Patch by me, reviewed and tested by Prabhat Sabu, Thomas Munro,
Andres Freund, and Álvaro Herrera.

Discussion: http://postgr.es/m/CA+TgmoZv-=2iWM4jcw5ZhJeL18HF96+W1yJeYrnGMYdkFFnEpQ@mail.gmail.com
2019-10-04 14:24:46 -04:00
Andres Freund 36d22dd95b Don't generate EEOP_*_FETCHSOME operations for slots know to be virtual.
That avoids unnecessary work during both interpreted execution, and
JIT compiled expression evaluation. Both benefit from fewer expression
steps needing be processed, and for interpreted execution there now is
a fastpath dedicated to just fetching a value from a virtual
slot. That's e.g. beneficial for hashjoins over nodes that perform
projections, as the hashed columns are currently fetched individually.

Author: Soumyadeep Chakraborty, Andres Freund
Discussion: https://postgr.es/m/CAE-ML+9OKSN71+mHtfMD-L24oDp8dGTfaVjDU6U+j+FNAW5kRQ@mail.gmail.com
2019-09-30 16:06:16 -07:00
Andres Freund 34c9c53bb0 Reduce code duplication for ExecJust*Var operations.
This is mainly in preparation for adding further fastpath evaluation
routines.

Also reorder ExecJust*Var functions to be consistent with the order in
which they're used.

Author: Andres Freund
Discussion: https://postgr.es/m/CAE-ML+9OKSN71+mHtfMD-L24oDp8dGTfaVjDU6U+j+FNAW5kRQ@mail.gmail.com
2019-09-30 15:32:00 -07:00
Andres Freund ac88807f9b jit: Re-allow JIT compilation of execGrouping.c hashtable comparisons.
In the course of 5567d12ce0, 356687bd8 and 317ffdfeaa, I changed
BuildTupleHashTable[Ext]'s call to ExecBuildGroupingEqual to not pass
in the parent node, but NULL. Which in turn prevents the tuple
equality comparator from being JIT compiled.  While that fixes
bug #15486, it is not actually necessary after all of the above commits,
as we don't re-build the comparator when using the new
BuildTupleHashTableExt() interface (as the content of the hashtable
are reset, but the TupleHashTable itself is not).

Therefore re-allow jit compilation for callers that use
BuildTupleHashTableExt with a separate context for "metadata" and
content.

As in the previous commit, there's ongoing work to make this easier to
test to prevent such regressions in the future, but that
infrastructure is not going to be backpatchable.

The performance impact of not JIT compiling hashtable equality
comparators can be substantial e.g. for aggregation queries that
aggregate a lot of input rows to few output rows (when there are a lot
of output groups, there will be fewer comparisons).

Author: Andres Freund
Discussion: https://postgr.es/m/20190927072053.njf6prdl3vb7y7qb@alap3.anarazel.de
Backpatch: 11, just as 5567d12ce0
2019-09-29 16:24:32 -07:00
Andres Freund 97e971ee05 Fix determination when slot types for upper executor nodes are fixed.
For many queries the fact that the tuple descriptor from the lower
node was not taken into account when determining whether the type of a
slot is fixed, lead to tuple deforming for such upper nodes not to be
JIT accelerated.

I broke this in 675af5c01e.

There is ongoing work to enable writing regression tests for related
behavior (including a patch that would have detected this
regression), by optionally showing such details in EXPLAIN. But as it
seems unlikely that that will be suitable for stable branches, just
merge the fix for now.

While it's fairly close to the 12 release window, the fact that 11
continues to perform JITed tuple deforming in these cases, that
there's still cases where we do so in 12, and the fact that the
performance regression can be sizable, weigh in favor of fixing it
now.

Author: Andres Freund
Discussion: https://postgr.es/m/20190927072053.njf6prdl3vb7y7qb@alap3.anarazel.de
Backpatch: 12-, where 675af5c01e was merged.
2019-09-29 15:46:17 -07:00
Andres Freund 30d1379658 Fix ExprState's tag to be of type NodeTag rather than Node.
This appears to have been an oversight in b8d7f053c5. As it's
effectively harmless, though confusing, only fix in master.

Author: Andres Freund
2019-09-23 15:28:13 -07:00
Tom Lane 0a2f894c3c Fix typo in tts_virtual_copyslot.
The code used the destination slot's natts where it intended to
use the source slot's natts.  Adding an Assert shows that there
is no case in "make check-world" where these counts are different,
so maybe this is a harmless bug, but it's still a bug.

Takayuki Tsunakawa

Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1FD34C0E@G01JPEXMBYT05
2019-09-22 14:21:07 -04:00
Tom Lane d812257809 Fix bogus sizeof calculations.
Noted by Coverity.  Typo in 27cc7cd2b, so back-patch to v12
as that was.
2019-09-15 11:51:57 -04:00
Andres Freund 27cc7cd2bc Reorder EPQ work, to fix rowmark related bugs and improve efficiency.
In ad0bda5d24 I changed the EvalPlanQual machinery to store
substitution tuples in slot, instead of using plain HeapTuples. The
main motivation for that was that using HeapTuples will be inefficient
for future tableams.  But it turns out that that conversion was buggy
for non-locking rowmarks - the wrong tuple descriptor was used to
create the slot.

As a secondary issue 5db6df0c0 changed ExecLockRows() to begin EPQ
earlier, to allow to fetch the locked rows directly into the EPQ
slots, instead of having to copy tuples around. Unfortunately, as Tom
complained, that forces some expensive initialization to happen
earlier.

As a third issue, the test coverage for EPQ was clearly insufficient.

Fixing the first issue is unfortunately not trivial: Non-locked row
marks were fetched at the start of EPQ, and we don't have the type
information for the rowmarks available at that point. While we could
change that, it's not easy. It might be worthwhile to change that at
some point, but to fix this bug, it seems better to delay fetching
non-locking rowmarks when they're actually needed, rather than
eagerly. They're referenced at most once, and in cases where EPQ
fails, might never be referenced. Fetching them when needed also
increases locality a bit.

To be able to fetch rowmarks during execution, rather than
initialization, we need to be able to access the active EPQState, as
that contains necessary data. To do so move EPQ related data from
EState to EPQState, and, only for EStates creates as part of EPQ,
reference the associated EPQState from EState.

To fix the second issue, change EPQ initialization to allow use of
EvalPlanQualSlot() to be used before EvalPlanQualBegin() (but
obviously still requiring EvalPlanQualInit() to have been done).

As these changes made struct EState harder to understand, e.g. by
adding multiple EStates, significantly reorder the members, and add a
lot more comments.

Also add a few more EPQ tests, including one that fails for the first
issue above. More is needed.

Reported-By: yi huang
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion:
    https://postgr.es/m/CAHU7rYZo_C4ULsAx_LAj8az9zqgrD8WDd4hTegDTMM1LMqrBsg@mail.gmail.com
    https://postgr.es/m/24530.1562686693@sss.pgh.pa.us
Backpatch: 12-, where the EPQ changes were introduced
2019-09-09 05:14:11 -07:00
Robert Haas 8b94dab066 Split tuptoaster.c into three separate files.
detoast.c/h contain functions required to detoast a datum, partially
or completely, plus a few other utility functions for examining the
size of toasted datums.

toast_internals.c/h contain functions that are used internally to the
TOAST subsystem but which (mostly) do not need to be accessed from
outside.

heaptoast.c/h contains code that is intrinsically specific to the
heap AM, either because it operates on HeapTuples or is based on the
layout of a heap page.

detoast.c and toast_internals.c are placed in
src/backend/access/common rather than src/backend/access/heap.  At
present, both files still have dependencies on the heap, but that will
be improved in a future commit.

Patch by me, reviewed and tested by Prabhat Sabu, Thomas Munro,
Andres Freund, and Álvaro Herrera.

Discussion: http://postgr.es/m/CA+TgmoZv-=2iWM4jcw5ZhJeL18HF96+W1yJeYrnGMYdkFFnEpQ@mail.gmail.com
2019-09-05 13:15:10 -04:00
Alvaro Herrera fe66125974 Remove 'msg' parameter from convert_tuples_by_name
The message was included as a parameter when this function was added in
dcb2bda9b7, but I don't think it has ever served any useful purpose.
Let's stop spreading it pointlessly.

Reviewed by Amit Langote and Peter Eisentraut.

Discussion: https://postgr.es/m/20190806224728.GA17233@alvherre.pgsql
2019-09-03 14:47:29 -04:00
Michael Paquier c96581abe4 Fix inconsistencies and typos in the tree, take 11
This fixes various typos in docs and comments, and removes some orphaned
definitions.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/5da8e325-c665-da95-21e0-c8a99ea61fbf@gmail.com
2019-08-19 16:21:39 +09:00
Andres Freund 6a04d345fd Don't include utils/array.h from acl.h.
For most uses of acl.h the details of how "Acl" internally looks like
are irrelevant. It might make sense to move a lot of the
implementation details into a separate header at a later point.

The main motivation of this change is to avoid including fmgr.h (via
array.h, which needs it for exposed structs) in a lot of files that
otherwise don't need it. A subsequent commit will remove the fmgr.h
include from a lot of files.

Directly include utils/array.h and utils/expandeddatum.h from the
files that need them, but previously included them indirectly, via
acl.h.

Author: Andres Freund
Discussion: https://postgr.es/m/20190803193733.g3l3x3o42uv4qj7l@alap3.anarazel.de
2019-08-16 10:33:30 -07:00
Michael Paquier 66bde49d96 Fix inconsistencies and typos in the tree, take 10
This addresses some issues with unnecessary code comments, fixes various
typos in docs and comments, and removes some orphaned structures and
definitions.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/9aabc775-5494-b372-8bcb-4dfc0bd37c68@gmail.com
2019-08-13 13:53:41 +09:00
Tom Lane 3c926587b5 Remove EState.es_range_table_array.
Now that list_nth is O(1), there's no good reason to maintain a
separate array of RTE pointers rather than indexing into
estate->es_range_table.  Deleting the array doesn't save all that
much either; but just on cleanliness grounds, it's better not to
have duplicate representations of the identical information.

Discussion: https://postgr.es/m/14960.1565384592@sss.pgh.pa.us
2019-08-12 11:58:35 -04:00
Tom Lane 5ee190f8ec Rationalize use of list_concat + list_copy combinations.
In the wake of commit 1cff1b95a, the result of list_concat no longer
shares the ListCells of the second input.  Therefore, we can replace
"list_concat(x, list_copy(y))" with just "list_concat(x, y)".

To improve call sites that were list_copy'ing the first argument,
or both arguments, invent "list_concat_copy()" which produces a new
list sharing no ListCells with either input.  (This is a bit faster
than "list_concat(list_copy(x), y)" because it makes the result list
the right size to start with.)

In call sites that were not list_copy'ing the second argument, the new
semantics mean that we are usually leaking the second List's storage,
since typically there is no remaining pointer to it.  We considered
inventing another list_copy variant that would list_free the second
input, but concluded that for most call sites it isn't worth worrying
about, given the relative compactness of the new List representation.
(Note that in cases where such leakage would happen, the old code
already leaked the second List's header; so we're only discussing
the size of the leak not whether there is one.  I did adjust two or
three places that had been troubling to free that header so that
they manually free the whole second List.)

Patch by me; thanks to David Rowley for review.

Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
2019-08-12 11:20:18 -04:00
Tom Lane 4766dce0dd Fix choice of comparison operators for cross-type hashed subplans.
Commit bf6c614a2 rearranged the lookup of the comparison operators
needed in a hashed subplan, and in so doing, broke the cross-type
case: it caused the original LHS-vs-RHS operator to be used to compare
hash table entries too (which of course are all of the RHS type).
This leads to C functions being passed a Datum that is not of the
type they expect, with the usual hazards of crashes and unauthorized
server memory disclosure.

For the set of hashable cross-type operators present in v11 core
Postgres, this bug is nearly harmless on 64-bit machines, which
may explain why it escaped earlier detection.  But it is a live
security hazard on 32-bit machines; and of course there may be
extensions that add more hashable cross-type operators, which
would increase the risk.

Reported by Andreas Seltenreich.  Back-patch to v11 where the
problem came in.

Security: CVE-2019-10209
2019-08-05 11:20:31 -04:00
Michael Paquier 8548ddc61b Fix inconsistencies and typos in the tree, take 9
This addresses more issues with code comments, variable names and
unreferenced variables.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/7ab243e0-116d-3e44-d120-76b3df7abefd@gmail.com
2019-08-05 12:14:58 +09:00
Andres Freund 2abd7ae9b2 Fix representation of hash keys in Hash/HashJoin nodes.
In 5f32b29c18 I changed the creation of HashState.hashkeys to
actually use HashState as the parent (instead of HashJoinState, which
was incorrect, as they were executed below HashState), to fix the
problem of hashkeys expressions otherwise relying on slot types
appropriate for HashJoinState, rather than HashState as would be
correct. That reliance was only introduced in 12, which is why it
previously worked to use HashJoinState as the parent (although I'd be
unsurprised if there were problematic cases).

Unfortunately that's not a sufficient solution, because before this
commit, the to-be-hashed expressions referenced inner/outer as
appropriate for the HashJoin, not Hash. That didn't have obvious bad
consequences, because the slots containing the tuples were put into
ecxt_innertuple when hashing a tuple for HashState (even though Hash
doesn't have an inner plan).

There are less common cases where this can cause visible problems
however (rather than just confusion when inspecting such executor
trees). E.g. "ERROR: bogus varno: 65000", when explaining queries
containing a HashJoin where the subsidiary Hash node's hash keys
reference a subplan. While normally hashkeys aren't displayed by
EXPLAIN, if one of those expressions references a subplan, that
subplan may be printed as part of the Hash node - which then failed
because an inner plan was referenced, and Hash doesn't have that.

It seems quite possible that there's other broken cases, too.

Fix the problem by properly splitting the expression for the HashJoin
and Hash nodes at plan time, and have them reference the proper
subsidiary node. While other workarounds are possible, fixing this
correctly seems easy enough. It was a pretty ugly hack to have
ExecInitHashJoin put the expression into the already initialized
HashState, in the first place.

I decided to not just split inner/outer hashkeys inside
make_hashjoin(), but also to separate out hashoperators and
hashcollations at plan time. Otherwise we would have ended up having
two very similar loops, one at plan time and the other during executor
startup. The work seems to more appropriately belong to plan time,
anyway.

Reported-By: Nikita Glukhov, Alexander Korotkov
Author: Andres Freund
Reviewed-By: Tom Lane, in an earlier version
Discussion: https://postgr.es/m/CAPpHfdvGVegF_TKKRiBrSmatJL2dR9uwFCuR+teQ_8tEXU8mxg@mail.gmail.com
Backpatch: 12-
2019-08-02 00:02:46 -07:00
Andres Freund 870b1d6800 Remove superfluous newlines in function prototypes.
These were introduced by pgindent due to fixe to broken
indentation (c.f. 8255c7a5ee). Previously the mis-indentation of
function prototypes was creatively used to reduce indentation in a few
places.

As that formatting only exists in master and REL_12_STABLE, it seems
better to fix it in both, rather than having some odd indentation in
v12 that somebody might copy for future patches or such.

Author: Andres Freund
Discussion: https://postgr.es/m/20190728013754.jwcbe5nfyt3533vx@alap3.anarazel.de
Backpatch: 12-
2019-07-31 00:05:21 -07:00
Andres Freund af3deff3f2 Fix slot type handling for Agg nodes performing internal sorts.
Since 15d8f8312 we assert that - and since 7ef04e4d2c, 4da597edf1
rely on - the slot type for an expression's
ecxt_{outer,inner,scan}tuple not changing, unless explicitly flagged
as such. That allows to either skip deforming (for a virtual tuple
slot) or optimize the code for JIT accelerated deforming
appropriately (for other known slot types).

This assumption was sometimes violated for grouping sets, when
nodeAgg.c internally uses tuplesorts, and the child node doesn't
return a TTSOpsMinimalTuple type slot. Detect that case, and flag that
the outer slot might not be "fixed".

It's probably worthwhile to optimize this further in the future, and
more granularly determine whether the slot is fixed. As we already
instantiate per-phase transition and equal expressions, we could
cheaply set the slot type appropriately for each phase.  But that's a
separate change from this bugfix.

This commit does include a very minor optimization by avoiding to
create a slot for handling tuplesorts, if no such sorts are
performed. Previously we created that slot unnecessarily in the common
case of computing all grouping sets via hashing. The code looked too
confusing without that, as the conditions for needing a sort slot and
flagging that the slot type isn't fixed, are the same.

Reported-By: Ashutosh Sharma
Author: Andres Freund
Discussion: https://postgr.es/m/CAE9k0PmNaMD2oHTEAhRyxnxpaDaYkuBYkLa1dpOpn=RS0iS2AQ@mail.gmail.com
Backpatch: 12-, where the bug was introduced in 15d8f8312
2019-07-25 14:28:55 -07:00
Andres Freund ecbdd00934 Fix system column accesses in ON CONFLICT ... RETURNING.
After 277cb78983 ON CONFLICT ... SET ... RETURNING failed with
ERROR:  virtual tuple table slot does not have system attributes
when taking the update path, as the slot used to insert into the
table (and then process RETURNING) was defined to be a virtual slot in
that commit. Virtual slots don't support system columns except for
tableoid and ctid, as the other system columns are AM dependent.

Fix that by using a slot of the table's type. Add tests for system
column accesses in ON CONFLICT ...  RETURNING.

Reported-By: Roby, bisected to the relevant commit by Jeff Janes
Author: Andres Freund
Discussion: https://postgr.es/m/73436355-6432-49B1-92ED-1FE4F7E7E100@finefun.com.au
Backpatch: 12-, where the bug was introduced in 277cb78983
2019-07-24 18:45:58 -07:00
David Rowley 1e6a759838 Use appendBinaryStringInfo in more places where the length is known
When we already know the length that we're going to append, then it
makes sense to use appendBinaryStringInfo instead of
appendStringInfoString so that the append can be performed with a simple
memcpy() using a known length rather than having to first perform a
strlen() call to obtain the length.

Discussion: https://postgr.es/m/CAKJS1f8+FRAM1s5+mAa3isajeEoAaicJ=4e0WzrH3tAusbbiMQ@mail.gmail.com
2019-07-23 00:14:11 +12:00
David Rowley efdcca55a3 Make better use of the new List implementation in a couple of places
In nodeAppend.c and nodeMergeAppend.c there were some foreach loops which
looped over the list of subplans and only performed any work if the
subplan index was found in a Bitmapset.  With the old linked list
implementation of List, this form made sense as accessing the Nth list
element was O(N).  However, thanks to 1cff1b95a we now have array-based
lists, so accessing the Nth element has become O(1).

Here we make the most of the O(1) lookups and just loop over the set
members of the Bitmapset with bms_next_member().  This performs slightly
better when a small number of the list items are in the Bitmapset.  Micro
benchmarks show that when the Bitmapset contains all or most of the list
items then the new code is ever so slightly slower.  In practice, the cost
is so small that it's drowned out by various other things such as locking
the relations belonging to each subplan, etc.

The primary goal here is to leave better code examples around which benefit
better from the new list implementation.

Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAKJS1f8ZcsLVgkF4wOfRyMYTcPgLFiUAOedFC+U2vK_aFZk-BA@mail.gmail.com
2019-07-22 19:03:12 +12:00
Michael Paquier 23bccc823d Fix inconsistencies and typos in the tree
This is numbered take 7, and addresses a set of issues with code
comments, variable names and unreferenced variables.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/dff75442-2468-f74f-568c-6006e141062f@gmail.com
2019-07-22 10:01:50 +09:00
Tom Lane bc8393cf27 Further adjust SPITupleTable to provide a public row-count field.
Now that commit fec0778c8 drew a clear line between public and private
fields in SPITupleTable, it seems pretty silly that the count of valid
tuples isn't on the public side of that line.  The reason why not was
that there wasn't such a count.  For reasons lost in the mists of time,
spi.c preferred to keep a count of remaining free entries in the array.
But that seems pretty pointless: it's unlike the way we handle similar
code everywhere else, and it involves extra subtractions that surely
outweigh having to do a comparison rather than test-for-zero to check
for array-full.

Hence, rearrange so that this code does the expansible array logic
the same as everywhere else, with a count of valid entries alongside
the allocated array length.  And document the count as public.

I looked for core-code callers where it would make sense to start
relying on tuptable->numvals rather than the separate SPI_processed
variable.  Right now there don't seem to be places where it'd be
a win to do so without more code restructuring than I care to
undertake today.  In principle, though, having SPITupleTables be
fully self-contained should be helpful down the line.

Discussion: https://postgr.es/m/16852.1563395722@sss.pgh.pa.us
2019-07-18 10:37:13 -04:00
Tom Lane d97b714a21 Avoid using lcons and list_delete_first where it's easy to do so.
Formerly, lcons was about the same speed as lappend, but with the new
List implementation, that's not so; with a long List, data movement
imposes an O(N) cost on lcons and list_delete_first, but not lappend.

Hence, invent list_delete_last with semantics parallel to
list_delete_first (but O(1) cost), and change various places to use
lappend and list_delete_last where this can be done without much
violence to the code logic.

There are quite a few places that construct result lists using lcons not
lappend.  Some have semantic rationales for that; I added comments about
it to a couple that didn't have them already.  In many such places though,
I think the coding is that way only because back in the dark ages lcons
was faster than lappend.  Hence, switch to lappend where this can be done
without causing semantic changes.

In ExecInitExprRec(), this results in aggregates and window functions that
are in the same plan node being executed in a different order than before.
Generally, the executions of such functions ought to be independent of
each other, so this shouldn't result in visibly different query results.
But if you push it, as one regression test case does, you can show that
the order is different.  The new order seems saner; it's closer to
the order of the functions in the query text.  And we never documented
or promised anything about this, anyway.

Also, in gistfinishsplit(), don't bother building a reverse-order list;
it's easy now to iterate backwards through the original list.

It'd be possible to go further towards removing uses of lcons and
list_delete_first, but it'd require more extensive logic changes,
and I'm not convinced it's worth it.  Most of the remaining uses
deal with queues that probably never get long enough to be worth
sweating over.  (Actually, I doubt that any of the changes in this
patch will have measurable performance effects either.  But better
to have good examples than bad ones in the code base.)

Patch by me, thanks to David Rowley and Daniel Gustafsson for review.

Discussion: https://postgr.es/m/21272.1563318411@sss.pgh.pa.us
2019-07-17 11:15:34 -04:00
Michael Paquier 0896ae561b Fix inconsistencies and typos in the tree
This is numbered take 7, and addresses a set of issues around:
- Fixes for typos and incorrect reference names.
- Removal of unneeded comments.
- Removal of unreferenced functions and structures.
- Fixes regarding variable name consistency.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/10bfd4ac-3e7c-40ab-2b2e-355ed15495e8@gmail.com
2019-07-16 13:23:53 +09:00