Commit Graph

2106 Commits

Author SHA1 Message Date
Jeff Davis 2302302236 HashAgg: before spilling tuples, set unneeded columns to NULL.
This is a replacement for 4cad2534. Instead of projecting all tuples
going into a HashAgg, only remove unnecessary attributes when actually
spilling. This avoids the regression for the in-memory case.

Discussion: https://postgr.es/m/a2fb7dfeb4f50aa0a123e42151ee3013933cb802.camel%40j-davis.com
Backpatch-through: 13
2020-07-12 22:59:32 -07:00
Andres Freund e07633646a code: replace 'master' with 'leader' where appropriate.
Leader already is the more widely used terminology, but a few places
didn't get the message.

Author: Andres Freund
Reviewed-By: David Steele
Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue@alap3.anarazel.de
2020-07-08 12:58:32 -07:00
David Rowley 9bdb300ded Fix EXPLAIN ANALYZE for parallel HashAgg plans
Since 1f39bce02, HashAgg nodes have had the ability to spill to disk when
memory consumption exceeds work_mem. That commit added new properties to
EXPLAIN ANALYZE to show the maximum memory usage and disk usage, however,
it didn't quite go as far as showing that information for parallel
workers.  Since workers may have experienced something very different from
the main process, we should show this information per worker, as is done
in Sort.

Reviewed-by: Justin Pryzby
Reviewed-by: Jeff Davis
Discussion: https://postgr.es/m/CAApHDvpEKbfZa18mM1TD7qV6PG+w97pwCWq5tVD0dX7e11gRJw@mail.gmail.com
Backpatch-through: 13, where the hashagg spilling code was added.
2020-06-19 17:24:27 +12:00
Thomas Munro 7897e3bb90 Fix buffile.c error handling.
Convert buffile.c error handling to use ereport.  This fixes cases where
I/O errors were indistinguishable from EOF or not reported.  Also remove
"%m" from error messages where errno would be bogus.  While we're
modifying those strings, add block numbers and short read byte counts
where appropriate.

Back-patch to all supported releases.

Reported-by: Amit Khandekar <amitdkhan.pg@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Ibrar Ahmed <ibrar.ahmad@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGJE04G%3D8TLK0DLypT_27D9dR8F1RQgNp0jK6qR0tZGWOw%40mail.gmail.com
2020-06-16 16:59:07 +12:00
Tom Lane 2f48ede080 Avoid using a cursor in plpgsql's RETURN QUERY statement.
plpgsql has always executed the query given in a RETURN QUERY command
by opening it as a cursor and then fetching a few rows at a time,
which it turns around and dumps into the function's result tuplestore.
The point of this was to keep from blowing out memory with an oversized
SPITupleTable result (note that while a tuplestore can spill tuples
to disk, SPITupleTable cannot).  However, it's rather inefficient, both
because of extra data copying and because of executor entry/exit
overhead.  In recent versions, a new performance problem has emerged:
use of a cursor prevents use of a parallel plan for the executed query.

We can improve matters by skipping use of a cursor and having the
executor push result tuples directly into the function's result
tuplestore.  However, a moderate amount of new infrastructure is needed
to make that idea work:

* We can use the existing tstoreReceiver.c DestReceiver code to funnel
executor output to the tuplestore, but it has to be extended to support
plpgsql's requirement for possibly applying a tuple conversion map.

* SPI needs to be extended to allow use of a caller-supplied
DestReceiver instead of its usual receiver that puts tuples into
a SPITupleTable.  Two new API calls are needed to handle both the
RETURN QUERY and RETURN QUERY EXECUTE cases.

I also felt that I didn't want these new API calls to use the legacy
method of specifying query parameter values with "char" null flags
(the old ' '/'n' convention); rather they should accept ParamListInfo
objects containing the parameter type and value info.  This required
a bit of additional new infrastructure since we didn't yet have any
parse analysis callback that would interpret $N parameter symbols
according to type data supplied in a ParamListInfo.  There seems to be
no harm in letting makeParamList install that callback by default,
rather than leaving a new ParamListInfo's parserSetup hook as NULL.
(Indeed, as of HEAD, I couldn't find anyplace that was using the
parserSetup field at all; plpgsql was using parserSetupArg for its
own purposes, but parserSetup seemed to be write-only.)

We can actually get plpgsql out of the business of using legacy null
flags altogether, and using ParamListInfo instead of its ad-hoc
PreparedParamsData structure; but this requires inventing one more
SPI API call that can replace SPI_cursor_open_with_args.  That seems
worth doing, though.

SPI_execute_with_args and SPI_cursor_open_with_args are now unused
anywhere in the core PG distribution.  Perhaps someday we could
deprecate/remove them.  But cleaning up the crufty bits of the SPI
API is a task for a different patch.

Per bug #16040 from Jeremy Smith.  This is unfortunately too invasive to
consider back-patching.  Patch by me; thanks to Hamid Akhtar for review.

Discussion: https://postgr.es/m/16040-eaacad11fecfb198@postgresql.org
2020-06-12 12:14:32 -04:00
Jeff Davis 1b2c29469a Fix HashAgg regression from choosing too many initial buckets.
Diagnosis by Andres.

Reported-by: Pavel Stehule
Discussion: https://postgr.es/m/CAFj8pRDLVakD5Aagt3yZeEQeTeEWaS3YE5h8XC3Q3qJ6TYkc2Q%40mail.gmail.com
Backpatch-through: 13
2020-06-08 21:04:16 -07:00
Tom Lane 3048898e73 Mop-up for wait event naming issues.
Synchronize the event names for parallel hash join waits with other
event names, by getting rid of the slashes and dropping "-ing"
suffixes.  Rename ClogGroupUpdate to XactGroupUpdate, to match the
new SLRU name.  Move the ProcSignalBarrier event to the IPC category;
it doesn't belong under IO.

Also a bit more wordsmithing in the wait event documentation tables.

Discussion: https://postgr.es/m/4505.1589640417@sss.pgh.pa.us
2020-05-16 21:00:11 -04:00
Tom Lane fa27dd40d5 Run pgindent with new pg_bsd_indent version 2.1.1.
Thomas Munro fixed a longstanding annoyance in pg_bsd_indent, that
it would misformat lines containing IsA() macros on the assumption
that the IsA() call should be treated like a cast.  This improves
some other cases involving field/variable names that match typedefs,
too.  The only places that get worse are a couple of uses of the
OpenSSL macro STACK_OF(); we'll gladly take that trade-off.

Discussion: https://postgr.es/m/20200114221814.GA19630@alvherre.pgsql
2020-05-16 11:54:51 -04:00
Michael Paquier 7ccb2f54d9 Fix assertion with relation using REPLICA IDENTITY FULL in subscriber
In a logical replication subscriber, a table using REPLICA IDENTITY FULL
which has a primary key would try to use the primary key's index
available to scan for a tuple, but an assertion only assumed as correct
the case of an index associated to REPLICA IDENTITY USING INDEX.  This
commit corrects the assertion so as the use of a primary key index is a
valid case.

Reported-by: Dilip Kumar
Analyzed-by: Dilip Kumar
Author: Euler Taveira
Reviewed-by: Michael Paquier, Masahiko Sawada
Discussion: https://postgr.es/m/CAFiTN-u64S5bUiPL1q5kwpHNd0hRnf1OE-bzxNiOs5zo84i51w@mail.gmail.com
Backpatch-through: 10
2020-05-16 18:15:18 +09:00
Tom Lane 5cbfce562f Initial pgindent and pgperltidy run for v13.
Includes some manual cleanup of places that pgindent messed up,
most of which weren't per project style anyway.

Notably, it seems some people didn't absorb the style rules of
commit c9d297751, because there were a bunch of new occurrences
of function calls with a newline just after the left paren, all
with faulty expectations about how the rest of the call would get
indented.
2020-05-14 13:06:50 -04:00
Alvaro Herrera 17cc133f01
Dial back -Wimplicit-fallthrough to level 3
The additional pain from level 4 is excessive for the gain.

Also revert all the source annotation changes to their original
wordings, to avoid back-patching pain.

Discussion: https://postgr.es/m/31166.1589378554@sss.pgh.pa.us
2020-05-13 15:31:14 -04:00
Alvaro Herrera 3e9744465d
Add -Wimplicit-fallthrough to CFLAGS and CXXFLAGS
Use it at level 4, a bit more restrictive than the default level, and
tweak our commanding comments to FALLTHROUGH.

(However, leave zic.c alone, since it's external code; to avoid the
warnings that would appear there, change CFLAGS for that file in the
Makefile.)

Author: Julien Rouhaud <rjuju123@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20200412081825.qyo5vwwco3fv4gdo@nol
Discussion: https://postgr.es/m/flat/E1fDenm-0000C8-IJ@gemulon.postgresql.org
2020-05-12 16:07:30 -04:00
Tomas Vondra 1a40d37a9f Fix typos and improve incremental sort comments
Author: Justin Pryzby, James Coleman
Discussion: https://postgr.es/m/20200419023625.GP26953@telsasoft.com
2020-05-12 19:37:13 +02:00
Tomas Vondra 9155b4be9a Do no reset bounded before incremental sort rescan
ExecReScanIncrementalSort was resetting bounded=false, which means the
optimization would be disabled on all rescans. This happens because
ExecSetTupleBound is called before the rescan, not after it.

Author: James Coleman
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/20200414065336.GI1492@paquier.xyz
2020-05-09 19:41:36 +02:00
Tomas Vondra c442722648 Fix handling of REWIND/MARK/BACKWARD in incremental sort
The executor flags were not handled entirely correctly, although the
bugs were mostly harmless and it was mostly comment inaccuracy. We don't
need to strip any of the flags for child nodes.

Incremental sort does not support backward scans of mark/restore, so
MARK/BACKWARDS flags should not be possible. So we simply ensure this
using an assert, and we don't bother removing them when initializing
the child node.

With REWIND it's a bit less clear - incremental sort does not support
REWIND, but there is no way to signal this - it's legal to just ignore
the flag. We however continue passing the flag to child nodes, because
they might be useful to leverage that.

Reported-by: Michael Paquier
Author: James Coleman
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/20200414065336.GI1492@paquier.xyz
2020-05-09 19:41:18 +02:00
Amit Kapila 69bfaf2e1d Change the display of WAL usage statistics in Explain.
In commit 33e05f89c5, we have added the option to display WAL usage
statistics in Explain and auto_explain.  The display format used two spaces
between each field which is inconsistent with Buffer usage statistics which
is using one space between each field.  Change the format to make WAL usage
statistics consistent with Buffer usage statistics.

This commit also changed the usage of "full page writes" to
"full page images" for WAL usage statistics to make it consistent with
other parts of code and docs.

Author: Julien Rouhaud, Amit Kapila
Reviewed-by: Justin Pryzby, Kyotaro Horiguchi and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-05-05 08:00:53 +05:30
Andres Freund 299298bc87 Fix transient memory leak for SRFs in FROM.
In a9c35cf85c I changed ExecMakeTableFunctionResult() to dynamically
allocate the FunctionCallInfo used to call the SRF. Unfortunately I
did not account for the fact that the surrounding memory context has
query lifetime, leading to a leak till the end of the query.

In most cases the leak is fairly inconsequential, but if the
FunctionScan is done many times in the query, the leak can add
up. This happens e.g. if the function scan is on the inner side of a
nested loop, due to a lateral join.
EXPLAIN SELECT sum(f) FROM generate_series(1, 100000000) g(i), generate_series(i, i+1) f;
quickly shows the leak.

Instead of explicitly freeing the FunctionCallInfo it seems better to
make sure all the per-set temporary state in
ExecMakeTableFunctionResult() is cleaned up wholesale. Currently
that's probably just the FunctionCallInfo allocation, but since
there's some initialization work, and since there's already an
appropriate context, this seems like a more robust approach.

Bug: #16112
Reported-By: Ben Cornett
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/16112-4448bbf55a404189%40postgresql.org
Backpatch: 12, a9c35cf85c
2020-04-22 19:53:06 -07:00
Tom Lane 5836d32655 Fix minor violations of FunctionCallInvoke usage protocol.
Working on commit 1c455078b led me to check through FunctionCallInvoke
call sites to see if every one was being honest about (a) making sure
that fcinfo.isnull is initially false, and (b) checking its state after
the call.  Sure enough, I found some violations.

The main one is that finalize_partialaggregate re-used serialfn_fcinfo
without resetting isnull, even though it clearly intends to cater for
serialfns that return NULL.  There would only be an issue with a
non-strict serialfn, since it's unlikely that a serialfn would return
NULL for non-null input.  We have no non-strict serialfns in core, and
there may be none in the wild either, which would account for the lack
of complaints.  Still, it's clearly wrong, so back-patch that fix to
9.6 where finalize_partialaggregate was introduced.

Also, arrayfuncs.c and rowtypes.c contained various callers that were
not bothering to check for result nulls.  While what's being called is
a comparison or hash function that probably *shouldn't* return null,
that's a lousy excuse for not having any check at all.  There are
existing places that just Assert(!fcinfo->isnull) in comparable
situations, so I added that to the places that were calling btree
comparison or hash support functions.  In the places calling
boolean-returning equality functions, it's quite cheap to have them
treat isnull as FALSE, so make those places do that.  Also remove some
"locfcinfo->isnull = false" assignments that are unnecessary given the
assumption that no previous call returned null.  These changes seem like
mostly neatnik-ism or debugging support, so I didn't back-patch.
2020-04-21 14:23:53 -04:00
David Rowley 3cb02e307e Fix possible crash with GENERATED ALWAYS columns
In some corner cases, this could also lead to corrupted values being
included in the tuple.

Users who are concerned that they are affected by this should first
upgrade and then perform a base backup of their database and restore onto
an off-line server. They should then query each table with generated
columns to ensure there are no rows where the generated expression does
not match a newly calculated version of the GENERATED ALWAYS expression.
If no crashes occur and no rows are returned then you're not affected.

Fixes bug #16369.

Reported-by: Cameron Ezell
Discussion: https://postgr.es/m/16369-5845a6f1bef59884@postgresql.org
Backpatch-through: 12 (where GENERATED ALWAYS columns were added.)
2020-04-18 14:10:37 +12:00
Tom Lane 2d59643dbc Account for collation when coercing the output of a SQL function.
Commit 913bbd88d overlooked that the result of coerce_to_target_type
might need collation fixups.  Per report from Andreas Joseph Krogh.

Discussion: https://postgr.es/m/VisenaEmail.72.37d08ec2b8cb8fb5.17179940cd3@tc7-visena
2020-04-14 17:30:36 -04:00
Michael Paquier 8128b0c152 Fix collection of typos and grammar mistakes in the tree, volume 2
This fixes some comments and documentation new as of Postgres 13, and is
a follow-up of the work done in dd0f37e.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20200408165653.GF2228@telsasoft.com
2020-04-14 14:45:43 +09:00
Amit Kapila ef08ca113f Cosmetic fixups for WAL usage work.
Reported-by: Justin Pryzby and Euler Taveira
Author: Justin Pryzby and Julien Rouhaud
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-04-13 15:31:16 +05:30
Tom Lane 35cb574aa8 Suppress -Wimplicit-fallthrough warning in new LIMIT WITH TIES code.
The placement of the fall-through comment in this code appears not to
work to suppress the warning in recent gcc.  Move it to the bottom of
the case group, and add an assertion that we didn't get there through
some other code path.  Also improve wording of nearby comments.

Julien Rouhaud, comment hacking by me

Discussion: https://postgr.es/m/CAOBaU_aLdPGU5wCpaowNLF-Q8328iR7mj1yJAhMOVsdLwY+sdg@mail.gmail.com
2020-04-11 15:02:44 -04:00
Noah Misch 328c70997b Optimize RelationFindReplTupleSeq() for CLOBBER_CACHE_ALWAYS.
Specifically, remember lookup_type_cache() results instead of retrieving
them once per comparison.  Under CLOBBER_CACHE_ALWAYS, this reduced
src/test/subscription/t/001_rep_changes.pl elapsed time by an order of
magnitude, which reduced check-world elapsed time by 9%.

Discussion: https://postgr.es/m/20200406085420.GC162712@rfd.leadboat.com
2020-04-11 10:30:12 -07:00
Tom Lane 969f9d0b4b Make EXPLAIN report maximum hashtable usage across multiple rescans.
Before discarding the old hash table in ExecReScanHashJoin, capture
its statistics, ensuring that we report the maximum hashtable size
across repeated rescans of the hash input relation.  We can repurpose
the existing code for reporting hashtable size in parallel workers
to help with this, making the patch pretty small.  This also ensures
that if rescans happen within parallel workers, we get the correct
maximums across all instances.

Konstantin Knizhnik and Tom Lane, per diagnosis by Thomas Munro
of a trouble report from Alvaro Herrera.

Discussion: https://postgr.es/m/20200323165059.GA24950@alvherre.pgsql
2020-04-11 12:39:19 -04:00
Tom Lane 5c27bce7f3 Clear dangling pointer to avoid bogus EXPLAIN printout in a corner case.
ExecReScanHashJoin will destroy the join's hash table if it expects
that the inner relation will produce different rows on rescan.
Up to now it's not bothered to clear the additional pointer to that
hash table that exists in the child HashState node.  However, it's
possible for the query to terminate without building a fresh hash
table (this happens if the outer relation is found to be empty
during the final rescan).  So we can end with a dangling pointer
to a deleted hash table.  That was harmless originally, but since
9.0 EXPLAIN ANALYZE has used that pointer to print hash table
statistics.  In debug builds this reproducibly results in garbage
statistics.  In non-debug builds there's frequently no ill effects,
but in principle one could get wrong EXPLAIN ANALYZE output, or
perhaps even a crash if free() has released the hashtable memory
back to the OS.

To fix, just make sure we clear the additional pointer when destroying
the hash table.  In problematic cases, EXPLAIN ANALYZE will then print
no hashtable statistics (reverting to its pre-9.0 behavior).  This isn't
ideal, but since the problem manifests only in unusual corner cases,
it's hard to justify taking any risks to do better in the back
branches.  A follow-on patch will improve matters in HEAD.

Konstantin Knizhnik and Tom Lane, per diagnosis by Thomas Munro
of a trouble report from Alvaro Herrera.

Discussion: https://postgr.es/m/20200323165059.GA24950@alvherre.pgsql
2020-04-11 12:29:06 -04:00
Michael Paquier dd0f37ecce Fix collection of typos and grammar mistakes in the tree
This fixes some comments and documentation new as of Postgres 13.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20200408165653.GF2228@telsasoft.com
2020-04-10 11:18:39 +09:00
Amit Kapila 5c71362174 Allow parallel create index to accumulate buffer usage stats.
Currently, we don't account for buffer usage incurred by parallel workers
for parallel create index.  This commit allows each worker to record the
buffer usage stats and leader backend to accumulate that stats at the
end of the operation.  This will allow pg_stat_statements to display
correct buffer usage stats for (parallel) create index command.

Reported-by: Julien Rouhaud
Author: Sawada Masahiko
Reviewed-by: Dilip Kumar, Julien Rouhaud and Amit Kapila
Backpatch-through: 11, where this was introduced
Discussion: https://postgr.es/m/20200328151721.GB12854@nol
2020-04-09 09:49:30 +05:30
David Rowley 02a2e8b442 Modify additional power 2 calculations to use new helper functions
2nd pass of modifying various places which obtain the next power
of 2 of a number and make them use the new functions added in
f0705bb62.

In passing, also modify num_combinations(). This can be implemented
using simple bitshifting rather than looping.

Reviewed-by: John Naylor
Discussion: https://postgr.es/m/20200114173553.GE32763%40fetter.org
2020-04-08 18:29:51 +12:00
Jeff Davis 50a38f6517 Create memory context for HashAgg with a reasonable maxBlockSize.
If the memory context's maxBlockSize is too big, a single block
allocation can suddenly exceed work_mem. For Hash Aggregation, this
can mean spilling to disk too early or reporting a confusing memory
usage number for EXPLAN ANALYZE.

Introduce CreateWorkExprContext(), which is like CreateExprContext(),
except that it creates the AllocSet with a maxBlockSize that is
reasonable in proportion to work_mem.

Right now, CreateWorkExprContext() is only used by Hash Aggregation,
but it may be generally useful in the future.

Discussion: https://postgr.es/m/412a3fbf306f84d8d78c4009e11791867e62b87c.camel@j-davis.com
2020-04-07 21:25:28 -07:00
Alvaro Herrera 357889eb17
Support FETCH FIRST WITH TIES
WITH TIES is an option to the FETCH FIRST N ROWS clause (the SQL
standard's spelling of LIMIT), where you additionally get rows that
compare equal to the last of those N rows by the columns in the
mandatory ORDER BY clause.

There was a proposal by Andrew Gierth to implement this functionality in
a more powerful way that would yield more features, but the other patch
had not been finished at this time, so we decided to use this one for
now in the spirit of incremental development.

Author: Surafel Temesgen <surafel3000@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Discussion: https://postgr.es/m/CALAY4q9ky7rD_A4vf=FVQvCGngm3LOes-ky0J6euMrg=_Se+ag@mail.gmail.com
Discussion: https://postgr.es/m/87o8wvz253.fsf@news-spur.riddles.org.uk
2020-04-07 16:22:13 -04:00
Tomas Vondra d2d8a229bc Implement Incremental Sort
Incremental Sort is an optimized variant of multikey sort for cases when
the input is already sorted by a prefix of the requested sort keys. For
example when the relation is already sorted by (key1, key2) and we need
to sort it by (key1, key2, key3) we can simply split the input rows into
groups having equal values in (key1, key2), and only sort/compare the
remaining column key3.

This has a number of benefits:

- Reduced memory consumption, because only a single group (determined by
  values in the sorted prefix) needs to be kept in memory. This may also
  eliminate the need to spill to disk.

- Lower startup cost, because Incremental Sort produce results after each
  prefix group, which is beneficial for plans where startup cost matters
  (like for example queries with LIMIT clause).

We consider both Sort and Incremental Sort, and decide based on costing.

The implemented algorithm operates in two different modes:

- Fetching a minimum number of tuples without check of equality on the
  prefix keys, and sorting on all columns when safe.

- Fetching all tuples for a single prefix group and then sorting by
  comparing only the remaining (non-prefix) keys.

We always start in the first mode, and employ a heuristic to switch into
the second mode if we believe it's beneficial - the goal is to minimize
the number of unnecessary comparions while keeping memory consumption
below work_mem.

This is a very old patch series. The idea was originally proposed by
Alexander Korotkov back in 2013, and then revived in 2017. In 2018 the
patch was taken over by James Coleman, who wrote and rewrote most of the
current code.

There were many reviewers/contributors since 2013 - I've done my best to
pick the most active ones, and listed them in this commit message.

Author: James Coleman, Alexander Korotkov
Reviewed-by: Tomas Vondra, Andreas Karlsson, Marti Raudsepp, Peter Geoghegan, Robert Haas, Thomas Munro, Antonin Houska, Andres Freund, Alexander Kuzmenkov
Discussion: https://postgr.es/m/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.com
Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
2020-04-06 21:35:10 +02:00
Peter Eisentraut f1ac27bfda Add logical replication support to replicate into partitioned tables
Mainly, this adds support code in logical/worker.c for applying
replicated operations whose target is a partitioned table to its
relevant partitions.

Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Reviewed-by: Petr Jelinek <petr@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com
2020-04-06 15:15:52 +02:00
Amit Kapila df3b181499 Add infrastructure to track WAL usage.
This allows gathering the WAL generation statistics for each statement
execution.  The three statistics that we collect are the number of WAL
records, the number of full page writes and the amount of WAL bytes
generated.

This helps the users who have write-intensive workload to see the impact
of I/O due to WAL.  This further enables us to see approximately what
percentage of overall WAL is due to full page writes.

In the future, we can extend this functionality to allow us to compute the
the exact amount of WAL data due to full page writes.

This patch in itself is just an infrastructure to compute WAL usage data.
The upcoming patches will expose this data via explain, auto_explain,
pg_stat_statements and verbose (auto)vacuum output.

Author: Kirill Bychik, Julien Rouhaud
Reviewed-by: Dilip Kumar, Fujii Masao and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-04-04 10:02:08 +05:30
Jeff Davis 0588ee63aa Include chunk overhead in hash table entry size estimate.
Don't try to be precise about it, just use a constant 16 bytes of
chunk overhead. Being smarter would require knowing the memory context
where the chunk will be allocated, which is not known by all callers.

Discussion: https://postgr.es/m/20200325220936.il3ni2fj2j2b45y5@alap3.anarazel.de
2020-04-03 20:07:58 -07:00
Fujii Masao 6aba63ef3e Allow the planner-related functions and hook to accept the query string.
This commit adds query_string argument into the planner-related functions
and hook and allows us to pass the query string to them.

Currently there is no user of the query string passed. But the upcoming patch
for the planning counters will add the planning hook function into
pg_stat_statements and the function will need the query string. So this change
will be necessary for that patch.

Also this change is useful for some extensions that want to use the query
string in their planner hook function.

Author: Pascal Legrand, Julien Rouhaud
Reviewed-by: Yoshikazu Imai, Tom Lane, Fujii Masao
Discussion: https://postgr.es/m/CAOBaU_bU1m3_XF5qKYtSj1ua4dxd=FWDyh2SH4rSJAUUfsGmAQ@mail.gmail.com
Discussion: https://postgr.es/m/1583789487074-0.post@n3.nabble.com
2020-03-30 13:51:05 +09:00
Fujii Masao 4a539a25eb Expose BufferUsageAccumDiff().
Previously pg_stat_statements calculated the difference of buffer counters
by its own code even while BufferUsageAccumDiff() had the same code.
This commit expose BufferUsageAccumDiff() and makes pg_stat_statements
use it for the calculation, in order to simply the code.

This change also would be useful for the upcoming patch for the planning
counters in pg_stat_statements because the patch will add one more code
for the calculation of difference of buffer counters and that can easily be
done by using BufferUsageAccumDiff().

Author: Julien Rouhaud
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/bdfee4e0-a304-2498-8da5-3cb52c0a193e@oss.nttdata.com
2020-03-30 12:15:26 +09:00
Jeff Davis 7351bfeda3 Fix costing for disk-based hash aggregation.
Report and suggestions from Richard Guo and Tomas Vondra.

Discussion: https://postgr.es/m/CAMbWs4_W8fYbAn8KxgidAaZHON_Oo08OYn9ze=7remJymLqo5g@mail.gmail.com
2020-03-28 12:07:49 -07:00
Tom Lane bda6dedbea Go back to returning int from ereport auxiliary functions.
This reverts the parts of commit 17a28b0364
that changed ereport's auxiliary functions from returning dummy integer
values to returning void.  It turns out that a minority of compilers
complain (not entirely unreasonably) about constructs such as

	(condition) ? errdetail(...) : 0

if errdetail() returns void rather than int.  We could update those
call sites to say "(void) 0" perhaps, but the expectation for this
patch set was that ereport callers would not have to change anything.
And this aspect of the patch set was already the most invasive and
least compelling part of it, so let's just drop it.

Per buildfarm.

Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com
2020-03-25 11:57:36 -04:00
Jeff Davis 3649133b14 Avoid allocating unnecessary zero-sized array.
If there are no aggregates, there is no need to allocate an array of
zero AggStatePerGroupData elements.
2020-03-24 18:30:04 -07:00
Tom Lane 17a28b0364 Improve the internal implementation of ereport().
Change all the auxiliary error-reporting routines to return void,
now that we no longer need to pretend they are passing something
useful to errfinish().  While this probably doesn't save anything
significant at the machine-code level, it allows detection of some
additional types of mistakes.

Pass the error location details (__FILE__, __LINE__, PG_FUNCNAME_MACRO)
to errfinish not errstart.  This shaves a few cycles off the case where
errstart decides we're not going to emit anything.

Re-implement elog() as a trivial wrapper around ereport(), removing
the separate support infrastructure it used to have.  Aside from
getting rid of some now-surplus code, this means that elog() now
really does have exactly the same semantics as ereport(), in particular
that it can skip evaluation work if the message is not to be emitted.

Andres Freund and Tom Lane

Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com
2020-03-24 12:08:48 -04:00
Jeff Davis 64fe602279 Fixes for Disk-based Hash Aggregation.
Justin Pryzby raised a couple issues with commit 1f39bce0. Fixed.

Also, tweak the way the size of a hash entry is estimated and the
number of buckets is estimated when calling BuildTupleHashTableExt().

Discussion: https://www.postgresql.org/message-id/20200319064222.GR26184@telsasoft.com
2020-03-23 15:43:07 -07:00
Amit Kapila 33753ac9d7 Add object names to partition integrity violations.
All errors of SQLSTATE class 23 should include the name of an object
associated with the error in separate fields of the error report message.
We do this so that applications need not try to extract them from the
possibly-localized human-readable text of the message.

Reported-by: Chris Bandy
Author: Chris Bandy
Reviewed-by: Amit Kapila and Amit Langote
Discussion: https://postgr.es/m/0aa113a3-3c7f-db48-bcd8-f9290b2269ae@gmail.com
2020-03-23 08:09:15 +05:30
Jeff Davis 1f39bce021 Disk-based Hash Aggregation.
While performing hash aggregation, track memory usage when adding new
groups to a hash table. If the memory usage exceeds work_mem, enter
"spill mode".

In spill mode, new groups are not created in the hash table(s), but
existing groups continue to be advanced if input tuples match. Tuples
that would cause a new group to be created are instead spilled to a
logical tape to be processed later.

The tuples are spilled in a partitioned fashion. When all tuples from
the outer plan are processed (either by advancing the group or
spilling the tuple), finalize and emit the groups from the hash
table. Then, create new batches of work from the spilled partitions,
and select one of the saved batches and process it (possibly spilling
recursively).

Author: Jeff Davis
Reviewed-by: Tomas Vondra, Adam Lee, Justin Pryzby, Taylor Vesely, Melanie Plageman
Discussion: https://postgr.es/m/507ac540ec7c20136364b5272acbcd4574aa76ef.camel@j-davis.com
2020-03-18 15:42:02 -07:00
Tom Lane 9d9784c840 Remove bogus assertion about polymorphic SQL function result.
It is possible to reach check_sql_fn_retval() with an unresolved
polymorphic rettype, resulting in an assertion failure as demonstrated
by one of the added test cases.  However, the code following that
throws what seems an acceptable error message, so just remove the
Assert and adjust commentary.

While here, I thought it'd be a good idea to provide some parallel
tests of SQL-function and PL/pgSQL-function polymorphism behavior.
Some of these cases are perhaps duplicative of tests elsewhere,
but we hadn't any organized coverage of the topic AFAICS.

Although that assertion's been wrong all along, it won't have any
effect in production builds, so I'm not bothering to back-patch.

Discussion: https://postgr.es/m/21569.1584314271@sss.pgh.pa.us
2020-03-17 14:54:46 -04:00
Thomas Munro b09ff53667 Simplify the effective_io_concurrency setting.
The effective_io_concurrency GUC and equivalent tablespace option were
previously passed through a formula based on a theory about RAID
spindles and probabilities, to arrive at the number of pages to prefetch
in bitmap heap scans.  Tomas Vondra, Andres Freund and others argued
that it was anachronistic and hard to justify, and commit 558a9165e0
already started down the path of bypassing it in new code.  We agreed to
drop that logic and use the value directly.

For the default setting of 1, there is no change in effect.  Higher
settings can be converted from the old meaning to the new with:

  select round(sum(OLD / n::float)) from generate_series(1, OLD) s(n);

We might want to consider renaming the GUC before the next release given
the change in meaning, but it's not clear that many users had set it
very carefully anyway.  That decision is deferred for now.

Discussion: https://postgr.es/m/CA%2BhUKGJUw08dPs_3EUcdO6M90GnjofPYrWp4YSLaBkgYwS-AqA%40mail.gmail.com
2020-03-16 17:14:26 +13:00
Peter Eisentraut 3c173a53a8 Remove utils/acl.h from catalog/objectaddress.h
The need for this was removed by
8b9e9644dc.

A number of files now need to include utils/acl.h or
parser/parse_node.h explicitly where they previously got it indirectly
somehow.

Since parser/parse_node.h already includes nodes/parsenodes.h, the
latter is then removed where the former was added.  Also, remove
nodes/pg_list.h from objectaddress.h, since that's included via
nodes/parsenodes.h.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/7601e258-26b2-8481-36d0-dc9dca6f28f1%402ndquadrant.com
2020-03-10 10:27:00 +01:00
Jeff Davis c954d49046 Extend ExecBuildAggTrans() to support a NULL pointer check.
Optionally push a step to check for a NULL pointer to the pergroup
state.

This will be important for disk-based hash aggregation in combination
with grouping sets. When memory limits are reached, a given tuple may
find its per-group state for some grouping sets but not others. For
the former, it advances the per-group state as normal; for the latter,
it skips evaluation and the calling code will have to spill the tuple
and reprocess it in a later batch.

Add the NULL check as a separate expression step because in some
common cases it's not needed.

Discussion: https://postgr.es/m/20200221202212.ssb2qpmdgrnx52sj%40alap3.anarazel.de
2020-03-04 17:29:18 -08:00
Tom Lane 3ed2005ff5 Introduce macros for typalign and typstorage constants.
Our usual practice for "poor man's enum" catalog columns is to define
macros for the possible values and use those, not literal constants,
in C code.  But for some reason lost in the mists of time, this was
never done for typalign/attalign or typstorage/attstorage.  It's never
too late to make it better though, so let's do that.

The reason I got interested in this right now is the need to duplicate
some uses of the TYPSTORAGE constants in an upcoming ALTER TYPE patch.
But in general, this sort of change aids greppability and readability,
so it's a good idea even without any specific motivation.

I may have missed a few places that could be converted, and it's even
more likely that pending patches will re-introduce some hard-coded
references.  But that's not fatal --- there's no expectation that
we'd actually change any of these values.  We can clean up stragglers
over time.

Discussion: https://postgr.es/m/16457.1583189537@sss.pgh.pa.us
2020-03-04 10:34:25 -05:00
Alvaro Herrera 2f9661311b
Represent command completion tags as structs
The backend was using strings to represent command tags and doing string
comparisons in multiple places, but that's slow and unhelpful.  Create a
new command list with a supporting structure to use instead; this is
stored in a tag-list-file that can be tailored to specific purposes with
a caller-definable C macro, similar to what we do for WAL resource
managers.  The first first such uses are a new CommandTag enum and a
CommandTagBehavior struct.

Replace numerous occurrences of char *completionTag with a
QueryCompletion struct so that the code no longer stores information
about completed queries in a cstring.  Only at the last moment, in
EndCommand(), does this get converted to a string.

EventTriggerCacheItem no longer holds an array of palloc’d tag strings
in sorted order, but rather just a Bitmapset over the CommandTags.

Author: Mark Dilger, with unsolicited help from Álvaro Herrera
Reviewed-by: John Naylor, Tom Lane
Discussion: https://postgr.es/m/981A9DB4-3F0C-4DA5-88AD-CB9CFF4D6CAD@enterprisedb.com
2020-03-02 18:19:51 -03:00