Commit Graph

45 Commits

Author SHA1 Message Date
Alexander Korotkov da841aa4dc Revert: Let table AM insertion methods control index insertion
This commit reverts b1484a3f19 per review by Andres Freund.

Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de
2024-04-11 16:01:30 +03:00
Masahiko Sawada f5a227895e Add new COPY option LOG_VERBOSITY.
This commit adds a new COPY option LOG_VERBOSITY, which controls the
amount of messages emitted during processing. Valid values are
'default' and 'verbose'.

This is currently used in COPY FROM when ON_ERROR option is set to
ignore. If 'verbose' is specified, a NOTICE message is emitted for
each discarded row, providing additional information such as line
number, column name, and the malformed value. This helps users to
identify problematic rows that failed to load.

Author: Bharath Rupireddy
Reviewed-by: Michael Paquier, Atsushi Torikoshi, Masahiko Sawada
Discussion: https://www.postgresql.org/message-id/CALj2ACUk700cYhx1ATRQyRw-fBM%2BaRo6auRAitKGff7XNmYfqQ%40mail.gmail.com
2024-04-01 15:25:25 +09:00
Alexander Korotkov b1484a3f19 Let table AM insertion methods control index insertion
Previously, the executor did index insert unconditionally after calling
table AM interface methods tuple_insert() and multi_insert().  This commit
introduces the new parameter insert_indexes for these two methods.  Setting
'*insert_indexes' to true saves the current logic.  Setting it to false
indicates that table AM cares about index inserts itself and doesn't want the
caller to do that.

Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
Reviewed-by: Pavel Borisov, Matthias van de Meent, Mark Dilger
2024-03-30 22:53:56 +02:00
Peter Eisentraut dbbca2cf29 Remove unused #include's from backend .c files
as determined by include-what-you-use (IWYU)

While IWYU also suggests to *add* a bunch of #include's (which is its
main purpose), this patch does not do that.  In some cases, a more
specific #include replaces another less specific one.

Some manual adjustments of the automatic result:

- IWYU currently doesn't know about includes that provide global
  variable declarations (like -Wmissing-variable-declarations), so
  those includes are being kept manually.

- All includes for port(ability) headers are being kept for now, to
  play it safe.

- No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the
  patch from exploding in size.

Note that this patch touches just *.c files, so nothing declared in
header files changes in hidden ways.

As a small example, in src/backend/access/transam/rmgr.c, some IWYU
pragma annotations are added to handle a special case there.

Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
2024-03-04 12:02:20 +01:00
Dean Rasheed 5f2e179bd3 Support MERGE into updatable views.
This allows the target relation of MERGE to be an auto-updatable or
trigger-updatable view, and includes support for WITH CHECK OPTION,
security barrier views, and security invoker views.

A trigger-updatable view must have INSTEAD OF triggers for every type
of action (INSERT, UPDATE, and DELETE) mentioned in the MERGE command.
An auto-updatable view must not have any INSTEAD OF triggers. Mixing
auto-update and trigger-update actions (i.e., having a partial set of
INSTEAD OF triggers) is not supported.

Rule-updatable views are also not supported, since there is no
rewriter support for non-SELECT rules with MERGE operations.

Dean Rasheed, reviewed by Jian He and Alvaro Herrera.

Discussion: https://postgr.es/m/CAEZATCVcB1g0nmxuEc-A+gGB0HnfcGQNGYH7gS=7rq0u0zOBXA@mail.gmail.com
2024-02-29 15:56:59 +00:00
Michael Paquier 06bd311bce Revert "Refactor CopyReadAttributes{CSV,Text}() to use a callback in COPY FROM"
This reverts commit 95fb5b4902, for reasons similar to what led to
1aa8324b81.  In this case, the callback was called once per row, which
is less worse than the previous callback introduced for COPY TO called
once per argument for each row, still the patch set discussed to plug in
custom routines to the COPY paths would be able to know which subroutine
to use depending on its CopyFromState, so this led to a suboptimal
approach at the end.

For now, this part is reverted to consider better which approach to use.

Discussion: https://postgr.es/m/20240206014125.qofww7ew3dx3v3uk@awork3.anarazel.de
2024-02-14 10:07:22 +09:00
Michael Paquier 95fb5b4902 Refactor CopyReadAttributes{CSV,Text}() to use a callback in COPY FROM
CopyReadAttributes{CSV,Text}() are used to parse lines for text and CSV
format.  This reduces the number of "if" branches that need to be
checked when parsing fields in CSV and text mode when dealing with a
COPY FROM, something that can become more noticeable with more
attributes and more lines to process.

Extracted from a larger patch by the same author.

Author: Sutou Kouhei
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
2024-02-05 09:46:02 +09:00
Masahiko Sawada 729439607a Add progress reporting of skipped tuples during COPY FROM.
9e2d870119 enabled the COPY command to skip malformed data, however
there was no visibility into how many tuples were actually skipped
during the COPY FROM.

This commit adds a new "tuples_skipped" column to
pg_stat_progress_copy view to report the number of tuples that were
skipped because they contain malformed data.

Bump catalog version.

Author: Atsushi Torikoshi
Reviewed-by: Masahiko Sawada
Discussion: https://postgr.es/m/d12fd8c99adcae2744212cb23feff6ed%40oss.nttdata.com
2024-01-25 10:57:41 +09:00
Alexander Korotkov b725b7eec4 Rename COPY option from SAVE_ERROR_TO to ON_ERROR
The option names now are "stop" (default) and "ignore".  The future options
could be "file 'filename.log'" and "table 'tablename'".

Discussion: https://postgr.es/m/20240117.164859.2242646601795501168.horikyota.ntt%40gmail.com
Author: Jian He
Reviewed-by: Atsushi Torikoshi
2024-01-19 15:15:51 +02:00
Alexander Korotkov 58fbbc9d68 Fix spelling in notice
Reported-by: Atsushi Torikoshi
Discussion: https://postgr.es/m/762d7dd4d5aa9e5ecffec2ae6a255a28%40oss.nttdata.com
2024-01-17 22:59:09 +02:00
Alexander Korotkov 8ad1f7db87 Fix format specifier for NOTICE in copyfrom.c
It's incorrect to use %lz for 64-bit numbers on 32-bit machine.
2024-01-17 00:59:44 +02:00
Alexander Korotkov 9e2d870119 Add new COPY option SAVE_ERROR_TO
Currently, when source data contains unexpected data regarding data type or
range, the entire COPY fails. However, in some cases, such data can be ignored
and just copying normal data is preferable.

This commit adds a new option SAVE_ERROR_TO, which specifies where to save the
error information. When this option is specified, COPY skips soft errors and
continues copying.

Currently, SAVE_ERROR_TO only supports "none". This indicates error information
is not saved and COPY just skips the unexpected data and continues running.

Later works are expected to add more choices, such as 'log' and 'table'.

Author: Damir Belyalov, Atsushi Torikoshi, Alex Shulgin, Jian He
Discussion: https://postgr.es/m/87k31ftoe0.fsf_-_%40commandprompt.com
Reviewed-by: Pavel Stehule, Andres Freund, Tom Lane, Daniel Gustafsson,
Reviewed-by: Alena Rybakina, Andy Fan, Andrei Lepikhov, Masahiko Sawada
Reviewed-by: Vignesh C, Atsushi Torikoshi
2024-01-16 23:08:53 +02:00
Bruce Momjian 29275b1d17 Update copyright for 2024
Reported-by: Michael Paquier

Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz

Backpatch-through: 12
2024-01-03 20:49:05 -05:00
Tom Lane 743ddafc71 Ensure we preprocess expressions before checking their volatility.
contain_mutable_functions and contain_volatile_functions give
reliable answers only after expression preprocessing (specifically
eval_const_expressions).  Some places understand this, but some did
not get the memo --- which is not entirely their fault, because the
problem is documented only in places far away from those functions.
Introduce wrapper functions that allow doing the right thing easily,
and add commentary in hopes of preventing future mistakes from
copy-and-paste of code that's only conditionally safe.

Two actual bugs of this ilk are fixed here.  We failed to preprocess
column GENERATED expressions before checking mutability, so that the
code could fail to detect the use of a volatile function
default-argument expression, or it could reject a polymorphic function
that is actually immutable on the datatype of interest.  Likewise,
column DEFAULT expressions weren't preprocessed before determining if
it's safe to apply the attmissingval mechanism.  A false negative
would just result in an unnecessary table rewrite, but a false
positive could allow the attmissingval mechanism to be used in a case
where it should not be, resulting in unexpected initial values in a
new column.

In passing, re-order the steps in ComputePartitionAttrs so that its
checks for invalid column references are done before applying
expression_planner, rather than after.  The previous coding would
not complain if a partition expression contains a disallowed column
reference that gets optimized away by constant folding, which seems
to me to be a behavior we do not want.

Per bug #18097 from Jim Keener.  Back-patch to all supported versions.

Discussion: https://postgr.es/m/18097-ebb179674f22932f@postgresql.org
2023-11-16 10:05:14 -05:00
Tom Lane d8a09939a3 In COPY FROM, fail cleanly when unsupported encoding conversion is needed.
In recent releases, such cases fail with "cache lookup failed for
function 0" rather than complaining that the conversion function
doesn't exist as prior versions did.  Seems to be a consequence of
sloppy refactoring in commit f82de5c46.  Add the missing error check.

Per report from Pierre Fortin.  Back-patch to v14 where the
oversight crept in.

Discussion: https://postgr.es/m/20230929163739.3bea46e5.pfortin@pfortin.com
2023-10-01 12:09:26 -04:00
Andrew Dunstan 276393f53e Only evaluate default values as required when doing COPY FROM
Commit 9f8377f7a2 was a little too eager in fetching default values.
Normally this would not matter, but if the default value is not valid
for the type (e.g. a varchar that's too long) it caused an unnecessary
error.

Complaint and fix from Laurenz Albe

Backpatch to release 16.

Discussion: https://postgr.es/m/75a7b7483aeb331aa017328d606d568fc715b90d.camel@cybertec.at
2023-10-01 10:18:41 -04:00
Andrew Dunstan f6d4c9cf16 Provide FORCE_NULL * and FORCE_NOT_NULL * options for COPY FROM
These options already exist, but you need to specify a column list for
them, which can be cumbersome. We already have the possibility of all
columns for FORCE QUOTE, so this is simply extending that facility to
FORCE_NULL and FORCE_NOT_NULL.

Author: Zhang Mingli
Reviewed-By: Richard Guo, Kyatoro Horiguchi, Michael Paquier.

Discussion: https://postgr.es/m/CACJufxEnVqzOFtqhexF2+AwOKFrV8zHOY3y=p+gPK6eB14pn_w@mail.gmail.com
2023-09-30 12:34:41 -04:00
Michael Paquier e434e21e11 Remove redundant assignments in copyfrom.c
The tuple descriptor and the number of attributes are assigned twice to
the same values in BeginCopyFrom(), for what looks like a small thinko
coming from the refactoring done in c532d15ddd.

Author: Jingtang Zhang
Discussion: https://postgr.es/m/CAPsk3_CrYeXUVHEiaWAYxY9BKiGvGT3AoXo_+Jm0xP_s_VmXCA@mail.gmail.com
2023-09-09 21:12:41 +09:00
David Rowley b635ac03e8 Fix performance problem with new COPY DEFAULT code
9f8377f7a added code to allow COPY FROM insert a column's default value
when the input matches the DEFAULT string specified in the COPY command.

Here we fix some inefficient code which needlessly palloc0'd an array to
store if we should use the default value or input value for the given
column.  This array was being palloc0'd and pfree'd once per row.  It's
much more efficient to allocate this once and just reset the values once
per row.

Reported-by: Masahiko Sawada
Author: Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoDvDmUQeJtZrau1ovnT_smN940%3DKp6mszNGK3bq9yRN6g%40mail.gmail.com
Backpatch-through: 16, where 9f8377f7a was introduced.
2023-07-27 14:47:05 +12:00
Tomas Vondra 19d8e2308b Ignore BRIN indexes when checking for HOT updates
When determining whether an index update may be skipped by using HOT, we
can ignore attributes indexed by block summarizing indexes without
references to individual tuples that need to be cleaned up.

A new type TU_UpdateIndexes provides a signal to the executor to
determine which indexes to update - no indexes, all indexes, or only the
summarizing indexes.

This also removes rd_indexattr list, and replaces it with rd_attrsvalid
flag. The list was not used anywhere, and a simple flag is sufficient.

This was originally committed as 5753d4ee32, but then got reverted by
e3fcca0d0d because of correctness issues.

Original patch by Josef Simanek, various fixes and improvements by Tomas
Vondra and me.

Authors: Matthias van de Meent, Josef Simanek, Tomas Vondra
Reviewed-by: Tomas Vondra, Alvaro Herrera
Discussion: https://postgr.es/m/05ebcb44-f383-86e3-4f31-0a97a55634cf@enterprisedb.com
Discussion: https://postgr.es/m/CAFp7QwpMRGcDAQumN7onN9HjrJ3u4X3ZRXdGFT0K5G2JWvnbWg%40mail.gmail.com
2023-03-20 11:02:42 +01:00
Andrew Dunstan 9f8377f7a2 Add a DEFAULT option to COPY FROM
This allows for a string which if an input field matches causes the
column's default value to be inserted. The advantage of this is that
the default can be inserted in some rows and not others, for which
non-default data is available.

The file_fdw extension is also modified to take allow use of this
option.

Israel Barth Rubio

Discussion: https://postgr.es/m/CAO_rXXAcqesk6DsvioOZ5zmeEmpUN5ktZf-9=9yu+DTr0Xr8Uw@mail.gmail.com
2023-03-13 10:01:56 -04:00
Tom Lane b803b7d132 Fill EState.es_rteperminfos more systematically.
While testing a fix for bug #17823, I discovered that EvalPlanQualStart
failed to copy es_rteperminfos from the parent EState, resulting in
failure if anything in EPQ execution wanted to consult that information.

This led me to conclude that commit a61b1f748 had been too haphazard
about where to fill es_rteperminfos, and that we need to be sure that
that happens exactly where es_range_table gets filled.  So I changed the
signature of ExecInitRangeTable to help ensure that this new requirement
doesn't get missed.  (Indeed, pgoutput.c was also failing to fill it.
Maybe we don't ever need it there, but I wouldn't bet on that.)

No test case yet; one will arrive with the fix for #17823.
But that needs to be back-patched, while this fix is HEAD-only.

Discussion: https://postgr.es/m/17823-b64909cf7d63de84@postgresql.org
2023-03-06 13:10:57 -05:00
Bruce Momjian c8e1ba736b Update copyright for 2023
Backpatch-through: 11
2023-01-02 15:00:37 -05:00
Alvaro Herrera a61b1f7482
Rework query relation permission checking
Currently, information about the permissions to be checked on relations
mentioned in a query is stored in their range table entries.  So the
executor must scan the entire range table looking for relations that
need to have permissions checked.  This can make the permission checking
part of the executor initialization needlessly expensive when many
inheritance children are present in the range range.  While the
permissions need not be checked on the individual child relations, the
executor still must visit every range table entry to filter them out.

This commit moves the permission checking information out of the range
table entries into a new plan node called RTEPermissionInfo.  Every
top-level (inheritance "root") RTE_RELATION entry in the range table
gets one and a list of those is maintained alongside the range table.
This new list is initialized by the parser when initializing the range
table.  The rewriter can add more entries to it as rules/views are
expanded.  Finally, the planner combines the lists of the individual
subqueries into one flat list that is passed to the executor for
checking.

To make it quick to find the RTEPermissionInfo entry belonging to a
given relation, RangeTblEntry gets a new Index field 'perminfoindex'
that stores the corresponding RTEPermissionInfo's index in the query's
list of the latter.

ExecutorCheckPerms_hook has gained another List * argument; the
signature is now:
typedef bool (*ExecutorCheckPerms_hook_type) (List *rangeTable,
					      List *rtePermInfos,
					      bool ereport_on_violation);
The first argument is no longer used by any in-core uses of the hook,
but we leave it in place because there may be other implementations that
do.  Implementations should likely scan the rtePermInfos list to
determine which operations to allow or deny.

Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqGjJDmUhDSfv-U2qhKJjt9ST7Xh9JXC_irsAQ1TAUsJYg@mail.gmail.com
2022-12-06 16:09:24 +01:00
Alvaro Herrera fb958b5da8
Generalize ri_RootToPartitionMap to use for non-partition children
ri_RootToPartitionMap is currently only initialized for tuple routing
target partitions, though a future commit will need the ability to use
it even for the non-partition child tables, so make adjustments to the
decouple it from the partitioning code.

Also, make it lazily initialized via ExecGetRootToChildMap(), making
that function its preferred access path.  Existing third-party code
accessing it directly should no longer do so; consequently, it's been
renamed to ri_RootToChildMap, which also makes it consistent with
ri_ChildToRootMap.

ExecGetRootToChildMap() houses the logic of setting the map appropriately
depending on whether a given child relation is partition or not.

To support this, also add a separate entry point for TupleConversionMap
creation that receives an AttrMap.  No new code here, just split an
existing function in two.

Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqEYUhDXSK5BTvG_xk=eaAEJCD4GS3C6uH7ybBvv+Z_Tmg@mail.gmail.com
2022-12-02 10:35:55 +01:00
Etsuro Fujita 97da48246d Allow batch insertion during COPY into a foreign table.
Commit 3d956d956 allowed the COPY, but it's done by inserting individual
rows to the foreign table, so it can be inefficient due to the overhead
caused by each round-trip to the foreign server.  To improve performance
of the COPY in such a case, this patch allows batch insertion, by
extending the multi-insert machinery in CopyFrom() to the foreign-table
case so that we insert multiple rows to the foreign table at once using
the FDW callback routine added by commit b663a4136.  This patch also
allows this for postgres_fdw.  It is enabled by the "batch_size" option
added by commit b663a4136, which is disabled by default.

When doing batch insertion, we update progress of the COPY command after
performing the FDW callback routine, to count rows not suppressed by the
FDW as well as a BEFORE ROW INSERT trigger.  For consistency, this patch
changes the timing of updating it for plain tables: previously, we
updated it immediately after adding each row to the multi-insert buffer,
but we do so only after writing the rows stored in the buffer out to the
table using table_multi_insert(), which I think would be consistent even
with non-batching mode, because in that mode we update it after writing
each row out to the table using table_tuple_insert().

Andrey Lepikhov, heavily revised by me, with review from Ian Barwick,
Andrey Lepikhov, and Zhihong Yu.

Discussion: https://postgr.es/m/bc489202-9855-7550-d64c-ad2d83c24867%40postgrespro.ru
2022-10-13 18:45:00 +09:00
Etsuro Fujita cbe6dd17ac Fix thinko in comment.
This comment has been wrong since its introduction in commit 0d5f05cde;
backpatch to v12 where that came in.

Discussion: https://postgr.es/m/CAPmGK14VGf-xQjGQN4o1QyAbXAaxugU5%3DqfcmTDh1iufUDnV_w%40mail.gmail.com
2022-09-22 15:55:00 +09:00
David Rowley 421892a192 Further reduce warnings with -Wshadow=compatible-local
In a similar effort to f01592f91, here we're targetting fixing the
warnings that -Wshadow=compatible-local produces that we can fix by moving
a variable to an inner scope to stop that variable from being shadowed by
another variable declared somewhere later in the function.

All of the warnings being fixed here are changing the scope of variables
which are being used as an iterator for a "for" loop.  In each instance,
the fix happens to be changing the for loop to use the C99 type
initialization.  Much of this code likely pre-dates our use of C99.

Reducing the scope of the outer scoped variable seems like the safest way
to fix these.  Renaming seems more likely to risk patches using the wrong
variable.  Reducing the scope is more likely to result in a compilation
failure after applying some future patch rather than introducing bugs with
it.

By my count, this takes the warning count from 129 down to 114.

Author: Justin Pryzby
Discussion: https://postgr.es/m/CAApHDvrwLGBP%2BYw9vriayyf%3DXR4uPWP5jr6cQhP9au_kaDUhbA%40mail.gmail.com
2022-08-24 12:27:12 +12:00
Robert Haas b0a55e4329 Change internal RelFileNode references to RelFileNumber or RelFileLocator.
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.

Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".

Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.

On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.

Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber fields from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.

Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.

Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
2022-07-06 11:39:09 -04:00
Etsuro Fujita f505bec711 Fix typo in comment. 2022-03-30 19:00:00 +09:00
Tom Lane 1f8bc44868 Remove workarounds for avoiding [U]INT64_FORMAT in translatable strings.
Further code simplification along the same lines as d914eb347
and earlier patches.

Aleksander Alekseev, Japin Li

Discussion: https://postgr.es/m/CAJ7c6TMSKi3Xs8h5MP38XOnQQpBLazJvVxVfPn++roitDJcR7g@mail.gmail.com
2022-03-21 11:11:55 -04:00
Tom Lane 89f059bdf5 Remove redundant memory context switches in BeginCopyFrom().
This is probably a leftover from code refactoring.

Japin Li

Discussion: https://postgr.es/m/MEYP282MB16693DDABDFEC7949AC31857B6599@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
2022-01-19 12:31:15 -05:00
Bruce Momjian 27b77ecf9f Update copyright for 2022
Backpatch-through: 10
2022-01-07 19:04:57 -05:00
Daniel Gustafsson 8af57ad815 Fix typos in comments
Author: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/CAHut+PsN_gmKu-KfeEb9NDARoTPbs4AN4PPu=6LZXFZRJ13SEw@mail.gmail.com
2021-10-27 22:38:38 +02:00
Michael Paquier 609b0652af Fix typos and grammar in documentation and code comments
Comment fixes are applied on HEAD, and documentation improvements are
applied on back-branches where needed.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20210408164008.GJ6592@telsasoft.com
Backpatch-through: 9.6
2021-04-09 13:53:07 +09:00
Tom Lane c5b7ba4e67 Postpone some stuff out of ExecInitModifyTable.
Arrange to do some things on-demand, rather than immediately during
executor startup, because there's a fair chance of never having to do
them at all:

* Don't open result relations' indexes until needed.

* Don't initialize partition tuple routing, nor the child-to-root
tuple conversion map, until needed.

This wins in UPDATEs on partitioned tables when only some of the
partitions will actually receive updates; with larger partition
counts the savings is quite noticeable.  Also, we can remove some
sketchy heuristics in ExecInitModifyTable about whether to set up
tuple routing.

Also, remove execPartition.c's private hash table tracking which
partitions were already opened by the ModifyTable node.  Instead
use the hash added to ModifyTable itself by commit 86dc90056.

To allow lazy computation of the conversion maps, we now set
ri_RootResultRelInfo in all child ResultRelInfos.  We formerly set it
only in some, not terribly well-defined, cases.  This has user-visible
side effects in that now more error messages refer to the root
relation instead of some partition (and provide error data in the
root's column order, too).  It looks to me like this is a strict
improvement in consistency, so I don't have a problem with the
output changes visible in this commit.

Extracted from a larger patch, which seemed to me to be too messy
to push in one commit.

Amit Langote, reviewed at different times by Heikki Linnakangas and
myself

Discussion: https://postgr.es/m/CA+HiwqG7ZruBmmih3wPsBZ4s0H2EhywrnXEduckY5Hr3fWzPWA@mail.gmail.com
2021-04-06 15:57:11 -04:00
Heikki Linnakangas f82de5c46b Do COPY FROM encoding conversion/verification in larger chunks.
This gives a small performance gain, by reducing the number of calls
to the conversion/verification function, and letting it work with
larger inputs. Also, reorganizing the input pipeline makes it easier
to parallelize the input parsing: after the input has been converted
to the database encoding, the next stage of finding the newlines can
be done in parallel, because there cannot be any newline chars
"embedded" in multi-byte characters in the encodings that we support
as server encodings.

This changes behavior in one corner case: if client and server
encodings are the same single-byte encoding (e.g. latin1), previously
the input would not be checked for zero bytes ('\0'). Any fields
containing zero bytes would be truncated at the zero. But if encoding
conversion was needed, the conversion routine would throw an error on
the zero. After this commit, the input is always checked for zeros.

Reviewed-by: John Naylor
Discussion: https://www.postgresql.org/message-id/e7861509-3960-538a-9025-b75a61188e01%40iki.fi
2021-04-01 12:23:40 +03:00
Tom Lane 86dc90056d Rework planning and execution of UPDATE and DELETE.
This patch makes two closely related sets of changes:

1. For UPDATE, the subplan of the ModifyTable node now only delivers
the new values of the changed columns (i.e., the expressions computed
in the query's SET clause) plus row identity information such as CTID.
ModifyTable must re-fetch the original tuple to merge in the old
values of any unchanged columns.  The core advantage of this is that
the changed columns are uniform across all tables of an inherited or
partitioned target relation, whereas the other columns might not be.
A secondary advantage, when the UPDATE involves joins, is that less
data needs to pass through the plan tree.  The disadvantage of course
is an extra fetch of each tuple to be updated.  However, that seems to
be very nearly free in context; even worst-case tests don't show it to
add more than a couple percent to the total query cost.  At some point
it might be interesting to combine the re-fetch with the tuple access
that ModifyTable must do anyway to mark the old tuple dead; but that
would require a good deal of refactoring and it seems it wouldn't buy
all that much, so this patch doesn't attempt it.

2. For inherited UPDATE/DELETE, instead of generating a separate
subplan for each target relation, we now generate a single subplan
that is just exactly like a SELECT's plan, then stick ModifyTable
on top of that.  To let ModifyTable know which target relation a
given incoming row refers to, a tableoid junk column is added to
the row identity information.  This gets rid of the horrid hack
that was inheritance_planner(), eliminating O(N^2) planning cost
and memory consumption in cases where there were many unprunable
target relations.

Point 2 of course requires point 1, so that there is a uniform
definition of the non-junk columns to be returned by the subplan.
We can't insist on uniform definition of the row identity junk
columns however, if we want to keep the ability to have both
plain and foreign tables in a partitioning hierarchy.  Since
it wouldn't scale very far to have every child table have its
own row identity column, this patch includes provisions to merge
similar row identity columns into one column of the subplan result.
In particular, we can merge the whole-row Vars typically used as
row identity by FDWs into one column by pretending they are type
RECORD.  (It's still okay for the actual composite Datums to be
labeled with the table's rowtype OID, though.)

There is more that can be done to file down residual inefficiencies
in this patch, but it seems to be committable now.

FDW authors should note several API changes:

* The argument list for AddForeignUpdateTargets() has changed, and so
has the method it must use for adding junk columns to the query.  Call
add_row_identity_var() instead of manipulating the parse tree directly.
You might want to reconsider exactly what you're adding, too.

* PlanDirectModify() must now work a little harder to find the
ForeignScan plan node; if the foreign table is part of a partitioning
hierarchy then the ForeignScan might not be the direct child of
ModifyTable.  See postgres_fdw for sample code.

* To check whether a relation is a target relation, it's no
longer sufficient to compare its relid to root->parse->resultRelation.
Instead, check it against all_result_relids or leaf_result_relids,
as appropriate.

Amit Langote and Tom Lane

Discussion: https://postgr.es/m/CA+HiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ@mail.gmail.com
2021-03-31 11:52:37 -04:00
Michael Paquier 9d2d457009 Add support for more progress reporting in COPY
The command (TO or FROM), its type (file, pipe, program or callback),
and the number of tuples excluded by a WHERE clause in COPY FROM are
added to the progress reporting already available.

The column "lines_processed" is renamed to "tuples_processed" to
disambiguate the meaning of this column in the cases of CSV and BINARY
COPY and to be more consistent with the other catalog progress views.

Bump catalog version, again.

Author: Matthias van de Meent
Reviewed-by: Michael Paquier, Justin Pryzby, Bharath Rupireddy, Josef
Šimánek, Tomas Vondra
Discussion: https://postgr.es/m/CAEze2WiOcgdH4aQA8NtZq-4dgvnJzp8PohdeKchPkhMY-jWZXA@mail.gmail.com
2021-03-09 14:21:03 +09:00
Heikki Linnakangas 3174d69fb9 Remove server and libpq support for old FE/BE protocol version 2.
Protocol version 3 was introduced in PostgreSQL 7.4. There shouldn't be
many clients or servers left out there without version 3 support. But as
a courtesy, I kept just enough of the old protocol support that we can
still send the "unsupported protocol version" error in v2 format, so that
old clients can display the message properly. Likewise, libpq still
understands v2 ErrorResponse messages when establishing a connection.

The impetus to do this now is that I'm working on a patch to COPY
FROM, to always prefetch some data. We cannot do that safely with the
old protocol, because it requires parsing the input one byte at a time
to detect the end-of-copy marker.

Reviewed-by: Tom Lane, Alvaro Herrera, John Naylor
Discussion: https://www.postgresql.org/message-id/9ec25819-0a8a-d51a-17dc-4150bb3cca3b%40iki.fi
2021-03-04 10:45:55 +02:00
Heikki Linnakangas 6214e2b228 Fix permission checks on constraint violation errors on partitions.
If a cross-partition UPDATE violates a constraint on the target partition,
and the columns in the new partition are in different physical order than
in the parent, the error message can reveal columns that the user does not
have SELECT permission on. A similar bug was fixed earlier in commit
804b6b6db4.

The cause of the bug is that the callers of the
ExecBuildSlotValueDescription() function got confused when constructing
the list of modified columns. If the tuple was routed from a parent, we
converted the tuple to the parent's format, but the list of modified
columns was grabbed directly from the child's RTE entry.

ExecUpdateLockMode() had a similar issue. That lead to confusion on which
columns are key columns, leading to wrong tuple lock being taken on tables
referenced by foreign keys, when a row is updated with INSERT ON CONFLICT
UPDATE. A new isolation test is added for that corner case.

With this patch, the ri_RangeTableIndex field is no longer set for
partitions that don't have an entry in the range table. Previously, it was
set to the RTE entry of the parent relation, but that was confusing.

NOTE: This modifies the ResultRelInfo struct, replacing the
ri_PartitionRoot field with ri_RootResultRelInfo. That's a bit risky to
backpatch, because it breaks any extensions accessing the field. The
change that ri_RangeTableIndex is not set for partitions could potentially
break extensions, too. The ResultRelInfos are visible to FDWs at least,
and this patch required small changes to postgres_fdw. Nevertheless, this
seem like the least bad option. I don't think these fields widely used in
extensions; I don't think there are FDWs out there that uses the FDW
"direct update" API, other than postgres_fdw. If there is, you will get a
compilation error, so hopefully it is caught quickly.

Backpatch to 11, where support for both cross-partition UPDATEs, and unique
indexes on partitioned tables, were added.

Reviewed-by: Amit Langote
Security: CVE-2021-3393
2021-02-08 11:01:51 +02:00
Peter Geoghegan 9dc718bdf2 Pass down "logically unchanged index" hint.
Add an executor aminsert() hint mechanism that informs index AMs that
the incoming index tuple (the tuple that accompanies the hint) is not
being inserted by execution of an SQL statement that logically modifies
any of the index's key columns.

The hint is received by indexes when an UPDATE takes place that does not
apply an optimization like heapam's HOT (though only for indexes where
all key columns are logically unchanged).  Any index tuple that receives
the hint on insert is expected to be a duplicate of at least one
existing older version that is needed for the same logical row.  Related
versions will typically be stored on the same index page, at least
within index AMs that apply the hint.

Recognizing the difference between MVCC version churn duplicates and
true logical row duplicates at the index AM level can help with cleanup
of garbage index tuples.  Cleanup can intelligently target tuples that
are likely to be garbage, without wasting too many cycles on less
promising tuples/pages (index pages with little or no version churn).

This is infrastructure for an upcoming commit that will teach nbtree to
perform bottom-up index deletion.  No index AM actually applies the hint
just yet.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=CEKFa74EScx_hFVshCOn6AA5T-ajFASTdzipdkLTNQQ@mail.gmail.com
2021-01-13 08:11:00 -08:00
Tomas Vondra 8a4f618e7a Report progress of COPY commands
This commit introduces a view pg_stat_progress_copy, reporting progress
of COPY commands.  This allows rough estimates how far a running COPY
progressed, with the caveat that the total number of bytes may not be
available in some cases (e.g. when the input comes from the client).

Author: Josef Šimánek
Reviewed-by: Fujii Masao, Bharath Rupireddy, Vignesh C, Matthias van de Meent
Discussion: https://postgr.es/m/CAFp7QwqMGEi4OyyaLEK9DR0+E+oK3UtA4bEjDVCa4bNkwUY2PQ@mail.gmail.com
Discussion: https://postgr.es/m/CAFp7Qwr6_FmRM6pCO0x_a0mymOfX_Gg+FEKet4XaTGSW=LitKQ@mail.gmail.com
2021-01-06 21:51:06 +01:00
Bruce Momjian ca3b37487b Update copyright for 2021
Backpatch-through: 9.5
2021-01-02 13:06:25 -05:00
Heikki Linnakangas c532d15ddd Split copy.c into four files.
Copy.c has grown really large. Split it into more manageable parts:

- copy.c now contains only a few functions that are common to COPY FROM
  and COPY TO.

- copyto.c contains code for COPY TO.

- copyfrom.c contains code for initializing COPY FROM, and inserting the
  tuples to the correct table.

- copyfromparse.c contains code for reading from the client/file/program,
  and parsing the input text/CSV/binary format into tuples.

All of these parts are fairly complicated, and fairly independent of each
other. There is a patch being discussed to implement parallel COPY FROM,
which will add a lot of new code to the COPY FROM path, and another patch
which would allow INSERTs to use the same multi-insert machinery as COPY
FROM, both of which will require refactoring that code. With those two
patches, there's going to be a lot of code churn in copy.c anyway, so now
seems like a good time to do this refactoring.

The CopyStateData struct is also split. All the formatting options, like
FORMAT, QUOTE, ESCAPE, are put in a new CopyFormatOption struct, which
is used by both COPY FROM and TO. Other state data are kept in separate
CopyFromStateData and CopyToStateData structs.

Reviewed-by: Soumyadeep Chakraborty, Erik Rijkers, Vignesh C, Andres Freund
Discussion: https://www.postgresql.org/message-id/8e15b560-f387-7acc-ac90-763986617bfb%40iki.fi
2020-11-23 10:50:50 +02:00