Commit c0d5be5d6 caused pg_dump to create a separate BLOB metadata TOC
entry for each large object (blob), but it did not touch the ancient
decision to put all the blobs' data into a single "BLOBS" TOC entry.
This is bad for a few reasons: for databases with millions of blobs,
the TOC becomes unreasonably large, causing performance issues;
selective restore of just some blobs is quite impossible; and we
cannot parallelize either dump or restore of the blob data, since our
architecture for that relies on farming out whole TOC entries to
worker processes.
To improve matters, let's group multiple blobs into each blob metadata
TOC entry, and then make corresponding per-group blob data TOC entries.
Selective restore using pg_restore's -l/-L switches is then possible,
though only at the group level. (Perhaps we should provide a switch
to allow forcing one-blob-per-group for users who need precise
selective restore and don't have huge numbers of blobs. This patch
doesn't do that, instead just hard-wiring the maximum number of blobs
per entry at 1000.)
The blobs in a group must all have the same owner, since the TOC entry
format only allows one owner to be named. In this implementation
we also require them to all share the same ACL (grants); the archive
format wouldn't require that, but pg_dump's representation of
DumpableObjects does. It seems unlikely that either restriction
will be problematic for databases with huge numbers of blobs.
The metadata TOC entries now have a "desc" string of "BLOB METADATA",
and their "defn" string is just a newline-separated list of blob OIDs.
The restore code has to generate creation commands, ALTER OWNER
commands, and drop commands (for --clean mode) from that. We would
need special-case code for ALTER OWNER and drop in any case, so the
alternative of keeping the "defn" as directly executable SQL code
for creation wouldn't buy much, and it seems like it'd bloat the
archive to little purpose.
Since we require the blobs of a metadata group to share the same ACL,
we can furthermore store only one copy of that ACL, and then make
pg_restore regenerate the appropriate commands for each blob. This
saves space in the dump file not only by removing duplicative SQL
command strings, but by not needing a separate TOC entry for each
blob's ACL. In turn, that reduces client-side memory requirements for
handling many blobs.
ACL TOC entries that need this special processing are labeled as
"ACL"/"LARGE OBJECTS nnn..nnn". If we have a blob with a unique ACL,
continue to label it as "ACL"/"LARGE OBJECT nnn". We don't actually
have to make such a distinction, but it saves a few cycles during
restore for the easy case, and it seems like a good idea to not change
the TOC contents unnecessarily.
The data TOC entries ("BLOBS") are exactly the same as before,
except that now there can be more than one, so we'd better give them
identifying tag strings.
Also, commit c0d5be5d6 put the new BLOB metadata TOC entries into
SECTION_PRE_DATA, which perhaps is defensible in some ways, but
it's a rather odd choice considering that we go out of our way to
treat blobs as data. Moreover, because parallel restore handles
the PRE_DATA section serially, this means we'd only get part of the
parallelism speedup we could hope for. Move these entries into
SECTION_DATA, letting us parallelize the lo_create calls not just the
data loading when there are many blobs. Add dependencies to ensure
that we won't try to load data for a blob we've not yet created.
As this stands, we still generate a separate TOC entry for any comment
or security label attached to a blob. I feel comfortable in believing
that comments and security labels on blobs are rare, so this patch
should be enough to get most of the useful TOC compression for blobs.
We have to bump the archive file format version number, since existing
versions of pg_restore wouldn't know they need to do something special
for BLOB METADATA, plus they aren't going to work correctly with
multiple BLOBS entries or multiple-large-object ACL entries.
The directory and tar-file format handlers need some work
for multiple BLOBS entries: they used to hard-wire the file name
as "blobs.toc", which is replaced here with "blobs_<dumpid>.toc".
The 002_pg_dump.pl test script also knows about that and requires
minor updates. (I had to drop the test for manually-compressed
blobs.toc files with LZ4, because lz4's obtuse command line
design requires explicit specification of the output file name
which seems impractical here. I don't think we're losing any
useful test coverage thereby; that test stanza seems completely
duplicative with the gzip and zstd cases anyway.)
In passing, centralize management of the lo_buf used to hold data
while restoring blobs. The code previously had each format handler
create lo_buf, which seems rather pointless given that the format
handlers all make it the same way. Moreover, the format handlers
never use lo_buf directly, making this setup a failure from a
separation-of-concerns standpoint. Let's move the responsibility into
pg_backup_archiver.c, which is the only module concerned with lo_buf.
The reason to do this in this patch is that it allows a centralized
fix for the now-false assumption that we never restore blobs in
parallel. Also, get rid of dead code in DropLOIfExists: it's been a
long time since we had any need to be able to restore to a pre-9.0
server.
Discussion: https://postgr.es/m/a9f9376f1c3343a6bb319dce294e20ac@EX13D05UWC001.ant.amazon.com
This commit adds support for using syncfs() in fsync_pgdata() and
fsync_dir_recurse() (which have been renamed to sync_pgdata() and
sync_dir_recurse()). Like recovery_init_sync_method,
sync_pgdata() calls syncfs() for the data directory, each
tablespace, and pg_wal (if it is a symlink). For now, all of the
frontend utilities that use these support functions are hard-coded
to use fsync(), but a follow-up commit will allow specifying
syncfs().
Co-authored-by: Justin Pryzby
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/20210930004340.GM831%40telsasoft.com
Allow pg_dump to use the zstd compression, in addition to gzip/lz4. Bulk
of the new compression method is implemented in compress_zstd.{c,h},
covering the pg_dump compression APIs. The rest of the patch adds test
and makes various places aware of the new compression method.
The zstd library (which this patch relies on) supports multithreaded
compression since version 1.5. We however disallow that feature for now,
as it might interfere with parallel backups on platforms that rely on
threads (e.g. Windows). This can be improved / relaxed in the future.
This also fixes a minor issue in InitDiscoverCompressFileHandle(), which
was not updated to check if the file already has the .lz4 extension.
Adding zstd compression was originally proposed in 2020 (see the second
thread), but then was reworked to use the new compression API introduced
in e9960732a9. I've considered both threads when compiling the list of
reviewers.
Author: Justin Pryzby
Reviewed-by: Tomas Vondra, Jacob Champion, Andreas Karlsson
Discussion: https://postgr.es/m/20230224191840.GD1653@telsasoft.com
Discussion: https://postgr.es/m/20201221194924.GI30237@telsasoft.com
Prior to the introduction of the compression API in e9960732a9, pg_dump
would use the ZLIB_IN_SIZE/ZLIB_OUT_SIZE to size input/output buffers.
Commit 0da243fed0 introduced similar constants for LZ4, but while gzip
defined both buffers to be 4kB, LZ4 used 4kB and 16kB without any clear
reasoning why that's desirable.
Furthermore, parts of the code unaware of which compression is used
(e.g. pg_backup_directory.c) continued to use ZLIB_OUT_SIZE directly.
Simplify by replacing the various constants with DEFAULT_IO_BUFFER_SIZE,
set to 4kB. The compression implementations still have an option to use
a custom value, but considering 4kB was fine for 20+ years, I find that
unlikely (and we'd probably just increase the default buffer size).
Author: Georgios Kokolatos
Reviewed-by: Tomas Vondra, Justin Pryzby
Discussion: https://postgr.es/m/33496f7c-3449-1426-d568-63f6bca2ac1f@gmail.com
After 0da243fed0 got committed, we've received a report about a compiler
warning, related to the new LZ4File_gets() function:
compress_lz4.c: In function 'LZ4File_gets':
compress_lz4.c:492:19: warning: comparison of unsigned expression in
'< 0' is always false [-Wtype-limits]
492 | if (dsize < 0)
The reason is very simple - dsize is declared as size_t, which is an
unsigned integer, and thus the check is pointless and we might fail to
notice an error in some cases (or fail in a strange way a bit later).
The warning could have been silenced by simply changing the type, but we
realized the API mostly assumes all the libraries use the same types and
report errors the same way (e.g. by returning 0 and/or negative value).
But we can't make this assumption - the gzip/lz4 libraries already
disagree on some of this, and even if they did a library added in the
future might not.
The right solution is to define what the API does, and translate the
library-specific behavior in consistent way (so that the internal errors
are not exposed to users of our compression API). So this adjusts the
data types in a couple places, so that we don't miss library errors, and
simplifies and unifies the error reporting to simply return true/false
(instead of e.g. size_t).
While at it, make sure LZ4File_open_write() does not clobber errno in
case open_func() fails.
Author: Georgios Kokolatos
Reported-by: Alexander Lakhin
Reviewed-by: Tomas Vondra, Justin Pryzby
Discussion: https://postgr.es/m/33496f7c-3449-1426-d568-63f6bca2ac1f@gmail.com
Expand pg_dump's compression streaming and file APIs to support the lz4
algorithm. The newly added compress_lz4.{c,h} files cover all the
functionality of the aforementioned APIs. Minor changes were necessary
in various pg_backup_* files, where code for the 'lz4' file suffix has
been added, as well as pg_dump's compression option parsing.
Author: Georgios Kokolatos
Reviewed-by: Michael Paquier, Rachel Heaton, Justin Pryzby, Shi Yu, Tomas Vondra
Discussion: https://postgr.es/m/faUNEOpts9vunEaLnmxmG-DldLSg_ql137OC3JYDmgrOMHm1RvvWY2IdBkv_CRxm5spCCb_OmKNk2T03TMm0fBEWveFF9wA1WizPuAgB7Ss%3D%40protonmail.com
Switch pg_dump to use the Compression API, implemented by bf9aa490db.
The CompressFileHandle replaces the cfp* family of functions with a
struct of callbacks for accessing (compressed) files. This allows adding
new compression methods simply by introducing a new struct instance with
appropriate implementation of the callbacks.
Archives compressed using custom compression methods store an identifier
of the compression algorithm in their header instead of the compression
level. The header version is bumped.
Author: Georgios Kokolatos
Reviewed-by: Michael Paquier, Rachel Heaton, Justin Pryzby, Tomas Vondra
Discussion: https://postgr.es/m/faUNEOpts9vunEaLnmxmG-DldLSg_ql137OC3JYDmgrOMHm1RvvWY2IdBkv_CRxm5spCCb_OmKNk2T03TMm0fBEWveFF9wA1WizPuAgB7Ss%3D%40protonmail.com
For historical reasons, pg_dump refers to large objects as "BLOBs".
This term is not used anywhere else in PostgreSQL, and it also means
something different in the SQL standard and other SQL systems.
This patch renames internal functions, code comments, documentation,
etc. to use the "large object" or "LO" terminology instead. There is
no functionality change, so the archive format still uses the name
"BLOB" for the archive entry. Additional long command-line options
are added with the new naming.
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/flat/868a381f-4650-9460-1726-1ffd39a270b4%40enterprisedb.com
Compression specifications are currently used by pg_basebackup and
pg_receivewal, and are able to let the user control in an extended way
the method and level of compression used. As an effect of this commit,
pg_dump's -Z/--compress is now able to use more than just an integer, as
of the grammar "method[:detail]".
The method can be either "none" or "gzip", and can optionally take a
detail string. If the detail string is only an integer, it defines the
compression level. A comma-separated list of keywords can also be used
method allows for more options, the only keyword supported now is
"level".
The change is backward-compatible, hence specifying only an integer
leads to no compression for a level of 0 and gzip compression when the
level is greater than 0.
Most of the code changes are straight-forward, as pg_dump was relying on
an integer tracking the compression level to check for gzip or no
compression. These are changed to use a compression specification and
the algorithm stored in it.
As of this change, note that the dump format is not bumped because there
is no need yet to track the compression algorithm in the TOC entries.
Hence, we still rely on the compression level to make the difference
when reading them. This will be mandatory once a new compression method
is added, though.
In order to keep the code simpler when parsing the compression
specification, the code is changed so as pg_dump now fails hard when
using gzip on -Z/--compress without its support compiled, rather than
enforcing no compression without the user knowing about it except
through a warning. Like before this commit, archive and custom formats
are compressed by default when the code is compiled with gzip, and left
uncompressed without gzip.
Author: Georgios Kokolatos
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/O4mutIrCES8ZhlXJiMvzsivT7ztAMja2lkdL1LJx6O5f22I2W8PBIeLKz7mDLwxHoibcnRAYJXm1pH4tyUNC4a8eDzLn22a6Pb1S74Niexg=@pm.me
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in pg_dump/pg_dumpall
related code.
Affected code happens to be inconsistent in how it applies conventions
around how Archive and Archive Handle variables are named. Significant
code churn is required to fully fix those inconsistencies, so take the
least invasive approach possible: treat function definition names as
authoritative, and mechanically adjust corresponding names from function
definitions to match.
Like other recent commits that cleaned up function parameter names, this
commit was written with help from clang-tidy.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAH2-Wzmma+vzcO6gr5NYDZ+sx0G14aU-UrzFutT2FoRaisVCUQ@mail.gmail.com
Get rid of the separate "FATAL" log level, as it was applied
so inconsistently as to be meaningless. This mostly involves
s/pg_log_fatal/pg_log_error/g.
Create a macro pg_fatal() to handle the common use-case of
pg_log_error() immediately followed by exit(1). Various
modules had already invented either this or equivalent macros;
standardize on pg_fatal() and apply it where possible.
Invent the ability to add "detail" and "hint" messages to a
frontend message, much as we have long had in the backend.
Except where rewording was needed to convert existing coding
to detail/hint style, I have (mostly) resisted the temptation
to change existing message wording.
Patch by me. Design and patch reviewed at various stages by
Robert Haas, Kyotaro Horiguchi, Peter Eisentraut and
Daniel Gustafsson.
Discussion: https://postgr.es/m/1363732.1636496441@sss.pgh.pa.us
Coverity complained that applying get_gz_error after a failed gzclose,
as we did in one place in pg_basebackup, is unsafe. I think it's
right: it's entirely likely that the call is touching freed memory.
Change that to inspect errno, as we do for other gzclose calls.
Also, be careful to initialize errno to zero immediately before any
gzclose() call where we care about the error status. (There are
some calls where we don't, because we already failed at some previous
step.) This ensures that we don't get a misleadingly irrelevant
error code if gzclose() fails in a way that doesn't set errno.
We could work harder at that, but it looks to me like all such cases
are basically can't-happen if we're not misusing zlib, so it's
not worth the extra notational cruft that would be required.
Also, fix several places that simply failed to check for close-time
errors at all, mostly at some remove from the close or gzclose itself;
and one place that did check but didn't bother to report the errno.
Back-patch to v12. These mistakes are older than that, but between
the frontend logging API changes that happened in v12 and the fact
that frontend code can't rely on %m before that, the patch would need
substantial revision to work in older branches. It doesn't quite
seem worth the trouble given the lack of related field complaints.
Patch by me; thanks to Michael Paquier for review.
Discussion: https://postgr.es/m/1343113.1636489231@sss.pgh.pa.us
If the blob TOC file cannot be parsed, the error message was failing
to print the filename as the variable holding it was shadowed by the
destination buffer for parsing. When the filename fails to parse,
the error will print an empty string:
./pg_restore -d foo -F d dump
pg_restore: error: invalid line in large object TOC file "": ..
..instead of the intended error message:
./pg_restore -d foo -F d dump
pg_restore: error: invalid line in large object TOC file "dump/blobs.toc": ..
Fix by renaming both variables as the shared name was too generic to
store either and still convey what the variable held.
Backpatch all the way down to 9.6.
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/A2B151F5-B32B-4F2C-BA4A-6870856D9BDE@yesql.se
Backpatch-through: 9.6
Make sure that the string parsing is limited by the size of the
destination buffer.
In pg_basebackup the available values sent from the server
is limited to two characters so there was no risk of overflow.
In pg_dump the buffer is bounded by MAXPGPATH, and thus the limit
must be inserted via preprocessor expansion and the buffer increased
by one to account for the terminator. There is no risk of overflow
here, since in this case, the buffer scanned is smaller than the
destination buffer.
Backpatch the pg_basebackup fix to 11 where it was introduced, and
the pg_dump fix all the way down to 9.6.
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/B14D3D7B-F98C-4E20-9459-C122C67647FB@yesql.se
Backpatch-through: 11 and 9.6
A few places calling fwrite and gzwrite were not setting errno to ENOSPC
when reporting errors, as is customary; this led to some failures being
reported as
"could not write file: Success"
which makes us look silly. Make a few of these places in pg_dump and
pg_basebackup use our customary pattern.
Backpatch-to: 9.5
Author: Justin Pryzby <pryzby@telsasoft.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200611153753.GU14879@telsasoft.com
Thomas Munro fixed a longstanding annoyance in pg_bsd_indent, that
it would misformat lines containing IsA() macros on the assumption
that the IsA() call should be treated like a cast. This improves
some other cases involving field/variable names that match typedefs,
too. The only places that get worse are a couple of uses of the
OpenSSL macro STACK_OF(); we'll gladly take that trade-off.
Discussion: https://postgr.es/m/20200114221814.GA19630@alvherre.pgsql
This is still using the 2.0 version of pg_bsd_indent.
I thought it would be good to commit this separately,
so as to document the differences between 2.0 and 2.1 behavior.
Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
Previously, the way this worked was that a parallel pg_dump would
re-order the TABLE_DATA items in the dump's TOC into decreasing size
order, and separately re-order (some of) the INDEX items into decreasing
size order. Then pg_dump would dump the items in that order. Later,
parallel pg_restore just followed the TOC order. This method had lots
of deficiencies:
* TOC ordering randomly differed between parallel and non-parallel
dumps, and was hard to predict in the former case, causing problems
for building stable pg_dump test cases.
* Parallel restore only followed a well-chosen order if the dump had
been done in parallel; in particular, this never happened for restore
from custom-format dumps.
* The best order for restore isn't necessarily the same as for dump,
and it's not really static either because of locking considerations.
* TABLE_DATA and INDEX items aren't the only things that might take a lot
of work during restore. Scheduling was particularly stupid for the BLOBS
item, which might require lots of work during dump as well as restore,
but was left to the end in either case.
This patch removes the logic that changed the TOC order, fixing the
test instability problem. Instead, we sort the parallelizable items
just before processing them during a parallel dump. Independently
of that, parallel restore prioritizes the ready-to-execute tasks
based on the size of the underlying table. In the case of dependent
tasks such as index, constraint, or foreign key creation, the largest
relevant table is used as the metric for estimating the task length.
(This is pretty crude, but it should be enough to avoid the case we
want to avoid, which is ending the run with just a few large tasks
such that we can't make use of all N workers.)
Patch by me, responding to a complaint from Peter Eisentraut,
who also reviewed the patch.
Discussion: https://postgr.es/m/5137fe12-d0a2-4971-61b6-eb4e7e8875f8@2ndquadrant.com
Some error reports were reporting strerror(errno), which for some error
conditions coming from zlib are wrong, resulting in confusing reports
such as
pg_restore: [compress_io] could not read from input file: Success
which makes no sense. To correctly extract the error message we need to
use gzerror(), so let's do that.
This isn't as comprehensive or as neat as I would like, but at least it
should improve things in many common cases. The zlib abstraction in
compress_io does not seem to be applied consistently enough; we could
perhaps improve that, but it seems master-only material, not a bug fix
for back-patching.
This problem goes back all the way, but I decided to apply back to 9.4
only, because older branches don't contain commit 14ea89366 which this
change depends on.
Authors: Vladimir Kunschikov, Álvaro Herrera
Discussion: https://postgr.es/m/1498120508308.9826@infotecs.ru
The existing APIs for creating and parsing command and status messages are
rather messy; for example, archive-format modules have to provide code
for constructing command messages, which is entirely pointless since
the code to read them is hard-wired in WaitForCommands() and hence
no format-specific variation is actually possible. But there's little
foreseeable reason to need format-specific variation anyway.
The situation for status messages is no better; at least those are both
constructed and parsed by format-specific code, but said code is quite
redundant since there's no actual need for format-specific variation.
To add insult to injury, the first API involves returning pointers to
static buffers, which is bad, while the second involves returning pointers
to malloc'd strings, which is safer but randomly inconsistent.
Hence, get rid of the MasterStartParallelItem and MasterEndParallelItem
APIs, and instead write centralized functions that construct and parse
command and status messages. If we ever do need more flexibility, these
functions can be the standard implementations of format-specific
callback methods, but that's a long way off if it ever happens.
Tom Lane, reviewed by Kevin Grittner
Discussion: <17340.1464465717@sss.pgh.pa.us>
The ListenToWorkers/ReapWorkerStatus APIs were messy and hard to use.
Instead, make DispatchJobForTocEntry register a callback function that
will take care of state cleanup, doing whatever had been done by the caller
of ReapWorkerStatus in the old design. (This callback is essentially just
the old mark_work_done function in the restore case, and a trivial test for
worker failure in the dump case.) Then we can have ListenToWorkers call
the callback immediately on receipt of a status message, and return the
worker to WRKR_IDLE state; so the WRKR_FINISHED state goes away.
This allows us to design a unified wait-for-worker-messages loop:
WaitForWorkers replaces EnsureIdleWorker and EnsureWorkersFinished as well
as the mess in restore_toc_entries_parallel. Also, we no longer need the
fragile API spec that the caller of DispatchJobForTocEntry is responsible
for ensuring there's an idle worker, since DispatchJobForTocEntry can just
wait until there is one.
In passing, I got rid of the ParallelArgs struct, which was a net negative
in terms of notational verboseness, and didn't seem to be providing any
noticeable amount of abstraction either.
Tom Lane, reviewed by Kevin Grittner
Discussion: <1188.1464544443@sss.pgh.pa.us>
Formerly, Unix builds of pg_dump/pg_restore would trap SIGINT and similar
signals and set a flag that was tested in various data-transfer loops.
This was prone to errors of omission (cf commit 3c8aa6654); and even if
the client-side response was prompt, we did nothing that would cause
long-running SQL commands (e.g. CREATE INDEX) to terminate early.
Also, the master process would effectively do nothing at all upon receipt
of SIGINT; the only reason it seemed to work was that in typical scenarios
the signal would also be delivered to the child processes. We should
support termination when a signal is delivered only to the master process,
though.
Windows builds had no console interrupt handler, so they would just fall
over immediately at control-C, again leaving long-running SQL commands to
finish unmolested.
To fix, remove the flag-checking approach altogether. Instead, allow the
Unix signal handler to send a cancel request directly and then exit(1).
In the master process, also have it forward the signal to the children.
On Windows, add a console interrupt handler that behaves approximately
the same. The main difference is that a single execution of the Windows
handler can send all the cancel requests since all the info is available
in one process, whereas on Unix each process sends a cancel only for its
own database connection.
In passing, fix an old problem that DisconnectDatabase tends to send a
cancel request before exiting a parallel worker, even if nothing went
wrong. This is at least a waste of cycles, and could lead to unexpected
log messages, or maybe even data loss if it happened in pg_restore (though
in the current code the problem seems to affect only pg_dump). The cause
was that after a COPY step, pg_dump was leaving libpq in PGASYNC_BUSY
state, causing PQtransactionStatus() to report PQTRANS_ACTIVE. That's
normally harmless because the next PQexec() will silently clear the
PGASYNC_BUSY state; but in a parallel worker we might exit without any
additional SQL commands after a COPY step. So add an extra PQgetResult()
call after a COPY to allow libpq to return to PGASYNC_IDLE state.
This is a bug fix, IMO, so back-patch to 9.3 where parallel dump/restore
were introduced.
Thanks to Kyotaro Horiguchi for Windows testing and code suggestions.
Original-Patch: <7005.1464657274@sss.pgh.pa.us>
Discussion: <20160602.174941.256342236.horiguchi.kyotaro@lab.ntt.co.jp>
Parallel restore from directory format failed to respond to control-C
in a timely manner, because there were no checkAborting() calls in the
code path that reads data from a file and sends it to the backend.
If any worker was in the midst of restoring data for a large table,
you'd just have to wait.
This fix doesn't do anything for the problem of aborting a long-running
server-side command, but at least it fixes things for data transfers.
Back-patch to 9.3 where parallel restore was introduced.
It was using %u to read a string that was earlier produced by snprintf with %d
into a signed integer variable. This seems to work in practice but is
incorrect.
found by cppcheck
Rather than passing around DumpOptions and RestoreOptions as separate
arguments, add fields to struct Archive to carry pointers to these objects,
and access them through those fields when needed. There already was a
RestoreOptions pointer in Archive, though for no obvious reason it was part
of the "private" struct rather than out where pg_dump.c could see it.
Doing this allows reversion of quite a lot of parameter-addition changes
made in commit 0eea8047bf, which is a good thing IMO because this will
reduce the code delta between 9.4 and 9.5, probably easing a few future
back-patch efforts. Moreover, the previous commit only added a DumpOptions
argument to functions that had to have it at the time, which means we could
anticipate still more code churn (and more back-patch hazard) as the
requirement spread further. I'd hit exactly that problem in my upcoming
patch to fix extension membership marking, which is what motivated me to
do this.
Most pg_dump.c global variables, which were passed down individually to
dumping routines, are now grouped as members of the new DumpOptions
struct, which is used as a local variable and passed down into routines
that need it. This helps future development efforts; in particular it
is said to enable a mode in which a parallel pg_dump run can output
multiple streams, and have them restored in parallel.
Also take the opportunity to clean up the pg_dump header files somewhat,
to avoid circularity.
Author: Joachim Wieland, revised by Álvaro Herrera
Reviewed by Peter Eisentraut