Commit Graph

142 Commits

Author SHA1 Message Date
Tom Lane a45c78e328 Rearrange pg_dump's handling of large objects for better efficiency.
Commit c0d5be5d6 caused pg_dump to create a separate BLOB metadata TOC
entry for each large object (blob), but it did not touch the ancient
decision to put all the blobs' data into a single "BLOBS" TOC entry.
This is bad for a few reasons: for databases with millions of blobs,
the TOC becomes unreasonably large, causing performance issues;
selective restore of just some blobs is quite impossible; and we
cannot parallelize either dump or restore of the blob data, since our
architecture for that relies on farming out whole TOC entries to
worker processes.

To improve matters, let's group multiple blobs into each blob metadata
TOC entry, and then make corresponding per-group blob data TOC entries.
Selective restore using pg_restore's -l/-L switches is then possible,
though only at the group level.  (Perhaps we should provide a switch
to allow forcing one-blob-per-group for users who need precise
selective restore and don't have huge numbers of blobs.  This patch
doesn't do that, instead just hard-wiring the maximum number of blobs
per entry at 1000.)

The blobs in a group must all have the same owner, since the TOC entry
format only allows one owner to be named.  In this implementation
we also require them to all share the same ACL (grants); the archive
format wouldn't require that, but pg_dump's representation of
DumpableObjects does.  It seems unlikely that either restriction
will be problematic for databases with huge numbers of blobs.

The metadata TOC entries now have a "desc" string of "BLOB METADATA",
and their "defn" string is just a newline-separated list of blob OIDs.
The restore code has to generate creation commands, ALTER OWNER
commands, and drop commands (for --clean mode) from that.  We would
need special-case code for ALTER OWNER and drop in any case, so the
alternative of keeping the "defn" as directly executable SQL code
for creation wouldn't buy much, and it seems like it'd bloat the
archive to little purpose.

Since we require the blobs of a metadata group to share the same ACL,
we can furthermore store only one copy of that ACL, and then make
pg_restore regenerate the appropriate commands for each blob.  This
saves space in the dump file not only by removing duplicative SQL
command strings, but by not needing a separate TOC entry for each
blob's ACL.  In turn, that reduces client-side memory requirements for
handling many blobs.

ACL TOC entries that need this special processing are labeled as
"ACL"/"LARGE OBJECTS nnn..nnn".  If we have a blob with a unique ACL,
continue to label it as "ACL"/"LARGE OBJECT nnn".  We don't actually
have to make such a distinction, but it saves a few cycles during
restore for the easy case, and it seems like a good idea to not change
the TOC contents unnecessarily.

The data TOC entries ("BLOBS") are exactly the same as before,
except that now there can be more than one, so we'd better give them
identifying tag strings.

Also, commit c0d5be5d6 put the new BLOB metadata TOC entries into
SECTION_PRE_DATA, which perhaps is defensible in some ways, but
it's a rather odd choice considering that we go out of our way to
treat blobs as data.  Moreover, because parallel restore handles
the PRE_DATA section serially, this means we'd only get part of the
parallelism speedup we could hope for.  Move these entries into
SECTION_DATA, letting us parallelize the lo_create calls not just the
data loading when there are many blobs.  Add dependencies to ensure
that we won't try to load data for a blob we've not yet created.

As this stands, we still generate a separate TOC entry for any comment
or security label attached to a blob.  I feel comfortable in believing
that comments and security labels on blobs are rare, so this patch
should be enough to get most of the useful TOC compression for blobs.

We have to bump the archive file format version number, since existing
versions of pg_restore wouldn't know they need to do something special
for BLOB METADATA, plus they aren't going to work correctly with
multiple BLOBS entries or multiple-large-object ACL entries.

The directory and tar-file format handlers need some work
for multiple BLOBS entries: they used to hard-wire the file name
as "blobs.toc", which is replaced here with "blobs_<dumpid>.toc".
The 002_pg_dump.pl test script also knows about that and requires
minor updates.  (I had to drop the test for manually-compressed
blobs.toc files with LZ4, because lz4's obtuse command line
design requires explicit specification of the output file name
which seems impractical here.  I don't think we're losing any
useful test coverage thereby; that test stanza seems completely
duplicative with the gzip and zstd cases anyway.)

In passing, centralize management of the lo_buf used to hold data
while restoring blobs.  The code previously had each format handler
create lo_buf, which seems rather pointless given that the format
handlers all make it the same way.  Moreover, the format handlers
never use lo_buf directly, making this setup a failure from a
separation-of-concerns standpoint.  Let's move the responsibility into
pg_backup_archiver.c, which is the only module concerned with lo_buf.
The reason to do this in this patch is that it allows a centralized
fix for the now-false assumption that we never restore blobs in
parallel.  Also, get rid of dead code in DropLOIfExists: it's been a
long time since we had any need to be able to restore to a pre-9.0
server.

Discussion: https://postgr.es/m/a9f9376f1c3343a6bb319dce294e20ac@EX13D05UWC001.ant.amazon.com
2024-04-01 16:25:56 -04:00
Peter Eisentraut 35ce24c333 pg_dump: Remove "blob" terminology
For historical reasons, pg_dump refers to large objects as "BLOBs".
This term is not used anywhere else in PostgreSQL, and it also means
something different in the SQL standard and other SQL systems.

This patch renames internal functions, code comments, documentation,
etc. to use the "large object" or "LO" terminology instead.  There is
no functionality change, so the archive format still uses the name
"BLOB" for the archive entry.  Additional long command-line options
are added with the new naming.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/flat/868a381f-4650-9460-1726-1ffd39a270b4%40enterprisedb.com
2022-12-05 08:52:55 +01:00
Peter Eisentraut 02c408e21a Remove redundant null pointer checks before free()
Per applicable standards, free() with a null pointer is a no-op.
Systems that don't observe that are ancient and no longer relevant.
Some PostgreSQL code already required this behavior, so this change
does not introduce any new requirements, just makes the code more
consistent.

Discussion: https://www.postgresql.org/message-id/flat/dac5d2d0-98f5-94d9-8e69-46da2413593d%40enterprisedb.com
2022-07-03 11:47:15 +02:00
Tom Lane 9a374b77fb Improve frontend error logging style.
Get rid of the separate "FATAL" log level, as it was applied
so inconsistently as to be meaningless.  This mostly involves
s/pg_log_fatal/pg_log_error/g.

Create a macro pg_fatal() to handle the common use-case of
pg_log_error() immediately followed by exit(1).  Various
modules had already invented either this or equivalent macros;
standardize on pg_fatal() and apply it where possible.

Invent the ability to add "detail" and "hint" messages to a
frontend message, much as we have long had in the backend.

Except where rewording was needed to convert existing coding
to detail/hint style, I have (mostly) resisted the temptation
to change existing message wording.

Patch by me.  Design and patch reviewed at various stages by
Robert Haas, Kyotaro Horiguchi, Peter Eisentraut and
Daniel Gustafsson.

Discussion: https://postgr.es/m/1363732.1636496441@sss.pgh.pa.us
2022-04-08 14:55:14 -04:00
Tom Lane c49d926833 Clean up some more freshly-dead code in pg_dump and pg_upgrade.
I missed a few things in 30e7c175b and e469f0aaf,
as noted by Justin Pryzby.

Discussion: https://postgr.es/m/2923349.1634942313@sss.pgh.pa.us
2021-12-16 12:01:59 -05:00
Tom Lane 58cd8dca3d Avoid redundantly prefixing PQerrorMessage for a connection failure.
libpq's error messages for connection failures pretty well stand on
their own, especially since commits 52a10224e/27a48e5a1.  Prefixing
them with 'could not connect to database "foo"' or the like is just
redundant, and perhaps even misleading if the specific database name
isn't relevant to the failure.  (When it is, we trust that the
backend's error message will include the DB name.)  Indeed, psql
hasn't used any such prefix in a long time.  So, make all our other
programs and documentation examples agree with psql's practice.

Discussion: https://postgr.es/m/1094524.1611266589@sss.pgh.pa.us
2021-01-22 16:52:31 -05:00
Tom Lane d3adaabaf7 Revert "pg_dump: Lock all relations, not just plain tables".
Revert 403a3d91c, as well as the followup fix 7f4235032, in all
branches.  We need to think a bit harder about what the behavior
of LOCK TABLE on views should be, and there's no time for that
before next week's releases.  We'll take another crack at this
later.

Discussion: https://postgr.es/m/16703-e348f58aab3cf6cc@postgresql.org
2020-11-06 15:48:04 -05:00
Tom Lane 7f4235032f Avoid null pointer dereference if error result lacks SQLSTATE.
Although error results received from the backend should always have
a SQLSTATE field, ones generated by libpq won't, making this code
vulnerable to a crash after, say, untimely loss of connection.
Noted by Coverity.

Oversight in commit 403a3d91c.  Back-patch to 9.5, as that was.
2020-11-01 11:26:16 -05:00
Alvaro Herrera 403a3d91c8
pg_dump: Lock all relations, not just plain tables
Now that LOCK TABLE can take any relation type, acquire lock on all
relations that are to be dumped.  This prevents schema changes or
deadlock errors that could cause a dump to fail after expending much
effort.  The server is tested to have the capability and the feature
disabled if it doesn't, so that a patched pg_dump doesn't fail when
connecting to an unpatched server.

Backpatch to 9.5.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Wells Oliver <wells.oliver@gmail.com>
Discussion: https://postgr.es/m/20201021200659.GA32358@alvherre.pgsql
2020-10-27 14:31:37 -03:00
Tom Lane a45bc8a4f6 Fix handling of -d "connection string" in pg_dump/pg_restore.
Parallel pg_dump failed if its -d parameter was a connection string
containing any essential information other than host, port, or username.
The same was true for pg_restore with --create.

The reason is that these scenarios failed to preserve the connection
string from the command line; the code felt free to replace that with
just the database name when reconnecting from a pg_dump parallel worker
or after creating the target database.  By chance, parallel pg_restore
did not suffer this defect, as long as you didn't say --create.

In practice it seems that the error would be obvious only if the
connstring included essential, non-default SSL or GSS parameters.
This may explain why it took us so long to notice.  (It also makes
it very difficult to craft a regression test case illustrating the
problem, since the test would fail in builds without those options.)

Fix by refactoring so that ConnectDatabase always receives all the
relevant options directly from the command line, rather than
reconstructed values.  Inject a different database name, when necessary,
by relying on libpq's rules for handling multiple "dbname" parameters.

While here, let's get rid of the essentially duplicate _connectDB
function, as well as some obsolete nearby cruft.

Per bug #16604 from Zsolt Ero.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/16604-933f4b8791227b15@postgresql.org
2020-09-24 18:19:38 -04:00
Tom Lane 67a472d71c Remove arbitrary restrictions on password length.
This patch started out with the goal of harmonizing various arbitrary
limits on password length, but after awhile a better idea emerged:
let's just get rid of those fixed limits.

recv_password_packet() has an arbitrary limit on the packet size,
which we don't really need, so just drop it.  (Note that this doesn't
really affect anything for MD5 or SCRAM password verification, since
those will hash the user's password to something shorter anyway.
It does matter for auth methods that require a cleartext password.)

Likewise remove the arbitrary error condition in pg_saslprep().

The remaining limits are mostly in client-side code that prompts
for passwords.  To improve those, refactor simple_prompt() so that
it allocates its own result buffer that can be made as big as
necessary.  Actually, it proves best to make a separate routine
pg_get_line() that has essentially the semantics of fgets(), except
that it allocates a suitable result buffer and hence will never
return a truncated line.  (pg_get_line has a lot of potential
applications to replace randomly-sized fgets buffers elsewhere,
but I'll leave that for another patch.)

I built pg_get_line() atop stringinfo.c, which requires moving
that code to src/common/; but that seems fine since it was a poor
fit for src/port/ anyway.

This patch is mostly mine, but it owes a good deal to Nathan Bossart
who pressed for a solution to the password length problem and
created a predecessor patch.  Also thanks to Peter Eisentraut and
Stephen Frost for ideas and discussion.

Discussion: https://postgr.es/m/09512C4F-8CB9-4021-B455-EF4C4F0D55A0@amazon.com
2020-09-03 20:09:18 -04:00
Noah Misch e078fb5d4e Move connect.h from fe_utils to src/include/common.
Any libpq client can use the header.  Clients include backend components
postgres_fdw, dblink, and logical replication apply worker.  Back-patch
to v10, because another fix needs this.  In released branches, just copy
the header and keep the original.
2020-08-10 09:22:54 -07:00
Amit Kapila dddf4cdc33 Make the order of the header file includes consistent in non-backend modules.
Similar to commit 7e735035f2, this commit makes the order of header file
inclusion consistent for non-backend modules.

In passing, fix the case where we were using angle brackets (<>) for the
local module includes instead of quotes ("").

Author: Vignesh C
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com
2019-10-25 07:41:52 +05:30
Peter Eisentraut e435c1e7d9 Message style improvements 2019-07-09 15:47:09 +02:00
Michael Paquier 6b8548964b Fix inconsistencies in the code
This addresses a couple of issues in the code:
- Typos and inconsistencies in comments and function declarations.
- Removal of unreferenced function declarations.
- Removal of unnecessary compile flags.
- A cleanup error in regressplans.sh.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/0c991fdf-2670-1997-c027-772a420c4604@gmail.com
2019-07-08 13:15:09 +09:00
David Rowley 8abc13a889 Use appendStringInfoString and appendPQExpBufferStr where possible
This changes various places where appendPQExpBuffer was used in places
where it was possible to use appendPQExpBufferStr, and likewise for
appendStringInfo and appendStringInfoString.  This is really just a
stylistic improvement, but there are also small performance gains to be
had from doing this.

Discussion: http://postgr.es/m/CAKJS1f9P=M-3ULmPvr8iCno8yvfDViHibJjpriHU8+SXUgeZ=w@mail.gmail.com
2019-07-04 13:01:13 +12:00
Tom Lane be76af171c Initial pgindent run for v12.
This is still using the 2.0 version of pg_bsd_indent.
I thought it would be good to commit this separately,
so as to document the differences between 2.0 and 2.1 behavior.

Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
2019-05-22 12:55:34 -04:00
Tom Lane fc9a62af3f Move logging.h and logging.c from src/fe_utils/ to src/common/.
The original placement of this module in src/fe_utils/ is ill-considered,
because several src/common/ modules have dependencies on it, meaning that
libpgcommon and libpgfeutils now have mutual dependencies.  That makes it
pointless to have distinct libraries at all.  The intended design is that
libpgcommon is lower-level than libpgfeutils, so only dependencies from
the latter to the former are acceptable.

We already have the precedent that fe_memutils and a couple of other
modules in src/common/ are frontend-only, so it's not stretching anything
out of whack to treat logging.c as a frontend-only module in src/common/.
To the extent that such modules help provide a common frontend/backend
environment for the rest of common/ to use, it's a reasonable design.
(logging.c does not yet provide an ereport() emulation, but one can
dream.)

Hence, move these files over, and revert basically all of the build-system
changes made by commit cc8d41511.  There are no places that need to grow
new dependencies on libpgcommon, further reinforcing the idea that this
is the right solution.

Discussion: https://postgr.es/m/a912ffff-f6e4-778a-c86a-cf5c47a12933@2ndquadrant.com
2019-05-14 14:20:10 -04:00
Peter Eisentraut cc8d415117 Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.

Features:

- Program name is automatically prefixed.

- Message string does not end with newline.  This removes a common
  source of inconsistencies and omissions.

- Additionally, a final newline is automatically stripped, simplifying
  use of PQerrorMessage() etc., another common source of mistakes.

- I converted error message strings to use %m where possible.

- As a result of the above several points, more translatable message
  strings can be shared between different components and between
  frontends and backend, without gratuitous punctuation or whitespace
  differences.

- There is support for setting a "log level".  This is not meant to be
  user-facing, but can be used internally to implement debug or
  verbose modes.

- Lazy argument evaluation, so no significant overhead if logging at
  some level is disabled.

- Some color in the messages, similar to gcc and clang.  Set
  PG_COLOR=auto to try it out.  Some colors are predefined, but can be
  customized by setting PG_COLORS.

- Common files (common/, fe_utils/, etc.) can handle logging much more
  simply by just using one API without worrying too much about the
  context of the calling program, requiring callbacks, or having to
  pass "progname" around everywhere.

- Some programs called setvbuf() to make sure that stderr is
  unbuffered, even on Windows.  But not all programs did that.  This
  is now done centrally.

Soft goals:

- Reduces vertical space use and visual complexity of error reporting
  in the source code.

- Encourages more deliberate classification of messages.  For example,
  in some cases it wasn't clear without analyzing the surrounding code
  whether a message was meant as an error or just an info.

- Concepts and terms are vaguely aligned with popular logging
  frameworks such as log4j and Python logging.

This is all just about printing stuff out.  Nothing affects program
flow (e.g., fatal exits).  The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.

I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded.  One significant change is that
pg_rewind used to write all error messages to stdout.  That is now
changed to stderr.

Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 20:01:35 +02:00
Noah Misch 582edc369c Empty search_path in Autovacuum and non-psql/pgbench clients.
This makes the client programs behave as documented regardless of the
connect-time search_path and regardless of user-created objects.  Today,
a malicious user with CREATE permission on a search_path schema can take
control of certain of these clients' queries and invoke arbitrary SQL
functions under the client identity, often a superuser.  This is
exploitable in the default configuration, where all users have CREATE
privilege on schema "public".

This changes behavior of user-defined code stored in the database, like
pg_index.indexprs and pg_extension_config_dump().  If they reach code
bearing unqualified names, "does not exist" or "no schema has been
selected to create in" errors might appear.  Users may fix such errors
by schema-qualifying affected names.  After upgrading, consider watching
server logs for these errors.

The --table arguments of src/bin/scripts clients have been lax; for
example, "vacuumdb -Zt pg_am\;CHECKPOINT" performed a checkpoint.  That
now fails, but for now, "vacuumdb -Zt 'pg_am(amname);CHECKPOINT'" still
performs a checkpoint.

Back-patch to 9.3 (all supported versions).

Reviewed by Tom Lane, though this fix strategy was not his first choice.
Reported by Arseniy Sharoglazov.

Security: CVE-2018-1058
2018-02-26 07:39:44 -08:00
Tom Lane 160a4f62ee In pg_dump, force reconnection after issuing ALTER DATABASE SET command(s).
The folly of not doing this was exposed by the buildfarm: in some cases,
the GUC settings applied through ALTER DATABASE SET may be essential to
interpreting the reloaded data correctly.  Another argument why we can't
really get away with the scheme proposed in commit b3f840120 is that it
cannot work for parallel restore: even if the parent process manages to
hang onto the previous GUC state, worker processes would see the state
post-ALTER-DATABASE.  (Perhaps we could have dodged that bullet by
delaying DATABASE PROPERTIES restoration to the end of the run, but
that does nothing for the data semantics problem.)

This leaves us with no solution for the default_transaction_read_only issue
that commit 4bd371f6f intended to work around, other than "you gotta remove
such settings before dumping/upgrading".  However, in view of the fact that
parallel restore broke that hack years ago and no one has noticed, it's
fair to question how many people care.  I'm unexcited about adding a large
dollop of new complexity to handle that corner case.

This would be a one-liner fix, except it turns out that ReconnectToServer
tries to optimize away "redundant" reconnections.  While that may have been
valuable when coded, a quick survey of current callers shows that there are
no cases where that's actually useful, so just remove that check.  While at
it, remove the function's useless return value.

Discussion: https://postgr.es/m/12453.1516655001@sss.pgh.pa.us
2018-01-23 10:55:16 -05:00
Peter Eisentraut 0e1539ba0d Add some const decorations to prototypes
Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
2017-11-10 13:38:57 -05:00
Tom Lane 382ceffdf7 Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.

By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis.  However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent.  That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.

This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:35:54 -04:00
Alvaro Herrera e0bf16060b Ignore PQcancel errors properly
Add a (void) cast to all PQcancel() calls that purposefully don't check
the return value, to keep compilers and static checkers happy.

Per Coverity.
2017-05-09 14:58:51 -03:00
Peter Eisentraut 5baf869f74 pg_dump: Remove query truncation in error messages
Remove the behavior that a query mentioned in an error message would be
truncated to 128 characters.  The queries that pg_dump runs are often
longer than that, and this behavior makes analyzing failures harder
unnecessarily.

Discussion: https://www.postgresql.org/message-id/flat/63201ef9-26fb-3f1f-664d-98531678cebc%402ndquadrant.com
2017-03-29 15:17:14 -04:00
Tom Lane 9daec77e16 Simplify correct use of simple_prompt().
The previous API for this function had it returning a malloc'd string.
That meant that callers had to check for NULL return, which few of them
were doing, and it also meant that callers had to remember to free()
the string later, which required extra logic in most cases.

Instead, make simple_prompt() write into a buffer supplied by the caller.
Anywhere that the maximum required input length is reasonably small,
which is almost all of the callers, we can just use a local or static
array as the buffer instead of dealing with malloc/free.

A fair number of callers used "pointer == NULL" as a proxy for "haven't
requested the password yet".  Maintaining the same behavior requires
adding a separate boolean flag for that, which adds back some of the
complexity we save by removing free()s.  Nonetheless, this nets out
at a small reduction in overall code size, and considerably less code
than we would have had if we'd added the missing NULL-return checks
everywhere they were needed.

In passing, clean up the API comment for simple_prompt() and get rid
of a very-unnecessary malloc/free in its Windows code path.

This is nominally a bug fix, but it does not seem worth back-patching,
because the actual risk of an OOM failure in any of these places seems
pretty tiny, and all of them are client-side not server-side anyway.

This patch is by me, but it owes a great deal to Michael Paquier
who identified the problem and drafted a patch for fixing it the
other way.

Discussion: <CAB7nPqRu07Ot6iht9i9KRfYLpDaF2ZuUv5y_+72uP23ZAGysRg@mail.gmail.com>
2016-08-30 17:02:02 -04:00
Noah Misch fcd15f1358 Obstruct shell, SQL, and conninfo injection via database and role names.
Due to simplistic quoting and confusion of database names with conninfo
strings, roles with the CREATEDB or CREATEROLE option could escalate to
superuser privileges when a superuser next ran certain maintenance
commands.  The new coding rule for PQconnectdbParams() calls, documented
at conninfo_array_parse(), is to pass expand_dbname=true and wrap
literal database names in a trivial connection string.  Escape
zero-length values in appendConnStrVal().  Back-patch to 9.1 (all
supported versions).

Nathan Bossart, Michael Paquier, and Noah Misch.  Reviewed by Peter
Eisentraut.  Reported by Nathan Bossart.

Security: CVE-2016-5424
2016-08-08 10:07:46 -04:00
Tom Lane e652273e07 Redesign handling of SIGTERM/control-C in parallel pg_dump/pg_restore.
Formerly, Unix builds of pg_dump/pg_restore would trap SIGINT and similar
signals and set a flag that was tested in various data-transfer loops.
This was prone to errors of omission (cf commit 3c8aa6654); and even if
the client-side response was prompt, we did nothing that would cause
long-running SQL commands (e.g. CREATE INDEX) to terminate early.
Also, the master process would effectively do nothing at all upon receipt
of SIGINT; the only reason it seemed to work was that in typical scenarios
the signal would also be delivered to the child processes.  We should
support termination when a signal is delivered only to the master process,
though.

Windows builds had no console interrupt handler, so they would just fall
over immediately at control-C, again leaving long-running SQL commands to
finish unmolested.

To fix, remove the flag-checking approach altogether.  Instead, allow the
Unix signal handler to send a cancel request directly and then exit(1).
In the master process, also have it forward the signal to the children.
On Windows, add a console interrupt handler that behaves approximately
the same.  The main difference is that a single execution of the Windows
handler can send all the cancel requests since all the info is available
in one process, whereas on Unix each process sends a cancel only for its
own database connection.

In passing, fix an old problem that DisconnectDatabase tends to send a
cancel request before exiting a parallel worker, even if nothing went
wrong.  This is at least a waste of cycles, and could lead to unexpected
log messages, or maybe even data loss if it happened in pg_restore (though
in the current code the problem seems to affect only pg_dump).  The cause
was that after a COPY step, pg_dump was leaving libpq in PGASYNC_BUSY
state, causing PQtransactionStatus() to report PQTRANS_ACTIVE.  That's
normally harmless because the next PQexec() will silently clear the
PGASYNC_BUSY state; but in a parallel worker we might exit without any
additional SQL commands after a COPY step.  So add an extra PQgetResult()
call after a COPY to allow libpq to return to PGASYNC_IDLE state.

This is a bug fix, IMO, so back-patch to 9.3 where parallel dump/restore
were introduced.

Thanks to Kyotaro Horiguchi for Windows testing and code suggestions.

Original-Patch: <7005.1464657274@sss.pgh.pa.us>
Discussion: <20160602.174941.256342236.horiguchi.kyotaro@lab.ntt.co.jp>
2016-06-02 13:28:17 -04:00
Magnus Hagander d74048defc Make pg_dump error cleanly with -j against hot standby
Getting a synchronized snapshot is not supported on a hot standby node,
and is by default taken when using -j with multiple sessions. Trying to
do so still failed, but with a server error that would also go in the
log. Instead, proprely detect this case and give a better error message.
2016-05-26 22:14:23 +02:00
Tom Lane 1aa41e3eae In pg_dump, remember connection passwords no matter how we got them.
When pg_dump prompts the user for a password, it remembers the password
for possible re-use by parallel worker processes.  However, libpq might
have extracted the password from a connection string originally passed
as "dbname".  Since we don't record the original form of dbname but
break it down to host/port/etc, the password gets lost.  Fix that by
retrieving the actual password from the PGconn.

(It strikes me that this whole approach is rather broken, as it will also
lose other information such as options that might have been present in
the connection string.  But we'll leave that problem for another day.)

In passing, get rid of rather silly use of malloc() for small fixed-size
arrays.

Back-patch to 9.3 where parallel pg_dump was introduced.

Report and fix by Zeus Kronion, adjusted a bit by Michael Paquier and me
2015-12-23 14:25:53 -05:00
Tom Lane 586dd5d6a5 Replace a bunch more uses of strncpy() with safer coding.
strncpy() has a well-deserved reputation for being unsafe, so make an
effort to get rid of nearly all occurrences in HEAD.

A large fraction of the remaining uses were passing length less than or
equal to the known strlen() of the source, in which case no null-padding
can occur and the behavior is equivalent to memcpy(), though doubtless
slower and certainly harder to reason about.  So just use memcpy() in
these cases.

In other cases, use either StrNCpy() or strlcpy() as appropriate (depending
on whether padding to the full length of the destination buffer seems
useful).

I left a few strncpy() calls alone in the src/timezone/ code, to keep it
in sync with upstream (the IANA tzcode distribution).  There are also a
few such calls in ecpg that could possibly do with more analysis.

AFAICT, none of these changes are more than cosmetic, except for the four
occurrences in fe-secure-openssl.c, which are in fact buggy: an overlength
source leads to a non-null-terminated destination buffer and ensuing
misbehavior.  These don't seem like security issues, first because no stack
clobber is possible and second because if your values of sslcert etc are
coming from untrusted sources then you've got problems way worse than this.
Still, it's undesirable to have unpredictable behavior for overlength
inputs, so back-patch those four changes to all active branches.
2015-01-24 13:05:42 -05:00
Tom Lane 7584649a1c Re-pgindent src/bin/pg_dump/*.
Seems to have gotten rather messy lately, as a consequence of a couple
of large recent commits.
2014-10-17 12:19:05 -04:00
Alvaro Herrera 0eea8047bf pg_dump: Reduce use of global variables
Most pg_dump.c global variables, which were passed down individually to
dumping routines, are now grouped as members of the new DumpOptions
struct, which is used as a local variable and passed down into routines
that need it.  This helps future development efforts; in particular it
is said to enable a mode in which a parallel pg_dump run can output
multiple streams, and have them restored in parallel.

Also take the opportunity to clean up the pg_dump header files somewhat,
to avoid circularity.

Author: Joachim Wieland, revised by Álvaro Herrera
Reviewed by Peter Eisentraut
2014-10-14 15:00:55 -03:00
Tom Lane c81e63d85f Fix pg_restore's processing of old-style BLOB COMMENTS data.
Prior to 9.0, pg_dump handled comments on large objects by dumping a bunch
of COMMENT commands into a single BLOB COMMENTS archive object.  With
sufficiently many such comments, some of the commands would likely get
split across bufferloads when restoring, causing failures in
direct-to-database restores (though no problem would be evident in text
output).  This is the same type of issue we have with table data dumped as
INSERT commands, and it can be fixed in the same way, by using a mini SQL
lexer to figure out where the command boundaries are.  Fortunately, the
COMMENT commands are no more complex to lex than INSERTs, so we can just
re-use the existing lexer for INSERTs.

Per bug #10611 from Jacek Zalewski.  Back-patch to all active branches.
2014-06-12 20:14:32 -04:00
Bruce Momjian 0a78320057 pgindent run for 9.4
This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not effect backpatching.
2014-05-06 12:12:18 -04:00
Bruce Momjian 14ea89366f Properly detect read and write errors in pg_dump/dumpall, and pg_restore
Previously some I/O errors were ignored.
2014-05-05 20:27:16 -04:00
Heikki Linnakangas 7800a71291 Move some pg_dump function around.
Move functions used only by pg_dump and pg_restore from dumputils.c to a new
file, pg_backup_utils.c. dumputils.c is linked into psql and some programs
in bin/scripts, so it seems good to keep it slim. The parallel functionality
is moved to parallel.c, as is exit_horribly, because the interesting code in
exit_horribly is parallel-related.

This refactoring gets rid of the on_exit_msg_func function pointer. It was
problematic, because a modern gcc version with -Wmissing-format-attribute
complained if it wasn't marked with PF_PRINTF_ATTRIBUTE, but the ancient gcc
version that Tom Lane's old HP-UX box has didn't accept that attribute on a
function pointer, and gave an error. We still use a similar function pointer
trick for getLocalPQBuffer() function, to use a thread-local version of that
in parallel mode on Windows, but that dodges the problem because it doesn't
take printf-like arguments.
2013-03-27 18:10:40 +02:00
Heikki Linnakangas 901b89e37b Get rid of obsolete parse_version helper function.
For getting the server's version in numeric form, use PQserverVersion().
It does the exact same parsing as dumputils.c's parse_version(), and has
been around in libpq for a long time. For the client's version, just use
the PG_VERSION_NUM constant.
2013-03-26 15:32:02 +02:00
Andrew Dunstan 9e257a181c Add parallel pg_dump option.
New infrastructure is added which creates a set number of workers
(threads on Windows, forked processes on Unix). Jobs are then
handed out to these workers by the master process as needed.
pg_restore is adjusted to use this new infrastructure in place of the
old setup which created a new worker for each step on the fly. Parallel
dumps acquire a snapshot clone in order to stay consistent, if
available.

The parallel option is selected by the -j / --jobs command line
parameter of pg_dump.

Joachim Wieland, lightly editorialized by Andrew Dunstan.
2013-03-24 11:27:20 -04:00
Heikki Linnakangas 2930c05634 Don't pass NULL to fprintf, if a bogus connection string is given to pg_dump.
Back-patch to all supported branches.
2013-02-20 16:33:24 +02:00
Alvaro Herrera 8396447cdb Create libpgcommon, and move pg_malloc et al to it
libpgcommon is a new static library to allow sharing code among the
various frontend programs and backend; this lets us eliminate duplicate
implementations of common routines.  We avoid libpgport, because that's
intended as a place for porting issues; per discussion, it seems better
to keep them separate.

The first use case, and the only implemented by this patch, is pg_malloc
and friends, which many frontend programs were already using.

At the same time, we can use this to provide palloc emulation functions
for the frontend; this way, some palloc-using files in the backend can
also be used by the frontend cleanly.  To do this, we change palloc() in
the backend to be a function instead of a macro on top of
MemoryContextAlloc().  This was previously believed to cause loss of
performance, but this implementation has been tweaked by Tom and Andres
so that on modern compilers it provides a slight improvement over the
previous one.

This lets us clean up some places that were already with
localized hacks.

Most of the pg_malloc/palloc changes in this patch were authored by
Andres Freund. Zoltán Böszörményi also independently provided a form of
that.  libpgcommon infrastructure was authored by Álvaro.
2013-02-12 11:21:05 -03:00
Alvaro Herrera 58f17dcf83 Add translator comments to module names 2012-07-25 00:02:49 -04:00
Bruce Momjian 927d61eeff Run pgindent on 9.2 source tree in preparation for first 9.3
commit-fest.
2012-06-10 15:20:04 -04:00
Alvaro Herrera 9d23a70d51 pg_dump: get rid of die_horribly
The old code was using exit_horribly or die_horribly other depending on
whether it had an ArchiveHandle on which to close the connection or not;
but there were places that were passing a NULL ArchiveHandle to
die_horribly, and other places that used exit_horribly while having an
AH available.  So there wasn't all that much consistency.

Improve the situation by keeping only one of the routines, and instead
of having to pass the AH down from the caller, arrange for it to be
present for an on_exit_nicely callback to operate on.

Author: Joachim Wieland
Some tweaks by me

Per a suggestion from Robert Haas, in the ongoing "parallel pg_dump"
saga.
2012-03-20 18:58:00 -03:00
Robert Haas 689d0eb7db pg_dump: Remove global connection pointer.
Parallel pg_dump wants to have multiple ArchiveHandle objects, and
therefore multiple PGconns, in play at the same time.  This should
be just about the end of the refactoring that we need in order to
make that workable.
2012-02-16 13:00:24 -05:00
Robert Haas e9a22259c4 Invent on_exit_nicely for pg_dump.
Per recent discussions on pgsql-hackers regarding parallel pg_dump.
2012-02-16 11:49:20 -05:00
Robert Haas 1631598ea2 pg_dump: Further reduce reliance on global variables.
This is another round of refactoring to make things simpler for parallel
pg_dump.  pg_dump.c now issues SQL queries through the relevant Archive
object, rather than relying on the global variable g_conn.  This commit
isn't quite enough to get rid of g_conn entirely, but it makes a big
dent in its utilization and, along the way, manages to be slightly less
code than before.
2012-02-07 10:07:02 -05:00
Tom Lane f3316a05b5 Fix pg_restore's direct-to-database mode for INSERT-style table data.
In commit 6545a901aa, I removed the mini SQL
lexer that was in pg_backup_db.c, thinking that it had no real purpose
beyond separating COPY data from SQL commands, which purpose had been
obsoleted by long-ago fixes in pg_dump's archive file format.
Unfortunately this was in error: that code was also used to identify
command boundaries in INSERT-style table data, which is run together as a
single string in the archive file for better compressibility.  As a result,
direct-to-database restores from archive files made with --inserts or
--column-inserts fail in our latest releases, as reported by Dick Visser.

To fix, restore the mini SQL lexer, but simplify it by adjusting the
calling logic so that it's only required to cope with INSERT-style table
data, not arbitrary SQL commands.  This allows us to not have to deal with
SQL comments, E'' strings, or dollar-quoted strings, none of which have
ever been emitted by dumpTableData_insert.

Also, fix the lexer to cope with standard-conforming strings, which was the
actual bug that the previous patch was meant to solve.

Back-patch to all supported branches.  The previous patch went back to 8.2,
which unfortunately means that the EOL release of 8.2 contains this bug,
but I don't think we're doing another 8.2 release just because of that.
2012-01-06 13:04:09 -05:00
Bruce Momjian 9a7d49d1fb Move pg_dump memory routines into pg_dumpmem.c/h and restore common.c
with its original functions.  The previous function migration would
cause too many difficulties in back-patching.
2011-11-26 22:34:36 -05:00
Bruce Momjian 3c0afde11a Modify pg_dump to use error-free memory allocation macros. This avoids
ignoring errors and call-site error checking.
2011-11-25 15:40:51 -05:00