Commit Graph

104 Commits

Author SHA1 Message Date
Tom Lane 4317e0246c Rewrite --section option to decouple it from --schema-only/--data-only.
The initial implementation of pg_dump's --section option supposed that the
existing --schema-only and --data-only options could be made equivalent to
--section settings.  This is wrong, though, due to dubious but long since
set-in-stone decisions about where to dump SEQUENCE SET items, as seen in
bug report from Martin Pitt.  (And I'm not totally convinced there weren't
other bugs, either.)  Undo that coupling and instead drive --section
filtering off current-section state tracked as we scan through the TOC
list to call _tocEntryRequired().

To make sure those decisions don't shift around and hopefully save a few
cycles, run _tocEntryRequired() only once per TOC entry and save the result
in a new TOC field.  This required minor rejiggering of ACL handling but
also allows a far cleaner implementation of inhibit_data_for_failed_table.

Also, to ensure that pg_dump and pg_restore have the same behavior with
respect to the --section switches, add _tocEntryRequired() filtering to
WriteToc() and WriteDataChunks(), rather than trying to implement section
filtering in an entirely orthogonal way in dumpDumpableObject().  This
required adjusting the handling of the special ENCODING and STDSTRINGS
items, but they were pretty weird before anyway.

Minor other code review for the patch, too.
2012-05-29 23:22:14 -04:00
Tom Lane c89bdf7690 Eliminate some more O(N^2) behaviors in pg_dump/pg_restore.
This patch fixes three places (which AFAICT is all of them) where runtime
was O(N^2) in the number of TOC entries, by using an index array to replace
linear searches of the TOC list.  This performance issue is a bit less bad
than those recently fixed, because it depends on the number of items dumped
not the number in the source database, so the problem can be dodged by
doing partial dumps.

The previous coding already had an instance of one of the two index arrays
needed, but it was only calculated in parallel-restore cases; now we need
it all the time.  I also chose to move the arrays into the ArchiveHandle
data structure, to make this code a bit more ready for the day that we
try to sling multiple ArchiveHandles around in pg_dump or pg_restore.

Since we still need some server-side work before pg_dump can really cope
nicely with tens of thousands of tables, there's probably little point in
back-patching.
2012-05-28 20:38:28 -04:00
Alvaro Herrera 9d23a70d51 pg_dump: get rid of die_horribly
The old code was using exit_horribly or die_horribly other depending on
whether it had an ArchiveHandle on which to close the connection or not;
but there were places that were passing a NULL ArchiveHandle to
die_horribly, and other places that used exit_horribly while having an
AH available.  So there wasn't all that much consistency.

Improve the situation by keeping only one of the routines, and instead
of having to pass the AH down from the caller, arrange for it to be
present for an on_exit_nicely callback to operate on.

Author: Joachim Wieland
Some tweaks by me

Per a suggestion from Robert Haas, in the ongoing "parallel pg_dump"
saga.
2012-03-20 18:58:00 -03:00
Peter Eisentraut 19f45565f5 pg_dump: Remove undocumented "files" output format
This was for demonstration only, and now it was creating compiler
warnings from zlib without an obvious fix (see also
d923125b77), let's just remove it.  The
"directory" format is presumably similar enough anyway.
2012-03-20 20:39:59 +02:00
Robert Haas 1631598ea2 pg_dump: Further reduce reliance on global variables.
This is another round of refactoring to make things simpler for parallel
pg_dump.  pg_dump.c now issues SQL queries through the relevant Archive
object, rather than relying on the global variable g_conn.  This commit
isn't quite enough to get rid of g_conn entirely, but it makes a big
dent in its utilization and, along the way, manages to be slightly less
code than before.
2012-02-07 10:07:02 -05:00
Robert Haas 96abd81744 Remove dead declaration. 2012-02-06 12:09:20 -05:00
Peter Eisentraut 88a6ac9f93 pg_dump: Add GCC noreturn attribute to appropriate functions
This is a small help to the compiler and static analyzers.
2012-01-31 20:49:10 +02:00
Tom Lane f3316a05b5 Fix pg_restore's direct-to-database mode for INSERT-style table data.
In commit 6545a901aa, I removed the mini SQL
lexer that was in pg_backup_db.c, thinking that it had no real purpose
beyond separating COPY data from SQL commands, which purpose had been
obsoleted by long-ago fixes in pg_dump's archive file format.
Unfortunately this was in error: that code was also used to identify
command boundaries in INSERT-style table data, which is run together as a
single string in the archive file for better compressibility.  As a result,
direct-to-database restores from archive files made with --inserts or
--column-inserts fail in our latest releases, as reported by Dick Visser.

To fix, restore the mini SQL lexer, but simplify it by adjusting the
calling logic so that it's only required to cope with INSERT-style table
data, not arbitrary SQL commands.  This allows us to not have to deal with
SQL comments, E'' strings, or dollar-quoted strings, none of which have
ever been emitted by dumpTableData_insert.

Also, fix the lexer to cope with standard-conforming strings, which was the
actual bug that the previous patch was meant to solve.

Back-patch to all supported branches.  The previous patch went back to 8.2,
which unfortunately means that the EOL release of 8.2 contains this bug,
but I don't think we're doing another 8.2 release just because of that.
2012-01-06 13:04:09 -05:00
Andrew Dunstan a4cd6abcc9 Add --section option to pg_dump and pg_restore.
Valid values are --pre-data, data and post-data. The option can be
given more than once. --schema-only is equivalent to
--section=pre-data --section=post-data. --data-only is equivalent
to --section=data.

Andrew Dunstan, reviewed by Joachim Wieland and Josh Berkus.
2011-12-16 19:09:38 -05:00
Tom Lane 0195e5c4ab Clean up after recent pg_dump patches.
Fix entirely broken handling of va_list printing routines, update some
out-of-date comments, fix some bogus inclusion orders, fix NLS declarations,
fix missed realloc calls.
2011-11-29 20:41:54 -05:00
Bruce Momjian 8b08deb0d1 Simplify the pg_dump/pg_restore error reporting macros, and allow
pg_dumpall to use the same memory allocation functions as the others.
2011-11-29 16:34:45 -05:00
Tom Lane 6545a901aa Fix pg_restore's direct-to-database mode for standard_conforming_strings.
pg_backup_db.c contained a mini SQL lexer with which it tried to identify
boundaries between SQL commands, but that code was not designed to cope
with standard_conforming_strings, and would get the wrong answer if a
backslash immediately precedes a closing single quote in such a string,
as per report from Julian Mehnle.  The bug only affects direct-to-database
restores from archive files made with standard_conforming_strings = on.

Rather than complicating the code some more to try to fix that, let's just
rip it all out.  The only reason it was needed was to cope with COPY data
embedded into ordinary archive entries, which was a layout that was used
only for about the first three weeks of the archive format's existence,
and never in any production release of pg_dump.  Instead, just rely on the
archive file layout to tell us whether we're printing COPY data or not.

This bug represents a data corruption hazard in all releases in which
standard_conforming_strings can be turned on, ie 8.2 and later, so
back-patch to all supported branches.
2011-07-28 14:06:57 -04:00
Andrew Dunstan c02d5b7c27 Use a macro variable PG_PRINTF_ATTRIBUTE for the style used for checking printf type functions.
The style is set to "printf" for backwards compatibility everywhere except
on Windows, where it is set to "gnu_printf", which eliminates hundreds of
false error messages from modern versions of gcc arising from  %m and %ll{d,u}
formats.
2011-04-28 10:56:14 -04:00
Bruce Momjian bf50caf105 pgindent run before PG 9.1 beta 1. 2011-04-10 11:42:00 -04:00
Heikki Linnakangas 7f508f1c6b Add 'directory' format to pg_dump. The new directory format is compatible
with the 'tar' format, in that untarring a tar format archive produces a
valid directory format archive.

Joachim Wieland and Heikki Linnakangas
2011-01-23 23:10:15 +02:00
Tom Lane e2627258c3 Suppress possibly-uninitialized-variable warnings from gcc 4.5.
It appears that gcc 4.5 can issue such warnings for whole structs, not
just scalar variables as in the past.  Refactor some pg_dump code slightly
so that the OutputContext local variables are always initialized, even
if they won't be used.  It's cheap enough to not be worth worrying about.
2011-01-22 17:56:42 -05:00
Tom Lane 663fc32e26 Eliminate O(N^2) behavior in parallel restore with many blobs.
With hundreds of thousands of TOC entries, the repeated searches in
reduce_dependencies() become the dominant cost.  Get rid of that searching
by constructing reverse-dependency lists, which we can do in O(N) time
during the fix_dependencies() preprocessing.  I chose to store the reverse
dependencies as DumpId arrays for consistency with the forward-dependency
representation, and keep the previously-transient tocsByDumpId[] array
around to locate actual TOC entry structs quickly from dump IDs.

While this fixes the slow case reported by Vlad Arkhipov, there is still
a potential for O(N^2) behavior with sufficiently many tables:
fix_dependencies itself, as well as mark_create_done and
inhibit_data_for_failed_table, are doing repeated searches to deal with
table-to-table-data dependencies.  Possibly this work could be extended
to deal with that, although the latter two functions are also used in
non-parallel restore where we currently don't run fix_dependencies.

Another TODO is that we fail to parallelize restore of multiple blobs
at all.  This appears to require changes in the archive format to fix.

Back-patch to 9.0 where the problem was reported.  8.4 has potential issues
as well; but since it doesn't create a separate TOC entry for each blob,
it's at much less risk of having enough TOC entries to cause real problems.
2010-12-09 13:03:11 -05:00
Heikki Linnakangas bf9aa490db Refactor the pg_dump zlib code from pg_backup_custom.c to a separate file,
to make it easier to reuse that code. There is no user-visible changes.

This is in preparation for the patch to add a new archive format, a directory,
to perform a custom-like dump but with each table being dumped to a separate
file (that in turn is a prerequisite for parallel pg_dump). This also makes it
easier to add new compression methods in the future, and makes the
pg_backup_custom.c code easier to read, when the compression-related code is
factored out.

Joachim Wieland, with heavy editorialization by me.
2010-12-02 21:39:03 +02:00
Magnus Hagander 9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Bruce Momjian 65e806cba1 pgindent run for 9.0 2010-02-26 02:01:40 +00:00
Tom Lane c0d5be5d6a Fix up pg_dump's treatment of large object ownership and ACLs. We now emit
a separate archive entry for each BLOB, and use pg_dump's standard methods
for dealing with its ownership, ACL if any, and comment if any.  This means
that switches like --no-owner and --no-privileges do what they're supposed
to.  Preliminary testing says that performance is still reasonable even
with many blobs, though we'll have to see how that shakes out in the field.

KaiGai Kohei, revised by me
2010-02-18 01:29:10 +00:00
Itagaki Takahiro 84f910a707 Additional fixes for large object access control.
Use pg_largeobject_metadata.oid instead of pg_largeobject.loid
to enumerate existing large objects in pg_dump, pg_restore, and
contrib modules.
2009-12-14 00:39:11 +00:00
Tom Lane f033f6d28b Modify parallel pg_restore to track pending and ready items by means of
two new lists, rather than repeatedly rescanning the main TOC list.
This avoids a potential O(N^2) slowdown, although you'd need a *lot*
of tables to make that really significant; and it might simplify future
improvements in the scheduling algorithm by making the set of ready
items more easily inspectable.  The original thought that it would
in itself result in a more efficient job dispatch order doesn't seem
to have been borne out in testing, but it seems worth doing anyway.
2009-08-07 22:48:34 +00:00
Tom Lane b1732111f2 Fix pg_dump to do the right thing when escaping the contents of large objects.
The previous implementation got it right in most cases but failed in one:
if you pg_dump into an archive with standard_conforming_strings enabled, then
pg_restore to a script file (not directly to a database), the script will set
standard_conforming_strings = on but then emit large object data as
nonstandardly-escaped strings.

At the moment the code is made to emit hex-format bytea strings when dumping
to a script file.  We might want to change to old-style escaping for backwards
compatibility, but that would be slower and bulkier.  If we do, it's just a
matter of reimplementing appendByteaLiteral().

This has been broken for a long time, but given the lack of field complaints
I'm not going to worry about back-patching.
2009-08-04 21:56:09 +00:00
Tom Lane a5375bf903 Make pg_dump/pg_restore --clean options drop large objects too.
In passing, make invocations of lo_xxx functions a bit more schema-safe.

Itagaki Takahiro
2009-07-21 21:46:10 +00:00
Bruce Momjian d747140279 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list
provided by Andrew.
2009-06-11 14:49:15 +00:00
Peter Eisentraut 9de59fd191 Add a -w/--no-password option that prevents all password prompts to all
programs that have a -W/--password option.

In passing, remove the ancient PSQL_ALWAYS_GET_PASSWORDS compile option.
2009-02-26 16:02:39 +00:00
Andrew Dunstan 775f1b379e Provide for parallel restoration from a custom format archive. Each data and
post-data step is run in a separate worker child (a thread on Windows, a child
process elsewhere) up to the concurrent number specified by the new pg_restore
command-line --multi-thread | -m switch.

Andrew Dunstan, with some editing by Tom Lane.
2009-02-02 20:07:37 +00:00
Peter Eisentraut 5f9869d0ee Use "alternative" instead of "alternate" where it is clearer. 2007-11-07 12:24:24 +00:00
Magnus Hagander 74096ed1fd Fix pg_dump on win32 to properly dump files larger than 2Gb when using
binary dump formats.
2007-02-19 15:05:06 +00:00
Bruce Momjian 6441288ec9 Add 'output file' option for pg_dumpall, especially useful for Win32,
where output redirection of child processes (pg_dump) doesn't work.

Dave Page
2007-01-25 03:30:43 +00:00
Bruce Momjian f99a569a2e pgindent run for 8.2. 2006-10-04 00:30:14 +00:00
Bruce Momjian fcd1b0d891 Mark a few functions as static or NOT_USED. 2006-07-18 17:42:01 +00:00
Bruce Momjian b85a965f5f Allow each C include file to compile on its own by including any needed
header files.
2006-07-11 13:54:25 +00:00
Tom Lane 134b463f02 Fix up pg_dump to do string escaping fully correctly for client encoding
and standard_conforming_strings; likewise for the other client programs
that need it.  As per previous discussion, a pg_dump dump now conforms
to the standard_conforming_strings setting of the source database.
We don't use E'' syntax in the dump, thereby improving portability of
the SQL.  I added a SET escape_strings_warning = off command to keep
the dumps from getting a lot of back-chatter from that.
2006-05-28 21:13:54 +00:00
Tom Lane 8e4057cc7d Fix pg_restore to properly discard COPY data when trying to continue
after an error in a COPY statement.  Formerly it thought the COPY data
was SQL commands, and got quite confused.

Stephen Frost
2006-02-05 20:58:47 +00:00
Bruce Momjian 1dc3498251 Standard pgindent run for 8.1. 2005-10-15 02:49:52 +00:00
Tom Lane c7d1a8d428 Fix some corner-case bugs in _sendSQLLine's parsing of SQL commands
> found in a pg_dump archive.  It had problems with dollar-quote tags
broken across bufferload boundaries (this may explain bug report from
Rod Taylor), also with dollar-quote literals of the form $a$a$...,
and was also confused about the rules for backslash in double quoted
identifiers (ie, they're not special).  Also put in placeholder support
for E'...' literals --- this will need more work later.
2005-09-11 04:10:25 +00:00
Neil Conway a4c75ece82 Fix a few macro definitions to ensure that unary minus is enclosed in
parentheses. This avoids possible operator precedence problems, and
is consistent with most of the macro definitions in the tree.
2005-07-27 12:44:10 +00:00
Tom Lane 7a28de2052 pg_dump can now dump large objects even in plain-text output mode, by
using the recently added lo_create() function.  The restore logic in
pg_restore is greatly simplified as well, since there's no need anymore
to try to adjust database references to match a new set of blob OIDs.
2005-06-21 20:45:44 +00:00
Bruce Momjian b492c3accc Add parentheses to macros when args are used in computations. Without
them, the executation behavior could be unexpected.
2005-05-25 21:40:43 +00:00
Tom Lane fd5437c78b Fix breakage created by addition of separate 'acl pass' in pg_dump.
Also clean up incredibly poor style in TocIDRequired() usage.
2005-01-25 22:44:31 +00:00
Tom Lane 04baa0ebf9 Update pg_dump to use SET DEFAULT_TABLESPACE instead of explicit
tablespace clauses; this should improve compatibility of dump files.
Philip Warner, some rework by Tom Lane.
2004-11-06 19:36:02 +00:00
Bruce Momjian b6b71b85bc Pgindent run for 8.0. 2004-08-29 05:07:03 +00:00
Bruce Momjian f7168bd44c They are two different problems; the TOC entry is important for any
multiline command  or to rerun the command easily later.

Whereas displaying the failed SQL command is a matter of fixing the
error
messages.

The latter is complicated by failed COPY commands which, with
die-on-errors
off, results in the data being processed as a command, so dumping the
command will dump all of the data.

In the case of long commands, should the whole command be dumped? eg.
(eg.
several pages of function definition).

In the case of the COPY command, I'm not sure what to do. Obviously, it
would be best to avoid sending the data, but the data and command are
combined (from memory). Also, the 'data' may be in the form of INSERT
statements.

Attached patch produces the first 125 chars of the command:

pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC Entry 26; 1255 16449270
FUNCTION
plpgsql_call_handler() pjw
pg_restore: [archiver (db)] could not execute query: ERROR:  function
"plpgsql_call_handler" already exists with same argument types
     Command was: CREATE FUNCTION plpgsql_call_handler() RETURNS
language_handler
     AS '/var/lib/pgsql-8.0b1/lib/plpgsql', 'plpgsql_call_han...
pg_restore: [archiver (db)] Error from TOC Entry 27; 1255 16449271
FUNCTION
plpgsql_validator(oid) pjw
pg_restore: [archiver (db)] could not execute query: ERROR:  function
"plpgsql_validator" already exists with same argument types
     Command was: CREATE FUNCTION plpgsql_validator(oid) RETURNS void
     AS '/var/lib/pgsql-8.0b1/lib/plpgsql', 'plpgsql_validator'
     LANGU...

Philip Warner
2004-08-20 20:00:34 +00:00
Bruce Momjian 1b5e0143b5 This patch allows pg_restore to recognize $-quotes in SQL queries. It
will treat any unquoted string that starts with a $ and has no preceding
identifier chars as a potential $-quote tag, it then makes sure that the
tag chars are valid. If so, it processes the $-quote.

Philip Warner
2004-08-20 16:07:15 +00:00
Bruce Momjian ec7c4c1b66 Please find attached a small patch so that "pg_restore" ignores some sql
errors. This is the second submission, which integrates Tom comments about
localisation and exit code. I also added some comments about one sql
command which is not ignored.

Fabien COELHO
2004-04-22 02:39:10 +00:00
Bruce Momjian f23cce73b3 Use the new GUC variable default_with_oids in pg_dump, rather than using
WITH/WITHOUT OIDS in dump files.  This makes dump files more portable.

I have updated the pg_dump version so old binary dumps will load fine.

Pre-7.5 dumps use WITHOUT OIDS in SQL were needed, so they should be
fine.
2004-03-24 03:06:08 +00:00
Tom Lane 92bec9a0bc Cause pg_dump to emit a 'SET client_encoding' command at the start of
any restore operation, thereby ensuring that dumped data is interpreted
the same way it was dumped even if the target database has a different
encoding.  Per suggestions from Pavel Stehule and others.  Also,
simplify scheme for handling check_function_bodies ... we may as well
just set that at the head of the script.
2004-02-24 03:35:19 +00:00
Tom Lane 918b158743 Work around naming conflict between zlib and OpenSSL by tweaking inclusion
order.  Remove some unnecessary #includes (that duplicate c.h).
2003-12-08 16:39:05 +00:00