Commit Graph

1969 Commits

Author SHA1 Message Date
Peter Eisentraut 0bf83648a5 pg_dump: Fix dumping of inherited generated columns
Generation expressions of generated columns are always inherited, so
there is no need to set them separately in child tables, and there is
no syntax to do so either.  The code previously used the code paths
for the handling of default values, for which different rules apply;
in particular it might want to set a default value explicitly for an
inherited column.  This resulted in unrestorable dumps.  For generated
columns, just skip them in inherited tables.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/15830.1575468847%40sss.pgh.pa.us
2021-02-03 11:27:13 +01:00
Peter Eisentraut 2592be8be5 Fix typo 2021-01-29 09:43:21 +01:00
Tom Lane 58cd8dca3d Avoid redundantly prefixing PQerrorMessage for a connection failure.
libpq's error messages for connection failures pretty well stand on
their own, especially since commits 52a10224e/27a48e5a1.  Prefixing
them with 'could not connect to database "foo"' or the like is just
redundant, and perhaps even misleading if the specific database name
isn't relevant to the failure.  (When it is, we trust that the
backend's error message will include the DB name.)  Indeed, psql
hasn't used any such prefix in a long time.  So, make all our other
programs and documentation examples agree with psql's practice.

Discussion: https://postgr.es/m/1094524.1611266589@sss.pgh.pa.us
2021-01-22 16:52:31 -05:00
Tom Lane 27a48e5a16 Improve new wording of libpq's connection failure messages.
"connection to server so-and-so failed:" seems clearer than the
previous wording "could not connect to so-and-so:" (introduced by
52a10224e), because the latter suggests a network-level connection
failure.  We're now prefixing this string to all types of connection
failures, for instance authentication failures; so we need wording
that doesn't imply a low-level error.

Per discussion with Robert Haas.

Discussion: https://postgr.es/m/CA+TgmobssJ6rS22dspWnu-oDxXevGmhMD8VcRBjmj-b9UDqRjw@mail.gmail.com
2021-01-21 16:10:18 -05:00
Noah Misch f713ff7c64 Fix pg_dump for GRANT OPTION among initial privileges.
The context is an object that no longer bears some aclitem that it bore
initially.  (A user issued REVOKE or GRANT statements upon the object.)
pg_dump is forming SQL to reproduce the object ACL.  Since initdb
creates no ACL bearing GRANT OPTION, reaching this bug requires an
extension where the creation script establishes such an ACL.  No PGXN
extension does that.  If an installation did reach the bug, pg_dump
would have omitted a semicolon, causing a REVOKE and the next SQL
statement to fail.  Separately, since the affected code exists to
eliminate an entire aclitem, it wants plain REVOKE, not REVOKE GRANT
OPTION FOR.  Back-patch to 9.6, where commit
23f34fa4ba first appeared.

Discussion: https://postgr.es/m/20210109102423.GA160022@rfd.leadboat.com
2021-01-16 12:21:35 -08:00
Tom Lane 8e396a773b pg_dump: label PUBLICATION TABLE ArchiveEntries with an owner.
This is the same fix as commit 9eabfe300 applied to INDEX ATTACH
entries, but for table-to-publication attachments.  As in that
case, even though the backend doesn't record "ownership" of the
attachment, we still ought to label it in the dump archive with
the role name that should run the ALTER PUBLICATION command.
The existing behavior causes the ALTER to be done by the original
role that started the restore; that will usually work fine, but
there may be corner cases where it fails.

The bulk of the patch is concerned with changing struct
PublicationRelInfo to include a pointer to the associated
PublicationInfo object, so that we can get the owner's name
out of that when the time comes.  While at it, I rewrote
getPublicationTables() to do just one query of pg_publication_rel,
not one per table.

Back-patch to v10 where this code was introduced.

Discussion: https://postgr.es/m/1165710.1610473242@sss.pgh.pa.us
2021-01-14 16:19:38 -05:00
Tom Lane 9eabfe300a pg_dump: label INDEX ATTACH ArchiveEntries with an owner.
Although a partitioned index's attachment to its parent doesn't
have separate ownership, the ArchiveEntry for it needs to be
marked with an owner anyway, to ensure that the ALTER command
is run by the appropriate role when restoring with
--use-set-session-authorization.  Without this, the ALTER will
be run by the role that started the restore session, which will
usually work but it's formally the wrong thing.

Back-patch to v11 where this type of ArchiveEntry was added.
In HEAD, add equivalent commentary to the just-added TABLE ATTACH
case, which I'd made do the right thing already.

Discussion: https://postgr.es/m/1094034.1610418498@sss.pgh.pa.us
2021-01-12 13:37:38 -05:00
Tom Lane 9a4c0e36fb Dump ALTER TABLE ... ATTACH PARTITION as a separate ArchiveEntry.
Previously, we emitted the ATTACH PARTITION command as part of
the child table's ArchiveEntry.  This was a poor choice since it
complicates restoring the partition as a standalone table; you have
to ignore the error from the ATTACH, which isn't even an option when
restoring direct-to-database with pg_restore.  (pg_restore will issue
the whole ArchiveEntry as one PQexec, so that any error rolls back
the table creation as well.)  Hence, separate it out as its own
ArchiveEntry, as indeed we already did for index ATTACH PARTITION
commands.

Justin Pryzby

Discussion: https://postgr.es/m/20201023052940.GE9241@telsasoft.com
2021-01-11 21:09:18 -05:00
Tom Lane d5ab79d815 Make pg_dump's table of object-type priorities more maintainable.
Wedging a new object type into this table has historically required
manually renumbering a lot of existing entries.  (Although it appears
that some people got lazy and re-used the priority level of an
existing object type, even if it wasn't particularly related.)
We can let the compiler do the counting by inventing an enum type that
lists the desired priority levels in order.  Now, if you want to add
or remove a priority level, that's a one-liner.

This patch is not purely cosmetic, because I split apart the priorities
of DO_COLLATION and DO_TRANSFORM, as well as those of DO_ACCESS_METHOD
and DO_OPERATOR, which look to me to have been merged out of expediency
rather than because it was a good idea.  Shell types continue to be
sorted interchangeably with full types, and opclasses interchangeably
with opfamilies.
2021-01-11 21:09:18 -05:00
Tom Lane 52a10224e3 Uniformly identify the target host in libpq connection failure reports.
Prefix "could not connect to host-or-socket-path:" to all connection
failure cases that occur after the socket() call, and remove the
ad-hoc server identity data that was appended to a few of these
messages.  This should produce much more intelligible error reports
in multiple-target-host situations, especially for error cases that
are off the beaten track to any degree (because none of those provided
any server identity info).

As an example of the change, formerly a connection attempt with a bad
port number such as "psql -p 12345 -h localhost,/tmp" might produce

psql: error: could not connect to server: Connection refused
        Is the server running on host "localhost" (::1) and accepting
        TCP/IP connections on port 12345?
could not connect to server: Connection refused
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 12345?
could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.PGSQL.12345"?

Now it looks like

psql: error: could not connect to host "localhost" (::1), port 12345: Connection refused
        Is the server running on that host and accepting TCP/IP connections?
could not connect to host "localhost" (127.0.0.1), port 12345: Connection refused
        Is the server running on that host and accepting TCP/IP connections?
could not connect to socket "/tmp/.s.PGSQL.12345": No such file or directory
        Is the server running locally and accepting connections on that socket?

This requires adjusting a couple of regression tests to allow for
variation in the contents of a connection failure message.

Discussion: https://postgr.es/m/BN6PR05MB3492948E4FD76C156E747E8BC9160@BN6PR05MB3492.namprd05.prod.outlook.com
2021-01-11 14:03:39 -05:00
Bruce Momjian ca3b37487b Update copyright for 2021
Backpatch-through: 9.5
2021-01-02 13:06:25 -05:00
Michael Paquier 1b3433e25f doc: Improve description of min_dynamic_shared_memory
While on it, fix one oversight in 90fbf7c, that introduced a reference
to an incorrect value for the compression level of pg_dump.

Author: Justin Pryzby
Reviewed-by: Thomas Munro, Michael Paquier
Discussion: https://postgr.es/m/CA+hUKGJRTLWWPcQfjm_xaOk98M8aROK903X92O0x-4vLJPWrrA@mail.gmail.com
2020-12-29 16:49:14 +09:00
Michael Paquier 90fbf7c57d Fix typos and grammar in docs and comments
This fixes several areas of the documentation and some comments in
matters of style, grammar, or even format.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20201222041153.GK30237@telsasoft.com
2020-12-24 17:05:49 +09:00
Alexander Korotkov 8344d72ccc Fixes for pg_dump.c regarding multiranges
This commit fixes two wrong version number checks and one wrong check for null.
2020-12-20 08:14:35 +03:00
Alexander Korotkov 6df7a9698b Multirange datatypes
Multiranges are basically sorted arrays of non-overlapping ranges with
set-theoretic operations defined over them.

Since v14, each range type automatically gets a corresponding multirange
datatype.  There are both manual and automatic mechanisms for naming multirange
types.  Once can specify multirange type name using multirange_type_name
attribute in CREATE TYPE.  Otherwise, a multirange type name is generated
automatically.  If the range type name contains "range" then we change that to
"multirange".  Otherwise, we add "_multirange" to the end.

Implementation of multiranges comes with a space-efficient internal
representation format, which evades extra paddings and duplicated storage of
oids.  Altogether this format allows fetching a particular range by its index
in O(n).

Statistic gathering and selectivity estimation are implemented for multiranges.
For this purpose, stored multirange is approximated as union range without gaps.
This field will likely need improvements in the future.

Catversion is bumped.

Discussion: https://postgr.es/m/CALNJ-vSUpQ_Y%3DjXvTxt1VYFztaBSsWVXeF1y6gTYQ4bOiWDLgQ%40mail.gmail.com
Discussion: https://postgr.es/m/a0b8026459d1e6167933be2104a6174e7d40d0ab.camel%40j-davis.com#fe7218c83b08068bfffb0c5293eceda0
Author: Paul Jungwirth, revised by me
Reviewed-by: David Fetter, Corey Huinker, Jeff Davis, Pavel Stehule
Reviewed-by: Alvaro Herrera, Tom Lane, Isaac Morland, David G. Johnston
Reviewed-by: Zhihong Yu, Alexander Korotkov
2020-12-20 07:20:33 +03:00
Peter Eisentraut d2a2808eb4 pg_dump: Don't use enums for defining bit mask values
This usage would mean that values of the enum type are potentially not
one of the enum values.  Use macros instead, like everywhere else.

Discussion: https://www.postgresql.org/message-id/14dde730-1d34-260e-fa9d-7664df2d6313@enterprisedb.com
2020-12-11 19:15:30 +01:00
Tom Lane c7aba7c14e Support subscripting of arbitrary types, not only arrays.
This patch generalizes the subscripting infrastructure so that any
data type can be subscripted, if it provides a handler function to
define what that means.  Traditional variable-length (varlena) arrays
all use array_subscript_handler(), while the existing fixed-length
types that support subscripting use raw_array_subscript_handler().
It's expected that other types that want to use subscripting notation
will define their own handlers.  (This patch provides no such new
features, though; it only lays the foundation for them.)

To do this, move the parser's semantic processing of subscripts
(including coercion to whatever data type is required) into a
method callback supplied by the handler.  On the execution side,
replace the ExecEvalSubscriptingRef* layer of functions with direct
calls to callback-supplied execution routines.  (Thus, essentially
no new run-time overhead should be caused by this patch.  Indeed,
there is room to remove some overhead by supplying specialized
execution routines.  This patch does a little bit in that line,
but more could be done.)

Additional work is required here and there to remove formerly
hard-wired assumptions about the result type, collation, etc
of a SubscriptingRef expression node; and to remove assumptions
that the subscript values must be integers.

One useful side-effect of this is that we now have a less squishy
mechanism for identifying whether a data type is a "true" array:
instead of wiring in weird rules about typlen, we can look to see
if pg_type.typsubscript == F_ARRAY_SUBSCRIPT_HANDLER.  For this
to be bulletproof, we have to forbid user-defined types from using
that handler directly; but there seems no good reason for them to
do so.

This patch also removes assumptions that the number of subscripts
is limited to MAXDIM (6), or indeed has any hard-wired limit.
That limit still applies to types handled by array_subscript_handler
or raw_array_subscript_handler, but to discourage other dependencies
on this constant, I've moved it from c.h to utils/array.h.

Dmitry Dolgov, reviewed at various times by Tom Lane, Arthur Zakirov,
Peter Eisentraut, Pavel Stehule

Discussion: https://postgr.es/m/CA+q6zcVDuGBv=M0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w@mail.gmail.com
Discussion: https://postgr.es/m/CA+q6zcVovR+XY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA@mail.gmail.com
2020-12-09 12:40:37 -05:00
Tom Lane 0473296246 pg_dump: Reorganize dumpBaseType()
Along the same lines as ed2c7f65b and daa9fe8a5, reduce code duplication
by having just one copy of the parts of the query that are the same
across all server versions; and make the conditionals control the
smallest possible amount of code.  This is in preparation for adding
another dumpable field to pg_type.
2020-12-06 22:37:40 -05:00
Michael Paquier 13b58f8934 Improve failure detection with array parsing in pg_dump
Similarly to 3636efa, the checks done in pg_dump when parsing array
values from catalogs have been too lax.  Under memory pressure, it could
be possible, though very unlikely, to finish with dumps that miss some
data like:
- Statistics for indexes
- Run-time configuration of functions
- Configuration of extensions
- Publication list for a subscription

No backpatch is done as this is not going to be a problem in practice.
For example, if an OOM causes an array parsing to fail, a follow-up code
path of pg_dump would most likely complain with an allocation failure
due to the memory pressure.

Author: Michael Paquier
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20201111061319.GE2276@paquier.xyz
2020-11-19 10:36:08 +09:00
Thomas Munro 3636efa119 Fix parsePGArray() error checking in pg_dump.
Coverity complained about a defect in commit 257836a7:

  Calling "parsePGArray" without checking return value (as is
  done elsewhere 11 out of 13 times).

Fix, and also check for empty strings explicitly (NULL as represented by
PQgetvalue()).  That worked correctly before only because parsePGArray()
happens to set *nitems = 0 when it fails on an empty string.  Also
convert a sanity check assertion to an error to be more paranoid, and
pgindent a nearby line.

Reported-by: Michael Paquier <michael@paquier.xyz>
2020-11-09 16:26:47 +13:00
Michael Paquier a05dbf477b Add GUC_LIST_INPUT and GUC_LIST_QUOTE to unix_socket_directories
This should have been done in the initial commit that made
unix_socket_directories a list as of c9b0cbe.  This change allows to
support correctly the case of ALTER SYSTEM, where it is possible to
specify multiple paths as a list, like the following pattern where
flattening is applied to each item:
ALTER SYSTEM SET unix_socket_directories = '/path1', '/path2';

Any parameters specified in postgresql.conf are parsed the same way, so
there is no compatibility change.  pg_dump has a hardcoded list of
parameters marked with GUC_LIST_QUOTE, that gets its routine update.
These are reordered alphabetically for clarity.

Author: Ian Lawrence Barwick
Reviewed-by: Peter Eisentraunt, Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/CAB8KJ=iMOtNY6_sUwV=LQVCJ2zgYHBDyNzVfvE5GN3WQ3v9kQg@mail.gmail.com
2020-11-07 10:30:22 +09:00
Tom Lane d3adaabaf7 Revert "pg_dump: Lock all relations, not just plain tables".
Revert 403a3d91c, as well as the followup fix 7f4235032, in all
branches.  We need to think a bit harder about what the behavior
of LOCK TABLE on views should be, and there's no time for that
before next week's releases.  We'll take another crack at this
later.

Discussion: https://postgr.es/m/16703-e348f58aab3cf6cc@postgresql.org
2020-11-06 15:48:04 -05:00
Thomas Munro 257836a755 Track collation versions for indexes.
Record the current version of dependent collations in pg_depend when
creating or rebuilding an index.  When accessing the index later, warn
that the index may be corrupted if the current version doesn't match.

Thanks to Douglas Doole, Peter Eisentraut, Christoph Berg, Laurenz Albe,
Michael Paquier, Robert Haas, Tom Lane and others for very helpful
discussion.

Author: Thomas Munro <thomas.munro@gmail.com>
Author: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> (earlier versions)
Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com
2020-11-03 01:19:50 +13:00
Thomas Munro 7d1297df08 Remove pg_collation.collversion.
This model couldn't be extended to cover the default collation, and
didn't have any information about the affected database objects when the
version changed.  Remove, in preparation for a follow-up commit that
will add a new mechanism.

Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com
2020-11-03 00:44:59 +13:00
Tom Lane 7f4235032f Avoid null pointer dereference if error result lacks SQLSTATE.
Although error results received from the backend should always have
a SQLSTATE field, ones generated by libpq won't, making this code
vulnerable to a crash after, say, untimely loss of connection.
Noted by Coverity.

Oversight in commit 403a3d91c.  Back-patch to 9.5, as that was.
2020-11-01 11:26:16 -05:00
Alvaro Herrera 403a3d91c8
pg_dump: Lock all relations, not just plain tables
Now that LOCK TABLE can take any relation type, acquire lock on all
relations that are to be dumped.  This prevents schema changes or
deadlock errors that could cause a dump to fail after expending much
effort.  The server is tested to have the capability and the feature
disabled if it doesn't, so that a patched pg_dump doesn't fail when
connecting to an unpatched server.

Backpatch to 9.5.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Wells Oliver <wells.oliver@gmail.com>
Discussion: https://postgr.es/m/20201021200659.GA32358@alvherre.pgsql
2020-10-27 14:31:37 -03:00
Tom Lane 929c69aa19 In pg_restore's dump_lo_buf(), work a little harder on error handling.
Failure to write data to a large object during restore led to an ugly
and uninformative error message.  To add insult to injury, it then
fatal'd out, where other SQL-level errors usually result in pressing on.

Report the underlying error condition, rather than just giving not-very-
useful byte counts, and use warn_or_exit_horribly() so as to adhere to
pg_restore's general policy about whether to continue or not.

Also recognize that lo_write() returns int not size_t.

Per report from Justin Pryzby, though I didn't use his patch.
Given the lack of comparable complaints, I'm not sure this is
worth back-patching.

Discussion: https://postgr.es/m/20201018010232.GF9241@telsasoft.com
2020-10-18 12:26:02 -04:00
Tom Lane 7d00a6b2de In libpq for Windows, call WSAStartup once and WSACleanup not at all.
The Windows documentation insists that every WSAStartup call should
have a matching WSACleanup call.  However, if that ever had actual
relevance, it wasn't in this century.  Every remotely-modern Windows
kernel is capable of cleaning up when a process exits without doing
that, and must be so to avoid resource leaks in case of a process
crash.  Moreover, Postgres backends have done WSAStartup without
WSACleanup since commit 4cdf51e64 in 2004, and we've never seen any
indication of a problem with that.

libpq's habit of doing WSAStartup during connection start and
WSACleanup during shutdown is also rather inefficient, since a
series of non-overlapping connection requests leads to repeated,
quite expensive DLL unload/reload cycles.  We document a workaround
for that (having the application call WSAStartup for itself), but
that's just a kluge.  It's also worth noting that it's far from
uncommon for applications to exit without doing PQfinish, and
we've not heard reports of trouble from that either.

However, the real reason for acting on this is that recent
experiments by Alexander Lakhin suggest that calling WSACleanup
during PQfinish might be triggering the symptom we occasionally see
that a process using libpq fails to emit expected stdio output.

Therefore, let's change libpq so that it calls WSAStartup only
once per process, during the first connection attempt, and never
calls WSACleanup at all.

While at it, get rid of the only other WSACleanup call in our code
tree, in pg_dump/parallel.c; that presumably is equally useless.

If this proves to suppress the fairly-common ecpg test failures
we see on Windows, I'll back-patch, but for now let's just do it
in HEAD and see what happens.

Discussion: https://postgr.es/m/ac976d8c-03df-d6b8-025c-15a2de8d9af1@postgrespro.ru
2020-10-17 16:53:48 -04:00
David Rowley 110d81728a Fixup some appendStringInfo and appendPQExpBuffer calls
A number of places were using appendStringInfo() when they could have been
using appendStringInfoString() instead.  While there's no functionality
change there, it's just more efficient to use appendStringInfoString()
when no formatting is required.  Likewise for some
appendStringInfoString() calls which were just appending a single char.
We can just use appendStringInfoChar() for that.

Additionally, many places were using appendPQExpBuffer() when they could
have used appendPQExpBufferStr(). Change those too.

Patch by Zhijie Hou, but further searching by me found significantly more
places that deserved the same treatment.

Author: Zhijie Hou, David Rowley
Discussion: https://postgr.es/m/cb172cf4361e4c7ba7167429070979d4@G08CNEXMBPEKD05.g08.fujitsu.local
2020-10-15 20:35:17 +13:00
Tom Lane eeb01eb1f5 Remove pointless error-code checking in pg_dump/parallel.c.
Commit fe27009cb tried to make parallel.c's Windows implementation of
piperead() translate Windows socket errors to Unix, but that didn't
actually work because TranslateSocketError() is backend-internal code
(and not even public there).  But on closer inspection, the sole
caller of this function doesn't actually care whether the result is
zero or negative, much less inspect the errno.  So the whole exercise
is totally useless, and has been since this code was introduced.
Rip it out and just call recv() directly.

Per buildfarm.

Discussion: https://postgr.es/m/2621622.1602184554@sss.pgh.pa.us
2020-10-10 15:33:54 -04:00
Tom Lane fe27009cbb Recognize network-failure errnos as indicating hard connection loss.
Up to now, only ECONNRESET (and EPIPE, in most but not quite all places)
received special treatment in our error handling logic.  This patch
changes things so that related error codes such as ECONNABORTED are
also recognized as indicating that the connection's dead and unlikely
to come back.

We continue to think, however, that only ECONNRESET and EPIPE should be
reported as probable server crashes; the other cases indicate network
connectivity problems but prove little about the server's state.  Thus,
there's no change in the error message texts that are output for such
cases.  The key practical effect is that errcode_for_socket_access()
will report ERRCODE_CONNECTION_FAILURE rather than
ERRCODE_INTERNAL_ERROR for a network failure.  It's expected that this
will fix buildfarm member lorikeet's failures since commit 32a9c0bdf,
as that seems to be due to not treating ECONNABORTED equivalently to
ECONNRESET.

The set of errnos treated this way now includes ECONNABORTED, EHOSTDOWN,
EHOSTUNREACH, ENETDOWN, ENETRESET, and ENETUNREACH.  Several of these
were second-class citizens in terms of their handling in places like
get_errno_symbol(), so upgrade the infrastructure where necessary.

As committed, this patch assumes that all these symbols are defined
everywhere.  POSIX specifies all of them except EHOSTDOWN, but that
seems to exist on all platforms of interest; we'll see what the
buildfarm says about that.

Probably this should be back-patched, but let's see what the buildfarm
thinks of it first.

Fujii Masao and Tom Lane

Discussion: https://postgr.es/m/2621622.1602184554@sss.pgh.pa.us
2020-10-10 13:28:12 -04:00
Tom Lane 9e5f1f21ad Rethink recent fix for pg_dump's handling of extension config tables.
Commit 3eb3d3e78 was a few bricks shy of a load: while it correctly
set the table's "interesting" flag when deciding to dump the data of
an extension config table, it was not correct to clear that flag
if we concluded we shouldn't dump the data.  This led to the crash
reported in bug #16655, because in fact we'll traverse dumpTableSchema
anyway for all extension tables (to see if they have user-added
seclabels or RLS policies).

The right thing to do is to force "interesting" true in makeTableDataInfo,
and otherwise leave the flag alone.  (Doing it there is more future-proof
in case additional calls are added, and it also avoids setting the flag
unnecessarily if that function decides the table is non-dumpable.)

This investigation also showed that while only the --inserts code path
had an obvious failure in the case considered by 3eb3d3e78, the COPY
code path also has a problem with not having loaded table subsidiary
data.  That causes fmtCopyColumnList to silently return an empty string
instead of the correct column list.  That accidentally mostly works,
which perhaps is why we didn't notice this before.  It would only fail
if the restore column order is different from the dump column order,
which only happens in weird inheritance cases, so it's not surprising
nobody had hit the case with an extension config table.  Nonetheless,
it's a bug, and it goes a long way back, not just to v12 where the
--inserts code path started to have a problem with this.

In hopes of catching such cases a bit sooner in future, add some
Asserts that "interesting" has been set in both dumpTableData and
dumpTableSchema.  Adjust the test case added by 3eb3d3e78 so that it
checks the COPY rather than INSERT form of that bug, allowing it to
detect the longer-standing symptom.

Per bug #16655 from Cameron Daniel.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/16655-5c92d6b3a9438137@postgresql.org
Discussion: https://postgr.es/m/18048b44-3414-b983-8c7c-9165b177900d@2ndQuadrant.com
2020-10-07 12:51:02 -04:00
Tom Lane a45bc8a4f6 Fix handling of -d "connection string" in pg_dump/pg_restore.
Parallel pg_dump failed if its -d parameter was a connection string
containing any essential information other than host, port, or username.
The same was true for pg_restore with --create.

The reason is that these scenarios failed to preserve the connection
string from the command line; the code felt free to replace that with
just the database name when reconnecting from a pg_dump parallel worker
or after creating the target database.  By chance, parallel pg_restore
did not suffer this defect, as long as you didn't say --create.

In practice it seems that the error would be obvious only if the
connstring included essential, non-default SSL or GSS parameters.
This may explain why it took us so long to notice.  (It also makes
it very difficult to craft a regression test case illustrating the
problem, since the test would fail in builds without those options.)

Fix by refactoring so that ConnectDatabase always receives all the
relevant options directly from the command line, rather than
reconstructed values.  Inject a different database name, when necessary,
by relying on libpq's rules for handling multiple "dbname" parameters.

While here, let's get rid of the essentially duplicate _connectDB
function, as well as some obsolete nearby cruft.

Per bug #16604 from Zsolt Ero.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/16604-933f4b8791227b15@postgresql.org
2020-09-24 18:19:38 -04:00
Tom Lane 2e3c19462d Simplify SortTocFromFile() by removing fixed buffer-size limit.
pg_restore previously coped with overlength TOC-file lines using some
complicated logic to ignore additional bufferloads.  While this isn't
wrong, since we don't expect that the interesting part of a line would
run to more than a dozen or so bytes, it's more complex than it needs
to be.  Use a StringInfo instead of a fixed-size buffer so that we can
process long lines as single entities and thus not need the extra
logic.

Daniel Gustafsson

Discussion: https://postgr.es/m/48A4FA71-524E-41B9-953A-FD04EF36E2E7@yesql.se
2020-09-22 16:03:32 -04:00
Peter Eisentraut 8354e7b27e Remove unused parameters
Remove various unused parameters in pg_dump code.  These have all
become unused over time or were never used.

Discussion: https://www.postgresql.org/message-id/flat/511bb100-f829-ba21-2f10-9f952ec06ead%402ndquadrant.com
2020-09-19 13:29:54 +02:00
Tom Lane 1ed6b89563 Remove support for postfix (right-unary) operators.
This feature has been a thorn in our sides for a long time, causing
many grammatical ambiguity problems.  It doesn't seem worth the
pain to continue to support it, so remove it.

There are some follow-on improvements we can make in the grammar,
but this commit only removes the bare minimum number of productions,
plus assorted backend support code.

Note that pg_dump and psql continue to have full support, since
they may be used against older servers.  However, pg_dump warns
about postfix operators.  There is also a check in pg_upgrade.

Documentation-wise, I (tgl) largely removed the "left unary"
terminology in favor of saying "prefix operator", which is
a more standard and IMO less confusing term.

I included a catversion bump, although no initial catalog data
changes here, to mark the boundary at which oprkind = 'r'
stopped being valid in pg_operator.

Mark Dilger, based on work by myself and Robert Haas;
review by John Naylor

Discussion: https://postgr.es/m/38ca86db-42ab-9b48-2902-337a0d6b8311@2ndquadrant.com
2020-09-17 19:38:05 -04:00
Tom Lane 99175141c9 Improve common/logging.c's support for multiple verbosity levels.
Instead of hard-wiring specific verbosity levels into the option
processing of client applications, invent pg_logging_increase_verbosity()
and encourage clients to implement --verbose by calling that.  Then,
the common convention that more -v's gets you more verbosity just works.

In particular, this allows resurrection of the debug-grade messages that
have long existed in pg_dump and its siblings.  They were unreachable
before this commit due to lack of a way to select PG_LOG_DEBUG logging
level.  (It appears that they may have been unreachable for some time
before common/logging.c was introduced, too, so I'm not specifically
blaming cc8d41511 for the oversight.  One reason for thinking that is
that it's now apparent that _allocAH()'s message needs a null-pointer
guard.  Testing might have failed to reveal that before 96bf88d52.)

Discussion: https://postgr.es/m/1173106.1600116625@sss.pgh.pa.us
2020-09-17 12:52:18 -04:00
Michael Paquier 5423853fee Avoid retrieval of CHECK constraints and DEFAULT exprs in data-only dump
Those extra queries are not necessary when doing a data-only dump.  With
this change, this means that the dependencies between CHECK/DEFAULT and
the parent table are not tracked anymore for a data-only dump.  However,
these dependencies are only used for the schema generation and we have
never guaranteed that a dump can be reloaded if a CHECK constraint uses
a custom function whose behavior changes when loading the data, like
when using cross-table references in the CHECK function.

Author: Julien Rouhaud
Reviewed-by: Daniel Gustafsson, Michael Paquier
Discussion: https://postgr.es/m/20200712054850.GA92357@nol
2020-09-16 16:26:50 +09:00
Michael Paquier ac673a1aaf Avoid useless allocations for information of dumpable objects in pg_dump/
If there are no objects of a certain type, there is no need to do an
allocation for a set of DumpableObject items.  The previous coding did
an allocation of 1 byte instead as per the fallback of pg_malloc() in
the event of an allocation size of zero.  This assigns NULL instead for
a set of dumpable objects.

A similar rule already applied to findObjectByOid(), so this makes the
code more defensive as we would just fail with a pointer dereference
instead of attempting to use some incorrect data if a non-existing,
positive, OID is given by a caller of this function.

Author: Daniel Gustafsson
Reviewed-by: Julien Rouhaud, Ranier Vilela
Discussion: https://postgr.es/m/26C43E58-BDD0-4F1A-97CC-4A07B52E32C5@yesql.se
2020-09-14 10:44:23 +09:00
Tom Lane a5cc4dab6d Yet more elimination of dead stores and useless initializations.
I'm not sure what tool Ranier was using, but the ones I contributed
were found by using a newer version of scan-build than I tried before.

Ranier Vilela and Tom Lane

Discussion: https://postgr.es/m/CAEudQAo1+AcGppxDSg8k+zF4+Kv+eJyqzEDdbpDg58-=MQcerQ@mail.gmail.com
2020-09-05 13:17:32 -04:00
Andrew Dunstan 3eb3d3e782 Collect attribute data on extension owned tables being dumped
If this data is not collected, pg_dump segfaults if asked for column
inserts.

Fix by Fabrízio de Royes Mello

Backpatch to release 12 where the bug was introduced.
2020-09-04 13:54:54 -04:00
Tom Lane 67a472d71c Remove arbitrary restrictions on password length.
This patch started out with the goal of harmonizing various arbitrary
limits on password length, but after awhile a better idea emerged:
let's just get rid of those fixed limits.

recv_password_packet() has an arbitrary limit on the packet size,
which we don't really need, so just drop it.  (Note that this doesn't
really affect anything for MD5 or SCRAM password verification, since
those will hash the user's password to something shorter anyway.
It does matter for auth methods that require a cleartext password.)

Likewise remove the arbitrary error condition in pg_saslprep().

The remaining limits are mostly in client-side code that prompts
for passwords.  To improve those, refactor simple_prompt() so that
it allocates its own result buffer that can be made as big as
necessary.  Actually, it proves best to make a separate routine
pg_get_line() that has essentially the semantics of fgets(), except
that it allocates a suitable result buffer and hence will never
return a truncated line.  (pg_get_line has a lot of potential
applications to replace randomly-sized fgets buffers elsewhere,
but I'll leave that for another patch.)

I built pg_get_line() atop stringinfo.c, which requires moving
that code to src/common/; but that seems fine since it was a poor
fit for src/port/ anyway.

This patch is mostly mine, but it owes a good deal to Nathan Bossart
who pressed for a solution to the password length problem and
created a predecessor patch.  Also thanks to Peter Eisentraut and
Stephen Frost for ideas and discussion.

Discussion: https://postgr.es/m/09512C4F-8CB9-4021-B455-EF4C4F0D55A0@amazon.com
2020-09-03 20:09:18 -04:00
Amit Kapila 464824323e Add support for streaming to built-in logical replication.
To add support for streaming of in-progress transactions into the
built-in logical replication, we need to do three things:

* Extend the logical replication protocol, so identify in-progress
transactions, and allow adding additional bits of information (e.g.
XID of subtransactions).

* Modify the output plugin (pgoutput) to implement the new stream
API callbacks, by leveraging the extended replication protocol.

* Modify the replication apply worker, to properly handle streamed
in-progress transaction by spilling the data to disk and then
replaying them on commit.

We however must explicitly disable streaming replication during
replication slot creation, even if the plugin supports it. We
don't need to replicate the changes accumulated during this phase,
and moreover we don't have a replication connection open so we
don't have where to send the data anyway.

Author: Tomas Vondra, Dilip Kumar and Amit Kapila
Reviewed-by: Amit Kapila, Kuntal Ghosh and Ajin Cherian
Tested-by: Neha Sharma, Mahendra Singh Thalor and Ajin Cherian
Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
2020-09-03 07:54:07 +05:30
Alvaro Herrera 05c16b827f
Fix typo in comment
Introduced by 8b08f7d4820f; backpatch to 11.

Discussion: https://postgr.es/m/20200812214918.GA30353@alvherre.pgsql
2020-09-01 20:43:23 -04:00
Alvaro Herrera 2ba5b2db79
pg_dump: fix dependencies on FKs to partitioned tables
Parallel-restoring a foreign key that references a partitioned table
with several levels of partitions can fail:

pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 6684; 2606 29166 FK CONSTRAINT fk fk_a_fkey postgres
pg_restore: error: could not execute query: ERROR:  there is no unique constraint matching given keys for referenced table "pk"
Command was: ALTER TABLE fkpart3.fk
    ADD CONSTRAINT fk_a_fkey FOREIGN KEY (a) REFERENCES fkpart3.pk(a);

This happens in parallel restore mode because some index partitions
aren't yet attached to the topmost partitioned index that the FK uses,
and so the index is still invalid.  The current code marks the FK as
dependent on the first level of index-attach dump objects; the bug is
fixed by recursively marking the FK on their children.

Backpatch to 12, where FKs to partitioned tables were introduced.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/3170626.1594842723@sss.pgh.pa.us
Backpatch: 12-master
2020-08-14 17:33:31 -04:00
Noah Misch e078fb5d4e Move connect.h from fe_utils to src/include/common.
Any libpq client can use the header.  Clients include backend components
postgres_fdw, dblink, and logical replication apply worker.  Back-patch
to v10, because another fix needs this.  In released branches, just copy
the header and keep the original.
2020-08-10 09:22:54 -07:00
Tom Lane 1b9cde5124 Check for fseeko() failure in pg_dump's _tarAddFile().
Coverity pointed out, not unreasonably, that we checked fseeko's
result at every other call site but these.  Failure to seek in the
temp file (note this is NOT pg_dump's output file) seems quite
unlikely, and even if it did happen the file length cross-check
further down would probably detect the problem.  Still, that's a
poor excuse for not checking the result of a system call.
2020-08-09 12:39:07 -04:00
Tom Lane 9f9682783b Invent "amadjustmembers" AM method for validating opclass members.
This allows AM-specific knowledge to be applied during creation of
pg_amop and pg_amproc entries.  Specifically, the AM knows better than
core code which entries to consider as required or optional.  Giving
the latter entries the appropriate sort of dependency allows them to
be dropped without taking out the whole opclass or opfamily; which
is something we'd like to have to correct obsolescent entries in
extensions.

This callback also opens the door to performing AM-specific validity
checks during opclass creation, rather than hoping than an opclass
developer will remember to test with "amvalidate".  For the most part
I've not actually added any such checks yet; that can happen in a
follow-on patch.  (Note that we shouldn't remove any tests from
"amvalidate", as those are still needed to cross-check manually
constructed entries in the initdb data.  So adding tests to
"amadjustmembers" will be somewhat duplicative, but it seems like
a good idea anyway.)

Patch by me, reviewed by Alexander Korotkov, Hamid Akhtar, and
Anastasia Lubennikova.

Discussion: https://postgr.es/m/4578.1565195302@sss.pgh.pa.us
2020-08-01 17:12:47 -04:00
Tom Lane 9de77b5453 Allow logical replication to transfer data in binary format.
This patch adds a "binary" option to CREATE/ALTER SUBSCRIPTION.
When that's set, the publisher will send data using the data type's
typsend function if any, rather than typoutput.  This is generally
faster, if slightly less robust.

As committed, we won't try to transfer user-defined array or composite
types in binary, for fear that type OIDs won't match at the subscriber.
This might be changed later, but it seems like fit material for a
follow-on patch.

Dave Cramer, reviewed by Daniel Gustafsson, Petr Jelinek, and others;
adjusted some by me

Discussion: https://postgr.es/m/CADK3HH+R3xMn=8t3Ct+uD+qJ1KD=Hbif5NFMJ+d5DkoCzp6Vgw@mail.gmail.com
2020-07-18 12:44:51 -04:00
Tom Lane f009591d6e Cope with data-offset-less archive files during out-of-order restores.
pg_dump produces custom-format archive files that lack data offsets
when it is unable to seek its output.  Up to now that's been a hazard
for pg_restore.  But if pg_restore is able to seek in the archive
file, there is no reason to throw up our hands when asked to restore
data blocks out of order.  Instead, whenever we are searching for a
data block, record the locations of the blocks we passed over (that
is, fill in the missing data-offset fields in our in-memory copy of
the TOC data).  Then, when we hit a case that requires going
backwards, we can just seek back.

Also track the furthest point that we've searched to, and seek back
to there when beginning a search for a new data block.  This avoids
possible O(N^2) time consumption, by ensuring that each data block
is examined at most twice.  (On Unix systems, that's at most twice
per parallel-restore job; but since Windows uses threads here, the
threads can share block location knowledge, reducing the amount of
duplicated work.)

We can also improve the code a bit by using fseeko() to skip over
data blocks during the search.

This is all of some use even in simple restores, but it's really
significant for parallel pg_restore.  In that case, we require
seekability of the input already, and we will very probably need
to do out-of-order restores.

Back-patch to v12, as this fixes a regression introduced by commit
548e50976.  Before that, parallel restore avoided requesting
out-of-order restores, so it would work on a data-offset-less
archive.  Now it will again.

Ideally this patch would include some test coverage, but there are
other open bugs that need to be fixed before we can extend our
coverage of parallel restore very much.  Plan to revisit that later.

David Gilman and Tom Lane; reviewed by Justin Pryzby

Discussion: https://postgr.es/m/CALBH9DDuJ+scZc4MEvw5uO-=vRyR2=QF9+Yh=3hPEnKHWfS81A@mail.gmail.com
2020-07-17 13:04:05 -04:00