Hash partitioning on an enum is problematic because the hash codes are
derived from the OIDs assigned to the enum values, which will almost
certainly be different after a dump-and-reload than they were before.
This means that some rows probably end up in different partitions than
before, causing restore to fail because of partition constraint
violations. (pg_upgrade dodges this problem by using hacks to force
the enum values to keep the same OIDs, but that's not possible nor
desirable for pg_dump.)
Users can work around that by specifying --load-via-partition-root,
but since that's a dump-time not restore-time decision, one might
find out the need for it far too late. Instead, teach pg_dump to
apply that option automatically when dealing with a partitioned
table that has hash-on-enum partitioning.
Also deal with a pre-existing issue for --load-via-partition-root
mode: in a parallel restore, we try to TRUNCATE target tables just
before loading them, in order to enable some backend optimizations.
This is bad when using --load-via-partition-root because (a) we're
likely to suffer deadlocks from restore jobs trying to restore rows
into other partitions than they came from, and (b) if we miss getting
a deadlock we might still lose data due to a TRUNCATE removing rows
from some already-completed restore job.
The fix for this is conceptually simple: just don't TRUNCATE if we're
dealing with a --load-via-partition-root case. The tricky bit is for
pg_restore to identify those cases. In dumps using COPY commands we
can inspect each COPY command to see if it targets the nominal target
table or some ancestor. However, in dumps using INSERT commands it's
pretty impractical to examine the INSERTs in advance. To provide a
solution for that going forward, modify pg_dump to mark TABLE DATA
items that are using --load-via-partition-root with a comment.
(This change also responds to a complaint from Robert Haas that
the dump output for --load-via-partition-root is pretty confusing.)
pg_restore checks for the special comment as well as checking the
COPY command if present. This will fail to identify the combination
of --load-via-partition-root and --inserts in pre-existing dump files,
but that should be a pretty rare case in the field. If it does
happen you will probably get a deadlock failure that you can work
around by not using parallel restore, which is the same as before
this bug fix.
Having done this, there seems no remaining reason for the alarmism
in the pg_dump man page about combining --load-via-partition-root
with parallel restore, so remove that warning.
Patch by me; thanks to Julien Rouhaud for review. Back-patch to
v11 where hash partitioning was introduced.
Discussion: https://postgr.es/m/1376149.1675268279@sss.pgh.pa.us
64bit xids can't represent xids before epoch 0 (see also be504a3e97). When
FullTransactionIdFromXidAndCtx() was passed such an xid, it'd create a 64bit
xid far into the future. Noticed while adding assertions in the course of
investigating be504a3e97, as amcheck's test create such xids.
To fix the issue, just return FirstNormalFullTransactionId in this case. A
freshly initdb'd cluster already has a newer horizon. The most minimal version
of this would make the messages for some detected corruptions differently
inaccurate. To make those cases accurate, switch
FullTransactionIdFromXidAndCtx() to use the 32bit modulo difference between
xid and nextxid to compute the 64bit xid, yielding sensible "in the future" /
"in the past" answers.
Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://postgr.es/m/20230108002923.cyoser3ttmt63bfn@awork3.anarazel.de
Backpatch: 14-, where heapam verification was introduced
Failing to do so results in an error when a pgbench script tries to
start a serializable transaction inside a pipeline, because by the time
BEGIN ISOLATION LEVEL SERIALIZABLE is executed, we're already in a
transaction that has acquired a snapshot, so the server rightfully
complains.
We can work around that by preparing all commands in the pipeline before
actually starting the pipeline. This changes the existing code in two
aspects: first, we now prepare each command individually at the point
where that command is about to be executed; previously, we would prepare
all commands in a script as soon as the first command of that script
would be executed. It's hard to see that this would make much of a
difference (particularly since it only affects the first time to execute
each script in a client), but I didn't actually try to measure it.
Secondly, we no longer use PQsendPrepare() in pipeline mode, but only
PQprepare. There's no specific reason for this change other than no
longer needing to do differently in pipeline mode. (Previously we had
no choice, because in pipeline mode PQprepare could not be used.)
Backpatch to 14, where pgbench got support for pipeline mode.
Reported-by: Yugo NAGATA <nagata@sraoss.co.jp>
Discussion: https://postgr.es/m/20210716153013.fc53b1c780b06fccc07a7f0d@sraoss.co.jp
Try to disable ASLR when building in EXEC_BACKEND mode, to avoid random
memory mapping failures while testing. For developer use only, no
effect on regular builds.
This has been originally applied as of f3e7806 for v15~, but
recently-added buildfarm member gokiburi tests this configuration on
older branches as well, causing it to fail randomly as ASLR would be
enabled.
Suggested-by: Andres Freund <andres@anarazel.de>
Tested-by: Bossart, Nathan <bossartn@amazon.com>
Discussion: https://postgr.es/m/20210806032944.m4tz7j2w47mant26%40alap3.anarazel.de
Backpatch-through: 12
No buildfarm members have reported that yet, but a recently-refreshed
Debian host did.
Reviewed-by: Andrew Dunstan
Discussion: https://postgr.es/m/Y8ey5z4Nav62g4/K@paquier.xyz
Backpatch-through: 11
To test pg_upgrade across major PG versions, we have to be able to
modify or drop any old objects with no-longer-supported properties,
and we have to be able to deal with cosmetic changes in pg_dump output.
Up to now, the buildfarm and pg_upgrade's own test infrastructure had
separate implementations of the former, and we had nothing but very
ad-hoc rules for the latter (including an arbitrary threshold on how
many lines of unchecked diff were okay!). This patch creates a Perl
module that can be shared by both those use-cases, and adds logic
that deals with pg_dump output diffs in a much more tightly defined
fashion.
This largely supersedes previous efforts in commits 0df9641d3,
9814ff550, and 62be9e4cd, which developed a SQL-script-based solution
for the task of dropping old objects. There was nothing fundamentally
wrong with that work in itself, but it had no basis for solving the
output-formatting problem. The most plausible way to deal with
formatting is to build a Perl module that can perform editing on the
dump files; and once we commit to that, it makes more sense for the
same module to also embed the knowledge of what has to be done for
dropping old objects.
Back-patch versions of the helper module as far as 9.2, to
support buildfarm animals that still test that far back.
It's also necessary to back-patch PostgreSQL/Version.pm,
because the new code depends on that. I fixed up pg_upgrade's
002_pg_upgrade.pl in v15, but did not look into back-patching
it further than that.
Tom Lane and Andrew Dunstan
Discussion: https://postgr.es/m/891521.1673657296@sss.pgh.pa.us
The ALTER DATABASE|FUNCTION|PROCEDURE|ROLE|ROUTINE|USER ... SET <name>
case in psql tab completion failed to exclude <name> = "SCHEMA", which
caused ALTER FUNCTION|PROCEDURE|ROUTINE ... SET SCHEMA to complete
with "FROM CURRENT" and "TO", which won't work.
Fix that, so that those cases now complete with the list of schemas,
like other ALTER ... SET SCHEMA commands.
Noticed while testing the recent patch to improve tab completion for
ALTER FUNCTION/PROCEDURE/ROUTINE, but this is not directly related to
that patch. Rather, this is a long-standing bug, so back-patch to all
supported branches.
Discussion: https://postgr.es/m/CALDaNm0s7GQmkLP_mx5Cvk=UzYMnjhPmXBxU8DsHEunFbC5sTg@mail.gmail.com
While on it, newlines are removed from the end of two elog() strings.
The others are simple grammar mistakes. One comment in pg_upgrade
referred incorrectly to sequences since a7e5457.
Author: Justin Pryzby
Discussion: https://postgr.es/m/20221230231257.GI1153@telsasoft.com
Backpatch-through: 11
The query used to disable WITH OIDS in all the relations making use of
it was checking for materialized views, but this is not a supported
operation. On the contrary, this needs to be done on foreign tables.
While on it, use quote_ident() in the ALTER TABLE strings built on the
relation name.
Author: Anton A. Melnikov, Michael Paquier
Discussion: https://postgr.es/m/49f389ba-95ce-8a9b-09ae-f60650c0e7c7@inbox.ru
Backpatch-through: 12
Some options of these commands need to be able to identify the start
of the function body within the output of pg_get_functiondef().
It used to be that that always began with "AS", but since the
introduction of new-style SQL functions, it might also start with
"BEGIN" or "RETURN". Fix that on the psql side, and add some
regression tests.
Noted by me awhile ago, but I didn't do anything about it.
Thanks to David Johnston for a nag.
Discussion: https://postgr.es/m/AM9PR01MB8268D5CDABDF044EE9F42173FE8C9@AM9PR01MB8268.eurprd01.prod.exchangelabs.com
Specifically, when pg_basebackup is invoked with -Tx=y, don't error
out if x could plausibly be an absolute path either on Windows or on
non-Windows systems. We don't know whether the remote system is
running the same OS as the local system, so it's not appropriate to
assume that our local rule about absolute pathnames is the same as
the rule on the remote system.
Patch by me, reviewed by Tom Lane, Andrew Dunstan, and
Davinder Singh.
Discussion: http://postgr.es/m/CA+TgmoY+jC3YiskomvYKDPK3FbrmsDU7_8+wMHt02HOdJeRb0g@mail.gmail.com
clang 15+ will issue a set-but-not-used warning when the only
use of a variable is in autoincrements (e.g., "foo++;").
That's perfectly sensible, but it detects a few more cases that
we'd not noticed before. Silence the warnings with our usual
methods, such as PG_USED_FOR_ASSERTS_ONLY, or in one case by
actually removing a useless variable.
One thing that we can't nicely get rid of is that with %pure-parser,
Bison emits "yynerrs" as a local variable that falls foul of this
warning. To silence those, I inserted "(void) yynerrs;" in the
top-level productions of affected grammars.
Per recently-established project policy, this is a candidate
for back-patching into out-of-support branches: it suppresses
annoying compiler warnings but changes no behavior. Hence,
back-patch to 9.5, which is as far as these patches go without
issues. (A preliminary check shows that the prior branches
need some other set-but-not-used cleanups too, so I'll leave
them for another day.)
Discussion: https://postgr.es/m/514615.1663615243@sss.pgh.pa.us
The zlib documentation mentions the values supported for the compression
strategy, but this code has been using a hardcoded value of 0 rather
than Z_DEFAULT_STRATEGY. This commit adjusts the code to use
Z_DEFAULT_STRATEGY.
Backpatch down to where this code has been added to ease the backport of
any future patch touching this area.
Reported-by: Tom Lane
Discussion: https://postgr.es/m/1400032.1662217889@sss.pgh.pa.us
Backpatch-through: 10
96ef3b8ff1 accidentally copied a not
applicable comment from the float8_pass_by_value code to the
data_checksums code. Remove that.
87d3b35a1c changed pg_upgrade to
checking the checksum version rather than just the Boolean presence of
checksums, but didn't change the field type in its ControlData struct
from bool. So this would not work correctly if there ever is a
checksum version larger than 1.
errno was not reported correctly after attempting to clone a file,
leading to incorrect error reports. While scanning through the code, I
have not noticed any similar mistakes.
Error introduced in 3a769d8.
Author: Justin Pryzby
Discussion: https://postgr.es/m/20220731134135.GY15006@telsasoft.com
Backpatch-through: 12
When you hit ^C, the terminal driver in Unix-like systems echoes "^C" as
well as sending an interrupt signal (depending on stty settings). At
least libedit (but maybe also libreadline) is then confused about the
current cursor location, and corrupts the display if you try to scroll
back. Fix, by moving to a new line before the next prompt is displayed.
Back-patch to all supported released.
Author: Pavel Stehule <pavel.stehule@gmail.com>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3278793.1626198638%40sss.pgh.pa.us
When we changed some built-in functions to use anycompatiblearray
instead of anyarray, we created a dump/restore hazard for user-defined
operators and aggregates relying on those functions: the user objects
have to be modified to change their signatures similarly. This causes
pg_upgrade to fail partway through if the source installation contains
such objects. We generally try to have pg_upgrade detect such hazards
and fail before it does anything exciting, so add logic to detect
this case too.
Back-patch to v14 where the change was made.
Justin Pryzby, reviewed by Andrey Borodin
Discussion: https://postgr.es/m/3383880.QJadu78ljV@vejsadalnx
This reverts commits a04ccf6df et al. in the back branches only.
There was some disagreement already over whether to back-patch
157f8739a, on the grounds that it is the sort of behavioral
change that we don't like to back-patch. Furthermore, it now
looks like the logic needs some more work, which we don't have
time for before the upcoming 14.4 release. Revert for now, and
perhaps reconsider later.
Discussion: https://postgr.es/m/17504-76b68018e130415e@postgresql.org
An error PGresult generated by libpq itself, such as a report of
connection loss, won't have broken-down error fields.
should_processing_continue() blithely assumed that
PG_DIAG_SEVERITY_NONLOCALIZED would always be present, and would
dump core if it wasn't.
Per grepping to see if 6d157e7cb's mistake was repeated elsewhere.
psql --single-transaction is able to handle multiple -c and -f switches
in a single transaction since d5563d7d, but this had the surprising
behavior of forcing a transaction COMMIT even if psql failed with an
error in the client (for example incorrect path given to \copy), which
would generate an error, but still commit any changes that were already
applied in the backend. This commit makes the behavior more consistent,
by enforcing a transaction ROLLBACK if any commands fail, both
client-side and backend-side, so as no changes are applied if one error
happens in any of them.
Some tests are added on HEAD to provide some coverage about all that.
Backend-side errors are unreliable as IPC::Run can complain on SIGPIPE
if psql quits before reading a query result, but that should work
properly in the case where any errors come from psql itself, which is
what the original report is about.
Reported-by: Christoph Berg
Author: Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/17504-76b68018e130415e@postgresql.org
Backpatch-through: 10
psql, pg_dump, and pg_amcheck share code to process object name
patterns like 'foo*.bar*' to match all tables with names starting in
'bar' that are in schemas starting with 'foo'. Before v14, any number
of extra name parts were silently ignored, so a command line '\d
foo.bar.baz.bletch.quux' was interpreted as '\d bletch.quux'. In v14,
as a result of commit 2c8726c4b0, we
instead treated this as a request for table quux in a schema named
'foo.bar.baz.bletch'. That caused problems for people like Justin
Pryzby who were accustomed to copying strings of the form
db.schema.table from messages generated by PostgreSQL itself and using
them as arguments to \d.
Accordingly, revise things so that if an object name pattern contains
more parts than we're expecting, we throw an error, unless there's
exactly one extra part and it matches the current database name.
That way, thisdb.myschema.mytable is accepted as meaning just
myschema.mytable, but otherdb.myschema.mytable is an error, and so
is some.random.garbage.myschema.mytable.
Mark Dilger, per report from Justin Pryzby and discussion among
various people.
Discussion: https://www.postgresql.org/message-id/20211013165426.GD27491%40telsasoft.com
After closedir() dirent->d_name is not valid anymore. As there alerady are a
few places relying on the limited lifetime of pg_waldump, do so here as well,
and just pg_strdup() the string.
The bug was introduced in fc49e24fa6.
Found by UBSan, run locally.
Backpatch: 11-, like fc49e24fa6 itself.
The previous method for doing that was to write zeroes into a
predetermined set of page locations. However, there's a roughly
1-in-64K chance that the existing checksum will match by chance,
and yesterday several buildfarm animals started to reproducibly
see that, resulting in test failures because no checksum mismatch
was reported.
Since the checksum includes the page LSN, test success depends on
the length of the installation's WAL history, which is affected by
(at least) the initial catalog contents, the set of locales installed
on the system, and the length of the pathname of the test directory.
Sooner or later we were going to hit a chance match, and today is
that day.
Harden these tests by specifically inverting the checksum field and
leaving all else alone, thereby guaranteeing that the checksum is
incorrect.
In passing, fix places that were using seek() to set up for syswrite(),
a combination that the Perl docs very explicitly warn against. We've
probably escaped problems because no regular buffered I/O is done on
these filehandles; but if it ever breaks, we wouldn't deserve or get
much sympathy.
Although we've only seen problems in HEAD, now that we recognize the
environmental dependencies it seems like it might be just a matter
of time until someone manages to hit this in back-branch testing.
Hence, back-patch to v11 where we started doing this kind of test.
Discussion: https://postgr.es/m/3192026.1648185780@sss.pgh.pa.us
It seems possible for the condition being tested to be true in
production, and nobody would never know (except when some data
eventually becomes corrupt?).
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m//202109040001.zky3wgv2qeqg@alvherre.pgsql
When cross-building to windows, or building with mingw on windows, the build
could fail with
x86_64-w64-mingw32-gcc: error: win32ver.o: No such file or director
because pg_dumpall didn't depend on WIN32RES, but it's recipe references
it. The build nevertheless succeeded most of the time, due to
pg_dump/pg_restore having the required dependency, causing win32ver.o to be
built.
Reported-By: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CA+hUKGJeekpUPWW6yCVdf9=oBAcCp86RrBivo4Y4cwazAzGPng@mail.gmail.com
Backpatch: 10-, omission present on all live branches
Slow hosts may avoid load-induced, spurious failures by setting
environment variable PG_TEST_TIMEOUT_DEFAULT to some number of seconds
greater than 180. Developers may see faster failures by setting that
environment variable to some lesser number of seconds. In tests, write
$PostgreSQL::Test::Utils::timeout_default wherever the convention has
been to write 180. This change raises the default for some briefer
timeouts. Back-patch to v10 (all supported versions).
Discussion: https://postgr.es/m/20220218052842.GA3627003@rfd.leadboat.com
When opening a WAL file smaller than XLOG_BLCKSZ (e.g. 0 bytes long) while
determining the wal_segment_size, pg_waldump checked errno, despite errno not
being set by the short read. Resulting in a bogus error message.
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20220214.181847.775024684568733277.horikyota.ntt@gmail.com
Backpatch: 11-, the bug was introducedin fc49e24fa
Following migration of Windows buildfarm members running TAP tests to
use of ucrt64 perl for those tests, special processing for msys perl is
no longer necessary and so is removed.
Backpatch to release 10
Discussion: https://postgr.es/m/c65a8781-77ac-ea95-d185-6db291e1baeb@dunslane.net
"pg_ctl stop/restart" checked that the postmaster PID is valid just
once, as a side-effect of sending the stop signal, and then would
wait-till-timeout for the postmaster.pid file to go away. This
neglects the case wherein the postmaster dies uncleanly after we
signal it. Similarly, once "pg_ctl promote" has sent the signal,
it'd wait for the corresponding on-disk state change to occur
even if the postmaster dies.
I'm not sure how we've managed not to notice this problem, but it
seems to explain slow execution of the 017_shm.pl test script on AIX
since commit 4fdbf9af5, which added a speculative "pg_ctl stop" with
the idea of making real sure that the postmaster isn't there. In the
test steps that kill-9 and then restart the postmaster, it's possible
to get past the initial signal attempt before kill() stops working
for the doomed postmaster. If that happens, pg_ctl waited till
PGCTLTIMEOUT before giving up ... and the buildfarm's AIX members
have that set very high.
To fix, include a "kill(pid, 0)" test (similar to what
postmaster_is_alive uses) in these wait loops, so that we'll
give up immediately if the postmaster PID disappears.
While here, I chose to refactor those loops out of where they were.
do_stop() and do_restart() can perfectly well share one copy of the
wait-for-stop loop, and it seems desirable to put a similar function
beside that for wait-for-promote.
Back-patch to all supported versions, since pg_ctl's wait logic
is substantially identical in all, and we're seeing the slow test
behavior in all branches.
Discussion: https://postgr.es/m/20220210023537.GA3222837@rfd.leadboat.com
Commit 6df7a9698 wrote appendPQExpBuffer where it should have
written printfPQExpBuffer. This resulted in re-issuing the
previous query along with the desired one, which very accidentally
had no negative consequences except for some wasted cycles.
Back-patch to v14 where that came in.
Discussion: https://postgr.es/m/1714711.1642962663@sss.pgh.pa.us
The original coding (from c33869cc3) failed with "more than one row
returned by a subquery used as an expression" if there were unrelated
triggers of the same tgname on parent partitioned tables. (That's
possible because statement-level triggers don't get inherited.) Fix
by applying LIMIT 1 after sorting the candidates by inheritance level.
Also, wrap the subquery in a CASE so that we don't have to execute it at
all when the trigger is visibly non-inherited. Aside from saving some
cycles, this avoids the need for a confusing and undocumented NULLIF().
While here, tweak the format of the emitted query to look a bit
nicer for "psql -E", and add some explanation of this subquery,
because it badly needs it.
Report and patch by Justin Pryzby (with some editing by me).
Back-patch to v13 where the faulty code came in.
Discussion: https://postgr.es/m/20211217154356.GJ17618@telsasoft.com
Since enum labels have to be single-quoted, this part of the
tab completion machinery got side-swiped by commit cd69ec66c.
A side-effect of that commit is that (at least with some versions
of Readline) the text string passed for completion will omit the
leading quote mark of the enum label literal. Libedit still acts
the same as before, though, so adapt COMPLETE_WITH_ENUM_VALUE so
that it can cope with either convention.
Also, when we fail to find any valid completion, set
rl_completion_suppress_quote = 1. Otherwise readline will
go ahead and append a closing quote, which is unwanted.
Per report from Peter Eisentraut. Back-patch to v13 where
cd69ec66c came in.
Discussion: https://postgr.es/m/8ca82d89-ec3d-8b28-8291-500efaf23b25@enterprisedb.com
This addresses some problems in the describe queries used for extended
statistics:
- Two schema qualifications for the text type were missing for \dX.
- The list of extended statistics listed for a table through \d was
ordered based on the object OIDs, but it is more consistent with the
other commands to order by namespace and then by object name.
- A couple of aliases were not used in \d. These are removed.
This is similar to commits 1f092a3 and 07f8a9e.
Author: Justin Pryzby
Discussion: https://postgr.es/m/20220107022235.GA14051@telsasoft.com
Backpatch-through: 14
The existing pg_upgrade/test.sh and the buildfarm code have been holding
the same set of SQL queries when doing cross-version upgrade tests to
adapt the objects created by the regression tests before the upgrade
(mostly, incompatible or non-existing objects need to be dropped from
the origin, perhaps re-created).
This moves all those SQL queries into a new, separate, file with a set
of \if clauses to handle the version checks depending on the old version
of the cluster to-be-upgraded.
The long-term plan is to make the buildfarm code re-use this new SQL
file, so as committers are able to fix any compatibility issues in the
tests of pg_upgrade with a refresh of the core code, without having to
poke at the buildfarm client. Note that this is only able to handle the
main regression test suite, and that nothing is done yet for contrib
modules yet (these have more issues like their database names).
A backpatch down to 10 is done, adapting the version checks as this
script needs to be only backward-compatible, so as it becomes possible
to clean up a maximum amount of code within the buildfarm client.
Author: Justin Pryzby, Michael Paquier
Discussion: https://postgr.es/m/20201206180248.GI24052@telsasoft.com
Backpatch-through: 10