The previous behavior of preferring the oldest match had the advantage
of not breaking existing scripts when we add a conflicting format name;
but that behavior was undocumented and fragile (it seems just luck that
commit add9182e5 didn't break it). Let's go over to the less mistake-
prone approach of complaining when there are multiple matches.
Since this is a small compatibility break, no back-patch.
Daniel Vérité
Discussion: https://postgr.es/m/cb7e1caf-3ea6-450d-af28-f524903a030c@manitou-mail.org
A table with OIDs that was the first in the dump output would not get
dumped with OIDs enabled. Fix that.
The reason was that the currWithOids flag was declared to be bool but
actually also takes a -1 value for "don't know yet". But under
stdbool.h semantics, that is coerced to true, so the required SET
default_with_oids command is not output again. Change the variable
type to char to fix that.
Reported-by: Derek Nelson <derek@pipelinedb.com>
Add another transfer mode --clone to pg_upgrade (besides the existing
--link and the default copy), using special file cloning calls. This
makes the file transfer faster and more space efficient, achieving
speed similar to --link mode without the associated drawbacks.
On Linux, file cloning is supported on Btrfs and XFS (if formatted with
reflink support). On macOS, file cloning is supported on APFS.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
The installcheck run takes a sizable fraction of test.sh to run. Using
a parallel schedule reduces that noticably.
It's possible that we want to backpatch this at some point, to reduce
buildfarm times, but for now lets just see if this upsets the
buildfarm.
Author: Andres Freund
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
This moves one check for conflicting options from the archive restore
code to the main function where other similar checks are performed.
Also reword the error message to be consistent with other messages.
The only option combination impacted is --create specified with
--single-transaction, and informing the caller at an early step saves
from opening the archive worked on. A TAP test is added for this
combination.
Author: Daniel Gustafsson
Reviewed-by: Fabien Coelho
Discussion: https://postgr.es/m/616808BD-4B59-4E6C-97A9-7317F62D5570@yesql.se
This adds tab completion of the clauses WHEN and EXECUTE
FUNCTION|PROCEDURE clauses to CREATE EVENT TRIGGER, similar to CREATE
TRIGGER in the previous commit. This has version-dependent logic so as
FUNCTION is chosen over PROCEDURE for 11 and newer versions.
Author: Dagfinn Ilmari Mannsåker
Reviewed-by: Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/d8jmur4q4yc.fsf@dalvik.ping.uio.no
The change to accept EXECUTE FUNCTION as well as EXECUTE PROCEDURE in
CREATE TRIGGER (added by 0a63f99) forgot to tell psql's tab completion
system about this. In passing, add tab completion of EXECUTE
FUNCTION/PROCEDURE after a complete WHEN ( … ) clause.
This change is version-aware, with FUNCTION being selected automatically
instead of PROCEDURE depending on the backend version, PROCEDURE being
an historical grammar kept for compatibility and considered as
deprecated in v11.
Author: Dagfinn Ilmari Mannsåker
Reviewed-by: Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/d8jmur4q4yc.fsf@dalvik.ping.uio.no
The pg_verify_checksums test tries to create files with corrupt data
named "123." and "123_." But on Windows a file name with a trailing dot
is the same as a file without the trailing dot. In the first case this
will create a file with a "valid" name, which causes the test to fail in
an unexpected way, and in the secongd case this will be redandant as the
test already creates a file named "123_".
Bug discovered by buildfarm animal bowerbird.
PQnotifies() is defined to just process already-read data, not try to read
any more from the socket. (This is a debatable decision, perhaps, but I'm
hesitant to change longstanding library behavior.) The documentation has
long recommended calling PQconsumeInput() before PQnotifies() to ensure
that any already-arrived message would get absorbed and processed.
However, psql did not get that memo, which explains why it's not very
reliable about reporting notifications promptly.
Also, most (not quite all) callers called PQconsumeInput() just once before
a PQnotifies() loop. Taking this recommendation seriously implies that we
should do PQconsumeInput() before each call. This is more important now
that we have "payload" strings in notification messages than it was before;
that increases the probability of having more than one packet's worth
of notify messages. Hence, adjust code as well as documentation examples
to do it like that.
Back-patch to 9.5 to match related server fixes. In principle we could
probably go back further with these changes, but given lack of field
complaints I doubt it's worthwhile.
Discussion: https://postgr.es/m/CAOYf6ec-TmRYjKBXLLaGaB-jrd=mjG1Hzn1a1wufUAR39PQYhw@mail.gmail.com
The original implementation of pg_verify_checksums used a blacklist to
decide which files should be skipped for scanning as they do not include
data checksums, like pg_internal.init or pg_control. However, this
missed two things:
- Some files are created within builds of EXEC_BACKEND and these were
not listed, causing failures on Windows.
- Extensions may create custom files in data folders, causing the tool
to equally fail.
This commit switches to a whitelist-like method instead by checking if
the files to scan are authorized relation files. This is close to a
reverse-engineering of what is defined in relpath.c in charge of
building the relation paths, and we could consider refactoring what this
patch does so as all routines are in a single place. This is left for
later.
This is based on a suggestion from Andres Freund. TAP tests are updated
so as multiple file patterns are tested. The bug has been spotted by
various buildfarm members as a result of b34e84f which has introduced
the TAP tests of pg_verify_checksums.
Author: Michael Paquier
Reviewed-by: Andrew Dunstan, Michael Banck
Discussion: https://postgr.es/m/20181012005614.GC26424@paquier.xyz
Backpatch-through: 11
When an error occurs during a benchmark run, exit with a nonzero exit
code and write a message at the end. Previously, it would just print
the error message when it happened but then proceed to print the run
summary normally and exit with status 0. To still allow
distinguishing setup from run-time errors, we use exit status 2 for
the new state, whereas existing errors during pgbench initialization
use exit status 1.
Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
All options available in the utility get coverage:
- Tests with disabled page checksums.
- Tests with enabled test checksums.
- Emulation of corruption and broken checksums with a full scan and
single relfilenode scan.
This patch has been contributed mainly by Michael Banck and Magnus
Hagander with things presented on various threads, and I have gathered
all the contents into a single patch.
Author: Michael Banck, Magnus Hagander, Michael Paquier
Reviewed-by: Peter Eisentraut
Discussion: https://postgr.es/m/20181005012645.GE1629@paquier.xyz
Historically we forbade datatype-specific comparison functions from
returning INT_MIN, so that it would be safe to invert the sort order
just by negating the comparison result. However, this was never
really safe for comparison functions that directly return the result
of memcmp(), strcmp(), etc, as POSIX doesn't place any such restriction
on those library functions. Buildfarm results show that at least on
recent Linux on s390x, memcmp() actually does return INT_MIN sometimes,
causing sort failures.
The agreed-on answer is to remove this restriction and fix relevant
call sites to not make such an assumption; code such as "res = -res"
should be replaced by "INVERT_COMPARE_RESULT(res)". The same is needed
in a few places that just directly negated the result of memcmp or
strcmp.
To help find places having this problem, I've also added a compile option
to nbtcompare.c that causes some of the commonly used comparators to
return INT_MIN/INT_MAX instead of their usual -1/+1. It'd likely be
a good idea to have at least one buildfarm member running with
"-DSTRESS_SORT_INT_MIN". That's far from a complete test of course,
but it should help to prevent fresh introductions of such bugs.
This is a longstanding portability hazard, so back-patch to all supported
branches.
Discussion: https://postgr.es/m/20180928185215.ffoq2xrq5d3pafna@alap3.anarazel.de
I started out with the idea that we needed to detect use of %m format specs
in contexts other than elog/ereport calls, because we couldn't rely on that
working in *printf calls. But a better answer is to fix things so that it
does work. Now that we're using snprintf.c all the time, we can implement
%m in that and we've fixed the problem.
This requires also adjusting our various printf-wrapping functions so that
they ensure "errno" is preserved when they call snprintf.c.
Remove elog.c's handmade implementation of %m, and let it rely on
snprintf to support the feature. That should provide some performance
gain, though I've not attempted to measure it.
There are a lot of places where we could now simplify 'printf("%s",
strerror(errno))' into 'printf("%m")', but I'm not in any big hurry
to make that happen.
Patch by me, reviewed by Michael Paquier
Discussion: https://postgr.es/m/2975.1526862605@sss.pgh.pa.us
It's fairly silly to truncate the throttle_delay to integer when the only
math we ever do with it requires converting back to double. Furthermore,
given that people are starting to complain about restrictions like only
supporting 1K client connections, I don't think we're very far away from
situations where the precision loss matters. As the code stood, for
example, there's no difference between --rate 100001 and --rate 111111;
both get converted to throttle_delay = 9. Somebody trying to run 100
threads and have each one dispatch around 1K TPS would find this lack of
precision rather surprising, especially since the required per-thread
delays are around 1ms, well within the timing precision of modern systems.
96e1cb4 has added support for --no-publications in pg_dump, pg_dumpall
and pg_restore, but forgot the fact that publication tables also need to
be ignored when this option is used.
Author: Gilles Darold
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/3f48e812-b0fa-388e-2043-9a176bdee27e@dalibo.com
Backpatch-through: 10, where publications have been added.
We haven't touched these since text search functionality landed in core
in 2007 :-(. While the upstream project isn't a beehive of activity,
they do make additions and bug fixes from time to time. Update our
copies of these files.
Also update our documentation about how to keep things in sync, since
they're not making distribution tarballs these days. Fortunately,
their source code turns out to be a breeze to build.
Notable changes:
* The non-UTF8 version of the hungarian stemmer now works in LATIN2
not LATIN1.
* New stemmers have appeared for arabic, indonesian, irish, lithuanian,
nepali, and tamil. These all work in UTF8, and the indonesian and
irish ones also work in LATIN1.
(There are some new stemmers that I did not incorporate, mainly because
their names don't match the underlying languages, suggesting that they're
not to be considered mainstream.)
Worth noting: the upstream Nepali dictionary was contributed by
Arthur Zakirov.
initdb forced because the contents of snowball_create.sql have
changed.
Still TODO: see about updating the stopword lists.
Arthur Zakirov, minor mods and doc work by me
Discussion: https://postgr.es/m/20180626122025.GA12647@zakirov.localdomain
Discussion: https://postgr.es/m/20180219140849.GA9050@zakirov.localdomain
Previously, pgbench always used select(2) for this purpose, but that's
problematic for very high client counts, because select() can't deal
with file descriptor numbers larger than FD_SETSIZE. It's pretty common
for that to be only 1024 or so, whereas modern OSes can allow many more
open files than that. Using poll(2) would surmount that problem, but it
creates another one: poll()'s timeout resolution is only 1ms, which is
poor enough to cause problems with --rate specifications approaching or
exceeding 1K TPS.
On platforms that have ppoll(2), which includes Linux and recent
FreeBSD, we can use that to avoid the FD_SETSIZE problem without any
loss of timeout resolution. Hence, add configure logic to test for
ppoll(), and use it if available.
This patch introduces an abstraction layer into pgbench that could
be extended to support other kernel event-wait APIs such as kevents.
But actually adding such support is a matter for some future patch.
Doug Rady, reviewed by Robert Haas and Fabien Coelho, and whacked around
a good bit more by me
Discussion: https://postgr.es/m/23D017C9-81B7-484D-8490-FD94DEC4DF59@amazon.com
This removes a difference between the standard IsUnderPostmaster
execution environment and that of --boot and --single. In a stand-alone
backend, "SELECT random()" always started at the same seed.
On a system capable of using posix shared memory, initdb could still
conclude "selecting dynamic shared memory implementation ... sysv".
Crashed --boot or --single postgres processes orphaned shared memory
objects having names that collided with the not-actually-random names
that initdb probed. The sysv fallback appeared after ten crashes of
--boot or --single postgres. Since --boot and --single are rare in
production use, systems used for PostgreSQL development are the
principal candidate to notice this symptom.
Back-patch to 9.3 (all supported versions). PostgreSQL 9.4 introduced
dynamic shared memory, but 9.3 does share the "SELECT random()" problem.
Reviewed by Tom Lane and Kyotaro HORIGUCHI.
Discussion: https://postgr.es/m/20180915221546.GA3159382@rfd.leadboat.com
This replaces the "TailMatchesN" macros with just "TailMatches",
and likewise "HeadMatchesN" becomes "HeadMatches" and "MatchesN"
becomes "Matches". The various COMPLETE_WITH_LISTn macros are
reduced to COMPLETE_WITH, and the single-item COMPLETE_WITH_CONST
also gets folded into that. This eliminates a lot of minor
annoyance in writing tab-completion rules. Usefully, the compiled
code also gets a bit smaller (10% or so, on my machine).
The implementation depends on variadic macros, so we couldn't have
done this before we required C99.
Andres Freund and Thomas Munro; some cosmetic cleanup by me.
Discussion: https://postgr.es/m/d8jo9djvm7h.fsf@dalvik.ping.uio.no
You can't use "FOR TABLE" as a single Matches argument, because readline
will consider that input to be two words not one. It's necessary to make
the pattern contain two arguments.
The case accidentally worked anyway because the words_after_create
test fired ... but only for the first such table name.
Noted by Edmund Horner, though this isn't exactly his proposed fix.
Backpatch to v10 where the faulty code came in.
Discussion: https://postgr.es/m/CAMyN-kDe=gBmHgxWwUUaXuwK+p+7g1vChR7foPHRDLE592nJPQ@mail.gmail.com
Previously, we made no attempt to provide tab completion in these
statements' optional parenthesized options lists. This patch teaches
psql to do so.
To prevent the option completions from being offered after we've already
seen a complete parenthesized option list, it's necessary to improve
word_matches() so that it allows a wildcard '*' in the middle of an
alternative, not only at the end as formerly. That requires only a
little more code than before, and it allows us to test for "incomplete
parenthesized options" with a test like
else if (HeadMatches2("EXPLAIN", "(*") &&
!HeadMatches2("EXPLAIN", "(*)"))
In addition, add some logic to offer column names in the context of
"ANALYZE tablename ( ...", and likewise for VACUUM. This isn't real
complete; it won't offer column names again after a comma. But it's
better than before, and it doesn't take much code.
Justin Pryzby, reviewed at various times by Álvaro Herrera, Arthur
Zakirov, and Edmund Horner; some additional fixups by me
Discussion: https://postgr.es/m/20180529000623.GA21896@telsasoft.com
The previous convention was to use names based on the set of relkinds being
selected for, which was not at all helpful for maintenance, especially
since people had been quite inconsistent about whether to change the names
when they changed the relkinds being selected for. Instead, use names
based on the functionality we need the relation to have, following the
model established by Query_for_list_of_updatables.
While at it, sort the list of Query constants a bit better; it had the
distinct air of code-assembled-by-dartboard before.
Discussion: https://postgr.es/m/14830.1537481254@sss.pgh.pa.us
This should offer the same relation types that SELECT ... FROM would.
You can't select from an index for instance, so offering it here is
unhelpful. Noted while testing ilmari's recent patch.
We have the infrastructure to offer a list of tablespace names, but
it wasn't being used here; instead you got "FROM", "CURRENT", and "TO"
which aren't actually legal in this syntax.
Dagfinn Ilmari Mannsåker, reviewed by Arthur Zakirov
Discussion: https://postgr.es/m/d8jo9djvm7h.fsf@dalvik.ping.uio.no
The pref_non_data heuristic has been dead code for nearly ten years,
and as far as I can tell was dead code even when it was first committed.
I'm tired of silencing Coverity complaints about it, so get rid of it.
If anyone is ever interested in pursuing the concept, they can get the
code out of our git history.
Our general practice in frontend code is to accept input with either
Unix-style newlines (\n) or DOS-style (\r\n). pgbench was mostly down
with that, but its rule for line continuations (backslash-newline) was
not. This had been masked on Windows buildfarm machines before commit
0ba06e0bf by use of Windows text mode to read files. We could have fixed
it by forcing text mode again, but it's better to fix the parsing code
so that Windows-style text files on Unix systems don't cause problems.
Back-patch to v10 where pgbench grew line continuations.
Discussion: https://postgr.es/m/17194.1537191697@sss.pgh.pa.us
Previously, the way this worked was that a parallel pg_dump would
re-order the TABLE_DATA items in the dump's TOC into decreasing size
order, and separately re-order (some of) the INDEX items into decreasing
size order. Then pg_dump would dump the items in that order. Later,
parallel pg_restore just followed the TOC order. This method had lots
of deficiencies:
* TOC ordering randomly differed between parallel and non-parallel
dumps, and was hard to predict in the former case, causing problems
for building stable pg_dump test cases.
* Parallel restore only followed a well-chosen order if the dump had
been done in parallel; in particular, this never happened for restore
from custom-format dumps.
* The best order for restore isn't necessarily the same as for dump,
and it's not really static either because of locking considerations.
* TABLE_DATA and INDEX items aren't the only things that might take a lot
of work during restore. Scheduling was particularly stupid for the BLOBS
item, which might require lots of work during dump as well as restore,
but was left to the end in either case.
This patch removes the logic that changed the TOC order, fixing the
test instability problem. Instead, we sort the parallelizable items
just before processing them during a parallel dump. Independently
of that, parallel restore prioritizes the ready-to-execute tasks
based on the size of the underlying table. In the case of dependent
tasks such as index, constraint, or foreign key creation, the largest
relevant table is used as the metric for estimating the task length.
(This is pretty crude, but it should be enough to avoid the case we
want to avoid, which is ending the run with just a few large tasks
such that we can't make use of all N workers.)
Patch by me, responding to a complaint from Peter Eisentraut,
who also reviewed the patch.
Discussion: https://postgr.es/m/5137fe12-d0a2-4971-61b6-eb4e7e8875f8@2ndquadrant.com
PostgreSQL uses a custom wrapper for open() and fopen() which is
concurrent-safe, allowing multiple processes to open and work on the
same file. This has a couple of advantages:
- pg_test_fsync does not handle O_DSYNC correctly otherwise, leading to
false claims that disks are unsafe.
- TAP tests can run into race conditions when a postmaster and pg_ctl
open postmaster.pid, fixing some random failures in the buildfam.
pg_upgrade is one frontend tool using workarounds to bypass file locking
issues with the log files it generates, however the interactions with
pg_ctl are proving to be tedious to get rid of, so this is left for
later.
Author: Laurenz Albe
Reviewed-by: Michael Paquier, Kuntal Ghosh
Discussion: https://postgr.es/m/1527846213.2475.31.camel@cybertec.at
Discussion: https://postgr.es/m/16922.1520722108@sss.pgh.pa.us
Fix one untranslatable string concatenation in pg_rewind.
Fix one message in pg_verify_checksums to use a style use elsewhere
and avoid plural issues.
Fix one gratuitous abbreviation in psql.
On many modern platforms, /etc/localtime is a symlink to a file within the
IANA database. Reading the symlink lets us find out the name of the system
timezone directly, without going through the brute-force search embodied in
scan_available_timezones(). This shortens the runtime of initdb by some
tens of ms, which is helpful for the buildfarm, and it also allows us to
reliably select the same zone name the system was actually configured for,
rather than possibly choosing one of IANA's many zone aliases. (For
example, in a system configured for "Asia/Tokyo", the brute-force search
would not choose that name but its alias "Japan", on the grounds of the
latter string being shorter. More surprisingly, "Navajo" is preferred
to either "America/Denver" or "US/Mountain", as seen in an old complaint
from Josh Berkus.)
If /etc/localtime doesn't exist, or isn't a symlink, or we can't make
sense of its contents, or the contents match a zone we know but that
zone doesn't match the observed behavior of localtime(), fall back to
the brute-force search.
Also, tweak initdb so that it prints the zone name it selected.
In passing, replace the last few references to the "Olson" database in
code comments with "IANA", as that's been our preferred term since
commit b2cbced9e.
Patch by me, per a suggestion from Robert Haas; review by Michael Paquier
Discussion: https://postgr.es/m/7408.1525812528@sss.pgh.pa.us
* Include partitioned tables in what's offered after ANALYZE.
* Include toast_tuple_target in what's offered after ALTER TABLE ... SET|RESET.
* Include HASH in what's offered after PARTITION BY.
This is extracted from a larger patch; these bits seem like
uncontroversial bug fixes for v11 features, so back-patch them into v11.
Justin Pryzby
Discussion: https://postgr.es/m/20180529000623.GA21896@telsasoft.com
There's a project policy against using plain "char buf[BLCKSZ]" local
or static variables as page buffers; preferred style is to palloc or
malloc each buffer to ensure it is MAXALIGN'd. However, that policy's
been ignored in an increasing number of places. We've apparently got
away with it so far, probably because (a) relatively few people use
platforms on which misalignment causes core dumps and/or (b) the
variables chance to be sufficiently aligned anyway. But this is not
something to rely on. Moreover, even if we don't get a core dump,
we might be paying a lot of cycles for misaligned accesses.
To fix, invent new union types PGAlignedBlock and PGAlignedXLogBlock
that the compiler must allocate with sufficient alignment, and use
those in place of plain char arrays.
I used these types even for variables where there's no risk of a
misaligned access, since ensuring proper alignment should make
kernel data transfers faster. I also changed some places where
we had been palloc'ing short-lived buffers, for coding style
uniformity and to save palloc/pfree overhead.
Since this seems to be a live portability hazard (despite the lack
of field reports), back-patch to all supported versions.
Patch by me; thanks to Michael Paquier for review.
Discussion: https://postgr.es/m/1535618100.1286.3.camel@credativ.de
Currently there are two ways to trigger log rotation in logging collector
process: call pg_rotate_logfile() SQL-function or send SIGUSR1 signal directly
to logging collector process. However, it's nice to have more suitable way
for external tools to do that, which wouldn't require SQL connection or
knowledge of logging collector pid. This commit implements triggering log
rotation by "pg_ctl logrotate" command.
Discussion: https://postgr.es/m/20180416.115435.28153375.horiguchi.kyotaro%40lab.ntt.co.jp
Author: Kyotaro Horiguchi, Alexander Kuzmenkov, Alexander Korotkov
A cast declared WITH INOUT was described as '(binary coercible)',
which seems pretty inaccurate; let's print '(with inout)' instead.
Per complaint from Jean-Pierre Pelletier.
This definitely seems like a bug fix, but given that it's been wrong
since 8.4 and nobody complained before, I'm hesitant to back-patch a
behavior change into stable branches. It doesn't seem too late for
v11 though.
Discussion: https://postgr.es/m/5b887023.1c69fb81.ff96e.6a1d@mx.google.com
Use postgres_fe.h, since this is frontend code. Pretend that we've heard
of project style guidelines for, eg, #include order. Use BlockNumber not
int arithmetic for block numbers, to avoid misbehavior with relations
exceeding 2^31 blocks. Avoid an unnecessary strict-aliasing warning
(per report from Michael Banck). Const-ify assorted stuff. Avoid
scribbling on the output of readdir() -- perhaps that's safe in practice,
but POSIX forbids it, and this code has so far earned exactly zero
credibility portability-wise. Editorialize on an ambiguously-worded
message.
I did not touch the problem of the "buf" local variable being possibly
insufficiently aligned; that's not specific to this code, and seems like
it should be fixed as part of a different, larger patch.
Discussion: https://postgr.es/m/1535618100.1286.3.camel@credativ.de
To verify the checksums, we open the file in text mode which doesn't work
on Windows as WIN32 treats Control-Z as EOF in files opened in text mode.
This leads to "short read of block .." error in some cases.
Fix it by opening the files in the binary mode.
Author: Amit Kapila
Reviewed-by: Magnus Hagander
Backpatch-through: 11
Discussion: https://postgr.es/m/CAA4eK1+LOnzod+h85FGmyjWzXKy-XV1FYwEyP-Tky2WpD5cxwA@mail.gmail.com
Instead of repeating the almost same large query in each version branch,
use one query and add a few columns to the SELECT list depending on the
version. This saves a lot of duplication.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>