On Windows, CMD.EXE allegedly does not run a command that uses forward slashes,
so let's convert the path to use backslashes instead.
Backpatch to 10.
Author: Nitin Jadhav <nitinjadhavpostgres@gmail.com>
Reviewed-by: Juan José Santamaría Flecha <juanjo.santamaria@gmail.com>
Discussion: https://postgr.es/m/CAMm1aWaNDuaPYFYMAqDeJrZmPtNvLcJRS++CcZWY8LT6KcoBZw@mail.gmail.com
For those system catalogs that have a unique indexes, make a primary
key and unique constraint, using ALTER TABLE ... PRIMARY KEY/UNIQUE
USING INDEX.
This can be helpful for GUI tools that look for a primary key, and it
might in the future allow declaring foreign keys, for making schema
diagrams.
The constraint creation statements are automatically created by
genbki.pl from DECLARE_UNIQUE_INDEX directives. To specify which one
of the available unique indexes is the primary key, use the new
directive DECLARE_UNIQUE_INDEX_PKEY instead. By convention, we
usually make a catalog's OID column its primary key, if it has one.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/dc5f44d9-5ec1-a596-0251-dadadcdede98@2ndquadrant.com
Specifying this parameter removes the informational messages about how
to start the server. This is intended for use by wrappers in different
packaging systems, where those instructions would most likely be wrong
anyway, but the other output from initdb would still be useful (and thus
just redirecting everything to /dev/null would be bad).
Author: Magnus Hagander
Reviewed-By: Peter Eisentraut
Discusion: https://postgr.es/m/CABUevEzo4t5bmTXF0_B9WzmuWpVbMpkNZZiGvzV8NZa-=fPqeQ@mail.gmail.com
Since at least 2001 we've used putenv() and avoided setenv(), on the
grounds that the latter was unportable and not in POSIX. However,
POSIX added it that same year, and by now the situation has reversed:
setenv() is probably more portable than putenv(), since POSIX now
treats the latter as not being a core function. And setenv() has
cleaner semantics too. So, let's reverse that old policy.
This commit adds a simple src/port/ implementation of setenv() for
any stragglers (we have one in the buildfarm, but I'd not be surprised
if that code is never used in the field). More importantly, extend
win32env.c to also support setenv(). Then, replace usages of putenv()
with setenv(), and get rid of some ad-hoc implementations of setenv()
wannabees.
Also, adjust our src/port/ implementation of unsetenv() to follow the
POSIX spec that it returns an error indicator, rather than returning
void as per the ancient BSD convention. I don't feel a need to make
all the call sites check for errors, but the portability stub ought
to match real-world practice.
Discussion: https://postgr.es/m/2065122.1609212051@sss.pgh.pa.us
The patch needs test cases, reorganization, and cfbot testing.
Technically reverts commits 5c31afc49d..e35b2bad1a (exclusive/inclusive)
and 08db7c63f3..ccbe34139b.
Reported-by: Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/E1ktAAG-0002V2-VB@gemulon.postgresql.org
This adds a key management system that stores (currently) two data
encryption keys of length 128, 192, or 256 bits. The data keys are
AES256 encrypted using a key encryption key, and validated via GCM
cipher mode. A command to obtain the key encryption key must be
specified at initdb time, and will be run at every database server
start. New parameters allow a file descriptor open to the terminal to
be passed. pg_upgrade support has also been added.
Discussion: https://postgr.es/m/CA+fd4k7q5o6Nc_AaX6BcYM9yqTbC6_pnH-6nSD=54Zp6NBQTCQ@mail.gmail.com
Discussion: https://postgr.es/m/20201202213814.GG20285@momjian.us
Author: Masahiko Sawada, me, Stephen Frost
The BKI file's string quoting conventions were previously quite weird,
perhaps as a result of repurposing a function built to scan
single-quoted strings to scan double-quoted ones. Change to use the
same rules as we use in GUC files, allowing some simplifications in
genbki.pl and initdb.c.
While at it, completely remove the backend's scanstr() function, which
was essentially a duplicate of the string dequoting code in guc-file.l.
Instead export that one (under a less generic name than it had) and let
bootscanner.l use it. Now we can clarify that scansup.c exists only to
support the main lexer. We could alternatively have removed GUC_scanstr,
but this way seems better since the previous arrangement could mislead
a reader into thinking that scanstr() had something to do with the main
lexer's handling of string literals. Maybe it did once, but if so it
was a long time ago.
This patch does not bump catversion, since the initially-installed
catalog contents don't change. Note however that successful initdb
after applying this patch will require up-to-date postgres.bki as well
as postgres and initdb executables.
In passing, remove a bunch of very-long-obsolete #include's in
bootparse.y and bootscanner.l.
John Naylor
Discussion: https://postgr.es/m/CACPNZCtDpd18T0KATTmCggO2GdVC4ow86ypiq5ENff1VnauL8g@mail.gmail.com
Further experience says that the appending behavior offered by
pg_get_line_append is useful to only a very small minority of callers.
For most, the requirement to reset the buffer after each line is just
an error-prone nuisance. Hence, invent another alternative call
pg_get_line_buf, which takes care of that detail.
Noted while reviewing a patch from Daniel Gustafsson.
Discussion: https://postgr.es/m/48A4FA71-524E-41B9-953A-FD04EF36E2E7@yesql.se
Letting the caller provide a StringInfo to read into is helpful when
the caller needs to merge lines or otherwise modify the data after
it's been read. Notably, now the code added by commit 8f8154a50
can use pg_get_line_append() instead of having its own copy of that
logic. A follow-on commit will also make use of this.
Also, since StringInfo buffers are a minimum of 1KB long, blindly
using pg_get_line() in a loop can eat a lot more memory than one would
expect. I discovered for instance that commit e0f05cd5b caused initdb
to consume circa 10MB to read postgres.bki, even though that's under
1MB worth of data. A less memory-hungry alternative is to re-use the
same StringInfo for all lines and pg_strdup the results.
Discussion: https://postgr.es/m/1315832.1599345736@sss.pgh.pa.us
At some point back in the last century, somebody felt that reading
all of pg_type twice was cheaper, or at least easier, than using
repalloc() to resize the Typ[] array dynamically. That seems like an
entirely wacko proposition, so rewrite the code to do it the other
way. (To add insult to injury, there were two not-quite-identical
copies of said code.)
initdb.c's readfile() function had the same disease of preferring
to do double the I/O to avoid resizing its output array. Here,
we can make things easier by using the just-invented pg_get_line()
function to handle reading individual lines without a predetermined
notion of how long they are.
On my machine, it's difficult to detect any net change in the
overall runtime of initdb from these changes; but they should
help on slower buildfarm machines (especially since a buildfarm
cycle involves a lot of initdb's these days).
My attention was drawn to these places by scan-build complaints,
but on inspection they needed a lot more work than just suppressing
dead stores :-(
This patch started out with the goal of harmonizing various arbitrary
limits on password length, but after awhile a better idea emerged:
let's just get rid of those fixed limits.
recv_password_packet() has an arbitrary limit on the packet size,
which we don't really need, so just drop it. (Note that this doesn't
really affect anything for MD5 or SCRAM password verification, since
those will hash the user's password to something shorter anyway.
It does matter for auth methods that require a cleartext password.)
Likewise remove the arbitrary error condition in pg_saslprep().
The remaining limits are mostly in client-side code that prompts
for passwords. To improve those, refactor simple_prompt() so that
it allocates its own result buffer that can be made as big as
necessary. Actually, it proves best to make a separate routine
pg_get_line() that has essentially the semantics of fgets(), except
that it allocates a suitable result buffer and hence will never
return a truncated line. (pg_get_line has a lot of potential
applications to replace randomly-sized fgets buffers elsewhere,
but I'll leave that for another patch.)
I built pg_get_line() atop stringinfo.c, which requires moving
that code to src/common/; but that seems fine since it was a poor
fit for src/port/ anyway.
This patch is mostly mine, but it owes a good deal to Nathan Bossart
who pressed for a solution to the password length problem and
created a predecessor patch. Also thanks to Peter Eisentraut and
Stephen Frost for ideas and discussion.
Discussion: https://postgr.es/m/09512C4F-8CB9-4021-B455-EF4C4F0D55A0@amazon.com
Includes some manual cleanup of places that pgindent messed up,
most of which weren't per project style anyway.
Notably, it seems some people didn't absorb the style rules of
commit c9d297751, because there were a bunch of new occurrences
of function calls with a newline just after the left paren, all
with faulty expectations about how the rest of the call would get
indented.
There were a few different ways to line-wrap the error messages. Make
them all the same, and use placeholders for the actual program names,
to save translation work.
This fixes and updates a couple of comments related to outdated Windows
versions. Particularly, src/common/exec.c had a fallback implementation
to read a file's line from a pipe because stdin/stdout/stderr does not
exist in Windows 2000 that is removed to simplify src/common/ as there
are unlikely versions of Postgres running on such platforms.
Author: Michael Paquier
Reviewed-by: Kyotaro Horiguchi, Juan José Santamaría Flecha
Discussion: https://postgr.es/m/20191219021526.GC4202@paquier.xyz
This data was only in separate files because it was the most convenient
way to handle it with a shell script. Now that we use a general-purpose
programming language, it's easy to assemble the data into the same format
as the rest of the catalogs and output it into postgres.bki. This allows
removal of some special-purpose code from initdb.c.
Discussion: https://www.postgresql.org/message-id/CACPNZCtVFtjHre6hg9dput0qRPp39pzuyA2A6BT8wdgrRy%2BQdA%40mail.gmail.com
Author: John Naylor
I left PG_CMD_PUTS around even though it could be handled by
PG_CMD_PRINTF since PG_CMD_PUTS is sometimes called with non-literal
arguments, and so that would create a potential problem if such a
string contained percent signs.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Change the defaults for the pg_hba.conf generated by initdb to "peer"
for local (if supported, else "md5") and "md5" for host.
(Changing from "md5" to SCRAM is left as a separate exercise.)
"peer" is currently not supported on AIX, HP-UX, and Windows. Users
on those operating systems will now either have to provide a password
to initdb or choose a different authentication method when running
initdb.
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/bec17f0a-ddb1-8b95-5e69-368d9d0a3390%40postgresql.org
The main change is a new stemmer for Greek. There are minor changes
in the Danish and French stemmers.
Author: Panagiotis Mavrogiorgos <pmav99@gmail.com>
This is still using the 2.0 version of pg_bsd_indent.
I thought it would be good to commit this separately,
so as to document the differences between 2.0 and 2.1 behavior.
Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
The original placement of this module in src/fe_utils/ is ill-considered,
because several src/common/ modules have dependencies on it, meaning that
libpgcommon and libpgfeutils now have mutual dependencies. That makes it
pointless to have distinct libraries at all. The intended design is that
libpgcommon is lower-level than libpgfeutils, so only dependencies from
the latter to the former are acceptable.
We already have the precedent that fe_memutils and a couple of other
modules in src/common/ are frontend-only, so it's not stretching anything
out of whack to treat logging.c as a frontend-only module in src/common/.
To the extent that such modules help provide a common frontend/backend
environment for the rest of common/ to use, it's a reasonable design.
(logging.c does not yet provide an ereport() emulation, but one can
dream.)
Hence, move these files over, and revert basically all of the build-system
changes made by commit cc8d41511. There are no places that need to grow
new dependencies on libpgcommon, further reinforcing the idea that this
is the right solution.
Discussion: https://postgr.es/m/a912ffff-f6e4-778a-c86a-cf5c47a12933@2ndquadrant.com
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
This adds a flag "deterministic" to collations. If that is false,
such a collation disables various optimizations that assume that
strings are equal only if they are byte-wise equal. That then allows
use cases such as case-insensitive or accent-insensitive comparisons
or handling of strings with different Unicode normal forms.
This functionality is only supported with the ICU provider. At least
glibc doesn't appear to have any locales that work in a
nondeterministic way, so it's not worth supporting this for the libc
provider.
The term "deterministic comparison" in this context is from Unicode
Technical Standard #10
(https://unicode.org/reports/tr10/#Deterministic_Comparison).
This patch makes changes in three areas:
- CREATE COLLATION DDL changes and system catalog changes to support
this new flag.
- Many executor nodes and auxiliary code are extended to track
collations. Previously, this code would just throw away collation
information, because the eventually-called user-defined functions
didn't use it since they only cared about equality, which didn't
need collation information.
- String data type functions that do equality comparisons and hashing
are changed to take the (non-)deterministic flag into account. For
comparison, this just means skipping various shortcuts and tie
breakers that use byte-wise comparison. For hashing, we first need
to convert the input string to a canonical "sort key" using the ICU
analogue of strxfrm().
Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/flat/1ccc668f-4cbc-0bef-af67-450b47cdfee7@2ndquadrant.com
Commit c0d0e54084 replaced the ones in the documentation, but missed out
on the ones in the code. Replace those as well, but unlike c0d0e54084,
don't backpatch the code changes to avoid breaking translations.
Instead of running a SQL script to create the standard conversion
functions and pg_conversion entries, put those entries into the
initial data in postgres.bki.
This shaves a few percent off the runtime of initdb, and also allows
accurate comments to be attached to the conversion functions; the
previous script labeled them with machine-generated comments that
were not quite right for multi-purpose conversion functions.
Also, we can get rid of the duplicative Makefile and MSVC perl
implementations of the generation code for that SQL script.
A functional change is that these pg_proc and pg_conversion entries
are now "pinned" by initdb. Leaving them unpinned was perhaps a
good thing back while the conversions feature was under development,
but there seems no valid reason for it now.
Also, the conversion functions are now marked as immutable, where
before they were volatile by virtue of lacking any explicit
specification. That seems like it was just an oversight.
To avoid using magic constants in pg_conversion.dat, extend
genbki.pl to allow encoding names to be converted, much as it
does for language, access method, etc names.
John Naylor
Discussion: https://postgr.es/m/CAJVSVGWtUqxpfAaxS88vEGvi+jKzWZb2EStu5io-UPc4p9rSJg@mail.gmail.com
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
We haven't touched these since text search functionality landed in core
in 2007 :-(. While the upstream project isn't a beehive of activity,
they do make additions and bug fixes from time to time. Update our
copies of these files.
Also update our documentation about how to keep things in sync, since
they're not making distribution tarballs these days. Fortunately,
their source code turns out to be a breeze to build.
Notable changes:
* The non-UTF8 version of the hungarian stemmer now works in LATIN2
not LATIN1.
* New stemmers have appeared for arabic, indonesian, irish, lithuanian,
nepali, and tamil. These all work in UTF8, and the indonesian and
irish ones also work in LATIN1.
(There are some new stemmers that I did not incorporate, mainly because
their names don't match the underlying languages, suggesting that they're
not to be considered mainstream.)
Worth noting: the upstream Nepali dictionary was contributed by
Arthur Zakirov.
initdb forced because the contents of snowball_create.sql have
changed.
Still TODO: see about updating the stopword lists.
Arthur Zakirov, minor mods and doc work by me
Discussion: https://postgr.es/m/20180626122025.GA12647@zakirov.localdomain
Discussion: https://postgr.es/m/20180219140849.GA9050@zakirov.localdomain
This removes a difference between the standard IsUnderPostmaster
execution environment and that of --boot and --single. In a stand-alone
backend, "SELECT random()" always started at the same seed.
On a system capable of using posix shared memory, initdb could still
conclude "selecting dynamic shared memory implementation ... sysv".
Crashed --boot or --single postgres processes orphaned shared memory
objects having names that collided with the not-actually-random names
that initdb probed. The sysv fallback appeared after ten crashes of
--boot or --single postgres. Since --boot and --single are rare in
production use, systems used for PostgreSQL development are the
principal candidate to notice this symptom.
Back-patch to 9.3 (all supported versions). PostgreSQL 9.4 introduced
dynamic shared memory, but 9.3 does share the "SELECT random()" problem.
Reviewed by Tom Lane and Kyotaro HORIGUCHI.
Discussion: https://postgr.es/m/20180915221546.GA3159382@rfd.leadboat.com
PostgreSQL uses a custom wrapper for open() and fopen() which is
concurrent-safe, allowing multiple processes to open and work on the
same file. This has a couple of advantages:
- pg_test_fsync does not handle O_DSYNC correctly otherwise, leading to
false claims that disks are unsafe.
- TAP tests can run into race conditions when a postmaster and pg_ctl
open postmaster.pid, fixing some random failures in the buildfam.
pg_upgrade is one frontend tool using workarounds to bypass file locking
issues with the log files it generates, however the interactions with
pg_ctl are proving to be tedious to get rid of, so this is left for
later.
Author: Laurenz Albe
Reviewed-by: Michael Paquier, Kuntal Ghosh
Discussion: https://postgr.es/m/1527846213.2475.31.camel@cybertec.at
Discussion: https://postgr.es/m/16922.1520722108@sss.pgh.pa.us
On many modern platforms, /etc/localtime is a symlink to a file within the
IANA database. Reading the symlink lets us find out the name of the system
timezone directly, without going through the brute-force search embodied in
scan_available_timezones(). This shortens the runtime of initdb by some
tens of ms, which is helpful for the buildfarm, and it also allows us to
reliably select the same zone name the system was actually configured for,
rather than possibly choosing one of IANA's many zone aliases. (For
example, in a system configured for "Asia/Tokyo", the brute-force search
would not choose that name but its alias "Japan", on the grounds of the
latter string being shorter. More surprisingly, "Navajo" is preferred
to either "America/Denver" or "US/Mountain", as seen in an old complaint
from Josh Berkus.)
If /etc/localtime doesn't exist, or isn't a symlink, or we can't make
sense of its contents, or the contents match a zone we know but that
zone doesn't match the observed behavior of localtime(), fall back to
the brute-force search.
Also, tweak initdb so that it prints the zone name it selected.
In passing, replace the last few references to the "Olson" database in
code comments with "IANA", as that's been our preferred term since
commit b2cbced9e.
Patch by me, per a suggestion from Robert Haas; review by Michael Paquier
Discussion: https://postgr.es/m/7408.1525812528@sss.pgh.pa.us
Commit bcbd94080 didn't bother to fix up a comment it had falsified.
Also, const-ify choose_dsm_implementation(), since what it's returning
is always a constant string; pickier compilers would warn about this.
PostgreSQL nowadays offers some kind of dynamic shared memory feature on
all supported platforms. Having the choice of "none" prevents us from
relying on DSM in core features. So this patch removes the choice of
"none".
Author: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
While poking into initdb's performance, I noticed that this query
wasn't being done very intelligently. By forcing it to execute
obj_description() for each pg_proc/pg_operator join row, we were
essentially setting up a nestloop join to pg_description, which
is not a bright query plan when there are hundreds of outer rows.
Convert the check for a "deprecated" operator into a NOT EXISTS
so that it can be done as a hashed antijoin. On my workstation
this reduces the time for this query from ~ 35ms to ~ 10ms.
Which is not a huge win, but it adds up over buildfarm runs.
In passing, insert forced query breaks (\n\n, in single-user mode)
after each SQL-query file that initdb sources, and after some
relatively new queries in setup_privileges(). This doesn't make
a lot of difference normally, but it will result in briefer, saner
error messages if anything goes wrong.
Change things around so that proper quoting of values interpolated into
the BKI data by initdb is the responsibility of initdb, not something
we half-heartedly handle by putting double quotes into the raw BKI data.
(Note: experimentation shows that it still doesn't work to put a double
quote into the initial superuser username, but that's the fault of
inadequate quoting while interpolating the name into SQL scripts;
the BKI aspect of it works fine now.)
Having done that, we can remove the special-case handling of values
that look like "something" from genbki.pl, and instead teach it to
escape double --- and single --- quotes properly. This removes the
nowhere-documented need to treat those specially in the BKI source
data; whatever you write will be passed through unchanged into the
inserted data value, modulo Perl's rules about single-quoted strings.
Add documentation explaining the (pre-existing) handling of backslashes
in the BKI data.
Per an earlier discussion with John Naylor.
Discussion: https://postgr.es/m/CAJVSVGUNao=-Q2-vAN3PYcdF5tnL5JAHwGwzZGuYHtq+Mk_9ng@mail.gmail.com
In commit 9c0a0de4c, I'd failed to notice that catalog/catalog.h
should also be considered a frontend-unsafe header, because it includes
(and needs) the full form of pg_class.h, not to mention relcache.h.
However, various frontend code was depending on it to get
TABLESPACE_VERSION_DIRECTORY, so refactoring of some sort is called for.
The cleanest answer seems to be to move TABLESPACE_VERSION_DIRECTORY,
as well as the OIDCHARS symbol, to common/relpath.h. Do that, and mop up
inclusions as necessary. (I found that quite a few current users of
catalog/catalog.h don't seem to need it at all anymore, apparently as a
result of the refactorings that created common/relpath.[hc]. And
initdb.c needed it only as a route to pg_class_d.h.)
Discussion: https://postgr.es/m/6629.1523294509@sss.pgh.pa.us
Everything of use to frontend code should now appear in the _d.h files,
and making this change frees us from needing to worry about whether the
catalog header files proper are frontend-safe.
Remove src/interfaces/ecpg/ecpglib/pg_type.h entirely, as the previous
commit reduced it to a confusingly-named wrapper around pg_type_d.h.
In passing, make test_rls_hooks.c follow project convention of including
our own files with #include "" not <>.
Discussion: https://postgr.es/m/23690.1523031777@sss.pgh.pa.us
Allow the cluster to be optionally init'd with read access for the
group.
This means a relatively non-privileged user can perform a backup of the
cluster without requiring write privileges, which enhances security.
The mode of PGDATA is used to determine whether group permissions are
enabled for directory and file creates. This method was chosen as it's
simple and works well for the various utilities that write into PGDATA.
Changing the mode of PGDATA manually will not automatically change the
mode of all the files contained therein. If the user would like to
enable group access on an existing cluster then changing the mode of all
the existing files will be required. Note that pg_upgrade will
automatically change the mode of all migrated files if the new cluster
is init'd with the -g option.
Tests are included for the backend and all the utilities which operate
on the PG data directory to ensure that the correct mode is set based on
the data directory permissions.
Author: David Steele <david@pgmasters.net>
Reviewed-By: Michael Paquier, with discussion amongst many others.
Discussion: https://postgr.es/m/ad346fe6-b23e-59f1-ecb7-0e08390ad629%40pgmasters.net
Consolidate directory and file create permissions for tools which work
with the PG data directory by adding a new module (common/file_perm.c)
that contains variables (pg_file_create_mode, pg_dir_create_mode) and
constants to initialize them (0600 for files and 0700 for directories).
Convert mkdir() calls in the backend to MakePGDirectory() if the
original call used default permissions (always the case for regular PG
directories).
Add tests to make sure permissions in PGDATA are set correctly by the
tools which modify the PG data directory.
Authors: David Steele <david@pgmasters.net>,
Adam Brightwell <adam.brightwell@crunchydata.com>
Reviewed-By: Michael Paquier, with discussion amongst many others.
Discussion: https://postgr.es/m/ad346fe6-b23e-59f1-ecb7-0e08390ad629%40pgmasters.net
Fix the warnings created by the compiler warning options
-Wformat-overflow=2 -Wformat-truncation=2, supported since GCC 7. This
is a more aggressive variant of the fixes in
6275f5d28a, which GCC 7 warned about by
default.
The issues are all harmless, but some dubious coding patterns are
cleaned up.
One issue that is of external interest is that BGW_MAXLEN is increased
from 64 to 96. Apparently, the old value would cause the bgw_name of
logical replication workers to be truncated in some circumstances.
But this doesn't actually add those warning options. It appears that
the warnings depend a bit on compilation and optimization options, so it
would be annoying to have to keep up with that. This is more of a
once-in-a-while cleanup.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
The server won't actually start with that setting anymore, not since
we raised the default max_wal_senders to 10. Per discussion, we don't
wish to back down on that default, so instead raise the effective floor
for max_connections (to 20). It's still possible to configure a smaller
setting manually, but initdb won't set it that way.
Since that change happened in v10, back-patch to v10.
Kyotaro Horiguchi
Discussion: https://postgr.es/m/20180209.170823.42008365.horiguchi.kyotaro@lab.ntt.co.jp
Doing this suppresses Coverity warnings and might allow improved
code in some cases. The prospects of that are not so bright as
to warrant back-patching, though.
Michael Paquier, per Coverity
For performance reasons a larger segment size than the default 16MB
can be useful. A larger segment size has two main benefits: Firstly,
in setups using archiving, it makes it easier to write scripts that
can keep up with higher amounts of WAL, secondly, the WAL has to be
written and synced to disk less frequently.
But at the same time large segment size are disadvantageous for
smaller databases. So far the segment size had to be configured at
compile time, often making it unrealistic to choose one fitting to a
particularly load. Therefore change it to a initdb time setting.
This includes a breaking changes to the xlogreader.h API, which now
requires the current segment size to be configured. For that and
similar reasons a number of binaries had to be taught how to recognize
the current segment size.
Author: Beena Emerson, editorialized by Andres Freund
Reviewed-By: Andres Freund, David Steele, Kuntal Ghosh, Michael
Paquier, Peter Eisentraut, Robert Hass, Tushar Ahuja
Discussion: https://postgr.es/m/CAOG9ApEAcQ--1ieKbhFzXSQPw_YLmepaa4hNdnY5+ZULpt81Mw@mail.gmail.com
The encoding ID was converted between string and number too many times,
probably a remnant from the shell script days.
Reviewed-by: Aleksandr Parfenov <a.parfenov@postgrespro.ru>
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
It's essential that initdb.c's setup_depend() scan each system catalog
that could contain objects that need to have "p" (pin) entries in pg_depend
or pg_shdepend. Forgetting to add that, either when a catalog is first
invented or when it first acquires DATA() entries, is an obvious bug
hazard. We can detect such omissions at reasonable cost by probing every
OID-containing system catalog to see whether the lowest-numbered OID in it
is pinned. If so, the catalog must have been properly accounted for in
setup_depend(). If the lowest OID is above FirstNormalObjectId then the
catalog must have been empty at the end of initdb, so it doesn't matter.
There are a small number of catalogs whose first entry is made later in
initdb than setup_depend(), resulting in nonempty expected output of the
test, but these can be manually inspected to see that they are OK. Any
future mistake of this ilk will manifest as a new entry in the test's
output.
Since pg_conversion is already in the test's output, add it to the set of
catalogs scanned by setup_depend(). That has no effect today (hence, no
catversion bump here) but it will protect us if we ever do add pin-worthy
conversions.
This test is very much like the catalog sanity checks embodied in
opr_sanity.sql and type_sanity.sql, but testing pg_depend doesn't seem to
fit naturally into either of those scripts' charters. Hence, invent a new
test script misc_sanity.sql, which can be a home for this as well as tests
on any other catalogs we might want in future.
Discussion: https://postgr.es/m/8068.1498155068@sss.pgh.pa.us
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4d wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
The new indent version includes numerous fixes thanks to Piotr Stefaniak.
The main changes visible in this commit are:
* Nicer formatting of function-pointer declarations.
* No longer unexpectedly removes spaces in expressions using casts,
sizeof, or offsetof.
* No longer wants to add a space in "struct structname *varname", as
well as some similar cases for const- or volatile-qualified pointers.
* Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely.
* Fixes bug where comments following declarations were sometimes placed
with no space separating them from the code.
* Fixes some odd decisions for comments following case labels.
* Fixes some cases where comments following code were indented to less
than the expected column 33.
On the less good side, it now tends to put more whitespace around typedef
names that are not listed in typedefs.list. This might encourage us to
put more effort into typedef name collection; it's not really a bug in
indent itself.
There are more changes coming after this round, having to do with comment
indentation and alignment of lines appearing within parentheses. I wanted
to limit the size of the diffs to something that could be reviewed without
one's eyes completely glazing over, so it seemed better to split up the
changes as much as practical.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
Per discussion, plain "scram" is confusing because we actually implement
SCRAM-SHA-256 rather than the original SCRAM that uses SHA-1 as the hash
algorithm. If we add support for SCRAM-SHA-512 or some other mechanism in
the SCRAM family in the future, that would become even more confusing.
Most of the internal files and functions still use just "scram" as a
shorthand for SCRMA-SHA-256, but I did change PASSWORD_TYPE_SCRAM to
PASSWORD_TYPE_SCRAM_SHA_256, as that could potentially be used by 3rd
party extensions that hook into the password-check hook.
Michael Paquier did this in an earlier version of the SCRAM patch set
already, but I didn't include that in the version that was committed.
Discussion: https://www.postgresql.org/message-id/fde71ff1-5858-90c8-99a9-1c2427e7bafb@iki.fi
Add a column collprovider to pg_collation that determines which library
provides the collation data. The existing choices are default and libc,
and this adds an icu choice, which uses the ICU4C library.
The pg_locale_t type is changed to a union that contains the
provider-specific locale handles. Users of locale information are
changed to look into that struct for the appropriate handle to use.
Also add a collversion column that records the version of the collation
when it is created, and check at run time whether it is still the same.
This detects potentially incompatible library upgrades that can corrupt
indexes and other structures. This is currently only supported by
ICU-provided collations.
initdb initializes the default collation set as before from the `locale
-a` output but also adds all available ICU locales with a "-x-icu"
appended.
Currently, ICU-provided collations can only be explicitly named
collations. The global database locales are still always libc-provided.
ICU support is enabled by configure --with-icu.
Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Although it's reasonable to expect that most of these constants will
never change, that does not make it good programming style to hard-code
the value rather than using the RELKIND_FOO macros.
Discussion: https://postgr.es/m/11145.1488931324@sss.pgh.pa.us
initdb now initializes a pg_hba.conf that allows replication connections
from the local host, same as it does for regular connections. The
connecting user still needs to have the REPLICATION attribute or be a
superuser.
The intent is to allow pg_basebackup from the local host to succeed
without requiring additional configuration.
Michael Paquier <michael.paquier@gmail.com> and me
This introduces a new generic SASL authentication method, similar to the
GSS and SSPI methods. The server first tells the client which SASL
authentication mechanism to use, and then the mechanism-specific SASL
messages are exchanged in AuthenticationSASLcontinue and PasswordMessage
messages. Only SCRAM-SHA-256 is supported at the moment, but this allows
adding more SASL mechanisms in the future, without changing the overall
protocol.
Support for channel binding, aka SCRAM-SHA-256-PLUS is left for later.
The SASLPrep algorithm, for pre-processing the password, is not yet
implemented. That could cause trouble, if you use a password with
non-ASCII characters, and a client library that does implement SASLprep.
That will hopefully be added later.
Authorization identities, as specified in the SCRAM-SHA-256 specification,
are ignored. SET SESSION AUTHORIZATION provides more or less the same
functionality, anyway.
If a user doesn't exist, perform a "mock" authentication, by constructing
an authentic-looking challenge on the fly. The challenge is derived from
a new system-wide random value, "mock authentication nonce", which is
created at initdb, and stored in the control file. We go through these
motions, in order to not give away the information on whether the user
exists, to unauthenticated users.
Bumps PG_CONTROL_VERSION, because of the new field in control file.
Patch by Michael Paquier and Heikki Linnakangas, reviewed at different
stages by Robert Haas, Stephen Frost, David Steele, Aleksander Alekseev,
and many others.
Discussion: https://www.postgresql.org/message-id/CAB7nPqRbR3GmFYdedCAhzukfKrgBLTLtMvENOmPrVWREsZkF8g%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAB7nPqSMXU35g%3DW9X74HVeQp0uvgJxvYOuA4A-A3M%2B0wfEBv-w%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/55192AFE.6080106@iki.fi
c.h #includes a number of core libc header files, such as <stdio.h>.
There's no point in re-including these after having read postgres.h,
postgres_fe.h, or c.h; so remove code that did so.
While at it, also fix some places that were ignoring our standard pattern
of "include postgres[_fe].h, then system header files, then other Postgres
header files". While there's not any great magic in doing it that way
rather than system headers last, it's silly to have just a few files
deviating from the general pattern. (But I didn't attempt to enforce this
globally, only in files I was touching anyway.)
I'd be the first to say that this is mostly compulsive neatnik-ism,
but over time it might save enough compile cycles to be useful.
The previous coding did not properly quote the user name before casting
it to regrole. To avoid all that, just pass in BOOTSTRAP_SUPERUSERID
numerically.
Also fix one place where the BOOTSTRAP_SUPERUSERID was hardcoded as 10.
Move this logic out of initdb into a user-callable function. This
simplifies the code and makes it possible to update the standard
collations later on if additional operating system collations appear.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Euler Taveira <euler@timbira.com.br>