Commit Graph

284 Commits

Author SHA1 Message Date
Thomas Munro 4fc6b6eefc Fix get_dirent_type() for symlinks on MinGW/MSYS.
On Windows with MSVC, get_dirent_type() was recently made to return
DT_LNK for junction points by commit 9d3444dc, which fixed some
defective dirent.c code.

On Windows with Cygwin, get_dirent_type() already worked for symlinks,
as it does on POSIX systems, because Cygwin has its own fake symlinks
that behave like POSIX (on closer inspection, Cygwin's dirent has the
BSD d_type extension but it's probably always DT_UNKNOWN, so we fall
back to lstat(), which understands Cygwin symlinks with S_ISLNK()).

On Windows with MinGW/MSYS, we need extra code, because the MinGW
runtime has its own readdir() without d_type, and the lstat()-based
fallback has no knowledge of our convention for treating junctions as
symlinks.

Back-patch to 14, where get_dirent_type() landed.

Reported-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/b9ddf605-6b36-f90d-7c30-7b3e95c46276%40dunslane.net
2022-07-28 14:26:12 +12:00
Andres Freund c8a9246e09 Add output directory argument to generate-unicode_norm_table.pl
This is in preparation for building postgres with meson / ninja.

When building with meson, commands are run at the root of the build tree. Add
an option to put build output into the appropriate place.

Author: Andres Freund <andres@anarazel.de>
Author: Peter Eisentraut <peter@eisentraut.org>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/5e216522-ba3c-f0e6-7f97-5276d0270029@enterprisedb.com
2022-07-18 12:24:39 -07:00
Peter Eisentraut 9fd45870c1 Replace many MemSet calls with struct initialization
This replaces all MemSet() calls with struct initialization where that
is easily and obviously possible.  (For example, some cases have to
worry about padding bits, so I left those.)

(The same could be done with appropriate memset() calls, but this
patch is part of an effort to phase out MemSet(), so it doesn't touch
memset() calls.)

Reviewed-by: Ranier Vilela <ranier.vf@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/9847b13c-b785-f4e2-75c3-12ec77a3b05c@enterprisedb.com
2022-07-16 08:50:49 +02:00
Tom Lane 920072339f Improve error reporting from validate_exec().
validate_exec() didn't guarantee to set errno to something appropriate
after a failure, leading to callers not being able to print an on-point
message.  Improve that.

Noted by Kyotaro Horiguchi, though this isn't exactly his proposal.

Discussion: https://postgr.es/m/20220615.131403.1791191615590453058.horikyota.ntt@gmail.com
2022-07-12 15:37:39 -04:00
John Naylor d3117fc1a3 Fix out-of-bounds read in json_lex_string
Commit 3838fa269 added a lookahead loop to allow building strings multiple
bytes at a time. This loop could exit because it reached the end of input,
yet did not check for that before checking if we reached the end of a
valid string. To fix, put the end of string check back in the outer loop.

Per Valgrind animal skink
2022-07-12 11:25:47 +07:00
John Naylor 3838fa269c Build de-escaped JSON strings in larger chunks during lexing
During COPY BINARY with large JSONB blobs, it was found that half
the time was spent parsing JSON, with much of that spent in separate
appendStringInfoChar() calls for each input byte.

Add lookahead loop to json_lex_string() to allow batching multiple bytes
via appendBinaryStringInfo(). Also use this same logic when de-escaping
is not done, to avoid code duplication.

Report and proof of concept patch by Jelte Fennema, reworked by Andres
Freund and John Naylor

Discussion: https://www.postgresql.org/message-id/CAGECzQQuXbies_nKgSiYifZUjBk6nOf2%3DTSXqRjj2BhUh8CTeA%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/flat/PR3PR83MB0476F098CBCF68AF7A1CA89FF7B49@PR3PR83MB0476.EURPRD83.prod.outlook.com
2022-07-11 11:11:36 +07:00
John Naylor 3de359f18f Simplify json lexing state
Instead of updating the length as we go, use a const pointer to end of
the input, which we know already at the start.

This simplifies the coding and may improve performance slightly, but
the real motivation for doing this is to make further changes in this
area easier to reason about.

Discussion: https://www.postgresql.org/message-id/CAFBsxsGhaR2KQ5eisaK%3D6Vm60t%3DaxhD8Ckj1qFoCH1pktZi%2B2w%40mail.gmail.com
2022-07-08 14:53:20 +07:00
Robert Haas b0a55e4329 Change internal RelFileNode references to RelFileNumber or RelFileLocator.
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.

Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".

Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.

On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.

Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber fields from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.

Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.

Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
2022-07-06 11:39:09 -04:00
Peter Eisentraut 02c408e21a Remove redundant null pointer checks before free()
Per applicable standards, free() with a null pointer is a no-op.
Systems that don't observe that are ancient and no longer relevant.
Some PostgreSQL code already required this behavior, so this change
does not introduce any new requirements, just makes the code more
consistent.

Discussion: https://www.postgresql.org/message-id/flat/dac5d2d0-98f5-94d9-8e69-46da2413593d%40enterprisedb.com
2022-07-03 11:47:15 +02:00
Peter Eisentraut a8cca6026e logging: Also add the command prefix to detail and hint messages
This makes the output line up better and allows filtering messages by
command.

Discussion: https://www.postgresql.org/message-id/ba6d4fac-9e33-91f9-94fb-1e4c144a48b9@enterprisedb.com
2022-05-30 07:26:06 +02:00
Tom Lane 23e7b38bfe Pre-beta mechanical code beautification.
Run pgindent, pgperltidy, and reformat-dat-files.
I manually fixed a couple of comments that pgindent uglified.
2022-05-12 15:17:30 -04:00
Daniel Gustafsson 17ec5fa502 Clear the OpenSSL error queue before cryptohash operations
Setting up an EVP context for ciphers banned under FIPS generate
two OpenSSL errors in the queue, and as we only consume one from
the queue the other is at the head for the next invocation:

  postgres=# select md5('foo');
  ERROR:  could not compute MD5 hash: unsupported
  postgres=# select md5('foo');
  ERROR:  could not compute MD5 hash: initialization error

Clearing the error queue when creating the context ensures that
we don't pull in an error from an earlier operation.

Discussion: https://postgr.es/m/C89D932C-501E-4473-9750-638CFCD9095E@yesql.se
2022-05-06 14:41:31 +02:00
Andrew Dunstan b787c554c2 Inhibit mingw CRT's auto-globbing of command line arguments
For some reason by default the mingw C Runtime takes it upon itself to
expand program arguments that look like shell globbing characters. That
has caused much scratching of heads and mis-attribution of the causes of
some TAP test failures, so stop doing that.

This removes an inconsistency with Windows binaries built with MSVC,
which have no such behaviour.

Per suggestion from Noah Misch.

Backpatch to all live branches.

Discussion: https://postgr.es/m/20220423025927.GA1274057@rfd.leadboat.com
2022-04-25 15:47:55 -04:00
Tom Lane 587de223f0 Add missing error handling in pg_md5_hash().
It failed to provide an error string as expected for the
admittedly-unlikely case of OOM in pg_cryptohash_create().
Also, make it initialize *errstr to NULL for success,
as pg_md5_binary() does.

Also add missing comments.  Readers should not have to
reverse-engineer the API spec for a publicly visible routine.
2022-04-18 20:04:55 -04:00
Alvaro Herrera 24d2b2680a
Remove extraneous blank lines before block-closing braces
These are useless and distracting.  We wouldn't have written the code
with them to begin with, so there's no reason to keep them.

Author: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com
Discussion: https://postgr.es/m/attachment/133167/0016-Extraneous-blank-lines.patch
2022-04-13 19:16:02 +02:00
Tom Lane 2c9381840f Remove not-very-useful early checks of __pg_log_level in logging.h.
Enforce __pg_log_level message filtering centrally in logging.c,
instead of relying on the calling macros to do it.  This is more
reliable (e.g. it works correctly for direct calls to pg_log_generic)
and it saves a percent or so of total code size because we get rid of
so many duplicate checks of __pg_log_level.

This does mean that argument expressions in a logging macro will be
evaluated even if we end up not printing anything.  That seems of
little concern for INFO and higher levels as those messages are printed
by default, and most of our frontend programs don't even offer a way to
turn them off.  I left the unlikely() checks in place for DEBUG
messages, though.

Discussion: https://postgr.es/m/3993549.1649449609@sss.pgh.pa.us
2022-04-12 13:25:29 -04:00
Michael Paquier a4b57543ac Rename backup_compression.{c,h} to compression.{c,h}
Compression option handling (level, algorithm or even workers) can be
used across several parts of the system and not only base backups.
Structures, objects and routines are renamed in consequence, to remove
the concept of base backups from this part of the code making this
change straight-forward.

pg_receivewal, that has gained support for LZ4 since babbbb5, will make
use of this infrastructure for its set of compression options, bringing
more consistency with pg_basebackup.  This cleanup needs to be done
before releasing a beta of 15.  pg_dump is a potential future target, as
well, and adding more compression options to it may happen in 16~.

Author: Michael Paquier
Reviewed-by: Robert Haas, Georgios Kokolatos
Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
2022-04-12 13:38:54 +09:00
Peter Eisentraut c215cc7b61 Add color support for new frontend detail/hint messages
As before, the defaults are similar to gcc's default appearance.
2022-04-11 17:36:44 +02:00
Tom Lane 9a374b77fb Improve frontend error logging style.
Get rid of the separate "FATAL" log level, as it was applied
so inconsistently as to be meaningless.  This mostly involves
s/pg_log_fatal/pg_log_error/g.

Create a macro pg_fatal() to handle the common use-case of
pg_log_error() immediately followed by exit(1).  Various
modules had already invented either this or equivalent macros;
standardize on pg_fatal() and apply it where possible.

Invent the ability to add "detail" and "hint" messages to a
frontend message, much as we have long had in the backend.

Except where rewording was needed to convert existing coding
to detail/hint style, I have (mostly) resisted the temptation
to change existing message wording.

Patch by me.  Design and patch reviewed at various stages by
Robert Haas, Kyotaro Horiguchi, Peter Eisentraut and
Daniel Gustafsson.

Discussion: https://postgr.es/m/1363732.1636496441@sss.pgh.pa.us
2022-04-08 14:55:14 -04:00
Robert Haas 8e053dc6df Fix possible NULL-pointer-deference in backup_compression.c.
Per Coverity and Tom Lane. Reviewed by Tom Lane and Justin Pryzby.

Discussion: http://postgr.es/m/384291.1648403267@sss.pgh.pa.us
2022-03-30 15:53:08 -04:00
Robert Haas 51c0d186d9 Allow parallel zstd compression when taking a base backup.
libzstd allows transparent parallel compression just by setting
an option when creating the compression context, so permit that
for both client and server-side backup compression. To use this,
use something like pg_basebackup --compress WHERE-zstd:workers=N
where WHERE is "client" or "server" and N is an integer.

When compression is performed on the server side, this will spawn
threads inside the PostgreSQL backend. While there is almost no
PostgreSQL server code which is thread-safe, the threads here are used
internally by libzstd and touch only data structures controlled by
libzstd.

Patch by me, based in part on earlier work by Dipesh Pandit
and Jeevan Ladhe. Reviewed by Justin Pryzby.

Discussion: http://postgr.es/m/CA+Tgmobj6u-nWF-j=FemygUhobhryLxf9h-wJN7W-2rSsseHNA@mail.gmail.com
2022-03-30 09:41:26 -04:00
Peter Eisentraut c64fb698d0 Make update-unicode target work in vpath builds
Author: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/616c6873-83b5-85c0-93cb-548977c39c60@enterprisedb.com
2022-03-25 09:47:50 +01:00
Robert Haas 68d8f9bfb2 In get_bc_algorithm_name, add a dummy return statement.
This code shouldn't be reached, but having it here might avoid a
compiler warning.

Per CI complaint from Andres Freund.

Discussion: http://postgr.es/m/C6A7643A-582B-47F7-A03D-01736BC0349B@anarazel.de
2022-03-23 11:37:12 -04:00
Robert Haas ffd53659c4 Replace BASE_BACKUP COMPRESSION_LEVEL option with COMPRESSION_DETAIL.
There are more compression parameters that can be specified than just
an integer compression level, so rename the new COMPRESSION_LEVEL
option to COMPRESSION_DETAIL before it gets released. Introduce a
flexible syntax for that option to allow arbitrary options to be
specified without needing to adjust the main replication grammar,
and common code to parse it that is shared between the client and
the server.

This commit doesn't actually add any new compression parameters,
so the only user-visible change is that you can now type something
like pg_basebackup --compress gzip:level=5 instead of writing just
pg_basebackup --compress gzip:5. However, it should make it easy to
add new options. If for example gzip starts offering fries, we can
support pg_basebackup --compress gzip:level=5,fries=true for the
benefit of users who want fries with that.

Along the way, this fixes a few things in pg_basebackup so that the
pg_basebackup can be used with a server-side compression algorithm
that pg_basebackup itself does not understand. For example,
pg_basebackup --compress server-lz4 could still succeed even if
only the server and not the client has LZ4 support, provided that
the other options to pg_basebackup don't require the client to
decompress the archive.

Patch by me. Reviewed by Justin Pryzby and Dagfinn Ilmari Mannsåker.

Discussion: http://postgr.es/m/CA+TgmoYvpetyRAbbg1M8b3-iHsaN4nsgmWPjOENu5-doHuJ7fA@mail.gmail.com
2022-03-23 09:19:14 -04:00
John Naylor 4b35408f1e Use bitwise rotate functions in more places
There were a number of places in the code that used bespoke bit-twiddling
expressions to do bitwise rotation. While we've had pg_rotate_right32()
for a while now, we hadn't gotten around to standardizing on that. Do so
now. Since many potential call sites look more natural with the "left"
equivalent, add that function too.

Reviewed by Tom Lane and Yugo Nagata

Discussion:
https://www.postgresql.org/message-id/CAFBsxsH7c1LC0CGZ0ADCBXLHU5-%3DKNXx-r7tHYPAW51b2HK4Qw%40mail.gmail.com
2022-02-20 13:22:08 +07:00
Tom Lane 291ec6e45e Suppress integer-overflow compiler warning for inconsistent sun_len.
On AIX 7.1, struct sockaddr_un is declared to be 1025 bytes long,
but the sun_len field that should hold the length is only a byte.
Clamp the value we try to store to ensure it will fit in the field.

(This coding might need adjustment if there are any machines out
there where sun_len is as wide as size_t; but a preliminary survey
suggests there's not, so let's keep it simple.)

Discussion: https://postgr.es/m/2781112.1644819528@sss.pgh.pa.us
2022-02-14 11:25:46 -05:00
John Naylor d3f45323bb Improve code clarity in epilogue of UTF-8 verification fast path
The previous coding was correct, but the style and commentary were a bit
vague about which operations had to happen, in what circumstances, and
in what order. Rearrange so that the epilogue does nothing in the DFA END
state. That allows turning some conditional statements in the backtracking
logic into asserts. With that, we can be more explicit about needing
to backtrack at least one byte in non-END states to ensure checking the
current byte sequence in the slow path. No change to the regression tests,
since they should be able catch deficiencies here already.

In passing, improve the comments around DFA states where the first
continuation byte has a restricted range.
2022-01-17 22:53:50 -05:00
Michael Paquier 5513dc6a30 Improve error handling of HMAC computations
This is similar to b69aba7, except that this completes the work for
HMAC with a new routine called pg_hmac_error() that would provide more
context about the type of error that happened during a HMAC computation:
- The fallback HMAC implementation in hmac.c relies on cryptohashes, so
in some code paths it is necessary to return back the error generated by
cryptohashes.
- For the OpenSSL implementation (hmac_openssl.c), the logic is very
similar to cryptohash_openssl.c, where the error context comes from
OpenSSL if one of its internal routines failed, with different error
codes if something internal to hmac_openssl.c failed or was incorrect.

Any in-core code paths that use the centralized HMAC interface are
related to SCRAM, for errors that are unlikely going to happen, with
only SHA-256.  It would be possible to see errors when computing some
HMACs with MD5 for example and OpenSSL FIPS enabled, and this commit
would help in reporting the correct errors but nothing in core uses
that.  So, at the end, no backpatch to v14 is done, at least for now.

Errors in SCRAM related to the computation of the server key, stored
key, etc. need to pass down the potential error context string across
more layers of their respective call stacks for the frontend and the
backend, so each surrounding routine is adapted for this purpose.

Reviewed-by: Sergey Shinderuk
Discussion: https://postgr.es/m/Yd0N9tSAIIkFd+qi@paquier.xyz
2022-01-13 16:17:21 +09:00
Michael Paquier 87f29f4fcc Fix incorrect comments in hmac.c and hmac_openssl.c
Both files referred to pg_hmac_ctx->data, which, I guess, comes from the
early versions of the patch that has resulted in commit e6bdfd9.

Author: Sergey Shinderuk
Discussion: https://postgr.es/m/8cbb56dd-63d6-a581-7a65-25a97ac4be03@postgrespro.ru
Backpatch-through: 14
2022-01-13 09:43:36 +09:00
Michael Paquier 9a3d8e1886 Fix comment related to pg_cryptohash_error()
One of the comments introduced in b69aba7 was worded a bit weirdly, so
improve it.

Reported-by: Sergey Shinderuk
Discussion: https://postgr.es/m/71b9a5d2-a3bf-83bc-a243-93dcf0bcfb3b@postgrespro.ru
Backpatch-through: 14
2022-01-12 12:39:36 +09:00
Michael Paquier b69aba7457 Improve error handling of cryptohash computations
The existing cryptohash facility was causing problems in some code paths
related to MD5 (frontend and backend) that relied on the fact that the
only type of error that could happen would be an OOM, as the MD5
implementation used in PostgreSQL ~13 (the in-core implementation is
used when compiling with or without OpenSSL in those older versions),
could fail only under this circumstance.

The new cryptohash facilities can fail for reasons other than OOMs, like
attempting MD5 when FIPS is enabled (upstream OpenSSL allows that up to
1.0.2, Fedora and Photon patch OpenSSL 1.1.1 to allow that), so this
would cause incorrect reports to show up.

This commit extends the cryptohash APIs so as callers of those routines
can fetch more context when an error happens, by using a new routine
called pg_cryptohash_error().  The error states are stored within each
implementation's internal context data, so as it is possible to extend
the logic depending on what's suited for an implementation.  The default
implementation requires few error states, but OpenSSL could report
various issues depending on its internal state so more is needed in
cryptohash_openssl.c, and the code is shaped so as we are always able to
grab the necessary information.

The core code is changed to adapt to the new error routine, painting
more "const" across the call stack where the static errors are stored,
particularly in authentication code paths on variables that provide
log details.  This way, any future changes would warn if attempting to
free these strings.  The MD5 authentication code was also a bit blurry
about the handling of "logdetail" (LOG sent to the postmaster), so
improve the comments related that, while on it.

The origin of the problem is 87ae969, that introduced the centralized
cryptohash facility.  Extra changes are done for pgcrypto in v14 for the
non-OpenSSL code path to cope with the improvements done by this
commit.

Reported-by: Michael Mühlbeyer
Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/89B7F072-5BBE-4C92-903E-D83E865D9367@trivadis.com
Backpatch-through: 14
2022-01-11 09:55:16 +09:00
Thomas Munro f3e78069db Make EXEC_BACKEND more convenient on Linux and FreeBSD.
Try to disable ASLR when building in EXEC_BACKEND mode, to avoid random
memory mapping failures while testing.  For developer use only, no
effect on regular builds.

Suggested-by: Andres Freund <andres@anarazel.de>
Tested-by: Bossart, Nathan <bossartn@amazon.com>
Discussion: https://postgr.es/m/20210806032944.m4tz7j2w47mant26%40alap3.anarazel.de
2022-01-11 00:04:33 +13:00
Bruce Momjian 27b77ecf9f Update copyright for 2022
Backpatch-through: 10
2022-01-07 19:04:57 -05:00
John Naylor 911588a3f8 Add fast path for validating UTF-8 text
Our previous validator used a traditional algorithm that performed
comparison and branching one byte at a time. It's useful in that
we always know exactly how many bytes we have validated, but that
precision comes at a cost. Input validation can show up prominently
in profiles of COPY FROM, and future improvements to COPY FROM such
as parallelism or faster line parsing will put more pressure on input
validation. Hence, add fast paths for both ASCII and multibyte UTF-8:

Use bitwise operations to check 16 bytes at a time for ASCII. If
that fails, use a "shift-based" DFA on those bytes to handle the
general case, including multibyte. These paths are relatively free
of branches and thus robust against all kinds of byte patterns. With
these algorithms, UTF-8 validation is several times faster, depending
on platform and the input byte distribution.

The previous coding in pg_utf8_verifystr() is retained for short
strings and for when the fast path returns an error.

Review, performance testing, and additional hacking by: Heikki
Linakangas, Vladimir Sitnikov, Amit Khandekar, Thomas Munro, and
Greg Stark

Discussion:
https://www.postgresql.org/message-id/CAFBsxsEV_SzH%2BOLyCiyon%3DiwggSyMh_eF6A3LU2tiWf3Cy2ZQg%40mail.gmail.com
2021-12-20 10:07:29 -04:00
Michael Paquier 6fb7c5d67c Centralize timestamp computation of control file on updates
This commit moves the timestamp computation of the control file within
the routine of src/common/ in charge of updating the backend's control
file, which is shared by multiple frontend tools (pg_rewind,
pg_checksums and pg_resetwal) and the backend itself.

This change has as direct effect to update the control file's timestamp
when writing the control file in pg_rewind and pg_checksums, something
that is helpful to keep track of control file updates for those
operations, something also tracked by the backend at startup within its
logs.  This part is arguably a bug, as ControlFileData->time should be
updated each time a new version of the control file is written, but this
is a behavior change so no backpatch is done.

Author: Amul Sul
Reviewed-by: Nathan Bossart, Michael Paquier, Bharath Rupireddy
Discussion: https://postgr.es/m/CAAJ_b97nd_ghRpyFV9Djf9RLXkoTbOUqnocq11WGq9TisX09Fw@mail.gmail.com
2021-11-29 13:36:13 +09:00
Tom Lane 3804539e48 Replace random(), pg_erand48(), etc with a better PRNG API and algorithm.
Standardize on xoroshiro128** as our basic PRNG algorithm, eliminating
a bunch of platform dependencies as well as fundamentally-obsolete PRNG
code.  In addition, this API replacement will ease replacing the
algorithm again in future, should that become necessary.

xoroshiro128** is a few percent slower than the drand48 family,
but it can produce full-width 64-bit random values not only 48-bit,
and it should be much more trustworthy.  It's likely to be noticeably
faster than the platform's random(), depending on which platform you
are thinking about; and we can have non-global state vectors easily,
unlike with random().  It is not cryptographically strong, but neither
are the functions it replaces.

Fabien Coelho, reviewed by Dean Rasheed, Aleksander Alekseev, and myself

Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2105241211230.165418@pseudo
2021-11-28 21:33:07 -05:00
Tom Lane 46d665bc26 Allow psql's other uses of simple_prompt() to be interrupted by ^C.
This fills in the work left un-done by 5f1148224.  \prompt can
be canceled out of now, and so can password prompts issued during
\connect.  (We don't need to do anything for password prompts
issued during startup, because we aren't yet trapping SIGINT
at that point.)

Nathan Bossart

Discussion: https://postgr.es/m/747443.1635536754@sss.pgh.pa.us
2021-11-19 12:11:46 -05:00
Tom Lane 5f1148224b Provide a variant of simple_prompt() that can be interrupted by ^C.
Up to now, you couldn't escape out of psql's \password command
by typing control-C (or other local spelling of SIGINT).  This
is pretty user-unfriendly, so improve it.  To do so, we have to
modify the functions provided by pg_get_line.c; but we don't
want to mess with psql's SIGINT handler setup, so provide an
API that lets that handler cause the cancel to occur.

This relies on the assumption that we won't do any major harm by
longjmp'ing out of fgets().  While that's obviously a little shaky,
we've long had the same assumption in the main input loop, and few
issues have been reported.

psql has some other simple_prompt() calls that could usefully
be improved the same way; for now, just deal with \password.

Nathan Bossart, minor tweaks by me

Discussion: https://postgr.es/m/747443.1635536754@sss.pgh.pa.us
2021-11-17 19:09:54 -05:00
Michael Paquier 098c134556 Fix buffer overrun in unicode string normalization with empty input
PostgreSQL 13 and newer versions are directly impacted by that through
the SQL function normalize(), which would cause a call of this function
to write one byte past its allocation if using in input an empty
string after recomposing the string with NFC and NFKC.  Older versions
(v10~v12) are not directly affected by this problem as the only code
path using normalization is SASLprep in SCRAM authentication that
forbids the case of an empty string, but let's make the code more robust
anyway there so as any out-of-core callers of this function are covered.

The solution chosen to fix this issue is simple, with the addition of a
fast-exit path if the decomposed string is found as empty.  This would
only happen for an empty string as at its lowest level a codepoint would
be decomposed as itself if it has no entry in the decomposition table or
if it has a decomposition size of 0.

Some tests are added to cover this issue in v13~.  Note that an empty
string has always been considered as normalized (grammar "IS NF[K]{C,D}
NORMALIZED", through the SQL function is_normalized()) for all the
operations allowed (NFC, NFD, NFKC and NFKD) since this feature has been
introduced as of 2991ac5.  This behavior is unchanged but some tests are
added in v13~ to check after that.

I have also checked "make normalization-check" in src/common/unicode/,
while on it (works in 13~, and breaks in older stable branches
independently of this commit).

The release notes should just mention this commit for v13~.

Reported-by: Matthijs van der Vleuten
Discussion: https://postgr.es/m/17277-0c527a373794e802@postgresql.org
Backpatch-through: 10
2021-11-11 15:00:59 +09:00
Daniel Gustafsson 0ded7039fa Fix memory leak in pg_hmac
The intermittent h buffer was not freed, causing it to leak. Backpatch
through 14 where HMAC was refactored to the current API.

Author: Sergey Shinderuk <s.shinderuk@postgrespro.ru>
Discussion: https://postgr.es/m/af07e620-7e28-a742-4637-2bc44aa7c2be@postgrespro.ru
Backpatch-through: 14
2021-10-01 22:47:05 +02:00
Michael Paquier e767ddcd35 Fix typos and grammar in code comments
Several mistakes have piled in the code comments over the time,
including incorrect grammar, function names and simple typos.  This
commit takes care of a portion of these.

No backpatch is done as this is only cosmetic.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20210924215827.GS831@telsasoft.com
2021-09-27 14:21:28 +09:00
John Naylor 5bc429aacb Extend collection of Unicode combining characters to beyond the BMP
The former limit was perhaps a carryover from an older hand-coded
table. Since commit bab982161 we have enough space in mbinterval to
store larger codepoints, so collect all combining characters.

Discussion: https://www.postgresql.org/message-id/49ad1fa0-174e-c901-b14c-c484b60907f1%40enterprisedb.com
2021-08-26 13:07:34 -04:00
John Naylor bab982161e Update display widths as part of updating Unicode
The hardcoded "wide character" set in ucs_wcwidth() was last updated
around the Unicode 5.0 era.  This led to misalignment when printing
emojis and other codepoints that have since been designated
wide or full-width.

To fix and keep up to date, extend update-unicode to download the list
of wide and full-width codepoints from the offical sources.

In passing, remove some comments about non-spacing characters that
haven't been accurate since we removed the former hardcoded logic.

Jacob Champion

Reported and reviewed by Pavel Stehule
Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRCeX21O69YHxmykYySYyprZAqrKWWg0KoGKdjgqcGyygg@mail.gmail.com
2021-08-26 10:53:56 -04:00
John Naylor 1563ecbc1b Revert "Rename unicode_combining_table to unicode_width_table"
This reverts commit eb0d0d2c73.

After I had committed eb0d0d2c7 and 78ab944cd, I decided to add
a sanity check for a "can't happen" scenario just to be cautious.
It turned out that it already happened in the official Unicode source
data, namely that a character can be both wide and a combining
character. This fact renders the aforementioned commits unnecessary,
so revert both of them.

Discussion: https://www.postgresql.org/message-id/CAFBsxsH5ejH4-1xaTLpSK8vWoK1m6fA1JBtTM6jmBsLfmDki1g%40mail.gmail.com
2021-08-26 10:06:12 -04:00
John Naylor f8c8a8bccc Revert "Change mbbisearch to return the character range"
This reverts commit 78ab944cd4.

After I had committed eb0d0d2c7 and 78ab944cd, I decided to add
a sanity check for a "can't happen" scenario just to be cautious.
It turned out that it already happened in the official Unicode source
data, namely that a character can be both wide and a combining
character. This fact renders the aforementioned commits unnecessary,
so revert both of them.

Discussion:
https://www.postgresql.org/message-id/CAFBsxsH5ejH4-1xaTLpSK8vWoK1m6fA1JBtTM6jmBsLfmDki1g%40mail.gmail.com
2021-08-26 09:58:28 -04:00
John Naylor 78ab944cd4 Change mbbisearch to return the character range
Add a width field to mbinterval and have mbbisearch return a
pointer to the found range rather than just bool for success.
A future commit will add another width besides zero, and this
will allow that to use the same search.

Reviewed by Jacob Champion
Discussion: https://www.postgresql.org/message-id/CAFBsxsGOCpzV7c-f3a8ADsA1n4uZ%3D8puCctQp%2Bx7W0vgkv%3Dw%2Bg%40mail.gmail.com
2021-08-25 13:08:11 -04:00
John Naylor eb0d0d2c73 Rename unicode_combining_table to unicode_width_table
No functional changes. A future commit will use this table for
other purposes besides combining characters.
2021-08-25 13:01:35 -04:00
Michael Paquier 2576dcfb76 Revert refactoring of hex code to src/common/
This is a combined revert of the following commits:
- c3826f8, a refactoring piece that moved the hex decoding code to
src/common/.  This code was cleaned up by aef8948, as it originally
included no overflow checks in the same way as the base64 routines in
src/common/ used by SCRAM, making it unsafe for its purpose.
- aef8948, a more advanced refactoring of the hex encoding/decoding code
to src/common/ that added sanity checks on the result buffer for hex
decoding and encoding.  As reported by Hans Buschmann, those overflow
checks are expensive, and it is possible to see a performance drop in
the decoding/encoding of bytea or LOs the longer they are.  Simple SQLs
working on large bytea values show a clear difference in perf profile.
- ccf4e27, a cleanup made possible by aef8948.

The reverts of all those commits bring back the performance of hex
decoding and encoding back to what it was in ~13.  Fow now and
post-beta3, this is the simplest option.

Reported-by: Hans Buschmann
Discussion: https://postgr.es/m/1629039545467.80333@nidsa.net
Backpatch-through: 14
2021-08-19 09:20:13 +09:00
Michael Paquier b44669b2ca Simplify error handing of jsonapi.c for the frontend
This commit removes a dependency to the central logging facilities in
the JSON parsing routines of src/common/, which existed to log errors
when seeing error codes that do not match any existing values in
JsonParseErrorType, which is not something that should never happen.

The routine providing a detailed error message based on the error code
is made backend-only, the existing code being unsafe to use in the
frontend as the error message may finish by being palloc'd or point to a
static string, so there is no way to know if the memory of the message
should be pfree'd or not.  The only user of this routine in the frontend
was pg_verifybackup, that is changed to use a more generic error message
on parsing failure.

Note that making this code more resilient to OOM failures if used in
shared libraries would require much more work as a lot of code paths
still rely on palloc() & friends, but we are not sure yet if we need to
go down to that.  Still, removing the dependency to logging is a step
toward more portability.

This cleans up the handling of check_stack_depth() while on it, as it
exists only in the backend.

Per discussion with Jacob Champion and Tom Lane.

Discussion: https://postgr.es/m/YNwL7kXwn3Cckbd6@paquier.xyz
2021-07-02 09:35:12 +09:00
Tom Lane 42f94f56bf Fix incautious handling of possibly-miscoded strings in client code.
An incorrectly-encoded multibyte character near the end of a string
could cause various processing loops to run past the string's
terminating NUL, with results ranging from no detectable issue to
a program crash, depending on what happens to be in the following
memory.

This isn't an issue in the server, because we take care to verify
the encoding of strings before doing any interesting processing
on them.  However, that lack of care leaked into client-side code
which shouldn't assume that anyone has validated the encoding of
its input.

Although this is certainly a bug worth fixing, the PG security team
elected not to regard it as a security issue, primarily because
any untrusted text should be sanitized by PQescapeLiteral or
the like before being incorporated into a SQL or psql command.
(If an app fails to do so, the same technique can be used to
cause SQL injection, with probably much more dire consequences
than a mere client-program crash.)  Those functions were already
made proof against this class of problem, cf CVE-2006-2313.

To fix, invent PQmblenBounded() which is like PQmblen() except it
won't return more than the number of bytes remaining in the string.
In HEAD we can make this a new libpq function, as PQmblen() is.
It seems imprudent to change libpq's API in stable branches though,
so in the back branches define PQmblenBounded as a macro in the files
that need it.  (Note that just changing PQmblen's behavior would not
be a good idea; notably, it would completely break the escaping
functions' defense against this exact problem.  So we just want a
version for those callers that don't have any better way of handling
this issue.)

Per private report from houjingyi.  Back-patch to all supported branches.
2021-06-07 14:15:25 -04:00