Commit Graph

284 Commits

Author SHA1 Message Date
Nathan Bossart 3726c1cb0e Move is_valid_ascii() to ascii.h.
This function requires simd.h, which is a rather large dependency
for a widely-used header file like pg_wchar.h.  Furthermore, there
is a report of a third-party tool that is struggling to use
pg_wchar.h due to its dependence on simd.h (presumably because
simd.h uses several intrinsics).  Moving the function to the much
less popular ascii.h resolves these issues for now.

This commit is back-patched for the benefit of the aforementioned
third-party tool.  The simd.h dependency was only added in v16,
but we've opted to back-patch to v15 so that is_valid_ascii() lives
in the same file for all versions where it exists.  This could
break existing third-party code that uses the function, but we
couldn't find any examples of such code.  It should be possible to
fix any code that this commit breaks by including ascii.h in the
file that uses is_valid_ascii().

Author: Jubilee Young
Reviewed-by: Tom Lane, John Naylor, Andres Freund, Eric Ridge
Discussion: https://postgr.es/m/CAPNHn3oKJJxMsYq%2BqLYzVJOFrUcOr4OF1EC-KtFT-qh8nOOOtQ%40mail.gmail.com
Backpatch-through: 15
2024-01-29 12:09:08 -06:00
Tom Lane 985ac5ce29 Improve pglz_decompress's defenses against corrupt compressed data.
When processing a match tag, check to see if the claimed "off"
is more than the distance back to the output buffer start.
If it is, then the data is corrupt, and what's more we would
fetch from outside the buffer boundaries and potentially incur
a SIGSEGV.  (Although the odds of that seem relatively low, given
that "off" can't be more than 4K.)

Back-patch to v13; before that, this function wasn't really
trying to protect against bad data.

Report and fix by Flavien Guedez.

Discussion: https://postgr.es/m/01fc0593-e31e-463d-902c-dd43174acee2@oopacity.net
2023-10-18 20:43:17 -04:00
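
A minimal sketch of the kind of guard described in the pglz_decompress
commit above; the names here (copy_match, dest, dp, destend) are
illustrative assumptions, not the actual pglz identifiers:

  /*
   * Hedged sketch, not the actual pglz code: before copying a match of
   * length "len" located "off" bytes back, reject offsets that reach
   * before the start of the output produced so far.
   */
  static int
  copy_match(unsigned char *dest, unsigned char *dp,
             const unsigned char *destend, int off, int len)
  {
      if (off == 0 || off > dp - dest)
          return -1;                /* corrupt data: offset points outside buffer */
      while (len-- > 0 && dp < destend)
      {
          *dp = dp[-off];           /* copies may overlap, so go byte by byte */
          dp++;
      }
      return 0;
  }
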
Thomas Munro 5e39884d32 Try to handle torn reads of pg_control in frontend.
Some of our src/bin tools read the control file without any kind of
interlocking against concurrent writes from the server.  At least ext4
and ntfs can expose partially modified contents when you do that.

For now, we'll try to tolerate this by retrying up to 10 times if the
checksum doesn't match, until we get two reads in a row with the same
bad checksum.  This is not guaranteed to reach the right conclusion, but
it seems very likely to.  Thanks to Tom Lane for this suggestion.

Various ideas for interlocking or atomicity were considered too
complicated, unportable or expensive given the lack of field reports,
but remain open for future reconsideration.

Back-patch as far as 12.  It doesn't seem like a good idea to put a
heuristic change for a very rare problem into the final release of 11.

Reviewed-by: Anton A. Melnikov <aamelnikov@inbox.ru>
Reviewed-by: David Steele <david@pgmasters.net>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de
2023-10-16 17:23:02 +13:00
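
A minimal sketch of the retry heuristic described in the pg_control
commit above, assuming hypothetical helpers read_control_file() and
control_crc_ok() that stand in for the real file-reading and CRC code:

  #include <stdbool.h>
  #include <stdint.h>

  extern bool read_control_file(const char *path, void *buf, uint32_t *crc_found);
  extern bool control_crc_ok(const void *buf, uint32_t crc_found);

  static bool
  read_control_retrying(const char *path, void *buf)
  {
      uint32_t prev_crc = 0;

      for (int attempt = 0; attempt < 10; attempt++)
      {
          uint32_t crc;

          if (!read_control_file(path, buf, &crc))
              return false;
          if (control_crc_ok(buf, crc))
              return true;              /* checksum matches: torn read resolved */
          if (attempt > 0 && crc == prev_crc)
              break;                    /* two identical bad reads: treat as real corruption */
          prev_crc = crc;
      }
      return false;                     /* report checksum failure to the caller */
  }
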
Tom Lane 74a1a36d75 Fix JSON error reporting for many cases of erroneous string values.
The majority of error exit cases in json_lex_string() failed to
set lex->token_terminator, causing problems for the error context
reporting code: it would see token_terminator less than token_start
and do something more or less nuts.  In v14 and up the end result
could be as bad as a crash in report_json_context().  Older
versions accidentally avoided that fate; but all versions produce
error context lines that are far less useful than intended,
because they'd stop at the end of the prior token instead of
continuing to where the actually-bad input is.

To fix, invent some macros that make it less notationally painful
to do the right thing.  Also add documentation about what the
function is actually required to do; and in >= v14, add an assertion
in report_json_context about token_terminator being sufficiently
far advanced.

Per report from Nikolay Shaplov.  Back-patch to all supported
versions.

Discussion: https://postgr.es/m/7332649.x5DLKWyVIX@thinkpad-pgpro
2023-03-13 15:19:00 -04:00
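
A minimal sketch of the sort of helper macro described in the JSON
commit above; the macro name FAIL_AT and its exact shape are
illustrative assumptions, not the committed macros:

  /*
   * Hedged sketch: every error exit records where lexing stopped before
   * returning, so the error-context code can rely on
   * token_terminator >= token_start.
   */
  #define FAIL_AT(lex, pos, code) \
      do { \
          (lex)->token_terminator = (pos); \
          return (code); \
      } while (0)
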
Alvaro Herrera 1eeac95dc4
Change some errdetail() to errdetail_internal()
This prevents marking the argument string for translation for gettext,
and it also prevents the given string (which is already translated) from
being translated at runtime.

Also, mark the strings used as arguments to check_rolespec_name for
translation.

Backpatch all the way back as appropriate.  None of this is caught by
any tests (necessarily so), so I verified it manually.
2022-09-28 17:14:53 +02:00
Peter Eisentraut 517484b582 Message style improvements 2022-09-24 18:38:35 -04:00
Michael Paquier ade925e169 Use min/max bounds defined by Zstd for compression level
The bounds hardcoded in compression.c since ffd5365 (minimum at 1 and
maximum at 22) do not match the reality of what zstd is able to
handle; the real values are available via ZSTD_maxCLevel() and
ZSTD_minCLevel() at run time.  The maximum of 22 is actually correct
in recent versions, but the minimum was not, as the library can go down
to -131072 by design.  This commit changes the code to use the run-time
values instead of the hardcoded ones.

Zstd seems to assume that these bounds could change in the future, and
Postgres will be able to adapt automatically to such changes thanks to
what's being done in this commit.

Reported-by: Justin Pryzby
Discussion: https://postgr.es/m/20220922033716.GL31833@telsasoft.com
Backpatch-through: 15
2022-09-22 20:03:30 +09:00
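
A minimal sketch of the run-time bounds check described in the commit
above; ZSTD_minCLevel() and ZSTD_maxCLevel() are standard libzstd
functions, while check_zstd_level() is an illustrative wrapper, not the
PostgreSQL code:

  #include <stdio.h>
  #include <zstd.h>

  static int
  check_zstd_level(int level)
  {
      int min_level = ZSTD_minCLevel();   /* large negative "fast" levels */
      int max_level = ZSTD_maxCLevel();   /* 22 in current releases */

      if (level < min_level || level > max_level)
      {
          fprintf(stderr, "compression level %d not in range %d..%d\n",
                  level, min_level, max_level);
          return -1;
      }
      return 0;
  }
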
Michael Paquier 53332eacaf Simplify handling of compression level with compression specifications
PG_COMPRESSION_OPTION_LEVEL is removed from the compression
specification logic; instead, the compression level is always
assigned, using each library's default if nothing is given explicitly.
This centralizes the checks on the compression methods supported by a
given build, and always assigns a default compression level when
parsing a compression specification.  As a result, an unsupported
compression method is complained about at an earlier stage than
before, i.e. when parsing a specification in the backend or the
frontend rather than when processing it.  zstd, lz4 and zlib can all
handle a default value in the routines that set up their compression
level, so the backend and frontend code (pg_receivewal or
pg_basebackup) no longer needs to know what the default compression
level should be when nothing is specified: the specification parsing
now assigns it.  A default can also be enforced by passing down a
"level" set to the default value, which the backend will accept (the
replication protocol is, for example, able to handle a command like
BASE_BACKUP (COMPRESSION_DETAIL 'gzip:level=-1')).

This code simplification fixes an issue with pg_basebackup --gzip
introduced by ffd5365, where the tarball of the streamed WAL segments
would be created as pg_wal.tar.gz with uncompressed contents, while
the intention is to compress the segments with gzip at a default
level.  The confusion came from the handling of gzip's default
compression level (-1 or Z_DEFAULT_COMPRESSION): a value of 0 was
getting assigned, which is what walmethods.c considers equivalent to
no compression when streaming WAL segments with its tar methods.
Always assigning the compression level removes the confusion of some
code paths treating a value of 0 in a specification as either no
compression or a default compression level.

Note that 010_pg_basebackup.pl has to be adjusted to skip a few tests
where the shape of the compression detail string for client and
server-side compression was checked using gzip.  This is a result of the
code simplification, as gzip specifications cannot be used if a build
does not support it.

Reported-by: Tom Lane
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/1400032.1662217889@sss.pgh.pa.us
Backpatch-through: 15
2022-09-14 12:17:03 +09:00
Thomas Munro fc4e5af307 Fix get_dirent_type() for symlinks on MinGW/MSYS.
On Windows with MSVC, get_dirent_type() was recently made to return
DT_LNK for junction points by commit 9d3444dc, which fixed some
defective dirent.c code.

On Windows with Cygwin, get_dirent_type() already worked for symlinks,
as it does on POSIX systems, because Cygwin has its own fake symlinks
that behave like POSIX (on closer inspection, Cygwin's dirent has the
BSD d_type extension but it's probably always DT_UNKNOWN, so we fall
back to lstat(), which understands Cygwin symlinks with S_ISLNK()).

On Windows with MinGW/MSYS, we need extra code, because the MinGW
runtime has its own readdir() without d_type, and the lstat()-based
fallback has no knowledge of our convention for treating junctions as
symlinks.

Back-patch to 14, where get_dirent_type() landed.

Reported-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/b9ddf605-6b36-f90d-7c30-7b3e95c46276%40dunslane.net
2022-07-28 14:27:28 +12:00
Peter Eisentraut a8cca6026e logging: Also add the command prefix to detail and hint messages
This makes the output line up better and allows filtering messages by
command.

Discussion: https://www.postgresql.org/message-id/ba6d4fac-9e33-91f9-94fb-1e4c144a48b9@enterprisedb.com
2022-05-30 07:26:06 +02:00
Tom Lane 23e7b38bfe Pre-beta mechanical code beautification.
Run pgindent, pgperltidy, and reformat-dat-files.
I manually fixed a couple of comments that pgindent uglified.
2022-05-12 15:17:30 -04:00
Daniel Gustafsson 17ec5fa502 Clear the OpenSSL error queue before cryptohash operations
Setting up an EVP context for ciphers banned under FIPS generates
two OpenSSL errors in the queue, and as we only consume one from
the queue, the other is left at the head for the next invocation:

  postgres=# select md5('foo');
  ERROR:  could not compute MD5 hash: unsupported
  postgres=# select md5('foo');
  ERROR:  could not compute MD5 hash: initialization error

Clearing the error queue when creating the context ensures that
we don't pull in an error from an earlier operation.

Discussion: https://postgr.es/m/C89D932C-501E-4473-9750-638CFCD9095E@yesql.se
2022-05-06 14:41:31 +02:00
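
A minimal sketch of the fix's idea from the commit above, not the actual
cryptohash code; ERR_clear_error() and EVP_DigestInit_ex() are standard
OpenSSL calls, while start_md5() is illustrative:

  #include <openssl/err.h>
  #include <openssl/evp.h>

  static int
  start_md5(EVP_MD_CTX *ctx)
  {
      ERR_clear_error();                        /* drop errors queued by earlier operations */
      if (EVP_DigestInit_ex(ctx, EVP_md5(), NULL) != 1)
          return -1;                            /* e.g. MD5 banned under FIPS */
      return 0;
  }
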
Andrew Dunstan b787c554c2 Inhibit mingw CRT's auto-globbing of command line arguments
For some reason by default the mingw C Runtime takes it upon itself to
expand program arguments that look like shell globbing characters. That
has caused much scratching of heads and mis-attribution of the causes of
some TAP test failures, so stop doing that.

This removes an inconsistency with Windows binaries built with MSVC,
which have no such behaviour.

Per suggestion from Noah Misch.

Backpatch to all live branches.

Discussion: https://postgr.es/m/20220423025927.GA1274057@rfd.leadboat.com
2022-04-25 15:47:55 -04:00
Tom Lane 587de223f0 Add missing error handling in pg_md5_hash().
It failed to provide an error string as expected for the
admittedly-unlikely case of OOM in pg_cryptohash_create().
Also, make it initialize *errstr to NULL for success,
as pg_md5_binary() does.

Also add missing comments.  Readers should not have to
reverse-engineer the API spec for a publicly visible routine.
2022-04-18 20:04:55 -04:00
Alvaro Herrera 24d2b2680a
Remove extraneous blank lines before block-closing braces
These are useless and distracting.  We wouldn't have written the code
with them to begin with, so there's no reason to keep them.

Author: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com
Discussion: https://postgr.es/m/attachment/133167/0016-Extraneous-blank-lines.patch
2022-04-13 19:16:02 +02:00
Tom Lane 2c9381840f Remove not-very-useful early checks of __pg_log_level in logging.h.
Enforce __pg_log_level message filtering centrally in logging.c,
instead of relying on the calling macros to do it.  This is more
reliable (e.g. it works correctly for direct calls to pg_log_generic)
and it saves a percent or so of total code size because we get rid of
so many duplicate checks of __pg_log_level.

This does mean that argument expressions in a logging macro will be
evaluated even if we end up not printing anything.  That seems of
little concern for INFO and higher levels as those messages are printed
by default, and most of our frontend programs don't even offer a way to
turn them off.  I left the unlikely() checks in place for DEBUG
messages, though.

Discussion: https://postgr.es/m/3993549.1649449609@sss.pgh.pa.us
2022-04-12 13:25:29 -04:00
Michael Paquier a4b57543ac Rename backup_compression.{c,h} to compression.{c,h}
Compression option handling (level, algorithm or even workers) can be
used across several parts of the system, not only base backups.
Structures, objects and routines are renamed accordingly, removing the
concept of base backups from this part of the code, which makes this
change straightforward.

pg_receivewal, which has gained support for LZ4 since babbbb5, will make
use of this infrastructure for its set of compression options, bringing
more consistency with pg_basebackup.  This cleanup needs to be done
before releasing a beta of 15.  pg_dump is a potential future target, as
well, and adding more compression options to it may happen in 16~.

Author: Michael Paquier
Reviewed-by: Robert Haas, Georgios Kokolatos
Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
2022-04-12 13:38:54 +09:00
Peter Eisentraut c215cc7b61 Add color support for new frontend detail/hint messages
As before, the defaults are similar to gcc's default appearance.
2022-04-11 17:36:44 +02:00
Tom Lane 9a374b77fb Improve frontend error logging style.
Get rid of the separate "FATAL" log level, as it was applied
so inconsistently as to be meaningless.  This mostly involves
s/pg_log_fatal/pg_log_error/g.

Create a macro pg_fatal() to handle the common use-case of
pg_log_error() immediately followed by exit(1).  Various
modules had already invented either this or equivalent macros;
standardize on pg_fatal() and apply it where possible.

Invent the ability to add "detail" and "hint" messages to a
frontend message, much as we have long had in the backend.

Except where rewording was needed to convert existing coding
to detail/hint style, I have (mostly) resisted the temptation
to change existing message wording.

Patch by me.  Design and patch reviewed at various stages by
Robert Haas, Kyotaro Horiguchi, Peter Eisentraut and
Daniel Gustafsson.

Discussion: https://postgr.es/m/1363732.1636496441@sss.pgh.pa.us
2022-04-08 14:55:14 -04:00
Robert Haas 8e053dc6df Fix possible NULL-pointer-dereference in backup_compression.c.
Per Coverity and Tom Lane. Reviewed by Tom Lane and Justin Pryzby.

Discussion: http://postgr.es/m/384291.1648403267@sss.pgh.pa.us
2022-03-30 15:53:08 -04:00
Robert Haas 51c0d186d9 Allow parallel zstd compression when taking a base backup.
libzstd allows transparent parallel compression just by setting
an option when creating the compression context, so permit that
for both client and server-side backup compression. To use this,
use something like pg_basebackup --compress WHERE-zstd:workers=N
where WHERE is "client" or "server" and N is an integer.

When compression is performed on the server side, this will spawn
threads inside the PostgreSQL backend. While there is almost no
PostgreSQL server code which is thread-safe, the threads here are used
internally by libzstd and touch only data structures controlled by
libzstd.

Patch by me, based in part on earlier work by Dipesh Pandit
and Jeevan Ladhe. Reviewed by Justin Pryzby.

Discussion: http://postgr.es/m/CA+Tgmobj6u-nWF-j=FemygUhobhryLxf9h-wJN7W-2rSsseHNA@mail.gmail.com
2022-03-30 09:41:26 -04:00
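
A minimal sketch of the libzstd option behind workers=N from the commit
above; the calls are standard libzstd, while create_parallel_cctx() is
an illustrative wrapper, not the PostgreSQL code:

  #include <zstd.h>

  static ZSTD_CCtx *
  create_parallel_cctx(int level, int nworkers)
  {
      ZSTD_CCtx  *cctx = ZSTD_createCCtx();

      if (cctx == NULL)
          return NULL;
      /* request worker threads; fails if libzstd lacks multithreading support */
      if (ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, level)) ||
          ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, nworkers)))
      {
          ZSTD_freeCCtx(cctx);
          return NULL;
      }
      return cctx;
  }
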
Peter Eisentraut c64fb698d0 Make update-unicode target work in vpath builds
Author: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/616c6873-83b5-85c0-93cb-548977c39c60@enterprisedb.com
2022-03-25 09:47:50 +01:00
Robert Haas 68d8f9bfb2 In get_bc_algorithm_name, add a dummy return statement.
This code shouldn't be reached, but having it here might avoid a
compiler warning.

Per CI complaint from Andres Freund.

Discussion: http://postgr.es/m/C6A7643A-582B-47F7-A03D-01736BC0349B@anarazel.de
2022-03-23 11:37:12 -04:00
Robert Haas ffd53659c4 Replace BASE_BACKUP COMPRESSION_LEVEL option with COMPRESSION_DETAIL.
There are more compression parameters that can be specified than just
an integer compression level, so rename the new COMPRESSION_LEVEL
option to COMPRESSION_DETAIL before it gets released. Introduce a
flexible syntax for that option to allow arbitrary options to be
specified without needing to adjust the main replication grammar,
and common code to parse it that is shared between the client and
the server.

This commit doesn't actually add any new compression parameters,
so the only user-visible change is that you can now type something
like pg_basebackup --compress gzip:level=5 instead of writing just
pg_basebackup --compress gzip:5. However, it should make it easy to
add new options. If for example gzip starts offering fries, we can
support pg_basebackup --compress gzip:level=5,fries=true for the
benefit of users who want fries with that.

Along the way, this fixes a few things in pg_basebackup so that
pg_basebackup can be used with a server-side compression algorithm
that pg_basebackup itself does not understand. For example,
pg_basebackup --compress server-lz4 could still succeed even if
only the server and not the client has LZ4 support, provided that
the other options to pg_basebackup don't require the client to
decompress the archive.

Patch by me. Reviewed by Justin Pryzby and Dagfinn Ilmari Mannsåker.

Discussion: http://postgr.es/m/CA+TgmoYvpetyRAbbg1M8b3-iHsaN4nsgmWPjOENu5-doHuJ7fA@mail.gmail.com
2022-03-23 09:19:14 -04:00
John Naylor 4b35408f1e Use bitwise rotate functions in more places
There were a number of places in the code that used bespoke bit-twiddling
expressions to do bitwise rotation. While we've had pg_rotate_right32()
for a while now, we hadn't gotten around to standardizing on that. Do so
now. Since many potential call sites look more natural with the "left"
equivalent, add that function too.

Reviewed by Tom Lane and Yugo Nagata

Discussion:
https://www.postgresql.org/message-id/CAFBsxsH7c1LC0CGZ0ADCBXLHU5-%3DKNXx-r7tHYPAW51b2HK4Qw%40mail.gmail.com
2022-02-20 13:22:08 +07:00
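
A minimal sketch of a 32-bit rotate helper in the style named by the
commit above; the real pg_rotate_left32()/pg_rotate_right32() live in
pg_bitutils.h and may differ in detail:

  #include <stdint.h>

  /* Assumes 0 < n < 32, as shifting by the full width is undefined in C. */
  static inline uint32_t
  rotate_left32(uint32_t word, int n)
  {
      return (word << n) | (word >> (32 - n));
  }
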
Tom Lane 291ec6e45e Suppress integer-overflow compiler warning for inconsistent sun_len.
On AIX 7.1, struct sockaddr_un is declared to be 1025 bytes long,
but the sun_len field that should hold the length is only a byte.
Clamp the value we try to store to ensure it will fit in the field.

(This coding might need adjustment if there are any machines out
there where sun_len is as wide as size_t; but a preliminary survey
suggests there's not, so let's keep it simple.)

Discussion: https://postgr.es/m/2781112.1644819528@sss.pgh.pa.us
2022-02-14 11:25:46 -05:00
John Naylor d3f45323bb Improve code clarity in epilogue of UTF-8 verification fast path
The previous coding was correct, but the style and commentary were a bit
vague about which operations had to happen, in what circumstances, and
in what order. Rearrange so that the epilogue does nothing in the DFA END
state. That allows turning some conditional statements in the backtracking
logic into asserts. With that, we can be more explicit about needing
to backtrack at least one byte in non-END states to ensure checking the
current byte sequence in the slow path. No change to the regression tests,
since they should be able catch deficiencies here already.

In passing, improve the comments around DFA states where the first
continuation byte has a restricted range.
2022-01-17 22:53:50 -05:00
Michael Paquier 5513dc6a30 Improve error handling of HMAC computations
This is similar to b69aba7, except that this completes the work for
HMAC with a new routine called pg_hmac_error() that would provide more
context about the type of error that happened during a HMAC computation:
- The fallback HMAC implementation in hmac.c relies on cryptohashes, so
in some code paths it is necessary to return back the error generated by
cryptohashes.
- For the OpenSSL implementation (hmac_openssl.c), the logic is very
similar to cryptohash_openssl.c, where the error context comes from
OpenSSL if one of its internal routines failed, with different error
codes if something internal to hmac_openssl.c failed or was incorrect.

All in-core code paths that use the centralized HMAC interface are
related to SCRAM, where errors are unlikely to happen and only SHA-256
is used.  It would be possible to see errors when computing some HMACs
with MD5, for example with OpenSSL FIPS enabled, and this commit would
help in reporting the correct errors, but nothing in core uses that.
So, at the end, no backpatch to v14 is done, at least for now.

Errors in SCRAM related to the computation of the server key, stored
key, etc. need to pass down the potential error context string across
more layers of their respective call stacks for the frontend and the
backend, so each surrounding routine is adapted for this purpose.

Reviewed-by: Sergey Shinderuk
Discussion: https://postgr.es/m/Yd0N9tSAIIkFd+qi@paquier.xyz
2022-01-13 16:17:21 +09:00
Michael Paquier 87f29f4fcc Fix incorrect comments in hmac.c and hmac_openssl.c
Both files referred to pg_hmac_ctx->data, which, I guess, comes from
early versions of the patch that resulted in commit e6bdfd9.

Author: Sergey Shinderuk
Discussion: https://postgr.es/m/8cbb56dd-63d6-a581-7a65-25a97ac4be03@postgrespro.ru
Backpatch-through: 14
2022-01-13 09:43:36 +09:00
Michael Paquier 9a3d8e1886 Fix comment related to pg_cryptohash_error()
One of the comments introduced in b69aba7 was worded a bit weirdly, so
improve it.

Reported-by: Sergey Shinderuk
Discussion: https://postgr.es/m/71b9a5d2-a3bf-83bc-a243-93dcf0bcfb3b@postgrespro.ru
Backpatch-through: 14
2022-01-12 12:39:36 +09:00
Michael Paquier b69aba7457 Improve error handling of cryptohash computations
The existing cryptohash facility was causing problems in some code paths
related to MD5 (frontend and backend) that relied on the assumption that
the only type of error that could happen would be an OOM, as the MD5
implementation used in PostgreSQL ~13 (the in-core implementation is
used when compiling with or without OpenSSL in those older versions)
could fail only under that circumstance.

The new cryptohash facilities can fail for reasons other than OOMs, like
attempting MD5 when FIPS is enabled (upstream OpenSSL allows that up to
1.0.2, Fedora and Photon patch OpenSSL 1.1.1 to allow that), so this
would cause incorrect reports to show up.

This commit extends the cryptohash APIs so that callers of those
routines can fetch more context when an error happens, by using a new
routine called pg_cryptohash_error().  The error states are stored
within each implementation's internal context data, so that the logic
can be extended as suited to each implementation.  The default
implementation requires few error states, but OpenSSL could report
various issues depending on its internal state, so more is needed in
cryptohash_openssl.c, and the code is shaped so that we are always able
to grab the necessary information.

The core code is changed to adapt to the new error routine, painting
more "const" across the call stack where the static errors are stored,
particularly in authentication code paths on variables that provide
log details.  This way, any future changes would warn if attempting to
free these strings.  The MD5 authentication code was also a bit blurry
about the handling of "logdetail" (LOG sent to the postmaster), so
improve the comments related to that while at it.

The origin of the problem is 87ae969, which introduced the centralized
cryptohash facility.  Extra changes are done for pgcrypto in v14 for the
non-OpenSSL code path to cope with the improvements done by this
commit.

Reported-by: Michael Mühlbeyer
Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/89B7F072-5BBE-4C92-903E-D83E865D9367@trivadis.com
Backpatch-through: 14
2022-01-11 09:55:16 +09:00
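
A hedged usage sketch of the extended API from the commit above; the
header path and exact signatures are assumptions based on
src/include/common/cryptohash.h, and md5_init_or_error() is illustrative:

  #include "common/cryptohash.h"

  static const char *
  md5_init_or_error(pg_cryptohash_ctx **ctx)
  {
      *ctx = pg_cryptohash_create(PG_MD5);
      if (*ctx == NULL)
          return "out of memory";
      if (pg_cryptohash_init(*ctx) < 0)
          return pg_cryptohash_error(*ctx);   /* e.g. MD5 rejected under FIPS */
      return NULL;                            /* success */
  }
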
Thomas Munro f3e78069db Make EXEC_BACKEND more convenient on Linux and FreeBSD.
Try to disable ASLR when building in EXEC_BACKEND mode, to avoid random
memory mapping failures while testing.  For developer use only, no
effect on regular builds.

Suggested-by: Andres Freund <andres@anarazel.de>
Tested-by: Bossart, Nathan <bossartn@amazon.com>
Discussion: https://postgr.es/m/20210806032944.m4tz7j2w47mant26%40alap3.anarazel.de
2022-01-11 00:04:33 +13:00
Bruce Momjian 27b77ecf9f Update copyright for 2022
Backpatch-through: 10
2022-01-07 19:04:57 -05:00
John Naylor 911588a3f8 Add fast path for validating UTF-8 text
Our previous validator used a traditional algorithm that performed
comparison and branching one byte at a time. It's useful in that
we always know exactly how many bytes we have validated, but that
precision comes at a cost. Input validation can show up prominently
in profiles of COPY FROM, and future improvements to COPY FROM such
as parallelism or faster line parsing will put more pressure on input
validation. Hence, add fast paths for both ASCII and multibyte UTF-8:

Use bitwise operations to check 16 bytes at a time for ASCII. If
that fails, use a "shift-based" DFA on those bytes to handle the
general case, including multibyte. These paths are relatively free
of branches and thus robust against all kinds of byte patterns. With
these algorithms, UTF-8 validation is several times faster, depending
on platform and the input byte distribution.

The previous coding in pg_utf8_verifystr() is retained for short
strings and for when the fast path returns an error.

Review, performance testing, and additional hacking by: Heikki
Linnakangas, Vladimir Sitnikov, Amit Khandekar, Thomas Munro, and
Greg Stark

Discussion:
https://www.postgresql.org/message-id/CAFBsxsEV_SzH%2BOLyCiyon%3DiwggSyMh_eF6A3LU2tiWf3Cy2ZQg%40mail.gmail.com
2021-12-20 10:07:29 -04:00
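
A simplified, hedged sketch of the bitwise ASCII fast path described in
the commit above; the real is_valid_ascii() also rejects embedded zero
bytes and uses PostgreSQL's own types, while chunk_is_ascii() here only
shows the high-bit trick:

  #include <stdbool.h>
  #include <stdint.h>
  #include <string.h>

  /* Assumes len is a multiple of 16. */
  static bool
  chunk_is_ascii(const unsigned char *s, size_t len)
  {
      uint64_t highbits = 0;

      while (len >= 16)
      {
          uint64_t a, b;

          memcpy(&a, s, 8);
          memcpy(&b, s + 8, 8);
          highbits |= a | b;        /* accumulate every byte's high bit */
          s += 16;
          len -= 16;
      }
      /* any byte >= 0x80 leaves a bit set under this mask */
      return (highbits & UINT64_C(0x8080808080808080)) == 0;
  }
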
Michael Paquier 6fb7c5d67c Centralize timestamp computation of control file on updates
This commit moves the timestamp computation of the control file within
the routine of src/common/ in charge of updating the backend's control
file, which is shared by multiple frontend tools (pg_rewind,
pg_checksums and pg_resetwal) and the backend itself.

The direct effect of this change is that the control file's timestamp
is now updated when pg_rewind and pg_checksums write the control file,
which helps keep track of control file updates by those operations,
something the backend also tracks at startup within its logs.  The
previous behavior is arguably a bug, as ControlFileData->time should be
updated each time a new version of the control file is written, but as
this is a behavior change, no backpatch is done.

Author: Amul Sul
Reviewed-by: Nathan Bossart, Michael Paquier, Bharath Rupireddy
Discussion: https://postgr.es/m/CAAJ_b97nd_ghRpyFV9Djf9RLXkoTbOUqnocq11WGq9TisX09Fw@mail.gmail.com
2021-11-29 13:36:13 +09:00
Tom Lane 3804539e48 Replace random(), pg_erand48(), etc with a better PRNG API and algorithm.
Standardize on xoroshiro128** as our basic PRNG algorithm, eliminating
a bunch of platform dependencies as well as fundamentally-obsolete PRNG
code.  In addition, this API replacement will ease replacing the
algorithm again in future, should that become necessary.

xoroshiro128** is a few percent slower than the drand48 family,
but it can produce full-width 64-bit random values, not just 48-bit ones,
and it should be much more trustworthy.  It's likely to be noticeably
faster than the platform's random(), depending on which platform you
are thinking about; and we can have non-global state vectors easily,
unlike with random().  It is not cryptographically strong, but neither
are the functions it replaces.

Fabien Coelho, reviewed by Dean Rasheed, Aleksander Alekseev, and myself

Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2105241211230.165418@pseudo
2021-11-28 21:33:07 -05:00
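
A minimal sketch of the public-domain xoroshiro128** 1.0 step (Blackman
and Vigna) named in the commit above; PostgreSQL wraps the algorithm in
its own pg_prng API, which is not reproduced here, and the seed
constants are arbitrary nonzero values:

  #include <stdint.h>

  static uint64_t prng_state[2] = { 0x9E3779B97F4A7C15ULL, 0xBF58476D1CE4E5B9ULL };

  static inline uint64_t
  rotl64(uint64_t x, int k)
  {
      return (x << k) | (x >> (64 - k));
  }

  static uint64_t
  xoroshiro128ss_next(void)
  {
      uint64_t s0 = prng_state[0];
      uint64_t s1 = prng_state[1];
      uint64_t result = rotl64(s0 * 5, 7) * 9;

      s1 ^= s0;
      prng_state[0] = rotl64(s0, 24) ^ s1 ^ (s1 << 16);
      prng_state[1] = rotl64(s1, 37);
      return result;
  }
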
Tom Lane 46d665bc26 Allow psql's other uses of simple_prompt() to be interrupted by ^C.
This fills in the work left undone by 5f1148224.  \prompt can
now be canceled out of, and so can password prompts issued during
\connect.  (We don't need to do anything for password prompts
issued during startup, because we aren't yet trapping SIGINT
at that point.)

Nathan Bossart

Discussion: https://postgr.es/m/747443.1635536754@sss.pgh.pa.us
2021-11-19 12:11:46 -05:00
Tom Lane 5f1148224b Provide a variant of simple_prompt() that can be interrupted by ^C.
Up to now, you couldn't escape out of psql's \password command
by typing control-C (or other local spelling of SIGINT).  This
is pretty user-unfriendly, so improve it.  To do so, we have to
modify the functions provided by pg_get_line.c; but we don't
want to mess with psql's SIGINT handler setup, so provide an
API that lets that handler cause the cancel to occur.

This relies on the assumption that we won't do any major harm by
longjmp'ing out of fgets().  While that's obviously a little shaky,
we've long had the same assumption in the main input loop, and few
issues have been reported.

psql has some other simple_prompt() calls that could usefully
be improved the same way; for now, just deal with \password.

Nathan Bossart, minor tweaks by me

Discussion: https://postgr.es/m/747443.1635536754@sss.pgh.pa.us
2021-11-17 19:09:54 -05:00
Michael Paquier 098c134556 Fix buffer overrun in unicode string normalization with empty input
PostgreSQL 13 and newer versions are directly impacted by that through
the SQL function normalize(): a call of this function would write one
byte past its allocation when given an empty string as input and
recomposing the string with NFC or NFKC.  Older versions (v10~v12) are
not directly affected by this problem, as the only code path using
normalization is SASLprep in SCRAM authentication, which forbids an
empty string, but let's make the code more robust there anyway so that
any out-of-core callers of this function are covered.

The solution chosen to fix this issue is simple, with the addition of a
fast-exit path if the decomposed string is found to be empty.  This can
only happen for an empty string, since at its lowest level a codepoint
is decomposed as itself if it has no entry in the decomposition table or
has a decomposition size of 0.

Some tests are added to cover this issue in v13~.  Note that an empty
string has always been considered normalized (grammar "IS NF[K]{C,D}
NORMALIZED", through the SQL function is_normalized()) for all the
operations allowed (NFC, NFD, NFKC and NFKD) since this feature was
introduced in 2991ac5.  This behavior is unchanged, but some tests are
added in v13~ to check it.

I have also checked "make normalization-check" in src/common/unicode/,
while on it (works in 13~, and breaks in older stable branches
independently of this commit).

The release notes should just mention this commit for v13~.

Reported-by: Matthijs van der Vleuten
Discussion: https://postgr.es/m/17277-0c527a373794e802@postgresql.org
Backpatch-through: 10
2021-11-11 15:00:59 +09:00
Daniel Gustafsson 0ded7039fa Fix memory leak in pg_hmac
The intermediate "h" buffer was not freed, causing it to leak.  Backpatch
through 14, where HMAC was refactored to the current API.

Author: Sergey Shinderuk <s.shinderuk@postgrespro.ru>
Discussion: https://postgr.es/m/af07e620-7e28-a742-4637-2bc44aa7c2be@postgrespro.ru
Backpatch-through: 14
2021-10-01 22:47:05 +02:00
Michael Paquier e767ddcd35 Fix typos and grammar in code comments
Several mistakes have piled up in the code comments over time,
including incorrect grammar, function names and simple typos.  This
commit takes care of a portion of these.

No backpatch is done as this is only cosmetic.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20210924215827.GS831@telsasoft.com
2021-09-27 14:21:28 +09:00
John Naylor 5bc429aacb Extend collection of Unicode combining characters to beyond the BMP
The former limit was perhaps a carryover from an older hand-coded
table. Since commit bab982161 we have enough space in mbinterval to
store larger codepoints, so collect all combining characters.

Discussion: https://www.postgresql.org/message-id/49ad1fa0-174e-c901-b14c-c484b60907f1%40enterprisedb.com
2021-08-26 13:07:34 -04:00
John Naylor bab982161e Update display widths as part of updating Unicode
The hardcoded "wide character" set in ucs_wcwidth() was last updated
around the Unicode 5.0 era.  This led to misalignment when printing
emojis and other codepoints that have since been designated
wide or full-width.

To fix and keep up to date, extend update-unicode to download the list
of wide and full-width codepoints from the official sources.

In passing, remove some comments about non-spacing characters that
haven't been accurate since we removed the former hardcoded logic.

Jacob Champion

Reported and reviewed by Pavel Stehule
Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRCeX21O69YHxmykYySYyprZAqrKWWg0KoGKdjgqcGyygg@mail.gmail.com
2021-08-26 10:53:56 -04:00
John Naylor 1563ecbc1b Revert "Rename unicode_combining_table to unicode_width_table"
This reverts commit eb0d0d2c73.

After I had committed eb0d0d2c7 and 78ab944cd, I decided to add
a sanity check for a "can't happen" scenario just to be cautious.
It turned out that it already happened in the official Unicode source
data, namely that a character can be both wide and a combining
character. This fact renders the aforementioned commits unnecessary,
so revert both of them.

Discussion: https://www.postgresql.org/message-id/CAFBsxsH5ejH4-1xaTLpSK8vWoK1m6fA1JBtTM6jmBsLfmDki1g%40mail.gmail.com
2021-08-26 10:06:12 -04:00
John Naylor f8c8a8bccc Revert "Change mbbisearch to return the character range"
This reverts commit 78ab944cd4.

After I had committed eb0d0d2c7 and 78ab944cd, I decided to add
a sanity check for a "can't happen" scenario just to be cautious.
It turned out that it already happened in the official Unicode source
data, namely that a character can be both wide and a combining
character. This fact renders the aforementioned commits unnecessary,
so revert both of them.

Discussion:
https://www.postgresql.org/message-id/CAFBsxsH5ejH4-1xaTLpSK8vWoK1m6fA1JBtTM6jmBsLfmDki1g%40mail.gmail.com
2021-08-26 09:58:28 -04:00
John Naylor 78ab944cd4 Change mbbisearch to return the character range
Add a width field to mbinterval and have mbbisearch return a
pointer to the found range rather than just bool for success.
A future commit will add another width besides zero, and this
will allow that to use the same search.

Reviewed by Jacob Champion
Discussion: https://www.postgresql.org/message-id/CAFBsxsGOCpzV7c-f3a8ADsA1n4uZ%3D8puCctQp%2Bx7W0vgkv%3Dw%2Bg%40mail.gmail.com
2021-08-25 13:08:11 -04:00
John Naylor eb0d0d2c73 Rename unicode_combining_table to unicode_width_table
No functional changes. A future commit will use this table for
other purposes besides combining characters.
2021-08-25 13:01:35 -04:00
Michael Paquier 2576dcfb76 Revert refactoring of hex code to src/common/
This is a combined revert of the following commits:
- c3826f8, a refactoring piece that moved the hex decoding code to
src/common/.  This code was cleaned up by aef8948, as it originally
included no overflow checks, unlike the base64 routines in src/common/
used by SCRAM, making it unsafe for its purpose.
- aef8948, a more advanced refactoring of the hex encoding/decoding code
to src/common/ that added sanity checks on the result buffer for hex
decoding and encoding.  As reported by Hans Buschmann, those overflow
checks are expensive, and it is possible to see a performance drop in
the decoding/encoding of bytea or LOs the longer they are.  Simple SQLs
working on large bytea values show a clear difference in perf profile.
- ccf4e27, a cleanup made possible by aef8948.

The reverts of all those commits bring the performance of hex decoding
and encoding back to what it was in ~13.  For now and post-beta3, this
is the simplest option.

Reported-by: Hans Buschmann
Discussion: https://postgr.es/m/1629039545467.80333@nidsa.net
Backpatch-through: 14
2021-08-19 09:20:13 +09:00
Michael Paquier b44669b2ca Simplify error handing of jsonapi.c for the frontend
This commit removes a dependency on the central logging facilities in
the JSON parsing routines of src/common/, which existed to log errors
when seeing error codes that do not match any existing value in
JsonParseErrorType, something that should never happen.

The routine providing a detailed error message based on the error code
is made backend-only, the existing code being unsafe to use in the
frontend as the error message may end up palloc'd or point to a static
string, so there is no way to know whether the memory of the message
should be pfree'd or not.  The only user of this routine in the frontend
was pg_verifybackup, which is changed to use a more generic error
message on parsing failure.

Note that making this code more resilient to OOM failures if used in
shared libraries would require much more work as a lot of code paths
still rely on palloc() & friends, but we are not sure yet whether we need
to go down that path.  Still, removing the dependency on logging is a step
toward more portability.

This cleans up the handling of check_stack_depth() while on it, as it
exists only in the backend.

Per discussion with Jacob Champion and Tom Lane.

Discussion: https://postgr.es/m/YNwL7kXwn3Cckbd6@paquier.xyz
2021-07-02 09:35:12 +09:00
Tom Lane 42f94f56bf Fix incautious handling of possibly-miscoded strings in client code.
An incorrectly-encoded multibyte character near the end of a string
could cause various processing loops to run past the string's
terminating NUL, with results ranging from no detectable issue to
a program crash, depending on what happens to be in the following
memory.

This isn't an issue in the server, because we take care to verify
the encoding of strings before doing any interesting processing
on them.  However, that lack of care leaked into client-side code
which shouldn't assume that anyone has validated the encoding of
its input.

Although this is certainly a bug worth fixing, the PG security team
elected not to regard it as a security issue, primarily because
any untrusted text should be sanitized by PQescapeLiteral or
the like before being incorporated into a SQL or psql command.
(If an app fails to do so, the same technique can be used to
cause SQL injection, with probably much more dire consequences
than a mere client-program crash.)  Those functions were already
made proof against this class of problem, cf CVE-2006-2313.

To fix, invent PQmblenBounded() which is like PQmblen() except it
won't return more than the number of bytes remaining in the string.
In HEAD we can make this a new libpq function, as PQmblen() is.
It seems imprudent to change libpq's API in stable branches though,
so in the back branches define PQmblenBounded as a macro in the files
that need it.  (Note that just changing PQmblen's behavior would not
be a good idea; notably, it would completely break the escaping
functions' defense against this exact problem.  So we just want a
version for those callers that don't have any better way of handling
this issue.)

Per private report from houjingyi.  Back-patch to all supported branches.
2021-06-07 14:15:25 -04:00
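
A hedged sketch of the idea behind PQmblenBounded() from the commit
above; the back-branch macro is defined along these lines, but the exact
definition may differ:

  #include <string.h>
  #include "libpq-fe.h"             /* for PQmblen() */

  /* Never report a character length that runs past the string's terminating NUL. */
  #define PQmblenBounded(s, e)  strnlen((s), PQmblen((s), (e)))
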