postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2024-10-05 23:06:52 +02:00

Author	SHA1	Message	Date
Robert Haas	a448e49bcb	Revert 56-bit relfilenode change and follow-up commits. There are still some alignment-related failures in the buildfarm, which might or might not be able to be fixed quickly, but I've also just realized that it increased the size of many WAL records by 4 bytes because a block reference contains a RelFileLocator. The effect of that hasn't been studied or discussed, so revert for now.	2022-09-28 09:55:28 -04:00
Robert Haas	05d4cbf9b6	Increase width of RelFileNumbers from 32 bits to 56 bits. RelFileNumbers are now assigned using a separate counter, instead of being assigned from the OID counter. This counter never wraps around: if all 2^56 possible RelFileNumbers are used, an internal error occurs. As the cluster is limited to 2^64 total bytes of WAL, this limitation should not cause a problem in practice. If the counter were 64 bits wide rather than 56 bits wide, we would need to increase the width of the BufferTag, which might adversely impact buffer lookup performance. Also, this lets us use bigint for pg_class.relfilenode and other places where these values are exposed at the SQL level without worrying about overflow. This should remove the need to keep "tombstone" files around until the next checkpoint when relations are removed. We do that to keep RelFileNumbers from being recycled, but now that won't happen anyway. However, this patch doesn't actually change anything in this area; it just makes it possible for a future patch to do so. Dilip Kumar, based on an idea from Andres Freund, who also reviewed some earlier versions of the patch. Further review and some wordsmithing by me. Also reviewed at various points by Ashutosh Sharma, Vignesh C, Amul Sul, Álvaro Herrera, and Tom Lane. Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com	2022-09-27 13:25:21 -04:00
Peter Eisentraut	26f7802beb	Message style improvements	2022-09-24 18:41:25 -04:00
Michael Paquier	18ac08f0b4	Use min/max bounds defined by Zstd for compression level The bounds hardcoded in compression.c since `ffd5365` (minimum at 1 and maximum at 22) do not match the reality of what zstd is able to handle, these values being available via ZSTD_maxCLevel() and ZSTD_minCLevel() at run-time. The maximum of 22 is actually correct in recent versions, but the minimum was not as the library can go down to -131720 by design. This commit changes the code to use the run-time values in the code instead of some hardcoded ones. Zstd seems to assume that these bounds could change in the future, and Postgres will be able to adapt automatically to such changes thanks to what's being done in this commit. Reported-by: Justin Prysby Discussion: https://postgr.es/m/20220922033716.GL31833@telsasoft.com Backpatch-through: 15	2022-09-22 20:02:40 +09:00
Andres Freund	e6927270cd	meson: Add initial version of meson based build system Autoconf is showing its age, fewer and fewer contributors know how to wrangle it. Recursive make has a lot of hard to resolve dependency issues and slow incremental rebuilds. Our home-grown MSVC build system is hard to maintain for developers not using Windows and runs tests serially. While these and other issues could individually be addressed with incremental improvements, together they seem best addressed by moving to a more modern build system. After evaluating different build system choices, we chose to use meson, to a good degree based on the adoption by other open source projects. We decided that it's more realistic to commit a relatively early version of the new build system and mature it in tree. This commit adds an initial version of a meson based build system. It supports building postgres on at least AIX, FreeBSD, Linux, macOS, NetBSD, OpenBSD, Solaris and Windows (however only gcc is supported on aix, solaris). For Windows/MSVC postgres can now be built with ninja (faster, particularly for incremental builds) and msbuild (supporting the visual studio GUI, but building slower). Several aspects (e.g. Windows rc file generation, PGXS compatibility, LLVM bitcode generation, documentation adjustments) are done in subsequent commits requiring further review. Other aspects (e.g. not installing test-only extensions) are not yet addressed. When building on Windows with msbuild, builds are slower when using a visual studio version older than 2019, because those versions do not support MultiToolTask, required by meson for intra-target parallelism. The plan is to remove the MSVC specific build system in src/tools/msvc soon after reaching feature parity. However, we're not planning to remove the autoconf/make build system in the near future. Likely we're going to keep at least the parts required for PGXS to keep working around until all supported versions build with meson. Some initial help for postgres developers is at https://wiki.postgresql.org/wiki/Meson With contributions from Thomas Munro, John Naylor, Stone Tickle and others. Author: Andres Freund <andres@anarazel.de> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Author: Peter Eisentraut <peter@eisentraut.org> Reviewed-By: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Discussion: https://postgr.es/m/20211012083721.hvixq4pnh2pixr3j@alap3.anarazel.de	2022-09-21 22:37:17 -07:00
Michael Paquier	f352e2d08a	Simplify handling of compression level with compression specifications PG_COMPRESSION_OPTION_LEVEL is removed from the compression specification logic, and instead the compression level is always assigned with each library's default if nothing is directly given. This centralizes the checks on the compression methods supported by a given build, and always assigns a default compression level when parsing a compression specification. This results in complaining at an earlier stage than previously if a build supports a compression method or not, aka when parsing a specification in the backend or the frontend, and not when processing it. zstd, lz4 and zlib are able to handle in their respective routines setting up the compression level the case of a default value, hence the backend or frontend code (pg_receivewal or pg_basebackup) has now no need to know what the default compression level should be if nothing is specified: the logic is now done so as the specification parsing assigns it. It can also be enforced by passing down a "level" set to the default value, that the backend will accept (the replication protocol is for example able to handle a command like BASE_BACKUP (COMPRESSION_DETAIL 'gzip:level=-1')). This code simplification fixes an issue with pg_basebackup --gzip introduced by `ffd5365`, where the tarball of the streamed WAL segments would be created as of pg_wal.tar.gz with uncompressed contents, while the intention is to compress the segments with gzip at a default level. The origin of the confusion comes from the handling of the default compression level of gzip (-1 or Z_DEFAULT_COMPRESSION) and the value of 0 was getting assigned, which is what walmethods.c would consider as equivalent to no compression when streaming WAL segments with its tar methods. Assigning always the compression level removes the confusion of some code paths considering a value of 0 set in a specification as either no compression or a default compression level. Note that 010_pg_basebackup.pl has to be adjusted to skip a few tests where the shape of the compression detail string for client and server-side compression was checked using gzip. This is a result of the code simplification, as gzip specifications cannot be used if a build does not support it. Reported-by: Tom Lane Reviewed-by: Tom Lane Discussion: https://postgr.es/m/1400032.1662217889@sss.pgh.pa.us Backpatch-through: 15	2022-09-14 12:16:57 +09:00
Peter Eisentraut	45b1a67a0f	pg_clean_ascii(): escape bytes rather than lose them Rather than replace each unprintable byte with a '?' character, replace it with a hex escape instead. The API now allocates a copy rather than modifying the input in place. Author: Jacob Champion <jchampion@timescale.com> Discussion: https://www.postgresql.org/message-id/CAAWbhmgsvHrH9wLU2kYc3pOi1KSenHSLAHBbCVmmddW6-mc_=w@mail.gmail.com	2022-09-13 16:10:44 +02:00
John Naylor	0bd9c62973	Treat Unicode codepoints of category "Format" as non-spacing Commit `d8594d123` updated the list of non-spacing codepoints used for calculating display width, but in doing so inadvertently removed some, since the script used for that commit only considered combining characters. For complete coverage for zero-width characters, include codepoints in the category Cf (Format). To reflect the wider purpose, also rename files and update comments that referred specifically to combining characters. Some of these ranges have been missing since v12, but due to lack of field complaints it was determined not important enough to justify adding special-case logic the backbranches. Kyotaro Horiguchi Report by Pavel Stehule Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRBE8yvpQ0FSkPCoe0Ny1jAAsAQ6j3qMgVwWvkqAoaaNmQ%40mail.gmail.com	2022-09-13 16:13:33 +07:00
Peter Eisentraut	5015e1e1b5	Assorted examples of expanded type-safer palloc/pg_malloc API This adds some uses of the new palloc/pg_malloc variants here and there as a demonstration and test. This is kept separate from the actual API patch, since the latter might be backpatched at some point. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/bb755632-2a43-d523-36f8-a1e7a389a907@enterprisedb.com	2022-09-12 08:45:03 +02:00
Michael Paquier	47bd0b3fa6	Replace load of functions by direct calls for some WIN32 This commit changes the following code paths to do direct system calls to some WIN32 functions rather than loading them from an external library, shaving some code in the process: - Creation of restricted tokens in pg_ctl.c, introduced by `a25cd81`. - QuerySecurityContextToken() in auth.c for SSPI authentication in the backend, introduced in `d602592`. - CreateRestrictedToken() in src/common/. This change is similar to the case of pg_ctl.c. Most of these functions were loaded rather than directly called because, as mentioned in the code comments, MinGW headers were not declaring them. I have double-checked the recent MinGW code, and all the functions changed here are declared in its headers, so this change should be safe. Note that I do not have a MinGW environment at hand so I have not tested it directly, but that MSVC was fine with the change. The buildfarm will tell soon enough if this change is appropriate or not for a much broader set of environments. A few code paths still use GetProcAddress() to load some functions: - LDAP authentication for ldap_start_tls_sA(), where I am not confident that this change would work. - win32env.c and win32ntdll.c where we have a per-MSVC version dependency for the name of the library loaded. - crashdump.c for MiniDumpWriteDump() and EnumDirTree(), where direct calls were not able to work after testing. Reported-by: Thomas Munro Reviewed-by: Justin Prysby Discussion: https://postgr.es/m/CA+hUKG+BMdcaCe=P-EjMoLTCr3zrrzqbcVE=8h5LyNsSVHKXZA@mail.gmail.com	2022-09-09 10:52:17 +09:00
John Naylor	0a8de93a48	Speed up lexing of long JSON strings Use optimized linear search when looking ahead for end quotes, backslashes, and non-printable characters. This results in nearly 40% faster JSON parsing on x86-64 when most values are long strings, and all platforms should see some improvement. Reviewed by Andres Freund and Nathan Bossart Discussion: https://www.postgresql.org/message-id/CAFBsxsGhaR2KQ5eisaK%3D6Vm60t%3DaxhD8Ckj1qFoCH1pktZi%2B2w%40mail.gmail.com Discussion: https://www.postgresql.org/message-id/CAFBsxsESLUyJ5spfOSyPrOvKUEYYNqsBosue9SV1j8ecgNXSKA%40mail.gmail.com	2022-09-02 09:36:22 +07:00
Tom Lane	7fed801135	Clean up inconsistent use of fflush(). More than twenty years ago (`79fcde48b`), we hacked the postmaster to avoid a core-dump on systems that didn't support fflush(NULL). We've mostly, though not completely, hewed to that rule ever since. But such systems are surely gone in the wild, so in the spirit of cleaning out no-longer-needed portability hacks let's get rid of multiple per-file fflush() calls in favor of using fflush(NULL). Also, we were fairly inconsistent about whether to fflush() before popen() and system() calls. While we've received no bug reports about that, it seems likely that at least some of these call sites are at risk of odd behavior, such as error messages appearing in an unexpected order. Rather than expend a lot of brain cells figuring out which places are at hazard, let's just establish a uniform coding rule that we should fflush(NULL) before these calls. A no-op fflush() is surely of trivial cost compared to launching a sub-process via a shell; while if it's not a no-op then we likely need it. Discussion: https://postgr.es/m/2923412.1661722825@sss.pgh.pa.us	2022-08-29 13:55:41 -04:00
John Naylor	121d2d3d70	Use SSE2 in is_valid_ascii() where available. Per flame graph from Jelte Fennema, COPY FROM ... USING BINARY shows input validation taking at least 5% of the profile, so it's worth trying to be more efficient here. With this change, validation of pure ASCII is nearly 40% faster on contemporary Intel hardware. To make this change legible and easier to adopt to additional architectures, use helper functions to abstract the platform details away. Reviewed by Nathan Bossart Discussion: https://www.postgresql.org/message-id/CAFBsxsG%3Dk8t%3DC457FXnoBXb%3D8iA4OaZkbFogFMachWif7mNnww%40mail.gmail.com	2022-08-26 15:48:49 +07:00
Thomas Munro	c981879814	Don't bother to set sockaddr_un.sun_len. It's not necessary to fill in sun_len when calling bind() or connect(), on all known systems that have it. Discussion: https://postgr.es/m/2781112.1644819528%40sss.pgh.pa.us	2022-08-24 00:09:37 +12:00
Thomas Munro	64ef572c06	Remove configure probes for sockaddr_storage members. Remove four probes for members of sockaddr_storage. Keep only the probe for sockaddr's sa_len, which is enough for our two remaining places that know about _len fields: 1. ifaddr.c needs to know if sockaddr has sa_len to understand the result of ioctl(SIOCGIFCONF). Only AIX is still using the relevant code today, but it seems like a good idea to keep it compilable on Linux. 2. ip.c was testing for presence of ss_len to decide whether to fill in sun_len in our getaddrinfo_unix() function. It's just as good to test for sa_len. If you have one, you have them all. (The code in #2 isn't actually needed at all on several OSes I checked since modern versions ignore sa_len on input to system calls. Proving that's the case for all relevant OSes is left for another day, but wouldn't get rid of that last probe anyway if we still want it for #1.) Discussion: https://postgr.es/m/CA%2BhUKGJJjF2AqdU_Aug5n2MAc1gr%3DGykNjVBZq%2Bd6Jrcp3Dyvg%40mail.gmail.com	2022-08-22 17:50:30 +12:00
Thomas Munro	2492fe49dc	Remove configure probe for netinet/tcp.h. <netinet/tcp.h> is in SUSv3 and all targeted Unix systems have it. For Windows, we can provide a stub include file, to avoid some #ifdef noise. Discussion: https://postgr.es/m/CA+hUKGKErNfhmvb_H0UprEmp4LPzGN06yR2_0tYikjzB-2ECMw@mail.gmail.com	2022-08-18 16:31:11 +12:00
Thomas Munro	f558088285	Remove HAVE_UNIX_SOCKETS. Since HAVE_UNIX_SOCKETS is now defined unconditionally, remove the macro and drop a small amount of dead code. The last known systems not to have them (as far as I know at least) were QNX, which we de-supported years ago, and Windows, which now has them. If a new OS ever shows up with the POSIX sockets API but without working AF_UNIX, it'll presumably still be able to compile the code, and fail at runtime with an unsupported address family error. We might want to consider adding a HINT that you should turn off the option to use it if your network stack doesn't support it at that point, but it doesn't seem worth making the relevant code conditional at compile time. Also adjust a couple of places in the docs and comments that referred to builds without Unix-domain sockets, since there aren't any. Windows still gets a special mention in those places, though, because we don't try to use them by default there yet. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Discussion: https://postgr.es/m/CA%2BhUKG%2BL_3brvh%3D8e0BW_VfX9h7MtwgN%3DnFHP5o7X2oZucY9dg%40mail.gmail.com	2022-08-14 08:46:53 +12:00
Thomas Munro	5fc88c5d53	Replace pgwin32_is_junction() with lstat(). Now that lstat() reports junction points with S_IFLNK/S_ISLINK(), and unlink() can unlink them, there is no need for conditional code for Windows in a few places. That was expressed by testing for WIN32 or S_ISLNK, which we can now constant-fold. The coding around pgwin32_is_junction() was a bit suspect anyway, as we never checked for errors, and we also know that errors can be spuriously reported because of transient sharing violations on this OS. The lstat()-based code has handling for that. This also reverts `4fc6b6ee` on master only. That was done because lstat() didn't previously work for symlinks (junction points), but now it does. Tested-by: Andrew Dunstan <andrew@dunslane.net> Discussion: https://postgr.es/m/CA%2BhUKGLfOOeyZpm5ByVcAt7x5Pn-%3DxGRNCvgiUPVVzjFLtnY0w%40mail.gmail.com	2022-08-06 12:50:59 +12:00
Thomas Munro	2b1f580ee2	Remove configure probes for symlink/readlink, and dead code. symlink() and readlink() are in SUSv2 and all targeted Unix systems have them. We have partial emulation on Windows. Code that raised runtime errors on systems without it has been dead for years, so we can remove that and also references to such systems in the documentation. Define HAVE_READLINK and HAVE_SYMLINK macros on Unix. Our Windows replacement functions based on junction points can't be used for relative paths or for non-directories, so the macros can be used to check for full symlink support. The places that deal with tablespaces can just use symlink functions without checking the macros. (If they did check the macros, they'd need to provide an #else branch with a runtime or compile time error, and it'd be dead code.) Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com	2022-08-05 09:22:56 +12:00
Thomas Munro	4fc6b6eefc	Fix get_dirent_type() for symlinks on MinGW/MSYS. On Windows with MSVC, get_dirent_type() was recently made to return DT_LNK for junction points by commit `9d3444dc`, which fixed some defective dirent.c code. On Windows with Cygwin, get_dirent_type() already worked for symlinks, as it does on POSIX systems, because Cygwin has its own fake symlinks that behave like POSIX (on closer inspection, Cygwin's dirent has the BSD d_type extension but it's probably always DT_UNKNOWN, so we fall back to lstat(), which understands Cygwin symlinks with S_ISLNK()). On Windows with MinGW/MSYS, we need extra code, because the MinGW runtime has its own readdir() without d_type, and the lstat()-based fallback has no knowledge of our convention for treating junctions as symlinks. Back-patch to 14, where get_dirent_type() landed. Reported-by: Andrew Dunstan <andrew@dunslane.net> Discussion: https://postgr.es/m/b9ddf605-6b36-f90d-7c30-7b3e95c46276%40dunslane.net	2022-07-28 14:26:12 +12:00
Andres Freund	c8a9246e09	Add output directory argument to generate-unicode_norm_table.pl This is in preparation for building postgres with meson / ninja. When building with meson, commands are run at the root of the build tree. Add an option to put build output into the appropriate place. Author: Andres Freund <andres@anarazel.de> Author: Peter Eisentraut <peter@eisentraut.org> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/5e216522-ba3c-f0e6-7f97-5276d0270029@enterprisedb.com	2022-07-18 12:24:39 -07:00
Peter Eisentraut	9fd45870c1	Replace many MemSet calls with struct initialization This replaces all MemSet() calls with struct initialization where that is easily and obviously possible. (For example, some cases have to worry about padding bits, so I left those.) (The same could be done with appropriate memset() calls, but this patch is part of an effort to phase out MemSet(), so it doesn't touch memset() calls.) Reviewed-by: Ranier Vilela <ranier.vf@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://www.postgresql.org/message-id/9847b13c-b785-f4e2-75c3-12ec77a3b05c@enterprisedb.com	2022-07-16 08:50:49 +02:00
Tom Lane	920072339f	Improve error reporting from validate_exec(). validate_exec() didn't guarantee to set errno to something appropriate after a failure, leading to callers not being able to print an on-point message. Improve that. Noted by Kyotaro Horiguchi, though this isn't exactly his proposal. Discussion: https://postgr.es/m/20220615.131403.1791191615590453058.horikyota.ntt@gmail.com	2022-07-12 15:37:39 -04:00
John Naylor	d3117fc1a3	Fix out-of-bounds read in json_lex_string Commit `3838fa269` added a lookahead loop to allow building strings multiple bytes at a time. This loop could exit because it reached the end of input, yet did not check for that before checking if we reached the end of a valid string. To fix, put the end of string check back in the outer loop. Per Valgrind animal skink	2022-07-12 11:25:47 +07:00
John Naylor	3838fa269c	Build de-escaped JSON strings in larger chunks during lexing During COPY BINARY with large JSONB blobs, it was found that half the time was spent parsing JSON, with much of that spent in separate appendStringInfoChar() calls for each input byte. Add lookahead loop to json_lex_string() to allow batching multiple bytes via appendBinaryStringInfo(). Also use this same logic when de-escaping is not done, to avoid code duplication. Report and proof of concept patch by Jelte Fennema, reworked by Andres Freund and John Naylor Discussion: https://www.postgresql.org/message-id/CAGECzQQuXbies_nKgSiYifZUjBk6nOf2%3DTSXqRjj2BhUh8CTeA%40mail.gmail.com Discussion: https://www.postgresql.org/message-id/flat/PR3PR83MB0476F098CBCF68AF7A1CA89FF7B49@PR3PR83MB0476.EURPRD83.prod.outlook.com	2022-07-11 11:11:36 +07:00
John Naylor	3de359f18f	Simplify json lexing state Instead of updating the length as we go, use a const pointer to end of the input, which we know already at the start. This simplifies the coding and may improve performance slightly, but the real motivation for doing this is to make further changes in this area easier to reason about. Discussion: https://www.postgresql.org/message-id/CAFBsxsGhaR2KQ5eisaK%3D6Vm60t%3DaxhD8Ckj1qFoCH1pktZi%2B2w%40mail.gmail.com	2022-07-08 14:53:20 +07:00
Robert Haas	b0a55e4329	Change internal RelFileNode references to RelFileNumber or RelFileLocator. We have been using the term RelFileNode to refer to either (1) the integer that is used to name the sequence of files for a certain relation within the directory set aside for that tablespace/database combination; or (2) that value plus the OIDs of the tablespace and database; or occasionally (3) the whole series of files created for a relation based on those values. Using the same name for more than one thing is confusing. Replace RelFileNode with RelFileNumber when we're talking about just the single number, i.e. (1) from above, and with RelFileLocator when we're talking about all the things that are needed to locate a relation's files on disk, i.e. (2) from above. In the places where we refer to (3) as a relfilenode, instead refer to "relation storage". Since there is a ton of SQL code in the world that knows about pg_class.relfilenode, don't change the name of that column, or of other SQL-facing things that derive their name from it. On the other hand, do adjust closely-related internal terminology. For example, the structure member names dbNode and spcNode appear to be derived from the fact that the structure itself was called RelFileNode, so change those to dbOid and spcOid. Likewise, various variables with names like rnode and relnode get renamed appropriately, according to how they're being used in context. Hopefully, this is clearer than before. It is also preparation for future patches that intend to widen the relfilenumber fields from its current width of 32 bits. Variables that store a relfilenumber are now declared as type RelFileNumber rather than type Oid; right now, these are the same, but that can now more easily be changed. Dilip Kumar, per an idea from me. Reviewed also by Andres Freund. I fixed some whitespace issues, changed a couple of words in a comment, and made one other minor correction. Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com	2022-07-06 11:39:09 -04:00
Peter Eisentraut	02c408e21a	Remove redundant null pointer checks before free() Per applicable standards, free() with a null pointer is a no-op. Systems that don't observe that are ancient and no longer relevant. Some PostgreSQL code already required this behavior, so this change does not introduce any new requirements, just makes the code more consistent. Discussion: https://www.postgresql.org/message-id/flat/dac5d2d0-98f5-94d9-8e69-46da2413593d%40enterprisedb.com	2022-07-03 11:47:15 +02:00
Peter Eisentraut	a8cca6026e	logging: Also add the command prefix to detail and hint messages This makes the output line up better and allows filtering messages by command. Discussion: https://www.postgresql.org/message-id/ba6d4fac-9e33-91f9-94fb-1e4c144a48b9@enterprisedb.com	2022-05-30 07:26:06 +02:00
Tom Lane	23e7b38bfe	Pre-beta mechanical code beautification. Run pgindent, pgperltidy, and reformat-dat-files. I manually fixed a couple of comments that pgindent uglified.	2022-05-12 15:17:30 -04:00
Daniel Gustafsson	17ec5fa502	Clear the OpenSSL error queue before cryptohash operations Setting up an EVP context for ciphers banned under FIPS generate two OpenSSL errors in the queue, and as we only consume one from the queue the other is at the head for the next invocation: postgres=# select md5('foo'); ERROR: could not compute MD5 hash: unsupported postgres=# select md5('foo'); ERROR: could not compute MD5 hash: initialization error Clearing the error queue when creating the context ensures that we don't pull in an error from an earlier operation. Discussion: https://postgr.es/m/C89D932C-501E-4473-9750-638CFCD9095E@yesql.se	2022-05-06 14:41:31 +02:00
Andrew Dunstan	b787c554c2	Inhibit mingw CRT's auto-globbing of command line arguments For some reason by default the mingw C Runtime takes it upon itself to expand program arguments that look like shell globbing characters. That has caused much scratching of heads and mis-attribution of the causes of some TAP test failures, so stop doing that. This removes an inconsistency with Windows binaries built with MSVC, which have no such behaviour. Per suggestion from Noah Misch. Backpatch to all live branches. Discussion: https://postgr.es/m/20220423025927.GA1274057@rfd.leadboat.com	2022-04-25 15:47:55 -04:00
Tom Lane	587de223f0	Add missing error handling in pg_md5_hash(). It failed to provide an error string as expected for the admittedly-unlikely case of OOM in pg_cryptohash_create(). Also, make it initialize *errstr to NULL for success, as pg_md5_binary() does. Also add missing comments. Readers should not have to reverse-engineer the API spec for a publicly visible routine.	2022-04-18 20:04:55 -04:00
Alvaro Herrera	24d2b2680a	Remove extraneous blank lines before block-closing braces These are useless and distracting. We wouldn't have written the code with them to begin with, so there's no reason to keep them. Author: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com Discussion: https://postgr.es/m/attachment/133167/0016-Extraneous-blank-lines.patch	2022-04-13 19:16:02 +02:00
Tom Lane	2c9381840f	Remove not-very-useful early checks of __pg_log_level in logging.h. Enforce __pg_log_level message filtering centrally in logging.c, instead of relying on the calling macros to do it. This is more reliable (e.g. it works correctly for direct calls to pg_log_generic) and it saves a percent or so of total code size because we get rid of so many duplicate checks of __pg_log_level. This does mean that argument expressions in a logging macro will be evaluated even if we end up not printing anything. That seems of little concern for INFO and higher levels as those messages are printed by default, and most of our frontend programs don't even offer a way to turn them off. I left the unlikely() checks in place for DEBUG messages, though. Discussion: https://postgr.es/m/3993549.1649449609@sss.pgh.pa.us	2022-04-12 13:25:29 -04:00
Michael Paquier	a4b57543ac	Rename backup_compression.{c,h} to compression.{c,h} Compression option handling (level, algorithm or even workers) can be used across several parts of the system and not only base backups. Structures, objects and routines are renamed in consequence, to remove the concept of base backups from this part of the code making this change straight-forward. pg_receivewal, that has gained support for LZ4 since `babbbb5`, will make use of this infrastructure for its set of compression options, bringing more consistency with pg_basebackup. This cleanup needs to be done before releasing a beta of 15. pg_dump is a potential future target, as well, and adding more compression options to it may happen in 16~. Author: Michael Paquier Reviewed-by: Robert Haas, Georgios Kokolatos Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz	2022-04-12 13:38:54 +09:00
Peter Eisentraut	c215cc7b61	Add color support for new frontend detail/hint messages As before, the defaults are similar to gcc's default appearance.	2022-04-11 17:36:44 +02:00
Tom Lane	9a374b77fb	Improve frontend error logging style. Get rid of the separate "FATAL" log level, as it was applied so inconsistently as to be meaningless. This mostly involves s/pg_log_fatal/pg_log_error/g. Create a macro pg_fatal() to handle the common use-case of pg_log_error() immediately followed by exit(1). Various modules had already invented either this or equivalent macros; standardize on pg_fatal() and apply it where possible. Invent the ability to add "detail" and "hint" messages to a frontend message, much as we have long had in the backend. Except where rewording was needed to convert existing coding to detail/hint style, I have (mostly) resisted the temptation to change existing message wording. Patch by me. Design and patch reviewed at various stages by Robert Haas, Kyotaro Horiguchi, Peter Eisentraut and Daniel Gustafsson. Discussion: https://postgr.es/m/1363732.1636496441@sss.pgh.pa.us	2022-04-08 14:55:14 -04:00
Robert Haas	8e053dc6df	Fix possible NULL-pointer-deference in backup_compression.c. Per Coverity and Tom Lane. Reviewed by Tom Lane and Justin Pryzby. Discussion: http://postgr.es/m/384291.1648403267@sss.pgh.pa.us	2022-03-30 15:53:08 -04:00
Robert Haas	51c0d186d9	Allow parallel zstd compression when taking a base backup. libzstd allows transparent parallel compression just by setting an option when creating the compression context, so permit that for both client and server-side backup compression. To use this, use something like pg_basebackup --compress WHERE-zstd:workers=N where WHERE is "client" or "server" and N is an integer. When compression is performed on the server side, this will spawn threads inside the PostgreSQL backend. While there is almost no PostgreSQL server code which is thread-safe, the threads here are used internally by libzstd and touch only data structures controlled by libzstd. Patch by me, based in part on earlier work by Dipesh Pandit and Jeevan Ladhe. Reviewed by Justin Pryzby. Discussion: http://postgr.es/m/CA+Tgmobj6u-nWF-j=FemygUhobhryLxf9h-wJN7W-2rSsseHNA@mail.gmail.com	2022-03-30 09:41:26 -04:00
Peter Eisentraut	c64fb698d0	Make update-unicode target work in vpath builds Author: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/616c6873-83b5-85c0-93cb-548977c39c60@enterprisedb.com	2022-03-25 09:47:50 +01:00
Robert Haas	68d8f9bfb2	In get_bc_algorithm_name, add a dummy return statement. This code shouldn't be reached, but having it here might avoid a compiler warning. Per CI complaint from Andres Freund. Discussion: http://postgr.es/m/C6A7643A-582B-47F7-A03D-01736BC0349B@anarazel.de	2022-03-23 11:37:12 -04:00
Robert Haas	ffd53659c4	Replace BASE_BACKUP COMPRESSION_LEVEL option with COMPRESSION_DETAIL. There are more compression parameters that can be specified than just an integer compression level, so rename the new COMPRESSION_LEVEL option to COMPRESSION_DETAIL before it gets released. Introduce a flexible syntax for that option to allow arbitrary options to be specified without needing to adjust the main replication grammar, and common code to parse it that is shared between the client and the server. This commit doesn't actually add any new compression parameters, so the only user-visible change is that you can now type something like pg_basebackup --compress gzip:level=5 instead of writing just pg_basebackup --compress gzip:5. However, it should make it easy to add new options. If for example gzip starts offering fries, we can support pg_basebackup --compress gzip:level=5,fries=true for the benefit of users who want fries with that. Along the way, this fixes a few things in pg_basebackup so that the pg_basebackup can be used with a server-side compression algorithm that pg_basebackup itself does not understand. For example, pg_basebackup --compress server-lz4 could still succeed even if only the server and not the client has LZ4 support, provided that the other options to pg_basebackup don't require the client to decompress the archive. Patch by me. Reviewed by Justin Pryzby and Dagfinn Ilmari Mannsåker. Discussion: http://postgr.es/m/CA+TgmoYvpetyRAbbg1M8b3-iHsaN4nsgmWPjOENu5-doHuJ7fA@mail.gmail.com	2022-03-23 09:19:14 -04:00
John Naylor	4b35408f1e	Use bitwise rotate functions in more places There were a number of places in the code that used bespoke bit-twiddling expressions to do bitwise rotation. While we've had pg_rotate_right32() for a while now, we hadn't gotten around to standardizing on that. Do so now. Since many potential call sites look more natural with the "left" equivalent, add that function too. Reviewed by Tom Lane and Yugo Nagata Discussion: https://www.postgresql.org/message-id/CAFBsxsH7c1LC0CGZ0ADCBXLHU5-%3DKNXx-r7tHYPAW51b2HK4Qw%40mail.gmail.com	2022-02-20 13:22:08 +07:00
Tom Lane	291ec6e45e	Suppress integer-overflow compiler warning for inconsistent sun_len. On AIX 7.1, struct sockaddr_un is declared to be 1025 bytes long, but the sun_len field that should hold the length is only a byte. Clamp the value we try to store to ensure it will fit in the field. (This coding might need adjustment if there are any machines out there where sun_len is as wide as size_t; but a preliminary survey suggests there's not, so let's keep it simple.) Discussion: https://postgr.es/m/2781112.1644819528@sss.pgh.pa.us	2022-02-14 11:25:46 -05:00
John Naylor	d3f45323bb	Improve code clarity in epilogue of UTF-8 verification fast path The previous coding was correct, but the style and commentary were a bit vague about which operations had to happen, in what circumstances, and in what order. Rearrange so that the epilogue does nothing in the DFA END state. That allows turning some conditional statements in the backtracking logic into asserts. With that, we can be more explicit about needing to backtrack at least one byte in non-END states to ensure checking the current byte sequence in the slow path. No change to the regression tests, since they should be able catch deficiencies here already. In passing, improve the comments around DFA states where the first continuation byte has a restricted range.	2022-01-17 22:53:50 -05:00
Michael Paquier	5513dc6a30	Improve error handling of HMAC computations This is similar to `b69aba7`, except that this completes the work for HMAC with a new routine called pg_hmac_error() that would provide more context about the type of error that happened during a HMAC computation: - The fallback HMAC implementation in hmac.c relies on cryptohashes, so in some code paths it is necessary to return back the error generated by cryptohashes. - For the OpenSSL implementation (hmac_openssl.c), the logic is very similar to cryptohash_openssl.c, where the error context comes from OpenSSL if one of its internal routines failed, with different error codes if something internal to hmac_openssl.c failed or was incorrect. Any in-core code paths that use the centralized HMAC interface are related to SCRAM, for errors that are unlikely going to happen, with only SHA-256. It would be possible to see errors when computing some HMACs with MD5 for example and OpenSSL FIPS enabled, and this commit would help in reporting the correct errors but nothing in core uses that. So, at the end, no backpatch to v14 is done, at least for now. Errors in SCRAM related to the computation of the server key, stored key, etc. need to pass down the potential error context string across more layers of their respective call stacks for the frontend and the backend, so each surrounding routine is adapted for this purpose. Reviewed-by: Sergey Shinderuk Discussion: https://postgr.es/m/Yd0N9tSAIIkFd+qi@paquier.xyz	2022-01-13 16:17:21 +09:00
Michael Paquier	87f29f4fcc	Fix incorrect comments in hmac.c and hmac_openssl.c Both files referred to pg_hmac_ctx->data, which, I guess, comes from the early versions of the patch that has resulted in commit `e6bdfd9`. Author: Sergey Shinderuk Discussion: https://postgr.es/m/8cbb56dd-63d6-a581-7a65-25a97ac4be03@postgrespro.ru Backpatch-through: 14	2022-01-13 09:43:36 +09:00
Michael Paquier	9a3d8e1886	Fix comment related to pg_cryptohash_error() One of the comments introduced in `b69aba7` was worded a bit weirdly, so improve it. Reported-by: Sergey Shinderuk Discussion: https://postgr.es/m/71b9a5d2-a3bf-83bc-a243-93dcf0bcfb3b@postgrespro.ru Backpatch-through: 14	2022-01-12 12:39:36 +09:00
Michael Paquier	b69aba7457	Improve error handling of cryptohash computations The existing cryptohash facility was causing problems in some code paths related to MD5 (frontend and backend) that relied on the fact that the only type of error that could happen would be an OOM, as the MD5 implementation used in PostgreSQL ~13 (the in-core implementation is used when compiling with or without OpenSSL in those older versions), could fail only under this circumstance. The new cryptohash facilities can fail for reasons other than OOMs, like attempting MD5 when FIPS is enabled (upstream OpenSSL allows that up to 1.0.2, Fedora and Photon patch OpenSSL 1.1.1 to allow that), so this would cause incorrect reports to show up. This commit extends the cryptohash APIs so as callers of those routines can fetch more context when an error happens, by using a new routine called pg_cryptohash_error(). The error states are stored within each implementation's internal context data, so as it is possible to extend the logic depending on what's suited for an implementation. The default implementation requires few error states, but OpenSSL could report various issues depending on its internal state so more is needed in cryptohash_openssl.c, and the code is shaped so as we are always able to grab the necessary information. The core code is changed to adapt to the new error routine, painting more "const" across the call stack where the static errors are stored, particularly in authentication code paths on variables that provide log details. This way, any future changes would warn if attempting to free these strings. The MD5 authentication code was also a bit blurry about the handling of "logdetail" (LOG sent to the postmaster), so improve the comments related that, while on it. The origin of the problem is `87ae969`, that introduced the centralized cryptohash facility. Extra changes are done for pgcrypto in v14 for the non-OpenSSL code path to cope with the improvements done by this commit. Reported-by: Michael Mühlbeyer Author: Michael Paquier Reviewed-by: Tom Lane Discussion: https://postgr.es/m/89B7F072-5BBE-4C92-903E-D83E865D9367@trivadis.com Backpatch-through: 14	2022-01-11 09:55:16 +09:00
Thomas Munro	f3e78069db	Make EXEC_BACKEND more convenient on Linux and FreeBSD. Try to disable ASLR when building in EXEC_BACKEND mode, to avoid random memory mapping failures while testing. For developer use only, no effect on regular builds. Suggested-by: Andres Freund <andres@anarazel.de> Tested-by: Bossart, Nathan <bossartn@amazon.com> Discussion: https://postgr.es/m/20210806032944.m4tz7j2w47mant26%40alap3.anarazel.de	2022-01-11 00:04:33 +13:00
Bruce Momjian	27b77ecf9f	Update copyright for 2022 Backpatch-through: 10	2022-01-07 19:04:57 -05:00
John Naylor	911588a3f8	Add fast path for validating UTF-8 text Our previous validator used a traditional algorithm that performed comparison and branching one byte at a time. It's useful in that we always know exactly how many bytes we have validated, but that precision comes at a cost. Input validation can show up prominently in profiles of COPY FROM, and future improvements to COPY FROM such as parallelism or faster line parsing will put more pressure on input validation. Hence, add fast paths for both ASCII and multibyte UTF-8: Use bitwise operations to check 16 bytes at a time for ASCII. If that fails, use a "shift-based" DFA on those bytes to handle the general case, including multibyte. These paths are relatively free of branches and thus robust against all kinds of byte patterns. With these algorithms, UTF-8 validation is several times faster, depending on platform and the input byte distribution. The previous coding in pg_utf8_verifystr() is retained for short strings and for when the fast path returns an error. Review, performance testing, and additional hacking by: Heikki Linakangas, Vladimir Sitnikov, Amit Khandekar, Thomas Munro, and Greg Stark Discussion: https://www.postgresql.org/message-id/CAFBsxsEV_SzH%2BOLyCiyon%3DiwggSyMh_eF6A3LU2tiWf3Cy2ZQg%40mail.gmail.com	2021-12-20 10:07:29 -04:00
Michael Paquier	6fb7c5d67c	Centralize timestamp computation of control file on updates This commit moves the timestamp computation of the control file within the routine of src/common/ in charge of updating the backend's control file, which is shared by multiple frontend tools (pg_rewind, pg_checksums and pg_resetwal) and the backend itself. This change has as direct effect to update the control file's timestamp when writing the control file in pg_rewind and pg_checksums, something that is helpful to keep track of control file updates for those operations, something also tracked by the backend at startup within its logs. This part is arguably a bug, as ControlFileData->time should be updated each time a new version of the control file is written, but this is a behavior change so no backpatch is done. Author: Amul Sul Reviewed-by: Nathan Bossart, Michael Paquier, Bharath Rupireddy Discussion: https://postgr.es/m/CAAJ_b97nd_ghRpyFV9Djf9RLXkoTbOUqnocq11WGq9TisX09Fw@mail.gmail.com	2021-11-29 13:36:13 +09:00
Tom Lane	3804539e48	Replace random(), pg_erand48(), etc with a better PRNG API and algorithm. Standardize on xoroshiro128 as our basic PRNG algorithm, eliminating a bunch of platform dependencies as well as fundamentally-obsolete PRNG code. In addition, this API replacement will ease replacing the algorithm again in future, should that become necessary. xoroshiro128 is a few percent slower than the drand48 family, but it can produce full-width 64-bit random values not only 48-bit, and it should be much more trustworthy. It's likely to be noticeably faster than the platform's random(), depending on which platform you are thinking about; and we can have non-global state vectors easily, unlike with random(). It is not cryptographically strong, but neither are the functions it replaces. Fabien Coelho, reviewed by Dean Rasheed, Aleksander Alekseev, and myself Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2105241211230.165418@pseudo	2021-11-28 21:33:07 -05:00
Tom Lane	46d665bc26	Allow psql's other uses of simple_prompt() to be interrupted by ^C. This fills in the work left un-done by `5f1148224`. \prompt can be canceled out of now, and so can password prompts issued during \connect. (We don't need to do anything for password prompts issued during startup, because we aren't yet trapping SIGINT at that point.) Nathan Bossart Discussion: https://postgr.es/m/747443.1635536754@sss.pgh.pa.us	2021-11-19 12:11:46 -05:00
Tom Lane	5f1148224b	Provide a variant of simple_prompt() that can be interrupted by ^C. Up to now, you couldn't escape out of psql's \password command by typing control-C (or other local spelling of SIGINT). This is pretty user-unfriendly, so improve it. To do so, we have to modify the functions provided by pg_get_line.c; but we don't want to mess with psql's SIGINT handler setup, so provide an API that lets that handler cause the cancel to occur. This relies on the assumption that we won't do any major harm by longjmp'ing out of fgets(). While that's obviously a little shaky, we've long had the same assumption in the main input loop, and few issues have been reported. psql has some other simple_prompt() calls that could usefully be improved the same way; for now, just deal with \password. Nathan Bossart, minor tweaks by me Discussion: https://postgr.es/m/747443.1635536754@sss.pgh.pa.us	2021-11-17 19:09:54 -05:00
Michael Paquier	098c134556	Fix buffer overrun in unicode string normalization with empty input PostgreSQL 13 and newer versions are directly impacted by that through the SQL function normalize(), which would cause a call of this function to write one byte past its allocation if using in input an empty string after recomposing the string with NFC and NFKC. Older versions (v10~v12) are not directly affected by this problem as the only code path using normalization is SASLprep in SCRAM authentication that forbids the case of an empty string, but let's make the code more robust anyway there so as any out-of-core callers of this function are covered. The solution chosen to fix this issue is simple, with the addition of a fast-exit path if the decomposed string is found as empty. This would only happen for an empty string as at its lowest level a codepoint would be decomposed as itself if it has no entry in the decomposition table or if it has a decomposition size of 0. Some tests are added to cover this issue in v13~. Note that an empty string has always been considered as normalized (grammar "IS NF[K]{C,D} NORMALIZED", through the SQL function is_normalized()) for all the operations allowed (NFC, NFD, NFKC and NFKD) since this feature has been introduced as of `2991ac5`. This behavior is unchanged but some tests are added in v13~ to check after that. I have also checked "make normalization-check" in src/common/unicode/, while on it (works in 13~, and breaks in older stable branches independently of this commit). The release notes should just mention this commit for v13~. Reported-by: Matthijs van der Vleuten Discussion: https://postgr.es/m/17277-0c527a373794e802@postgresql.org Backpatch-through: 10	2021-11-11 15:00:59 +09:00
Daniel Gustafsson	0ded7039fa	Fix memory leak in pg_hmac The intermittent h buffer was not freed, causing it to leak. Backpatch through 14 where HMAC was refactored to the current API. Author: Sergey Shinderuk <s.shinderuk@postgrespro.ru> Discussion: https://postgr.es/m/af07e620-7e28-a742-4637-2bc44aa7c2be@postgrespro.ru Backpatch-through: 14	2021-10-01 22:47:05 +02:00
Michael Paquier	e767ddcd35	Fix typos and grammar in code comments Several mistakes have piled in the code comments over the time, including incorrect grammar, function names and simple typos. This commit takes care of a portion of these. No backpatch is done as this is only cosmetic. Author: Justin Pryzby Discussion: https://postgr.es/m/20210924215827.GS831@telsasoft.com	2021-09-27 14:21:28 +09:00
John Naylor	5bc429aacb	Extend collection of Unicode combining characters to beyond the BMP The former limit was perhaps a carryover from an older hand-coded table. Since commit `bab982161` we have enough space in mbinterval to store larger codepoints, so collect all combining characters. Discussion: https://www.postgresql.org/message-id/49ad1fa0-174e-c901-b14c-c484b60907f1%40enterprisedb.com	2021-08-26 13:07:34 -04:00
John Naylor	bab982161e	Update display widths as part of updating Unicode The hardcoded "wide character" set in ucs_wcwidth() was last updated around the Unicode 5.0 era. This led to misalignment when printing emojis and other codepoints that have since been designated wide or full-width. To fix and keep up to date, extend update-unicode to download the list of wide and full-width codepoints from the offical sources. In passing, remove some comments about non-spacing characters that haven't been accurate since we removed the former hardcoded logic. Jacob Champion Reported and reviewed by Pavel Stehule Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRCeX21O69YHxmykYySYyprZAqrKWWg0KoGKdjgqcGyygg@mail.gmail.com	2021-08-26 10:53:56 -04:00
John Naylor	1563ecbc1b	Revert "Rename unicode_combining_table to unicode_width_table" This reverts commit `eb0d0d2c73`. After I had committed `eb0d0d2c7` and `78ab944cd`, I decided to add a sanity check for a "can't happen" scenario just to be cautious. It turned out that it already happened in the official Unicode source data, namely that a character can be both wide and a combining character. This fact renders the aforementioned commits unnecessary, so revert both of them. Discussion: https://www.postgresql.org/message-id/CAFBsxsH5ejH4-1xaTLpSK8vWoK1m6fA1JBtTM6jmBsLfmDki1g%40mail.gmail.com	2021-08-26 10:06:12 -04:00
John Naylor	f8c8a8bccc	Revert "Change mbbisearch to return the character range" This reverts commit `78ab944cd4`. After I had committed `eb0d0d2c7` and `78ab944cd`, I decided to add a sanity check for a "can't happen" scenario just to be cautious. It turned out that it already happened in the official Unicode source data, namely that a character can be both wide and a combining character. This fact renders the aforementioned commits unnecessary, so revert both of them. Discussion: https://www.postgresql.org/message-id/CAFBsxsH5ejH4-1xaTLpSK8vWoK1m6fA1JBtTM6jmBsLfmDki1g%40mail.gmail.com	2021-08-26 09:58:28 -04:00
John Naylor	78ab944cd4	Change mbbisearch to return the character range Add a width field to mbinterval and have mbbisearch return a pointer to the found range rather than just bool for success. A future commit will add another width besides zero, and this will allow that to use the same search. Reviewed by Jacob Champion Discussion: https://www.postgresql.org/message-id/CAFBsxsGOCpzV7c-f3a8ADsA1n4uZ%3D8puCctQp%2Bx7W0vgkv%3Dw%2Bg%40mail.gmail.com	2021-08-25 13:08:11 -04:00
John Naylor	eb0d0d2c73	Rename unicode_combining_table to unicode_width_table No functional changes. A future commit will use this table for other purposes besides combining characters.	2021-08-25 13:01:35 -04:00
Michael Paquier	2576dcfb76	Revert refactoring of hex code to src/common/ This is a combined revert of the following commits: - `c3826f8`, a refactoring piece that moved the hex decoding code to src/common/. This code was cleaned up by `aef8948`, as it originally included no overflow checks in the same way as the base64 routines in src/common/ used by SCRAM, making it unsafe for its purpose. - `aef8948`, a more advanced refactoring of the hex encoding/decoding code to src/common/ that added sanity checks on the result buffer for hex decoding and encoding. As reported by Hans Buschmann, those overflow checks are expensive, and it is possible to see a performance drop in the decoding/encoding of bytea or LOs the longer they are. Simple SQLs working on large bytea values show a clear difference in perf profile. - `ccf4e27`, a cleanup made possible by `aef8948`. The reverts of all those commits bring back the performance of hex decoding and encoding back to what it was in ~13. Fow now and post-beta3, this is the simplest option. Reported-by: Hans Buschmann Discussion: https://postgr.es/m/1629039545467.80333@nidsa.net Backpatch-through: 14	2021-08-19 09:20:13 +09:00
Michael Paquier	b44669b2ca	Simplify error handing of jsonapi.c for the frontend This commit removes a dependency to the central logging facilities in the JSON parsing routines of src/common/, which existed to log errors when seeing error codes that do not match any existing values in JsonParseErrorType, which is not something that should never happen. The routine providing a detailed error message based on the error code is made backend-only, the existing code being unsafe to use in the frontend as the error message may finish by being palloc'd or point to a static string, so there is no way to know if the memory of the message should be pfree'd or not. The only user of this routine in the frontend was pg_verifybackup, that is changed to use a more generic error message on parsing failure. Note that making this code more resilient to OOM failures if used in shared libraries would require much more work as a lot of code paths still rely on palloc() & friends, but we are not sure yet if we need to go down to that. Still, removing the dependency to logging is a step toward more portability. This cleans up the handling of check_stack_depth() while on it, as it exists only in the backend. Per discussion with Jacob Champion and Tom Lane. Discussion: https://postgr.es/m/YNwL7kXwn3Cckbd6@paquier.xyz	2021-07-02 09:35:12 +09:00
Tom Lane	42f94f56bf	Fix incautious handling of possibly-miscoded strings in client code. An incorrectly-encoded multibyte character near the end of a string could cause various processing loops to run past the string's terminating NUL, with results ranging from no detectable issue to a program crash, depending on what happens to be in the following memory. This isn't an issue in the server, because we take care to verify the encoding of strings before doing any interesting processing on them. However, that lack of care leaked into client-side code which shouldn't assume that anyone has validated the encoding of its input. Although this is certainly a bug worth fixing, the PG security team elected not to regard it as a security issue, primarily because any untrusted text should be sanitized by PQescapeLiteral or the like before being incorporated into a SQL or psql command. (If an app fails to do so, the same technique can be used to cause SQL injection, with probably much more dire consequences than a mere client-program crash.) Those functions were already made proof against this class of problem, cf CVE-2006-2313. To fix, invent PQmblenBounded() which is like PQmblen() except it won't return more than the number of bytes remaining in the string. In HEAD we can make this a new libpq function, as PQmblen() is. It seems imprudent to change libpq's API in stable branches though, so in the back branches define PQmblenBounded as a macro in the files that need it. (Note that just changing PQmblen's behavior would not be a good idea; notably, it would completely break the escaping functions' defense against this exact problem. So we just want a version for those callers that don't have any better way of handling this issue.) Per private report from houjingyi. Back-patch to all supported branches.	2021-06-07 14:15:25 -04:00
David Rowley	7fc26d11e3	Adjust locations which have an incorrect copyright year A few patches committed after `ca3b37487` mistakenly forgot to make the copyright year 2021. Fix these. Discussion: https://postgr.es/m/CAApHDvqyLmd9P2oBQYJ=DbrV8QwyPRdmXtCTFYPE08h+ip0UJw@mail.gmail.com	2021-06-04 12:19:50 +12:00
Peter Eisentraut	82c3cd9741	Factor out system call names from error messages Instead, put them in via a format placeholder. This reduces the number of distinct translatable messages and also reduces the chances of typos during translation. We already did this for the system call arguments in a number of cases, so this is just the same thing taken a bit further. Discussion: https://www.postgresql.org/message-id/flat/92d6f545-5102-65d8-3c87-489f71ea0a37%40enterprisedb.com	2021-04-23 14:21:37 +02:00
Michael Paquier	7ef8b52cf0	Fix typos and grammar in comments and docs Author: Justin Pryzby Discussion: https://postgr.es/m/20210416070310.GG3315@telsasoft.com	2021-04-19 11:32:30 +09:00
Michael Paquier	e6bdfd9700	Refactor HMAC implementations Similarly to the cryptohash implementations, this refactors the existing HMAC code into a single set of APIs that can be plugged with any crypto libraries PostgreSQL is built with (only OpenSSL currently). If there is no such libraries, a fallback implementation is available. Those new APIs are designed similarly to the existing cryptohash layer, so there is no real new design here, with the same logic around buffer bound checks and memory handling. HMAC has a dependency on cryptohashes, so all the cryptohash types supported by cryptohash{_openssl}.c can be used with HMAC. This refactoring is an advantage mainly for SCRAM, that included its own implementation of HMAC with SHA256 without relying on the existing crypto libraries even if PostgreSQL was built with their support. This code has been tested on Windows and Linux, with and without OpenSSL, across all the versions supported on HEAD from 1.1.1 down to 1.0.1. I have also checked that the implementations are working fine using some sample results, a custom extension of my own, and doing cross-checks across different major versions with SCRAM with the client and the backend. Author: Michael Paquier Reviewed-by: Bruce Momjian Discussion: https://postgr.es/m/X9m0nkEJEzIPXjeZ@paquier.xyz	2021-04-03 17:30:49 +09:00
Peter Eisentraut	f06b1c5982	pg_upgrade: Check version of target cluster binaries This expands the binary validation in pg_upgrade with a version check per binary to ensure that the target cluster installation only contains binaries from the target version. In order to reduce duplication, validate_exec is exported from port.h and the local copy in pg_upgrade is removed. Author: Daniel Gustafsson <daniel@yesql.se> Discussion: https://www.postgresql.org/message-id/flat/9328.1552952117@sss.pgh.pa.us	2021-03-03 09:45:56 +01:00
Tom Lane	ffd3944ab9	Improve reporting for syntax errors in multi-line JSON data. Point to the specific line where the error was detected; the previous code tended to include several preceding lines as well. Avoid re-scanning the entire input to recompute which line that was. Simplify the logic a bit. Add test cases. Simon Riggs and Hamid Akhtar, reviewed by Daniel Gustafsson and myself Discussion: https://postgr.es/m/CANbhV-EPBnXm3MF_TTWBwwqgn1a1Ghmep9VHfqmNBQ8BT0f+_g@mail.gmail.com	2021-03-01 16:44:17 -05:00
Michael Paquier	b83dcf7928	Add result size as argument of pg_cryptohash_final() for overflow checks With its current design, a careless use of pg_cryptohash_final() could would result in an out-of-bound write in memory as the size of the destination buffer to store the result digest is not known to the cryptohash internals, without the caller knowing about that. This commit adds a new argument to pg_cryptohash_final() to allow such sanity checks, and implements such defenses. The internals of SCRAM for HMAC could be tightened a bit more, but as everything is based on SCRAM_KEY_LEN with uses particular to this code there is no need to complicate its interface more than necessary, and this comes back to the refactoring of HMAC in core. Except that, this minimizes the uses of the existing DIGEST_LENGTH variables, relying instead on sizeof() for the result sizes. In ossp-uuid, this also makes the code more defensive, as it already relied on dce_uuid_t being at least the size of a MD5 digest. This is in philosophy similar to `cfc40d3` for base64.c and `aef8948` for hex.c. Reported-by: Ranier Vilela Author: Michael Paquier, Ranier Vilela Reviewed-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/CAEudQAoqEGmcff3J4sTSV-R_16Monuz-UpJFbf_dnVH=APr02Q@mail.gmail.com	2021-02-15 10:18:34 +09:00
Michael Paquier	42d74e0c44	Fix copy-paste error with SHA256 digest length in checksum_helper.c Issue introduced by `87ae969`, noticed while working on the area. While on it, fix some grammar in the surrounding static assertions.	2021-02-11 19:16:11 +09:00
Michael Paquier	fe61df7f82	Introduce --with-ssl={openssl} as a configure option This is a replacement for the existing --with-openssl, extending the logic to make easier the addition of new SSL libraries. The grammar is chosen to be similar to --with-uuid, where multiple values can be chosen, with "openssl" as the only supported value for now. The original switch, --with-openssl, is kept for compatibility. Author: Daniel Gustafsson, Michael Paquier Reviewed-by: Jacob Champion Discussion: https://postgr.es/m/FAB21FC8-0F62-434F-AA78-6BD9336D630A@yesql.se	2021-02-01 19:19:44 +09:00
Heikki Linnakangas	b80e10638e	Add mbverifystr() functions specific to each encoding. This makes pg_verify_mbstr() function faster, by allowing more efficient encoding-specific implementations. All the implementations included in this commit are pretty naive, they just call the same encoding-specific verifychar functions that were used previously, but that already gives a performance boost because the tight character-at-a-time loop is simpler. Reviewed-by: John Naylor Discussion: https://www.postgresql.org/message-id/e7861509-3960-538a-9025-b75a61188e01@iki.fi	2021-01-28 14:40:07 +02:00
Michael Paquier	a8ed6bb8f4	Introduce SHA1 implementations in the cryptohash infrastructure With this commit, SHA1 goes through the implementation provided by OpenSSL via EVP when building the backend with it, and uses as fallback implementation KAME which was located in pgcrypto and already shaped for an integration with a set of init, update and final routines. Structures and routines have been renamed to make things consistent with the fallback implementations of MD5 and SHA2. uuid-ossp has used for ages a shortcut with pgcrypto to fetch a copy of SHA1 if needed. This was built depending on the build options within ./configure, so this cleans up some code and removes the build dependency between pgcrypto and uuid-ossp. Note that this will help with the refactoring of HMAC, as pgcrypto offers the option to use MD5, SHA1 or SHA2, so only the second option was missing to make that possible. Author: Michael Paquier Reviewed-by: Heikki Linnakangas Discussion: https://postgr.es/m/X9HXKTgrvJvYO7Oh@paquier.xyz	2021-01-23 11:33:04 +09:00
Michael Paquier	aef8948f38	Rework refactoring of hex and encoding routines This commit addresses some issues with `c3826f83` that moved the hex decoding routine to src/common/: - The decoding function lacked overflow checks, so when used for security-related features it was an open door to out-of-bound writes if not carefully used that could remain undetected. Like the base64 routines already in src/common/ used by SCRAM, this routine is reworked to check for overflows by having the size of the destination buffer passed as argument, with overflows checked before doing any writes. - The encoding routine was missing. This is moved to src/common/ and it gains the same overflow checks as the decoding part. On failure, the hex routines of src/common/ issue an error as per the discussion done to make them usable by frontend tools, but not by shared libraries. Note that this is why ECPG is left out of this commit, and it still includes a duplicated logic doing hex encoding and decoding. While on it, this commit uses better variable names for the source and destination buffers in the existing escape and base64 routines in encode.c and it makes them more robust to overflow detection. The previous core code issued a FATAL after doing out-of-bound writes if going through the SQL functions, which would be enough to detect problems when working on changes that impacted this area of the code. Instead, an error is issued before doing an out-of-bound write. The hex routines were being directly called for bytea conversions and backup manifests without such sanity checks. The current calls happen to not have any problems, but careless uses of such APIs could easily lead to CVE-class bugs. Author: Bruce Momjian, Michael Paquier Reviewed-by: Sehrope Sarkuni Discussion: https://postgr.es/m/20201231003557.GB22199@momjian.us	2021-01-14 11:13:24 +09:00
Michael Paquier	15b824da97	Fix and simplify some code related to cryptohashes This commit addresses two issues: - In pgcrypto, MD5 computation called pg_cryptohash_{init,update,final} without checking for the result status. - Simplify pg_checksum_raw_context to use only one variable for all the SHA2 options available in checksum manifests. Reported-by: Heikki Linnakangas Discussion: https://postgr.es/m/f62f26bb-47a5-8411-46e5-4350823e06a5@iki.fi	2021-01-08 10:37:03 +09:00
Michael Paquier	55fe26a4b5	Fix allocation logic of cryptohash context data with OpenSSL The allocation of the cryptohash context data when building with OpenSSL was happening in the memory context of the caller of pg_cryptohash_create(), which could lead to issues with resowner cleanup if cascading resources are cleaned up on an error. Like other facilities using resowners, move the base allocation to TopMemoryContext to ensure a correct cleanup on failure. The resulting code gets simpler with this commit as the context data is now hold by a unique opaque pointer, so as there is only one single allocation done in TopMemoryContext. After discussion, also change the cryptohash subroutines to return an error if the caller provides NULL for the context data to ease error detection on OOM. Author: Heikki Linnakangas Discussion: https://postgr.es/m/X9xbuEoiU3dlImfa@paquier.xyz	2021-01-07 10:21:02 +09:00
Bruce Momjian	ca3b37487b	Update copyright for 2021 Backpatch-through: 9.5	2021-01-02 13:06:25 -05:00
Tom Lane	7ca37fb040	Use setenv() in preference to putenv(). Since at least 2001 we've used putenv() and avoided setenv(), on the grounds that the latter was unportable and not in POSIX. However, POSIX added it that same year, and by now the situation has reversed: setenv() is probably more portable than putenv(), since POSIX now treats the latter as not being a core function. And setenv() has cleaner semantics too. So, let's reverse that old policy. This commit adds a simple src/port/ implementation of setenv() for any stragglers (we have one in the buildfarm, but I'd not be surprised if that code is never used in the field). More importantly, extend win32env.c to also support setenv(). Then, replace usages of putenv() with setenv(), and get rid of some ad-hoc implementations of setenv() wannabees. Also, adjust our src/port/ implementation of unsetenv() to follow the POSIX spec that it returns an error indicator, rather than returning void as per the ancient BSD convention. I don't feel a need to make all the call sites check for errors, but the portability stub ought to match real-world practice. Discussion: https://postgr.es/m/2065122.1609212051@sss.pgh.pa.us	2020-12-30 12:56:06 -05:00
Bruce Momjian	3187ef7c46	Revert "Add key management system" (`978f869b99`) & later commits The patch needs test cases, reorganization, and cfbot testing. Technically reverts commits 5c31afc49d..e35b2bad1a (exclusive/inclusive) and 08db7c63f3..ccbe34139b. Reported-by: Tom Lane, Michael Paquier Discussion: https://postgr.es/m/E1ktAAG-0002V2-VB@gemulon.postgresql.org	2020-12-27 21:37:42 -05:00
Bruce Momjian	7705f8ca03	Fix function call typo in frontend Win32 code, commit `978f869b99` Reported-by: buildfarm member walleye Backpatch-through: master	2020-12-25 20:49:50 -05:00
Tom Lane	0848cf4f55	Really fix the dummy implementations in cipher.c. `945083b2f` wasn't enough to silence compiler warnings.	2020-12-25 14:45:24 -05:00
Bruce Momjian	8e59813e22	fix no-return function call in cipher.c from commit `978f869b99` Reported-by: buildfarm member sifaka Backpatch-through: master	2020-12-25 14:40:46 -05:00
Bruce Momjian	e35b2bad1a	remove uint128 requirement from patch `978f869b99` (CFE) Used char[16] instead. Reported-by: buildfarm member florican Backpatch-through: master	2020-12-25 11:35:59 -05:00
Bruce Momjian	945083b2f7	Fix return value and const declaration from commit `978f869b99` This fixes the non-OpenSSL compile case. Reported-by: buildfarm member sifaka Backpatch-through: master	2020-12-25 11:00:32 -05:00
Bruce Momjian	978f869b99	Add key management system This adds a key management system that stores (currently) two data encryption keys of length 128, 192, or 256 bits. The data keys are AES256 encrypted using a key encryption key, and validated via GCM cipher mode. A command to obtain the key encryption key must be specified at initdb time, and will be run at every database server start. New parameters allow a file descriptor open to the terminal to be passed. pg_upgrade support has also been added. Discussion: https://postgr.es/m/CA+fd4k7q5o6Nc_AaX6BcYM9yqTbC6_pnH-6nSD=54Zp6NBQTCQ@mail.gmail.com Discussion: https://postgr.es/m/20201202213814.GG20285@momjian.us Author: Masahiko Sawada, me, Stephen Frost	2020-12-25 10:19:44 -05:00
Bruce Momjian	c3826f831e	move hex_decode() to /common so it can be called from frontend This allows removal of a copy of hex_decode() from ecpg, and will be used by the soon-to-be added pg_alterckey command. Backpatch-through: master	2020-12-24 17:25:48 -05:00
Michael Paquier	93e8ff8701	Refactor logic to check for ASCII-only characters in string The same logic was present for collation commands, SASLprep and pgcrypto, so this removes some code. Author: Michael Paquier Reviewed-by: Stephen Frost, Heikki Linnakangas Discussion: https://postgr.es/m/X9womIn6rne6Gud2@paquier.xyz	2020-12-21 09:37:11 +09:00
Michael Paquier	9b584953e7	Improve some code around cryptohash functions This adjusts some code related to recent changes for cryptohash functions: - Add a variable in md5.h to track down the size of a computed result, moved from pgcrypto. Note that pg_md5_hash() assumed a result of this size already. - Call explicit_bzero() on the hashed data when freeing the context for fallback implementations. For MD5, particularly, it would be annoying to leave some non-zeroed data around. - Clean up some code related to recent changes of uuid-ossp. .gitignore still included md5.c and a comment was incorrect. Discussion: https://postgr.es/m/X9HXKTgrvJvYO7Oh@paquier.xyz	2020-12-14 12:38:13 +09:00
Michael Paquier	b67b57a966	Refactor MD5 implementations according to new cryptohash infrastructure This commit heavily reorganizes the MD5 implementations that exist in the tree in various aspects. First, MD5 is added to the list of options available in cryptohash.c and cryptohash_openssl.c. This means that if building with OpenSSL, EVP is used for MD5 instead of the fallback implementation that Postgres had for ages. With the recent refactoring work for cryptohash functions, this change is straight-forward. If not building with OpenSSL, a fallback implementation internal to src/common/ is used. Second, this reduces the number of MD5 implementations present in the tree from two to one, by moving the KAME implementation from pgcrypto to src/common/, and by removing the implementation that existed in src/common/. KAME was already structured with an init/update/final set of routines by pgcrypto (see original pgcrypto/md5.h) for compatibility with OpenSSL, so moving it to src/common/ has proved to be a straight-forward move, requiring no actual manipulation of the internals of each routine. Some benchmarking has not shown any performance gap between both implementations. Similarly to the fallback implementation used for SHA2, the fallback implementation of MD5 is moved to src/common/md5.c with an internal header called md5_int.h for the init, update and final routines. This gets then consumed by cryptohash.c. The original routines used for MD5-hashed passwords are moved to a separate file called md5_common.c, also in src/common/, aimed at being shared between all MD5 implementations as utility routines to keep compatibility with any code relying on them. Like the SHA2 changes, this commit had its round of tests on both Linux and Windows, across all versions of OpenSSL supported on HEAD, with and even without OpenSSL. Author: Michael Paquier Reviewed-by: Daniel Gustafsson Discussion: https://postgr.es/m/20201106073434.GA4961@paquier.xyz	2020-12-10 11:59:10 +09:00
Michael Paquier	16c302f512	Simplify code for getting a unicode codepoint's canonical class. Three places of unicode_norm.c use a similar logic for getting the combining class from a codepoint. Commit `2991ac5` has added the function get_canonical_class() for this purpose, but it was only called by the backend. This commit refactors the code to use this function in all the places where the combining class is retrieved from a given codepoint. Author: John Naylor Discussion: https://postgr.es/m/CAFBsxsHUV7s7YrOm6hFz-Jq8Sc7K_yxTkfNZxsDV-DuM-k-gwg@mail.gmail.com	2020-12-09 13:24:38 +09:00
Michael Paquier	4f48a6fbe2	Change SHA2 implementation based on OpenSSL to use EVP digest routines The use of low-level hash routines is not recommended by upstream OpenSSL since 2000, and pgcrypto already switched to EVP as of `5ff4a67`. This takes advantage of the refactoring done in `87ae969` that has introduced the allocation and free routines for cryptographic hashes. Since 1.1.0, OpenSSL does not publish the contents of the cryptohash contexts, forcing any consumers to rely on OpenSSL for all allocations. Hence, the resource owner callback mechanism gains a new set of routines to track and free cryptohash contexts when using OpenSSL, preventing any risks of leaks in the backend. Nothing is needed in the frontend thanks to the refactoring of `87ae969`, and the resowner knowledge is isolated into cryptohash_openssl.c. Note that this also fixes a failure with SCRAM authentication when using FIPS in OpenSSL, but as there have been few complaints about this problem and as this causes an ABI breakage, no backpatch is done. Author: Michael Paquier Reviewed-by: Daniel Gustafsson, Heikki Linnakangas Discussion: https://postgr.es/m/20200924025314.GE7405@paquier.xyz Discussion: https://postgr.es/m/20180911030250.GA27115@paquier.xyz	2020-12-04 10:49:23 +09:00
Michael Paquier	91624c2ff8	Fix compilation warnings in cryptohash_openssl.c These showed up with -O2. Oversight in `87ae969`. Author: Fujii Masao Discussion: https://postgr.es/m/cee3df00-566a-400c-1252-67c3701f918a@oss.nttdata.com	2020-12-02 12:31:10 +09:00
Michael Paquier	87ae9691d2	Move SHA2 routines to a new generic API layer for crypto hashes Two new routines to allocate a hash context and to free it are created, as these become necessary for the goal behind this refactoring: switch the all cryptohash implementations for OpenSSL to use EVP (for FIPS and also because upstream does not recommend the use of low-level cryptohash functions for 20 years). Note that OpenSSL hides the internals of cryptohash contexts since 1.1.0, so it is necessary to leave the allocation to OpenSSL itself, explaining the need for those two new routines. This part is going to require more work to properly track hash contexts with resource owners, but this not introduced here. Still, this refactoring makes the move possible. This reduces the number of routines for all SHA2 implementations from twelve (SHA{224,256,386,512} with init, update and final calls) to five (create, free, init, update and final calls) by incorporating the hash type directly into the hash context data. The new cryptohash routines are moved to a new file, called cryptohash.c for the fallback implementations, with SHA2 specifics becoming a part internal to src/common/. OpenSSL specifics are part of cryptohash_openssl.c. This infrastructure is usable for more hash types, like MD5 or HMAC. Any code paths using the internal SHA2 routines are adapted to report correctly errors, which are most of the changes of this commit. The zones mostly impacted are checksum manifests, libpq and SCRAM. Note that `e21cbb4` was a first attempt to switch SHA2 to EVP, but it lacked the refactoring needed for libpq, as done here. This patch has been tested on Linux and Windows, with and without OpenSSL, and down to 1.0.1, the oldest version supported on HEAD. Author: Michael Paquier Reviewed-by: Daniel Gustafsson Discussion: https://postgr.es/m/20200924025314.GE7405@paquier.xyz	2020-12-02 10:37:20 +09:00

1 2 3 4 5 ...

353 Commits