postgresql

Commit Graph

Author	SHA1	Message	Date
Robert Haas	4667d97ca6	Fix typos in commit `05d4cbf9b6`. Reported by Justin Pryzby. Discussion: http://postgr.es/m/20220927185121.GE6256@telsasoft.com	2022-09-27 15:34:17 -04:00
Robert Haas	05d4cbf9b6	Increase width of RelFileNumbers from 32 bits to 56 bits. RelFileNumbers are now assigned using a separate counter, instead of being assigned from the OID counter. This counter never wraps around: if all 2^56 possible RelFileNumbers are used, an internal error occurs. As the cluster is limited to 2^64 total bytes of WAL, this limitation should not cause a problem in practice. If the counter were 64 bits wide rather than 56 bits wide, we would need to increase the width of the BufferTag, which might adversely impact buffer lookup performance. Also, this lets us use bigint for pg_class.relfilenode and other places where these values are exposed at the SQL level without worrying about overflow. This should remove the need to keep "tombstone" files around until the next checkpoint when relations are removed. We do that to keep RelFileNumbers from being recycled, but now that won't happen anyway. However, this patch doesn't actually change anything in this area; it just makes it possible for a future patch to do so. Dilip Kumar, based on an idea from Andres Freund, who also reviewed some earlier versions of the patch. Further review and some wordsmithing by me. Also reviewed at various points by Ashutosh Sharma, Vignesh C, Amul Sul, Álvaro Herrera, and Tom Lane. Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com	2022-09-27 13:25:21 -04:00
Michael Paquier	78fdb1e50f	Mark ParallelMessagePending as sig_atomic_t ParallelMessagePending was previously marked as a boolean which should be fine on modern platforms, but the C standard recommends the use of sig_atomic_t for variables manipulated in signal handlers. Author: Hayato Kuroda Discussion: https://postgr.es/m/TYAPR01MB58667C15A95A234720F4F876F5529@TYAPR01MB5866.jpnprd01.prod.outlook.com	2022-09-27 09:29:56 +09:00
Michael Paquier	e1e6f8f3df	Remove dependency to StringInfo in xlogbackup.{c.h} This was used as the returned result type of the generated contents for the backup_label and backup history files. This is replaced by a simple string, reducing the cleanup burden of all the callers of build_backup_content(). Reviewed-by: Bharath Rupireddy Discussion: https://postgr.es/m/YzERvNPaZivHEKZJ@paquier.xyz	2022-09-27 09:15:07 +09:00
Michael Paquier	7d708093b7	Refactor creation of backup_label and backup history files This change simplifies some of the logic related to the generation and creation of the backup_label and backup history files, which has become unnecessarily complicated since the removal of the exclusive backup mode in commit `39969e2`. The code was previously generating the contents of these files as a string (start phase for the backup_label and stop phase for the backup history file), one problem being that the contents of the backup_label string were scanned to grab some of its internal contents at the stop phase. This commit changes the logic so as we store the data required to build these files in an intermediate structure named BackupState. The backup_label file and backup history file strings are generated when they are ready to be sent back to the client. Both files are now generated with the same code path. While on it, this commit renames some variables for clarity. Two new files named xlogbackup.{c,h} are introduced in this commit, to remove from xlog.c some of the logic around base backups. Note that more could be moved to this new set of files. Author: Bharath Rupireddy, Michael Paquier Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/CALj2ACXWwTDgJqCjdaPyfR7djwm6SrybGcrZyrvojzcsmt4FFw@mail.gmail.com	2022-09-26 11:15:47 +09:00
Peter Eisentraut	26f7802beb	Message style improvements	2022-09-24 18:41:25 -04:00
Andres Freund	e6927270cd	meson: Add initial version of meson based build system Autoconf is showing its age, fewer and fewer contributors know how to wrangle it. Recursive make has a lot of hard to resolve dependency issues and slow incremental rebuilds. Our home-grown MSVC build system is hard to maintain for developers not using Windows and runs tests serially. While these and other issues could individually be addressed with incremental improvements, together they seem best addressed by moving to a more modern build system. After evaluating different build system choices, we chose to use meson, to a good degree based on the adoption by other open source projects. We decided that it's more realistic to commit a relatively early version of the new build system and mature it in tree. This commit adds an initial version of a meson based build system. It supports building postgres on at least AIX, FreeBSD, Linux, macOS, NetBSD, OpenBSD, Solaris and Windows (however only gcc is supported on aix, solaris). For Windows/MSVC postgres can now be built with ninja (faster, particularly for incremental builds) and msbuild (supporting the visual studio GUI, but building slower). Several aspects (e.g. Windows rc file generation, PGXS compatibility, LLVM bitcode generation, documentation adjustments) are done in subsequent commits requiring further review. Other aspects (e.g. not installing test-only extensions) are not yet addressed. When building on Windows with msbuild, builds are slower when using a visual studio version older than 2019, because those versions do not support MultiToolTask, required by meson for intra-target parallelism. The plan is to remove the MSVC specific build system in src/tools/msvc soon after reaching feature parity. However, we're not planning to remove the autoconf/make build system in the near future. Likely we're going to keep at least the parts required for PGXS to keep working around until all supported versions build with meson. Some initial help for postgres developers is at https://wiki.postgresql.org/wiki/Meson With contributions from Thomas Munro, John Naylor, Stone Tickle and others. Author: Andres Freund <andres@anarazel.de> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Author: Peter Eisentraut <peter@eisentraut.org> Reviewed-By: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Discussion: https://postgr.es/m/20211012083721.hvixq4pnh2pixr3j@alap3.anarazel.de	2022-09-21 22:37:17 -07:00
Michael Paquier	fbb5f54b67	Clear ps display of startup process at the end of recovery If the ps display is not cleared at this point, the process could continue displaying "recovering NNN" even if handling end-of-recovery steps. `df9274a` has tackled that by providing some information with the end-of-recovery checkpoint but `7ff23c6` has nullified the effect of the first commit. Per a suggestion from Justin, just clear the ps display when we are done with recovery, so as no incorrect information is displayed. This may get extended in the future, but for now restore the pre-7ff23c6 behavior. Author: Justin Prysby Discussion: https://postgr.es/m/20220913223954.GU31833@telsasoft.com Backpatch-through: 15	2022-09-22 14:25:09 +09:00
Tom Lane	152c9f7b8f	Suppress variable-set-but-not-used warnings from clang 15. clang 15+ will issue a set-but-not-used warning when the only use of a variable is in autoincrements (e.g., "foo++;"). That's perfectly sensible, but it detects a few more cases that we'd not noticed before. Silence the warnings with our usual methods, such as PG_USED_FOR_ASSERTS_ONLY, or in one case by actually removing a useless variable. One thing that we can't nicely get rid of is that with %pure-parser, Bison emits "yynerrs" as a local variable that falls foul of this warning. To silence those, I inserted "(void) yynerrs;" in the top-level productions of affected grammars. Per recently-established project policy, this is a candidate for back-patching into out-of-support branches: it suppresses annoying compiler warnings but changes no behavior. Hence, back-patch to 9.5, which is as far as these patches go without issues. (A preliminary check shows that the prior branches need some other set-but-not-used cleanups too, so I'll leave them for another day.) Discussion: https://postgr.es/m/514615.1663615243@sss.pgh.pa.us	2022-09-20 12:04:37 -04:00
Peter Geoghegan	bfcf1b3480	Harmonize parameter names in storage and AM code. Make sure that function declarations use names that exactly match the corresponding names from function definitions in storage, catalog, access method, executor, and logical replication code, as well as in miscellaneous utility/library code. Like other recent commits that cleaned up function parameter names, this commit was written with help from clang-tidy. Later commits will do the same for other parts of the codebase. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com	2022-09-19 19:18:36 -07:00
Peter Geoghegan	4bac9600f0	Harmonize heapam and tableam parameter names. Make sure that function declarations use names that exactly match the corresponding names from function definitions. Having parameter names that are reliably consistent in this way will make it easier to reason about groups of related C functions from the same translation unit as a module. It will also make certain refactoring tasks easier. Like other recent commits that cleaned up function parameter names, this commit was written with help from clang-tidy. Later commits will do the same for other parts of the codebase. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com	2022-09-19 16:46:23 -07:00
John Naylor	08f8af983a	Fix typos referring to PGPROC Japin Li Reviewed by Kyotaro Horiguchi Discussion: https://www.postgresql.org/message-id/MEYP282MB1669459813B36FB5EAA38434B6499@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM	2022-09-19 11:36:51 +07:00
Noah Misch	b4f584f9d2	Reset InstallXLogFileSegmentActive after walreceiver self-initiated exit. After commit `cc2c7d65fc` added this flag, failure to reset it caused assertion failures. In non-assert builds, it made the system fail to achieve the objectives listed in that commit; chiefly, we might emit a spurious log message. Back-patch to v15, where that commit first appeared. Bharath Rupireddy and Kyotaro Horiguchi. Reviewed by Dilip Kumar, Nathan Bossart and Michael Paquier. Reported by Dilip Kumar. Discussion: https://postgr.es/m/CAFiTN-sE3ry=ycMPVtC+Djw4Fd7gbUGVv_qqw6qfzp=JLvqT3g@mail.gmail.com	2022-09-15 06:45:23 -07:00
Tom Lane	31dcfae83c	Use the terminology "WAL file" not "log file" more consistently. Referring to the WAL as just "log" invites confusion with the postmaster log, so avoid doing that in docs and error messages. Also shorten "WAL segment file" to just "WAL file" in various places. Bharath Rupireddy, reviewed by Nathan Bossart and Kyotaro Horiguchi Discussion: https://postgr.es/m/CALj2ACUeXa8tDPaiTLexBDMZ7hgvaN+RTb957-cn5qwv9zf-MQ@mail.gmail.com	2022-09-14 18:40:58 -04:00
Tom Lane	0a20ff54f5	Split up guc.c for better build speed and ease of maintenance. guc.c has grown to be one of our largest .c files, making it a bottleneck for compilation. It's also acquired a bunch of knowledge that'd be better kept elsewhere, because of our not very good habit of putting variable-specific check hooks here. Hence, split it up along these lines: * guc.c itself retains just the core GUC housekeeping mechanisms. * New file guc_funcs.c contains the SET/SHOW interfaces and some SQL-accessible functions for GUC manipulation. * New file guc_tables.c contains the data arrays that define the built-in GUC variables, along with some already-exported constant tables. * GUC check/assign/show hook functions are moved to the variable's home module, whenever that's clearly identifiable. A few hard- to-classify hooks ended up in commands/variable.c, which was already a home for miscellaneous GUC hook functions. To avoid cluttering a lot more header files with #include "guc.h", I also invented a new header file utils/guc_hooks.h and put all the GUC hook functions' declarations there, regardless of their originating module. That allowed removal of #include "guc.h" from some existing headers. The fallout from that (hopefully all caught here) demonstrates clearly why such inclusions are best minimized: there are a lot of files that, for example, were getting array.h at two or more levels of remove, despite not having any connection at all to GUCs in themselves. There is some very minor code beautification here, such as renaming a couple of inconsistently-named hook functions and improving some comments. But mostly this just moves code from point A to point B and deals with the ensuing needs for #include adjustments and exporting a few functions that previously weren't exported. Patch by me, per a suggestion from Andres Freund; thanks also to Michael Paquier for the idea to invent guc_funcs.c. Discussion: https://postgr.es/m/587607.1662836699@sss.pgh.pa.us	2022-09-13 11:11:45 -04:00
Michael Paquier	bb629c294b	Rename macro related to pg_backup_stop() This should have been part of `39969e2` that has renamed pg_stop_backup() to pg_backup_stop(), and this one is the last reference to pg_stop/start_backup() I could find in the tree. Author: Bharath Rupireddy Discussion: https://postgr.es/m/CALj2ACXjvC28ppeDTCrfaSyHga0ggP5nRLJbsjx=7N-74UT4QA@mail.gmail.com	2022-09-13 10:45:43 +09:00
Michael Paquier	df4a056619	Add more error context to RestoreBlockImage() and consume it On failure in restoring a block image, no details were provided, while it is possible to see failure with an inconsistent record state, a failure in processing decompression or a failure in decompression because a build does not support this option. RestoreBlockImage() is used in two code paths in the backend code, during recovery and when checking a page consistency after applying masking, and both places are changed to consume the error message produced by the internal routine when it returns a false status. All the error messages are reported under ERRCODE_INTERNAL_ERROR, that gets used also when attempting to access a page compressed by a method not supported by the build attempting the decompression. This is something that can happen in core when doing physical replication with primary and standby using inconsistent build options, for example. This routine is available since `2c03216d` and it has never provided any context about the error happening when it failed. This change is justified even more after `57aa5b2`, that introduced compression of FPWs in WAL. Reported-by: Justin Prysby Author: Michael Paquier Discussion: https://postgr.es/m/20220905002320.GD31833@telsasoft.com Backpatch-through: 15	2022-09-09 10:00:40 +09:00
Thomas Munro	adb466150b	Fix recovery_prefetch with low maintenance_io_concurrency. We should process completed IOs before trying to start more, so that it is always possible to decode one more record when the decoded record queue is empty, even if maintenance_io_concurrency is set so low that a single earlier WAL record might have saturated the IO queue. That bug was hidden because the effect of maintenance_io_concurrency was arbitrarily clamped to be at least 2. Fix the ordering, and also remove that clamp. We need a special case for 0, which is now treated the same as recovery_prefetch=off, but otherwise the number is used directly. This allows for testing with 1, which would have made the problem obvious in simple test scenarios. Also add an explicit error message for missing contrecords. It was a bit strange that we didn't report an error already, and became a latent bug with prefetching, since the internal state that tracks aborted contrecords would not survive retrying, as revealed by 026_overwrite_contrecord.pl with this adjustment. Reporting an error prevents that. Back-patch to 15. Reported-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/20220831140128.GS31833%40telsasoft.com	2022-09-08 21:44:55 +12:00
Peter Eisentraut	6bcda4a721	Fix incorrect uses of Datum conversion macros Since these macros just cast whatever you give them to the designated output type, and many normal uses also cast the output type further, a number of incorrect uses go undiscovered. The fixes in this patch have been discovered by changing these macros to inline functions, which is the subject of a future patch. Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Discussion: https://www.postgresql.org/message-id/flat/8528fb7e-0aa2-6b54-85fb-0c0886dbd6ed%40enterprisedb.com	2022-09-05 13:30:44 +02:00
Thomas Munro	932b016300	Fix cache invalidation bug in recovery_prefetch. XLogPageRead() can retry internally after a pread() system call has succeeded, in the case of short reads, and page validation failures while in standby mode (see commit `0668719801`). Due to an oversight in commit `3f1ce973`, these cases could leave stale data in the internal cache of xlogreader.c without marking it invalid. The main defense against stale cached data on failure to read a page was in the error handling path of the calling function ReadPageInternal(), but that wasn't quite enough for errors handled internally by XLogPageRead()'s retry loop if we then exited with XLREAD_WOULDBLOCK. 1. ReadPageInternal() now marks the cache invalid before calling the page_read callback, by setting state->readLen to 0. It'll be set to a non-zero value only after a successful read. It'll stay valid as long as the caller requests data in the cached range. 2. XLogPageRead() no long performs internal retries while reading ahead. While such retries should work, the general philosophy is that we should give up prefetching if anything unusual happens so we can handle it when recovery catches up, to reduce the complexity of the system. Let's do that here too. 3. While here, a new function XLogReaderResetError() improves the separation between xlogrecovery.c and xlogreader.c, where the former previously clobbered the latter's internal error buffer directly. The new function makes this more explicit, and also clears a related flag, without which a standby would needlessly retry in the outer function. Thanks to Noah Misch for tracking down the conditions required for a rare build farm failure in src/bin/pg_ctl/t/003_promote.pl, and providing a reproducer. Back-patch to 15. Reported-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/20220807003627.GA4168930%40rfd.leadboat.com	2022-09-03 13:28:43 +12:00
Michael Paquier	bfb9dfd937	Expand the use of get_dirent_type(), shaving a few calls to stat()/lstat() Several backend-side loops scanning one or more directories with ReadDir() (WAL segment recycle/removal in xlog.c, backend-side directory copy, temporary file removal, configuration file parsing, some logical decoding logic and some pgtz stuff) already know the type of the entry being scanned thanks to the dirent structure associated to the entry, on platforms where we know about DT_REG, DT_DIR and DT_LNK to make the difference between a regular file, a directory and a symbolic link. Relying on the direct structure of an entry saves a few system calls to stat() and lstat() in the loops updated here, shaving some code while on it. The logic of the code remains the same, calling stat() or lstat() depending on if it is necessary to look through symlinks. Authors: Nathan Bossart, Bharath Rupireddy Reviewed-by: Andres Freund, Thomas Munro, Michael Paquier Discussion: https://postgr.es/m/CALj2ACV8n-J-f=yiLUOx2=HrQGPSOZM3nWzyQQvLPcccPXxEdg@mail.gmail.com	2022-09-02 16:58:06 +09:00
Tom Lane	7fed801135	Clean up inconsistent use of fflush(). More than twenty years ago (`79fcde48b`), we hacked the postmaster to avoid a core-dump on systems that didn't support fflush(NULL). We've mostly, though not completely, hewed to that rule ever since. But such systems are surely gone in the wild, so in the spirit of cleaning out no-longer-needed portability hacks let's get rid of multiple per-file fflush() calls in favor of using fflush(NULL). Also, we were fairly inconsistent about whether to fflush() before popen() and system() calls. While we've received no bug reports about that, it seems likely that at least some of these call sites are at risk of odd behavior, such as error messages appearing in an unexpected order. Rather than expend a lot of brain cells figuring out which places are at hazard, let's just establish a uniform coding rule that we should fflush(NULL) before these calls. A no-op fflush() is surely of trivial cost compared to launching a sub-process via a shell; while if it's not a no-op then we likely need it. Discussion: https://postgr.es/m/2923412.1661722825@sss.pgh.pa.us	2022-08-29 13:55:41 -04:00
Robert Haas	6672d79139	Prevent WAL corruption after a standby promotion. When a PostgreSQL instance performing archive recovery but not using standby mode is promoted, and the last WAL segment that it attempted to read ended in a partial record, the previous code would create invalid WAL on the new timeline. The WAL from the previously timeline would be copied to the new timeline up until the end of the last valid record, but instead of beginning to write WAL at immediately afterwards, the promoted server would write an overwrite contrecord at the beginning of the next segment. The end of the previous segment would be left as all-zeroes, resulting in failures if anything tried to read WAL from that file. The root of the issue is that ReadRecord() decides whether to set abortedRecPtr and missingContrecPtr based on the value of StandbyMode, but ReadRecord() switches to a new timeline based on the value of ArchiveRecoveryRequested. We shouldn't try to write an overwrite contrecord if we're switching to a new timeline, so change the test in ReadRecod() to check ArchiveRecoveryRequested instead. Code fix by Dilip Kumar. Comments by me incorporating suggested language from Álvaro Herrera. Further review from Kyotaro Horiguchi and Sami Imseih. Discussion: http://postgr.es/m/CAFiTN-t7umki=PK8dT1tcPV=mOUe2vNhHML6b3T7W7qqvvajjg@mail.gmail.com Discussion: http://postgr.es/m/FB0DEA0B-E14E-43A0-811F-C1AE93D00FF3%40amazon.com	2022-08-29 11:07:37 -04:00
David Rowley	3e0fff2e68	More -Wshadow=compatible-local warning fixes In a similar effort to `f01592f91`, here we're targetting fixing the warnings where we've deemed the shadowing variable to serve a close enough purpose to the shadowed variable just to reuse the shadowed version and not declare the shadowing variable at all. By my count, this takes the warning count from 106 down to 71. Author: Justin Pryzby Discussion: https://postgr.es/m/20220825020839.GT2342@telsasoft.com	2022-08-26 02:35:40 +12:00
Michael Paquier	d951052a9e	Allow parallel workers to retrieve some data from Port This commit moves authn_id into a new global structure called ClientConnectionInfo (mapping to a MyClientConnectionInfo for each backend) which is intended to hold all the client information that should be shared between the backend and any of its parallel workers, access for extensions and triggers being the primary use case. There is no need to push all the data of Port to the workers, and authn_id is quite a generic concept so using a separate structure provides the best balance (the name of the structure has been suggested by Robert Haas). While on it, and per discussion as this would be useful for a potential SYSTEM_USER that can be accessed through parallel workers, a second field is added for the authentication method, copied directly from Port. ClientConnectionInfo is serialized and restored using a new parallel key and a structure tracks the length of the authn_id, making the addition of more fields straight-forward. Author: Jacob Champion Reviewed-by: Bertrand Drouvot, Stephen Frost, Robert Haas, Tom Lane, Michael Paquier, Julien Rouhaud Discussion: https://postgr.es/m/793d990837ae5c06a558d58d62de9378ab525d83.camel@vmware.com	2022-08-24 12:57:13 +09:00
John Naylor	1b9050da66	Remove empty statement Peter Smith Discussion: https://www.postgresql.org/message-id/CAHut%2BPtRGVuj8Q_GpHHxZyk7fGwdYDG8_s4GSfKoc_4Yd9vR-w%40mail.gmail.com	2022-08-23 09:24:32 +07:00
Robert Haas	ec97db399f	Adjust assertion in XLogDecodeNextRecord. As written, if you use XLogBeginRead() to position an xlogreader at the beginning of a WAL page and then try to read WAL, this assertion will fail. However, the header comment for XLogBeginRead() claims that positioning an xlogreader at the beginning of a page is valid, and the code here is perfectly able to cope with it. It's only the assertion that causes trouble. So relax it. This is formally a bug in all supported branches, but as it doesn't seem to have any consequences for current uses of the xlogreader facility, no back-patch, at least for now. Dilip Kumar and Robert Haas Discussion: http://postgr.es/m/CA+TgmoaJSs2_7WHW2GzFYe9+zfPtxBKvT3GW47+x=ptUE=cULw@mail.gmail.com	2022-08-18 12:22:20 -04:00
Michael Paquier	d265cd2029	Use SetInstallXLogFileSegmentActive() in more places in xlog.c This reduces the code paths where XLogCtl->InstallXLogFileSegmentActive is directly touched, and this wrapper function does the same thing as the original code replaced by the function call. Author: Bharath Rupireddy Reviewed-by: Nathan Bossart Discussion: https://postgr.es/m/CALj2ACVhkf-bC5CX-=6iBUfkO5GqmBntQH+m=HpY0iQ=-g1pRg@mail.gmail.com	2022-08-17 15:28:45 +09:00
Robert Haas	a8c0128697	Move basebackup code to new directory src/backend/backup Reviewed by David Steele and Justin Pryzby Discussion: http://postgr.es/m/CA+TgmoafqboATDSoXHz8VLrSwK_MDhjthK4hEpYjqf9_1Fmczw%40mail.gmail.com	2022-08-10 14:03:23 -04:00
Thomas Munro	670475b2fa	Fix obsolete comment in commit_ts.c. Commit `08aa89b` removed COMMIT_TS_SETTS, but left a reference in a comment. Author: Nathan Bossart <nathandbossart@gmail.com> Discussion: https://postgr.es/m/20220726173343.GA154110%40nathanxps13	2022-08-09 12:58:04 +12:00
Thomas Munro	d2e150831a	Remove configure probe for fdatasync. fdatasync() is in SUSv2, and all targeted Unix systems have it. We have a replacement function for Windows. We retain the probe for the function declaration, which allows us to supply the mysteriously missing declaration for macOS, and also for Windows. No need to keep a HAVE_FDATASYNC macro around. Also rename src/port/fdatasync.c to win32fdatasync.c since it's only for Windows. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com Discussion: https://postgr.es/m/CA%2BhUKGJZJVO%3DiX%2Beb-PXi2_XS9ZRqnn_4URh0NUQOwt6-_51xQ%40mail.gmail.com	2022-08-05 16:37:38 +12:00
Thomas Munro	cf112c1220	Remove dead pread and pwrite replacement code. pread() and pwrite() are in SUSv2, and all targeted Unix systems have them. Previously, we defined pg_pread and pg_pwrite to emulate these function with lseek() on old Unixen. The names with a pg_ prefix were a reminder of a portability hazard: they might change the current file position. That hazard is gone, so we can drop the prefixes. Since the remaining replacement code is Windows-only, move it into src/port/win32p{read,write}.c, and move the declarations into src/include/port/win32_port.h. No need for vestigial HAVE_PREAD, HAVE_PWRITE macros as they were only used for declarations in port.h which have now moved into win32_port.h. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Greg Stark <stark@mit.edu> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com	2022-08-05 09:49:21 +12:00
Thomas Munro	2b1f580ee2	Remove configure probes for symlink/readlink, and dead code. symlink() and readlink() are in SUSv2 and all targeted Unix systems have them. We have partial emulation on Windows. Code that raised runtime errors on systems without it has been dead for years, so we can remove that and also references to such systems in the documentation. Define HAVE_READLINK and HAVE_SYMLINK macros on Unix. Our Windows replacement functions based on junction points can't be used for relative paths or for non-directories, so the macros can be used to check for full symlink support. The places that deal with tablespaces can just use symlink functions without checking the macros. (If they did check the macros, they'd need to provide an #else branch with a runtime or compile time error, and it'd be dead code.) Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com	2022-08-05 09:22:56 +12:00
Daniel Gustafsson	f8f20203c2	Rephrase comments to make them clearer The use of "we" when referring to the active backend might be misunderstood, so rephrase to make it clearer who is performing the actions discussed in the comment. Author: Junwang Zhao <zhjwpku@gmail.com> Reviewed-by: Erikjan Rijkers <er@xs4all.nl> Reviewed-by: Robert Treat <rob@xzilla.net> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAEG8a3LRSMqkvjiURiJoSi4aGWORpiXUmUfQQK5PaD6WfPzu3w@mail.gmail.com	2022-08-04 16:30:06 +02:00
Alvaro Herrera	9e4f914b5e	Fix replay of create database records on standby Crash recovery on standby may encounter missing directories when replaying database-creation WAL records. Prior to this patch, the standby would fail to recover in such a case; however, the directories could be legitimately missing. Consider the following sequence of commands: CREATE DATABASE DROP DATABASE DROP TABLESPACE If, after replaying the last WAL record and removing the tablespace directory, the standby crashes and has to replay the create database record again, crash recovery must be able to continue. A fix for this problem was already attempted in `49d9cfc68b`, but it was reverted because of design issues. This new version is based on Robert Haas' proposal: any missing tablespaces are created during recovery before reaching consistency. Tablespaces are created as real directories, and should be deleted by later replay. CheckRecoveryConsistency ensures they have disappeared. The problems detected by this new code are reported as PANIC, except when allow_in_place_tablespaces is set to ON, in which case they are WARNING. Apart from making tests possible, this gives users an escape hatch in case things don't go as planned. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Author: Asim R Praveen <apraveen@pivotal.io> Author: Paul Guo <paulguo@gmail.com> Reviewed-by: Anastasia Lubennikova <lubennikovaav@gmail.com> (older versions) Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> (older versions) Reviewed-by: Michaël Paquier <michael@paquier.xyz> Diagnosed-by: Paul Guo <paulguo@gmail.com> Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27DN5xWJ2P9-ROH16e4JUA@mail.gmail.com	2022-07-28 08:40:06 +02:00
Michael Paquier	ffd1b6bb6f	Add overflow protection for block-related data in WAL records XLogRecordBlockHeader, the header holding the information for the data related to a block, tracks the length of the data appended to the WAL record with data_length (uint16). This limitation in size was not enforced by the public routine in charge of registering the data assembled later to form the WAL record inserted, XLogRegisterBufData(). Incorrectly used, it could lead to the generation of records with some of its data overflowed. This commit adds some safeguards to prevent that for the block data, complaining immediately if attempting to add to a record block information with a size larger than UINT16_MAX, which is the limit implied by the internal logic. Note that this also adjusts XLogRegisterData() and XLogRegisterBufData() so as the length of the WAL record data given by the caller is unsigned, matching with what gets stored in XLogRecData->len. Extracted from a larger patch by the same author. The original patch includes more protections when assembling a record in full that will be looked at separately later. Author: Matthias van de Meent Reviewed-by: Andres Freund, Heikki Linnakangas, Michael Paquier, David Zhang Discussion: https://postgr.es/m/CAEze2WgGiw+LZt+vHf8tWqB_6VxeLsMeoAuod0N=ij1q17n5pw@mail.gmail.com	2022-07-27 13:35:40 +09:00
Tom Lane	f92944137c	Force immediate commit after CREATE DATABASE etc in extended protocol. We have a few commands that "can't run in a transaction block", meaning that if they complete their processing but then we fail to COMMIT, we'll be left with inconsistent on-disk state. However, the existing defenses for this are only watertight for simple query protocol. In extended protocol, we didn't commit until receiving a Sync message. Since the client is allowed to issue another command instead of Sync, we're in trouble if that command fails or is an explicit ROLLBACK. In any case, sitting in an inconsistent state while waiting for a client message that might not come seems pretty risky. This case wasn't reachable via libpq before we introduced pipeline mode, but it's always been an intended aspect of extended query protocol, and likely there are other clients that could reach it before. To fix, set a flag in PreventInTransactionBlock that tells exec_execute_message to force an immediate commit. This seems to be the approach that does least damage to existing working cases while still preventing the undesirable outcomes. While here, add some documentation to protocol.sgml that explicitly says how to use pipelining. That's latent in the existing docs if you know what to look for, but it's better to spell it out; and it provides a place to document this new behavior. Per bug #17434 from Yugo Nagata. It's been wrong for ages, so back-patch to all supported branches. Discussion: https://postgr.es/m/17434-d9f7a064ce2a88a3@postgresql.org	2022-07-26 13:07:03 -04:00
Fujii Masao	2387f52962	Remove useless arguments in ReadCheckpointRecord(). This commit removes two arguments "report" and "whichChkpt" in ReadCheckpointRecord(). "report" is obviously useless because it's always true, i.e., there are two callers of the function and they always specify true as "report". Commit `1d919de5eb` removed the only call with "report" = false. "whichChkpt" indicated where the specified checkpoint location came from, pg_control or backup_label. This information was used to report different error messages depending on where the invalid checkpoint record came from, when it was found. But ReadCheckpointRecord() doesn't need to do that because its callers already do that and users can still identify where the invalid checkpoint record came from, by reading such log messages. Also when "whichChkpt" was 0, the word "primary checkpoint" was used in the log message and could confuse users because the concept of primary and secondary checkpoints was already removed before. These are why this commit removes "whichChkpt" argument. Author: Fujii Masao Reviewed-by: Bharath Rupireddy, Kyotaro Horiguchi Discussion: https://postgr.es/m/fa2e12eb-81c3-0717-0272-755f8a81c8f2@oss.nttdata.com	2022-07-25 10:59:38 +09:00
Thomas Munro	5344723755	Remove unnecessary Windows-specific basebackup code. Commit `c6f2f016` added an explicit check for a Windows "junction point". That turned out to be needed only because get_dirent_type() was busted on Windows. It's been fixed by commit `9d3444dc`, so remove it. Add a TAP-test to demonstrate that in-place tablespaces are copied by pg_basebackup. This exercises the codepath that would fail before `c6f2f016` on Windows, and shows that it still doesn't fail now that we're using get_dirent_type() on both Windows and Unix. Back-patch to 15, where in-place tablespaces arrived and caused this problem (ie directories where previously only symlinks were expected). Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CA%2BhUKGLzLK4PUPx0_AwXEWXOYAejU%3D7XpxnYE55Y%2Be7hB2N3FA%40mail.gmail.com	2022-07-22 17:41:47 +12:00
Thomas Munro	a1b56090eb	Remove O_FSYNC and associated macros. O_FSYNC was a pre-POSIX way of spelling O_SYNC, supported since commit `9d645fd84c` for non-conforming operating systems of the time. It's not needed on any modern system. We can just use standard O_SYNC directly if it exists (= all targeted systems except Windows), and get rid of our OPEN_SYNC_FLAG macro. Similarly for standard O_DSYNC, we can just use that directly if it exists (= all targeted systems except DragonFlyBSD), and get rid of our OPEN_DATASYNC_FLAG macro. We still avoid choosing open_datasync as a default value for wal_sync_method if O_DSYNC has the same value as O_SYNC (= only OpenBSD), so there is no change in default behavior. Discussion: https://postgr.es/m/CA%2BhUKGJE7y92NY7FG2ftUbZUaqohBU65_Ys_7xF5mUHo4wirTQ%40mail.gmail.com	2022-07-22 12:41:17 +12:00
Fujii Masao	b24b2be119	Fix assertion failure and segmentation fault in backup code. When a non-exclusive backup is canceled, do_pg_abort_backup() is called and resets some variables set by pg_backup_start (pg_start_backup in v14 or before). But previously it forgot to reset the session state indicating whether a non-exclusive backup is in progress or not in this session. This issue could cause an assertion failure when the session running BASE_BACKUP is terminated after it executed pg_backup_start and pg_backup_stop (pg_stop_backup in v14 or before). Also it could cause a segmentation fault when pg_backup_stop is called after BASE_BACKUP in the same session is canceled. This commit fixes the issue by making do_pg_abort_backup reset that session state. Back-patch to all supported branches. Author: Fujii Masao Reviewed-by: Kyotaro Horiguchi, Masahiko Sawada, Michael Paquier, Robert Haas Discussion: https://postgr.es/m/3374718f-9fbf-a950-6d66-d973e027f44c@oss.nttdata.com	2022-07-20 09:57:01 +09:00
Peter Eisentraut	9fd45870c1	Replace many MemSet calls with struct initialization This replaces all MemSet() calls with struct initialization where that is easily and obviously possible. (For example, some cases have to worry about padding bits, so I left those.) (The same could be done with appropriate memset() calls, but this patch is part of an effort to phase out MemSet(), so it doesn't touch memset() calls.) Reviewed-by: Ranier Vilela <ranier.vf@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://www.postgresql.org/message-id/9847b13c-b785-f4e2-75c3-12ec77a3b05c@enterprisedb.com	2022-07-16 08:50:49 +02:00
Fujii Masao	62c46eee22	Add checkpoint and REDO LSN to log_checkpoints message. It is useful for debugging purposes to report the checkpoint LSN and REDO LSN in log_checkpoints message. It can give more context while analyzing checkpoint-related issues. pg_controldata reports the last checkpoint LSN and REDO LSN, but having this information alongside the log message helps analyze issues that happened previously, connect the dots and identify the root cause. Author: Bharath Rupireddy, Kyotaro Horiguchi Reviewed-by: Michael Paquier, Julien Rouhaud, Nathan Bossart, Fujii Masao, Greg Stark Discussion: https://postgr.es/m/CALj2ACWt6kqriAHrO+AJj+OmP=suwbktHT5JoYAn-nqZe2gd2g@mail.gmail.com	2022-07-07 22:37:54 +09:00
Robert Haas	b0a55e4329	Change internal RelFileNode references to RelFileNumber or RelFileLocator. We have been using the term RelFileNode to refer to either (1) the integer that is used to name the sequence of files for a certain relation within the directory set aside for that tablespace/database combination; or (2) that value plus the OIDs of the tablespace and database; or occasionally (3) the whole series of files created for a relation based on those values. Using the same name for more than one thing is confusing. Replace RelFileNode with RelFileNumber when we're talking about just the single number, i.e. (1) from above, and with RelFileLocator when we're talking about all the things that are needed to locate a relation's files on disk, i.e. (2) from above. In the places where we refer to (3) as a relfilenode, instead refer to "relation storage". Since there is a ton of SQL code in the world that knows about pg_class.relfilenode, don't change the name of that column, or of other SQL-facing things that derive their name from it. On the other hand, do adjust closely-related internal terminology. For example, the structure member names dbNode and spcNode appear to be derived from the fact that the structure itself was called RelFileNode, so change those to dbOid and spcOid. Likewise, various variables with names like rnode and relnode get renamed appropriately, according to how they're being used in context. Hopefully, this is clearer than before. It is also preparation for future patches that intend to widen the relfilenumber fields from its current width of 32 bits. Variables that store a relfilenumber are now declared as type RelFileNumber rather than type Oid; right now, these are the same, but that can now more easily be changed. Dilip Kumar, per an idea from me. Reviewed also by Andres Freund. I fixed some whitespace issues, changed a couple of words in a comment, and made one other minor correction. Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com	2022-07-06 11:39:09 -04:00
Michael Paquier	dac1ff3090	Replace durable_rename_excl() by durable_rename(), take two durable_rename_excl() attempts to avoid overwriting any existing files by using link() and unlink(), and it falls back to rename() on some platforms (aka WIN32), which offers no such overwrite protection. Most callers use durable_rename_excl() just in case there is an existing file, but in practice there shouldn't be one (see below for more details). Furthermore, failures during durable_rename_excl() can result in multiple hard links to the same file. As per Nathan's tests, it is possible to end up with two links to the same file in pg_wal after a crash just before unlink() during WAL recycling. Specifically, the test produced links to the same file for the current WAL file and the next one because the half-recycled WAL file was re-recycled upon restarting, leading to WAL corruption. This change replaces all the calls of durable_rename_excl() to durable_rename(). This removes the protection against accidentally overwriting an existing file, but some platforms are already living without it and ordinarily there shouldn't be one. The function itself is left around in case any extensions are using it. It will be removed on HEAD via a follow-up commit. Here is a summary of the existing callers of durable_rename_excl() (see second discussion link at the bottom), replaced by this commit. First, basic_archive used it to avoid overwriting an archive concurrently created by another server, but as mentioned above, it will still overwrite files on some platforms. Second, xlog.c uses it to recycle past WAL segments, where an overwrite should not happen (origin of the change at `f0e37a8`) because there are protections about the WAL segment to select when recycling an entry. The third and last area is related to the write of timeline history files. writeTimeLineHistory() will write a new timeline history file at the end of recovery on promotion, so there should be no such files for the same timeline. What remains is writeTimeLineHistoryFile(), that can be used in parallel by a WAL receiver and the startup process, and some digging of the buildfarm shows that EEXIST from a WAL receiver can happen with an error of "could not link file \"pg_wal/xlogtemp.NN\" to \"pg_wal/MM.history\", which would cause an automatic restart of the WAL receiver as it is promoted to FATAL, hence this should improve the stability of the WAL receiver as rename() would overwrite an existing TLI history file already fetched by the startup process at recovery. This is a bug fix, but knowing the unlikeliness of the problem involving one or more crashes at an exceptionally bad moment, no backpatch is done. Also, I want to be careful with such changes (`aaa3aed` did the opposite of this change by removing HAVE_WORKING_LINK so as Windows would do a link() rather than a rename() but this was not concurrent-safe). A backpatch could be revisited in the future. This is the second time this change is attempted, `ccfbd92` being the first one, but this time no assertions are added for the case of a TLI history file written concurrently by the WAL receiver or the startup process because we can expect one to exist (some of the TAP tests are able to trigger with a proper timing). Author: Nathan Bossart Reviewed-by: Robert Haas, Kyotaro Horiguchi, Michael Paquier Discussion: https://postgr.es/m/20220407182954.GA1231544@nathanxps13 Discussion: https://postgr.es/m/Ym6GZbqQdlalSKSG@paquier.xyz	2022-07-05 10:16:12 +09:00
Michael Paquier	33bd4698c1	Fix code comments still referring to pg_start/stop_backup() pg_start_backup() and pg_stop_backup() have been respectively renamed to pg_backup_start() and pg_backup_stop() as of `39969e2`, but a few comments did not get the call. Reviewed-by: Kyotaro Horiguchi, David Steele Discussion: https://postgr.es/m/YrqGlj1+4DF3dbZ/@paquier.xyz	2022-07-01 09:37:17 +09:00
Tom Lane	82d0ffae32	pgindent run prior to branching v15. pgperltidy and reformat-dat-files too. Not many changes.	2022-06-30 11:03:03 -04:00
Heikki Linnakangas	adf6d5dfb2	Fix visibility check when XID is committed in CLOG but not in procarray. TransactionIdIsInProgress had a fast path to return 'false' if the single-item CLOG cache said that the transaction was known to be committed. However, that was wrong, because a transaction is first marked as committed in the CLOG but doesn't become visible to others until it has removed its XID from the proc array. That could lead to an error: ERROR: t_xmin is uncommitted in tuple to be updated or for an UPDATE to go ahead without blocking, before the previous UPDATE on the same row was made visible. The window is usually very short, but synchronous replication makes it much wider, because the wait for synchronous replica happens in that window. Another thing that makes it hard to hit is that it's hard to get such a commit-in-progress transaction into the single item CLOG cache. Normally, if you call TransactionIdIsInProgress on such a transaction, it determines that the XID is in progress without checking the CLOG and without populating the cache. One way to prime the cache is to explicitly call pg_xact_status() on the XID. Another way is to use a lot of subtransactions, so that the subxid cache in the proc array is overflown, making TransactionIdIsInProgress rely on pg_subtrans and CLOG checks. This has been broken ever since it was introduced in 2008, but the race condition is very hard to hit, especially without synchronous replication. There were a couple of reports of the error starting from summer 2021, but no one was able to find the root cause then. TransactionIdIsKnownCompleted() is now unused. In 'master', remove it, but I left it in place in backbranches in case it's used by extensions. Also change pg_xact_status() to check TransactionIdIsInProgress(). Previously, it only checked the CLOG, and returned "committed" before the transaction was actually made visible to other queries. Note that this also means that you cannot use pg_xact_status() to reproduce the bug anymore, even if the code wasn't fixed. Report and analysis by Konstantin Knizhnik. Patch by Simon Riggs, with the pg_xact_status() change added by me. Author: Simon Riggs Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/flat/4da7913d-398c-e2ad-d777-f752cf7f0bbb%40garret.ru	2022-06-27 08:21:08 +03:00
Tom Lane	7ab5b4eb48	Be more careful about GucSource for internally-driven GUC settings. The original advice for hard-wired SetConfigOption calls was to use PGC_S_OVERRIDE, particularly for PGC_INTERNAL GUCs. However, that's really overkill for PGC_INTERNAL GUCs, since there is no possibility that we need to override a user-provided setting. Instead use PGC_S_DYNAMIC_DEFAULT in most places, so that the value will appear with source = 'default' in pg_settings and thereby not be shown by psql's new \dconfig command. The one exception is that when changing in_hot_standby in a hot-standby session, we still use PGC_S_OVERRIDE, because people felt that seeing that in \dconfig would be a good thing. Similarly use PGC_S_DYNAMIC_DEFAULT for the auto-tune value of wal_buffers (if possible, that is if wal_buffers wasn't explicitly set to -1), and for the typical 2MB value of max_stack_depth. In combination these changes remove four not-very-interesting entries from the typical output of \dconfig, all of which people fingered as "why is that showing up?" in the discussion thread. Discussion: https://postgr.es/m/3118455.1649267333@sss.pgh.pa.us	2022-06-08 13:26:18 -04:00
Michael Paquier	b4529005fd	Revert "Add single-item cache when looking at topmost XID of a subtrans XID" This reverts commit `06f5295` as per issues with this approach, both in terms of efficiency impact and stability. First, contrary to the single-item cache for transaction IDs in transam.c, the cache may finish by not be hit for a long time, and without an invalidation mechanism to clear it, it would cause inconsistent results on wraparound for example. Second, the use of SubTransGetTopmostTransaction() for the caching has a limited impact on performance. SubTransGetParent() could have more impact, though the benchmarking of the single-item approach still needs to be proved, particularly under the conditions where SLRU lookups are stressed in parallel with overflowed snapshots (aka more than 64 subxids generated, for example). After discussion with Andres Freund. Discussion: https://postgr.es/m/20220524235250.gtt3uu5zktfkr4hv@alap3.anarazel.de	2022-05-28 15:02:08 +09:00

1 2 3 4 5 ...

2126 Commits