postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2024-10-08 11:26:52 +02:00

Author	SHA1	Message	Date
Jeff Davis	9553b4115f	Fix warning introduced in `5c279a6d35`. Change two macros to be static inline functions instead to keep the data type consistent. This avoids a "comparison is always true" warning that was occurring with -Wtype-limits. In the process, change the names to look less like macros. Discussion: https://postgr.es/m/20220407063505.njnnrmbn4sxqfsts@alap3.anarazel.de	2022-04-07 00:39:30 -07:00
Andres Freund	3536b851ad	Fix compilation with WAL_DEBUG. Broke with `5c279a6d35`. But looks like it had been half-broken since `70e81861fa`, because 'rmid' didn't refer to the current record's rmid anymore, but to rmid from "Initialize resource managers" - a constant.	2022-04-06 23:26:59 -07:00
Jeff Davis	5c279a6d35	Custom WAL Resource Managers. Allow extensions to specify a new custom resource manager (rmgr), which allows specialized WAL. This is meant to be used by a Table Access Method or Index Access Method. Prior to this commit, only Generic WAL was available, which offers support for recovery and physical replication but not logical replication. Reviewed-by: Julien Rouhaud, Bharath Rupireddy, Andres Freund Discussion: https://postgr.es/m/ed1fb2e22d15d3563ae0eb610f7b61bb15999c0a.camel%40j-davis.com	2022-04-06 23:06:46 -07:00
Michael Paquier	06f5295af6	Add single-item cache when looking at topmost XID of a subtrans XID This change affects SubTransGetTopmostTransaction(), used to find the topmost transaction ID of a given transaction ID. The cache is able to store one value, so as we can save the backend from unnecessary lookups at pg_subtrans/ on repetitive calls of this routine. There is a similar practice in transam.c, for example. Author: Simon Riggs Reviewed-by: Andrey Borodin, Julien Rouhaud Discussion: https://postgr.es/m/CANbhV-G8Co=yq4v4BkW7MJDqVt68K_8A48nAZ_+8UQS7LrwLEQ@mail.gmail.com	2022-04-07 14:34:37 +09:00
Andres Freund	5891c7a8ed	pgstat: store statistics in shared memory. Previously the statistics collector received statistics updates via UDP and shared statistics data by writing them out to temporary files regularly. These files can reach tens of megabytes and are written out up to twice a second. This has repeatedly prevented us from adding additional useful statistics. Now statistics are stored in shared memory. Statistics for variable-numbered objects are stored in a dshash hashtable (backed by dynamic shared memory). Fixed-numbered stats are stored in plain shared memory. The header for pgstat.c contains an overview of the architecture. The stats collector is not needed anymore, remove it. By utilizing the transactional statistics drop infrastructure introduced in a prior commit statistics entries cannot "leak" anymore. Previously leaked statistics were dropped by pgstat_vacuum_stat(), called from [auto-]vacuum. On systems with many small relations pgstat_vacuum_stat() could be quite expensive. Now that replicas drop statistics entries for dropped objects, it is not necessary anymore to reset stats when starting from a cleanly shut down replica. Subsequent commits will perform some further code cleanup, adapt docs and add tests. Bumps PGSTAT_FILE_FORMAT_ID. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Author: Andres Freund <andres@anarazel.de> Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Thomas Munro <thomas.munro@gmail.com> Reviewed-By: Justin Pryzby <pryzby@telsasoft.com> Reviewed-By: "David G. Johnston" <david.g.johnston@gmail.com> Reviewed-By: Tomas Vondra <tomas.vondra@2ndquadrant.com> (in a much earlier version) Reviewed-By: Arthur Zakirov <a.zakirov@postgrespro.ru> (in a much earlier version) Reviewed-By: Antonin Houska <ah@cybertec.at> (in a much earlier version) Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de Discussion: https://postgr.es/m/20220308205351.2xcn6k4x5yivcxyd@alap3.anarazel.de Discussion: https://postgr.es/m/20210319235115.y3wz7hpnnrshdyv6@alap3.anarazel.de	2022-04-06 21:29:46 -07:00
Andres Freund	be902e2651	pgstat: normalize function naming. Most of pgstat uses pgstat_<verb>_<subject>() or just <verb>_<subject>(). But not all (some introduced fairly recently by me). Rename ones that aren't intentionally following a different scheme (e.g. AtEOXact_*).	2022-04-06 21:29:46 -07:00
Andres Freund	8b1dccd37c	pgstat: scaffolding for transactional stats creation / drop. One problematic part of the current statistics collector design is that there is no reliable way of getting rid of statistics entries. Because of that pgstat_vacuum_stat() (called by [auto-]vacuum) matches all stats for the current database with the catalog contents and tries to drop now-superfluous entries. That's quite expensive. What's worse, it doesn't work on physical replicas, despite physical replicas collection statistics entries. This commit introduces infrastructure to create / drop statistics entries transactionally, together with the underlying catalog objects (functions, relations, subscriptions). pgstat_xact.c maintains a list of stats entries created / dropped transactionally in the current transaction. To ensure the removal of statistics entries is durable dropped statistics entries are included in commit / abort (and prepare) records, which also ensures that stats entries are dropped on standbys. Statistics entries created separately from creating the underlying catalog object (e.g. when stats were previously lost due to an immediate restart) are not WAL logged. However that can only happen outside of the transaction creating the catalog object, so it does not lead to "leaked" statistics entries. For this to work, functions creating / dropping functions / relations / subscriptions need to call into pgstat. For subscriptions this was already done when dropping subscriptions, via pgstat_report_subscription_drop() (now renamed to pgstat_drop_subscription()). This commit does not actually drop stats yet, it just provides the infrastructure. It is however a largely independent piece of infrastructure, so committing it separately makes sense. Bumps XLOG_PAGE_MAGIC. Author: Andres Freund <andres@anarazel.de> Reviewed-By: Thomas Munro <thomas.munro@gmail.com> Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de	2022-04-06 18:27:52 -07:00
Stephen Frost	39969e2a1e	Remove exclusive backup mode Exclusive-mode backups have been deprecated since 9.6 (when non-exclusive backups were introduced) due to the issues they can cause should the system crash while one is running and generally because non-exclusive provides a much better interface. Further, exclusive backup mode wasn't really being tested (nor was most of the related code- like being able to log in just to stop an exclusive backup and the bits of the state machine related to that) and having to possibly deal with an exclusive backup and the backup_label file existing during pg_basebackup, pg_rewind, etc, added other complexities that we are better off without. This patch removes the exclusive backup mode, the various special cases for dealing with it, and greatly simplifies the online backup code and documentation. Authors: David Steele, Nathan Bossart Reviewed-by: Chapman Flack Discussion: https://postgr.es/m/ac7339ca-3718-3c93-929f-99e725d1172c@pgmasters.net https://postgr.es/m/CAHg+QDfiM+WU61tF6=nPZocMZvHDzCK47Kneyb0ZRULYzV5sKQ@mail.gmail.com	2022-04-06 14:41:03 -04:00
Peter Eisentraut	01effb1304	Fix unsigned output format in SLRU error reporting Avoid printing signed values as unsigned. (No impact in practice expected.) Author: Pavel Borisov <pashkin.elfe@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CALT9ZEHN7hWJo6MgJKqoDMGj%3DGOzQU50wTvOYZXDj7x%3DsUK-kw%40mail.gmail.com	2022-04-06 09:15:05 +02:00
Robert Haas	9c08aea6a3	Add new block-by-block strategy for CREATE DATABASE. Because this strategy logs changes on a block-by-block basis, it avoids the need to checkpoint before and after the operation. However, because it logs each changed block individually, it might generate a lot of extra write-ahead logging if the template database is large. Therefore, the older strategy remains available via a new STRATEGY parameter to CREATE DATABASE, and a corresponding --strategy option to createdb. Somewhat controversially, this patch assembles the list of relations to be copied to the new database by reading the pg_class relation of the template database. Cross-database access like this isn't normally possible, but it can be made to work here because there can't be any connections to the database being copied, nor can it contain any in-doubt transactions. Even so, we have to use lower-level interfaces than normal, since the table scan and relcache interfaces will not work for a database to which we're not connected. The advantage of this approach is that we do not need to rely on the filesystem to determine what ought to be copied, but instead on PostgreSQL's own knowledge of the database structure. This avoids, for example, copying stray files that happen to be located in the source database directory. Dilip Kumar, with a fairly large number of cosmetic changes by me. Reviewed and tested by Ashutosh Sharma, Andres Freund, John Naylor, Greg Nancarrow, Neha Sharma. Additional feedback from Bruce Momjian, Heikki Linnakangas, Julien Rouhaud, Adam Brusselback, Kyotaro Horiguchi, Tomas Vondra, Andrew Dunstan, Álvaro Herrera, and others. Discussion: http://postgr.es/m/CA+TgmoYtcdxBjLh31DLxUXHxFVMPGzrU5_T=CYCvRyFHywSBUQ@mail.gmail.com	2022-03-29 11:48:36 -04:00
Alvaro Herrera	bf902c1393	Revert "Fix replay of create database records on standby" This reverts commit `49d9cfc68b`. The approach taken by this patch has problems, so we'll come up with a radically different fix. Discussion: https://postgr.es/m/CA+TgmoYcUPL+WOJL2ZzhH=zmrhj0iOQ=iCFM0SuYqBbqZEamEg@mail.gmail.com	2022-03-29 15:36:21 +02:00
Alvaro Herrera	49d9cfc68b	Fix replay of create database records on standby Crash recovery on standby may encounter missing directories when replaying create database WAL records. Prior to this patch, the standby would fail to recover in such a case. However, the directories could be legitimately missing. Consider a sequence of WAL records as follows: CREATE DATABASE DROP DATABASE DROP TABLESPACE If, after replaying the last WAL record and removing the tablespace directory, the standby crashes and has to replay the create database record again, the crash recovery must be able to move on. This patch adds a mechanism similar to invalid-page tracking, to keep a tally of missing directories during crash recovery. If all the missing directory references are matched with corresponding drop records at the end of crash recovery, the standby can safely continue following the primary. Backpatch to 13, at least for now. The bug is older, but fixing it in older branches requires more careful study of the interactions with commit `e6d8069522`, which appeared in 13. A new TAP test file is added to verify the condition. However, because it depends on commit `d6d317dbf6`, it can only be added to branch master. I (Álvaro) manually verified that the code behaves as expected in branch 14. It's a bit nervous-making to leave the code uncovered by tests in older branches, but leaving the bug unfixed is even worse. Also, the main reason this fix took so long is precisely that we couldn't agree on a good strategy to approach testing for the bug, so perhaps this is the best we can do. Diagnosed-by: Paul Guo <paulguo@gmail.com> Author: Paul Guo <paulguo@gmail.com> Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Author: Asim R Praveen <apraveen@pivotal.io> Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27DN5xWJ2P9-ROH16e4JUA@mail.gmail.com	2022-03-25 13:16:21 +01:00
Robert Haas	412ad7a556	Fix possible recovery trouble if TRUNCATE overlaps a checkpoint. If TRUNCATE causes some buffers to be invalidated and thus the checkpoint does not flush them, TRUNCATE must also ensure that the corresponding files are truncated on disk. Otherwise, a replay from the checkpoint might find that the buffers exist but have the wrong contents, which may cause replay to fail. Report by Teja Mupparti. Patch by Kyotaro Horiguchi, per a design suggestion from Heikki Linnakangas, with some changes to the comments by me. Review of this and a prior patch that approached the issue differently by Heikki Linnakangas, Andres Freund, Álvaro Herrera, Masahiko Sawada, and Tom Lane. Discussion: http://postgr.es/m/BYAPR06MB6373BF50B469CA393C614257ABF00@BYAPR06MB6373.namprd06.prod.outlook.com	2022-03-24 14:52:28 -04:00
Alvaro Herrera	9d92582abf	Fix "missing continuation record" after standby promotion Invalidate abortedRecPtr and missingContrecPtr after a missing continuation record is successfully skipped on a standby. This fixes a PANIC caused when a recently promoted standby attempts to write an OVERWRITE_RECORD with an LSN of the previously read aborted record. Backpatch to 10 (all stable versions). Author: Sami Imseih <simseih@amazon.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/44D259DE-7542-49C4-8A52-2AB01534DCA9@amazon.com	2022-03-23 18:22:10 +01:00
Thomas Munro	3f1ce97346	Add circular WAL decoding buffer, take II. Teach xlogreader.c to decode the WAL into a circular buffer. This will support optimizations based on looking ahead, to follow in a later commit. * XLogReadRecord() works as before, decoding records one by one, and allowing them to be examined via the traditional XLogRecGetXXX() macros and certain traditional members like xlogreader->ReadRecPtr. * An alternative new interface XLogReadAhead()/XLogNextRecord() is added that returns pointers to DecodedXLogRecord objects so that it's now possible to look ahead in the WAL stream while replaying. * In order to be able to use the new interface effectively while streaming data, support is added for the page_read() callback to respond to a new nonblocking mode with XLREAD_WOULDBLOCK instead of waiting for more data to arrive. No direct user of the new interface is included in this commit, though XLogReadRecord() uses it internally. Existing code doesn't need to change, except in a few places where it was accessing reader internals directly and now needs to go through accessor macros. Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions) Discussion: https://postgr.es/m/CA+hUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq=AovOddfHpA@mail.gmail.com	2022-03-18 18:45:47 +13:00
Thomas Munro	46d9bfb0a6	Fix race between DROP TABLESPACE and checkpointing. Commands like ALTER TABLE SET TABLESPACE may leave files for the next checkpoint to clean up. If such files are not removed by the time DROP TABLESPACE is called, we request a checkpoint so that they are deleted. However, there is presently a window before checkpoint start where new unlink requests won't be scheduled until the following checkpoint. This means that the checkpoint forced by DROP TABLESPACE might not remove the files we expect it to remove, and the following ERROR will be emitted: ERROR: tablespace "mytblspc" is not empty To fix, add a call to AbsorbSyncRequests() just before advancing the unlink cycle counter. This ensures that any unlink requests forwarded prior to checkpoint start (i.e., when ckpt_started is incremented) will be processed by the current checkpoint. Since AbsorbSyncRequests() performs memory allocations, it cannot be called within a critical section, so we also need to move SyncPreCheckpoint() to before CreateCheckPoint()'s critical section. This is an old bug, so back-patch to all supported versions. Author: Nathan Bossart <nathandbossart@gmail.com> Reported-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20220215235845.GA2665318%40nathanxps13	2022-03-16 17:20:24 +13:00
Michael Paquier	6bdf1a1400	Fix collection of typos in the code and the documentation Some words were duplicated while other places were grammatically incorrect, including one variable name in the code. Author: Otto Kekalainen, Justin Pryzby Discussion: https://postgr.es/m/7DDBEFC5-09B6-4325-B942-B563D1A24BDC@amazon.com	2022-03-15 11:29:35 +09:00
Thomas Munro	c6f2f01611	Fix pg_basebackup with in-place tablespaces. Previously, pg_basebackup from a cluster that contained an 'in-place' tablespace, as introduced by commit `7170f215`, would produce a harmless warning on Unix and fail completely on Windows. Reported-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/20220304.165449.1200020258723305904.horikyota.ntt%40gmail.com	2022-03-15 14:01:23 +13:00
Michael Paquier	e9537321a7	Add support for zstd with compression of full-page writes in WAL wal_compression gains a new value, "zstd", to allow the compression of full-page images using the compression method of the same name. Compression is done using the default level recommended by the library, as of ZSTD_CLEVEL_DEFAULT = 3. Some benchmarking has shown that it could make sense to use a level lower for the FPI compression, like 1 or 2, as the compression rate did not change much with a bit less CPU consumed, but any tests done would only cover few scenarios so it is hard to come to a clear conclusion. Anyway, there is no reason to not use the default level instead, which is the level recommended by the library so it should be fine for most cases. zstd outclasses easily pglz, and is better than LZ4 where one wants to have more compression at the cost of extra CPU but both are good enough in their own scenarios, so the choice between one or the other of these comes to a study of the workload patterns and the schema involved, mainly. This commit relies heavily on `4035cd5`, that reshaped the code creating and restoring full-page writes to be aware of the compression type, making this integration straight-forward. This patch borrows some early work from Andrey Borodin, though the patch got a complete rewrite. Author: Justin Pryzby Discussion: https://postgr.es/m/20220222231948.GJ9008@telsasoft.com	2022-03-11 12:18:53 +09:00
Michael Paquier	0071fc7127	Fix header inclusion order in xloginsert.c with lz4.h Per project policy, all system and library headers need to be declared in the backend code after "postgres.h" and before the internal headers, but `4035cd5` broke this policy when adding support for LZ4 in wal_compression. Noticed while reviewing the patch to add support for zstd in this area. This only impacts HEAD, so there is no need for a back-patch.	2022-03-11 10:59:47 +09:00
Tom Lane	46ab07ffda	Clean up assorted failures under clang's -fsanitize=undefined checks. Most of these are cases where we could call memcpy() or other libc functions with a NULL pointer and a zero count, which is forbidden by POSIX even though every production version of libc allows it. We've fixed such things before in a piecemeal way, but apparently never made an effort to try to get them all. I don't claim that this patch does so either, but it gets every failure I observe in check-world, using clang 12.0.1 on current RHEL8. numeric.c has a different issue that the sanitizer doesn't like: "ln(-1.0)" will compute log10(0) and then try to assign the resulting -Inf to an integer variable. We don't actually use the result in such a case, so there's no live bug. Back-patch to all supported branches, with the idea that we might start running a buildfarm member that tests this case. This includes back-patching `c1132aae3` (Check the size in COPY_POINTER_FIELD), which previously silenced some of these issues in copyfuncs.c. Discussion: https://postgr.es/m/CALNJ-vT9r0DSsAOw9OXVJFxLENoVS_68kJ5x0p44atoYH+H4dg@mail.gmail.com	2022-03-03 18:13:24 -05:00
Michael Paquier	62ce0c758d	Fix catalog data of pg_stop_backup(), labelled v2 This function has been incorrectly marked as a set-returning function with prorows (estimated number of rows) set to 1 since its creation in `7117685`, that introduced non-exclusive backups. There is no need for that as the function is designed to return only one tuple. This commit fixes the catalog definition of pg_stop_backup_v2() so as it is not marked as proretset anymore, with prorows set to 0. This simplifies its internals by removing one tuplestore (used for one single record anyway) and by removing all the checks related to a set-returning function. Issue found during my quest to simplify some of the logic used in in-core system functions. Bump catalog version. Reviewed-by: Aleksander Alekseev, Kyotaro Horiguchi Discussion: https://postgr.es/m/Yh8guT78f1Ercfzw@paquier.xyz	2022-03-03 10:51:57 +09:00
Tom Lane	12d768e704	Don't use static storage for SaveTransactionCharacteristics(). This is pretty queasy-making on general principles, and the more so once you notice that CommitTransactionCommand() is actually stomping on the values saved by _SPI_commit(). It's okay as long as the active values didn't change during HoldPinnedPortals(); but that's a larger assumption than I think we want to make, especially since the fix is so simple. Discussion: https://postgr.es/m/1533956.1645731245@sss.pgh.pa.us	2022-02-28 12:54:12 -05:00
Heikki Linnakangas	69639e2b5c	Fix uninitialized variable. I'm very surprised the compiler didn't warn about it. But Coverity and Valgrind did.	2022-02-20 18:33:50 +02:00
Michael Paquier	d61a361d1a	Remove all traces of tuplestore_donestoring() in the C code This routine is a no-op since `dd04e95` from 2003, with a macro kept around for compatibility purposes. This has led to the same code patterns being copy-pasted around for no effect, sometimes in confusing ways like in pg_logical_slot_get_changes_guts() from logical.c where the code was actually incorrect. This issue has been discussed on two different threads recently, so rather than living with this legacy, remove any uses of this routine in the C code to simplify things. The compatibility macro is kept to avoid breaking any out-of-core modules that depend on it. Reported-by: Tatsuhito Kasahara, Justin Pryzby Author: Tatsuhito Kasahara Discussion: https://postgr.es/m/20211217200419.GQ17618@telsasoft.com Discussion: https://postgr.es/m/CAP0=ZVJeeYfAeRfmzqAF2Lumdiv4S4FewyBnZd4DPTrsSQKJKw@mail.gmail.com	2022-02-17 09:52:02 +09:00
Heikki Linnakangas	4620892344	Fix bogus log message when starting from a cleanly shut down state. In commit `70e81861fa` to split xlog.c, I moved the startup code that updates the state in the control file and prints out the "database system was not properly shut down" message to the log, but I accidentally removed the "if (InRecovery)" check around it. As a result, that message was printed even if the system was cleanly shut down, also during 'initdb'. Discussion: https://www.postgresql.org/message-id/3357075.1645031062@sss.pgh.pa.us	2022-02-16 23:15:08 +02:00
Heikki Linnakangas	9ed87a78e0	Fix read beyond buffer bug introduced by the split xlog.c patch. FinishWalRecovery() copied the valid part of the last WAL block into a palloc'd buffer, and the code in StartupXLOG() copied it to the WAL buffer. But the memcpy in StartupXLOG() copied a full 8kB block, not just the valid part, i.e. it copied from beyond the end of the buffer. The invalid part was cleared immediately afterwards, so as long as the memory was allocated and didn't segfault, it didn't do any harm, but it can definitely segfault. Discussion: https://www.postgresql.org/message-id/efc12e32-5af2-3485-5b1d-5af9f707491a@iki.fi	2022-02-16 12:01:32 +02:00
Heikki Linnakangas	70e81861fa	Split xlog.c into xlog.c and xlogrecovery.c. This moves the functions related to performing WAL recovery into the new xlogrecovery.c source file, leaving xlog.c responsible for maintaining the WAL buffers, coordinating the startup and switch from recovery to normal operations, and other miscellaneous stuff that have always been in xlog.c. Reviewed-by: Andres Freund, Kyotaro Horiguchi, Robert Haas Discussion: https://www.postgresql.org/message-id/a31f27b4-a31d-f976-6217-2b03be646ffa%40iki.fi	2022-02-16 09:30:38 +02:00
Heikki Linnakangas	be1c00ab13	Move code around in StartupXLOG(). This is in preparation for the next commit, which will split off recovery-related code from xlog.c into a new source file. This is the order that things will happen with the next commit, and the point of this commit is to make these ordering changes more explicit, while the next commit mechanically moves the source code to the new file. To aid review, I added "BEGIN/END function" comments to mark which blocks of code are moved to which functions in the next commit. They will be gone in the next commit. Reviewed-by: Andres Freund, Kyotaro Horiguchi, Robert Haas Discussion: https://www.postgresql.org/message-id/a31f27b4-a31d-f976-6217-2b03be646ffa%40iki.fi	2022-02-16 09:22:44 +02:00
Heikki Linnakangas	b3a5d01c05	Refactor setting XLP_FIRST_IS_OVERWRITE_CONTRECORD. Set it directly in CreateOverwriteContrecordRecord(). That way, AdvanceXLInsertBuffer() doesn't need the missingContrecPtr global variable. This is in preparation for splitting xlog.c into multiple files. Reviewed-by: Robert Haas Discussion: https://www.postgresql.org/message-id/a462d79c-cb5a-47cc-e9ac-616b5003965f%40iki.fi	2022-02-16 09:22:41 +02:00
Heikki Linnakangas	d231be00cb	Run pgindent on xlog.c. To tidy up after some recent refactorings in xlog.c. These would be fixed by the pgindent run we do at the end of the development cycle, but I want to clean these up now as I'm about to do some more big refactorings on xlog.c.	2022-02-16 09:22:34 +02:00
Michael Paquier	c963e84fb8	Make origin data initialization consistent other fields in 2PC header As of `1eb6d65`, the origin data is optionally stored in a 2PC file header, with the data filled in EndPrepare() even in the default case where there is no origin data to add. This was inconsistent with all the other fields of TwoPhaseFileHeader which are initialized in StartPrepare(), so move the initialization of origin_lsn and origin_timestamp there instead. The effect of missing the initialization at this early stage is only cosmetic based on the current logic of the code, but could have led to issues in the long-term, and it is more consistent done this way. Reported-by: Ranier Vilela Discussion: https://postgr.es/m/CAEudQAooECJ+gU_RZB-yhioPOV94R4ucoHAf68PiJhLpgpVpBw@mail.gmail.com	2022-02-14 09:30:35 +09:00
Michael Paquier	0147fc7c8c	Fix typo in multixact.c Introduced in `aa64f23`. Author: Nathan Bossart Discussion: https://postgr.es/m/20220209175338.GB1627503@nathanxps13	2022-02-10 10:45:14 +09:00
Robert Haas	aa64f23b02	Remove MaxBackends variable in favor of GetMaxBackends() function. Previously, it was really easy to write code that accessed MaxBackends before we'd actually initialized it, especially when coding up an extension. To make this less error-prune, introduce a new function GetMaxBackends() which should be used to obtain the correct value. This will ERROR if called too early. Demote the global variable to a file-level static, so that nobody can peak at it directly. Nathan Bossart. Idea by Andres Freund. Review by Greg Sabino Mullane, by Michael Paquier (who had doubts about the approach), and by me. Discussion: http://postgr.es/m/20210802224204.bckcikl45uezv5e4@alap3.anarazel.de	2022-02-08 15:53:19 -05:00
Robert Haas	5ef1eefd76	Allow archiving via loadable modules. Running a shell command for each file to be archived has a lot of overhead and may not offer as much error checking as you want, or the exact semantics that you want. So, offer the option to call a loadable module for each file to be archived, rather than running a shell command. Also, add a 'basic_archive' contrib module as an example implementation that archives to a local directory. Nathan Bossart, with a little bit of kibitzing by me. Discussion: http://postgr.es/m/20220202224433.GA1036711@nathanxps13	2022-02-03 14:05:02 -05:00
Alvaro Herrera	b3d7d6e462	Remove xloginsert.h from xlog.h xlog.h is directly and indirectly #included in a lot of places. With this change, xloginsert.h is no longer unnecessarily included in the large number of them that don't need it. Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://postgr.es/m/CALj2ACVe-W+WM5P44N7eG9C2_FmaeM8Dq5aCnD3fHt0Ba=WR6w@mail.gmail.com	2022-01-30 12:25:24 -03:00
Michael Paquier	741bd32933	Improve errors related to incorrect TLI on checkpoint record replay WAL replay would cause a hard crash if the timeline expected by a XLOG_END_OF_RECOVERY, a XLOG_CHECKPOINT_ONLINE, or a XLOG_CHECKPOINT_SHUTDOWN record is not the same as the timeline being replayed, using the same error message for all three of them. This commit changes those error messages to use different wordings, adapted to each record type, which is useful when it comes to the debugging of an issue in this area. Author: Amul Sul Reviewed-by: Nathan Bossart, Robert Haas Discussion: https://postgr.es/m/CAAJ_b97i1ZerYC_xW6o_AiDSW5n+sGi8k91Yc8KS8bKWKxjqwQ@mail.gmail.com	2022-01-25 13:37:19 +09:00
Michael Paquier	410aa248e5	Fix various typos, grammar and code style in comments and docs This fixes a set of issues that have accumulated over the past months (or years) in various code areas. Most fixes are related to some recent additions, as of the development of v15. Author: Justin Pryzby Discussion: https://postgr.es/m/20220124030001.GQ23027@telsasoft.com	2022-01-25 09:40:04 +09:00
Michael Paquier	237d1f3172	Fix one-off bug causing missing commit timestamps for subtransactions The logic in charge of writing commit timestamps (enabled with track_commit_timestamp) for subtransactions had a one-bug bug, where it would be possible that commit timestamps go missing for the last subtransaction committed. While on it, simplify a bit the iteration logic in the loop writing the commit timestamps, as per suggestions from Kyotaro Horiguchi and Tom Lane, so as some variable initializations are not part of the loop itself. Issue introduced in `73c986a`. Analyzed-by: Alex Kingsborough Author: Alex Kingsborough, Kyotaro Horiguchi Discussion: https://postgr.es/m/73A66172-4050-4F2A-B7F1-13508EDA2144@amazon.com Backpatch-through: 10	2022-01-21 14:54:04 +09:00
Jeff Davis	7a5f6b4748	Make logical decoding a part of the rmgr. Add a new rmgr method, rm_decode, and use that rather than a switch statement. In preparation for rmgr extensibility. Reviewed-by: Julien Rouhaud Discussion: https://postgr.es/m/ed1fb2e22d15d3563ae0eb610f7b61bb15999c0a.camel%40j-davis.com Discussion: https://postgr.es/m/20220118095332.6xtlcjoyxobv6cbk@jrouhaud	2022-01-19 14:58:49 -08:00
Bruce Momjian	27b77ecf9f	Update copyright for 2022 Backpatch-through: 10	2022-01-07 19:04:57 -05:00
Peter Eisentraut	113fa3945f	Fix incorrect format placeholders	2021-12-29 10:08:41 +01:00
Amit Kapila	8e1fae1938	Move parallel vacuum code to vacuumparallel.c. This commit moves parallel vacuum related code to a new file commands/vacuumparallel.c so that any table AM supporting indexes can utilize parallel vacuum in order to call index AM callbacks (ambulkdelete and amvacuumcleanup) with parallel workers. Another reason for this refactoring is that the parallel vacuum isn't specific to heap so it doesn't make sense to keep this code in heap/vacuumlazy.c. Author: Masahiko Sawada, based on suggestion from Andres Freund Reviewed-by: Hou Zhijie, Amit Kapila, Haiying Tang Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de	2021-12-23 11:42:52 +05:30
Thomas Munro	a13db0e164	Change ProcSendSignal() to take pgprocno. Instead of referring to target backends by pid, use pgprocno. This means that we don't have to scan the ProcArray and we can drop some special case code for dealing with the startup process. Discussion: https://postgr.es/m/CA%2BhUKGLYRyDaneEwz5Uya_OgFLMx5BgJfkQSD%3Dq9HmwsfRRb-w%40mail.gmail.com Reviewed-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com> Reviewed-by: Ashwin Agrawal <ashwinstar@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de>	2021-12-16 15:56:03 +13:00
Michael Paquier	ece8c76192	Remove assertion for replication origins in PREPARE TRANSACTION When using replication origins, pg_replication_origin_xact_setup() is an optional choice to be able to set a LSN and a timestamp to mark the origin, which would be additionally added to WAL for transaction commits or aborts (including 2PC transactions). An assertion in the code path of PREPARE TRANSACTION assumed that this data should always be set, so it would trigger when using replication origins without setting up an origin LSN. Some tests are added to cover more this kind of scenario. Oversight in commit `1eb6d65`. Per discussion with Amit Kapila and Masahiko Sawada. Discussion: https://postgr.es/m/YbbBfNSvMm5nIINV@paquier.xyz Backpatch-through: 11	2021-12-14 10:58:15 +09:00
Robert Haas	fa0e03c15a	Remove InitXLOGAccess(). It's not great that RecoveryInProgress() calls InitXLOGAccess(), because a status inquiry function typically shouldn't have the side effect of performing initializations. We could fix that by calling InitXLOGAccess() from some other place, but instead, let's remove it altogether. One thing InitXLogAccess() did is initialize wal_segment_size, but it doesn't need to do that. In the postmaster, PostmasterMain() calls LocalProcessControlFile(), and all child processes will inherit that value -- except in EXEC_BACKEND bulds, but then each backend runs SubPostmasterMain() which also calls LocalProcessControlFile(). The other thing InitXLOGAccess() did is update RedoRecPtr and doPageWrites, but that's not critical, because all code that uses them will just retry if it turns out that they've changed. The only difference is that most code will now see an initial value that is definitely invalid instead of one that might have just been way out of date, but that will only happen once per backend lifetime, so it shouldn't be a big deal. Patch by me, reviewed by Nathan Bossart, Michael Paquier, Andres Freund, Heikki Linnakangas, and Álvaro Herrera. Discussion: http://postgr.es/m/CA+TgmoY7b65qRjzHN_tWUk8B4sJqk1vj1d31uepVzmgPnZKeLg@mail.gmail.com	2021-12-13 09:58:36 -05:00
Robert Haas	64da07c41a	Default to log_checkpoints=on, log_autovacuum_min_duration=10m The idea here is that when a performance problem is known to have occurred at a certain point in time, it's a good thing if there is some information available from the logs to help figure out what might have happened around that time. This change attracted an above-average amount of dissent, because it means that a server with default settings will produce some amount of log output even if nothing has gone wrong. However, by my count, the mailing list discussion had about twice as many people in favor of the change as opposed. The reasons for believing that the extra log output is not an issue in practice are: (1) the rate at which messages can be generated by this setting is bounded to one every few minutes on a properly-configured system and (2) production systems tend to have a lot more junk in the log from that due to failed connection attempts, ERROR messages generated by application activity, and the like. Bharath Rupireddy, reviewed by Fujii Masao and by me. Many other people commented on the thread, but as far as I can see that was discussion of the merits of the change rather than review of the patch. Discussion: https://postgr.es/m/CALj2ACX-rW_OeDcp4gqrFUAkf1f50Fnh138dmkd0JkvCNQRKGA@mail.gmail.com	2021-12-13 09:48:48 -05:00
Daniel Gustafsson	018b800245	Remove mention of TimeLineID update from comments Commit `4a92a1c3d` removed the TimeLineID update from RecoveryInProgress, update comments accordingly. Author: Amul Sul <sulamul@gmail.com> Discussion: https://postgr.es/m/CAAJ_b96wyzs8N45jc-kYd-bTE02hRWQieLZRpsUtNbhap7_PuQ@mail.gmail.com	2021-12-01 14:17:24 +01:00
Alvaro Herrera	4c83e59e01	Increase size of shared memory for pg_commit_ts Like `5364b357fb` did for pg_commit, change the formula used to determine number of pg_commit_ts buffers, which helps performance with larger servers. Discussion: https://postgr.es/m/20210115220744.GA24457@alvherre.pgsql Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>	2021-11-30 14:29:31 -03:00
Michael Paquier	6fb7c5d67c	Centralize timestamp computation of control file on updates This commit moves the timestamp computation of the control file within the routine of src/common/ in charge of updating the backend's control file, which is shared by multiple frontend tools (pg_rewind, pg_checksums and pg_resetwal) and the backend itself. This change has as direct effect to update the control file's timestamp when writing the control file in pg_rewind and pg_checksums, something that is helpful to keep track of control file updates for those operations, something also tracked by the backend at startup within its logs. This part is arguably a bug, as ControlFileData->time should be updated each time a new version of the control file is written, but this is a behavior change so no backpatch is done. Author: Amul Sul Reviewed-by: Nathan Bossart, Michael Paquier, Bharath Rupireddy Discussion: https://postgr.es/m/CAAJ_b97nd_ghRpyFV9Djf9RLXkoTbOUqnocq11WGq9TisX09Fw@mail.gmail.com	2021-11-29 13:36:13 +09:00
Tom Lane	3804539e48	Replace random(), pg_erand48(), etc with a better PRNG API and algorithm. Standardize on xoroshiro128 as our basic PRNG algorithm, eliminating a bunch of platform dependencies as well as fundamentally-obsolete PRNG code. In addition, this API replacement will ease replacing the algorithm again in future, should that become necessary. xoroshiro128 is a few percent slower than the drand48 family, but it can produce full-width 64-bit random values not only 48-bit, and it should be much more trustworthy. It's likely to be noticeably faster than the platform's random(), depending on which platform you are thinking about; and we can have non-global state vectors easily, unlike with random(). It is not cryptographically strong, but neither are the functions it replaces. Fabien Coelho, reviewed by Dean Rasheed, Aleksander Alekseev, and myself Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2105241211230.165418@pseudo	2021-11-28 21:33:07 -05:00
Alvaro Herrera	44bd3ed332	Fix determination of broken LSN in OVERWRITTEN_CONTRECORD In commit `ff9f111bce` I mixed up inconsistent definitions of the LSN of the first record in a page, when the previous record ends exactly at the page boundary. The correct LSN is adjusted to skip the WAL page header; I failed to use that when setting XLogReaderState->overwrittenRecPtr, so at WAL replay time VerifyOverwriteContrecord would refuse to let replay continue past that record. Backpatch to 10. 9.6 also contains this bug, but it's no longer being maintained. Discussion: https://postgr.es/m/45597.1637694259@sss.pgh.pa.us	2021-11-26 11:14:27 -03:00
Andres Freund	3030903dfe	Replace straggling uses of ReadRecPtr/EndRecPtr. `d2ddfa681d` removed ReadRecPtr/EndRecPtr, but two uses within an #ifdef WAL_DEBUG escaped. Discussion: https://postgr.es/m/20211124231206.gbadj5bblcljb6d5@alap3.anarazel.de	2021-11-24 16:56:14 -08:00
Robert Haas	d2ddfa681d	xlog.c: Remove global variables ReadRecPtr and EndRecPtr. In most places, the variables necessarily store the same value as the eponymous members of the XLogReaderState that we use during WAL replay, because ReadRecord() assigns the values from the structure members to the global variables just after XLogReadRecord() returns. However, XLogBeginRead() adjusts the structure members but not the global variables, so after XLogBeginRead() and before the completion of XLogReadRecord() the values can differ. Otherwise, they must be identical. According to my analysis, the only place where either variable is referenced at a point where it might not have the same value as the structure member is the refrence to EndRecPtr within XLogPageRead. Therefore, at every other place where we are using the global variable, we can just switch to using the structure member instead, and remove the global variable. However, we can, and in fact should, do this in XLogPageRead() as well, because at that point in the code, the global variable will actually store the start of the record we want to read - either because it's where the last WAL record ended, or because the read position has been changed using XLogBeginRead since the last record was read. The structure member, on the other hand, will already have been updated to point to the end of the record we just read. Elsewhere, the latter is what we use as an argument to emode_for_corrupt_record(), so we should do the same here. This part of the patch is perhaps a bug fix, but I don't think it has any important consequences, so no back-patch. The point here is just to continue to whittle down the entirely excessive use of global variables in xlog.c. Discussion: http://postgr.es/m/CA+Tgmoao96EuNeSPd+hspRKcsCddu=b1h-QNRuKfY8VmfNQdfg@mail.gmail.com	2021-11-24 11:27:39 -05:00
Robert Haas	e7ea2fa342	Fix corner-case failure to detect improper timeline switch. rescanLatestTimeLine() contains a guard against switching to a timeline that forked off from the current one prior to the current recovery point, but that guard does not work if the timeline switch occurs before the first WAL recod (which must be the checkpoint record) is read. Without this patch, an improper timeline switch is therefore possible in such cases. This happens because rescanLatestTimeLine() relies on the global variable EndRecPtr to understand the current position of WAL replay. However, EndRecPtr at this point in the code contains the endpoint of the last-replayed record, not the startpoint or endpoint of the record being replayed now. Thus, before any records have been replayed, it's zero, which causes the sanity check to always pass. To fix, pass down the correct timeline explicitly. The EndRecPtr value we want is the one from the xlogreader, which will be the starting position of the record we're about to try to read, rather than the global variable, which is the ending position of the last record we successfully read. They're usually the same, but not in the corner case described here. No back-patch, because in v14 and earlier branhes, we were using the wrong TLI here as well as the wrong LSN. In master, that was fixed by commit `4a92a1c3d1`, but that and it's prerequisite patches are too invasive to back-patch for such a minor issue. Patch by me, reviewed by Amul Sul. Discussion: http://postgr.es/m/CA+Tgmoao96EuNeSPd+hspRKcsCddu=b1h-QNRuKfY8VmfNQdfg@mail.gmail.com	2021-11-24 08:13:10 -05:00
Alvaro Herrera	2fed48f48f	Be more specific about OOM in XLogReaderAllocate A couple of spots can benefit from an added errdetail(), which matches what we were already doing in other places; and those that cannot withstand errdetail() can get a more descriptive primary message. Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Discussion: https://postgr.es/m/CALj2ACV+cX1eM03GfcA=ZMLXh5fSn1X1auJLz3yuS1duPSb9QA@mail.gmail.com	2021-11-22 13:43:43 -03:00
Fujii Masao	1b06d7bac9	Report wait events for local shell commands like archive_command. This commit introduces new wait events for archive_command, archive_cleanup_command, restore_command and recovery_end_command. Author: Fujii Masao Reviewed-by: Bharath Rupireddy, Michael Paquier Discussion: https://postgr.es/m/4ca4f920-6b48-638d-08b2-93598356f5d3@oss.nttdata.com	2021-11-22 10:28:21 +09:00
Michael Paquier	f975fc3a35	Remove global variable "LastRec" in xlog.c This variable is used only by StartupXLOG() now, so let's make it local to simplify the code. Author: Amul Sul Reviewed-by: Tom Lane, Michael Paquier Discussion: https://postgr.es/m/CAAJ_b96Qd023itERBRN9Z7P2saNDT3CYvGuMO8RXwndVNN6z7g@mail.gmail.com	2021-11-17 11:04:18 +09:00
Robert Haas	e51c46991f	Move InitXLogInsert() call from InitXLOGAccess() to BaseInit(). At present, there is an undocumented coding rule that you must call RecoveryInProgress(), or do something else that results in a call to InitXLogInsert(), before trying to write WAL. Otherwise, the WAL construction buffers won't be initialized, resulting in failures. Since it's not good to rely on a status inquiry function like RecoveryInProgress() having the side effect of initializing critical data structures, instead do the initialization eariler, when the backend first starts up. Patch by me. Reviewed by Nathan Bossart and Michael Paquier. Discussion: http://postgr.es/m/CA+TgmoY7b65qRjzHN_tWUk8B4sJqk1vj1d31uepVzmgPnZKeLg@mail.gmail.com	2021-11-16 09:43:17 -05:00
Noah Misch	3354746910	Report any XLogReadRecord() error in XlogReadTwoPhaseData(). Buildfarm members kittiwake and tadarida have witnessed errors at this site. The site discarded key facts. Back-patch to v10 (all supported versions). Reviewed by Michael Paquier and Tom Lane. Discussion: https://postgr.es/m/20211107013157.GB790288@rfd.leadboat.com	2021-11-11 17:10:18 -08:00
Robert Haas	beb4e9ba16	Improve performance of pgarch_readyXlog() with many status files. Presently, the archive_status directory was scanned for each file to archive. When there are many status files, say because archive_command has been failing for a long time, these directory scans can get very slow. With this change, the archiver remembers several files to archive during each directory scan, speeding things up. To ensure timeline history files are archived as quickly as possible, XLogArchiveNotify() forces the archiver to do a new directory scan as soon as the .ready file for one is created. Nathan Bossart, per a long discussion involving many people. It is not clear to me exactly who out of all those people reviewed this particular patch. Discussion: http://postgr.es/m/CA+TgmobhAbs2yabTuTRkJTq_kkC80-+jw=pfpypdOJ7+gAbQbw@mail.gmail.com Discussion: http://postgr.es/m/620F3CE1-0255-4D66-9D87-0EADE866985A@amazon.com	2021-11-11 15:20:26 -05:00
Robert Haas	a27048cbcb	More cleanup of 'ThisTimeLineID'. In XLogCtlData, rename the structure member ThisTimeLineID to InsertTimeLineID and update the comments to make clear that it's only expected to be set after recovery is complete. In StartupXLOG, replace the local variables ThisTimeLineID and PrevTimeLineID with new local variables replayTLI and newTLI. In the old scheme, ThisTimeLineID was the replay TLI until we created a new timeline, and after that the replay TLI was in PrevTimeLineID. Now, replayTLI is the TLI from which we last replayed WAL throughout the entire function, and newTLI is either that, or the new timeline created upon promotion. Remove some misleading comments from the comment block just above where recoveryTargetTimeLineGoal and friends are declared. It's become incorrect, not only because ThisTimeLineID as a variable is now gone, but also because the rmgr code does not care about ThisTimeLineID and has not since what used to be the TLI field in the page header was repurposed to store the page checksum. Add a comment GetFlushRecPtr that it's only supposed to be used in normal running, and an assertion to verify that this is so. Per some ideas from Michael Paquier and some of my own. Review by Michael Paquier also. Discussion: http://postgr.es/m/CA+TgmoY1a2d1AnVR3tJcKmGGkhj7GGrwiNwjtKr21dxOuLBzCQ@mail.gmail.com	2021-11-10 09:45:24 -05:00
Tom Lane	c3ec4f8fe8	Silence uninitialized-variable warning. Quite a few buildfarm animals are warning about this, and lapwing is actually failing (because -Werror). It's a false positive AFAICS, so no need to do more than zero the variable to start with. Discussion: https://postgr.es/m/YYXJnUxgw9dZKxlX@paquier.xyz	2021-11-07 12:18:18 -05:00
Robert Haas	4a92a1c3d1	Change ThisTimeLineID from a global variable to a local variable. StartupXLOG() still has ThisTimeLineID as a local variable, but the remaining code in xlog.c now needs to the relevant TimeLineID by some other means. Mostly, this means that we now pass it as a function parameter to a bunch of functions where we didn't previously. However, a few cases require special handling: - In functions that might be called by outside callers who wouldn't necessarily know what timeline to specify, we get the timeline ID from shared memory. XLogCtl->ThisTimeLineID can be used in most cases since recovery is known to have completed by the time those functions are called. In xlog_redo(), we can use XLogCtl->replayEndTLI. - XLogFileClose() needs to know the TLI of the open logfile. Do that with a new global variable openLogTLI. While someone could argue that this is just trading one global variable for another, the new one has a far more narrow purposes and is referenced in just a few places. - read_backup_label() now returns the TLI that it obtains by parsing the backup_label file. Previously, ReadRecord() could be called to parse the checkpoint record without ThisTimeLineID having been initialized. Now, the timeline is passed down, and I didn't want to pass an uninitialized variable; this change lets us avoid that. The old coding didn't seem to have any practical consequences that we need to worry about, but this is cleaner. - In BootstrapXLOG(), it's just a constant. Patch by me, reviewed and tested by Michael Paquier, Amul Sul, and Álvaro Herrera. Discussion: https://postgr.es/m/CA+TgmobfAAqhfWa1kaFBBFvX+5CjM=7TE=n4r4Q1o2bjbGYBpA@mail.gmail.com	2021-11-05 12:53:15 -04:00
Robert Haas	e997a0c642	Remove all use of ThisTimeLineID global variable outside of xlog.c All such code deals with this global variable in one of three ways. Sometimes the same functions use it in more than one of these ways at the same time. First, sometimes it's an implicit argument to one or more functions being called in xlog.c or elsewhere, and must be set to the appropriate value before calling those functions lest they misbehave. In those cases, it is now passed as an explicit argument instead. Second, sometimes it's used to obtain the current timeline after the end of recovery, i.e. the timeline to which WAL is being written and flushed. Such code now calls GetWALInsertionTimeLine() or relies on the new out parameter added to GetFlushRecPtr(). Third, sometimes it's used during recovery to store the current replay timeline. That can change, so such code must generally update the value before each use. It can still do that, but must now use a local variable instead. The net effect of these changes is to reduce by a fair amount the amount of code that is directly accessing this global variable. That's good, because history has shown that we don't always think clearly about which timeline ID it's supposed to contain at any given point in time, or indeed, whether it has been or needs to be initialized at any given point in the code. Patch by me, reviewed and tested by Michael Paquier, Amul Sul, and Álvaro Herrera. Discussion: https://postgr.es/m/CA+TgmobfAAqhfWa1kaFBBFvX+5CjM=7TE=n4r4Q1o2bjbGYBpA@mail.gmail.com	2021-11-05 12:50:01 -04:00
Amit Kapila	335397456b	Move MarkCurrentTransactionIdLoggedIfAny() out of the critical section. We don't modify any shared state in this function which could cause problems for any concurrent session. This will make it look similar to the other updates for the same structure (TransactionState) which avoids confusion for future readers of code. Author: Dilip Kumar Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/E1mSoYz-0007Fh-D9@gemulon.postgresql.org	2021-11-02 09:11:05 +05:30
Amit Kapila	71db6459e6	Replace XLOG_INCLUDE_XID flag with a more localized flag. Commit `0bead9af48` introduced XLOG_INCLUDE_XID flag to indicate that the WAL record contains subXID-to-topXID association. It uses that flag later to mark in CurrentTransactionState that top-xid is logged so that we should not try to log it again with the next WAL record in the current subtransaction. However, we can use a localized variable to pass that information. In passing, change the related function and variable names to make them consistent with what the code is actually doing. Author: Dilip Kumar Reviewed-by: Alvaro Herrera, Amit Kapila Discussion: https://postgr.es/m/E1mSoYz-0007Fh-D9@gemulon.postgresql.org	2021-11-02 08:35:29 +05:30
Daniel Gustafsson	8af57ad815	Fix typos in comments Author: Peter Smith <smithpb2250@gmail.com> Discussion: https://postgr.es/m/CAHut+PsN_gmKu-KfeEb9NDARoTPbs4AN4PPu=6LZXFZRJ13SEw@mail.gmail.com	2021-10-27 22:38:38 +02:00
Robert Haas	a030a0c5cc	Initialize variable to placate compiler. Per Nathan Bossart. Discussion: http://postgr.es/m/FECEE7FC-CB74-45A9-BB24-89FEE52A9585@amazon.com	2021-10-25 16:31:00 -04:00
Robert Haas	9ce346eabf	Report progress of startup operations that take a long time. Users sometimes get concerned whe they start the server and it emits a few messages and then doesn't emit any more messages for a long time. Generally, what's happening is either that the system is taking a long time to apply WAL, or it's taking a long time to reset unlogged relations, or it's taking a long time to fsync the data directory, but it's not easy to tell which is the case. To fix that, add a new 'log_startup_progress_interval' setting, by default 10s. When an operation that is known to be potentially long-running takes more than this amount of time, we'll log a status update each time this interval elapses. To avoid undesirable log chatter, don't log anything about WAL replay when in standby mode. Nitin Jadhav and Robert Haas, reviewed by Amul Sul, Bharath Rupireddy, Justin Pryzby, Michael Paquier, and Álvaro Herrera. Discussion: https://postgr.es/m/CA+TgmoaHQrgDFOBwgY16XCoMtXxsrVGFB2jNCvb7-ubuEe1MGg@mail.gmail.com Discussion: https://postgr.es/m/CAMm1aWaHF7VE69572_OLQ+MgpT5RUiUDgF1x5RrtkJBLdpRj3Q@mail.gmail.com	2021-10-25 11:51:57 -04:00
Robert Haas	18e0913a42	StartupXLOG: Don't repeatedly disable/enable local xlog insertion. All the code that runs in the startup process to write WAL records before that's allowed generally is now consecutive, so there's no reason to shut the facility to write WAL locally off and then turn it on again three times in a row. Unfortunately, this requires a slight kludge in the checkpointer, which needs to separately enable writing WAL in order to write the checkpoint record. Because that code might run in the same process as StartupXLOG() if we are in single-user mode, we must save/restore the state of the LocalXLogInsertAllowed flag. Hopefully, we'll be able to eliminate this wart in further refactoring, but it's not too bad anyway. Amul Sul, with modifications by me. Discussion: http://postgr.es/m/CAAJ_b97fysj6sRSQEfOHj-y8Jfd5uPqOgO74qast89B4WfD+TA@mail.gmail.com	2021-10-25 10:16:28 -04:00
Robert Haas	a75dbf7f9e	StartupXLOG: Call CleanupAfterArchiveRecovery after XLogReportParameters. This does a better job grouping related operations together, since all of the WAL records that we need to write prior to allowing WAL writes generally and written by a single uninterrupted stretch of code. Since CleanupAfterArchiveRecovery() just (1) runs recovery_end_command, (2) removes non-parent xlog files, and (3) archives any final partial segment, this should be safe, because all of those things are pretty much unrelated to the WAL record written by XLogReportParameters(). Amul Sul, per a suggestion from me Discussion: http://postgr.es/m/CAAJ_b97fysj6sRSQEfOHj-y8Jfd5uPqOgO74qast89B4WfD+TA@mail.gmail.com	2021-10-25 10:02:36 -04:00
Noah Misch	3cd9c3b921	Fix CREATE INDEX CONCURRENTLY for the newest prepared transactions. The purpose of commit `8a54e12a38` was to fix this, and it sufficed when the PREPARE TRANSACTION completed before the CIC looked for lock conflicts. Otherwise, things still broke. As before, in a cluster having used CIC while having enabled prepared transactions, queries that use the resulting index can silently fail to find rows. It may be necessary to reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices. Fix this for future index builds by making CIC wait for arbitrarily-recent prepared transactions and for ordinary transactions that may yet PREPARE TRANSACTION. As part of that, have PREPARE TRANSACTION transfer locks to its dummy PGPROC before it calls ProcArrayClearTransaction(). Back-patch to 9.6 (all supported versions). Andrey Borodin, reviewed (in earlier versions) by Andres Freund. Discussion: https://postgr.es/m/01824242-AA92-4FE9-9BA7-AEBAFFEA3D0C@yandex-team.ru	2021-10-23 18:36:38 -07:00
Michael Paquier	409f9ca447	Reset properly snapshot export state during transaction abort During a replication slot creation, an ERROR generated in the same transaction as the one creating a to-be-exported snapshot would have left the backend in an inconsistent state, as the associated static export snapshot state was not being reset on transaction abort, but only on the follow-up command received by the WAL sender that created this snapshot on replication slot creation. This would trigger inconsistency failures if this session tried to export again a snapshot, like during the creation of a replication slot. Note that a snapshot export cannot happen in a transaction block, so there is no need to worry resetting this state for subtransaction aborts. Also, this inconsistent state would very unlikely show up to users. For example, one case where this could happen is an out-of-memory error when building the initial snapshot to-be-exported. Dilip found this problem while poking at a different patch, that caused an error in this code path for reasons unrelated to HEAD. Author: Dilip Kumar Reviewed-by: Michael Paquier, Zhihong Yu Discussion: https://postgr.es/m/CAFiTN-s0zA1Kj0ozGHwkYkHwa5U0zUE94RSc_g81WrpcETB5=w@mail.gmail.com Backpatch-through: 9.6	2021-10-18 11:55:42 +09:00
Robert Haas	811051c2e7	Postpone some end-of-recovery operations related to allowing WAL. CreateOverwriteContrecordRecord(), UpdateFullPageWrites(), PerformRecoveryXLogAction(), and CleanupAfterArchiveRecovery() are moved somewhat later in StartupXLOG(). This is preparatory work for a future patch that wants to allow recovery to end at one time and only later start to allow WAL writes. To do that, it's necessary to separate code that has to do with allowing WAL writes from other things that need to happen simply because recovery is ending, such as initializing shared memory data structures that depend on information that might not be accurate before redo is complete. This commit does not achieve that goal, but it is a step in that direction. For example, there are a few different bits of code that write things into WAL once we have finished recovery, and with this change, those bits of code are closer to each other than previously, with fewer unrelated bits of code interspersed. Robert Haas and Amul Sul Discussion: http://postgr.es/m/CAAJ_b97abMuq=470Wahun=aS1PHTSbStHtrjjPaD-C0YQ1AqVw@mail.gmail.com	2021-10-14 11:55:50 -04:00
Robert Haas	6df1543abf	Refactor some end-of-recovery code out of StartupXLOG(). Create a new function PerformRecoveryXLogAction() and move the code which either writes an end-of-recovery record or requests a checkpoint there. Also create a new function CleanupAfterArchiveRecovery() to perform a few tasks that we want to do after we've actually exited archive recovery but before we start accepting new WAL writes. More refactoring of this file is planned, but this commit is just straightforward code movement to make StartupXLOG() a little bit shorter and a little bit easier to understand. Robert Haas and Amul Sul Discussion: http://postgr.es/m/CAAJ_b97abMuq=470Wahun=aS1PHTSbStHtrjjPaD-C0YQ1AqVw@mail.gmail.com	2021-10-13 12:23:32 -04:00
Fujii Masao	68601985e6	Make recovery report error message when invalid page header is found. Commit `0668719801` changed XLogPageRead() so that it validated the page header, if invalid page header was found reset the error message and retried reading the page, to fix the scenario where streaming standby got stuck at a continuation record. This change hid the error message about invalid page header, which would make it harder for users to investigate what the actual issue was found in WAL. To fix the issue, this commit makes XLogPageRead() report the error message when invalid page header is found. When not in standby mode, an invalid page header should cause recovery to end, not retry reading the page, so XLogPageRead() doesn't need to validate the page header for the retry. Instead, ReadPageInternal() should be responsible for the validation in that case. Therefore this commit changes XLogPageRead() so that if not in standby mode it doesn't validate the page header for the retry. Reported-by: Yugo Nagata Author: Yugo Nagata, Kyotaro Horiguchi Reviewed-by: Ranier Vilela, Fujii Masao Discussion: https://postgr.es/m/20210718045505.32f463ed6c227111038d8ae4@sraoss.co.jp	2021-10-06 00:16:03 +09:00
Daniel Gustafsson	7111e332c5	Fix duplicate words in comments Remove accidentally duplicated words in code comments. Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://postgr.es/m/87bl45t0co.fsf@wibble.ilmari.org	2021-10-04 15:12:57 +02:00
Michael Paquier	8a4237908c	Fix snapshot builds during promotion of hot standby node with 2PC Some specific logic is done at the end of recovery when involving 2PC transactions: 1) Call RecoverPreparedTransactions(), to recover the state of 2PC transactions into memory (re-acquire locks, etc.). 2) ShutdownRecoveryTransactionEnvironment(), to move back to normal operations, mainly cleaning up recovery locks and KnownAssignedXids (including any 2PC transaction tracked previously). 3) Switch XLogCtl->SharedRecoveryState to RECOVERY_STATE_DONE, which is the tipping point for any process calling RecoveryInProgress() to check if the cluster is still in recovery or not. Any snapshot taken between steps 2) and 3) would be empty, causing any transaction relying on a snapshot at this point to potentially corrupt data as there could still be some 2PC transactions to track, with RecentXmin moving backwards on successive calls to GetSnapshotData() in the same transaction. As SharedRecoveryState is the point to take into account to know if it is safe to discard KnownAssignedXids, this commit moves step 2) after step 3), so as we can never finish with empty snapshots. This exists since the introduction of hot standby, so backpatch all the way down. The window with incorrect snapshots is extremely small, but I have seen it when running 023_pitr_prepared_xact.pl, as did buildfarm member fairywren. Thomas Munro also found it independently. Special thanks to Andres Freund for taking the time to analyze this issue. Reported-by: Thomas Munro, Michael Paquier Analyzed-by: Andres Freund Discussion: https://postgr.es/m/20210422203603.fdnh3fu2mmfp2iov@alap3.anarazel.de Backpatch-through: 9.6	2021-10-04 14:05:20 +09:00
Alvaro Herrera	d186d233df	Remove unstable, unnecessary test; fix typo Commit `ff9f111bce` added some test code that's unportable and doesn't add meaningful coverage. Remove it rather than try and get it to work everywhere. While at it, fix a typo in a log message added by the aforementioned commit. Backpatch to 14. Discussion: https://postgr.es/m/3000074.1632947632@sss.pgh.pa.us	2021-10-01 18:03:11 -03:00
Tom Lane	7b5d4c29ed	Fix Portal snapshot tracking to handle subtransactions properly. Commit `84f5c2908` forgot to consider the possibility that EnsurePortalSnapshotExists could run inside a subtransaction with lifespan shorter than the Portal's. In that case, the new active snapshot would be popped at the end of the subtransaction, leaving a dangling pointer in the Portal, with mayhem ensuing. To fix, make sure the ActiveSnapshot stack entry is marked with the same subtransaction nesting level as the associated Portal. It's certainly safe to do so since we won't be here at all unless the stack is empty; hence we can't create an out-of-order stack. Let's also apply this logic in the case where PortalRunUtility sets portalSnapshot, just to be sure that path can't cause similar problems. It's slightly less clear that that path can't create an out-of-order stack, so add an assertion guarding it. Report and patch by Bertrand Drouvot (with kibitzing by me). Back-patch to v11, like the previous commit. Discussion: https://postgr.es/m/ff82b8c5-77f4-3fe7-6028-fcf3303e82dd@amazon.com	2021-10-01 11:10:12 -04:00
Alvaro Herrera	ff9f111bce	Fix WAL replay in presence of an incomplete record Physical replication always ships WAL segment files to replicas once they are complete. This is a problem if one WAL record is split across a segment boundary and the primary server crashes before writing down the segment with the next portion of the WAL record: WAL writing after crash recovery would happily resume at the point where the broken record started, overwriting that record ... but any standby or backup may have already received a copy of that segment, and they are not rewinding. This causes standbys to stop following the primary after the latter crashes: LOG: invalid contrecord length 7262 at A8/D9FFFBC8 because the standby is still trying to read the continuation record (contrecord) for the original long WAL record, but it is not there and it will never be. A workaround is to stop the replica, delete the WAL file, and restart it -- at which point a fresh copy is brought over from the primary. But that's pretty labor intensive, and I bet many users would just give up and re-clone the standby instead. A fix for this problem was already attempted in commit `515e3d84a0`, but it only addressed the case for the scenario of WAL archiving, so streaming replication would still be a problem (as well as other things such as taking a filesystem-level backup while the server is down after having crashed), and it had performance scalability problems too; so it had to be reverted. This commit fixes the problem using an approach suggested by Andres Freund, whereby the initial portion(s) of the split-up WAL record are kept, and a special type of WAL record is written where the contrecord was lost, so that WAL replay in the replica knows to skip the broken parts. With this approach, we can continue to stream/archive segment files as soon as they are complete, and replay of the broken records will proceed across the crash point without a hitch. Because a new type of WAL record is added, users should be careful to upgrade standbys first, primaries later. Otherwise they risk the standby being unable to start if the primary happens to write such a record. A new TAP test that exercises this is added, but the portability of it is yet to be seen. This has been wrong since the introduction of physical replication, so backpatch all the way back. In stable branches, keep the new XLogReaderState members at the end of the struct, to avoid an ABI break. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Nathan Bossart <bossartn@amazon.com> Discussion: https://postgr.es/m/202108232252.dh7uxf6oxwcy@alvherre.pgsql	2021-09-29 11:21:51 -03:00
Alvaro Herrera	ade24dab97	Document XLOG_INCLUDE_XID a little better I noticed that commit `0bead9af48` left this flag undocumented in XLogSetRecordFlags, which led me to discover that the flag doesn't actually do what the one comment on it said it does. Improve the situation by adding some more comments. Backpatch to 14, where the aforementioned commit appears. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/202109212119.c3nhfp64t2ql@alvherre.pgsql	2021-09-21 19:47:53 -03:00
Tom Lane	2e4eae87d0	Send NOTIFY signals during CommitTransaction. Formerly, we sent signals for outgoing NOTIFY messages within ProcessCompletedNotifies, which was also responsible for sending relevant ones of those messages to our connected client. It therefore had to run during the main-loop processing that occurs just before going idle. This arrangement had two big disadvantages: * Now that procedures allow intra-command COMMITs, it would be useful to send NOTIFYs to other sessions immediately at COMMIT (though, for reasons of wire-protocol stability, we still shouldn't forward them to our client until end of command). * Background processes such as replication workers would not send NOTIFYs at all, since they never execute the client communication loop. We've had requests to allow triggers running in replication workers to send NOTIFYs, so that's a problem. To fix these things, move transmission of outgoing NOTIFY signals into AtCommit_Notify, where it will happen during CommitTransaction. Also move the possible call of asyncQueueAdvanceTail there, to ensure we don't bloat the async SLRU if a background worker sends many NOTIFYs with no one listening. We can also drop the call of asyncQueueReadAllNotifications, allowing ProcessCompletedNotifies to go away entirely. That's because commit `790026972` added a call of ProcessNotifyInterrupt adjacent to PostgresMain's call of ProcessCompletedNotifies, and that does its own call of asyncQueueReadAllNotifications, meaning that we were uselessly doing two such calls (inside two separate transactions) whenever inbound notify signals coincided with an outbound notify. We need only set notifyInterruptPending to ensure that ProcessNotifyInterrupt runs, and we're done. The existing documentation suggests that custom background workers should call ProcessCompletedNotifies if they want to send NOTIFY messages. To avoid an ABI break in the back branches, reduce it to an empty routine rather than removing it entirely. Removal will occur in v15. Although the problems mentioned above have existed for awhile, I don't feel comfortable back-patching this any further than v13. There was quite a bit of churn in adjacent code between 12 and 13. At minimum we'd have to also backpatch `51004c717`, and a good deal of other adjustment would also be needed, so the benefit-to-risk ratio doesn't look attractive. Per bug #15293 from Michael Powers (and similar gripes from others). Artur Zakirov and Tom Lane Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org	2021-09-14 17:18:25 -04:00
Alvaro Herrera	96b665083e	Revert "Avoid creating archive status ".ready" files too early" This reverts commit `515e3d84a0` and equivalent commits in back branches. This solution to the problem has a number of problems, so we'll try again with a different approach. Per note from Andres Freund Discussion: https://postgr.es/m/20210831042949.52eqp5xwbxgrfank@alap3.anarazel.de	2021-09-04 12:14:30 -04:00
Robert Haas	a780b2fcce	Fix broken snapshot handling in parallel workers. Pengchengliu reported an assertion failure in a parallel woker while performing a parallel scan using an overflowed snapshot. The proximate cause is that TransactionXmin was set to an incorrect value. The underlying cause is incorrect snapshot handling in parallel.c. In particular, InitializeParallelDSM() was unconditionally calling GetTransactionSnapshot(), because I (rhaas) mistakenly thought that was always retrieving an existing snapshot whereas, at isolation levels less than REPEATABLE READ, it's actually taking a new one. So instead do this only at higher isolation levels where there actually is a single snapshot for the whole transaction. By itself, this is not a sufficient fix, because we still need to guarantee that TransactionXmin gets set properly in the workers. The easiest way to do that seems to be to install the leader's active snapshot as the transaction snapshot if the leader did not serialize a transaction snapshot. This doesn't affect the results of future GetTrasnactionSnapshot() calls since those have to take a new snapshot anyway; what we care about is the side effect of setting TransactionXmin. Report by Pengchengliu. Patch by Greg Nancarrow, except for some comment text which I supplied. Discussion: https://postgr.es/m/002f01d748ac$eaa781a0$bff684e0$@tju.edu.cn	2021-08-25 08:32:04 -04:00
Alvaro Herrera	515e3d84a0	Avoid creating archive status ".ready" files too early WAL records may span multiple segments, but XLogWrite() does not wait for the entire record to be written out to disk before creating archive status files. Instead, as soon as the last WAL page of the segment is written, the archive status file is created, and the archiver may process it. If PostgreSQL crashes before it is able to write and flush the rest of the record (in the next WAL segment), the wrong version of the first segment file lingers in the archive, which causes operations such as point-in-time restores to fail. To fix this, keep track of records that span across segments and ensure that segments are only marked ready-for-archival once such records have been completely written to disk. This has always been wrong, so backpatch all the way back. Author: Nathan Bossart <bossartn@amazon.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Ryo Matsumura <matsumura.ryo@fujitsu.com> Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru> Discussion: https://postgr.es/m/CBDDFA01-6E40-46BB-9F98-9340F4379505@amazon.com	2021-08-23 15:50:35 -04:00
Michael Paquier	e4ba1005c0	Refresh apply delay on reload of recovery_min_apply_delay at recovery This commit ensures that the wait interval in the replay delay loop waiting for an amount of time defined by recovery_min_apply_delay is correctly handled on reload, recalculating the delay if this GUC value is updated, based on the timestamp of the commit record being replayed. The previous behavior would be problematic for example with replay still waiting even if the delay got reduced or just cancelled. If the apply delay was increased to a larger value, the wait would have just respected the old value set, finishing earlier. Author: Soumyadeep Chakraborty, Ashwin Agrawal Reviewed-by: Kyotaro Horiguchi, Michael Paquier Discussion: https://postgr.es/m/CAE-ML+93zfr-HLN8OuxF0BjpWJ17O5dv1eMvSE5jsj9jpnAXZA@mail.gmail.com Backpatch-through: 9.6	2021-08-16 12:10:22 +09:00
Michael Paquier	710796f054	Avoid unnecessary shared invalidations in ROLLBACK PREPARED The performance gain is minimal, but this makes the logic more consistent with AtEOXact_Inval(). No other invalidation is needed in this case as PREPARE takes already care of sending any local ones. Author: Liu Huailing Reviewed-by: Tom Lane, Michael Paquier Discussion: https://postgr.es/m/OSZPR01MB6215AA84D71EF2B3D354CF86BE139@OSZPR01MB6215.jpnprd01.prod.outlook.com	2021-08-12 20:12:47 +09:00
Peter Eisentraut	ae03a7c739	Remove some unnecessary casts in format arguments We can use %zd or %zu directly, no need to cast to int. Conversely, some code was casting away from int when it could be using %d directly.	2021-08-08 22:08:07 +02:00
Andres Freund	fa91d4c91f	Make parallel worker shutdown complete entirely via before_shmem_exit(). This is a step toward storing stats in dynamic shared memory. As dynamic shared memory segments are detached from just after before_shmem_exit() callbacks are processed, but before on_shmem_exit() callbacks are, no stats can be collected after before_shmem_exit() callbacks have been processed. Parallel worker shutdown can cause stats to be emitted during DSM detach callbacks, e.g. for SharedFileSet (which closes its files, which can causes fd.c to emit stats about temporary files). Therefore parallel worker shutdown needs to complete during the processing of before_shmem_exit callbacks. One might think this problem could instead be solved by carefully ordering the attaching to DSM segments, so that the pgstats segments get detached from later than the parallel query ones. That turns out to not work because the stats hash might need to grow which can cause new segments to be allocated, which then will be detached from earlier. There are two code changes: First, call ParallelWorkerShutdown() via before_shmem_exit. That's a good idea on its own, because other shutdown callbacks like ShutdownPostgres and ShutdownAuxiliaryProcess are called via before_*. Second, explicitly detach from the parallel query DSM segment, thereby ensuring all stats are emitted during ParallelWorkerShutdown(). There are nicer solutions to these problems, but it's not obvious which of those solutions is the correct one. As the shared memory stats work already is a huge amount of work... Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20210405092914.mmxqe7j56lsjfsej@alap3.anarazel.de Discussion: https://postgr.es/m/20210803023612.iziacxk5syn2r4ut@alap3.anarazel.de	2021-08-06 19:08:56 -07:00
Andres Freund	1bc8e7b099	pgstat: split reporting/fetching of bgwriter and checkpointer stats. These have been unrelated since bgwriter and checkpointer were split into two processes in `806a2aee37`. As there several pending patches (shared memory stats, extending the set of tracked IO / buffer statistics) that are made a bit more awkward by the grouping, split them. Done separately to make reviewing easier. This does not change the contents of pg_stat_bgwriter or move fields out of bgwriter/checkpointer stats that arguably do not belong in either. However pgstat_fetch_global() was renamed and split into pgstat_fetch_stat_checkpointer() and pgstat_fetch_stat_bgwriter(). Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20210405092914.mmxqe7j56lsjfsej@alap3.anarazel.de	2021-08-04 19:16:04 -07:00
Thomas Munro	8f7c8e2bef	Further simplify a bit of logic in StartupXLOG(). Commit `7ff23c6d27` left us with two identical cases. Collapse them. Author: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGJ8NRsqgkZEnsnRc2MFROBV-jCnacbYvtpptK2A9YYp9Q%40mail.gmail.com	2021-08-03 14:16:58 +12:00
Thomas Munro	7ff23c6d27	Run checkpointer and bgwriter in crash recovery. Start up the checkpointer and bgwriter during crash recovery (except in --single mode), as we do for replication. This wasn't done back in commit `cdd46c76` out of caution. Now it seems like a better idea to make the environment as similar as possible in both cases. There may also be some performance advantages. Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com> Discussion: https://postgr.es/m/CA%2BhUKGJ8NRsqgkZEnsnRc2MFROBV-jCnacbYvtpptK2A9YYp9Q%40mail.gmail.com	2021-08-02 17:32:44 +12:00
Heikki Linnakangas	317632f307	Move InRecovery and standbyState global vars to xlogutils.c. They are used in code that runs both during normal operation and during WAL replay, and needs to behave differently during replay. Move them to xlogutils.c, because that's where we have other helper functions used by redo routines. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/b3b71061-4919-e882-4857-27e370ab134a%40iki.fi	2021-07-31 09:50:26 +03:00
Heikki Linnakangas	4fe8dcdff3	Extract code to describe recovery stop reason to a function. StartupXLOG() is very long, this makes it a little bit more readable. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/b3b71061-4919-e882-4857-27e370ab134a%40iki.fi	2021-07-31 09:49:30 +03:00
Heikki Linnakangas	6b16532811	Remove unnecessary 'restoredFromArchive' global variable. It might've been useful for debugging purposes, but meh. There's 'readSource' which does almost the same thing. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/b3b71061-4919-e882-4857-27e370ab134a%40iki.fi	2021-07-31 09:38:32 +03:00
Heikki Linnakangas	e9f5a0681c	Don't use O_SYNC or similar when opening signal file to fsync it. No need to use get_sync_bit() when we're calling pg_fsync() on the file. We're not writing to the files, so it doesn't make any difference in practice, but seems less surprising this way. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/b3b71061-4919-e882-4857-27e370ab134a%40iki.fi	2021-07-31 09:36:11 +03:00
Robert Haas	1d919de5eb	Remove unnecessary call to ReadCheckpointRecord(). It should always be the case that the last checkpoint record is still readable, because otherwise, a crash would leave us in a situation from which we can't recover. Therefore the test removed by this patch should always succeed. For it to fail, either there has to be a serious bug in the code someplace, or the user has to be manually modifying pg_wal while crash recovery is running. If it's the first one, we should fix the bug. If it's the second one, they should stop, or anyway they're doing so at their own risk. In neither case does a full checkpoint instead of an end-of-recovery record seem like a clear winner. Furthermore, rarely-taken code paths are particularly vulnerable to bugs, so let's simplify by getting rid of this one. Discussion: http://postgr.es/m/CA+TgmoYmw==TOJ6EzYb_vcjyS09NkzrVKSyBKUUyo1zBEaJASA@mail.gmail.com	2021-07-30 08:35:13 -04:00
Heikki Linnakangas	df9f0c716c	Update obsolete comment that still referred to CheckpointLock CheckpointLock was removed in commit `d18e75664a`, and commit `ce197e91d0` updated a leftover comment in CreateCheckPoint, but there was another copy of it in CreateRestartPoint still.	2021-07-30 12:52:44 +03:00

1 2 3 4 5 ...

2108 Commits