postgresql

Commit Graph

Author	SHA1	Message	Date
Heikki Linnakangas	b31ba5310b	Rename ShmemVariableCache to TransamVariables The old name was misleading: It's not a cache, the values kept in the struct are the authoritative source. Reviewed-by: Tristan Partin, Richard Guo Discussion: https://www.postgresql.org/message-id/6537d63d-4bb5-46f8-9b5d-73a8ba4720ab@iki.fi	2023-12-08 09:47:15 +02:00
Heikki Linnakangas	15916ffb04	Initialize ShmemVariableCache like other shmem areas For sake of consistency. Reviewed-by: Tristan Partin, Richard Guo Discussion: https://www.postgresql.org/message-id/6537d63d-4bb5-46f8-9b5d-73a8ba4720ab@iki.fi	2023-12-08 09:46:59 +02:00
Heikki Linnakangas	049ef3398d	Don't try to open visibilitymap when analyzing a foreign table It's harmless, visibilitymap_count() returns 0 if the file doesn't exist. But it's also very pointless. I noticed this when I added an assertion in smgropen() that the relnumber is valid. Discussion: https://www.postgresql.org/message-id/621a52fd-3cd8-4f5d-a561-d510b853bbaf@iki.fi	2023-12-08 09:16:21 +02:00
Heikki Linnakangas	fd5e8b440d	Refactor how InitProcess is called The order of process initialization steps is now more consistent between !EXEC_BACKEND and EXEC_BACKEND modes. InitProcess() is called at the same place in either mode. We can now also move the AttachSharedMemoryStructs() call into InitProcess() itself. This reduces the number of "#ifdef EXEC_BACKEND" blocks. Reviewed-by: Tristan Partin, Andres Freund, Alexander Lakhin Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi	2023-12-03 16:39:18 +02:00
Heikki Linnakangas	69d903367c	Refactor CreateSharedMemoryAndSemaphores For clarity, have separate functions for creating the shared memory and semaphores at postmaster or single-user backend startup, and for attaching to existing shared memory structures in EXEC_BACKEND case. CreateSharedMemoryAndSemaphores() is now called only at postmaster startup, and a new AttachSharedMemoryStructs() function is called at backend startup in EXEC_BACKEND mode. Reviewed-by: Tristan Partin, Andres Freund Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi	2023-12-03 16:09:42 +02:00
Heikki Linnakangas	f93133a250	Print lwlock stats also for aux processes, when built with LWLOCK_STATS InitAuxiliaryProcess() closely resembles InitProcess(), but it didn't call InitLWLockAccess(). But because InitLWLockAccess() is a no-op unless compiled with LWLOCK_STATS, and everything works even if it's not called, the only consequence was that the stats were not printed for aux processes. This was an oversight in commit `1c6821be31`, in version 9.5, so it is missing in all supported branches. But since it only affects developers using LWLOCK_STATS and no one has complained, no backpatching. Discussion: https://www.postgresql.org/message-id/20231130202648.7k6agmuizdilufnv@awork3.anarazel.de	2023-12-01 01:00:03 +02:00
Michael Paquier	8d9978a717	Apply quotes more consistently to GUC names in logs Quotes are applied to GUCs in a very inconsistent way across the code base, with a mix of double quotes or no quotes used. This commit removes double quotes around all the GUC names that are obviously referred to as parameters with non-English words (use of underscore, mixed case, etc). This is the result of a discussion with Álvaro Herrera, Nathan Bossart, Laurenz Albe, Peter Eisentraut, Tom Lane and Daniel Gustafsson. Author: Peter Smith Discussion: https://postgr.es/m/CAHut+Pv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w@mail.gmail.com	2023-11-30 14:11:45 +09:00
Alexander Korotkov	4ed8f0913b	Index SLRUs by 64-bit integers rather than by 32-bit integers We've had repeated bugs in the area of handling SLRU wraparound in the past, some of which have caused data loss. Switching to an indexing system for SLRUs that does not wrap around should allow us to get rid of a whole bunch of problems and improve the overall reliability of the system. This particular patch however only changes the indexing and doesn't address the wraparound per se. This is going to be done in the following patches. Author: Maxim Orlov, Aleksander Alekseev, Alexander Korotkov, Teodor Sigaev Author: Nikita Glukhov, Pavel Borisov, Yura Sokolov Reviewed-by: Jacob Champion, Heikki Linnakangas, Alexander Korotkov Reviewed-by: Japin Li, Pavel Borisov, Tom Lane, Peter Eisentraut, Andres Freund Reviewed-by: Andrey Borodin, Dilip Kumar, Aleksander Alekseev Discussion: https://postgr.es/m/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com Discussion: https://postgr.es/m/CAJ7c6TPDOYBYrnCAeyndkBktO0WG2xSdYduTF0nxq%2BvfkmTF5Q%40mail.gmail.com	2023-11-29 01:40:56 +02:00
Heikki Linnakangas	50c67c2019	Use ResourceOwner to track WaitEventSets. A WaitEventSet holds file descriptors or event handles (on Windows). If FreeWaitEventSet is not called, those fds or handles are leaked. Use ResourceOwners to track WaitEventSets, to clean those up automatically on error. This was a live bug in async Append nodes, if an FDW's ForeignAsyncRequest function failed. (In back branches, I will apply a more localized fix for that based on PG_TRY-PG_FINALLY.) The added test doesn't check for leaking resources, so it passed even before this commit. But at least it covers the code path. In passing, fix a misleading comment on what the 'nevents' argument to WaitEventSetWait means. Report by Alexander Lakhin, analysis and suggestion for the fix by Tom Lane. Fixes bug #17828. Reviewed-by: Alexander Lakhin, Thomas Munro Discussion: https://www.postgresql.org/message-id/472235.1678387869@sss.pgh.pa.us	2023-11-23 13:31:36 +02:00
Michael Paquier	3650e7a393	Prevent overflow for block number in buffile.c As coded, the start block calculated by BufFileAppend() would overflow once more than 16k files are used with a default block size. This issue existed before `b1e5c9fa9a`, but there's no reason not to be clean about it. Per report from Coverity, with a fix suggested by Tom Lane.	2023-11-20 09:14:53 +09:00
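A minimal sketch of that overflow and its fix, assuming the default 8 kB block size and 1 GB segment files, so that each segment holds 131072 blocks. `append_start_block()` and the `SKETCH_*` macros are hypothetical names, not buffile.c code. With `int` operands the product reaches 2^31 at 16384 files and overflows; casting one operand to a 64-bit type before multiplying avoids it.

```c
#include <stdint.h>

/* Assumed defaults: 8 kB blocks and 1 GB segment files. */
#define SKETCH_BLCKSZ 8192
#define SKETCH_SEG_BYTES (1024 * 1024 * 1024)
#define SKETCH_BLOCKS_PER_SEG (SKETCH_SEG_BYTES / SKETCH_BLCKSZ) /* 131072 */

/*
 * Hypothetical helper: block number at which data appended after 'nfiles'
 * full segment files starts. Computing nfiles * SKETCH_BLOCKS_PER_SEG in
 * plain int arithmetic would overflow (undefined behavior) once nfiles
 * reaches 16384; the cast makes the multiplication happen in 64 bits.
 */
int64_t
append_start_block(int nfiles)
{
    return (int64_t) nfiles * SKETCH_BLOCKS_PER_SEG;
}
```

PostgreSQL spells the 64-bit type `int64`; `int64_t` is used here so the sketch compiles on its own.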
Michael Paquier	b1e5c9fa9a	Change logtape/tuplestore code to use int64 for block numbers The code previously relied on "long" as the type for block numbers, which is 4 bytes on all Windows builds and on any 32-bit build. This limited the code to handling at most 16TB of data with the default block size of 8kB, for example during a CLUSTER. The code now relies on the more portable int64, which should be more than enough for at least the next 20 years. This issue was reported back in 2017, but nothing was done about it then, so here we go now. Reported-by: Peter Geoghegan Reviewed-by: Heikki Linnakangas Discussion: https://postgr.es/m/CAH2-WznCscXnWmnj=STC0aSa7QG+BRedDnZsP=Jo_R9GUZvUrg@mail.gmail.com	2023-11-17 11:20:53 +09:00
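The 16TB limit is plain arithmetic: a signed 32-bit block number can address at most 2^31 blocks, and 2^31 blocks of 8192 (2^13) bytes is 2^44 bytes, or 16 TiB. A tiny sketch of that calculation, with hypothetical helper names:

```c
#include <stdint.h>

/* Number of distinct non-negative values of a signed 32-bit block number: 2^31. */
uint64_t
max_blocks_int32(void)
{
    return (uint64_t) INT32_MAX + 1;
}

/* Bytes addressable by 'nblocks' blocks of 'blcksz' bytes each. */
uint64_t
addressable_bytes(uint64_t nblocks, uint64_t blcksz)
{
    return nblocks * blcksz;
}
```

With a 64-bit signed block number the bound becomes 2^63 blocks instead, which is far beyond any realistic temporary file.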
Michael Paquier	c99c7a4871	Remove NOT_USED BufFileTellBlock() from buffile.c This routine has been marked as NOT_USED since `20ad43b576` from 2000, and a patch is planned to switch the logtape/tuplestore APIs to rely on int64 rather than long for the block numbers, which is more portable. Keeping it is more confusing than anything at this stage, so let's get rid of it entirely. Thanks to Heikki Linnakangas for the poke on this one. Discussion: https://postgr.es/m/5047be8c-7ee6-4dd5-af76-6c916c3103b4@iki.fi	2023-11-17 10:46:50 +09:00
Etsuro Fujita	06e8e71e7f	Remove incorrect file reference in comment. Commit `b7eda3e0e` moved XidInMVCCSnapshot() from tqual.c into snapmgr.c, but follow-up commit `c91560def` incorrectly updated this reference. We could fix it, but as pointed out by Daniel Gustafsson, 1) the reader can easily find the file that contains the definition of that function, e.g. by grepping, and 2) this kind of reference is prone to going stale; so let's just remove it. Back-patch to all supported branches. Reviewed by Daniel Gustafsson. Discussion: https://postgr.es/m/CAPmGK145VdKkPBLWS2urwhgsfidbSexwY-9zCL6xSUJH%2BBTUUg%40mail.gmail.com	2023-11-13 19:05:00 +09:00
Heikki Linnakangas	b8bff07daa	Make ResourceOwners more easily extensible. Instead of having a separate array/hash for each resource kind, use a single array and hash to hold all kinds of resources. This makes it possible to introduce new resource "kinds" without having to modify the ResourceOwnerData struct. In particular, this makes it possible for extensions to register custom resource kinds. The old approach was to have a small array of resources of each kind, and if it fills up, switch to a hash table. The new approach also uses an array and a hash, but now the array and the hash are used at the same time. The array is used to hold the recently added resources, and when it fills up, they are moved to the hash. This keeps the access to recent entries fast, even when there are a lot of long-held resources. All the resource-specific ResourceOwnerEnlarge(), ResourceOwnerRemember(), and ResourceOwnerForget*() functions have been replaced with three generic functions that take resource kind as argument. For convenience, we still define resource-specific wrapper macros around the generic functions with the old names, but they are now defined in the source files that use those resource kinds. The release callback no longer needs to call ResourceOwnerForget on the resource being released. ResourceOwnerRelease unregisters the resource from the owner before calling the callback. That needed some changes in bufmgr.c and some other files, where releasing the resources previously always called ResourceOwnerForget. Each resource kind specifies a release priority, and ResourceOwnerReleaseAll releases the resources in priority order. To make that possible, we have to restrict what you can do between phases. After calling ResourceOwnerRelease(), you are no longer allowed to remember any more resources in it or to forget any previously remembered resources by calling ResourceOwnerForget. There was one case where that was done previously. 
At subtransaction commit, AtEOSubXact_Inval() would handle the invalidation messages and call RelationFlushRelation(), which temporarily increased the reference count on the relation being flushed. We now switch to the parent subtransaction's resource owner before calling AtEOSubXact_Inval(), so that there is a valid ResourceOwner to temporarily hold that relcache reference. Other end-of-xact routines make similar calls to AtEOXact_Inval() between release phases, but I didn't see any regression test failures from those, so I'm not sure if they could reach a codepath that needs remembering extra resources. There were two exceptions to how the resource leak WARNINGs on commit were printed previously: llvmjit silently released the context without printing the warning, and a leaked buffer io triggered a PANIC. Now everything prints a WARNING, including those cases. Add tests in src/test/modules/test_resowner. Reviewed-by: Aleksander Alekseev, Michael Paquier, Julien Rouhaud Reviewed-by: Kyotaro Horiguchi, Hayato Kuroda, Álvaro Herrera, Zhihong Yu Reviewed-by: Peter Eisentraut, Andres Freund Discussion: https://www.postgresql.org/message-id/cbfabeb0-cd3c-e951-a572-19b365ed314d%40iki.fi	2023-11-08 13:30:50 +02:00
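A deliberately simplified sketch of the "recent array plus hash" layout described in this commit, under stated assumptions: resources are nonzero `uintptr_t` values tagged with an integer kind, the hash uses linear probing, and every name (`Owner`, `owner_enlarge()`, `owner_remember()`, `owner_forget()`) is hypothetical rather than resowner.c's API. It keeps the enlarge-then-remember contract (reserve room first, then record the resource) but omits release priorities and callbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define RECENT_CAP 4                 /* small array of recently remembered entries */

typedef struct { uintptr_t value; int kind; } Res;   /* value 0 = empty slot */

typedef struct
{
    Res     recent[RECENT_CAP];
    int     nrecent;
    Res    *hash;                    /* open addressing with linear probing */
    size_t  hash_cap;                /* 0 or a power of two */
    size_t  nhash;
} Owner;                             /* a zero-initialized Owner is empty */

static size_t
slot_of(uintptr_t value, size_t cap)
{
    uint64_t h = (uint64_t) value * UINT64_C(0x9E3779B97F4A7C15);
    return (size_t) (h >> 32) & (cap - 1);
}

static void
hash_put(Owner *o, Res r)
{
    size_t i = slot_of(r.value, o->hash_cap);
    while (o->hash[i].value != 0)
        i = (i + 1) & (o->hash_cap - 1);
    o->hash[i] = r;
    o->nhash++;
}

static void
hash_grow(Owner *o)
{
    Res *old = o->hash;
    size_t oldcap = o->hash_cap;

    o->hash_cap = oldcap ? oldcap * 2 : 16;
    if ((o->hash = calloc(o->hash_cap, sizeof(Res))) == NULL)
        abort();
    o->nhash = 0;
    for (size_t i = 0; i < oldcap; i++)
        if (old[i].value != 0)
            hash_put(o, old[i]);
    free(old);
}

/* Guarantee room for one owner_remember(); spills a full array into the hash. */
void
owner_enlarge(Owner *o)
{
    if (o->nrecent < RECENT_CAP)
        return;
    while ((o->nhash + RECENT_CAP) * 2 > o->hash_cap)   /* keep load <= 1/2 */
        hash_grow(o);
    for (int i = 0; i < o->nrecent; i++)
        hash_put(o, o->recent[i]);
    o->nrecent = 0;
}

void
owner_remember(Owner *o, uintptr_t value, int kind)
{
    assert(value != 0 && o->nrecent < RECENT_CAP);     /* owner_enlarge() first */
    o->recent[o->nrecent++] = (Res) {value, kind};
}

bool
owner_forget(Owner *o, uintptr_t value, int kind)
{
    for (int i = o->nrecent - 1; i >= 0; i--)          /* newest entries first */
        if (o->recent[i].value == value && o->recent[i].kind == kind)
        {
            o->recent[i] = o->recent[--o->nrecent];
            return true;
        }
    if (o->hash_cap == 0)
        return false;
    size_t mask = o->hash_cap - 1, i = slot_of(value, o->hash_cap);
    for (; o->hash[i].value != 0; i = (i + 1) & mask)
    {
        if (o->hash[i].value != value || o->hash[i].kind != kind)
            continue;
        o->hash[i].value = 0;
        o->nhash--;
        /* Re-insert the rest of the probe cluster so later lookups still work. */
        for (size_t j = (i + 1) & mask; o->hash[j].value != 0; j = (j + 1) & mask)
        {
            Res r = o->hash[j];
            o->hash[j].value = 0;
            o->nhash--;
            hash_put(o, r);
        }
        return true;
    }
    return false;
}
```

Recently added entries stay in the small array, so remembering and forgetting short-lived resources rarely touches the hash, even when many long-held resources have already been spilled into it.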
Heikki Linnakangas	b70c2143bb	Move a few ResourceOwnerEnlarge() calls for safety and clarity. These are functions where a lot of things happen between the ResourceOwnerEnlarge and ResourceOwnerRemember calls. It's important that there are no unrelated ResourceOwnerRemember calls in the code in between, otherwise the reserved entry might be used up by the intervening ResourceOwnerRemember and not be available at the intended ResourceOwnerRemember call anymore. I don't see any bugs here, but the longer the code path between the calls is, the harder it is to verify. In bufmgr.c, there is a function similar to ResourceOwnerEnlarge, ReservePrivateRefCountEntry(), to ensure that the private refcount array has enough space. The ReservePrivateRefCountEntry() calls were made at different places than the ResourceOwnerEnlargeBuffers() calls. Move the ResourceOwnerEnlargeBuffers() and ReservePrivateRefCountEntry() calls together for consistency. Reviewed-by: Aleksander Alekseev, Michael Paquier, Julien Rouhaud Reviewed-by: Kyotaro Horiguchi, Hayato Kuroda, Álvaro Herrera, Zhihong Yu Reviewed-by: Peter Eisentraut, Andres Freund Discussion: https://www.postgresql.org/message-id/cbfabeb0-cd3c-e951-a572-19b365ed314d%40iki.fi	2023-11-08 13:30:46 +02:00
Peter Eisentraut	721856ff24	Remove distprep A PostgreSQL release tarball contains a number of prebuilt files, in particular files produced by bison, flex, and perl, as well as html and man documentation. We have done this, consistent with established practice at the time, so as not to require these tools for building from a tarball. Some of these tools were hard to get, or get the right version of, from time to time, and shipping the prebuilt output was a convenience to users. Now this has at least two problems: One, we have to make the build system(s) work in two modes: Building from a git checkout and building from a tarball. This is pretty complicated, but it works so far for autoconf/make. It does not currently work for meson; you can currently only build with meson from a git checkout. Making meson builds work from a tarball seems very difficult or impossible. One particular problem is that since meson requires a separate build directory, we cannot make the build update files like gram.h in the source tree. So if you were to build from a tarball and update gram.y, you would have a gram.h in the source tree and one in the build tree, but the way things work is that the compiler will always use the one in the source tree. So you cannot, for example, make any gram.y changes when building from a tarball. This seems impossible to fix in a non-horrible way. Second, there is increased interest nowadays in precisely tracking the origin of software. We can reasonably track contributions into the git tree, and users can reasonably track the path from a tarball to packages and downloads and installs. But what happens between the git tree and the tarball is obscure and in some cases non-reproducible. The solution for both of these issues is to get rid of the step that adds prebuilt files to the tarball. The tarball now only contains what is in the git tree (*).
Getting the additional build dependencies is no longer a problem nowadays, and the complications to keep these dual build modes working are significant. And of course we want to get the meson build system working universally. This commit removes the make distprep target altogether. The make dist target continues to do its job; it just doesn't call distprep anymore. (*) The tarball also contains the INSTALL file that is built at make dist time, but not by distprep. This is unchanged for now. The make maintainer-clean target, whose job it is to remove the prebuilt files in addition to what make distclean does, is now just an alias to make distclean. (In practice, it is probably obsolete given that git clean is available.) The following programs are now hard build requirements in configure (they were already required by meson.build): - bison - flex - perl Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/e07408d9-e5f2-d9fd-5672-f53354e9305e@eisentraut.org	2023-11-06 15:18:04 +01:00
Noah Misch	3a9b18b309	Ban role pg_signal_backend from more superuser backend types. Documentation says it cannot signal "a backend owned by a superuser". On the contrary, it could signal background workers, including the logical replication launcher. It could signal autovacuum workers and the autovacuum launcher. Block all that. Signaling autovacuum workers and those two launchers doesn't stall progress beyond what one could achieve other ways. If a cluster uses a non-core extension with a background worker that does not auto-restart, this could create a denial of service with respect to that background worker. A background worker with bugs in its code for responding to terminations or cancellations could experience those bugs at a time the pg_signal_backend member chooses. Back-patch to v11 (all supported versions). Reviewed by Jelte Fennema-Nio. Reported by Hemanth Sandrana and Mahendrakar Srinivasarao. Security: CVE-2023-5870	2023-11-06 06:14:13 -08:00
Michael Paquier	96f052613f	Introduce pg_stat_checkpointer Historically, the statistics of the checkpointer have been always part of pg_stat_bgwriter. This commit removes a few columns from pg_stat_bgwriter, and introduces pg_stat_checkpointer with equivalent, renamed columns (plus a new one for the reset timestamp): - checkpoints_timed -> num_timed - checkpoints_req -> num_requested - checkpoint_write_time -> write_time - checkpoint_sync_time -> sync_time - buffers_checkpoint -> buffers_written The fields of PgStat_CheckpointerStats and its SQL functions are renamed to match with the new field names, for consistency. Note that background writer and checkpointer have been split into two different processes in commits `806a2aee37` and `bf405ba8e4`. The pgstat structures were already split, making this change straight-forward. Bump catalog version. Author: Bharath Rupireddy Reviewed-by: Bertrand Drouvot, Andres Freund, Michael Paquier Discussion: https://postgr.es/m/CALj2ACVxX2ii=66RypXRweZe2EsBRiPMj0aHfRfHUeXJcC7kHg@mail.gmail.com	2023-10-30 09:47:16 +09:00
Peter Eisentraut	611806cd72	Add trailing commas to enum definitions Since C99, there can be a trailing comma after the last value in an enum definition. A lot of new code has been introducing this style on the fly. Some new patches are now taking an inconsistent approach to this. Some add the last comma on the fly if they add a new last value, some are trying to preserve the existing style in each place, some are even dropping the last comma if there was one. We could nudge this all in a consistent direction if we just add the trailing commas everywhere once. I omitted a few places where there was a fixed "last" value that will always stay last. I also skipped the header files of libpq and ecpg, in case people want to use those with older compilers. There were also a small number of cases where the enum type wasn't used anywhere (but the enum values were), which ended up confusing pgindent a bit, so I left those alone. Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org	2023-10-26 09:20:54 +02:00
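A small illustration of the two styles the commit distinguishes, using hypothetical enum names: an ordinary enum with a trailing comma after its last value (valid since C99), and an enum ending in a fixed sentinel that always stays last, which the commit left without one.

```c
/* Ordinary enum in the new style: trailing comma after the last value. */
typedef enum DemoColor
{
    DEMO_RED,
    DEMO_GREEN,
    DEMO_BLUE,
} DemoColor;

/*
 * Enum ending in a fixed sentinel that must always stay last; cases like
 * this were left without a trailing comma.
 */
typedef enum DemoKind
{
    DEMO_KIND_A,
    DEMO_KIND_B,
    DEMO_NUM_KINDS
} DemoKind;

/* Enumerators without explicit values count up from 0, so the sentinel is the count. */
int
demo_num_kinds(void)
{
    return DEMO_NUM_KINDS;
}
```

With the trailing comma in place, adding a value after `DEMO_BLUE` touches only one line in a diff, which is one common argument for the style.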
Jeff Davis	00d7fb5e2e	Assert that buffers are marked dirty before XLogRegisterBuffer(). Enforce the rule from transam/README in XLogRegisterBuffer(), and update callers to follow the rule. Hash indexes sometimes register clean pages as a part of the locking protocol, so provide a REGBUF_NO_CHANGE flag to support that use. Discussion: https://postgr.es/m/c84114f8-c7f1-5b57-f85a-3adc31e1a904@iki.fi Reviewed-by: Heikki Linnakangas	2023-10-23 17:17:46 -07:00
Robert Haas	5c47c6546c	Refactor parse_filename_for_nontemp_relation to parse more. Instead of returning the number of characters in the RelFileNumber, return the RelFileNumber itself. Continue to return the fork number, as before, and additionally return the segment number. parse_filename_for_nontemp_relation now rejects a RelFileNumber or segment number that begins with a leading zero. Before, we accepted such cases as relation filenames, but if we continued to do so after this change, the function might return the same values for two different files (e.g. 1234.5 and 001234.5 or 1234.005) which could be annoying for callers. Since we don't actually ever generate filenames with leading zeroes in the names, any such files that we find must have been created by something other than PostgreSQL, and it is therefore reasonable to treat them as non-relation files. Along the way, change unlogged_relation_entry to store a RelFileNumber rather than an OID. This update should have been made in `851f4cc75c`, but it was overlooked. It's trivial to make the update as part of this commit, perhaps more trivial than it would have been without it, so do that. Patch by me, reviewed by David Steele. Discussion: http://postgr.es/m/CA+TgmoZNVeBzoqDL8xvr-nkaepq815jtDR4nJzPew7=3iEuM1g@mail.gmail.com	2023-10-23 15:08:53 -04:00
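A hypothetical sketch of such a parser, assuming PostgreSQL's usual data-file naming: a relfilenumber, an optional `_fsm`, `_vm`, or `_init` fork suffix, and an optional `.N` segment suffix (the first segment has none). The function and type names are invented for illustration and do not reproduce the signature of `parse_filename_for_nontemp_relation()`. As in the commit, a number with a leading zero is rejected, so `1234.5`, `001234.5`, and `1234.005` can no longer map to the same values; this sketch also rejects a bare `0`, as a consequence of the same check.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef enum
{
    SK_MAIN_FORK,
    SK_FSM_FORK,
    SK_VM_FORK,
    SK_INIT_FORK,
} SketchForkNumber;

/*
 * Parse a run of decimal digits that does not start with '0'. Returns the
 * number of characters consumed, or 0 if there is no valid number.
 */
static size_t
parse_number(const char *s, uint32_t *out)
{
    uint64_t v = 0;
    size_t n = 0;

    if (s[0] < '1' || s[0] > '9')
        return 0;                       /* empty, not a digit, or leading zero */
    while (s[n] >= '0' && s[n] <= '9')
    {
        v = v * 10 + (uint64_t) (s[n] - '0');
        if (v > UINT32_MAX)
            return 0;                   /* does not fit in 32 bits */
        n++;
    }
    *out = (uint32_t) v;
    return n;
}

/* Hypothetical parser for "<relnum>[_fork][.<segno>]"; segment 0 has no suffix. */
bool
parse_relation_filename(const char *name, uint32_t *relnum,
                        SketchForkNumber *forknum, uint32_t *segno)
{
    static const struct { const char *suffix; SketchForkNumber fork; } forks[] = {
        {"_fsm", SK_FSM_FORK}, {"_vm", SK_VM_FORK}, {"_init", SK_INIT_FORK},
    };
    size_t pos = parse_number(name, relnum);

    if (pos == 0)
        return false;
    *forknum = SK_MAIN_FORK;
    for (size_t i = 0; i < sizeof(forks) / sizeof(forks[0]); i++)
    {
        size_t len = strlen(forks[i].suffix);

        if (strncmp(name + pos, forks[i].suffix, len) == 0)
        {
            *forknum = forks[i].fork;
            pos += len;
            break;
        }
    }
    *segno = 0;
    if (name[pos] == '.')
    {
        size_t n = parse_number(name + pos + 1, segno);

        if (n == 0)
            return false;
        pos += 1 + n;
    }
    return name[pos] == '\0';
}
```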
Thomas Munro	dab889d60b	Fix min_dynamic_shared_memory on Windows. When min_dynamic_shared_memory is set above 0, we try to find space in a pre-allocated region of the main shared memory area instead of calling dsm_impl_XXX() routines to allocate more. The dsm_pin_segment() and dsm_unpin_segment() routines had a bug: they called dsm_impl_XXX() routines even for main region segments. Nobody noticed before now because those routines do nothing on Unix, but on Windows they'd fail while attempting to duplicate an invalid Windows HANDLE. Add the missing gating. Back-patch to 14, where commit `84b1c63a` added this feature. Fixes pgsql-bugs bug #18165. Reported-by: Maxime Boyer <maxime.boyer@cra-arc.gc.ca> Tested-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/18165-bf4f525cea6e51de%40postgresql.org	2023-10-22 10:04:55 +13:00
Nathan Bossart	97550c0711	Avoid calling proc_exit() in processes forked by system(). The SIGTERM handler for the startup process immediately calls proc_exit() for the duration of the restore_command, i.e., a call to system(). This system() call forks a new process to execute the shell command, and this child process inherits the parent's signal handlers. If both the parent and child processes receive SIGTERM, both will attempt to call proc_exit(). This can end badly. For example, both processes will try to remove themselves from the PGPROC shared array. To fix this problem, this commit adds a check in StartupProcShutdownHandler() to see whether MyProcPid == getpid(). If they match, this is the parent process, and we can proc_exit() like before. If they do not match, this is a child process, and we just emit a message to STDERR (in a signal safe manner) and _exit(), thereby skipping any problematic exit callbacks. This commit also adds checks in proc_exit(), ProcKill(), and AuxiliaryProcKill() that verify they are not being called within such child processes. Suggested-by: Andres Freund Reviewed-by: Thomas Munro, Andres Freund Discussion: https://postgr.es/m/Y9nGDSgIm83FHcad%40paquier.xyz Discussion: https://postgr.es/m/20230223231503.GA743455%40nathanxps13 Backpatch-through: 11	2023-10-17 10:41:48 -05:00
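A minimal POSIX sketch of the pid check, with hypothetical handler and helper names. A forked child inherits both the signal handler and a copy of the recorded pid, so comparing that copy against getpid() tells the two processes apart. The real handler calls proc_exit() in the parent branch; here `_exit(1)` stands in for it, and the exit codes 1 and 2 are arbitrary choices so the outcome can be observed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t my_proc_pid;       /* recorded at startup, like MyProcPid */

static void
sketch_sigterm_handler(int signo)
{
    (void) signo;
    if (getpid() != my_proc_pid)
    {
        /* A forked child (e.g. the shell started by system()): skip exit callbacks. */
        static const char msg[] = "SIGTERM handler called in child process\n";
        ssize_t rc = write(STDERR_FILENO, msg, sizeof(msg) - 1);  /* async-signal-safe */

        (void) rc;
        _exit(2);
    }
    _exit(1);                   /* stand-in for proc_exit(1) in the real process */
}

void
install_sketch_handler(void)
{
    struct sigaction sa;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sketch_sigterm_handler;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGTERM, &sa, NULL);
    assert(rc == 0);
    (void) rc;
    my_proc_pid = getpid();
}

/*
 * Fork a child that inherits the handler, deliver SIGTERM to it, and return
 * its exit status. If 'child_is_main' is set, the child re-records its own
 * pid and so plays the role of the original process.
 */
int
exit_status_after_sigterm(bool child_is_main)
{
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        if (child_is_main)
            my_proc_pid = getpid();
        raise(SIGTERM);
        _exit(99);              /* not reached: the handler exits */
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```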
Michael Paquier	d6b0c2bcb1	Improve truncation of pg_serial/, removing "apparent wraparound" LOGs It is possible that the tail XID of pg_serial/ gets ahead of its head XID, which would cause the truncation of pg_serial/ done during checkpoints to show up as a "wraparound" LOG in SimpleLruTruncate(), which is confusing. This also wastes a bit of disk space until the head page is reclaimed again. CheckPointPredicate() is changed so that, if the tail XID has advanced beyond the head XID, the cutoff page for the truncation is the head page rather than the tail page. This prevents the confusing LOG message about a wraparound while still allowing some truncation to reclaim disk space. This could be considered a bug fix, but the original behavior is harmless, only wasting disk space temporarily, so no backpatch is done. Author: Sami Imseih Reviewed-by: Heikki Linnakangas, Michael Paquier Discussion: https://postgr.es/m/755E19CA-D02C-4A4C-80D3-74F775410C48@amazon.com	2023-10-17 14:36:21 +09:00
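A small sketch of the cutoff choice, using the wraparound-aware comparison PostgreSQL applies to normal XIDs in `TransactionIdFollows()`: the modulo-2^32 difference is interpreted as a signed 32-bit value. The page geometry (`SKETCH_XIDS_PER_PAGE`) and all function names are hypothetical; the real logic lives in CheckPointPredicate() and works on SLRU pages.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SketchXid;

/* Assumed page geometry, for illustration only. */
#define SKETCH_XIDS_PER_PAGE 1024u

/* Wraparound-aware "a is later than b", in the style of TransactionIdFollows(). */
bool
xid_follows(SketchXid a, SketchXid b)
{
    return (int32_t) (a - b) > 0;
}

uint32_t
xid_page(SketchXid xid)
{
    return xid / SKETCH_XIDS_PER_PAGE;
}

/*
 * Normally truncate up to the tail page; if the tail XID has advanced past
 * the head XID, cut off at the head page instead, so the cutoff is never
 * "ahead" of the head and no apparent wraparound is reported.
 */
uint32_t
truncation_cutoff_page(SketchXid tail_xid, SketchXid head_xid)
{
    if (xid_follows(tail_xid, head_xid))
        return xid_page(head_xid);
    return xid_page(tail_xid);
}
```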
Alexander Korotkov	e83d1b0c40	Add support for event triggers on authenticated login This commit introduces a trigger on the login event, allowing actions to be fired right when a user connects. This can be useful for logging or connection checks, as well as for some personalization of the environment. Usage details are described in the included documentation, but in short, usage is the same as for other triggers: create a function returning event_trigger, then create an event trigger on the login event. To avoid connection-time overhead when there are no triggers, the commit introduces the pg_database.dathasloginevt flag, which indicates whether the database has active login triggers. This flag is set by the CREATE/ALTER EVENT TRIGGER commands and unset at connection time when no active triggers are found. Author: Konstantin Knizhnik, Mikhail Gribkov Discussion: https://postgr.es/m/0d46d29f-4558-3af9-9c85-7774e14a7709%40postgrespro.ru Reviewed-by: Pavel Stehule, Takayuki Tsunakawa, Greg Nancarrow, Ivan Panchenko Reviewed-by: Daniel Gustafsson, Teodor Sigaev, Robert Haas, Andres Freund Reviewed-by: Tom Lane, Andrey Sokolov, Zhihong Yu, Sergey Shinderuk Reviewed-by: Gregory Stark, Nikita Malakhov, Ted Yu	2023-10-16 03:18:22 +03:00
Nathan Bossart	8d140c5822	Improve the naming in wal_sync_method code. * sync_method is renamed to wal_sync_method. * sync_method_options[] is renamed to wal_sync_method_options[]. * assign_xlog_sync_method() is renamed to assign_wal_sync_method(). * The names of the available synchronization methods are now prefixed with "WAL_SYNC_METHOD_" and have been moved into a WalSyncMethod enum. * PLATFORM_DEFAULT_SYNC_METHOD is renamed to PLATFORM_DEFAULT_WAL_SYNC_METHOD, and DEFAULT_SYNC_METHOD is renamed to DEFAULT_WAL_SYNC_METHOD. These more descriptive names help distinguish the code for wal_sync_method from the code for DataDirSyncMethod (e.g., the recovery_init_sync_method configuration parameter and the --sync-method option provided by several frontend utilities). This change also prevents name collisions between the aforementioned sets of code. Since this only improves the naming of internal identifiers, there should be no behavior change. Author: Maxim Orlov Discussion: https://postgr.es/m/CACG%3DezbL1gwE7_K7sr9uqaCGkWhmvRTcTEnm3%2BX1xsRNwbXULQ%40mail.gmail.com	2023-10-13 15:16:45 -05:00
Etsuro Fujita	aec684ff0f	Remove extra parenthesis from comment.	2023-10-06 18:30:00 +09:00
Bruce Momjian	441bbd2988	doc: correct reference to pg_relation in comment Reported-by: Dagfinn Ilmari Mannsåker Discussion: https://postgr.es/m/87sf9apnr0.fsf@wibble.ilmari.org Backpatch-through: master	2023-09-26 17:07:14 -04:00
Peter Eisentraut	9847ca2c79	Standardize type of extend_by counter The counter of extend_by loops is mixed int and uint32. Fix by standardizing from int to uint32, to match the extend_by variable. Fixup for `31966b151e`. Author: Ranier Vilela <ranier.vf@gmail.com> Reviewed-by: Gurjeet Singh <gurjeet@singh.im> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAEudQAqHG-JP-YnG54ftL_b7v6-57rMKwET_MSvEoen0UHuPig@mail.gmail.com	2023-09-19 09:46:01 +02:00
Andres Freund	7369798a83	Fix tracking of temp table relation extensions as writes Karina figured out that I (Andres) confused BufferUsage.temp_blks_written with BufferUsage.local_blks_written in `fcdda1e4b5`. Tests in core PG can't easily test this, as BufferUsage is just used for EXPLAIN (ANALYZE, BUFFERS) and pg_stat_statements. Thus this commit adds tests for this to pg_stat_statements. Reported-by: Karina Litskevich <litskevichkarina@gmail.com> Author: Karina Litskevich <litskevichkarina@gmail.com> Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CACiT8ibxXA6+0amGikbeFhm8B84XdQVo6D0Qfd1pQ1s8zpsnxQ@mail.gmail.com Backpatch: 16-, where `fcdda1e4b5` was merged	2023-09-13 19:14:09 -07:00
Thomas Munro	04a09ee944	Teach WaitEventSetWait() to report multiple events on Windows. The WAIT_USE_WIN32 implementation of WaitEventSetWait() previously reported at most one event per call, because that's what the underlying WaitForMultipleObjects() call does. We can make the behavior match the three Unix implementations by looping until our output buffer is full, or there are no more events available now. This makes no difference to most callers including the regular FEBE socket code, since they ask for at most one event anyway. A difference in socket accept priority might be perceived by end users after commit `7389aad6` started using WaitEventSet in the postmaster. With this commit, the accept order now matches Unix systems, servicing listening sockets in round-robin order. We decided it wasn't really a bug or worth back-patching, but it seems good to align the behavior across platforms. Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version) Tested-by: "Wei Wang (Fujitsu)" <wangw.fnst@fujitsu.com> Discussion: https://postgr.es/m/CA%2BhUKG%2BA2dk29hr5zRP3HVJQ-_PncNJM6HVQ7aaYLXLRBZU-xw%40mail.gmail.com	2023-09-08 18:49:08 +12:00
Michael Paquier	e722846daf	Improve BackendXidGetPid() to only access allProcs on matching XID Compilers are able to optimize that, but it makes the code slightly more readable this way. Author: Zhao Junwang Reviewed-by: Ashutosh Bapat Discussion: https://postgr.es/m/CAEG8a3+i9gtqF65B+g_puVaCQuf0rZC-EMqMyEjGFJYOqUUWfA@mail.gmail.com	2023-09-08 10:00:29 +09:00
Thomas Munro	0da096d78e	Fix recovery conflict SIGUSR1 handling. We shouldn't be doing non-trivial work in signal handlers in general, and in this case the handler could reach unsafe code and corrupt state. It also clobbered its own "reason" code. Move all recovery conflict decision logic into the next CHECK_FOR_INTERRUPTS(), and have the signal handler just set flags and the latch, following the standard pattern. Since there are several different "reasons", use a separate flag for each. With this refactoring, the recovery conflict system no longer piggy-backs on top of the regular query cancelation mechanism, but instead raises an error directly if it decides that is necessary. It still needs to respect QueryCancelHoldoffCount, because otherwise the FEBE protocol might get out of sync (see commit `2b3a8b20c2`). This fixes one class of intermittent failure in the new 031_recovery_conflict.pl test added by commit `9f8a050f`, though the buggy coding is much older. Failures outside contrived testing seem to be very rare (or perhaps incorrectly attributed) in the field, based on lack of reports. No back-patch for now due to complexity and release schedule. We have the option to back-patch into 16 later, as 16 has prerequisite commit `bea3d7e`. Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version) Reviewed-by: Michael Paquier <michael@paquier.xyz> (earlier version) Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier version) Tested-by: Christoph Berg <myon@debian.org> Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com Discussion: https://postgr.es/m/CALj2ACVr8au2J_9D88UfRCi0JdWhyQDDxAcSVav0B0irx9nXEg%40mail.gmail.com	2023-09-07 12:39:24 +12:00
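A minimal sketch of the "handler only sets flags" pattern, with one flag per reason so that a second signal cannot clobber the first. All names are hypothetical; `requested_reason` is a simulation device (in PostgreSQL the sender records the reason in shared memory), the latch is omitted, and `process_pending_conflicts()` plays the role of the work deferred to CHECK_FOR_INTERRUPTS().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical conflict reasons; each gets its own pending flag. */
typedef enum
{
    CONFLICT_LOCK,
    CONFLICT_SNAPSHOT,
    CONFLICT_TABLESPACE,
    NUM_CONFLICT_REASONS
} ConflictReason;

static volatile sig_atomic_t conflict_pending[NUM_CONFLICT_REASONS];
static volatile sig_atomic_t interrupt_pending;

/* Simulation only: stands in for the reason the sender stores in shared memory. */
static volatile sig_atomic_t requested_reason;

static void
conflict_handler(int signo)
{
    (void) signo;
    /* Only set flags here; the real handler would also set the process latch. */
    conflict_pending[requested_reason] = 1;
    interrupt_pending = 1;
}

void
install_conflict_handler(void)
{
    struct sigaction sa;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = conflict_handler;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);
    (void) rc;
}

void
send_conflict(ConflictReason reason)
{
    requested_reason = reason;
    raise(SIGUSR1);
}

/*
 * The CHECK_FOR_INTERRUPTS() side: decision logic runs here, in normal
 * context, where it may safely raise an error. Returns a bitmask of the
 * reasons that were handled.
 */
unsigned
process_pending_conflicts(void)
{
    unsigned handled = 0;

    if (!interrupt_pending)
        return 0;
    interrupt_pending = 0;      /* clear before scanning, so a later signal is not lost */
    for (int r = 0; r < NUM_CONFLICT_REASONS; r++)
    {
        if (conflict_pending[r])
        {
            conflict_pending[r] = 0;
            handled |= 1u << r; /* decide here whether to cancel, error out, etc. */
        }
    }
    return handled;
}
```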
Nathan Bossart	3ed1956719	Make enum for sync methods available to frontend code. This commit renames RecoveryInitSyncMethod to DataDirSyncMethod and moves it to common/file_utils.h. This is preparatory work for a follow-up commit that will allow specifying the synchronization method in frontend utilities such as pg_upgrade and pg_basebackup. Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/ZN2ZB4afQ2JbR9TA%40paquier.xyz	2023-09-06 16:26:39 -07:00
Michael Paquier	414f6c0fb7	Use more consistent names for wait event objects and types The event names keep the same characters apart from letter case, so applying lower() or upper() in monitoring queries detects the same events as before this change. The data can be cross-checked against the system view pg_wait_events, for instance with the following query, which shows no differences: SELECT lower(type), lower(name), description FROM pg_wait_events ORDER BY 1, 2; This will help introduce further simplifications in the format of wait_event_names. Some enum values in the code had to be renamed slightly to follow the same naming convention across the board. Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/ZOxVHQwEC/9X/p/z@paquier.xyz	2023-09-06 10:04:43 +09:00
Nathan Bossart	f39b265808	Move PG_TEMP_FILE* macros to file_utils.h. Presently, frontend code that needs to use these macros must either include storage/fd.h, which declares several frontend-unsafe functions, or duplicate the macros. This commit moves these macros to common/file_utils.h, which is safe for both frontend and backend code. Consequently, we can also remove the duplicated macros in pg_checksums and stop including storage/fd.h in pg_rewind. Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/ZOP5qoUualu5xl2Z%40paquier.xyz	2023-09-05 17:02:06 -07:00
Nathan Bossart	119c23eb98	Replace known_assigned_xids_lck with memory barriers. This lock was introduced before memory barrier support was added, and it is only used to guarantee proper memory ordering when KnownAssignedXidsAdd() appends to the array without a lock. Now that such memory barrier support exists, we can remove the lock and use barriers instead. Suggested-by: Tom Lane Author: Michail Nikolaev Reviewed-by: Robert Haas Discussion: https://postgr.es/m/CANtu0oh0si%3DjG5z_fLeFtmYcETssQ08kLEa8b6TQqDm_cinroA%40mail.gmail.com	2023-09-05 13:59:06 -07:00
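Here is a minimal C11 sketch of replacing a lock with barriers for a single-writer, append-only array. The names `known_xids`, `known_xids_add()` and `known_xids_snapshot()` are hypothetical stand-ins, not procarray.c's code. The C11 fences play the role of pg_write_barrier() and pg_read_barrier(). The sketch assumes exactly one writer and readers that never look past the published head.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_KNOWN_XIDS 64           /* hypothetical capacity */

typedef uint32_t TransactionId;

static TransactionId known_xids[MAX_KNOWN_XIDS];
static atomic_size_t known_head;    /* number of published entries */

/* Single writer: append without taking a lock. Returns 0 if the array is full. */
static int known_xids_add(TransactionId xid)
{
    /* Only the writer modifies the head, so a relaxed load is enough here. */
    size_t h = atomic_load_explicit(&known_head, memory_order_relaxed);

    if (h >= MAX_KNOWN_XIDS)
        return 0;
    known_xids[h] = xid;
    /* Write barrier: the new element must be visible before the new head. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&known_head, h + 1, memory_order_relaxed);
    return 1;
}

/* Any reader: copy out the published entries (up to cap). */
static size_t known_xids_snapshot(TransactionId *out, size_t cap)
{
    size_t n = atomic_load_explicit(&known_head, memory_order_relaxed);

    /* Read barrier: do not read elements until after reading the head. */
    atomic_thread_fence(memory_order_acquire);
    if (n > cap)
        n = cap;
    for (size_t i = 0; i < n; i++)
        out[i] = known_xids[i];
    return n;
}
```

Readers only touch indexes below the head they loaded, and the writer only writes at or beyond it, so the element accesses themselves never race.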
Thomas Munro	f691f5b80a	Remove the "snapshot too old" feature. Remove the old_snapshot_threshold setting and mechanism for producing the error "snapshot too old", originally added by commit `848ef42b`. Unfortunately it had a number of known problems in terms of correctness and performance, mostly reported by Andres in the course of his work on snapshot scalability. We agreed to remove it, after a long period without an active plan to fix it. This is certainly a desirable feature, and someone might propose a new or improved implementation in the future. Reported-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CACG%3DezYV%2BEvO135fLRdVn-ZusfVsTY6cH1OZqWtezuEYH6ciQA%40mail.gmail.com Discussion: https://postgr.es/m/20200401064008.qob7bfnnbu4w5cw4%40alap3.anarazel.de Discussion: https://postgr.es/m/CA%2BTgmoY%3Daqf0zjTD%2B3dUWYkgMiNDegDLFjo%2B6ze%3DWtpik%2B3XqA%40mail.gmail.com	2023-09-05 19:53:43 +12:00
Peter Eisentraut	4f3514f201	Rename hook functions for debug_io_direct to match variable name. Commit `319bae9a` renamed the GUC. Rename the check and assign functions to match, and alphabetize. Back-patch to 16. Author: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/2769341e-fa28-c2ee-3e4b-53fdcaaf2271%40eisentraut.org	2023-08-24 22:25:49 +12:00
Daniel Gustafsson	27a36f79b6	Fix wording in comment The comment for the DSM_OP_CREATE parameter read "the a new handle", which is confusing. Fix by rewording to indicate what the parameter means for DSM_OP_CREATE. Reported-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CAEG8a3J2bc197ym-M_ykOXb9ox2eNn-QNKNeoSAoHYSw2NCOnw@mail.gmail.com	2023-08-23 10:22:55 +02:00
Thomas Munro	7114791158	ExtendBufferedWhat -> BufferManagerRelation. Commit `31966b15` invented a way for functions dealing with relation extension to accept a Relation in online code and an SMgrRelation in recovery code. It seems highly likely that future bufmgr.c interfaces will face the same problem, and need to do something similar. Generalize the names so that each interface doesn't have to re-invent the wheel. Back-patch to 16. Since extension AM authors might start using the constructor macros once 16 ships, we agreed to do the rename in 16 rather than waiting for 17. Reviewed-by: Peter Geoghegan <pg@bowt.ie> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKG%2B6tLD2BhpRWycEoti6LVLyQq457UL4ticP5xd8LqHySA%40mail.gmail.com	2023-08-23 12:31:23 +12:00
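Here is a minimal C sketch of the "one struct, two constructors" idea. The struct layout and the BMR_REL()/BMR_SMGR() macro names are modeled on PostgreSQL 16's bufmgr.h. The Relation and SMgrRelation types and both helper functions are simplified, hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the real types. */
typedef struct SMgrRelationData { int relnumber; } SMgrRelationData;
typedef SMgrRelationData *SMgrRelation;
typedef struct RelationData { SMgrRelation rd_smgr; char relpersistence; } RelationData;
typedef RelationData *Relation;

/* Exactly one of rel (normal running) or smgr (recovery) is set. */
typedef struct BufferManagerRelation
{
    Relation     rel;
    SMgrRelation smgr;
    char         relpersistence;   /* only meaningful together with smgr */
} BufferManagerRelation;

/* Constructor macros; compound literals zero every field not named. */
#define BMR_REL(p_rel) ((BufferManagerRelation){ .rel = (p_rel) })
#define BMR_SMGR(p_smgr, p_persistence) \
    ((BufferManagerRelation){ .smgr = (p_smgr), .relpersistence = (p_persistence) })

/* Hypothetical bufmgr-style helpers that accept either form. */
static SMgrRelation bmr_smgr(BufferManagerRelation bmr)
{
    return bmr.rel != NULL ? bmr.rel->rd_smgr : bmr.smgr;
}

static char bmr_persistence(BufferManagerRelation bmr)
{
    return bmr.rel != NULL ? bmr.rel->relpersistence : bmr.relpersistence;
}
```

Every interface taking a BufferManagerRelation can then be called as `f(BMR_REL(rel))` in normal running or `f(BMR_SMGR(smgr, persistence))` in recovery, without inventing its own pair of arguments.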
Thomas Munro	81e36d3e0d	Invalidate smgr_targblock in smgrrelease(). In rare circumstances involving relfilenode reuse, it might have been possible for smgr_targblock to finish up pointing past the end. Oversight in `b74e94dc`. Back-patch to 15. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CA%2BhUKGJ8NTvqLHz6dqbQnt2c8XCki4r2QvXjBQcXpVwxTY_pvA%40mail.gmail.com	2023-08-17 15:45:13 +12:00
Thomas Munro	5ffb7c7750	De-pessimize ConditionVariableCancelSleep(). Commit `b91dd9de` was concerned with a theoretical problem with our non-atomic condition variable operations. If you stop sleeping, and then cancel the sleep in a separate step, you might be signaled in between, and that could be lost. That doesn't matter for callers of ConditionVariableBroadcast(), but callers of ConditionVariableSignal() might be upset if a signal went missing like this. Commit `bc971f4025` interacted badly with that logic, because it doesn't use ConditionVariableSleep(), which would normally put us back in the wait list. ConditionVariableCancelSleep() would be confused and think we'd received an extra signal, and try to forward it to another backend, resulting in wakeup storms. New idea: ConditionVariableCancelSleep() can just return true if we've been signaled. Hypothetical users of ConditionVariableSignal() would then still have a way to deal with rare lost signals if they are concerned about that problem. Back-patch to 16, where `bc971f4025` arrived. Reported-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/2840876b-4cfe-240f-0a7e-29ffd66711e7%40enterprisedb.com	2023-08-15 10:23:47 +12:00
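Here is a minimal single-process C model of the new rule: if we are no longer on the wait list when we cancel, someone signaled us, and we report that instead of forwarding a wakeup. The `CondVar` type and all function names are hypothetical. Real condition variables protect the list with a spinlock and use a proclist of process numbers.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_WAITERS 8

/* Hypothetical model of a condition variable's wait list. */
typedef struct
{
    int waiters[MAX_WAITERS];   /* process numbers, in FIFO order */
    int nwaiters;
} CondVar;

static void cv_prepare_to_sleep(CondVar *cv, int proc)
{
    assert(cv->nwaiters < MAX_WAITERS);
    cv->waiters[cv->nwaiters++] = proc;
}

/* Remove proc from the wait list; returns false if it was not there. */
static bool cv_remove(CondVar *cv, int proc)
{
    for (int i = 0; i < cv->nwaiters; i++)
    {
        if (cv->waiters[i] == proc)
        {
            for (int j = i + 1; j < cv->nwaiters; j++)
                cv->waiters[j - 1] = cv->waiters[j];
            cv->nwaiters--;
            return true;
        }
    }
    return false;
}

/* Wake the oldest waiter by taking it off the list; returns its number, or -1. */
static int cv_signal(CondVar *cv)
{
    int proc;

    if (cv->nwaiters == 0)
        return -1;
    proc = cv->waiters[0];
    cv_remove(cv, proc);
    return proc;
}

/* Stop waiting; returns true if we had already been signaled. */
static bool cv_cancel_sleep(CondVar *cv, int proc)
{
    return !cv_remove(cv, proc);
}
```

A caller that depends on single-wakeup semantics could forward a consumed signal itself, e.g. `if (cv_cancel_sleep(&cv, me)) cv_signal(&cv);`. The sketch leaves that choice to the caller rather than making it unconditionally.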
Michael Paquier	af720b4c50	Change custom wait events to use dynamic shared hash tables Currently, the names of custom wait events must be registered in each backend, requiring every backend to link to the shared memory area of an extension, even if the extension is not loaded with shared_preload_libraries. This patch relaxes the constraints of this infrastructure by storing the wait events and their names in two dynamic hash tables in shared memory. This has the advantage of simplifying the registration of custom wait events to a single routine call that returns an event ID ready for consumption: uint32 WaitEventExtensionNew(const char *wait_event_name); The caller of this routine can then cache the returned ID locally, to be used for pgstat_report_wait_start(), WaitLatch() or a similar routine. The implementation uses two hash tables: one keyed by event name to avoid duplicates, and a second keyed by event ID for event lookups, such as those done for pg_stat_activity. These tables can hold a minimum of 16 and a maximum of 128 entries, which should be plenty. The code changes done in worker_spi show how things are simplified (most of the code removed in this commit comes from there): - worker_spi_init() is gone. - No more shared memory hooks are required (size request and initialization). - The custom wait event ID is cached in the process that needs to set it, with one single call to WaitEventExtensionNew() to retrieve it. Per suggestion from Andres Freund. Author: Masahiro Ikeda, with a few tweaks from me. Discussion: https://postgr.es/m/20230801032349.aaiuvhtrcvvcwzcx@awork3.anarazel.de	2023-08-14 14:47:27 +09:00
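Here is a minimal C sketch of the two-table design. One table is keyed by name, so registering the same name twice returns the same ID. The other is keyed by ID, so a name can be looked up for display. The sketch uses fixed-size, single-process open-addressing tables and hypothetical function names, with no locking, in place of PostgreSQL's shared-memory hash tables. The ID numbering is also an assumption; only the 128-entry cap comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_CUSTOM_EVENTS 128      /* cap quoted in the commit message */
#define NAME_LEN          64       /* hypothetical; names must be shorter */
#define TABLE_SIZE        256      /* power of two, larger than MAX_CUSTOM_EVENTS */
#define FIRST_CUSTOM_ID   1        /* hypothetical; 0 means "table full" */

typedef struct { int used; uint32_t id; char name[NAME_LEN]; } ByNameEntry;
typedef struct { int used; uint32_t id; const char *name; } ByIdEntry;

static ByNameEntry by_name[TABLE_SIZE];  /* name -> id, for deduplication */
static ByIdEntry   by_id[TABLE_SIZE];    /* id -> name, for lookups */
static uint32_t    next_id = FIRST_CUSTOM_ID;

static uint32_t hash_name(const char *s)  /* FNV-1a */
{
    uint32_t h = 2166136261u;

    while (*s)
    {
        h ^= (unsigned char) *s++;
        h *= 16777619u;
    }
    return h;
}

static uint32_t hash_id(uint32_t id)
{
    return id * 2654435761u;             /* multiplicative hashing */
}

/* Return the ID for name, allocating one on first use; 0 if the table is full. */
static uint32_t custom_wait_event_new(const char *name)
{
    uint32_t slot = hash_name(name) & (TABLE_SIZE - 1);
    uint32_t id, islot;

    assert(strlen(name) < NAME_LEN);
    while (by_name[slot].used)           /* linear probing */
    {
        if (strcmp(by_name[slot].name, name) == 0)
            return by_name[slot].id;     /* already registered */
        slot = (slot + 1) & (TABLE_SIZE - 1);
    }
    if (next_id - FIRST_CUSTOM_ID >= MAX_CUSTOM_EVENTS)
        return 0;

    id = next_id++;
    by_name[slot].used = 1;
    by_name[slot].id = id;
    strcpy(by_name[slot].name, name);

    islot = hash_id(id) & (TABLE_SIZE - 1);
    while (by_id[islot].used)
        islot = (islot + 1) & (TABLE_SIZE - 1);
    by_id[islot].used = 1;
    by_id[islot].id = id;
    by_id[islot].name = by_name[slot].name;
    return id;
}

/* Map an ID back to its name; NULL if unknown. */
static const char *custom_wait_event_name(uint32_t id)
{
    uint32_t slot = hash_id(id) & (TABLE_SIZE - 1);

    while (by_id[slot].used)
    {
        if (by_id[slot].id == id)
            return by_id[slot].name;
        slot = (slot + 1) & (TABLE_SIZE - 1);
    }
    return NULL;
}
```

A process would call `custom_wait_event_new()` once and cache the result, matching the "cache the returned ID locally" advice above. The second table only matters to readers, such as a monitoring view, that need to turn an ID back into a name.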
Michael Paquier	c9af054653	Support custom wait events for wait event type "Extension" Two backend routines are added to allow extensions to allocate and define custom wait events, all of them allocated in the wait event type "Extension": * WaitEventExtensionNew(), which allocates a wait event ID computed from a counter in shared memory. * WaitEventExtensionRegisterName(), which associates a custom string with the allocated wait event ID. Note that this includes an example of how to use this new facility in worker_spi, with TAP tests for various scenarios, and some documentation about how to use them. Any code in the tree that currently uses WAIT_EVENT_EXTENSION could switch to this new facility to define custom wait events. This is left as work for future patches. Author: Masahiro Ikeda Reviewed-by: Andres Freund, Michael Paquier, Tristan Partin, Bharath Rupireddy Discussion: https://postgr.es/m/b9f5411acda0cf15c8fbb767702ff43e@oss.nttdata.com	2023-07-31 17:09:24 +09:00
Masahiko Sawada	bd88404d3c	Fix crash with RemoveFromWaitQueue() when detecting a deadlock. Commit `5764f611e` used dclist_delete_from() to remove the proc from the wait queue. However, since that doesn't reset the dlist_node's next/prev pointers to NULL, RemoveFromWaitQueue() could be called twice: once when the process detects a deadlock, and again when cleaning up locks while aborting the transaction. The waiting lock information is cleared by the first call, so the second call crashed. Backpatch to v16, where the change was introduced. Bug: #18031 Reported-by: Justin Pryzby, Alexander Lakhin Reviewed-by: Andres Freund Discussion: https://postgr.es/m/ZKy4AdrLEfbqrxGJ%40telsasoft.com Discussion: https://postgr.es/m/18031-ebe2d08cb405f6cc@postgresql.org Backpatch-through: 16	2023-07-26 14:41:26 +09:00
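Here is a minimal C sketch of why clearing the node's links matters. Deleting a node also NULLs its pointers, so a second removal attempt can detect that the node is already detached instead of acting on stale state. The list type, `clist_delete_thoroughly()` and `remove_from_wait_queue()` are hypothetical stand-ins; lib/ilist.h and lock.c differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly-linked list with a sentinel head and a count. */
typedef struct Node { struct Node *prev, *next; } Node;
typedef struct { Node head; int count; } CList;

static void clist_init(CList *l)
{
    l->head.prev = l->head.next = &l->head;
    l->count = 0;
}

static void clist_push_tail(CList *l, Node *n)
{
    n->prev = l->head.prev;
    n->next = &l->head;
    l->head.prev->next = n;
    l->head.prev = n;
    l->count++;
}

/* Unlink n and clear its pointers, so later callers can tell it is gone. */
static void clist_delete_thoroughly(CList *l, Node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = NULL;
    l->count--;
}

static bool node_is_detached(const Node *n)
{
    return n->next == NULL;
}

/* Hypothetical stand-in for RemoveFromWaitQueue(): now safe to call twice. */
static void remove_from_wait_queue(CList *waitq, Node *proc_link)
{
    if (node_is_detached(proc_link))
        return;                 /* already removed, e.g. by deadlock detection */
    clist_delete_thoroughly(waitq, proc_link);
}
```

With a plain unlink that leaves `prev`/`next` intact, the second call would find a "linked-looking" node and unlink it again, corrupting the count and touching state that was already cleaned up.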
Michael Paquier	66d86d4201	Document more assumptions of LWLock variable changes with WAL inserts This commit adds a few comments about what LWLockWaitForVar() relies on when a backend waits for a variable update on its LWLocks for WAL insertions up to an expected LSN. First, LWLockWaitForVar() does not include a memory barrier, relying on a spinlock taken at the beginning of WaitXLogInsertionsToFinish(). This was hidden behind two layers of routines in lwlock.c. This assumption is now documented at the top of LWLockWaitForVar(), and detailed a bit more within LWLockConflictsWithVar(). Second, document why WaitXLogInsertionsToFinish() does not include memory barriers, relying on a spinlock at its top, which is, per Andres' input, fine for two different reasons, both depending on the fact that the caller of WaitXLogInsertionsToFinish() is waiting for an LSN up to a certain value. This area's documentation and assumptions could be improved further in the future, but at least this is a beginning. Author: Bharath Rupireddy, Andres Freund Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/CALj2ACVF+6jLvqKe6xhDzCCkr=rfd6upaGc3477Pji1Ke9G7Bg@mail.gmail.com	2023-07-26 12:06:04 +09:00
Michael Paquier	71e4cc6b8e	Optimize WAL insertion lock acquisition and release with some atomics The WAL insertion lock variable insertingAt is currently being read and written with the help of the LWLock wait list lock to avoid any read of torn values. This wait list lock can become a point of contention under highly concurrent write workloads. This commit switches insertingAt to a 64-bit atomic variable that provides torn-free reads/writes. On platforms without 64-bit atomic support, the fallback implementation uses spinlocks to provide the same guarantees for the values read. LWLockWaitForVar(), through LWLockConflictsWithVar(), reads the new value with a u64 atomic operation to check whether it still needs to wait. LWLockUpdateVar() updates the variable before waking up the waiters with an exchange_u64 (full memory barrier). LWLockReleaseClearVar() now also uses an exchange_u64 to reset the variable. Before this commit, all these steps relied on LWLockWaitListLock() and LWLockWaitListUnlock(). This reduces contention on the LWLock wait list lock and improves performance of highly concurrent write workloads. 
Here are some numbers using pg_logical_emit_message() (HEAD at `d6677b93`) with various arbitrary record lengths and clients up to 1k on a rather large machine (64 vCPUs, 512GB of RAM, 16 cores per socket, 2 sockets), in terms of TPS numbers coming from pgbench:
 message_size_b     |     16 |     64 |    256 |   1024
--------------------+--------+--------+--------+-------
 patch_4_clients    |  83830 |  82929 |  80478 |  73131
 patch_16_clients   | 267655 | 264973 | 250566 | 213985
 patch_64_clients   | 380423 | 378318 | 356907 | 294248
 patch_256_clients  | 360915 | 354436 | 326209 | 263664
 patch_512_clients  | 332654 | 321199 | 287521 | 240128
 patch_1024_clients | 288263 | 276614 | 258220 | 217063
 patch_2048_clients | 252280 | 243558 | 230062 | 192429
 patch_4096_clients | 212566 | 213654 | 205951 | 166955
 head_4_clients     |  83686 |  83766 |  81233 |  73749
 head_16_clients    | 266503 | 265546 | 249261 | 213645
 head_64_clients    | 366122 | 363462 | 341078 | 261707
 head_256_clients   | 132600 | 132573 | 134392 | 165799
 head_512_clients   | 118937 | 114332 | 116860 | 150672
 head_1024_clients  | 133546 | 115256 | 125236 | 151390
 head_2048_clients  | 137877 | 117802 | 120909 | 138165
 head_4096_clients  | 113440 | 115611 | 120635 | 114361
Bharath has been measuring similar improvements, where the limit of the WAL insertion lock begins to be felt when more than 256 concurrent clients are involved in this specific workload. An extra patch has been discussed to introduce a fast-exit path in LWLockUpdateVar() when there are no waiters; however, this does not affect the write-heavy workloads discussed here, as there are always waiters. This will be considered separately. Author: Bharath Rupireddy Reviewed-by: Nathan Bossart, Andres Freund, Michael Paquier Discussion: https://postgr.es/m/CALj2ACVF+6jLvqKe6xhDzCCkr=rfd6upaGc3477Pji1Ke9G7Bg@mail.gmail.com	2023-07-25 13:38:58 +09:00
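Here is a minimal C11 sketch of the three operations on an atomic insertingAt variable: update via exchange, release-and-clear via exchange, and a waiter's untorn read. `InsertLock` and all function names are hypothetical, and the lock acquisition and waiter wakeup are elided. C11 atomics stand in for pg_atomic_uint64. Like PostgreSQL's fallback, a C11 implementation without native 64-bit atomics may use a lock internally. The loads here are sequentially consistent for simplicity, which is stronger than a plain pg_atomic_read_u64().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Hypothetical stand-in for one WAL insertion lock and its insertingAt variable. */
typedef struct
{
    atomic_bool         held;
    _Atomic XLogRecPtr  insertingAt;   /* 0 while nothing is being inserted */
} InsertLock;

static void insert_lock_init(InsertLock *lk)
{
    atomic_init(&lk->held, false);
    atomic_init(&lk->insertingAt, 0);
}

static void insert_lock_acquire(InsertLock *lk)
{
    atomic_store(&lk->held, true);     /* real lock acquisition elided */
}

/* Inserter: publish progress. A seq_cst exchange is a full barrier; waking waiters would follow. */
static void update_var(InsertLock *lk, XLogRecPtr upto)
{
    (void) atomic_exchange(&lk->insertingAt, upto);
}

/* Inserter on release: reset the variable, then release the lock. */
static void release_clear_var(InsertLock *lk)
{
    (void) atomic_exchange(&lk->insertingAt, 0);
    atomic_store(&lk->held, false);
}

/*
 * Waiter: must it keep waiting? One untorn 64-bit load replaces taking the
 * wait list lock just to read the variable.
 */
static bool conflicts_with_var(InsertLock *lk, XLogRecPtr oldval, XLogRecPtr *newval)
{
    XLogRecPtr cur;

    if (!atomic_load(&lk->held))
        return false;                  /* lock free: the insertion has finished */
    cur = atomic_load(&lk->insertingAt);
    *newval = cur;
    return cur == oldval;              /* no progress since the last look */
}
```

A waiter would loop, calling `conflicts_with_var()` with the last value it saw and sleeping while it returns true, until the reported position is far enough or the lock is released.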
Andres Freund	f3bc519288	Fix off-by-one in LimitAdditionalPins() Due to the bug LimitAdditionalPins() could return 0, violating LimitAdditionalPins()'s API ("One additional pin is always allowed"). This could be hit when setting shared_buffers very low and using a fair amount of concurrency. This bug was introduced in `31966b151e`. Author: "Anton A. Melnikov" <aamelnikov@inbox.ru> Reported-by: "Anton A. Melnikov" <aamelnikov@inbox.ru> Reported-by: Victoria Shepard Discussion: https://postgr.es/m/ae46f2fb-5586-3de0-b54b-1bb0f6410ebd@inbox.ru Backpatch: 16-	2023-07-24 19:07:52 -07:00
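Here is a minimal C sketch of a clamp that keeps the "one additional pin is always allowed" guarantee. The function name and its inputs are hypothetical simplifications of LimitAdditionalPins(), whose budget is derived from shared_buffers and the number of backends. What the sketch shows is the boundary. If the fallback to 1 only triggers for negative budgets (`< 0`), an exact zero slips through and the function returns 0, which is the symptom described above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical simplification: each backend gets nbuffers / nbackends pins,
 * minus the pins it is estimated to hold already.
 */
static void limit_additional_pins(uint32_t *additional_pins,
                                  int nbuffers, int nbackends, int already_pinned)
{
    int max_proportional_pins;

    if (*additional_pins <= 1)
        return;                        /* one pin is always allowed */

    max_proportional_pins = nbuffers / nbackends - already_pinned;

    /* "<= 0", not "< 0": a budget of exactly 0 must also become 1. */
    if (max_proportional_pins <= 0)
        max_proportional_pins = 1;

    if (*additional_pins > (uint32_t) max_proportional_pins)
        *additional_pins = (uint32_t) max_proportional_pins;
}
```

With a tiny shared_buffers and many backends, `nbuffers / nbackends` can easily equal the pins already held, which is exactly the zero case.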
Thomas Munro	d0c28601ef	Remove wal_sync_method=fsync_writethrough on Windows. The "fsync" level already flushes drive write caches on Windows (as does "fdatasync"), so it only confuses matters to have an apparently higher level that isn't actually different at all. That leaves "fsync_writethrough" only for macOS, where it actually does something different. Reviewed-by: Magnus Hagander <magnus@hagander.net> Discussion: https://postgr.es/m/CA%2BhUKGJ2CG2SouPv2mca2WCTOJxYumvBARRcKPraFMB6GSEMcA%40mail.gmail.com	2023-07-14 12:30:13 +12:00


2586 Commits