postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2024-07-16 09:01:09 +02:00

Author	SHA1	Message	Date
Michael Paquier	b6f1cca9ba	Remove unnecessary break in pg_logical_replication_slot_advance() pg_logical_replication_slot_advance() included a break condition to stop when a targeted LSN is reached, when processing a series of WAL records with XLogReadRecord(). Since `38a957316d`, it matched with the check of its main while loop. This condition saved from an extra CFI check, actually pointless, so let's remove this condition and simplify the code. In passing, fix an incorrect comment. Author: Bharath Rupireddy Reviewed-by: Tom Lane, Gurjeet Singh Discussion: https://postgr.es/m/CALj2ACWfGDLQ2cy7ZKwxnJqbDkO6Yvqqrqxne5ZN4HYm=PRTGg@mail.gmail.com	2023-10-23 10:20:30 +09:00
Thomas Munro	dab889d60b	Fix min_dynamic_shared_memory on Windows. When min_dynamic_shared_memory is set above 0, we try to find space in a pre-allocated region of the main shared memory area instead of calling dsm_impl_XXX() routines to allocate more. The dsm_pin_segment() and dsm_unpin_segment() routines had a bug: they called dsm_impl_XXX() routines even for main region segments. Nobody noticed before now because those routines do nothing on Unix, but on Windows they'd fail while attempting to duplicate an invalid Windows HANDLE. Add the missing gating. Back-patch to 14, where commit `84b1c63a` added this feature. Fixes pgsql-bugs bug #18165. Reported-by: Maxime Boyer <maxime.boyer@cra-arc.gc.ca> Tested-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/18165-bf4f525cea6e51de%40postgresql.org	2023-10-22 10:04:55 +13:00
Tom Lane	2d870b4aef	Allow ALTER SYSTEM to set unrecognized custom GUCs. Previously, ALTER SYSTEM failed if the target GUC wasn't present in the session's GUC hashtable. That is a reasonable behavior for core (single-part) GUC names, and for custom GUCs for which we have loaded an extension that's reserved the prefix. But it's unnecessarily restrictive otherwise, and it also causes inconsistent behavior: you can "ALTER SYSTEM SET foo.bar" only if you did "SET foo.bar" earlier in the session. That's fairly silly. Hence, refactor things so that we can execute ALTER SYSTEM even if the variable doesn't have a GUC hashtable entry, as long as the name meets the custom-variable naming requirements and does not have a reserved prefix. (It's safe to do this even if the variable belongs to an extension we currently don't have loaded. A bad value will at worst cause a WARNING when the extension does get loaded.) Also, adjust GRANT ON PARAMETER to have the same opinions about whether to allow an unrecognized GUC name, and to throw the same errors if not (it previously used a one-size-fits-all message for several distinguishable conditions). By default, only a superuser will be allowed to do ALTER SYSTEM SET on an unrecognized name, but it's possible to GRANT the ability to do it. Patch by me, pursuant to a documentation complaint from Gavin Panella. Arguably this is a bug fix, but given the lack of other complaints I'll refrain from back-patching. Discussion: https://postgr.es/m/2617358.1697501956@sss.pgh.pa.us Discussion: https://postgr.es/m/169746329791.169914.16613647309012285391@wrigleys.postgresql.org	2023-10-21 13:35:19 -04:00
Alvaro Herrera	36a14afc07	Make some error strings more generic It's undesirable to have SQL commands or configuration options in a translatable error string, so take some of these out.	2023-10-20 22:52:15 +02:00
Tom Lane	2b5154beab	Extend ALTER OPERATOR to allow setting more optimization attributes. Allow the COMMUTATOR, NEGATOR, MERGES, and HASHES attributes to be set by ALTER OPERATOR. However, we don't allow COMMUTATOR/NEGATOR to be changed once set, nor allow the MERGES/HASHES flags to be unset once set. Changes like that might invalidate plans already made, and dealing with the consequences seems like more trouble than it's worth. The main use-case we foresee for this is to allow addition of missed properties in extension update scripts, such as extending an existing operator to support hashing. So only transitions from not-set to set states seem very useful. This patch also causes us to reject some incorrect cases that formerly resulted in inconsistent catalog state, such as trying to set the commutator of an operator to be some other operator that already has a (different) commutator. While at it, move the InvokeObjectPostCreateHook call for CREATE OPERATOR to not occur until after we've fixed up commutator or negator links as needed. The previous ordering could only be justified by thinking of the OperatorUpd call as a kind of ALTER OPERATOR step; but we don't call InvokeObjectPostAlterHook therein. It seems better to let the hook see the final state of the operator object. In the documentation, move the discussion of how to establish commutator pairs from xoper.sgml to the CREATE OPERATOR ref page. Tommy Pavlicek, reviewed and editorialized a bit by me Discussion: https://postgr.es/m/CAEhP-W-vGVzf4udhR5M8Bdv88UYnPrhoSkj3ieR3QNrsGQoqdg@mail.gmail.com	2023-10-20 12:28:46 -04:00
Robert Haas	afd12774ae	During online checkpoints, insert XLOG_CHECKPOINT_REDO at redo point. This allows tools that read the WAL sequentially to identify (possible) redo points when they're reached, rather than only being able to detect them in retrospect when XLOG_CHECKPOINT_ONLINE is found, possibly much later in the WAL stream. There are other possible applications as well; see the discussion links below. Any redo location that precedes the checkpoint location should now point to an XLOG_CHECKPOINT_REDO record, so add a cross-check to verify this. While adjusting the code in CreateCheckPoint() for this patch, I made it call WALInsertLockAcquireExclusive a bit later than before, since there appears to be no need for it to be held while checking whether the system is idle, whether this is an end-of-recovery checkpoint, or what the current timeline is. Bump XLOG_PAGE_MAGIC. Patch by me, based in part on earlier work from Dilip Kumar. Review by Dilip Kumar, Amit Kapila, Andres Freund, and Michael Paquier. Discussion: http://postgr.es/m/CA+TgmoYy-Vc6G9QKcAKNksCa29cv__czr+N9X_QCxEfQVpp_8w@mail.gmail.com Discussion: http://postgr.es/m/20230614194717.jyuw3okxup4cvtbt%40awork3.anarazel.de Discussion: http://postgr.es/m/CA+hUKG+b2ego8=YNW2Ohe9QmSiReh1-ogrv8V_WZpJTqP3O+2w@mail.gmail.com	2023-10-19 14:47:29 -04:00
Tom Lane	8483a54b7d	Doc: modernize comment for boolin(). Most of the behavior described by this comment was moved to parse_bool_with_len() some time ago. Move what's still valuable there too, and drop the rest. Peter Smith Discussion: https://postgr.es/m/CAHut+PtMJURKp=U8Z=Ktp0zV40sEb1f-iEk9FvY2GQe+5ZBnwg@mail.gmail.com	2023-10-19 11:31:05 -04:00
Michael Paquier	295c36c0c1	Add local_blk_{read\|write}_time I/O timing statistics for local blocks There was no I/O timing statistics for counting read and write timings on local blocks, contrary to the counterparts for temp and shared blocks. This information is available when track_io_timing is enabled. The output of EXPLAIN is updated to show this information. An update of pg_stat_statements is planned next. Author: Nazir Bilal Yavuz Reviewed-by: Robert Haas, Melanie Plageman Discussion: https://postgr.es/m/CAN55FZ19Ss279mZuqGbuUNxka0iPbLgYuOQXqAKewrjNrp27VA@mail.gmail.com	2023-10-19 13:39:38 +09:00
Michael Paquier	13d00729d4	Rename I/O timing statistics columns to shared_blk_{read\|write}_time These two counters, defined in BufferUsage to track respectively the time spent while reading and writing blocks have historically only tracked data related to shared buffers, when track_io_timing is enabled. An upcoming patch to add specific counters for local buffers will take advantage of this rename as it has come up that no data is currently tracked for local buffers, and tracking local and shared buffers using the same fields would be inconsistent with the treatment done for temp buffers. Renaming the existing fields clarifies what the block type of each stats field is. pg_stat_statement is updated to reflect the rename. No extension version bump is required as `5a3423ad8e` has done one, affecting v17~. Author: Nazir Bilal Yavuz Reviewed-by: Robert Haas, Melanie Plageman Discussion: https://postgr.es/m/CAN55FZ19Ss279mZuqGbuUNxka0iPbLgYuOQXqAKewrjNrp27VA@mail.gmail.com	2023-10-19 11:26:40 +09:00
Thomas Munro	76200e5ee4	jit: Changes for LLVM 17. Changes required by https://llvm.org/docs/NewPassManager.html. Back-patch to 12, leaving the final release of 11 unchanged, consistent with earlier decision not to back-patch LLVM 16 support either. Author: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKG%2BWXznXCyTgCADd%3DHWkP9Qksa6chd7L%3DGCnZo-MBgg9Lg%40mail.gmail.com	2023-10-19 05:13:23 +13:00
Thomas Munro	f90b4a846b	jit: Supply LLVMGlobalGetValueType() for LLVM < 8. Commit `37d5babb` used this C API function while adding support for LLVM 16 and opaque pointers, but it's not available in LLVM 7 and older. Provide it in our own llvmjit_wrap.cpp. It just calls a C++ function that pre-dates LLVM 3.9, our minimum target. Back-patch to 12, like `37d5babb`. Discussion: https://postgr.es/m/CA%2BhUKGKnLnJnWrkr%3D4mSGhE5FuTK55FY15uULR7%3Dzzc%3DwX4Nqw%40mail.gmail.com	2023-10-19 03:01:55 +13:00
Thomas Munro	37d5babb5c	jit: Support opaque pointers in LLVM 16. Remove use of LLVMGetElementType() and provide the type of all pointers to LLVMBuildXXX() functions when emitting IR, as required by modern LLVM versions[1]. * For LLVM <= 14, we'll still use the old LLVMBuildXXX() functions. * For LLVM == 15, we'll continue to do the same, explicitly opting out of opaque pointer mode. * For LLVM >= 16, we'll use the new LLVMBuildXXX2() functions that take the extra type argument. The difference is hidden behind some new IR emitting wrapper functions l_load(), l_gep(), l_call() etc. The change is mostly mechanical, except that at each site the correct type had to be provided. In some places we needed to do some extra work to get functions types, including some new wrappers for C++ APIs that are not yet exposed by in LLVM's C API, and some new "example" functions in llvmjit_types.c because it's no longer possible to start from the function pointer type and ask for the function type. Back-patch to 12, because it's a little tricker in 11 and we agreed not to put the latest LLVM support into the upcoming final release of 11. [1] https://llvm.org/docs/OpaquePointers.html Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKGKNX_%3Df%2B1C4r06WETKTq0G4Z_7q4L4Fxn5WWpMycDj9Fw%40mail.gmail.com	2023-10-18 22:47:23 +13:00
Michael Paquier	d17ffc734d	Count write times when extending relation files for shared buffers Relation files extended by multiple blocks at a time have been counting the number of blocks written, but forgot to increment the write time in this case, as single-block write and relation extension are treated as two different I/O operations in the shared stats: IOOP_EXTEND vs IOOP_WRITE. In this case IOOP_EXTEND was forgotten for normal (non-temporary) relations, still the number of blocks written was incremented according to the relation extend done. Write times are tracked when track_io_timing is enabled, which is not the case by default. Author: Nazir Bilal Yavuz Reviewed-by: Robert Haas, Melanie Plageman Discussion: https://postgr.es/m/CAN55FZ19Ss279mZuqGbuUNxka0iPbLgYuOQXqAKewrjNrp27VA@mail.gmail.com Backpatch-through: 16	2023-10-18 14:54:33 +09:00
Michael Paquier	173b56f1ef	Add flush option to pg_logical_emit_message() Since its introduction, LogLogicalMessage() (via the SQL interface pg_logical_emit_message()) has never included a call to XLogFlush(), causing it to potentially lose messages on a crash when used in non-transactional mode. This has come up to me as a problem while playing with ideas to design a test suite for what has become 039_end_of_wal.pl introduced in `bae868caf2` by Thomas Munro, because there are no direct ways to force a WAL flush via SQL. The default is false, to not flush messages and influence existing use-cases where this function could be used. If set to true, the message emitted is flushed before returning back to the caller, making the message durable on crash. This new option has no effect when using pg_logical_emit_message() in transactional mode, as the record's flush is guaranteed by the WAL record generated by the transaction committed. Two queries of test_decoding are tweaked to cover the new code path for the flush. Bump catalog version. Author: Michael Paquier Reviewed-by: Andres Freund, Amit Kapila, Fujii Masao, Tung Nguyen, Tomas Vondra Discussion: https://postgr.es/m/ZNsdThSe2qgsfs7R@paquier.xyz	2023-10-18 11:24:59 +09:00
Tom Lane	19fa977311	Dodge a compiler bug affecting timetz_zone/timetz_izone. Use a modulo operator instead of implementing the same behavior with a loop. The loop solution is doubtless microscopically faster for the typical case of only wrapping into the very next day, but maybe not so much for large interval values. In any case, timetz is such a backwater that it's doubtful anybody would notice any performance change anyway. This avoids a compiler bug occurring in AIX's xlc, even in pretty late-model revisions. We did not have test coverage for the case where the initial result->time value is negative, so add that. For the moment, install this only in HEAD. My plan is to back-patch the test case, and then the code change assuming that buildfarm testing proves the bug occurs in the back branches. (That seems pretty likely, but let's find out for sure.) Per buildfarm results from commits `97957fdba` and `2f0472030`. Thanks to Michael Paquier for the idea to use a modulo operation to replace the faulty loop. Discussion: https://postgr.es/m/CA+hUKGK=DOC+hE-62FKfZy=Ybt5uLkrg3zCZD-jFykM-iPn8yw@mail.gmail.com	2023-10-17 13:10:35 -04:00
Nathan Bossart	97550c0711	Avoid calling proc_exit() in processes forked by system(). The SIGTERM handler for the startup process immediately calls proc_exit() for the duration of the restore_command, i.e., a call to system(). This system() call forks a new process to execute the shell command, and this child process inherits the parent's signal handlers. If both the parent and child processes receive SIGTERM, both will attempt to call proc_exit(). This can end badly. For example, both processes will try to remove themselves from the PGPROC shared array. To fix this problem, this commit adds a check in StartupProcShutdownHandler() to see whether MyProcPid == getpid(). If they match, this is the parent process, and we can proc_exit() like before. If they do not match, this is a child process, and we just emit a message to STDERR (in a signal safe manner) and _exit(), thereby skipping any problematic exit callbacks. This commit also adds checks in proc_exit(), ProcKill(), and AuxiliaryProcKill() that verify they are not being called within such child processes. Suggested-by: Andres Freund Reviewed-by: Thomas Munro, Andres Freund Discussion: https://postgr.es/m/Y9nGDSgIm83FHcad%40paquier.xyz Discussion: https://postgr.es/m/20230223231503.GA743455%40nathanxps13 Backpatch-through: 11	2023-10-17 10:41:48 -05:00
Robert Haas	2406c4e34c	Reword messages about impending (M)XID exhaustion. First, we shouldn't recommend switching to single-user mode, because that's terrible advice. Especially on newer versions where VACUUM will enter emergency mode when nearing (M)XID exhaustion, it's perfectly fine to just VACUUM in multi-user mode. Doing it that way is less disruptive and avoids disabling the safeguards that prevent actual wraparound, so recommend that instead. Second, be more precise about what is going to happen (when we're nearing the limits) or what is happening (when we actually hit them). The database doesn't shut down, nor does it refuse all commands. It refuses commands that assign whichever of XIDs and MXIDs are nearly exhausted. No back-patch. The existing hint that advises going to single-user mode is sufficiently awful advice that removing it or changing it might be justifiable even though we normally avoid changing user-facing messages in back-branches, but I (rhaas) felt that it was better to be more conservative and limit this fix to master only. Aside from the usual risk of breaking translations, people might be used to the existing message, or even have monitoring scripts that look for it. Alexander Alekseev, John Naylor, Robert Haas, reviewed at various times by Peter Geoghegan, Hannu Krosing, and Andres Freund. Discussion: http://postgr.es/m/CA+TgmoZBg95FiR9wVQPAXpGPRkacSt2okVge+PKPPFppN7sfnQ@mail.gmail.com	2023-10-17 10:34:21 -04:00
Robert Haas	a1a5da8cb7	Talk about assigning, rather than generating, new MultiXactIds. The word "assign" is used in various places internally to describe what GetNewMultiXactId does, but the user-facing messages have previously said "generate". For consistency, standardize on "assign," which seems (at least to me) to be slightly clearer. Discussion: http://postgr.es/m/CA+TgmoaoE1_i3=4-7GCTtKLVZVQ2Gh6qESW2VG1OprtycxOHMA@mail.gmail.com	2023-10-17 10:23:31 -04:00
Michael Paquier	d6b0c2bcb1	Improve truncation of pg_serial/, removing "apparent wraparound" LOGs It is possible that the tail XID of pg_serial/ gets ahead of its head XID, which would cause the truncation of pg_serial/ done during checkpoints to show up as a "wraparound" LOG in SimpleLruTruncate(), which is confusing. This also wastes a bit of disk space until the head page is reclaimed again. CheckPointPredicate() is changed so as the cutoff page for the truncation is switched to the head page if the tail XID has advanced beyond the head XID, rather than the tail page. This prevents the confusing LOG message about a wraparound while allowing some truncation to be done to cut in disk space. This could be considered as a bug fix, but the original behavior is harmless as well, resulting only in disk space temporarily wasted, so no backpatch is done. Author: Sami Imseih Reviewed-by: Heikki Linnakangas, Michael Paquier Discussion: https://postgr.es/m/755E19CA-D02C-4A4C-80D3-74F775410C48@amazon.com	2023-10-17 14:36:21 +09:00
Amit Kapila	79243de13f	Restart the apply worker if the privileges have been revoked. Restart the apply worker if the subscription owner's superuser privileges have been revoked. This is required so that the subscription connection string gets revalidated and use the password option to connect to the publisher for non-superusers, if required. Author: Vignesh C Reviewed-by: Amit Kapila Discussion: http://postgr.es/m/CALDaNm2Dxmhq08nr4P6G+24QvdBo_GAVyZ_Q1TcGYK+8NHs9xw@mail.gmail.com	2023-10-17 08:41:44 +05:30
Tom Lane	54b208f909	Ensure we have a snapshot while dropping ON COMMIT DROP temp tables. Dropping a temp table could entail TOAST table access to clean out toasted catalog entries, such as large pg_constraint.conbin strings for complex CHECK constraints. If we did that via ON COMMIT DROP, we triggered the assertion in init_toast_snapshot(), because there was no provision for setting up a snapshot for the drop actions. Fix that. (I assume here that the adjacent truncation actions for ON COMMIT DELETE ROWS don't have a similar problem: it doesn't seem like nontransactional truncations would need to touch any toasted fields. If that proves wrong, we could refactor a bit to have the same snapshot acquisition cover that too.) The test case added here does not fail before v15, because that assertion was added in `277692220` which was not back-patched. However, the race condition the assertion warns of surely exists further back, so back-patch to all supported branches. Per report from Richard Guo. Discussion: https://postgr.es/m/CAMbWs4-x26=_QxxgdJyNbiCDzvtr2WV5ZDso_v-CukKEe6cBZw@mail.gmail.com	2023-10-16 14:06:14 -04:00
Nathan Bossart	8fb13dd6ab	Move extra code out of the Pre/PostRestoreCommand() section. If SIGTERM is received within this section, the startup process will immediately proc_exit() in the signal handler, so it is inadvisable to include any more code than is required there (as such code is unlikely to be compatible with doing proc_exit() in a signal handler). This commit moves the code recently added to this section (see `1b06d7bac9` and `7fed801135`) to outside of the section. This ensures that the startup process only calls proc_exit() in its SIGTERM handler for the duration of the system() call, which is how this code worked from v8.4 to v14. Reported-by: Michael Paquier, Thomas Munro Analyzed-by: Andres Freund Suggested-by: Tom Lane Reviewed-by: Michael Paquier, Robert Haas, Thomas Munro, Andres Freund Discussion: https://postgr.es/m/Y9nGDSgIm83FHcad%40paquier.xyz Discussion: https://postgr.es/m/20230223231503.GA743455%40nathanxps13 Backpatch-through: 15	2023-10-16 12:41:55 -05:00
Michael Paquier	e9718b4bd3	Fix code indentation violations in `e83d1b0c40` koel has not reported this one yet, I have just bumped on it while looking at a different patch.	2023-10-16 09:36:31 +09:00
Thomas Munro	01529c7040	Fix comment from commit `22655aa231`. Per automated complaint from BF animal koel this needed to be re-indented, but there was also a typo. Back-patch to 16.	2023-10-16 13:32:41 +13:00
Alexander Korotkov	e83d1b0c40	Add support event triggers on authenticated login This commit introduces trigger on login event, allowing to fire some actions right on the user connection. This can be useful for logging or connection check purposes as well as for some personalization of environment. Usage details are described in the documentation included, but shortly usage is the same as for other triggers: create function returning event_trigger and then create event trigger on login event. In order to prevent the connection time overhead when there are no triggers the commit introduces pg_database.dathasloginevt flag, which indicates database has active login triggers. This flag is set by CREATE/ALTER EVENT TRIGGER command, and unset at connection time when no active triggers found. Author: Konstantin Knizhnik, Mikhail Gribkov Discussion: https://postgr.es/m/0d46d29f-4558-3af9-9c85-7774e14a7709%40postgrespro.ru Reviewed-by: Pavel Stehule, Takayuki Tsunakawa, Greg Nancarrow, Ivan Panchenko Reviewed-by: Daniel Gustafsson, Teodor Sigaev, Robert Haas, Andres Freund Reviewed-by: Tom Lane, Andrey Sokolov, Zhihong Yu, Sergey Shinderuk Reviewed-by: Gregory Stark, Nikita Malakhov, Ted Yu	2023-10-16 03:18:22 +03:00
Thomas Munro	c558e6fd92	Acquire ControlFileLock in relevant SQL functions. Commit `dc7d70ea` added functions that read the control file, but didn't acquire ControlFileLock. With unlucky timing, file systems that have weak interlocking like ext4 and ntfs could expose partially overwritten contents, and the checksum would fail. Back-patch to all supported releases. Reviewed-by: David Steele <david@pgmasters.net> Reviewed-by: Anton A. Melnikov <aamelnikov@inbox.ru> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de	2023-10-16 10:43:47 +13:00
Tom Lane	fcdd6689d0	Harden xxx_is_visible() functions against concurrent object drops. For the same reasons given in commit `403ac226d`, adjust these functions to not assume that checking SearchSysCacheExists can guarantee success of a later fetch. This follows the same internal API choices made in the earlier commit: add a function XXXExt(oid, is_missing) and use that to eliminate the need for a separate existence check. The changes are very straightforward, though tedious. For the moment I just made the new functions static in namespace.c, but we could export them if a need emerges. Per bug #18014 from Alexander Lakhin. Given the lack of hard evidence that there's a bug in non-debug builds, I'm content to fix this only in HEAD. Discussion: https://postgr.es/m/18014-28c81cb79d44295d@postgresql.org	2023-10-14 16:13:11 -04:00
Tom Lane	403ac226dd	Harden has_xxx_privilege() functions against concurrent object drops. The versions of these functions that accept object OIDs are supposed to return NULL, rather than failing, if the target object has been dropped. This makes it safe(r) to use them in queries that scan catalogs, since the functions will be applied to objects that are visible in the query's snapshot but might now be gone according to the catalog snapshot. In most cases we implemented this by doing a SearchSysCacheExists test and assuming that if that succeeds, we can safely invoke the appropriate aclchk.c function, which will immediately re-fetch the same syscache entry. It was argued that if the existence test succeeds then the followup fetch must succeed as well, for lack of any intervening AcceptInvalidationMessages call. Alexander Lakhin demonstrated that this is not so when CATCACHE_FORCE_RELEASE is enabled: the syscache entry will be forcibly dropped at the end of SearchSysCacheExists, and then it is possible for the catalog snapshot to get advanced while re-fetching the entry. Alexander's test case requires the operation to happen inside a parallel worker, but that seems incidental to the fundamental problem. What remains obscure is whether there is a way for this to happen in a non-debug build. Nonetheless, CATCACHE_FORCE_RELEASE is a very useful test methodology, so we'd better make the code safe for it. After some discussion we concluded that the most future-proof fix is to give up the assumption that checking SearchSysCacheExists can guarantee success of a later fetch. At best that assumption leads to fragile code --- for example, has_type_privilege appears broken for array types even if you believe the assumption holds. And it's not even particularly efficient. There had already been some work towards extending the aclchk.c APIs to include "is_missing" output flags, so this patch extends that work to cover all the aclchk.c functions that are used by the has_xxx_privilege() functions. (This allows getting rid of some ad-hoc decisions about not throwing errors in certain places in aclchk.c.) In passing, this fixes the has_sequence_privilege() functions to provide the same guarantees as their cousins: for some reason the SearchSysCacheExists tests never got added to those. There is more work to do to remove the unsafe coding pattern with SearchSysCacheExists in other places, but this is a pretty self-contained patch so I'll commit it separately. Per bug #18014 from Alexander Lakhin. Given the lack of hard evidence that there's a bug in non-debug builds, I'm content to fix this only in HEAD. (Perhaps we should clean up the has_sequence_privilege() oversight in the back branches, but in the absence of field complaints I'm not too excited about that either.) Discussion: https://postgr.es/m/18014-28c81cb79d44295d@postgresql.org	2023-10-14 14:49:50 -04:00
Andres Freund	22655aa231	Fix bulk table extension when copying into multiple partitions When COPYing into a partitioned table that does now permit the use of table_multi_insert(), we could error out with ERROR: could not read block NN in file "base/...": read only 0 of 8192 bytes because BulkInsertState->next_free was not reset between partitions. This problem occurred only when not able to use table_multi_insert(), as a dedicated BulkInsertState for each partition is used in that case. The bug was introduced in `00d1e02be2`, but it was hard to hit at that point, as commonly bulk relation extension is not used when not using table_multi_insert(). It became more likely after `82a4edabd2`, which expanded the use of bulk extension. To fix the bug, reset the bulk relation extension state in BulkInsertState in ReleaseBulkInsertStatePin(). That was added (in `b1ecb9b3fc`) to tackle a very similar issue. Obviously the name is not quite correct, but there might be external callers, and bulk insert state needs to be reset in precisely in the situations that ReleaseBulkInsertStatePin() already needed to be called. Medium term the better fix likely is to disallow reusing BulkInsertState across relations. Add a test that, without the fix, reproduces #18130 in most configurations. The test also catches the problem fixed in `b1ecb9b3fc` when run with small shared_buffers. Reported-by: Ivan Kolombet <enderstd@gmail.com> Analyzed-by: Tom Lane <tgl@sss.pgh.pa.us> Analyzed-by: Andres Freund <andres@anarazel.de> Bug: #18130 Discussion: https://postgr.es/m/18130-7a86a7356a75209d%40postgresql.org Discussion: https://postgr.es/m/257696.1695670946%40sss.pgh.pa.us Backpatch: 16-	2023-10-13 19:16:44 -07:00
Nathan Bossart	8d140c5822	Improve the naming in wal_sync_method code. * sync_method is renamed to wal_sync_method. * sync_method_options[] is renamed to wal_sync_method_options[]. * assign_xlog_sync_method() is renamed to assign_wal_sync_method(). * The names of the available synchronization methods are now prefixed with "WAL_SYNC_METHOD_" and have been moved into a WalSyncMethod enum. * PLATFORM_DEFAULT_SYNC_METHOD is renamed to PLATFORM_DEFAULT_WAL_SYNC_METHOD, and DEFAULT_SYNC_METHOD is renamed to DEFAULT_WAL_SYNC_METHOD. These more descriptive names help distinguish the code for wal_sync_method from the code for DataDirSyncMethod (e.g., the recovery_init_sync_method configuration parameter and the --sync-method option provided by several frontend utilities). This change also prevents name collisions between the aforementioned sets of code. Since this only improves the naming of internal identifiers, there should be no behavior change. Author: Maxim Orlov Discussion: https://postgr.es/m/CACG%3DezbL1gwE7_K7sr9uqaCGkWhmvRTcTEnm3%2BX1xsRNwbXULQ%40mail.gmail.com	2023-10-13 15:16:45 -05:00
Michael Paquier	97957fdbaa	Add support for AT LOCAL When converting a timestamp to/from with/without time zone, the SQL Standard specifies an AT LOCAL variant of AT TIME ZONE which uses the session's time zone. This includes three system functions able to do the work in the same way as the existing flavors for AT TIME ZONE, except that these need to be marked as stable as they depend on the session's TimeZone GUC. Bump catalog version. Author: Vik Fearing Reviewed-by: Laurenz Albe, Cary Huang, Michael Paquier Discussion: https://postgr.es/m/8e25dec4-5667-c1a5-6581-167d710c2182@postgresfriends.org	2023-10-13 13:01:37 +09:00
Thomas Munro	0013ba290b	Add wait events for checkpoint delay mechanism. When MyProc->delayChkptFlags is set to temporarily block phase transitions in a concurrent checkpoint, the checkpointer enters a sleep-poll loop to wait for the flag to be cleared. We should show that as a wait event in the pg_stat_activity view. Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CA%2BhUKGL7Whi8iwKbzkbn_1fixH3Yy8aAPz7mfq6Hpj7FeJrKMg%40mail.gmail.com	2023-10-13 16:43:22 +13:00
Robert Haas	df9a3d4e99	Unify two isLogSwitch tests in XLogInsertRecord. An upcoming patch wants to introduce an additional special case in this function. To keep that as cheap as possible, minimize the amount of branching that we do based on whether this is an XLOG_SWITCH record. Additionally, and also in the interest of keeping the overhead of special-case code paths as low as possible, apply likely() to the non-XLOG_SWITCH case, since only a very tiny fraction of WAL records will be XLOG_SWITCH records. Patch by me, reviewed by Dilip Kumar, Amit Kapila, Andres Freund, and Michael Paquier. Discussion: http://postgr.es/m/CA+TgmoYy-Vc6G9QKcAKNksCa29cv__czr+N9X_QCxEfQVpp_8w@mail.gmail.com	2023-10-12 13:48:21 -04:00
David Rowley	d9e46dfb78	Fix runtime partition pruning for HASH partitioned tables This could only affect HASH partitioned tables with at least 2 partition key columns. If partition pruning was delayed until execution and the query contained an IS NULL qual on one of the partitioned keys, and some subsequent partitioned key was being compared to a non-Const, then this could result in a crash due to the incorrect keyno being used to calculate the stateidx for the expression evaluation code. Here we fix this by properly skipping partitioned keys which have a nullkey set. Effectively, this must be the same as what's going on inside perform_pruning_base_step(). Sergei Glukhov also provided a patch, but that's not what's being used here. Reported-by: Sergei Glukhov Reviewed-by: tender wang, Sergei Glukhov Discussion: https://postgr.es/m/d05b26fa-af54-27e1-f693-6c31590802fa@postgrespro.ru Backpatch-through: 11, where runtime partition pruning was added.	2023-10-13 01:12:31 +13:00
David Rowley	f0c409d9c7	Fix incorrect step generation in HASH partition pruning get_steps_using_prefix_recurse() incorrectly assumed that it could stop recursive processing of the 'prefix' list when cur_keyno was one before the step_lastkeyno. Since hash partition pruning can prune using IS NULL quals, and these IS NULL quals are not present in the 'prefix' list, then that logic could cause more levels of recursion than what is needed and lead to there being no more items in the 'prefix' list to process. This would manifest itself as a crash in some code that expected the 'start' ListCell not to be NULL. Here we adjust the logic so that instead of stopping recursion at 1 key before the step_lastkeyno, we just look at the llast(prefix) item and ensure we only recursively process up until just before whichever the last key is. This effectively allows keys to be missing in the 'prefix' list. This change does mean that step_lastkeyno is no longer needed, so we remove that from the static functions. I also spent quite some time reading this code and testing it to try to convince myself that there are no other issues. That resulted in the irresistible temptation of rewriting some comments, many of which were just not true or inconcise. Reported-by: Sergei Glukhov Reviewed-by: Sergei Glukhov, tender wang Discussion: https://postgr.es/m/2f09ce72-315e-2a33-589a-8519ada8df61@postgrespro.ru Backpatch-through: 11, where partition pruning was introduced.	2023-10-12 19:50:38 +13:00
Michael Paquier	e7689190b3	Add option to bgworkers to allow the bypass of role login check This adds a new option called BGWORKER_BYPASS_ROLELOGINCHECK to the flags available to BackgroundWorkerInitializeConnection() and BackgroundWorkerInitializeConnectionByOid(). This gives the possibility to bgworkers to bypass the role login check, making possible the use of a role that has no login rights while not being a superuser. PostgresInit() gains a new flag called INIT_PG_OVERRIDE_ROLE_LOGIN, taking advantage of the refactoring done in `4800a5dfb4`. Regression tests are added to worker_spi to check the behavior of this new option with bgworkers. Author: Bertrand Drouvot Reviewed-by: Nathan Bossart, Michael Paquier, Bharath Rupireddy Discussion: https://postgr.es/m/bcc36259-7850-4882-97ef-d6b905d2fc51@gmail.com	2023-10-12 09:24:17 +09:00
Tom Lane	b6a77c6a6c	Reindent comment in GenericXLogFinish(). Restore pgindent cleanliness, per buildfarm member koel.	2023-10-11 17:14:31 -04:00
Tom Lane	5d8aa8bced	Fix missed optimization in relation_excluded_by_constraints(). In commit `3fc6e2d7f`, I (tgl) argued that we only need to check for a constant-FALSE restriction clause when there's exactly one restriction clause, on the grounds that const-folding would have thrown away anything ANDed with a Const FALSE. That's true just after const-folding has been applied, but subsequent processing such as equivalence class expansion could result in cases where a Const FALSE is ANDed with some other stuff. (Compare for instance joinrels.c's restriction_is_constant_false.) Hence, tweak this logic to check all the elements of the baserestrictinfo list, not just one; that's cheap enough to not be worth worrying about. There is one existing test case where this visibly improves the plan. There would not be any savings in runtime, but the planner effort and executor startup effort will be reduced, and anyway it's odd that we can detect related cases but not this one. Richard Guo (independently discovered by David Rowley) Discussion: https://postgr.es/m/CAMbWs4_x3-CnVVrCboS1LkEhB5V+W7sLSCabsRiG+n7+5_kqbg@mail.gmail.com	2023-10-11 12:51:38 -04:00
Heikki Linnakangas	16671ba6e7	Move canAcceptConnections check from ProcessStartupPacket to caller. The check is not about processing the startup packet, so the calling function seems like a more natural place. I'm also working on a patch that moves 'canAcceptConnections' out of the Port struct, and this makes that refactoring more convenient. Reviewed-by: Tristan Partin Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi	2023-10-11 14:06:38 +03:00
Michael Paquier	4800a5dfb4	Refactor InitPostgres() to use bitwise option flags InitPostgres() has been using a set of boolean arguments to control its behavior, and a patch under discussion was aiming at expanding it with a third one. In preparation for expanding this area, this commit switches all the current boolean arguments of this routine to a single bits32 argument instead. Two values are currently supported for the flags: - INIT_PG_LOAD_SESSION_LIBS to load [session\|local]_preload_libraries at startup. - INIT_PG_OVERRIDE_ALLOW_CONNS to allow connection to a database even if it has !datallowconn. This is used by bgworkers. Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/ZSTn66_BXRZCeaqS@paquier.xyz	2023-10-11 12:31:49 +09:00
Jeff Davis	ef74c7197c	Fix bug in GenericXLogFinish(). Mark the buffers dirty before writing WAL. Discussion: https://postgr.es/m/25104133-7df8-cae3-b9a2-1c0aaa1c094a@iki.fi Reviewed-by: Heikki Linnakangas Backpatch-through: 11	2023-10-10 11:01:13 -07:00
Tom Lane	14661ba1a7	Replace has_multiple_baserels() with a bitmap test on all_baserels. Since we added the PlannerInfo.all_baserels set, it's not really necessary to grovel over the rangetable to count baserels in the current query. So let's drop has_multiple_baserels() in favor of a bms_membership() test. This might be microscopically faster, but the main point is to remove some unnecessary code. Richard Guo Discussion: https://postgr.es/m/CAMbWs4_8RcSbbfs1ASZLrMuL0c0EQgXWcoLTQD8swBRY_pQQiA@mail.gmail.com	2023-10-10 13:08:29 -04:00
Peter Eisentraut	1d91d24d9a	Add const to values and nulls arguments This excludes any changes that would change the external AM APIs. Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Discussion: https://www.postgresql.org/message-id/flat/14c31f4a-0347-0805-dce8-93a9072c05a5%40eisentraut.org	2023-10-10 07:50:43 +02:00
David Rowley	fc4089f3c6	Fix possible crash in add_paths_to_append_rel() While working on `a8a968a82`, I failed to consider that cheapest_startup_path can be NULL when there is no non-parameterized path in the pathlist. This is well documented in set_cheapest(), I just failed to notice. Here we adjust the code to just check if the RelOptInfo has a cheapest_startup_path set before adding it to the startup_subpaths list. Reported-by: Richard Guo Author: Richard Guo Discussion: https://postgr.es/m/CAMbWs49w3t03V69XhdCuw+GDwivny4uQUxrkVp6Gejaspt0wMQ@mail.gmail.com	2023-10-10 16:50:03 +13:00
David Rowley	4f3b56eea2	Revert "Optimize various aggregate deserialization functions" This reverts commit `608fd198de`. On 2nd thoughts, the StringInfo API requires that strings are NUL terminated and pointing directly to the data in a bytea Datum isn't NUL terminated. Discussion: https://postgr.es/m/CAApHDvorfO3iBZ=xpiZvp3uHtJVLyFaPBSvcAhAq2HPLnaNSwQ@mail.gmail.com	2023-10-10 14:16:54 +13:00
Heikki Linnakangas	637109d13a	Rename StartBackgroundWorker() to BackgroundWorkerMain(). The comment claimed that it is "called from postmaster", but it is actually called in the child process, pretty early in the process initialization. I guess you could interpret "called from postmaster" to mean that, but it seems wrong to me. Rename the function to be consistent with other functions with similar role. Reviewed-by: Thomas Munro Discussion: https://www.postgresql.org/message-id/4f95c1fc-ad3c-7974-3a8c-6faa3931804c@iki.fi	2023-10-09 11:52:09 +03:00
Heikki Linnakangas	0bbafb5342	Allocate Backend structs in PostmasterContext. The child processes don't need them. By allocating them in PostmasterContext, the memory gets free'd and is made available for other stuff in the child processes. Reviewed-by: Thomas Munro Discussion: https://www.postgresql.org/message-id/4f95c1fc-ad3c-7974-3a8c-6faa3931804c@iki.fi	2023-10-09 11:29:39 +03:00
Heikki Linnakangas	1ca312686e	Clarify the checks in RegisterBackgroundWorker. In EXEC_BACKEND or single-user mode, we process shared_preload_libraries at postmaster startup as usual, but also at backend startup. When a library calls RegisterBackgroundWorker() when being loaded into a backend process, we go through the motions to add the worker to BackgroundWorkerList, even though that is a postmaster-private data structure. Make it return early when called in a backend process, without changing BackgroundWorkerList. You could argue that it was intentional: In non-EXEC_BACKEND mode, the backend processes inherit BackgroundWorkerList at fork(), so it does make some sense to initialize it to the same state in EXEC_BACKEND mode, too. It's clearly a postmaster-private structure, though, and all the functions that use it are clearly marked as "should only be called in postmaster". You could also argue that libraries should not call RegisterBackgroundWorker() during backend startup. It's too late to correctly register any static background workers at that stage. But it's a common pattern in extensions, and it doesn't seem worth the churn to require all extensions to change it. Another sloppiness was the exception for "internal" background workers. We checked that RegisterBackgroundWorker() was called during shared_preload_libraries processing, or the background worker was an internal one. That exception was made in commit `665d1fad99` to allow postmaster to register the logical apply launcher in ApplyLauncherRegister(). The way the check was written, it would not complain if you registered an internal background worker in a regular backend process. But it would complain if postmaster registered a background worker defined in a shared library, outside shared_preload_libraries processing. I think the correct rule is that you can only register static background workers in the postmaster process, and only before the bgworker shared memory array has been initialized. Check for that more directly. Reviewed-by: Thomas Munro Discussion: https://www.postgresql.org/message-id/4f95c1fc-ad3c-7974-3a8c-6faa3931804c@iki.fi	2023-10-09 11:29:33 +03:00
David Rowley	608fd198de	Optimize various aggregate deserialization functions The serialized representation of an internal aggregate state is a bytea value. In each deserial function, in order to "receive" the bytea value we appended it onto a short-lived StringInfoData using appendBinaryStringInfo. This was a little wasteful as it meant having to palloc memory, copy a (possibly long) series of bytes then later pfree that memory. Instead of going to this extra trouble, we can just fake up a StringInfoData and point the data directly at the bytea's payload. This should help increase the performance of internal aggregate deserialization. Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/CAApHDvr=e-YOigriSHHm324a40HPqcUhSp6pWWgjz5WwegR=cQ@mail.gmail.com	2023-10-09 17:25:16 +13:00
Amit Kapila	7cc2f59dd5	Remove duplicate words in docs and code comments. Additionally, add a missing "the" in a couple of places. Author: Vignesh C, Dagfinn Ilmari Mannsåker Discussion: http://postgr.es/m/CALDaNm28t+wWyPfuyqEaARS810Je=dRFkaPertaLAEJYY2cWYQ@mail.gmail.com	2023-10-09 09:18:47 +05:30
David Rowley	d8a295389b	Strip off ORDER BY/DISTINCT aggregate pathkeys in create_agg_path `1349d2790` added code to adjust the PlannerInfo.group_pathkeys so that ORDER BY / DISTINCT aggregate functions could obtain pre-sorted inputs to allow faster execution. That commit forgot to adjust the pathkeys in create_agg_path(). Some code in there assumed that it was always fine to make the AggPath's pathkeys the same as its subpath's. That seems to have been ok up until `1349d2790`, but since that commit adds pathkeys for columns which are within the aggregate function, those columns won't be available above the aggregate node. This can result in "could not find pathkey item to sort" during create_plan(). The fix here is to strip off the additional pathkeys added by adjust_group_pathkeys_for_groupagg(). It seems that the pathkeys here will only ever be group_pathkeys, so all we need to do is check if the length of the pathkey list is longer than the num_groupby_pathkeys and get rid of the additional ones only if we see extras. Reported-by: Justin Pryzby Reviewed-by: Richard Guo Discussion: https://postgr.es/m/ZQhYYRhUxpW3PSf9%40telsasoft.com Backpatch-through: 16, where `1349d2790` was introduced	2023-10-09 16:37:05 +13:00
David Rowley	77db132637	Remove debug_print_rel and replace usages with pprint Going by `c4a1933b4`, `b33ef397a` and `05893712c` (to name just a few), it seems that maintaining debug_print_rel() is often forgotten. In the case of `c4a1933b4`, it was several years before anyone noticed that a path type was not handled by debug_print_rel(). (debug_print_rel() is only compiled when building with OPTIMIZER_DEBUG). After a quick survey on the pgsql-hackers mailing list, nobody came forward to admit that they use OPTIMIZER_DEBUG. So to prevent any future maintenance neglect, let's just remove debug_print_rel() and have OPTIMIZER_DEBUG make use of pprint() instead (as suggested by Tom Lane). If anyone wants to come forward to claim they make use of OPTIMIZER_DEBUG in a way that they need debug_print_rel() then they have around 10 months remaining in the v17 cycle where we could revert this. If nobody comes forward in that time, then we can likely safely declare debug_print_rel() as not worth keeping. Discussion: https://postgr.es/m/CAApHDvoCdjo8Cu2zEZF4-AxWG-90S+pYXAnoDDa9J3xH-OrczQ@mail.gmail.com	2023-10-09 15:53:16 +13:00
Alexander Korotkov	82a7132f53	Fix another typo in `e0b1ee17dc` Reported-by: Richard Guo Discussion: https://postgr.es/m/CAMbWs4_kHMJDak75y1kBTirv-drS1-knT-7Mpg5LprAjqRJDVA%40mail.gmail.com	2023-10-07 20:36:47 +03:00
Alexander Korotkov	e8c334c47a	Fix typos in `e0b1ee17dc` Reported-by: Alexander Lakhin	2023-10-07 11:55:55 +03:00
Etsuro Fujita	aec684ff0f	Remove extra parenthesis from comment.	2023-10-06 18:30:00 +09:00
Alexander Korotkov	e0b1ee17dc	Skip checking of scan keys required for directional scan in B-tree Currently, B-tree code matches every scan key to every item on the page. Imagine the ordered B-tree scan for the query like this. SELECT * FROM tbl WHERE col > 'a' AND col < 'b' ORDER BY col; The (col > 'a') scan key will be always matched once we find the location to start the scan. The (col < 'b') scan key will match every item on the page as long as it matches the last item on the page. This patch implements prechecking of the scan keys required for directional scan on beginning of page scan. If precheck is successful we can skip this scan keys check for the items on the page. That could lead to significant acceleration especially if the comparison operator is expensive. Idea from patch by Konstantin Knizhnik. Discussion: https://postgr.es/m/079c3f8e-3371-abe2-e93c-fc8a0ae3f571%40garret.ru Reviewed-by: Peter Geoghegan, Pavel Borisov	2023-10-06 10:40:51 +03:00
Heikki Linnakangas	5da0a622e8	Fix crash on syslogger startup When syslogger starts up, ListenSockets is still NULL. Don't try to pfree it. Oversight in commit `e29c464395`. Reported-by: Michael Paquier Discussion: https://www.postgresql.org/message-id/ZR-uNkgL7m60lWUe@paquier.xyz	2023-10-06 10:22:02 +03:00
Peter Eisentraut	180e3394a7	Push attcompression and attstorage handling into BuildDescForRelation() This was previously handled by the callers but it can be moved into a common place. Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org	2023-10-05 16:20:46 +02:00
Peter Eisentraut	04e485273b	Move BuildDescForRelation() from tupdesc.c to tablecmds.c BuildDescForRelation() main job is to convert ColumnDef lists to pg_attribute/tuple descriptor arrays, which is really mostly an internal subroutine of DefineRelation() and some related functions, which is more the remit of tablecmds.c and doesn't have much to do with the basic tuple descriptor interfaces in tupdesc.c. This is also supported by observing the header includes we can remove in tupdesc.c. By moving it over, we can also (in the future) make BuildDescForRelation() use more internals of tablecmds.c that are not sensible to be exposed in tupdesc.c. Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org	2023-10-05 16:20:46 +02:00
Peter Eisentraut	6d341407a6	Push attidentity and attgenerated handling into BuildDescForRelation() Previously, this was handled by the callers separately, but it can be trivially moved into BuildDescForRelation() so that it is handled in a central place. Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org	2023-10-05 16:20:46 +02:00
Heikki Linnakangas	e29c464395	Refactor ListenSocket array. Keep track of the used size of the array. That avoids looping through the whole array in a few places. It doesn't matter from a performance point of view since the array is small anyway, but this feels less surprising and is a little less code. Now that we have an explicit NumListenSockets variable that is statically initialized to 0, we don't need the loop to initialize the array. Allocate the array in PostmasterContext. The array isn't needed in child processes, so this allows reusing that memory. We could easily make the array resizable now, but we haven't heard any complaints about the current 64 sockets limit. Discussion: https://www.postgresql.org/message-id/7bb7ad65-a018-2419-742f-fa5fd877d338@iki.fi	2023-10-05 15:05:25 +03:00
Alvaro Herrera	1c99cde2f3	Improve JsonLexContext's freeability Previously, the JSON code didn't have to worry too much about freeing JsonLexContext, because it was never too long-lived. With new features being added for SQL/JSON this is no longer the case. Add a routine that knows how to free this struct and apply that to a few places, to prevent this from becoming problematic. At the same time, we change the API of makeJsonLexContextCstringLen to make it receive a pointer to JsonLexContext for callers that want it to be stack-allocated; it can also be passed as NULL to get the original behavior of a palloc'ed one. This also causes an ABI break due to the addition of flags to JsonLexContext, so we can't easily backpatch it. AFAICS that's not much of a problem; apparently some leaks might exist in JSON usage of text-search, for example via json_to_tsvector, but I haven't seen any complaints about that. Per Coverity complaint about datum_to_jsonb_internal(). Discussion: https://postgr.es/m/20230808174110.oq3iymllsv6amkih@alvherre.pgsql	2023-10-05 10:59:08 +02:00
David Rowley	a8a968a821	Consider cheap startup paths in add_paths_to_append_rel `6b94e7a6d` did this for ordered append paths to allow fast startup MergeAppends, however, nothing was done for the Append case. Here we adjust add_paths_to_append_rel() to have it build an AppendPath containing the cheapest startup paths from each of the child relations when the append rel has "consider_startup" set. Author: Andy Fan, David Rowley Discussion: https://www.postgresql.org/message-id/CAKU4AWrXSkUV=Pt-gRxQT7EbfUeNssprGyNsB=5mJibFZ6S3ww@mail.gmail.com	2023-10-05 21:03:10 +13:00
David Rowley	0b053e78b5	Fix memory leak in Memoize code Ensure we switch to the per-tuple memory context to prevent any memory leaks of detoasted Datums in MemoizeHash_hash() and MemoizeHash_equal(). Reported-by: Orlov Aleksej Author: Orlov Aleksej, David Rowley Discussion: https://postgr.es/m/83281eed63c74e4f940317186372abfd%40cft.ru Backpatch-through: 14, where Memoize was added	2023-10-05 20:30:47 +13:00
Peter Eisentraut	5e4282772a	Remove RelationGetIndexRawAttOptions() There was only one caller left, for which this function was overkill. Also, having it in relcache.c was inappropriate, since it doesn't work with the relcache at all. Discussion: https://www.postgresql.org/message-id/flat/f84640e3-00d3-5abd-3f41-e6a19d33c40b@eisentraut.org	2023-10-03 17:51:02 +02:00
Peter Eisentraut	7841623571	Remove IndexInfo.ii_OpclassOptions field It is unnecessary to include this field in IndexInfo. It is only used by DDL code, not during execution. It is really only used to pass local information around between functions in index.c and indexcmds.c, for which it is clearer to use local variables, like in similar cases. Discussion: https://www.postgresql.org/message-id/flat/f84640e3-00d3-5abd-3f41-e6a19d33c40b@eisentraut.org	2023-10-03 17:51:02 +02:00
Tom Lane	af3ee8a086	Add some notes about why "ALTER TYPE enum DROP VALUE" is hard. In hopes of putting these where any would-be implementer is sure to find them, make a placeholder grammar production for ALTER DROP VALUE and put them there. This is really just a docs patch, though. Vik Fearing, with a bit more wordsmithing by me Discussion: https://postgr.es/m/9fffd149-da0f-0c9c-6745-731fb688642a@postgresfriends.org	2023-10-03 11:41:42 -04:00
Robert Haas	c2ba3fdea5	In basebackup.c, refactor to create read_file_data_into_buffer. This further reduces the length and complexity of sendFile(), hopefully make it easier to understand and modify. In addition to moving some logic into a new function, I took this opportunity to make a few slight adjustments to sendFile() itself, including renaming the 'len' variable to 'bytes_done', since we use it to represent the number of bytes we've already handled so far, not the total length of the file. Patch by me, reviewed by David Steele. Discussion: http://postgr.es/m/CA+TgmoYt5jXH4U6cu1dm9Oe2FTn1aae6hBNhZzJJjyjbE_zYig@mail.gmail.com	2023-10-03 11:00:40 -04:00
Robert Haas	053183138a	In basebackup.c, refactor to create verify_page_checksum. If checksum verification fails for a particular page, we reread the page and try one more time. The code that does this somewhat complex and difficult to follow. Move some of the logic into a new function and rearrange the code a bit to try to make it clearer. This way, we don't need the block_retry Boolean, a couple of other variables move from sendFile() into the new function, and some code is now less deeply indented. Patch by me, reviewed by David Steele. Discussion: http://postgr.es/m/CA+TgmoYt5jXH4U6cu1dm9Oe2FTn1aae6hBNhZzJJjyjbE_zYig@mail.gmail.com	2023-10-03 10:37:20 -04:00
Michael Paquier	a956bd3fa9	Avoid memory size overflow when allocating backend activity buffer The code in charge of copying the contents of PgBackendStatus to local memory could fail on memory allocation because of an overflow on the amount of memory to use. The overflow can happen when combining a high value track_activity_query_size (max at 1MB) with a large max_connections, when both multiplied get higher than INT32_MAX as both parameters treated as signed integers. This could for example trigger with the following functions, all calling pgstat_read_current_status(): - pg_stat_get_backend_subxact() - pg_stat_get_backend_idset() - pg_stat_get_progress_info() - pg_stat_get_activity() - pg_stat_get_db_numbackends() The change to use MemoryContextAllocHuge() has been introduced in `8d0ddccec6`, so backpatch down to 12. Author: Jakub Wartak Discussion: https://postgr.es/m/CAKZiRmw8QSNVw2qNK-dznsatQqz+9DkCquxP0GHbbv1jMkGHMA@mail.gmail.com Backpatch-through: 12	2023-10-03 15:37:00 +09:00
David Rowley	2075ba9dc9	Tidy-up some appendStringInfo*() usages Make a few newish calls to appendStringInfo() which have no special formatting use appendStringInfoString() instead. Also, adjust usages of appendStringInfoString() which only append a string containing a single character to make use of appendStringInfoChar() instead. This makes the code marginally faster, but primarily this change is so we use the StringInfo type as it was intended to be used. Discussion: https://postgr.es/m/CAApHDvpXKQmL+r=VDNS98upqhr9yGBhv2Jw3GBFFk_wKHcB39A@mail.gmail.com	2023-10-03 17:09:52 +13:00
Michael Paquier	6b18b3fe2c	Fail hard on out-of-memory failures in xlogreader.c This commit changes the WAL reader routines so as a FATAL for the backend or exit(FAILURE) for the frontend is triggered if an allocation for a WAL record decode fails in walreader.c, rather than treating this case as bogus data, which would be equivalent to the end of WAL. The key is to avoid palloc_extended(MCXT_ALLOC_NO_OOM) in walreader.c, relying on plain palloc() calls. The previous behavior could make WAL replay finish too early than it should. For example, crash recovery finishing earlier may corrupt clusters because not all the WAL available locally was replayed to ensure a consistent state. Out-of-memory failures would show up randomly depending on the memory pressure on the host, but one simple case would be to generate a large record, then replay this record after downsizing a host, as Ethan Mertz originally reported. This relies on `bae868caf2`, as the WAL reader routines now do the memory allocation required for a record only once its header has been fully read and validated, making xl_tot_len trustable. Making the WAL reader react differently on out-of-memory or bogus record data would require ABI changes, so this is the safest choice for stable branches. Also, it is worth noting that `3f1ce97346` has been using a plain palloc() in this code for some time now. Thanks to Noah Misch and Thomas Munro for the discussion. Like the other commit, backpatch down to 12, leaving out v11 that will be EOL'd soon. The behavior of considering a failed allocation as bogus data comes originally from `0ffe11abd3`, where the record length retrieved from its header was not entirely trustable. Reported-by: Ethan Mertz Discussion: https://postgr.es/m/ZRKKdI5-RRlta3aF@paquier.xyz Backpatch-through: 12	2023-10-03 10:21:44 +09:00
Robert Haas	1ccc1e05ae	Remove retry loop in heap_page_prune(). The retry loop is needed because heap_page_prune() calls HeapTupleSatisfiesVacuum() and then lazy_scan_prune() does the same thing again, and they might get different answers due to concurrent clog updates. But this patch makes heap_page_prune() return the HeapTupleSatisfiesVacuum() results that it computed back to the caller, which allows lazy_scan_prune() to avoid needing to recompute those values in the first place. That's nice both because it eliminates the need for a retry loop and also because it's cheaper. Melanie Plageman, reviewed by David Geier, Andres Freund, and me. Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA%40mail.gmail.com	2023-10-02 11:40:07 -04:00
Heikki Linnakangas	e64c733bb1	Flush WAL stats in bgwriter bgwriter can write out WAL, but did not flush the WAL pgstat counters, so the writes were not seen in pg_stat_wal. Back-patch to v14, where pg_stat_wal was introduced. Author: Nazir Bilal Yavuz Reviewed-by: Matthias van de Meent, Kyotaro Horiguchi Discussion: https://www.postgresql.org/message-id/CAN55FZ2FPYngovZstr%3D3w1KSEHe6toiZwrurbhspfkXe5UDocg%40mail.gmail.com	2023-10-02 12:39:35 +03:00
Heikki Linnakangas	f0bd0b4489	Add rmgrdesc README In the README, briefly explain what rmgrdesc functions are, and why they are in a separate directory. Commit `c03c2eae0a` added some guidelines on the preferred output format; move that to the README too. Reviewed-by: Melanie Plageman, Peter Geoghegan Discussion: https://www.postgresql.org/message-id/9159daf7-f42d-781b-458f-1b2cf32cb256%40iki.fi	2023-10-02 12:18:57 +03:00
Amit Langote	c8ec5e0543	Revert "Add soft error handling to some expression nodes" This reverts commit `7fbc75b26e`. Looks like the LLVM additions may not be totally correct.	2023-10-02 13:48:15 +09:00
Amit Langote	7fbc75b26e	Add soft error handling to some expression nodes This adjusts the expression evaluation code for CoerceViaIO and CoerceToDomain to handle errors softly if needed. For CoerceViaIo, this means using InputFunctionCallSafe(), which provides the option to handle errors softly, instead of calling the type input function directly. For CoerceToDomain, this simply entails replacing the ereport() in ExecEvalConstraintCheck() by errsave(). In both cases, the ErrorSaveContext to be used when evaluating the expression is stored by ExecInitExprRec() in the expression's struct in the expression's ExprEvalStep. The ErrorSaveContext is passed by setting ExprState.escontext to point to it when calling ExecInitExprRec() on the expression whose errors are to be handled softly. Note that no call site of ExecInitExprRec() has been changed in this commit, so there's no functional change. This is intended for implementing new SQL/JSON expression nodes in future commits that will use to it suppress errors that may occur during type coercions. Reviewed-by: Álvaro Herrera Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com	2023-10-02 11:52:28 +09:00
Noah Misch	e1f95ec8cf	Correct assertion and comments about XLogRecordMaxSize. The largest allocation, of xl_tot_len+8192, is in allocate_recordbuf(). Discussion: https://postgr.es/m/20230812211327.GB2326466@rfd.leadboat.com	2023-10-01 12:20:55 -07:00
Tom Lane	5b7b382464	Fix datalen calculation in tsvectorrecv(). After receiving position data for a lexeme, tsvectorrecv() advanced its "datalen" value by (npos+1)sizeof(WordEntry) where the correct calculation is (npos+1)sizeof(WordEntryPos). This accidentally failed to render the constructed tsvector invalid, but it did result in leaving some wasted space approximately equal to the space consumed by the position data. That could have several bad effects: * Disk space is wasted if the received tsvector is stored into a table as-is. * A legal tsvector could get rejected with "maximum total lexeme length exceeded" if the extra space pushes it over the MAXSTRPOS limit. * In edge cases, the finished tsvector could be assigned a length larger than the allocated size of its palloc chunk, conceivably leading to SIGSEGV when the tsvector gets copied somewhere else. The odds of a field failure of this sort seem low, though valgrind testing could probably have found this. While we're here, let's express the calculation as "sizeof(uint16) + npos * sizeof(WordEntryPos)" to avoid the type pun implicit in the "npos + 1" formulation. It's not wrong given that WordEntryPos had better be 2 bytes to avoid padding problems, but it seems clearer this way. Report and patch by Denis Erokhin. Back-patch to all supported versions. Discussion: https://postgr.es/m/009801d9f2d9$f29730c0$d7c59240$@datagile.ru	2023-10-01 13:16:47 -04:00
Tom Lane	d8a09939a3	In COPY FROM, fail cleanly when unsupported encoding conversion is needed. In recent releases, such cases fail with "cache lookup failed for function 0" rather than complaining that the conversion function doesn't exist as prior versions did. Seems to be a consequence of sloppy refactoring in commit `f82de5c46`. Add the missing error check. Per report from Pierre Fortin. Back-patch to v14 where the oversight crept in. Discussion: https://postgr.es/m/20230929163739.3bea46e5.pfortin@pfortin.com	2023-10-01 12:09:26 -04:00
Andrew Dunstan	276393f53e	Only evaluate default values as required when doing COPY FROM Commit `9f8377f7a2` was a little too eager in fetching default values. Normally this would not matter, but if the default value is not valid for the type (e.g. a varchar that's too long) it caused an unnecessary error. Complaint and fix from Laurenz Albe Backpatch to release 16. Discussion: https://postgr.es/m/75a7b7483aeb331aa017328d606d568fc715b90d.camel@cybertec.at	2023-10-01 10:18:41 -04:00
Andrew Dunstan	f6d4c9cf16	Provide FORCE_NULL * and FORCE_NOT_NULL * options for COPY FROM These options already exist, but you need to specify a column list for them, which can be cumbersome. We already have the possibility of all columns for FORCE QUOTE, so this is simply extending that facility to FORCE_NULL and FORCE_NOT_NULL. Author: Zhang Mingli Reviewed-By: Richard Guo, Kyatoro Horiguchi, Michael Paquier. Discussion: https://postgr.es/m/CACJufxEnVqzOFtqhexF2+AwOKFrV8zHOY3y=p+gPK6eB14pn_w@mail.gmail.com	2023-09-30 12:34:41 -04:00
Heikki Linnakangas	c181f2e2bc	Fix briefly showing old progress stats for ANALYZE on inherited tables. ANALYZE on a table with inheritance children analyzes all the child tables in a loop. When stepping to next child table, it updated the child rel ID value in the command progress stats, but did not reset the 'sample_blks_total' and 'sample_blks_scanned' counters. acquire_sample_rows() updates 'sample_blks_total' as soon as the scan starts and 'sample_blks_scanned' after processing the first block, but until then, pg_stat_progress_analyze would display a bogus combination of the new child table relid with old counter values from the previously processed child table. Fix by resetting 'sample_blks_total' and 'sample_blks_scanned' to zero at the same time that 'current_child_table_relid' is updated. Backpatch to v13, where pg_stat_progress_analyze view was introduced. Reported-by: Justin Pryzby Discussion: https://www.postgresql.org/message-id/20230122162345.GP13860%40telsasoft.com	2023-09-30 17:03:50 +03:00
Dean Rasheed	1d5caec221	Fix EvalPlanQual rechecking during MERGE. Under some circumstances, concurrent MERGE operations could lead to inconsistent results, that varied according the plan chosen. This was caused by a lack of rowmarks on the source relation, which meant that EvalPlanQual rechecking was not guaranteed to return the same source tuples when re-running the join query. Fix by ensuring that preprocess_rowmarks() sets up PlanRowMarks for all non-target relations used in MERGE, in the same way that it does for UPDATE and DELETE. Per bug #18103. Back-patch to v15, where MERGE was introduced. Dean Rasheed, reviewed by Richard Guo. Discussion: https://postgr.es/m/18103-c4386baab8e355e3%40postgresql.org	2023-09-30 10:52:21 +01:00
Bruce Momjian	6d0c39a293	C comment: add optimizer function reference Reported-by: James Coleman Discussion: https://postgr.es/m/CAAaqYe9F6uoMhAr+8rMLwvGzaKaSknPA0Wi3Ehtv8pbSYmJq-Q@mail.gmail.com Backpatch-through: master	2023-09-29 14:25:59 -04:00
David Rowley	d40d827219	Robustify find_base_rel and find_base_rel_ignore_join Improve find_base_rel() and find_base_rel_ignore_join() so that they raise an ERROR if they ever receive a negative relid value in non-cassert builds. If either of these functions had ever received a negative relid then they'd have attempted to access memory that does not belong to simple_rel_array. Because no evidence has been presented of actual cases where bugs have caused this to happen, here we take a lightweight approach to checking for negative values and simply cast both values to uint32 before performing the comparison. This will cause any negative relids to be seen as greater than simple_rel_array_size which will ERROR rather than attempt to access a negative simple_rel_array element. Obviously, the run-time error is better than a crash, so it makes sense to protect against this, especially when it can be done without adding any additional run-time overhead. There is a slight change here if the functions are ever called with a relid of 0. This will pass the bounds check, but that array entry should be NULL (along with the corresponding simple_rte_array entry), so won't pass the "if (rel)" condition and still fall through and raise an ERROR. Author: Ranier Vilela Reviewed-by: Ashutosh Bapat, David Rowley Discussion: https://postgr.es/m/CAEudQArQSghBu2gLojg4o_tnHj_x2HcS%3D%2BwewL3NJS8z0VnK%2Bg%40mail.gmail.com	2023-09-29 16:58:32 +13:00
Peter Geoghegan	714780dcdd	Fix btmarkpos/btrestrpos array key wraparound bug. nbtree's mark/restore processing failed to correctly handle an edge case involving array key advancement and related search-type scan key state. Scans with ScalarArrayScalarArrayOpExpr quals requiring mark/restore processing (for a merge join) could incorrectly conclude that an affected array/scan key must not have advanced during the time between marking and restoring the scan's position. As a result of all this, array key handling within btrestrpos could skip a required call to _bt_preprocess_keys(). This confusion allowed later primitive index scans to overlook tuples matching the true current array keys. The scan's search-type scan keys would still have spurious values corresponding to the final array element(s) -- not values matching the first/now-current array element(s). To fix, remember that "array key wraparound" has taken place during the ongoing btrescan in a flag variable stored in the scan's state, and use that information at the point where btrestrpos decides if another call to _bt_preprocess_keys is required. Oversight in commit `70bc5833`, which taught nbtree to handle array keys during mark/restore processing, but missed this subtlety. That commit was itself a bug fix for an issue in commit `9e8da0f7`, which taught nbtree to handle ScalarArrayOpExpr quals natively. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WzkgP3DDRJxw6DgjCxo-cu-DKrvjEv_ArkP2ctBJatDCYg@mail.gmail.com Backpatch: 11- (all supported branches).	2023-09-28 16:29:37 -07:00
Tom Lane	9f71e10d65	Fix checking of index expressions in CompareIndexInfo(). This code was sloppy about comparison of index columns that are expressions. It didn't reliably reject cases where one index has an expression where the other has a plain column, and it could index off the start of the attmap array, leading to a Valgrind complaint (though an actual crash seems unlikely). I'm not sure that the expression-vs-column sloppiness leads to any visible problem in practice, because the subsequent comparison of the two expression lists would reject cases where the indexes have different numbers of expressions overall. Maybe we could falsely match indexes having the same expressions in different column positions, but it'd require unlucky contents of the word before the attmap array. It's not too surprising that no problem has been reported from the field. Nonetheless, this code is clearly wrong. Per bug #18135 from Alexander Lakhin. Back-patch to all supported branches. Discussion: https://postgr.es/m/18135-532f4a755e71e4d2@postgresql.org	2023-09-28 14:05:25 -04:00
Robert Haas	4e9fc3a976	Return data from heap_page_prune via a struct. Previously, one of the values in the struct was returned as the return value, and another was returned via an output parameter. In preparation for returning more stuff, consolidate both values into a struct returned via an output parameter. Melanie Plageman, reviewed by Andres Freund and by me. Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA%40mail.gmail.com	2023-09-28 10:36:34 -04:00
David Rowley	c4a1933b48	Add missing TidRangePath handling in print_path() Tid Range scans were added back in `bb437f995`. That commit forgot to add handling for TidRangePaths in print_path(). Only people building with OPTIMIZER_DEBUG might have noticed this, which likely is the reason it's taken 4 years for anyone to notice. Author: Andrey Lepikhov Reported-by: Andrey Lepikhov Discussion: https://postgr.es/m/379082d6-1b6a-4cd6-9ecf-7157d8c08635@postgrespro.ru Backpatch-through: 14, where `bb437f995` was introduced	2023-09-29 00:02:22 +13:00
Etsuro Fujita	c68f78538f	Fix typo in src/backend/access/transam/README.	2023-09-28 19:45:00 +09:00
Amit Langote	d060e921ea	Remove obsolete executor cleanup code This commit removes unnecessary ExecExprFreeContext() calls in ExecEnd* routines because the actual cleanup is managed by FreeExecutorState(). With no callers remaining for ExecExprFreeContext(), this commit also removes the function. This commit also drops redundant ExecClearTuple() calls, because ExecResetTupleTable() in ExecEndPlan() already takes care of resetting and dropping all TupleTableSlots initialized with ExecInitScanTupleSlot() and ExecInitExtraTupleSlot(). After these modifications, the ExecEnd*() routines for ValuesScan, NamedTuplestoreScan, and WorkTableScan became redundant. So, this commit removes them. Reviewed-by: Robert Haas Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com	2023-09-28 09:44:39 +09:00
Michael Paquier	9210afd3bc	Move tracking of in_streaming to PGOutputData "in_streaming" is a flag used to track if an instance of pgoutput is streaming changes. When pgoutput is started, the flag was always reset, switched it back and forth in the stream start/stop callbacks. Before this commit, it was a global variable, which is confusing as it is actually attached to a state of PGOutputData. Per my analysis, using a global variable did not lead to an active bug like in `54ccfd6586`, but it makes the code more consistent. Note that we cannot backpatch this change anyway as it requires the addition of a new field to PGOutputData, exposed in pgoutput.h. Author: Hou Zhijie Reviewed-by: Amit Kapila, Michael Paquier, Peter Smith Discussion: https://postgr.es/m/OS0PR01MB571690EF24F51F51EFFCBB0E94FAA@OS0PR01MB5716.jpnprd01.prod.outlook.com	2023-09-28 09:33:51 +09:00
Peter Eisentraut	ebf76f2753	Add TupleDescGetDefault() This unifies some repetitive code. Note: I didn't push the "not found" error message into the new function, even though all existing callers would be able to make use of it. Using the existing error handling as-is would probably require exposing the Relation type via tupdesc.h, which doesn't seem desirable. (Or even if we changed it to just report the OID, it would inject the concept of a relation containing the tuple descriptor into tupdesc.h, which might be a layering violation. Perhaps some further improvements could be considered here separately.) Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da%40eisentraut.org	2023-09-27 18:52:40 +01:00
Daniel Gustafsson	9dce22033d	llvmjit: Use explicit LLVMContextRef for inlining When performing inlining LLVM unfortunately "leaks" types (the types survive and are usable, but a new round of inlining will recreate new structurally equivalent types). This accumulation will over time amount to a memory leak which for some queries can be large enough to trigger the OOM process killer. To avoid accumulation of types, all IR related data is stored in an LLVMContextRef which is dropped and recreated in order to release all types. Dropping and recreating incurs overhead, so it will be done only after 100 queries. This is a heuristic which might be revisited, but until we can get the size of the context from LLVM we are flying a bit blind. This issue has been reported several times, there may be more references to it in the archives on top of the threads linked below. Backpatching of this fix will be handled once it has matured in master for a bit. Reported-By: Justin Pryzby <pryzby@telsasoft.com> Reported-By: Kurt Roeckx <kurt@roeckx.be> Reported-By: Jaime Casanova <jcasanov@systemguards.com.ec> Reported-By: Lauri Laanmets <pcspets@gmail.com> Author: Andres Freund and Daniel Gustafsson Discussion: https://postgr.es/m/7acc8678-df5f-4923-9cf6-e843131ae89d@www.fastmail.com Discussion: https://postgr.es/m/20201218235607.GC30237@telsasoft.com Discussion: https://postgr.es/m/CAPH-tTxLf44s3CvUUtQpkDr1D8Hxqc2NGDzGXS1ODsfiJ6WSqA@mail.gmail.com	2023-09-27 13:02:21 +02:00
Daniel Gustafsson	ef668d8bf5	llvmjit: Make llvm_types_module variable static Commit `b059d2f456` introduced llvm_types_module and accidentally exported it. As there is no usecase for accessing this variable externally, this makes it static. Author: Andres Freund <andres@anarazel.de> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/20221101055132.pjjsvlkeo4stbjkq@awork3.anarazel.de	2023-09-27 13:02:14 +02:00
Daniel Gustafsson	2dad308e73	llvmjit: Remove unnecessary types These types were added in `fb46ac26fe` but hasn't been used, so remove until there is a need for them. Author: Andres Freund <andres@anarazel.de> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/20221101055132.pjjsvlkeo4stbjkq@awork3.anarazel.de	2023-09-27 13:02:01 +02:00
Amit Kapila	54ccfd6586	Fix the misuse of origin filter across multiple pg_logical_slot_get_changes() calls. The pgoutput module uses a global variable (publish_no_origin) to cache the action for the origin filter, but we didn't reset the flag when shutting down the output plugin, so subsequent retries may access the previous publish_no_origin value. We fix this by storing the flag in the output plugin's private data. Additionally, the patch removes the currently unused origin string from the structure. For the back branch, to avoid changing the exposed structure, we eliminated the global variable and instead directly used the origin string for change filtering. Author: Hou Zhijie Reviewed-by: Amit Kapila, Michael Paquier Backpatch-through: 16 Discussion: http://postgr.es/m/OS0PR01MB571690EF24F51F51EFFCBB0E94FAA@OS0PR01MB5716.jpnprd01.prod.outlook.com	2023-09-27 14:32:51 +05:30
Bruce Momjian	441bbd2988	doc: correct reference to pg_relation in comment Reported-by: Dagfinn Ilmari Mannsåker Discussion: https://postgr.es/m/87sf9apnr0.fsf@wibble.ilmari.org Backpatch-through: master	2023-09-26 17:07:14 -04:00
Peter Eisentraut	b0ae29512c	MergeAttributes() and related variable renaming Mainly, rename "schema" to "columns" and related changes. The previous naming has long been confusing. Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da%40eisentraut.org	2023-09-26 16:08:35 +01:00
Peter Eisentraut	369202bf4b	Clean up MergeCheckConstraint() If the constraint is not already in the list, add it ourselves, instead of making the caller do it. This makes the interface more consistent with other "merge" functions in this file. Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da%40eisentraut.org	2023-09-26 14:01:53 +01:00
Heikki Linnakangas	28d3c2ddcf	Fix another bug in parent page splitting during GiST index build. Yet another bug in the ilk of commits `a7ee7c851` and `741b88435`. In `741b88435`, we took care to clear the memorized location of the downlink when we split the parent page, because splitting the parent page can move the downlink. But we missed that even updating a tuple on the parent can move it, because updating a tuple on a gist page is implemented as a delete+insert, so the updated tuple gets moved to the end of the page. This commit fixes the bug in two different ways (belt and suspenders): 1. Clear the downlink when we update a tuple on the parent page, even if it's not split. This the same approach as in commits `a7ee7c851` and `741b88435`. I also noticed that gistFindCorrectParent did not clear the 'downlinkoffnum' when it stepped to the right sibling. Fix that too, as it seems like a clear bug even though I haven't been able to find a test case to hit that. 2. Change gistFindCorrectParent so that it treats 'downlinkoffnum' merely as a hint. It now always first checks if the downlink is still at that location, and if not, it scans the page like before. That's more robust if there are still more cases where we fail to clear 'downlinkoffnum' that we haven't yet uncovered. With this, it's no longer necessary to meticulously clear 'downlinkoffnum', so this makes the previous fixes unnecessary, but I didn't revert them because it still seems nice to clear it when we know that the downlink has moved. Also add the test case using the same test data that Alexander posted. I tried to reduce it to a smaller test, and I also tried to reproduce this with different test data, but I was not able to, so let's just include what we have. Backpatch to v12, like the previous fixes. Reported-by: Alexander Lakhin Discussion: https://www.postgresql.org/message-id/18129-caca016eaf0c3702@postgresql.org	2023-09-26 14:14:49 +03:00
Peter Eisentraut	64b787656d	Add some const qualifiers There was a mismatch between the const qualifiers for excludeDirContents in src/backend/backup/basebackup.c and src/bin/pg_rewind/filemap.c, which led to a quick search for similar cases. We should make excludeDirContents match, but the rest of the changes seem like a good idea as well. Author: David Steele <david@pgmasters.net> Discussion: https://www.postgresql.org/message-id/flat/669a035c-d23d-2f38-7ff0-0cb93e01d610@pgmasters.net	2023-09-26 11:28:57 +01:00
Peter Eisentraut	eddad679d2	Clean up MergeAttributesIntoExisting() Make variable naming clearer and more consistent. Move some variables to smaller scope. Remove some unnecessary intermediate variables. Try to save some vertical space. Apply analogous changes to nearby MergeConstraintsIntoExisting() and RemoveInheritance() for consistency. Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da%40eisentraut.org	2023-09-26 09:09:36 +01:00
Peter Eisentraut	eb36c6ac84	Remove unused include This was added in `add5cf28d4` but was apparently never used. Discussion: https://www.postgresql.org/message-id/flat/f84640e3-00d3-5abd-3f41-e6a19d33c40b@eisentraut.org	2023-09-26 07:56:41 +01:00
Michael Paquier	e221c0befb	Fix behavior of "force" in pgstat_report_wal() As implemented in `5891c7a8ed`, setting "force" to true in pgstat_report_wal() causes the routine to not wait for the pgstat shmem lock if it cannot be acquired, in which case the WAL and I/O statistics finish by not being flushed. The origin of the confusion comes from pgstat_flush_wal() and pgstat_flush_io(), that use "nowait" as sole argument. The I/O stats are new in v16. This is the opposite behavior of what has been used in pgstat_report_stat(), where "force" is the opposite of "nowait". In this case, when "force" is true, the routine sets "nowait" to false, which would cause the routine to wait for the pgstat shmem lock, ensuring that the stats are always flushed. When "force" is false, "nowait" is set to true, and the stats would only not be flushed if the pgstat shmem lock can be acquired, returning immediately without flushing the stats if the lock cannot be acquired. This commit changes pgstat_report_wal() so as "force" has the same behavior as in pgstat_report_stat(). There are currently three callers of pgstat_report_wal(): - Two in the checkpointer where force=true during a shutdown and the main checkpointer loop. Now the code behaves so as the stats are always flushed. - One in the main loop of the bgwriter, where force=false. Now the code behaves so as the stats would not be flushed if the pgstat shmem lock could not be acquired. Before this commit, some stats on WAL and I/O could have been lost after a shutdown, for example. Reported-by: Ryoga Yoshida Author: Ryoga Yoshida, Michael Paquier Discussion: https://postgr.es/m/f87a4d7be70530606b864fd1df91718c@oss.nttdata.com Backpatch-through: 15	2023-09-26 09:29:47 +09:00
Thomas Munro	becfbdd6c1	Fix edge-case for xl_tot_len broken by `bae868ca`. `bae868ca` removed a check that was still needed. If you had an xl_tot_len at the end of a page that was too small for a record header, but not big enough to span onto the next page, we'd immediately perform the CRC check using a bogus large length. Because of arbitrary coding differences between the CRC implementations on different platforms, nothing very bad happened on common modern systems. On systems using the _sb8.c fallback we could segfault. Restore that check, add a new assertion and supply a test for that case. Back-patch to 12, like `bae868ca`. Tested-by: Tom Lane <tgl@sss.pgh.pa.us> Tested-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGLCkTT7zYjzOxuLGahBdQ%3DMcF%3Dz5ZvrjSOnW4EDhVjT-g%40mail.gmail.com	2023-09-26 10:53:38 +13:00
Nathan Bossart	13aeaf0797	Add worker type to pg_stat_subscription. Thanks to commit `2a8b40e368`, the logical replication worker type is easily determined. The worker type could already be deduced via other columns such as leader_pid and relid, but that is unnecessary complexity for users. Bumps catversion. Author: Peter Smith Reviewed-by: Michael Paquier, Maxim Orlov, Amit Kapila Discussion: https://postgr.es/m/CAHut%2BPtmbSMfErSk0S7xxVdZJ9XVE3xVLhqBTmT91kf57BeKDQ%40mail.gmail.com	2023-09-25 14:12:43 -07:00
Tom Lane	dc8d72c1c2	Collect dependency information for parsed CallStmts. Parse analysis of a CallStmt will inject mutable information, for instance the OID of the called procedure, so that subsequent DDL may create a need to re-parse the CALL. We failed to detect this for CALLs in plpgsql routines, because no dependency information was collected when putting a CallStmt into the plan cache. That could lead to misbehavior or strange errors such as "cache lookup failed". Before commit `ee895a655`, the issue would only manifest for CALLs appearing in atomic contexts, because we re-planned non-atomic CALLs every time through anyway. It is now apparent that extract_query_dependencies() probably needs a special case for every utility statement type for which stmt_requires_parse_analysis() returns true. I wanted to add something like Assert(!stmt_requires_parse_analysis(...)) when falling out of extract_query_dependencies_walker without doing anything, but there are API issues as well as a more fundamental point: stmt_requires_parse_analysis is supposed to be applied to raw parser output, so it'd be cheating to assume it will give the correct answer for post-parse-analysis trees. I contented myself with adding a comment. Per bug #18131 from Christian Stork. Back-patch to all supported branches. Discussion: https://postgr.es/m/18131-576854e79c5cd264@postgresql.org	2023-09-25 14:42:17 -04:00
Tom Lane	cf1c65070a	Limit to_tsvector_byid's initial array allocation to something sane. The initial estimate of the number of distinct ParsedWords is just that: an estimate. Don't let it exceed what palloc is willing to allocate. If in fact we need more entries, we'll eventually fail trying to enlarge the array. But if we don't, this allows success on inputs that currently draw "invalid memory alloc request size". Per bug #18080 from Uwe Binder. Back-patch to all supported branches. Discussion: https://postgr.es/m/18080-d5c5e58fef8c99b7@postgresql.org	2023-09-25 11:50:28 -04:00
Tom Lane	3aff1d3fd0	Doc: improve cross-reference in Makefile comment. Per gripe from Japin Li. Discussion: https://postgr.es/m/MEYP282MB16692171F13B5DF40DB768EEB6FCA@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM	2023-09-25 11:25:19 -04:00
Daniel Gustafsson	c1609cf3c0	Fix typo in numutils.c comments s/messges/messages/	2023-09-25 13:29:34 +02:00
Daniel Gustafsson	7750fefdb2	Add GUC for temporarily disabling event triggers In order to troubleshoot misbehaving or buggy event triggers, the documented advice is to enter single-user mode. In an attempt to reduce the number of situations where single-user mode is required (or even recommended) for non-extraordinary maintenance, this GUC allows to temporarily suspend event triggers. This was originally extracted from a larger patchset which aimed at supporting event triggers on login events. Reviewed-by: Ted Yu <yuzhihong@gmail.com> Reviewed-by: Mikhail Gribkov <youzhick@gmail.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Michael Paquier <michael@paquier.xyz Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/9140106E-F9BF-4D85-8FC8-F2D3C094A6D9@yesql.se Discussion: https://postgr.es/m/0d46d29f-4558-3af9-9c85-7774e14a7709@postgrespro.ru	2023-09-25 12:41:49 +02:00
Thomas Munro	bae868caf2	Don't trust unvalidated xl_tot_len. xl_tot_len comes first in a WAL record. Usually we don't trust it to be the true length until we've validated the record header. If the record header was split across two pages, previously we wouldn't do the validation until after we'd already tried to allocate enough memory to hold the record, which was bad because it might actually be garbage bytes from a recycled WAL file, so we could try to allocate a lot of memory. Release 15 made it worse. Since `70b4f82a4b`, we'd at least generate an end-of-WAL condition if the garbage 4 byte value happened to be > 1GB, but we'd still try to allocate up to 1GB of memory bogusly otherwise. That was an improvement, but unfortunately release 15 tries to allocate another object before that, so you could get a FATAL error and recovery could fail. We can fix both variants of the problem more fundamentally using pre-existing page-level validation, if we just re-order some logic. The new order of operations in the split-header case defers all memory allocation based on xl_tot_len until we've read the following page. At that point we know that its first few bytes are not recycled data, by checking its xlp_pageaddr, and that its xlp_rem_len agrees with xl_tot_len on the preceding page. That is strong evidence that xl_tot_len was truly the start of a record that was logged. This problem was most likely to occur on a standby, because walreceiver.c recycles WAL files without zeroing out trailing regions of each page. We could fix that too, but it wouldn't protect us from rare crash scenarios where the trailing zeroes don't make it to disk. With reliable xl_tot_len validation in place, the ancient policy of considering malloc failure to indicate corruption at end-of-WAL seems quite surprising, but changing that is left for later work. Also included is a new TAP test to exercise various cases of end-of-WAL detection by writing contrived data into the WAL from Perl. Back-patch to 12. We decided not to put this change into the final release of 11. Author: Thomas Munro <thomas.munro@gmail.com> Author: Michael Paquier <michael@paquier.xyz> Reported-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Noah Misch <noah@leadboat.com> (the idea, not the code) Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Sergei Kornilov <sk@zsrv.org> Reviewed-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/17928-aa92416a70ff44a2%40postgresql.org	2023-09-23 10:26:24 +12:00
Daniel Gustafsson	5f3aa309a8	Avoid potential pfree on NULL on OpenSSL errors Guard against the pointer being NULL before pfreeing upon an error returned from OpenSSL. Also handle errors from X509_NAME_print_ex which can return -1 on memory allocation errors. Backpatch down to v15 where the code was added. Author: Sergey Shinderuk <s.shinderuk@postgrespro.ru> Discussion: https://postgr.es/m/8db5374d-32e0-6abb-d402-40762511eff2@postgrespro.ru Backpatch-through: v15	2023-09-22 11:18:25 +02:00
Peter Eisentraut	e59fcbd712	Simplify information schema check constraint deparsing The computation of the column information_schema.check_constraints.check_clause used pg_get_constraintdef() plus some string manipulation to get the check clause back out. This ended up with an extra pair of parentheses, which is only an aesthetic problem, but also with suffixes like "NOT VALID", which don't belong into that column. We can fix both of these problems and simplify the code by just using pg_get_expr() instead. Discussion: https://www.postgresql.org/message-id/799b59ef-3330-f0d2-ee23-8cdfa1740987@eisentraut.org	2023-09-22 07:43:26 +02:00
Tom Lane	48e2b234f8	Fix COMMIT/ROLLBACK AND CHAIN in the presence of subtransactions. In older branches, COMMIT/ROLLBACK AND CHAIN failed to propagate the current transaction's properties to the new transaction if there was any open subtransaction (unreleased savepoint). Instead, some previous transaction's properties would be restored. This is because the "if (s->chain)" check in CommitTransactionCommand examined the wrong instance of the "chain" flag and falsely concluded that it didn't need to save transaction properties. Our regression tests would have noticed this, except they used identical transaction properties for multiple tests in a row, so that the faulty behavior was not distinguishable from correct behavior. Commit `12d768e70` fixed the problem in v15 and later, but only rather accidentally, because I removed the "if (s->chain)" test to avoid a compiler warning, while not realizing that the warning was flagging a real bug. In v14 and before, remove the if-test and save transaction properties unconditionally; just as in the newer branches, that's not expensive enough to justify thinking harder. Add the comment and extra regression test to v15 and later to forestall any future recurrence, but there's no live bug in those branches. Patch by me, per bug #18118 from Liu Xiang. Back-patch to v12 where the AND CHAIN feature was added. Discussion: https://postgr.es/m/18118-4b72fcbb903aace6@postgresql.org	2023-09-21 23:11:30 -04:00
Etsuro Fujita	c621467d2b	Update comment about set_join_pathlist_hook(). The comment introduced by commit `e7cb7ee14` was a bit too terse, which could lead to extensions doing different things within the hook function than we intend to allow. Extend the comment to explain what they can do within the hook function. Back-patch to all supported branches. In passing, I rephrased a nearby comment that I recently added to the back branches. Reviewed by David Rowley and Andrei Lepikhov. Discussion: https://postgr.es/m/CAPmGK15SBPA1nr3Aqsdm%2BYyS-ay0Ayo2BRYQ8_A2To9eLqwopQ%40mail.gmail.com	2023-09-21 19:45:00 +09:00
Michael Paquier	c868cbfef7	Fix typos in pgoutput.c RelationSyncCache was mentioned in two comments under a different name. Issue noticed while reviewing a different patch touching the same area. Introduced by `665d1fad99`. Discussion: https://postgr.es/m/ZQk1Ca_eFDTmBiZy@paquier.xyz	2023-09-20 10:02:12 +09:00
Peter Eisentraut	c5b0582841	Replace more MemSet calls with struct initialization This fixes up `10ea0f924a` to use the style introduced by `9fd45870c1`. Author: Richard Guo <guofenglinux@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAMbWs490gJf5A=ydqyjh+Z8mVQa_foTGtcmBtHGLra0aOwLWHQ@mail.gmail.com	2023-09-19 11:35:01 +02:00
Heikki Linnakangas	bf094372d1	Fix GiST README's explanation of the NSN cross-check. The text got the condition backwards, it's "NSN > LSN", not "NSN < LSN". While we're at it, expand it a little for clarity. Reviewed-by: Daniel Gustafsson Discussion: https://www.postgresql.org/message-id/4cb46e18-e688-524a-0f73-b1f03ed5d6ee@iki.fi	2023-09-19 11:53:51 +03:00
Peter Eisentraut	9847ca2c79	Standardize type of extend_by counter The counter of extend_by loops is mixed int and uint32. Fix by standardizing from int to uint32, to match the extend_by variable. Fixup for `31966b151e`. Author: Ranier Vilela <ranier.vf@gmail.com> Reviewed-by: Gurjeet Singh <gurjeet@singh.im> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAEudQAqHG-JP-YnG54ftL_b7v6-57rMKwET_MSvEoen0UHuPig@mail.gmail.com	2023-09-19 09:46:01 +02:00
Michael Paquier	78a33bba4c	Improve error message for snapshot import in snapmgr.c, take two When a snapshot file fails to be read in ImportSnapshot(), it would issue an ERROR as "invalid snapshot identifier" when opening a stream for it in read-only mode. The error handling is improved to be more talkative in failure cases: - If a snapshot identifier uses incorrect characters, complain with the same error as before this commit. - If the snapshot file cannot be found in pg_snapshots/, complain with a "snapshot \"foo\" does not exist" instead. This maps to the case where AllocateFile() fails on ENOENT. Based on a suggestion from Andres Freund. - If AllocateFile() throws something else than ENOENT as errno, report it with more details in %m instead, as these failures are never expected. b29504eeb489 was the first improvement take. The older error message exists since `bb446b689b` that introduced snapshot imports. Two test cases are added to cover the cases of an identifier with an incorrect format and of a missing snapshot. Author: Bharath Rupireddy Reviewed-by: Andres Freund, Daniel Gustafsson, Michael Paquier Discussion: https://postgr.es/m/CALj2ACWmr=3KdxDkm8h7Zn1XxBoF6hdzq8WQyMn2y1OL5RYFrg@mail.gmail.com	2023-09-19 10:19:50 +09:00
Nathan Bossart	5af0263afd	Make binaryheap available to frontend code. There are a couple of places in frontend code that could make use of this simple binary heap implementation. This commit makes binaryheap usable in frontend code, much like commit `26aaf97b68` did for StringInfo. Like StringInfo, the header file is left in lib/ to reduce the likelihood of unnecessary breakage. The frontend version of binaryheap exposes a void -based API since frontend code does not have access to the Datum definitions. This seemed like a better approach than switching all existing uses to void or making the Datum definitions available to frontend code. Reviewed-by: Tom Lane, Alvaro Herrera Discussion: https://postgr.es/m/3612876.1689443232%40sss.pgh.pa.us	2023-09-18 12:18:33 -07:00
Tom Lane	f73fa5a470	Don't crash if cursor_to_xmlschema is used on a non-data-returning Portal. cursor_to_xmlschema() assumed that any Portal must have a tupDesc, which is not so. Add a defensive check. It's plausible that this mistake occurred because of the rather poorly chosen name of the lookup function SPI_cursor_find(), which in such cases is returning something that isn't very much like a cursor. Add some documentation to try to forestall future errors of the same ilk. Report and patch by Boyu Yang (docs changes by me). Back-patch to all supported branches. Discussion: https://postgr.es/m/dd343010-c637-434c-a8cb-418f53bda3b8.yangboyu.yby@alibaba-inc.com	2023-09-18 14:28:17 -04:00
Peter Eisentraut	a0a5e0feb3	Fix information schema for catalogued not-null constraints The column check_constraints.check_clause should be like col IS NOT NULL without a surrounding CHECK (...). Discussion: https://www.postgresql.org/message-id/09489196-0bc1-e796-c43e-63425f7c5910@eisentraut.org	2023-09-18 08:10:51 +02:00
Tom Lane	e0e492e5a9	Track nesting depth correctly when drilling down into RECORD Vars. expandRecordVariable() failed to adjust the parse nesting structure correctly when recursing to inspect an outer-level Var. This could result in assertion failures or core dumps in corner cases. Likewise, get_name_for_var_field() failed to adjust the deparse namespace stack correctly when recursing to inspect an outer-level Var. In this case the likely result was a "bogus varno" error while deparsing a view. Per bug #18077 from Jingzhou Fu. Back-patch to all supported branches. Richard Guo, with some adjustments by me Discussion: https://postgr.es/m/18077-b9db97c6e0ab45d8@postgresql.org	2023-09-15 17:01:52 -04:00
Daniel Gustafsson	a396e20ad0	Rename variable for code clarity When tracking IO timing for WAL, the duration is what we calculate based on the start and end timestamps, it's not what the variable contains. Rename the timestamp variable to end to better communicate what it contains. Original patch by Krishnakumar with additional hacking to fix another occurrence by me. Author: Krishnakumar R <kksrcv001@gmail.com> Discussion: https://postgr.es/m/CAPMWgZ9f9o8awrQpjo8oxnNQ=bMDVPx00NE0QcDzvHD_ZrdLPw@mail.gmail.com	2023-09-15 19:05:57 +02:00
Heikki Linnakangas	18724af9e8	Remove unnecessary smgrimmedsync() when creating unlogged table. This became safe after commit `4b4798e138`. The smgrcreate() call will now register the segment for syncing at the next checkpoint, so we don't need to sync it here. If a checkpoint happens before the creation is WAL-logged, the records will be replayed when starting recovery from the checkpoint. If a checkpoint happens after the WAL logging, the checkpoint will fsync() it. In the passing, clarify a comment in smgrDoPendingSyncs(). Discussion: https://www.postgresql.org/message-id/6e5bbc08-cdfc-b2b3-9e23-1a914b9850a9%40iki.fi Reviewed-by: Robert Haas	2023-09-15 17:29:37 +03:00
Daniel Gustafsson	b0ec61c9c2	Quote filenames in error messages The majority of all filenames are quoted in user facing error and log messages, but a few were still printed without quotes. While these filenames do not risk causing any ambiguity as their format is strict, quote them anyways to be consistent across all logs. Also concatenate a message to keep it one line to make it easier to grep for in the code. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/080EEABE-6645-4A46-AB20-6285ADAC44FE@yesql.se	2023-09-14 11:17:33 +02:00
Peter Eisentraut	be6f7cd9bb	Fix indentation in SQL file	2023-09-14 09:42:43 +02:00
Michael Paquier	be022908cf	Revert "Improve error message on snapshot import in snapmgr.c" This reverts commit `a0d87bcd9b`, following a remark from Andres Frend that the new error can be triggered with an incorrect SET TRANSACTION SNAPSHOT command without being really helpful for the user as it uses the internal file name. Discussion: https://postgr.es/m/20230914020724.hlks7vunitvtbbz4@awork3.anarazel.de Backpatch-through: 11	2023-09-14 16:00:01 +09:00
Amit Kapila	e0b2eed047	Flush logical slots to disk during a shutdown checkpoint if required. It's entirely possible for a logical slot to have a confirmed_flush LSN higher than the last value saved on disk while not being marked as dirty. Currently, it is not a major problem but a later patch adding support for the upgrade of slots relies on that value being properly flushed to disk. It can also help avoid processing the same transactions again in some boundary cases after the clean shutdown and restart. Say, we process some transactions for which we didn't send anything downstream (the changes got filtered) but the confirm_flush LSN is updated due to keepalives. As we don't flush the latest value of confirm_flush LSN, it may lead to processing the same changes again without this patch. The approach taken by this patch has been suggested by Ashutosh Bapat. Author: Vignesh C, Julien Rouhaud, Kuroda Hayato Reviewed-by: Amit Kapila, Dilip Kumar, Michael Paquier, Ashutosh Bapat, Peter Smith, Hou Zhijie Discussion: http://postgr.es/m/CAA4eK1JzJagMmb_E8D4au=GYQkxox0AfNBm1FbP7sy7t4YWXPQ@mail.gmail.com Discussion: http://postgr.es/m/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com	2023-09-14 08:57:05 +05:30
Andres Freund	7369798a83	Fix tracking of temp table relation extensions as writes Karina figured out that I (Andres) confused BufferUsage.temp_blks_written with BufferUsage.local_blks_written in `fcdda1e4b5`. Tests in core PG can't easily test this, as BufferUsage is just used for EXPLAIN (ANALYZE, BUFFERS) and pg_stat_statements. Thus this commit adds tests for this to pg_stat_statements. Reported-by: Karina Litskevich <litskevichkarina@gmail.com> Author: Karina Litskevich <litskevichkarina@gmail.com> Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CACiT8ibxXA6+0amGikbeFhm8B84XdQVo6D0Qfd1pQ1s8zpsnxQ@mail.gmail.com Backpatch: 16-, where `fcdda1e4b5` was merged	2023-09-13 19:14:09 -07:00
Michael Paquier	a0d87bcd9b	Improve error message on snapshot import in snapmgr.c When a snapshot file fails to be read in ImportSnapshot(), it would issue an ERROR as "invalid snapshot identifier" when opening a stream for it in read-only mode. This error message is reworded to be the same as all the other messages used in this case on failure, which is useful when debugging this area. Thinko introduced by `bb446b689b` where snapshot imports have been added. A backpatch down to 11 is done as this can improve any work related to snapshot imports in older branches. Author: Bharath Rupireddy Reviewed-by: Daniel Gustafsson Discussion: https://postgr.es/m/CALj2ACWmr=3KdxDkm8h7Zn1XxBoF6hdzq8WQyMn2y1OL5RYFrg@mail.gmail.com Backpatch-through: 11	2023-09-14 10:30:08 +09:00
Michael Paquier	b8f44a4779	Refactor error messages for unsupported providers in pg_locale.c These code paths should not be reached normally, but if they are an error with "(null)" as information for the collation provider would show up if no locale is set, while we can assume that we are referring to libc. This refactors the code so as the provider is always reported even if no locale is set. The name of the function where the error happens is added, while on it, as it can be helpful for debugging. Issue introduced by `d87d548cd0`, so backpatch down to 16. Author: Michael Paquier, Ranier Vilela Reviewed-by: Jeff Davis, Kyotaro Horiguchi Discussion: https://postgr.es/m/7073610042fcf97e1bea2ce08b7e0214b5e11094.camel@j-davis.com Backpatch-through: 16	2023-09-14 08:35:02 +09:00
David Rowley	ee3a551e96	Fix incorrect logic in plan dependency recording Both `50e17ad28` and `29f45e299` mistakenly tried to record a plan dependency on a function but mistakenly inverted the OidIsValid test. This meant that we'd record a dependency only when the function's Oid was InvalidOid. Clearly this was meant to not record the dependency in that case. `50e17ad28` made this mistake first, then in v15 `29f45e299` copied the same mistake. Reported-by: Tom Lane Backpatch-through: 14, where `50e17ad28` first made this mistake Discussion: https://postgr.es/m/2277537.1694301772@sss.pgh.pa.us	2023-09-14 11:27:29 +12:00
Amit Kapila	f062cddafe	Fix the ALTER SUBSCRIPTION to reflect the change in run_as_owner option. Reported-by: Jeff Davis Author: Hou Zhijie Reviewed-by: Amit Kapila Backpatch-through: 16 Discussion: http://postgr.es/m/17b62714fd115bd1899afd922954540a5c6a0467.camel@j-davis.com	2023-09-13 09:34:30 +05:30
Thomas Munro	3acd0599bd	Fix exception safety bug in typcache.c. If an out-of-memory error was thrown at an unfortunate time, ensure_record_cache_typmod_slot_exists() could leak memory and leave behind a global state that produced an infinite loop on the next call. Fix by merging RecordCacheArray and RecordIdentifierArray into a single array. With only one allocation or re-allocation, there is no intermediate state. Back-patch to all supported releases. Reported-by: "James Pang (chaolpan)" <chaolpan@cisco.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/PH0PR11MB519113E738814BDDA702EDADD6EFA%40PH0PR11MB5191.namprd11.prod.outlook.com	2023-09-13 14:58:22 +12:00
Michael Paquier	e434e21e11	Remove redundant assignments in copyfrom.c The tuple descriptor and the number of attributes are assigned twice to the same values in BeginCopyFrom(), for what looks like a small thinko coming from the refactoring done in `c532d15ddd`. Author: Jingtang Zhang Discussion: https://postgr.es/m/CAPsk3_CrYeXUVHEiaWAYxY9BKiGvGT3AoXo_+Jm0xP_s_VmXCA@mail.gmail.com	2023-09-09 21:12:41 +09:00
Daniel Gustafsson	5a3423ad8e	Add JIT deform_counter generation_counter includes time spent on both JIT:ing expressions and tuple deforming which are configured independently via options jit_expressions and jit_tuple_deforming. As they are combined in the same counter it's not apparent what fraction of time the tuple deforming takes. This adds deform_counter dedicated to tuple deforming, which allows seeing more directly the influence jit_tuple_deforming is having on the query. The counter is exposed in EXPLAIN and pg_stat_statements bumpin pg_stat_statements to 1.11. Author: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/20220612091253.eegstkufdsu4kfls@erthalion.local	2023-09-08 15:05:12 +02:00
Thomas Munro	04a09ee944	Teach WaitEventSetWait() to report multiple events on Windows. The WAIT_USE_WIN32 implementation of WaitEventSetWait() previously reported at most one event per call, because that's what the underlying WaitForMultipleObjects() call does. We can make the behavior match the three Unix implementations by looping until our output buffer is full, or there are no more events available now. This makes no difference to most callers including the regular FEBE socket code, since they ask for at most one event anyway. A difference in socket accept priority might be perceived by end users after commit `7389aad6` started using WaitEventSet in the postmaster. With this commit, the accept order now matches Unix systems, servicing listening sockets in round-robin order. We decided it wasn't really a bug or worth back-patching, but it seems good to align the behavior across platforms. Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version) Tested-by: "Wei Wang (Fujitsu)" <wangw.fnst@fujitsu.com> Discussion: https://postgr.es/m/CA%2BhUKG%2BA2dk29hr5zRP3HVJQ-_PncNJM6HVQ7aaYLXLRBZU-xw%40mail.gmail.com	2023-09-08 18:49:08 +12:00
Thomas Munro	9f0602539d	Remove some more "snapshot too old" vestiges. Commit `f691f5b8` removed the logic, but left behind some now-useless Snapshot arguments to various AM-internal functions, and missed a couple of comments. Reported-by: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-Wznj9qSNXZ1P1uWTUD_FeaTezbUazb416EPwi4Qr_jR_6A%40mail.gmail.com	2023-09-08 17:12:12 +12:00
Michael Paquier	e722846daf	Improve BackendXidGetPid() to only access allProcs on matching XID Compilers are able to optimize that, but it makes the code slightly more readable this way. Author: Zhao Junwang Reviewed-by: Ashutosh Bapat Discussion: https://postgr.es/m/CAEG8a3+i9gtqF65B+g_puVaCQuf0rZC-EMqMyEjGFJYOqUUWfA@mail.gmail.com	2023-09-08 10:00:29 +09:00
Robert Haas	9caf042088	Reorder tests in get_cheapest_path_for_pathkeys(). Checking parallel safety should be even cheaper than cost comparison, so do that first. Also make some minor, related comment improvements. Richard Guo, reviewed by Aleksander Alekseev, Andy Fan, and me. Discussion: http://postgr.es/m/CAMbWs4-KE2wf4QPj_Sr5mX4QFtBNNKGmxK=+e=KZEGUjdG33=g@mail.gmail.com	2023-09-07 13:51:35 -04:00
Alvaro Herrera	ac22a9545c	Move privilege check to the right place Now that ATExecDropConstraint doesn't recurse anymore, so it's wrong to test privileges "during recursion" there. Move the check to dropconstraint_internal, which is the place where recursion occurs. In passing, remove now-useless 'recursing' argument to ATExecDropConstraint. Discussion: https://postgr.es/m/202309051744.y4mndw5gwzhh@alvherre.pgsql	2023-09-07 12:15:18 +02:00
Alvaro Herrera	3af7217942	Update information_schema definition for not-null constraints Now that we have catalogued not-null constraints, our information_schema definition can be updated to grab those rather than fabricate synthetic definitions. Note that we still don't have catalog rows for not-null constraints on domains, but we've never had not-null constraints listed in information_schema, so that's a problem to be solved separately. Co-authored-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/81b461c4-edab-5d8c-2f88-203108425340@enterprisedb.com Discussion: https://postgr.es/m/202309041710.psytrxlsiqex@alvherre.pgsql	2023-09-07 11:33:01 +02:00
Thomas Munro	0da096d78e	Fix recovery conflict SIGUSR1 handling. We shouldn't be doing non-trivial work in signal handlers in general, and in this case the handler could reach unsafe code and corrupt state. It also clobbered its own "reason" code. Move all recovery conflict decision logic into the next CHECK_FOR_INTERRUPTS(), and have the signal handler just set flags and the latch, following the standard pattern. Since there are several different "reasons", use a separate flag for each. With this refactoring, the recovery conflict system no longer piggy-backs on top of the regular query cancelation mechanism, but instead raises an error directly if it decides that is necessary. It still needs to respect QueryCancelHoldoffCount, because otherwise the FEBE protocol might get out of sync (see commit `2b3a8b20c2`). This fixes one class of intermittent failure in the new 031_recovery_conflict.pl test added by commit `9f8a050f`, though the buggy coding is much older. Failures outside contrived testing seem to be very rare (or perhaps incorrectly attributed) in the field, based on lack of reports. No back-patch for now due to complexity and release schedule. We have the option to back-patch into 16 later, as 16 has prerequisite commit `bea3d7e`. Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version) Reviewed-by: Michael Paquier <michael@paquier.xyz> (earlier version) Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier version) Tested-by: Christoph Berg <myon@debian.org> Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com Discussion: https://postgr.es/m/CALj2ACVr8au2J_9D88UfRCi0JdWhyQDDxAcSVav0B0irx9nXEg%40mail.gmail.com	2023-09-07 12:39:24 +12:00
Nathan Bossart	3ed1956719	Make enum for sync methods available to frontend code. This commit renames RecoveryInitSyncMethod to DataDirSyncMethod and moves it to common/file_utils.h. This is preparatory work for a follow-up commit that will allow specifying the synchronization method in frontend utilities such as pg_upgrade and pg_basebackup. Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/ZN2ZB4afQ2JbR9TA%40paquier.xyz	2023-09-06 16:26:39 -07:00
Michael Paquier	59cbf60c0f	Remove column for wait event names in wait_event_names.txt This file is now made of two columns, removing the column listing the user-visible strings used in the system views and the documentation: - Enum definitions for each class without the prefix "WAIT_EVENT_", so as this information can be grepped in the code and wait_event_names.txt at the same time. - Description in the documentation. The wait event names are now generated from the enum objects in CamelCase, with the underscores removed. The data generated for wait events is consistent with what was produced by `414f6c0fb7`. This has the advantage to remove WAIT_EVENT_DOCONLY, which was a placeholder for the wait event types Lock and LWLock as these two only require the generation of the documentation. Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/ZOxVHQwEC/9X/p/z@paquier.xyz	2023-09-06 10:27:02 +09:00
Michael Paquier	414f6c0fb7	Use more consistent names for wait event objects and types The event names use the same case-insensitive characters, hence applying lower() or upper() to the monitoring queries allows the detection of the same events as before this change. It is possible to cross-check the data with the system view pg_wait_events, for instance, with a query like that showing no differences: SELECT lower(type), lower(name), description FROM pg_wait_events ORDER BY 1, 2; This will help in the introduction of more simplifications in the format of wait_event_names. Some of the enum values in the code had to be renamed a bit to follow the same convention naming across the board. Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/ZOxVHQwEC/9X/p/z@paquier.xyz	2023-09-06 10:04:43 +09:00
Nathan Bossart	f39b265808	Move PG_TEMP_FILE* macros to file_utils.h. Presently, frontend code that needs to use these macros must either include storage/fd.h, which declares several frontend-unsafe functions, or duplicate the macros. This commit moves these macros to common/file_utils.h, which is safe for both frontend and backend code. Consequently, we can also remove the duplicated macros in pg_checksums and stop including storage/fd.h in pg_rewind. Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/ZOP5qoUualu5xl2Z%40paquier.xyz	2023-09-05 17:02:06 -07:00
Nathan Bossart	119c23eb98	Replace known_assigned_xids_lck with memory barriers. This lock was introduced before memory barrier support was added, and it is only used to guarantee proper memory ordering when KnownAssignedXidsAdd() appends to the array without a lock. Now that such memory barrier support exists, we can remove the lock and use barriers instead. Suggested-by: Tom Lane Author: Michail Nikolaev Reviewed-by: Robert Haas Discussion: https://postgr.es/m/CANtu0oh0si%3DjG5z_fLeFtmYcETssQ08kLEa8b6TQqDm_cinroA%40mail.gmail.com	2023-09-05 13:59:06 -07:00
Thomas Munro	f691f5b80a	Remove the "snapshot too old" feature. Remove the old_snapshot_threshold setting and mechanism for producing the error "snapshot too old", originally added by commit `848ef42b`. Unfortunately it had a number of known problems in terms of correctness and performance, mostly reported by Andres in the course of his work on snapshot scalability. We agreed to remove it, after a long period without an active plan to fix it. This is certainly a desirable feature, and someone might propose a new or improved implementation in the future. Reported-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CACG%3DezYV%2BEvO135fLRdVn-ZusfVsTY6cH1OZqWtezuEYH6ciQA%40mail.gmail.com Discussion: https://postgr.es/m/20200401064008.qob7bfnnbu4w5cw4%40alap3.anarazel.de Discussion: https://postgr.es/m/CA%2BTgmoY%3Daqf0zjTD%2B3dUWYkgMiNDegDLFjo%2B6ze%3DWtpik%2B3XqA%40mail.gmail.com	2023-09-05 19:53:43 +12:00
Michael Paquier	aa0d350456	Improve description of keys in tsvector If all the bits of a key in a tsvector are true (marked with ALLISTRUE), gtsvectorout() would show the following description: "0 true bits, 0 false bits" This is confusing, as all the bits are true, but this would be equivalent to the information if siglen is 0. This commit improves the output so as "all true bits" show instead in this case. Alexander has proposed a regression test for pageinspect, not included here as it is rather expensive compared to its coverage value. Author: Alexander Lakhin Discussion: https://postgr.es/m/17950-6c80a8d2b94ec695@postgresql.org	2023-09-05 13:57:44 +09:00
Michael Paquier	ae10dbb0c5	Fix out-of-bound read in gtsvector_picksplit() This could lead to an imprecise choice when splitting an index page of a GiST index on a tsvector, deciding which entries should remain on the old page and which entries should move to a new page. This is wrong since tsearch2 has been moved into core with commit `140d4ebcb4`, so backpatch all the way down. This error has been spotted by valgrind. Author: Alexander Lakhin Discussion: https://postgr.es/m/17950-6c80a8d2b94ec695@postgresql.org Backpatch-through: 11	2023-09-04 14:55:37 +09:00
Amit Kapila	e70ed4b1b8	Fix typo in decode.c. Author: Hou Zhijie Discussion: http://postgr.es/m/OS0PR01MB57162DFFFCFCDA2E4B95899394E4A@OS0PR01MB5716.jpnprd01.prod.outlook.com	2023-09-04 09:06:15 +05:30
Michael Paquier	2b8e5273e9	Fix handling of shared statistics with dropped databases Dropping a database while a connection is attempted on it was able to lead to the presence of valid database entries in shared statistics. The issue is that MyDatabaseId was getting set too early than it should, as, if the connection attempted on the dropped database fails when renamed or dropped, the shutdown callback of the shared statistics would finish by re-inserting a correct entry related to the database already dropped. As analyzed by the bug reporters, this issue could lead to phantom entries in the database list maintained by the autovacuum launcher (in rebuild_database_list()) if the database dropped was part of the database list when it was still valid. After the database was dropped, it would remain the highest on the list of databases to considered by the autovacuum worker as things to process. This would prevent autovacuum jobs to happen on all the other databases still present. The commit fixes this issue by delaying setting MyDatabaseId until the database existence has been re-checked with the second scan on pg_database after getting a shared lock on it, and by switching pgstat_update_dbstats() so as nothing happens if MyDatabaseId is not valid. Issue introduced by `5891c7a8ed`, so backpatch down to 15. Reported-by: Will Mortensen, Jacob Speidel Analyzed-by: Will Mortensen, Jacob Speidel Author: Andres Freund Discussion: https://postgr.es/m/17973-bca1f7d5c14f601e@postgresql.org Backpatch-through: 15	2023-09-04 08:04:22 +09:00
Alvaro Herrera	d0ec2ddbe0	Fix not-null constraint test When a partitioned table has a primary key, trying to find the corresponding not-null constraint for that column would come up empty, causing code that's trying to check said not-null constraint to crash. Fix by only running the check when the not-null constraint exists. Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/d57b4a69-7394-3146-5976-9a1ef27e7972@gmail.com	2023-09-01 19:49:20 +02:00
Alvaro Herrera	e09d763e25	ATPrepAddPrimaryKey: ignore non-PK constraints Because of lack of test coverage, this function added by `b0e96f3119` wasn't ignoring constraint types other than primary keys, which it should have. Add some lines to a test for it. Reported-by: Richard Guo <guofenglinux@gmail.com> Discussion: https://postgr.es/m/CAMbWs48bc-k_-1fh0dZpAhp_LiR5MfEX9haystmoBboR_4czCQ@mail.gmail.com	2023-09-01 14:21:27 +02:00
Heikki Linnakangas	e8d74ad625	Report syncscan position at end of scan. The comment in heapgettup_advance_block() says that it reports the scan position before checking for end of scan, but that didn't match the code. The code was refactored in commit `7ae0ab0ad9`, which inadvertently changed the order of the check and reporting. Change it back. This caused a few regression test failures with a small shared_buffers setting like 10 MB. The 'portals' and 'cluster' tests perform seqscans that are large enough that sync seqscans kick in. When the sync scan position is not updated at end of scan, the next seq scan doesn't start at the beginning of the table, and the test queries are sensitive to that. Reviewed-by: Melanie Plageman, David Rowley Discussion: https://www.postgresql.org/message-id/6f991389-ae22-d844-a9d8-9aceb7c01a9a@iki.fi Backpatch-through: 16	2023-08-31 13:02:15 +03:00
Peter Eisentraut	d7ceb41b9b	Correct ObjectProperty entry for transforms There was some confusion in the ObjectProperty entry for transforms. Some fields had values that were apparently meant for a different field. Also, some fields were not assigned, which is okay for most fields, but not for all. In particular, for .oid_catcache_id, .name_catcache_id, and .objtype, zero is a valid value, so we need to use -1 if not applicable. It has apparently been like that from the very beginning (commit `cac7658205`). The faulty values were not actually reachable, so it's not a big problem in practice, but we should make it correct. Discussion: https://www.postgresql.org/message-id/flat/75ae5875-3abc-dafc-8aec-73247ed41cde@eisentraut.org	2023-08-31 11:11:59 +02:00
Peter Eisentraut	f94dec76cc	genbki.pl: Factor out boilerplate generation Discussion: https://www.postgresql.org/message-id/flat/75ae5875-3abc-dafc-8aec-73247ed41cde@eisentraut.org	2023-08-31 10:21:34 +02:00
Peter Eisentraut	226d0a6b98	Restructure DECLARE_INDEX arguments Separate the table name from the index declaration. We need that anyway later for the ALTER TABLE / USING INDEX commands, so we might as well structure the declarations like that to begin with. Discussion: https://www.postgresql.org/message-id/flat/75ae5875-3abc-dafc-8aec-73247ed41cde@eisentraut.org	2023-08-31 08:14:57 +02:00
Michael Paquier	b5934bfd60	Fix some shadow variables in src/backend/replication/ The code is able to compile already without warnings under -Wshadow=compatible-local, which is itself already enabled in the tree, and the ones fixed here showed up with the more restrictive -Wshadow. There are more of these that we may want to look at, and the ones fixed here made the code confusing. Author: Peter Smith Discussion: https://postgr.es/m/CAHut+PuR0y4ofNOxi691VTVWmBfScHV9AaBMGSpeh8+DKp81Nw@mail.gmail.com	2023-08-31 08:07:48 +09:00
Nathan Bossart	d0fe3046ee	Use actual backend IDs in pg_stat_get_backend_subxact(). Unlike the other pg_stat_get_backend* functions, pg_stat_get_backend_subxact() looks up the backend entry by using its integer argument as a 1-based index in an internal array. The other functions look for the entry with the matching session backend ID. These numbers often match, but that isn't reliably true. This commit resolves this discrepancy by introducing pgstat_get_local_beentry_by_backend_id() and using it in pg_stat_get_backend_subxact(). We cannot use pgstat_get_beentry_by_backend_id() because it returns a PgBackendStatus, which lacks the locally computed additions available in LocalPgBackendStatus that are required by pg_stat_get_backend_subxact(). Author: Ian Barwick Reviewed-by: Sami Imseih, Michael Paquier, Robert Haas Discussion: https://postgr.es/m/CAB8KJ%3Dj-ACb3H4L9a_b3ZG3iCYDW5aEu3WsPAzkm2S7JzS1Few%40mail.gmail.com Backpatch-through: 16	2023-08-30 14:47:01 -07:00
Nathan Bossart	3d51cb5197	Rename some support functions for pgstat* views. Presently, pgstat_fetch_stat_beentry() accepts a session's backend ID as its argument, and pgstat_fetch_stat_local_beentry() accepts a 1-based index in an internal array as its argument. The former is typically used wherever a user must provide a backend ID, and the latter is usually used internally when looping over all entries in the array. This difference was first introduced by `d7e39d72ca`. Before that commit, both functions accepted a 1-based index to the internal array. This commit renames these two functions to make it clear whether they use the backend ID or the 1-based index to look up the entry. This is preparatory work for a follow-up change that will introduce a function for looking up a LocalPgBackendStatus using a backend ID. Reviewed-by: Ian Barwick, Sami Imseih, Michael Paquier, Robert Haas Discussion: https://postgr.es/m/CAB8KJ%3Dj-ACb3H4L9a_b3ZG3iCYDW5aEu3WsPAzkm2S7JzS1Few%40mail.gmail.com Backpatch-through: 16	2023-08-30 14:46:52 -07:00
Peter Eisentraut	fe39f43352	Fix possible compiler warning related to `1fa9241bdd`	2023-08-30 16:21:21 +02:00
Nathan Bossart	6475e663b3	Fix misuse of PqMsg_Close. EndCommand() and EndReplicationCommand() should use PqMsg_CommandComplete instead. Oversight in commit `f4b54e1ed9`. Reported-by: Pavel Stehule, Tatsuo Ishii Author: Pavel Stehule Reviewed-by: Aleksander Alekseev, Michael Paquier Discussion: https://postgr.es/m/CAFj8pRAMDCJXjnwiCkCB1yO1f7NPggFY8PwwAJDnugu-Z2G-Cg%40mail.gmail.com	2023-08-29 18:32:38 -07:00
Michael Paquier	52c6c0f196	Avoid possible overflow with ltsGetFreeBlock() in logtape.c nFreeBlocks, defined as a long, stores the number of free blocks in a logical tape. ltsGetFreeBlock() has been using an int to store the value of nFreeBlocks, which could lead to overflows on platforms where long and int are not the same size (in short everything except Windows where long is 4 bytes). The problematic intermediate variable is switched to be a long instead of an int. Issue introduced by `c02fdc9223`, so backpatch down to 13. Author: Ranier vilela Reviewed-by: Peter Geoghegan, David Rowley Discussion: https://postgr.es/m/CAEudQApLDWCBR_xmwNjGBrDo+f+S4E87x3s7-+hoaKqYdtC4JQ@mail.gmail.com Backpatch-through: 13	2023-08-30 08:03:42 +09:00
Alvaro Herrera	9b581c5341	Disallow changing NO INHERIT status of a not-null constraint It makes no sense to add a NO INHERIT not-null constraint to a child table that already has one in that column inherited from its parent. Disallow that, and add tests for the relevant cases. Per complaint from Kyotaro Horiguchi. I also used part of his proposed patch. Co-authored-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20230828.161658.1184657435220765047.horikyota.ntt@gmail.com	2023-08-29 19:19:24 +02:00
Peter Eisentraut	63956bed7b	Rename logical_replication_mode to debug_logical_replication_streaming The logical_replication_mode GUC is intended for testing and debugging purposes, but its current name may be misleading and encourage users to make unnecessary changes. To avoid confusion, renaming the GUC to a less misleading name debug_logical_replication_streaming that casual users are less likely to mistakenly assume needs to be modified in a regular logical replication setup. Author: Hou Zhijie <houzj.fnst@cn.fujitsu.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/d672d774-c44b-6fec-f993-793e744f169a%40eisentraut.org	2023-08-29 15:19:56 +02:00
Peter Eisentraut	6844d3275a	Remove useless if condition We can call GetAttributeCompression() with a NULL argument. It handles that internally already. This change makes all the callers of GetAttributeCompression() uniform. Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org	2023-08-29 08:58:56 +02:00
Peter Eisentraut	689c66a84b	Remove useless if condition This is useless because these fields are not set anywhere before, so we can assign them unconditionally. This also makes this more consistent with ATExecAddColumn(). Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org	2023-08-29 08:52:22 +02:00
Peter Eisentraut	1fa9241bdd	Make more use of makeColumnDef() Since we already have it, we might as well make full use of it, instead of assembling ColumnDef by hand in several places. Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org	2023-08-29 08:45:05 +02:00
Peter Eisentraut	2b088c8e4a	Add some const decorations Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org	2023-08-29 08:30:45 +02:00
Heikki Linnakangas	5fec3c870e	Initialize ListenSocket array earlier. After commit `b0bea38705`, syslogger prints 63 warnings about failing to close a listen socket at postmaster startup. That's because the syslogger process forks before the ListenSockets array is initialized, so ClosePostmasterPorts() calls "close(0)" 64 times. The first call succeeds, because fd 0 is stdin. This has been like this since commit `9a86f03b4e` in version 13, which moved the SysLogger_Start() call to before initializing ListenSockets. We just didn't notice until commit `b0bea38705` added the LOG message. Reported by Michael Paquier and Jeff Janes. Author: Michael Paquier Discussion: https://www.postgresql.org/message-id/ZOvvuQe0rdj2slA9%40paquier.xyz Discussion: https://www.postgresql.org/message-id/ZO0fgDwVw2SUJiZx@paquier.xyz#482670177eb4eaf4c9f03c1eed963e5f Backpatch-through: 13	2023-08-29 09:09:40 +03:00
Michael Paquier	f593c5517d	Tweak pg_promote() to report failures on kill() or postmaster failures Since its introduction in `10074651e3`, pg_promote() has been returning a false status in three cases: - SIGUSR1 not sent to the postmaster process. - Postmaster death during standby promotion. - Standby not promoted within the specified wait time. An application calling this function will have a hard time understanding what a false state returned actually means. Per discussion, this switches the two first states to fail rather than return a "false" status, making the second case more consistent with the existing CHECK_FOR_INTERRUPTS in the wait loop. False is only returned when the promotion is not completed within the specified time (60s by default). Author: Ashutosh Sharma Reviewed-by: Fujii Masao, Laurenz Albe, Michael Paquier Discussion: https://postgr.es/m/CAE9k0P=QTrwptL0t4J0fuBRDDjgsT-0PVKd-ikd96i1hyL7Bcg@mail.gmail.com	2023-08-29 08:45:04 +09:00
Peter Eisentraut	36e4419d1f	Make error messages about WAL segment size more consistent Make the primary messages more compact and make the detail messages uniform. In initdb.c and pg_resetwal.c, use the newish option_parse_int() to simplify some of the option parsing. For the backend GUC wal_segment_size, add a GUC check hook to do the verification instead of coding it in bootstrap.c. This might be overkill, but that way the check is in the right place and it becomes more self-documenting. In passing, make pg_controldata use the logging API for warning messages. Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Discussion: https://www.postgresql.org/message-id/flat/9939aa8a-d7be-da2c-7715-0a0b5535a1f7@eisentraut.org	2023-08-28 15:17:04 +02:00
Michael Paquier	bb9002257b	Fix some typos in wait_event_names.txt Noticed in passing, while hacking on a different patch touching this area.	2023-08-28 17:09:12 +09:00
Michael Paquier	617f9b7d4b	Tighten unit parsing in internal values Interval values now generate an error when the user has multiple consecutive units or a unit without a value. Previously, it was possible to specify multiple units consecutively which is contrary to what the documentation allows, so it was possible to finish with confusing interval values. This is a follow-up of the work done in `165d581f14`. Author: Joseph Koshakow Reviewed-by: Jacob Champion, Gurjeet Singh, Reid Thompson Discussion: https://postgr.es/m/CAAvxfHd-yNO+XYnUxL=GaNZ1n+eE0V-oE0+-cC1jdjdU0KS3iw@mail.gmail.com	2023-08-28 14:27:17 +09:00
Michael Paquier	165d581f14	Tighten handling of "ago" in interval values This commit Restrict the unit "ago" to only appear at the end of the interval. According to the documentation, a direction can only be defined at the end of an interval, but it was possible to define it in the middle of the string or define it multiple times. In spirit, this is similar to the error handling improvements done in `5b3c595355` or `bcc704b524`. Author: Joseph Koshakow Reviewed-by: Jacob Champion, Gurjeet Singh, Reid Thompson Discussion: https://postgr.es/m/CAAvxfHd-yNO+XYnUxL=GaNZ1n+eE0V-oE0+-cC1jdjdU0KS3iw@mail.gmail.com	2023-08-28 13:49:55 +09:00
Peter Eisentraut	9a0ddc39c6	Format list of catalog files in makefile vertically This makes it easier to compare the lists visually with the corresponding meson lists. In passing, copy over some relevant comments from the makefiles to meson.build. Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://www.postgresql.org/message-id/flat/a306be82-ee71-4554-d499-49a45a654396%40eisentraut.org	2023-08-28 06:20:56 +02:00
Michael Paquier	d6d1430f40	Remove dead code in DecodeInterval() This commit removes some dead code related to the unit type RESERV, whose last use has been removed from the unit lookup table used for intervals ("deltatktbl" in datetime.c) in `666cbae16d`. Before that, RESERV was used as an equivalent of "invalid", but that's now unreachable. Author: Joseph Koshakow Reviewed-by: Jacob Champion, Gurjeet Singh, Reid Thompson Discussion: https://postgr.es/m/CAAvxfHd-yNO+XYnUxL=GaNZ1n+eE0V-oE0+-cC1jdjdU0KS3iw@mail.gmail.com	2023-08-28 12:53:41 +09:00
Michael Paquier	bb45156f34	Show names of DEALLOCATE as constants in pg_stat_statements This commit switches query jumbling so as prepared statement names are treated as constants in DeallocateStmt. A boolean field is added to DeallocateStmt to make a distinction between ALL and named prepared statements, as "name" was used to make this difference before, NULL meaning DEALLOCATE ALL. Prior to this commit, DEALLOCATE was not tracked in pg_stat_statements, for the reason that it was not possible to treat its name parameter as a constant. Now that query jumbling applies to all the utility nodes, this reason does not apply anymore. Like `638d42a3c5`, this can be a huge advantage for monitoring where prepared statement names are randomly generated, preventing bloat in pg_stat_statements. A couple of tests are added to track the new behavior. Author: Dagfinn Ilmari Mannsåker, Michael Paquier Reviewed-by: Julien Rouhaud Discussion: https://postgr.es/m/ZMhT9kNtJJsHw6jK@paquier.xyz	2023-08-27 17:27:44 +09:00
Michael Paquier	e48b19c5db	Generate new LOG for "trust" connections under log_connections Adding an extra LOG for connections that have not set an authn ID, like when the "trust" authentication method is used, is useful for audit purposes. A couple of TAP tests for SSL and authentication need to be tweaked to adapt to this new LOG generated, as some scenarios expected no logs but they now get a hit. Reported-by: Shaun Thomas Author: Jacob Champion Reviewed-by: Robert Haas, Michael Paquier Discussion: https://postgr.es/m/CAFdbL1N7-GF-ZXKaB3XuGA+CkSmnjFvqb8hgjMnDfd+uhL2u-A@mail.gmail.com	2023-08-26 20:11:19 +09:00
Alvaro Herrera	b0e96f3119	Catalog not-null constraints We now create contype='n' pg_constraint rows for not-null constraints. We propagate these constraints to other tables during operations such as adding inheritance relationships, creating and attaching partitions and creating tables LIKE other tables. We also spawn not-null constraints for inheritance child tables when their parents have primary keys. These related constraints mostly follow the well-known rules of conislocal and coninhcount that we have for CHECK constraints, with some adaptations: for example, as opposed to CHECK constraints, we don't match not-null ones by name when descending a hierarchy to alter it, instead matching by column name that they apply to. This means we don't require the constraint names to be identical across a hierarchy. For now, we omit them for system catalogs. Maybe this is worth reconsidering. We don't support NOT VALID nor DEFERRABLE clauses either; these can be added as separate features later (this patch is already large and complicated enough.) psql shows these constraints in \d+. pg_dump requires some ad-hoc hacks, particularly when dumping a primary key. We now create one "throwaway" not-null constraint for each column in the PK together with the CREATE TABLE command, and once the PK is created, all those throwaway constraints are removed. This avoids having to check each tuple for nullness when the dump restores the primary key creation. pg_upgrading from an older release requires a somewhat brittle procedure to create a constraint state that matches what would be created if the database were being created fresh in Postgres 17. I have tested all the scenarios I could think of, and it works correctly as far as I can tell, but I could have neglected weird cases. This patch has been very long in the making. The first patch was written by Bernd Helmle in 2010 to add a new pg_constraint.contype value ('n'), which I (Álvaro) then hijacked in 2011 and 2012, until that one was killed by the realization that we ought to use contype='c' instead: manufactured CHECK constraints. However, later SQL standard development, as well as nonobvious emergent properties of that design (mostly, failure to distinguish them from "normal" CHECK constraints as well as the performance implication of having to test the CHECK expression) led us to reconsider this choice, so now the current implementation uses contype='n' again. During Postgres 16 this had already been introduced by commit `e056c557ae`, but there were some problems mainly with the pg_upgrade procedure that couldn't be fixed in reasonable time, so it was reverted. In 2016 Vitaly Burovoy also worked on this feature[1] but found no consensus for his proposed approach, which was claimed to be closer to the letter of the standard, requiring an additional pg_attribute column to track the OID of the not-null constraint for that column. [1] https://postgr.es/m/CAKOSWNkN6HSyatuys8xZxzRCR-KL1OkHS5-b9qd9bf1Rad3PLA@mail.gmail.com Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Author: Bernd Helmle <mailings@oopsware.de> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>	2023-08-25 13:31:24 +02:00
Amit Kapila	9c13b6814a	Reset the logical worker type while cleaning up other worker info. Commit `2a8b40e36` introduces the worker type field for logical replication workers, but forgot to reset the type when the worker exits. This can lead to recognizing a stopped worker as a valid logical replication worker. Fix it by resetting the worker type and additionally adding the safeguard to not use LogicalRepWorker until ->in_use is verified. Reported-by: Thomas Munro based on cfbot reports. Author: Hou Zhijie, Alvaro Herrera Reviewed-by: Amit Kapila Discussion: http://postgr.es/m/CA+hUKGK2RQh4LifVgBmkHsCYChP-65UwGXOmnCzYVa5aAt4GWg@mail.gmail.com	2023-08-25 08:57:55 +05:30
Tom Lane	d8b2fcc9d4	Avoid unnecessary plancache revalidation of utility statements. Revalidation of a plancache entry (after a cache invalidation event) requires acquiring a snapshot. Normally that is harmless, but not if the cached statement is one that needs to run without acquiring a snapshot. We were already aware of that for TransactionStmts, but for some reason hadn't extrapolated to the other statements that PlannedStmtRequiresSnapshot() knows mustn't set a snapshot. This can lead to unexpected failures of commands such as SET TRANSACTION ISOLATION LEVEL. We can fix it in the same way, by excluding those command types from revalidation. However, we can do even better than that: there is no need to revalidate for any statement type for which parse analysis, rewrite, and plan steps do nothing interesting, which is nearly all utility commands. To mechanize this, invent a parser function stmt_requires_parse_analysis() that tells whether parse analysis does anything beyond wrapping a CMD_UTILITY Query around the raw parse tree. If that's what it does, then rewrite and plan will just skip the Query, so that it is not possible for the same raw parse tree to produce a different plan tree after cache invalidation. stmt_requires_parse_analysis() is basically equivalent to the existing function analyze_requires_snapshot(), except that for obscure reasons that function omits ReturnStmt and CallStmt. It is unclear whether those were oversights or intentional. I have not been able to demonstrate a bug from not acquiring a snapshot while analyzing these commands, but at best it seems mighty fragile. It seems safer to acquire a snapshot for parse analysis of these commands too, which allows making stmt_requires_parse_analysis and analyze_requires_snapshot equivalent. In passing this fixes a second bug, which is that ResetPlanCache would exclude ReturnStmts and CallStmts from revalidation. That's surely not safe, since they contain parsable expressions. Per bug #18059 from Pavel Kulakov. Back-patch to all supported branches. Discussion: https://postgr.es/m/18059-79c692f036b25346@postgresql.org	2023-08-24 12:02:46 -04:00
Heikki Linnakangas	b0bea38705	Use FD_CLOEXEC on ListenSockets It's good hygiene if e.g. an extension launches a subprogram when being loaded. We went through some effort to close them in the child process in EXEC_BACKEND mode, but it's better to not hand them down to the child process in the first place. We still need to close them after fork when !EXEC_BACKEND, but it's a little simpler. In the passing, LOG a message if closing the client connection or listen socket fails. Shouldn't happen, but if it does, would be nice to know. Reviewed-by: Tristan Partin, Andres Freund, Thomas Munro Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi	2023-08-24 17:03:05 +03:00
Peter Eisentraut	d71e6055e4	Fix lack of message pluralization	2023-08-24 14:22:46 +02:00
Peter Eisentraut	4f3514f201	Rename hook functions for debug_io_direct to match variable name. Commit `319bae9a` renamed the GUC. Rename the check and assign functions to match, and alphabetize. Back-patch to 16. Author: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/2769341e-fa28-c2ee-3e4b-53fdcaaf2271%40eisentraut.org	2023-08-24 22:25:49 +12:00
Amit Kapila	27449ccc4d	Fix the error message when failing to restore the snapshot. The SnapBuildRestoreContents() used a const value in the error message to indicate the size in bytes it was expecting to read from the serialized snapshot file. Fix it by reporting the size that was actually passed. Author: Hou Zhijie Reviewed-by: Amit Kapila Backpatch-through: 16 Discussion: http://postgr.es/m/OS0PR01MB5716D408364F7DF32221C08D941FA@OS0PR01MB5716.jpnprd01.prod.outlook.com	2023-08-24 14:37:29 +05:30
Peter Eisentraut	9efcf442b9	Fix translation markers Conditionals cannot be inside gettext trigger functions, they must be applied outside.	2023-08-24 10:25:51 +02:00
Heikki Linnakangas	bf8bf6d0bd	Fix _bt_allequalimage() call within critical section. _bt_allequalimage() does complicated things, so it's not OK to call it in a critical section. Per buildfarm failure on 'prion', which uses -DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE options. Discussion: https://www.postgresql.org/message-id/6e5bbc08-cdfc-b2b3-9e23-1a914b9850a9@iki.fi Backpatch-through: 16, like commit `ccadf73163` that introduced this	2023-08-23 18:12:41 +03:00
Nathan Bossart	260a1f18da	Add to_bin() and to_oct(). This commit introduces functions for converting numbers to their equivalent binary and octal representations. Also, the base conversion code for these functions and to_hex() has been moved to a common helper function. Co-authored-by: Eric Radman Reviewed-by: Ian Barwick, Dag Lem, Vignesh C, Tom Lane, Peter Eisentraut, Kirk Wolak, Vik Fearing, John Naylor, Dean Rasheed Discussion: https://postgr.es/m/Y6IyTQQ/TsD5wnsH%40vm3.eradman.com	2023-08-23 07:49:03 -07:00
Heikki Linnakangas	ccadf73163	Use the buffer cache when initializing an unlogged index. Some of the ambuildempty functions used smgrwrite() directly, followed by smgrimmedsync(). A few small problems with that: Firstly, one is supposed to use smgrextend() when extending a relation, not smgrwrite(). It doesn't make much difference in production builds. smgrextend() updates the relation size cache, so you miss that, but that's harmless because we never use the cached relation size of an init fork. But if you compile with CHECK_WRITE_VS_EXTEND, you get an assertion failure. Secondly, the smgrwrite() calls were performed before WAL-logging, so the page image written to disk had 0/0 as the LSN, not the LSN of the WAL record. That's also harmless in practice, but seems sloppy. Thirdly, it's better to use the buffer cache, because then you don't need to smgrimmedsync() the relation to disk, which adds latency. Bypassing the cache makes sense for bulk operations like index creation, but not when you're just initializing an empty index. Creation of unlogged tables is hardly performance bottleneck in any real world applications, but nevertheless. Backpatch to v16, but no further. These issues should be harmless in practice, so better to not rock the boat in older branches. Reviewed-by: Robert Haas Discussion: https://www.postgresql.org/message-id/6e5bbc08-cdfc-b2b3-9e23-1a914b9850a9@iki.fi	2023-08-23 17:21:31 +03:00
Daniel Gustafsson	27a36f79b6	Fix wording in comment The comment for the DSM_OP_CREATE paramater read "the a new handle" which is confusing. Fix by rewording to indicate what the parameter means for DSM_OP_CREATE. Reported-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CAEG8a3J2bc197ym-M_ykOXb9ox2eNn-QNKNeoSAoHYSw2NCOnw@mail.gmail.com	2023-08-23 10:22:55 +02:00
Peter Eisentraut	ae556c4416	Some vertical reformatting Remove some line breaks that have become unnecessary after some variable renaming. Discussion: https://www.postgresql.org/message-id/flat/5ed89c69-f4e6-5dab-4003-63bde7460e5e%40eisentraut.org	2023-08-23 06:39:39 +02:00
Peter Eisentraut	23382b0f8b	Rename some function arguments for better clarity Especially make sure that array arguments have plural names. Discussion: https://www.postgresql.org/message-id/flat/5ed89c69-f4e6-5dab-4003-63bde7460e5e%40eisentraut.org	2023-08-23 06:39:39 +02:00
Peter Eisentraut	11af63fb48	Add const decorations in index.c and indexcmds.c and some adjacent places. This especially makes it easier to understand for some complicated function signatures which are the input and the output arguments. Discussion: https://www.postgresql.org/message-id/flat/5ed89c69-f4e6-5dab-4003-63bde7460e5e%40eisentraut.org	2023-08-23 06:39:39 +02:00
Nathan Bossart	f4b54e1ed9	Introduce macros for protocol characters. This commit introduces descriptively-named macros for the identifiers used in wire protocol messages. These new macros are placed in a new header file so that they can be easily used by third-party code. Author: Dave Cramer Reviewed-by: Alvaro Herrera, Tatsuo Ishii, Peter Smith, Robert Haas, Tom Lane, Peter Eisentraut, Michael Paquier Discussion: https://postgr.es/m/CADK3HHKbBmK-PKf1bPNFoMC%2BoBt%2BpD9PH8h5nvmBQskEHm-Ehw%40mail.gmail.com	2023-08-22 19:16:12 -07:00
Thomas Munro	7114791158	ExtendBufferedWhat -> BufferManagerRelation. Commit `31966b15` invented a way for functions dealing with relation extension to accept a Relation in online code and an SMgrRelation in recovery code. It seems highly likely that future bufmgr.c interfaces will face the same problem, and need to do something similar. Generalize the names so that each interface doesn't have to re-invent the wheel. Back-patch to 16. Since extension AM authors might start using the constructor macros once 16 ships, we agreed to do the rename in 16 rather than waiting for 17. Reviewed-by: Peter Geoghegan <pg@bowt.ie> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKG%2B6tLD2BhpRWycEoti6LVLyQq457UL4ticP5xd8LqHySA%40mail.gmail.com	2023-08-23 12:31:23 +12:00
Andrew Dunstan	a684581085	Cache by-reference missing values in a long lived context Attribute missing values might be needed past the lifetime of the tuple descriptors from which they are extracted. To avoid possibly using pointers for by-reference values which might thus be left dangling, we cache a datumCopy'd version of the datum in the TopMemoryContext. Since we first search for the value this only needs to be done once per session for any such value. Original complaint from Tom Lane, idea for mitigation by Andrew Dunstan, tweaked by Tom Lane. Backpatch to version 11 where missing values were introduced. Discussion: https://postgr.es/m/1306569.1687978174@sss.pgh.pa.us	2023-08-22 15:17:05 -04:00
Alvaro Herrera	757fa45c86	Add comment missing in `a4a232b1e7` Noticed while studying nearby code	2023-08-22 12:22:21 +02:00
Amit Kapila	1cdc6d86bf	Simplify the logical worker type checks by using the switch on worker type. The current code uses if/else statements at various places to take worker specific actions. Change those to use the switch on worker type added by commit `2a8b40e368`. This makes code easier to read and understand. Author: Peter Smith Reviewed-by: Amit Kapila, Hou Zhijie Discussion: http://postgr.es/m/CAHut+PttPSuP0yoZ=9zLDXKqTJ=d0bhxwKaEaNcaym1XqcvDEg@mail.gmail.com	2023-08-22 08:50:44 +05:30
Michael Paquier	6fde2d9a00	Fix pg_stat_reset_single_table_counters() for shared relations This commit fixes the function of $subject for shared relations. This feature has been added by `e042678`. Unfortunately, this new behavior got removed by `5891c7a` when moving statistics to shared memory. Reported-by: Mitsuru Hinata Author: Masahiro Ikeda Reviewed-by: Kyotaro Horiguchi, Masahiko Sawada Discussion: https://postgr.es/m/7cc69f863d9b1bc677544e3accd0e4b4@oss.nttdata.com Backpatch-through: 15	2023-08-21 13:32:14 +09:00
Michael Paquier	1e68e43d3f	Add system view pg_wait_events This new view, wrapped around a SRF, shows some information known about wait events, as of: - Name. - Type (Activity, I/O, Extension, etc.). - Description. All the information retrieved comes from wait_event_names.txt, and the description is the same as the documentation with filters applied to remove any XML markups. This view is useful when joined with pg_stat_activity to get the description of a wait event reported. Custom wait events for extensions are included in the view. Original idea by Yves Colin. Author: Bertrand Drouvot Reviewed-by: Kyotaro Horiguchi, Masahiro Ikeda, Tom Lane, Michael Paquier Discussion: https://postgr.es/m/0e2ae164-dc89-03c3-cf7f-de86378053ac@gmail.com	2023-08-20 15:35:02 +09:00
Peter Eisentraut	881cd9e581	Remove dubious warning message from SQL/JSON functions There was a warning that FORMAT JSON has no effect on json/jsonb types, which is true, but it's not clear why we should issue a warning about it. The SQL standard does not say anything about this, which should generally govern the behavior here. So remove it. Discussion: https://www.postgresql.org/message-id/flat/dfec2cae-d17e-c508-6d16-c2dba82db486%40eisentraut.org	2023-08-18 07:41:14 +02:00
Michael Paquier	00e49233a9	Fix format if entry in wait_event_names.txt The entry LockManager had two successive whitespaces between two words. This is not an actual bug, but let's be clean. Thinko in `fa88928`. Reported-by: Masahiro Ikeda Author: Bertrand Drouvot Discussion: https://postgr.es/m/dd836027-2e9e-4df9-9fd9-7527cd1757e1@gmail.com	2023-08-18 08:11:10 +09:00
Thomas Munro	81e36d3e0d	Invalidate smgr_targblock in smgrrelease(). In rare circumstances involving relfilenode reuse, it might have been possible for smgr_targblock to finish up pointing past the end. Oversight in `b74e94dc`. Back-patch to 15. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CA%2BhUKGJ8NTvqLHz6dqbQnt2c8XCki4r2QvXjBQcXpVwxTY_pvA%40mail.gmail.com	2023-08-17 15:45:13 +12:00
Michael Paquier	352ea3acf8	Add OAT hook calls for more subcommands of ALTER TABLE The OAT hooks are added in ALTER TABLE for the following subcommands: - { ENABLE \| DISABLE \| [NO] FORCE } ROW LEVEL SECURITY - { ENABLE \| DISABLE } TRIGGER - { ENABLE \| DISABLE } RULE. Note that there was hook for pg_rewrite, but not for relation ALTER'ed in pg_class. Tests are added to test_oat_hook for all the subcommand patterns gaining hooks here. Based on an ask from Legs Mansion. Discussion: https://postgr.es/m/tencent_083B3850655AC6EE04FA0A400766D3FE8309@qq.com	2023-08-17 08:54:17 +09:00
Peter Eisentraut	dca20013eb	Unify some error messages We had essentially the same error in several different wordings. Unify that.	2023-08-16 16:17:00 +02:00
Peter Eisentraut	1e7ca1189c	Improved CREATE SUBSCRIPTION message for clarity Discussion: https://www.postgresql.org/message-id/CAHut+PtfzQ7JRkb0-Y_UejAxaLQ17-bGMvV4MJJHcPoP3ML2bg@mail.gmail.com	2023-08-16 15:27:42 +02:00
Peter Eisentraut	78806a9509	Remove incorrect field from information schema The source code comment already said that the presence of the field element_types.domain_default might be a bug in the standard, since it never made sense there. Indeed, the field is gone in newer versions of the standard. So just remove it.	2023-08-16 13:46:26 +02:00
John Naylor	c9bfa40914	Split out tiebreaker comparisons from comparetup_* functions Previously, if a specialized comparator found equal datum1 keys, the "comparetup" function would repeat the comparison on the datum before proceeding with the unabbreviated first key and/or additional sort keys. Move comparing additional sort keys into "tiebreak" functions so that specialized comparators can call these directly if needed, avoiding duplicate work. Reviewed by David Rowley Discussion: https://postgr.es/m/CAFBsxsGaVfUrjTghpf%3DkDBYY%3DjWx1PN-fuusVe7Vw5s0XqGdGw%40mail.gmail.com	2023-08-16 17:15:07 +07:00
Etsuro Fujita	9e9931d2bf	Re-allow FDWs and custom scan providers to replace joins with pseudoconstant quals. This was disabled in commit `6f80a8d9c` due to the lack of support for handling of pseudoconstant quals assigned to replaced joins in createplan.c. To re-allow it, this patch adds the support by 1) modifying the ForeignPath and CustomPath structs so that if they represent foreign and custom scans replacing a join with a scan, they store the list of RestrictInfo nodes to apply to the join, as in JoinPaths, and by 2) modifying create_scan_plan() in createplan.c so that it uses that list in that case, instead of the baserestrictinfo list, to get pseudoconstant quals assigned to the join, as mentioned in the commit message for that commit. Important item for the release notes: this is non-backwards-compatible since it modifies the ForeignPath and CustomPath structs, as mentioned above, and changes the argument lists for FDW helper functions create_foreignscan_path(), create_foreign_join_path(), and create_foreign_upper_path(). Richard Guo, with some additional changes by me, reviewed by Nishant Sharma, Suraj Kharage, and Richard Guo. Discussion: https://postgr.es/m/CADrsxdbcN1vejBaf8a%2BQhrZY5PXL-04mCd4GDu6qm6FigDZd6Q%40mail.gmail.com	2023-08-15 16:45:00 +09:00
Thomas Munro	5ffb7c7750	De-pessimize ConditionVariableCancelSleep(). Commit `b91dd9de` was concerned with a theoretical problem with our non-atomic condition variable operations. If you stop sleeping, and then cancel the sleep in a separate step, you might be signaled in between, and that could be lost. That doesn't matter for callers of ConditionVariableBroadcast(), but callers of ConditionVariableSignal() might be upset if a signal went missing like this. Commit `bc971f4025` interacted badly with that logic, because it doesn't use ConditionVariableSleep(), which would normally put us back in the wait list. ConditionVariableCancelSleep() would be confused and think we'd received an extra signal, and try to forward it to another backend, resulting in wakeup storms. New idea: ConditionVariableCancelSleep() can just return true if we've been signaled. Hypothetical users of ConditionVariableSignal() would then still have a way to deal with rare lost signals if they are concerned about that problem. Back-patch to 16, where `bc971f4025` arrived. Reported-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/2840876b-4cfe-240f-0a7e-29ffd66711e7%40enterprisedb.com	2023-08-15 10:23:47 +12:00
Andres Freund	82a4edabd2	hio: Take number of prior relation extensions into account The new relation extension logic, introduced in `00d1e02be2`, could lead to slowdowns in some scenarios. E.g., when loading narrow rows into a table using COPY, the caller of RelationGetBufferForTuple() will only request a small number of pages. Without concurrency, we just extended using pwritev() in that case. However, if there is some concurrency, we switched between extending by a small number of pages and a larger number of pages, depending on the number of waiters for the relation extension logic. However, some filesystems, XFS in particular, do not perform well when switching between extending files using fallocate() and pwritev(). To avoid that issue, remember the number of prior relation extensions in BulkInsertState and extend more aggressively if there were prior relation extensions. That not just avoids the aforementioned slowdown, but also leads to noticeable performance gains in other situations, primarily due to extending more aggressively when there is no concurrency. I should have done it this way from the get go. Reported-by: Masahiko Sawada <sawada.mshk@gmail.com> Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAD21AoDvDmUQeJtZrau1ovnT_smN940=Kp6mszNGK3bq9yRN6g@mail.gmail.com Backpatch: 16-, where the new relation extension code was added	2023-08-14 11:33:09 -07:00
Michael Paquier	af720b4c50	Change custom wait events to use dynamic shared hash tables Currently, the names of the custom wait event must be registered for each backend, requiring all these to link to the shared memory area of an extension, even if these are not loaded with shared_preload_libraries. This patch relaxes the constraints related to this infrastructure by storing the wait events and their names in two dynamic hash tables in shared memory. This has the advantage to simplify the registration of custom wait events to a single routine call that returns an event ID ready for consumption: uint32 WaitEventExtensionNew(const char *wait_event_name); The caller of this routine can then cache locally the ID returned, to be used for pgstat_report_wait_start(), WaitLatch() or a similar routine. The implementation uses two hash tables: one with a key based on the event name to avoid duplicates and a second using the event ID as key for event lookups, like on pg_stat_activity. These tables can hold a minimum of 16 entries, and a maximum of 128 entries, which should be plenty enough. The code changes done in worker_spi show how things are simplified (most of the code removed in this commit comes from there): - worker_spi_init() is gone. - No more shared memory hooks required (size requested and initialization). - The custom wait event ID is cached in the process that needs to set it, with one single call to WaitEventExtensionNew() to retrieve it. Per suggestion from Andres Freund. Author: Masahiro Ikeda, with a few tweaks from me. Discussion: https://postgr.es/m/20230801032349.aaiuvhtrcvvcwzcx@awork3.anarazel.de	2023-08-14 14:47:27 +09:00
Amit Kapila	2a8b40e368	Simplify determining logical replication worker types. We deduce a LogicalRepWorker's type from the values of several different fields ('relid' and 'leader_pid') whenever logic needs to know it. In fact, the logical replication worker type is already known at the time of launching the LogicalRepWorker and it never changes for the lifetime of that process. Instead of deducing the type, it is simpler to just store it one time, and access it directly thereafter. Author: Peter Smith Reviewed-by: Amit Kapila, Bharath Rupireddy Discussion: http://postgr.es/m/CAHut+PttPSuP0yoZ=9zLDXKqTJ=d0bhxwKaEaNcaym1XqcvDEg@mail.gmail.com	2023-08-14 08:38:03 +05:30
Noah Misch	c36b636096	Fix off-by-one in XLogRecordMaxSize check. pg_logical_emit_message(false, '_', repeat('x', 1069547465)) failed with self-contradictory message "WAL record would be 1069547520 bytes (of maximum 1069547520 bytes)". There's no particular benefit from allowing or denying one byte in either direction; XLogRecordMaxSize could rise a few megabytes without trouble. Hence, this is just for cleanliness. Back-patch to v16, where this check first appeared.	2023-08-12 14:37:05 -07:00
Michael Paquier	638d42a3c5	Show GIDs of two-phase commit commands as constants in pg_stat_statements This relies on the "location" field added to TransactionStmt in `31de7e6`, now applied to the "gid" field used by 2PC commands. These commands are now reported like: COMMIT PREPARED $1 PREPARE TRANSACTION $1 ROLLBACK PREPARED $1 Applying constants for these commands is a huge advantage for workloads that rely a lot on 2PC commands with different GIDs. Some tests are added to track the new behavior. Reviewed-by: Julien Rouhaud Discussion: https://postgr.es/m/ZMhT9kNtJJsHw6jK@paquier.xyz	2023-08-12 10:44:15 +09:00
Michael Paquier	5dc456b7dd	Fix code indentation violations introduced by recent commit The two culprit commits are `5765cfe` and `5e0c761`. Per buildfarm member koel for the first commit, while I have noticed the second one in passing.	2023-08-11 20:43:34 +09:00
Jeff Davis	5765cfe18c	Transform proconfig for faster execution. Store function config settings in lists to avoid the need to parse and allocate for each function execution. Speedup is modest but significant. Additionally, this change also seems cleaner and supports some other performance improvements under discussion. Discussion: https://postgr.es/m/04c8592dbd694e4114a3ed87139a7a04e4363030.camel@j-davis.com Reviewed-by: Nathan Bossart	2023-08-10 12:43:53 -07:00
Alvaro Herrera	b57cfb439b	Document RelationGetIndexAttrBitmap better Commit `19d8e2308b` changed the list of set-of-columns that can be returned by RelationGetIndexAttrBitmap, but didn't update its "documentation". That was pretty hard to read already, so rewrite to make it more comprehensible, adding the missing values while at it. Backpatch to 16, like that commit. Discussion: https://postgr.es/m/20230809091155.7c7f3gttjk3dj4ze@alvherre.pgsql Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>	2023-08-10 12:04:07 +02:00
Jeff Davis	fa2e874946	Recalculate search_path after ALTER ROLE. Renaming a role can affect the meaning of the special string $user, so must cause search_path to be recalculated. Discussion: https://postgr.es/m/186761d32c0255debbdf50b6310b581b9c973e6c.camel@j-davis.com Reviewed-by: Nathan Bossart, Michael Paquier Backpatch-through: 11	2023-08-09 13:09:25 -07:00
Alvaro Herrera	c27f8621ee	struct PQcommMethods: use C99 designated initializers As in `98afa68d93`, `2c86077765`, et al.	2023-08-09 11:30:59 +02:00
Michael Paquier	f05b1fa1ff	doc: Fix incorrect entries generated from wait_event_names.txt `fa88928` has introduced wait_event_names.txt, and some of its entries had some documentation fields with more information than necessary. This commit brings back the description of all the wait events to be consistent with the older stable branches. Five descriptions were incorrect. Author: Bertrand Drouvot Discussion: https://postgr.es/m/e378989e-1899-643a-dec1-10f691a0a105@gmail.com	2023-08-08 08:17:53 +09:00
Noah Misch	cd5f2a3570	Reject substituting extension schemas or owners matching ["$'\]. Substituting such values in extension scripts facilitated SQL injection when @extowner@, @extschema@, or @extschema:...@ appeared inside a quoting construct (dollar quoting, '', or ""). No bundled extension was vulnerable. Vulnerable uses do appear in a documentation example and in non-bundled extensions. Hence, the attack prerequisite was an administrator having installed files of a vulnerable, trusted, non-bundled extension. Subject to that prerequisite, this enabled an attacker having database-level CREATE privilege to execute arbitrary code as the bootstrap superuser. By blocking this attack in the core server, there's no need to modify individual extensions. Back-patch to v11 (all supported versions). Reported by Micah Gate, Valerie Woolard, Tim Carey-Smith, and Christoph Berg. Security: CVE-2023-39417	2023-08-07 06:05:56 -07:00
Peter Eisentraut	2bdd7b262f	Translation updates Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git Source-Git-Hash: 97398d714ace69f0c919984e160f429b6fd2300e	2023-08-07 12:39:30 +02:00
David Rowley	990c3650c5	Don't Memoize lateral joins with volatile join conditions The use of Memoize was already disabled in normal joins when the join conditions had volatile functions per the code in match_opclause_to_indexcol(). Ordinarily, the parameterization for the inner side of a nested loop will be an Index Scan or at least eventually lead to an index scan (perhaps nested several joins deep). However, for lateral joins, that's not the case and seq scans can be parameterized too, so we can't rely on match_opclause_to_indexcol(). Here we explicitly check the parameterization for volatile functions and don't consider the generation of a Memoize path when such functions are present. Author: Richard Guo Discussion: https://postgr.es/m/CAMbWs49nHFnHbpepLsv_yF3qkpCS4BdB-v8HoJVv8_=Oat0u_w@mail.gmail.com Backpatch-through: 14, where Memoize was introduced	2023-08-07 22:14:21 +12:00
Dean Rasheed	c2e08b04c9	Fix RLS policy usage in MERGE. If MERGE executes an UPDATE action on a table with row-level security, the code incorrectly applied the WITH CHECK clauses from the target table's INSERT policies to new rows, instead of the clauses from the table's UPDATE policies. In addition, it failed to check new rows against the target table's SELECT policies, if SELECT permissions were required (likely to always be the case). In addition, if MERGE executes a DO NOTHING action for matched rows, the code incorrectly applied the USING clauses from the target table's DELETE policies to existing target tuples. These policies were applied as checks that would throw an error, if they did not pass. Fix this, so that a MERGE UPDATE action applies the same RLS policies as a plain UPDATE query with a WHERE clause, and a DO NOTHING action does not apply any RLS checks (other than adding clauses from SELECT policies to the join). Back-patch to v15, where MERGE was introduced. Dean Rasheed, reviewed by Stephen Frost. Security: CVE-2023-39418	2023-08-07 09:28:47 +01:00
David Rowley	fdd79d8992	Fix misleading comment in paraminfo_get_equal_hashops The comment mistakenly claimed the code was checking PlaceHolderVars for volatile functions when the code was actually checking lateral vars. Update the comment to reflect the reality of the code. Author: Richard Guo Discussion: https://postgr.es/m/CAMbWs48HZGZOV85g0fx8z1qDx6NNKHexJPT2FCnKnZhxBWkd-A@mail.gmail.com	2023-08-07 18:16:46 +12:00
David Rowley	6c00d2c9d4	Tidy up join_search_one_level code The code in join_search_one_level was a bit convoluted. With a casual glance, you might think that other_rels_list was being set to something other than joinrels[1] when level == 2, however, joinrels[level - 1] is joinrels[1] when level == 2, so nothing special needs to happen to set other_rels_list. Let's clean that up to avoid confusing anyone. In passing, we may as well modernize the loop in make_rels_by_clause_joins() and instead of passing in the ListCell to start looping from, let's just pass in the index where to start from and make use of for_each_from(). Ever since `1cff1b95a`, Lists are arrays under the hood. lnext() and list_head() both seem a little too linked-list like. Author: Alex Hsieh, David Rowley, Richard Guo Reviewed-by: Julien Rouhaud Discussion: https://postgr.es/m/CANWNU8x9P9aCXGF%3DaT-A_8mLTAT0LkcZ_ySYrGbcuHzMQw2-1g%40mail.gmail.com	2023-08-06 21:51:54 +12:00
Amit Kapila	81ccbe520f	Simplify some of the logical replication worker-type checks. Author: Peter Smith Reviewed-by: Hou Zhijie Discussion: http://postgr.es/m/CAHut+Pv-xkEpuPzbEJ=ZSi7Hp2RoGJf=VA-uDRxLi1KHSneFjg@mail.gmail.com	2023-08-04 08:15:07 +05:30
David Rowley	3fd19a9b23	Minor adjustments to WindowAgg startup cost code This is a follow-on of `3900a02c9` containing some changes which I forgot to commit locally before forming a patch with git format-patch. Discussion: https://postgr.es/m/CAApHDvrB0S5BMv+0-wTTqWFE-BJ0noWqTnDu9QQfjZ2VSpLv_g@mail.gmail.com	2023-08-04 10:47:54 +12:00
David Rowley	3900a02c97	Account for startup rows when costing WindowAggs Here we adjust the costs for WindowAggs so that they properly take into account how much of their subnode they must read before outputting the first row. Without this, we always assumed that the startup cost for the WindowAgg was not much more expensive than the startup cost of its subnode, however, that's going to be completely wrong in many cases. The WindowAgg may have to read all of its subnode to output a single row with certain window bound options. Here we estimate how many rows we'll need to read from the WindowAgg's subnode and proportionally add more of the subnode's run costs onto the WindowAgg's startup costs according to how much of it we expect to have to read in order to produce the first WindowAgg row. The reason this is more important than we might have initially thought is that we may end up making use of a path from the lower planner that works well as a cheap startup plan when the query has a LIMIT clause, however, the WindowAgg might mean we need to read far more rows than what the LIMIT specifies. No backpatch on this so as not to cause plan changes in released versions. Bug: #17862 Reported-by: Tim Palmer Author: David Rowley Reviewed-by: Andy Fan Discussion: https://postgr.es/m/17862-1ab8f74b0f7b0611@postgresql.org Discussion: https://postgr.es/m/CAApHDvrB0S5BMv+0-wTTqWFE-BJ0noWqTnDu9QQfjZ2VSpLv_g@mail.gmail.com	2023-08-04 09:27:38 +12:00
Amit Kapila	02c1b64fb1	Refactor to split Apply and Tablesync Workers code. Both apply and tablesync workers were using ApplyWorkerMain() as entry point. As the name implies, ApplyWorkerMain() should be considered as the main function for apply workers. Tablesync worker's path was hidden and does not have enough in common to share the same main function with apply worker. Also, most of the code shared by both worker types is already combined in LogicalRepApplyLoop(). There is no need to combine the rest in ApplyWorkerMain() anymore. This patch introduces TablesyncWorkerMain() as a new entry point for tablesync workers. This aims to increase code readability and would help with future improvements like the reuse of tablesync workers in the initial synchronization. Author: Melih Mutlu based on suggestions by Melanie Plageman Reviewed-by: Peter Smith, Kuroda Hayato, Amit Kapila Discussion: http://postgr.es/m/CAGPVpCTq=rUDd4JUdaRc1XUWf4BrH2gdSNf3rtOMUGj9rPpfzQ@mail.gmail.com	2023-08-03 08:59:50 +05:30
Masahiko Sawada	0125c4e21d	Fix ReorderBufferCheckMemoryLimit() comment. Commit `7259736a6` updated the comment but it was not correct since ReorderBufferLargestStreamableTopTXN() returns only top-level transactions. Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/CAD21AoA9XB7OR86BqvrCe2dMYX%2BZv3-BvVmjF%3DGY2z6jN-kqjg%40mail.gmail.com Backpatch-through: 14	2023-08-02 15:01:13 +09:00
David Rowley	3845577cb5	Fix performance regression in pg_strtointNN_safe functions Between `6fcda9aba` and `1b6f632a3`, the pg_strtoint functions became quite a bit slower in v16, despite efforts in `6b423ec67` to speed these up. Since the majority of cases for these functions will only contain base-10 digits, perhaps prefixed by a '-', it makes sense to have a special case for this and just fall back on the more complex version which processes hex, octal, binary and underscores if the fast path version fails to parse the string. While we're here, update the header comments for these functions to mention that hex, octal and binary formats along with underscore separators are now supported. Author: Andres Freund, David Rowley Reported-by: Masahiko Sawada Reviewed-by: Dean Rasheed, John Naylor Discussion: https://postgr.es/m/CAD21AoDvDmUQeJtZrau1ovnT_smN940%3DKp6mszNGK3bq9yRN6g%40mail.gmail.com Backpatch-through: 16, where `6fcda9aba` and `1b6f632a3` were added	2023-08-02 12:05:41 +12:00
David Rowley	deae1657ee	Fix overly strict Assert in jsonpath code This was failing for queries which try to get the .type() of a jpiLikeRegex. For example: select jsonb_path_query('["string", "string"]', '($[0] like_regex ".{7}").type()'); Reported-by: Alexander Kozhemyakin Bug: #18035 Discussion: https://postgr.es/m/18035-64af5cdcb5adf2a9@postgresql.org Backpatch-through: 12, where SQL/JSON path was added.	2023-08-02 01:39:47 +12:00
Noah Misch	d3a38318ac	Rename OverrideSearchPath to SearchPathMatcher. The previous commit removed the "override" APIs. Surviving APIs facilitate plancache.c to snapshot search_path and test whether the current value equals a remembered snapshot. Aleksander Alekseev. Reported by Alexander Lakhin and Noah Misch. Discussion: https://postgr.es/m/8ffb4650-52c4-6a81-38fc-8f99be981130@gmail.com	2023-07-31 17:04:47 -07:00
Noah Misch	7c5c4e1c03	Remove PushOverrideSearchPath() and PopOverrideSearchPath(). Since commit `681d9e4621`, they have no in-tree calls. Any new calls would introduce security vulnerabilities like the one fixed in that commit. Alexander Lakhin, reviewed by Aleksander Alekseev. Discussion: https://postgr.es/m/8ffb4650-52c4-6a81-38fc-8f99be981130@gmail.com	2023-07-31 17:04:47 -07:00
Michael Paquier	c9af054653	Support custom wait events for wait event type "Extension" Two backend routines are added to allow extension to allocate and define custom wait events, all of these being allocated in the type "Extension": * WaitEventExtensionNew(), that allocates a wait event ID computed from a counter in shared memory. * WaitEventExtensionRegisterName(), to associate a custom string to the wait event ID allocated. Note that this includes an example of how to use this new facility in worker_spi with tests in TAP for various scenarios, and some documentation about how to use them. Any code in the tree that currently uses WAIT_EVENT_EXTENSION could switch to this new facility to define custom wait events. This is left as work for future patches. Author: Masahiro Ikeda Reviewed-by: Andres Freund, Michael Paquier, Tristan Partin, Bharath Rupireddy Discussion: https://postgr.es/m/b9f5411acda0cf15c8fbb767702ff43e@oss.nttdata.com	2023-07-31 17:09:24 +09:00
Michael Paquier	7395a90db8	Add WAIT_EVENT_{CLASS,ID}_MASK in wait_event.c These are useful to extract the class ID and the event ID associated to a single uint32 wait_event_info. Only two code paths use them now, but an upcoming patch will extend their use. Discussion: https://postgr.es/m/ZMcJ7F7nkGkIs8zP@paquier.xyz	2023-07-31 16:14:47 +09:00
Etsuro Fujita	6f80a8d9c1	Disallow replacing joins with scans in problematic cases. Commit `e7cb7ee14`, which introduced the infrastructure for FDWs and custom scan providers to replace joins with scans, failed to add support handling of pseudoconstant quals assigned to replaced joins in createplan.c, leading to an incorrect plan without a gating Result node when postgres_fdw replaced a join with such a qual. To fix, we could add the support by 1) modifying the ForeignPath and CustomPath structs to store the list of RestrictInfo nodes to apply to the join, as in JoinPaths, if they represent foreign and custom scans replacing a join with a scan, and by 2) modifying create_scan_plan() in createplan.c to use that list in that case, instead of the baserestrictinfo list, to get pseudoconstant quals assigned to the join; but #1 would cause an ABI break. So fix by modifying the infrastructure to just disallow replacing joins with such quals. Back-patch to all supported branches. Reported by Nishant Sharma. Patch by me, reviewed by Nishant Sharma and Richard Guo. Discussion: https://postgr.es/m/CADrsxdbcN1vejBaf8a%2BQhrZY5PXL-04mCd4GDu6qm6FigDZd6Q%40mail.gmail.com	2023-07-28 15:45:00 +09:00
Tom Lane	38df84c65e	Eliminate fixed token-length limit in hba.c. Historically, hba.c limited tokens in the authentication configuration files (pg_hba.conf and pg_ident.conf) to less than 256 bytes. We have seen a few reports of this limit causing problems; notably, for moderately-complex LDAP configurations. Let's get rid of the fixed limit by using a StringInfo instead of a fixed-size buffer. This actually takes less code than before, since we can get rid of a nontrivial error recovery stanza. It's doubtless a hair slower, but parsing the content of the HBA files should in no way be performance-critical. Although this is a pretty straightforward patch, it doesn't seem worth the risk to back-patch given the small number of complaints to date. In released branches, we'll just raise MAX_TOKEN to ameliorate the problem. Discussion: https://postgr.es/m/1588937.1690221208@sss.pgh.pa.us	2023-07-27 11:56:35 -04:00
David Rowley	b635ac03e8	Fix performance problem with new COPY DEFAULT code `9f8377f7a` added code to allow COPY FROM insert a column's default value when the input matches the DEFAULT string specified in the COPY command. Here we fix some inefficient code which needlessly palloc0'd an array to store if we should use the default value or input value for the given column. This array was being palloc0'd and pfree'd once per row. It's much more efficient to allocate this once and just reset the values once per row. Reported-by: Masahiko Sawada Author: Masahiko Sawada Discussion: https://postgr.es/m/CAD21AoDvDmUQeJtZrau1ovnT_smN940%3DKp6mszNGK3bq9yRN6g%40mail.gmail.com Backpatch-through: 16, where `9f8377f7a` was introduced.	2023-07-27 14:47:05 +12:00
Michael Paquier	f6a84546b1	Add sanity asserts for index OID and attnums during cache init There was already a check on the relation OID, but not its index OID and the attributes that can be used during the syscache lookups. The two assertions added by this commit are cheap, and actually useful for developers to fasten the detection of incorrect data in a new entry added in the syscache list, as these assertions are triggered during the initial cache loading (initdb, or just backend startup), not requiring a syscache that uses the new entry. While on it, the relation OID check is switched to use OidIsValid(). Author: Aleksander Alekseev Reviewed-by: Dagfinn Ilmari Mannsåker, Zhang Mingli, Michael Paquier Discussion: https://postgr.es/m/CAJ7c6TOjUTJ0jxvWY6oJeP2-840OF8ch7qscZQsuVuotXTOS_g@mail.gmail.com	2023-07-27 10:55:16 +09:00
Michael Paquier	31de7e60da	Show savepoint names as constants in pg_stat_statements In pg_stat_statements, savepoint names now show up as constants with a parameter symbol, using as base query string the one added as a new entry to the PGSS hash table, leading to: RELEASE $1 ROLLBACK TO $1 SAVEPOINT $1 Applying constants to these query parts is a huge advantage for workloads that generate randomly savepoint points, like ORMs (Django is at the origin of this patch). The ODBC driver is a second layer that likes a lot savepoints, though it does not use a random naming pattern. A "location" field is added to TransactionStmt, now set only for savepoints. The savepoint name is ignored by the query jumbling. The location can be extended to other query patterns, if required, like 2PC commands. Some tests are added to pg_stat_statements for all the query patterns supported by the parser. ROLLBACK, ROLLBACK TO SAVEPOINT and ROLLBACK TRANSACTION TO SAVEPOINT have the same Node representation, so all these are equivalents. The same happens for RELEASE and RELEASE SAVEPOINT. Author: Greg Sabino Mullane Discussion: https://postgr.es/m/CAKAnmm+2s9PA4OaumwMJReWHk8qvJ_-g1WqxDRDAN1BSUfxyTw@mail.gmail.com	2023-07-27 09:42:33 +09:00
Amit Langote	03734a7fed	Add more SQL/JSON constructor functions This Patch introduces three SQL standard JSON functions: JSON() JSON_SCALAR() JSON_SERIALIZE() JSON() produces json values from text, bytea, json or jsonb values, and has facilitites for handling duplicate keys. JSON_SCALAR() produces a json value from any scalar sql value, including json and jsonb. JSON_SERIALIZE() produces text or bytea from input which containis or represents json or jsonb; For the most part these functions don't add any significant new capabilities, but they will be of use to users wanting standard compliant JSON handling. Catversion bumped as this changes ruleutils.c. Author: Nikita Glukhov <n.gluhov@postgrespro.ru> Author: Teodor Sigaev <teodor@sigaev.ru> Author: Oleg Bartunov <obartunov@gmail.com> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Andrew Dunstan <andrew@dunslane.net> Author: Amit Langote <amitlangote09@gmail.com> Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby, Álvaro Herrera, Peter Eisentraut Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com	2023-07-26 17:08:33 +09:00
Amit Langote	254ac5a7c3	Rename a nonterminal used in SQL/JSON grammar This renames json_output_clause_opt to json_returning_clause_opt, because the new name makes more sense given that the governing keyword is RETURNING. Per suggestion from Álvaro Herrera. Discussion: https://postgr.es/m/20230707122820.wy5zlmhn4tdzbojl%40alvherre.pgsql	2023-07-26 17:06:09 +09:00
Amit Langote	b22391a2ff	Some refactoring to export json(b) conversion functions This is to export datum_to_json(), datum_to_jsonb(), and jsonb_from_cstring(), though the last one is exported as jsonb_from_text(). A subsequent commit to add new SQL/JSON constructor functions will need them for calling from the executor. Discussion: https://postgr.es/m/20230720160252.ldk7jy6jqclxfxkq%40alvherre.pgsql	2023-07-26 17:06:03 +09:00
Masahiko Sawada	bd88404d3c	Fix crash with RemoveFromWaitQueue() when detecting a deadlock. Commit `5764f611e` used dclist_delete_from() to remove the proc from the wait queue. However, since it doesn't clear dist_node's next/prev to NULL, it could call RemoveFromWaitQueue() twice: when the process detects a deadlock and then when cleaning up locks on aborting the transaction. The waiting lock information is cleared in the first call, so it led to a crash in the second call. Backpatch to v16, where the change was introduced. Bug: #18031 Reported-by: Justin Pryzby, Alexander Lakhin Reviewed-by: Andres Freund Discussion: https://postgr.es/m/ZKy4AdrLEfbqrxGJ%40telsasoft.com Discussion: https://postgr.es/m/18031-ebe2d08cb405f6cc@postgresql.org Backpatch-through: 16	2023-07-26 14:41:26 +09:00
Michael Paquier	66d86d4201	Document more assumptions of LWLock variable changes with WAL inserts This commit adds a few comments about what LWLockWaitForVar() relies on when a backend waits for a variable update on its LWLocks for WAL insertions up to an expected LSN. First, LWLockWaitForVar() does not include a memory barrier, relying on a spinlock taken at the beginning of WaitXLogInsertionsToFinish(). This was hidden behind two layers of routines in lwlock.c. This assumption is now documented at the top of LWLockWaitForVar(), and detailed at bit more within LWLockConflictsWithVar(). Second, document why WaitXLogInsertionsToFinish() does not include memory barriers, relying on a spinlock at its top, which is, per Andres' input, fine for two different reasons, both depending on the fact that the caller of WaitXLogInsertionsToFinish() is waiting for a LSN up to a certain value. This area's documentation and assumptions could be improved more in the future, but at least that's a beginning. Author: Bharath Rupireddy, Andres Freund Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/CALj2ACVF+6jLvqKe6xhDzCCkr=rfd6upaGc3477Pji1Ke9G7Bg@mail.gmail.com	2023-07-26 12:06:04 +09:00
Amit Kapila	62e9af4c63	Fix code indentation vioaltion introduced in commit `d38ad8e31d`. Per buildfarm member koel Discussion: https://postgr.es/m/ZL9bsGhthne6FaVV@paquier.xyz	2023-07-25 12:35:58 +05:30
Masahiko Sawada	d0ce9d0bc7	Remove unnecessary checks for indexes for REPLICA IDENTITY FULL tables. Previously, when selecting an usable index for update/delete for the REPLICA IDENTITY FULL table, in IsIndexOnlyExpression(), we used to check if all index fields are not expressions. However, it was not necessary, because it is enough to check if only the leftmost index field is not an expression (and references the remote table column) and this check has already been done by RemoteRelContainsLeftMostColumnOnIdx(). This commit removes IsIndexOnlyExpression() and RemoteRelContainsLeftMostColumnOnIdx() and all checks for usable indexes for REPLICA IDENTITY FULL tables are now performed by IsIndexUsableForReplicaIdentityFull(). Backpatch this to remain the code consistent. Reported-by: Peter Smith Reviewed-by: Amit Kapila, Önder Kalacı Discussion: https://postgr.es/m/CAHut%2BPsGRE5WSsY0jcLHJEoA17MrbP9yy8FxdjC_ZOAACxbt%2BQ%40mail.gmail.com Backpatch-through: 16	2023-07-25 15:09:34 +09:00
Michael Paquier	71e4cc6b8e	Optimize WAL insertion lock acquisition and release with some atomics The WAL insertion lock variable insertingAt is currently being read and written with the help of the LWLock wait list lock to avoid any read of torn values. This wait list lock can become a point of contention on a highly concurrent write workloads. This commit switches insertingAt to a 64b atomic variable that provides torn-free reads/writes. On platforms without 64b atomic support, the fallback implementation uses spinlocks to provide the same guarantees for the values read. LWLockWaitForVar(), through LWLockConflictsWithVar(), reads the new value to check if it still needs to wait with a u64 atomic operation. LWLockUpdateVar() updates the variable before waking up the waiters with an exchange_u64 (full memory barrier). LWLockReleaseClearVar() now uses also an exchange_u64 to reset the variable. Before this commit, all these steps relied on LWLockWaitListLock() and LWLockWaitListUnlock(). This reduces contention on LWLock wait list lock and improves performance of highly-concurrent write workloads. Here are some numbers using pg_logical_emit_message() (HEAD at `d6677b93`) with various arbitrary record lengths and clients up to 1k on a rather-large machine (64 vCPUs, 512GB of RAM, 16 cores per sockets, 2 sockets), in terms of TPS numbers coming from pgbench: message_size_b \| 16 \| 64 \| 256 \| 1024 --------------------+--------+--------+--------+------- patch_4_clients \| 83830 \| 82929 \| 80478 \| 73131 patch_16_clients \| 267655 \| 264973 \| 250566 \| 213985 patch_64_clients \| 380423 \| 378318 \| 356907 \| 294248 patch_256_clients \| 360915 \| 354436 \| 326209 \| 263664 patch_512_clients \| 332654 \| 321199 \| 287521 \| 240128 patch_1024_clients \| 288263 \| 276614 \| 258220 \| 217063 patch_2048_clients \| 252280 \| 243558 \| 230062 \| 192429 patch_4096_clients \| 212566 \| 213654 \| 205951 \| 166955 head_4_clients \| 83686 \| 83766 \| 81233 \| 73749 head_16_clients \| 266503 \| 265546 \| 249261 \| 213645 head_64_clients \| 366122 \| 363462 \| 341078 \| 261707 head_256_clients \| 132600 \| 132573 \| 134392 \| 165799 head_512_clients \| 118937 \| 114332 \| 116860 \| 150672 head_1024_clients \| 133546 \| 115256 \| 125236 \| 151390 head_2048_clients \| 137877 \| 117802 \| 120909 \| 138165 head_4096_clients \| 113440 \| 115611 \| 120635 \| 114361 Bharath has been measuring similar improvements, where the limit of the WAL insertion lock begins to be felt when more than 256 concurrent clients are involved in this specific workload. An extra patch has been discussed to introduce a fast-exit path in LWLockUpdateVar() when there are no waiters, still this does not influence the write-heavy workload cases discussed as there are always waiters. This will be considered separately. Author: Bharath Rupireddy Reviewed-by: Nathan Bossart, Andres Freund, Michael Paquier Discussion: https://postgr.es/m/CALj2ACVF+6jLvqKe6xhDzCCkr=rfd6upaGc3477Pji1Ke9G7Bg@mail.gmail.com	2023-07-25 13:38:58 +09:00
Amit Kapila	d38ad8e31d	Fix the display of UNKNOWN message type in apply worker. We include the message type while displaying an error context in the apply worker. Now, while retrieving the message type string if the message type is unknown we throw an error that will hide the original error. So, instead, we need to simply return the string indicating an unknown message type. Reported-by: Ashutosh Bapat Author: Euler Taveira, Amit Kapila Reviewed-by: Ashutosh Bapat Backpatch-through: 15 Discussion: https://postgr.es/m/CAExHW5suAEDW-mBZt_qu4RVxWZ1vL54-L+ci2zreYWebpzxYsA@mail.gmail.com	2023-07-25 09:12:29 +05:30
Andres Freund	f3bc519288	Fix off-by-one in LimitAdditionalPins() Due to the bug LimitAdditionalPins() could return 0, violating LimitAdditionalPins()'s API ("One additional pin is always allowed"). This could be hit when setting shared_buffers very low and using a fair amount of concurrency. This bug was introduced in `31966b151e`. Author: "Anton A. Melnikov" <aamelnikov@inbox.ru> Reported-by: "Anton A. Melnikov" <aamelnikov@inbox.ru> Reported-by: Victoria Shepard Discussion: https://postgr.es/m/ae46f2fb-5586-3de0-b54b-1bb0f6410ebd@inbox.ru Backpatch: 16-	2023-07-24 19:07:52 -07:00
Tom Lane	bda97e47af	Avoid compiler warning in non-assert builds. After `3c90dcd03`, try_partitionwise_join's child_joinrelids variable is read only in an Assert, provoking a compiler warning in non-assert builds. Rearrange code to avoid the warning and eliminate unnecessary work in the non-assert case. Per CI testing (via Jeff Davis and Bharath Rupireddy) Discussion: https://postgr.es/m/ef0de9713e605451f1b60b30648c5ee900b2394c.camel@j-davis.com	2023-07-22 10:32:52 -04:00
Tom Lane	3c90dcd039	Fix calculation of relid sets for partitionwise child joins. Applying add_outer_joins_to_relids() to a child join doesn't actually work, even if we've built a SpecialJoinInfo specialized to the child, because that function will also compare the join's relids to elements of the main join_info_list, which only deal in regular relids not child relids. This mistake escaped detection by the existing partitionwise join tests because they didn't test any cases where add_outer_joins_to_relids() needs to add additional OJ relids (that is, any cases where join reordering per identity 3 is possible). Instead, let's apply adjust_child_relids() to the relids of the parent join. This requires minor code reordering to collect the relevant AppendRelInfo structures first, but that's work we'd do shortly anyway. Report and fix by Richard Guo; cosmetic changes by me Discussion: https://postgr.es/m/CAMbWs49NCNbyubZWgci3o=_OTY=snCfAPtMnM-32f3mm-K-Ckw@mail.gmail.com	2023-07-21 12:00:14 -04:00
Amit Langote	7c7412cae3	Code review for commit `b6e1157e7d` `b6e1157e7d` made some changes to enforce that JsonValueExpr.formatted_expr is always set and is the expression that gives a JsonValueExpr its runtime value, but that's not really apparent from the comments about and the code manipulating formatted_expr. This commit fixes that. Per suggestion from Álvaro Herrera. Discussion: https://postgr.es/m/20230718155313.3wqg6encgt32adqb%40alvherre.pgsql	2023-07-21 19:15:34 +09:00
Tom Lane	9089287aa0	Guard against null plan pointer in CachedPlanIsSimplyValid(). If both the passed-in plan pointer and plansource->gplan are NULL, CachedPlanIsSimplyValid would think that the plan pointer is possibly-valid and try to dereference it. For the one extant call site in plpgsql, this situation doesn't normally happen which is why we've not noticed. However, it appears to be possible if the previous use of the cached plan failed, as per report from Justin Pryzby. Add an extra check to prevent crashing. Back-patch to v13 where this code was added. Discussion: https://postgr.es/m/ZLlV+STFz1l/WhAQ@telsasoft.com	2023-07-20 14:23:46 -04:00
Daniel Gustafsson	29a0ccbce9	Revert "Add notBefore and notAfter to SSL cert info display" Due to an oversight in reviewing, this used functionality not compatible with old versions of OpenSSL. This reverts commit `75ec5e7bec`.	2023-07-20 17:18:12 +02:00
Daniel Gustafsson	75ec5e7bec	Add notBefore and notAfter to SSL cert info display This adds the X509 attributes notBefore and notAfter to sslinfo as well as pg_stat_ssl to allow verifying and identifying the validity period of the current client certificate. Author: Cary Huang <cary.huang@highgo.ca> Discussion: https://postgr.es/m/182b8565486.10af1a86f158715.2387262617218380588@highgo.ca	2023-07-20 17:07:32 +02:00
Amit Langote	3c152a27b0	Unify JSON categorize type API and export for external use This essentially removes the JsonbTypeCategory enum and jsonb_categorize_type() and integrates any jsonb-specific logic that was in jsonb_categorize_type() into json_categorize_type(), now moved to jsonfuncs.c. The remaining JsonTypeCategory enum and json_categorize_type() cover the needs of the callers in both json.c and jsonb.c. json_categorize_type() has grown a new parameter named is_jsonb for callers to engage the jsonb-specific behavior of json_categorize_type(). One notable change in the now exported API of json_categorize_type() is that it now always returns *outfuncoid even though a caller may have no need currently to see one. This is in preparation of later commits to implement additional SQL/JSON functions. Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com	2023-07-20 16:19:56 +09:00
Michael Paquier	2a990abd79	Add missing ObjectIdGetDatum() in syscache lookup calls for Oids Based on how postgres.h foes the Oid <-> Datum conversion, there is no existing bugs but let's be consistent. 17 spots have been noticed as incorrectly passing down Oids rather than Datums. Aleksander got one, Zhang two and I the rest. Author: Michael Paquier, Aleksander Alekseev, Zhang Mingli Discussion: https://postgr.es/m/ZLUhqsqQN1MOaxdw@paquier.xyz	2023-07-20 15:18:25 +09:00
Nathan Bossart	cdaedfc96d	Support parenthesized syntax for CLUSTER without a table name. `b5913f6120` added a parenthesized syntax for CLUSTER, but it requires specifying a table name. This is unlike commands such as VACUUM and ANALYZE, which do not require specifying a table in the parenthesized syntax. This change resolves this inconsistency. This is preparatory work for a follow-up commit that will move the unparenthesized syntax to the "Compatibility" section of the CLUSTER documentation. Reviewed-by: Melanie Plageman, Michael Paquier Discussion: https://postgr.es/m/CAAKRu_bc5uHieG1976kGqJKxyWtyQt9yvktjsVX%2Bi7NOigDjOA%40mail.gmail.com	2023-07-19 15:26:52 -07:00
Nathan Bossart	018b61f81b	Rearrange CLUSTER rules in gram.y. This change moves the unparenthesized syntax for CLUSTER to the end of the ClusterStmt rules in preparation for a follow-up commit that will move this syntax to the "Compatibility" section of the CLUSTER documentation. The documentation for the CLUSTER syntaxes has also been consolidated. Suggested-by: Melanie Plageman Discussion https://postgr.es/m/CAAKRu_bc5uHieG1976kGqJKxyWtyQt9yvktjsVX%2Bi7NOigDjOA%40mail.gmail.com	2023-07-19 15:26:43 -07:00
Michael Paquier	4e465aac36	Fix indentation in twophase.c This has been missed in `cb0cca1`, noticed before buildfarm member koel has been able to complain while poking at a different patch. Like the other commit, backpatch all the way down to limit the odds of merge conflicts. Backpatch-through: 11	2023-07-18 14:04:31 +09:00
Michael Paquier	cb0cca1880	Fix recovery of 2PC transaction during crash recovery A crash in the middle of a checkpoint with some two-phase state data already flushed to disk by this checkpoint could cause a follow-up crash recovery to recover twice the same transaction, once from what has been found in pg_twophase/ at the beginning of recovery and a second time when replaying its corresponding record. This would lead to FATAL failures in the startup process during recovery, where the same transaction would have a state recovered twice instead of once: LOG: recovering prepared transaction 731 from shared memory LOG: recovering prepared transaction 731 from shared memory FATAL: lock ExclusiveLock on object 731/0/0 is already held This issue is fixed by skipping the addition of any 2PC state coming from a record whose equivalent 2PC state file has already been loaded in TwoPhaseState at the beginning of recovery by restoreTwoPhaseData(), which is OK as long as the system has not reached a consistent state. The timing to get a messed up recovery processing is very racy, and would very unlikely happen. The thread that has reported the issue has demonstrated the bug using injection points to force a PANIC in the middle of a checkpoint. Issue introduced in `728bd99`, so backpatch all the way down. Reported-by: "suyu.cmj" <mengjuan.cmj@alibaba-inc.com> Author: "suyu.cmj" <mengjuan.cmj@alibaba-inc.com> Author: Michael Paquier Discussion: https://postgr.es/m/109e6994-b971-48cb-84f6-829646f18b4c.mengjuan.cmj@alibaba-inc.com Backpatch-through: 11	2023-07-18 13:43:44 +09:00
Nathan Bossart	884eee5bfb	Remove db_user_namespace. This feature was intended to be a temporary measure to support per-database user names. A better one hasn't materialized in the ~21 years since it was added, and nobody claims to be using it, so let's just remove it. Reviewed-by: Michael Paquier, Magnus Hagander Discussion: https://postgr.es/m/20230630200509.GA2830328%40nathanxps13 Discussion: https://postgr.es/m/20230630215608.GD2941194%40nathanxps13	2023-07-17 11:44:59 -07:00
David Rowley	2c2eb0d6b2	Shrink memory contexts struct sizes Here we reduce the block size fields in AllocSetContext, GenerationContext and SlabContext from Size down to uint32. Ever since `c6e0fe1f2`, blocks for non-dedicated palloc chunks can no longer be larger than 1GB, so there's no need to store the various block size fields as 64-bit values. 32 bits are enough to store 2^30. Here we also further reduce the memory context struct sizes by getting rid of the 'keeper' field which stores a pointer to the context's keeper block. All the context types which have this field always allocate the keeper block in the same allocation as the memory context itself, so the keeper block always comes right at the end of the context struct. Add some macros to calculate that address rather than storing it in the context. Overall, in AllocSetContext and GenerationContext, this saves 20 bytes on 64-bit builds which for ALLOCSET_SMALL_SIZES can sometimes mean the difference between having to allocate a 2nd block and storing all the required allocations on the keeper block alone. Such contexts are used in relcache to store cache entries for indexes, of which there can be a large number in a single backend. Author: Melih Mutlu Reviewed-by: David Rowley Discussion: https://postgr.es/m/CAGPVpCSOW3uJ1QJmsMR9_oE3X7fG_z4q0AoU4R_w+2RzvroPFg@mail.gmail.com	2023-07-17 11:16:56 +12:00
Tom Lane	e08d74ca13	Allow plan nodes with initPlans to be considered parallel-safe. If the plan itself is parallel-safe, and the initPlans are too, there's no reason anymore to prevent the plan from being marked parallel-safe. That restriction (dating to commit `ab77a5a45`) was really a special case of the fact that we couldn't transmit subplans to parallel workers at all. We fixed that in commit `5e6d8d2bb` and follow-ons, but this case never got addressed. We still forbid attaching initPlans to a Gather node that's inserted pursuant to debug_parallel_query = regress. That's because, when we hide the Gather from EXPLAIN output, we'd hide the initPlans too, causing cosmetic regression diffs. It seems inadvisable to kluge EXPLAIN to the extent required to make the output look the same, so just don't do it in that case. Along the way, this also takes care of some sloppiness about updating path costs to match when we move initplans from one place to another during createplan.c and setrefs.c. Since all the planning decisions are already made by that point, this is just cosmetic; but it seems good to keep EXPLAIN output consistent with where the initplans are. The diff in query_planner() might be worth remarking on. I found that one because after fixing things to allow parallel-safe initplans, one partition_prune test case changed plans (as shown in the patch) --- but only when debug_parallel_query was active. The reason proved to be that we only bothered to mark Result nodes as potentially parallel-safe when debug_parallel_query is on. This neglects the fact that parallel-safety may be of interest for a sub-query even though the Result itself doesn't parallelize. Discussion: https://postgr.es/m/1129530.1681317832@sss.pgh.pa.us	2023-07-14 11:41:20 -04:00
Tom Lane	d0d44049d1	Account for optimized MinMax aggregates during SS_finalize_plan. We are capable of optimizing MIN() and MAX() aggregates on indexed columns into subqueries that exploit the index, rather than the normal thing of scanning the whole table. When we do this, we replace the Aggref node(s) with Params referencing subquery outputs. Such Params really ought to be included in the per-plan-node extParam/allParam sets computed by SS_finalize_plan. However, we've never done so up to now because of an ancient implementation choice to perform that substitution during set_plan_references, which runs after SS_finalize_plan, so that SS_finalize_plan never sees these Params. This seems like clearly a bug, yet there have been no field reports of problems that could trace to it. This may be because the types of Plan nodes that could contain Aggrefs do not have any of the rescan optimizations that are controlled by extParam/allParam. Nonetheless it seems certain to bite us someday, so let's fix it in a self-contained patch that can be back-patched if we find a case in which there's a live bug pre-v17. The cleanest fix would be to perform a separate tree walk to do these substitutions before SS_finalize_plan runs. That seems unattractive, first because a whole-tree mutation pass is expensive, and second because we lack infrastructure for visiting expression subtrees in a Plan tree, so that we'd need a new function knowing as much as SS_finalize_plan knows about that. I also considered swapping the order of SS_finalize_plan and set_plan_references, but that fell foul of various assumptions that seem tricky to fix. So the approach adopted here is to teach SS_finalize_plan itself to check for such Aggrefs. I refactored things a bit in setrefs.c to avoid having three copies of the code that does that. Given the lack of any currently-known bug, no test case here. Discussion: https://postgr.es/m/2391880.1689025003@sss.pgh.pa.us	2023-07-14 11:41:20 -04:00
Tom Lane	b8d3dae00f	Improve error message for MaxAllocSize overrun in accumArrayResult. Before, if you went past about 64M array elements in array_agg() and allied functions, you got a generic "invalid memory alloc request size" error. This patch replaces that with "array size exceeds the maximum allowed", which seems more user-friendly since it points you to needing to reduce the size of your array result. (This is the same error text you'd get from construct_md_array in the event of overrunning the maximum physical size for the finished array.) Per question from Shaozhong Shi. Since this hasn't come up often, I don't feel a need to back-patch. Discussion: https://postgr.es/m/CA+i5JwYtVS9z2E71PcNKAVPbOn4R2wuj-LqbJsYr_XOz73q7dQ@mail.gmail.com	2023-07-14 10:35:49 -04:00
Amit Langote	00f2a2556c	Add missing initializations of p_perminfo In `a61b1f7482`, we failed to update transformFromClauseItem() and buildNSItemFromLists() to set ParseNamespaceItem.p_perminfo causing it to point to garbage. Pointed out by Tom Lane. Reported-by: Farias de Oliveira <matheusfarias519@gmail.com> Discussion: https://postgr.es/m/3173476.1689286373%40sss.pgh.pa.us Backpatch-through: 16	2023-07-14 14:56:35 +09:00
Nathan Bossart	a0363ab7aa	Fix privilege check for SET SESSION AUTHORIZATION. Presently, the privilege check for SET SESSION AUTHORIZATION checks whether the original authenticated role was a superuser at connection start time. Even if the role loses the superuser attribute, its existing sessions are permitted to change session authorization to any role. This commit modifies this privilege check to verify the original authenticated role currently has superuser. In the event that the authenticated role loses superuser within a session authorization change, the authorization change will remain in effect, which means the user can still take advantage of the target role's privileges. However, [RE]SET SESSION AUTHORIZATION will only permit switching to the original authenticated role. Author: Joseph Koshakow Discussion: https://postgr.es/m/CAAvxfHc-HHzONQ2oXdvhFF9ayRnidPwK%2BfVBhRzaBWYYLVQL-g%40mail.gmail.com	2023-07-13 21:13:45 -07:00
Nathan Bossart	9987a7bf34	Move privilege check for SET SESSION AUTHORIZATION. Presently, the privilege check for SET SESSION AUTHORIZATION is performed in session_authorization's assign_hook. A relevant comment states, "It's OK because the check does not require catalog access and can't fail during an end-of-transaction GUC reversion..." However, we plan to add a catalog lookup to this privilege check in a follow-up commit. This commit moves this privilege check to the check_hook for session_authorization. Like check_role(), we do not throw a hard error for insufficient privileges when the source is PGC_S_TEST. Author: Joseph Koshakow Discussion: https://postgr.es/m/CAAvxfHc-HHzONQ2oXdvhFF9ayRnidPwK%2BfVBhRzaBWYYLVQL-g%40mail.gmail.com	2023-07-13 21:10:36 -07:00
Amit Kapila	edca342434	Allow the use of a hash index on the subscriber during replication. Commit `89e46da5e5` allowed using BTREE indexes that are neither PRIMARY KEY nor REPLICA IDENTITY on the subscriber during apply of update/delete. This patch extends that functionality to also allow HASH indexes. We explored supporting other index access methods as well but they don't have a fixed strategy for equality operation which is required by the current infrastructure in logical replication to scan the indexes. Author: Kuroda Hayato Reviewed-by: Peter Smith, Onder Kalaci, Amit Kapila Discussion: https://postgr.es/m/TYAPR01MB58669D7414E59664E17A5827F522A@TYAPR01MB5866.jpnprd01.prod.outlook.com	2023-07-14 08:21:54 +05:30
Michael Paquier	a5ea825f95	Add indisreplident to fields refreshed by RelationReloadIndexInfo() RelationReloadIndexInfo() is a fast-path used for index reloads in the relation cache, and it has always forgotten about updating indisreplident, which is something that would happen after an index is selected for a replica identity. This can lead to incorrect cache information provided when executing a command in a transaction context that updates indisreplident. None of the code paths currently on HEAD that need to check upon pg_index.indisreplident fetch its value from the relation cache, always relying on a fresh copy on the syscache. Unfortunately, this may not be the case of out-of-core code, that could see out-of-date value. Author: Shruthi Gowda Reviewed-by: Robert Haas, Dilip Kumar, Michael Paquier Discussion: https://postgr.es/m/CAASxf_PBcxax0wW-3gErUyftZ0XrCs3Lrpuhq4-Z3Fak1DoW7Q@mail.gmail.com Backpatch-through: 11	2023-07-14 11:15:34 +09:00
Michael Paquier	38ea6aa90e	Fix updates of indisvalid for partitioned indexes indisvalid is switched to true for partitioned indexes when all its partitions have valid indexes when attaching a new partition, up to the top-most parent if all its leaves are themselves valid when dealing with multiple layers of partitions. The copy of the tuple from pg_index used to switch indisvalid to true came from the relation cache, which is incorrect. Particularly, in the case reported by Shruthi Gowda, executing a series of commands in a single transaction would cause the validation of partitioned indexes to use an incorrect version of a pg_index tuple, as indexes are reloaded after an invalidation request with RelationReloadIndexInfo(), a much faster version than a full index cache rebuild. In this case, the limited information updated in the cache leads to an incorrect version of the tuple used. One of the symptoms reported was the following error, with a replica identity update, for instance: "ERROR: attempted to update invisible tuple" This is incorrect since `8b08f7d`, so backpatch all the way down. Reported-by: Shruthi Gowda Author: Michael Paquier Reviewed-by: Shruthi Gowda, Dilip Kumar Discussion: https://postgr.es/m/CAASxf_PBcxax0wW-3gErUyftZ0XrCs3Lrpuhq4-Z3Fak1DoW7Q@mail.gmail.com Backpatch-through: 11	2023-07-14 10:12:48 +09:00
Thomas Munro	d0c28601ef	Remove wal_sync_method=fsync_writethrough on Windows. The "fsync" level already flushes drive write caches on Windows (as does "fdatasync"), so it only confuses matters to have an apparently higher level that isn't actually different at all. That leaves "fsync_writethrough" only for macOS, where it actually does something different. Reviewed-by: Magnus Hagander <magnus@hagander.net> Discussion: https://postgr.es/m/CA%2BhUKGJ2CG2SouPv2mca2WCTOJxYumvBARRcKPraFMB6GSEMcA%40mail.gmail.com	2023-07-14 12:30:13 +12:00
Michael Paquier	aea7fe33fb	Add information about line contents on parsing failure of wait_event_names.txt The contents of the line whose parsing failed was not reported in the error message produced by generate-wait_event_types.pl, making harder than necessary the debugging of incorrectly-shaped entries in the file. Reported-by: Andres Freund Discussion: https://postgr.es/m/ZK9S3jFEV1X797Ll@paquier.xyz	2023-07-14 09:09:23 +09:00
Michael Paquier	183a60a628	Remove double quotes from the second column of wait_event_names.txt The double quotes used for the wait event names are not required, as the values quoted are made of single words. The files generated by generate-wait_event_types.pl (pgstat_wait_event.c, wait_event_types.h and wait_event_types.sgml) are exactly the same before and after this commit, hence the wait event names and the enum elements have the same names as before. Discussion: https://postgr.es/m/ZK9S3jFEV1X797Ll@paquier.xyz	2023-07-14 08:55:11 +09:00
Andres Freund	c66a7d75e6	Handle DROP DATABASE getting interrupted Until now, when DROP DATABASE got interrupted in the wrong moment, the removal of the pg_database row would also roll back, even though some irreversible steps have already been taken. E.g. DropDatabaseBuffers() might have thrown out dirty buffers, or files could have been unlinked. But we continued to allow connections to such a corrupted database. To fix this, mark databases invalid with an in-place update, just before starting to perform irreversible steps. As we can't add a new column in the back branches, we use pg_database.datconnlimit = -2 for this purpose. An invalid database cannot be connected to anymore, but can still be dropped. Unfortunately we can't easily add output to psql's \l to indicate that some database is invalid, it doesn't fit in any of the existing columns. Add tests verifying that a interrupted DROP DATABASE is handled correctly in the backend and in various tools. Reported-by: Evgeny Morozov <postgresql3@realityexists.net> Author: Andres Freund <andres@anarazel.de> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/20230509004637.cgvmfwrbht7xm7p6@awork3.anarazel.de Discussion: https://postgr.es/m/20230314174521.74jl6ffqsee5mtug@awork3.anarazel.de Backpatch: 11-, bug present in all supported versions	2023-07-13 13:03:28 -07:00
Andres Freund	83ecfa9fad	Release lock after encountering bogs row in vac_truncate_clog() When vac_truncate_clog() encounters bogus datfrozenxid / datminmxid values, it returns early. Unfortunately, until now, it did not release WrapLimitsVacuumLock. If the backend later tries to acquire WrapLimitsVacuumLock, the session / autovacuum worker hangs in an uncancellable way. Similarly, other sessions will hang waiting for the lock. However, if the backend holding the lock exited or errored out for some reason, the lock was released. The bug was introduced as a side effect of `566372b3d6`. It is interesting that there are no production reports of this problem. That is likely due to a mix of bugs leading to bogus values having gotten less common, process exit releasing locks and instances of hangs being hard to debug for "normal" users. Discussion: https://postgr.es/m/20230621221208.vhsqgduwfpzwxnpg@awork3.anarazel.de	2023-07-13 13:03:28 -07:00
Amit Langote	40b3af72a7	Add missing const qualifier Missed in commit `785480c953`. Pointed out by Tom Lane. Discussion: https://postgr.es/m/2795364.1689221300%40sss.pgh.pa.us	2023-07-13 22:38:26 +09:00
Amit Langote	328f492d25	Fix code indentation violation in commit `b6e1157e7d` Per buildfarm member koel via Andrew Dunstan.	2023-07-13 22:26:10 +09:00
Peter Eisentraut	e1c83e7b96	Fix untranslatable log message assembly We can't inject the name of the logical replication worker into a log message like that. But for these messages we don't really need the precision of knowing what kind of worker it was, so just write "logical replication worker" and keep the message in one piece. Discussion: https://www.postgresql.org/message-id/flat/CAHut%2BPt1xwATviPGjjtJy5L631SGf3qjV9XUCmxLu16cHamfgg%40mail.gmail.com	2023-07-13 13:21:43 +02:00
Michael Paquier	ccfca8ea42	Remove duplicated assignment of LLVMJitHandle->lljit This duplicated assignment when emiting some code not yet compiled. Oversight in `6c57f2e`. Author: Matheus Alcantara Reviewed-by: Gurjeet Singh Discussion: https://postgr.es/m/La1Tfi7wirg9uGbCx_y7Qb9kl2T15mYouDXjCKAFuDqzQWnTRwH7KNLGigLLcxRs91V6dp2ySs1j_7mB4btzEZJ9NIMEAyOjUyAS7Jx-ydQ=@pm.me	2023-07-13 16:44:17 +09:00
Masahiko Sawada	fd48a86c62	Doc: clarify the conditions of usable indexes for REPLICA IDENTITY FULL tables. Commit `89e46da5e` allowed REPLICA IDENTITY FULL tables to use an index on the subscriber during apply of update/delete. This commit clarifies in the documentation that the leftmost field of candidate indexes must be a column (not an expression) that references the published relation column. The source code comments are also updated accordingly. Reviewed-by: Peter Smith, Amit Kapila Discussion: https://postgr.es/m/CAD21AoDJjffEvUFKXT27Q5U8-UU9JHv4rrJ9Ke8Zkc5UPWHLvA@mail.gmail.com Backpatch-through: 16	2023-07-13 15:03:17 +09:00
Nathan Bossart	0fef877538	Rename session_auth_is_superuser to current_role_is_superuser. This variable might've been accurately named when it was added in `ea886339b8`, but the name hasn't been accurate since at least the introduction of SET ROLE in `e5d6b91220`. The corresponding documentation was fixed in `eedb068c0a`. This commit renames the variable accordingly. Suggested-by: Joseph Koshakow Discussion: https://postgr.es/m/CAAvxfHc-HHzONQ2oXdvhFF9ayRnidPwK%2BfVBhRzaBWYYLVQL-g%40mail.gmail.com	2023-07-12 21:28:54 -07:00
Amit Langote	b6e1157e7d	Don't include CaseTestExpr in JsonValueExpr.formatted_expr A CaseTestExpr is currently being put into JsonValueExpr.formatted_expr as placeholder for the result of evaluating JsonValueExpr.raw_expr, which in turn is evaluated separately. Though, there's no need for this indirection if raw_expr itself can be embedded into formatted_expr and evaluated as part of evaluating the latter, especially as there is no special reason to evaluate it separately. So this commit makes it so. As a result, JsonValueExpr.raw_expr no longer needs to be evaluated in ExecInterpExpr(), eval_const_exprs_mutator() etc. and is now only used for displaying the original "unformatted" expression in ruleutils.c. While at it, this also removes the function makeCaseTestExpr(), because the code in makeJsonConstructorExpr() looks more readable without it IMO and isn't used by anyone else either. Finally, a note is added in the comment above CaseTestExpr's definition that JsonConstructorExpr is also using it. Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com	2023-07-13 12:13:58 +09:00
Amit Langote	785480c953	Pass constructName to transformJsonValueExpr() This allows it to pass to coerce_to_specific_type() the actual name corresponding to the specific JSON_* function expression being transformed, instead of the currently hardcoded string. Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com	2023-07-13 12:13:49 +09:00
Michael Paquier	c17164aec8	Simplify some conditions related to [LW]Lock in generate-wait_event_types.pl The first check on the enum values was not necessary as the values set in wait_event_names.txt for the classes LWLock and Lock were able to satisfy the check. The second check when generating the C and header files is now based on a match of the class names, making it simpler to understand. Author: Masahiro Ikeda, Michael Paquier Discussion: https://postgr.es/m/eaf82a85c0ef1b55dc3b651d3f7b867a@oss.nttdata.com	2023-07-13 09:09:04 +09:00
Peter Eisentraut	7ef2912519	Remove ancient special case code for dropping oid columns The special handling of negative attribute numbers in RemoveAttributeById() was introduced to support SET WITHOUT OIDS (commit `24614a9880`). But that feature doesn't exist anymore, so we can revert to the previous, simpler version. Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org	2023-07-12 16:13:50 +02:00
Peter Eisentraut	5eaa0e92ee	Remove ancient special case code for adding oid columns The special handling of negative attribute numbers in ATExecAddColumn() was introduced to support SET WITH OIDS (commit `6d1e361852`). But that feature doesn't exist anymore, so we can revert to the previous, simpler version. In passing, also remove an obsolete comment about OID support. Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org	2023-07-12 16:13:40 +02:00
Peter Eisentraut	adf333b4ed	Remove obsolete comment about OID support Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org	2023-07-12 16:13:31 +02:00
Peter Eisentraut	8c852ba9a4	Allow some exclusion constraints on partitions Previously we only allowed unique B-tree constraints on partitions (and only if the constraint included all the partition keys). But we could allow exclusion constraints with the same restriction. We also require that those columns be compared for equality, not something like &&. Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://www.postgresql.org/message-id/flat/ec8b1d9b-502e-d1f8-e909-1bf9dffe6fa5@illuminatedcomputing.com	2023-07-12 09:25:17 +02:00
Masahiko Sawada	46ebdfe164	Report index vacuum progress. This commit adds two columns: indexes_total and indexes_processed, to pg_stat_progress_vacuum system view to show the index vacuum progress. These numbers are reported in the "vacuuming indexes" and "cleaning up indexes" phases. This uses the new parallel message type for progress reporting added by be06506e7. Bump catversion because this changes the definition of pg_stat_progress_vacuum. Author: Sami Imseih Reviewed by: Masahiko Sawada, Michael Paquier, Nathan Bossart, Andres Freund Discussion: https://www.postgresql.org/message-id/flat/5478DFCD-2333-401A-B2F0-0D186AB09228@amazon.com	2023-07-11 12:34:01 +09:00
Masahiko Sawada	f1889729dd	Add new parallel message type to progress reporting. This commit adds a new type of parallel message 'P' to allow a parallel worker to poke at a leader to update the progress. Currently it supports only incremental progress reporting but it's possible to allow for supporting of other backend progress APIs in the future. There are no users of this new message type as of this commit. That will follow in future commits. Idea from Andres Freund. Author: Sami Imseih Reviewed by: Michael Paquier, Masahiko Sawada Discussion: https://www.postgresql.org/message-id/flat/5478DFCD-2333-401A-B2F0-0D186AB09228@amazon.com	2023-07-11 12:33:54 +09:00
Thomas Munro	4e9fa6d56b	Don't expose Windows' mbstowcs_l() and wcstombs_l(). Windows has similar functions with leading underscores. Previously, we provided the rename via a macro in win32_port.h. In fact its functions are not always good replacements for the Unix functions, since they can't deal with UTF-8. They are only currently used by pg_locale.c, which is careful to redirect to other Windows routines for UTF-8. Given that portability hazard, it seem unlikely to be a good idea to encourage any other code to think of these functions as being available outside pg_locale.c. Any code that thinks it wants these functions probably wants our wchar2char() or char2wchar() routines instead, or it won't actually work on Windows in UTF-8 databases. Furthermore, some major libc implementations including glibc don't have them (they only have the standard variants without _l), so external code is very unlikely to require them to exist. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/CA%2BhUKG%2Bt_CHPzEoPnKyARJBJgE9-GxNajJo6ZuSfRK_KWFO%2B6w%40mail.gmail.com	2023-07-11 09:34:22 +12:00
Tom Lane	a8b7424684	Be more rigorous about local variables in PostgresMain(). Since PostgresMain calls sigsetjmp, any local variables that are not marked "volatile" have a risk of unspecified behavior. In practice this means that when control returns via longjmp, such variables might get reset to their values as of the time of sigsetjmp, depending on whether the compiler chose to put them in registers or on the stack. We were careful about this for "send_ready_for_query", but not the other local variables. In the case of the timeout_enabled flags, resetting them to their initial "false" states is actually good, since we do "disable_all_timeouts()" in the longjmp cleanup code path. If that does not happen, we risk uselessly calling "disable_timeout()" later, which is harmless but a little bit expensive. Let's explicitly reset these flags so that the behavior is correct and platform-independent. (This change means that we really don't need the new "volatile" markings after all, but let's install them anyway since any change in this logic could re-introduce a problem.) There is no issue for "firstchar" and "input_message" because those are explicitly reinitialized each time through the query processing loop. To make that clearer, move them to be declared inside the loop. That leaves us with all the function-lifespan locals except the sigjmp_buf itself marked as volatile, which seems like a good policy to have going forward. Because of the possibility of extra disable_timeout() calls, this seems worth back-patching. Sergey Shinderuk and Tom Lane Discussion: https://postgr.es/m/2eda015b-7dff-47fd-d5e2-f1a9899b90a6@postgrespro.ru	2023-07-10 12:14:34 -04:00
Peter Eisentraut	a44d96add2	Fix pgindent for commit `e53a611523`	2023-07-10 12:05:32 +02:00
Peter Eisentraut	e53a611523	Message wording improvements	2023-07-10 10:47:24 +02:00
Michael Paquier	9b286858e3	Add more sanity checks with callers of changeDependencyFor() changeDependencyFor() returns the number of pg_depend entries changed, or 0 if there is a problem. The callers of this routine expect only one dependency to change, but they did not check for the result returned. The following code paths gain checks: - Namespace for extensions. - Namespace for various object types (see AlterObjectNamespace). - Planner support function for a function. Some existing error messages related to all that are reworded to be more consistent with the project style, and the new error messages added follow the same style. This change has exposed one bug fixed a bit earlier with `bd5ddbe`. Reviewed-by: Heikki Linnakangas, Akshat Jaimini Discussion: https://postgr.es/m/ZJzD/rn+UbloKjB7@paquier.xyz	2023-07-10 13:08:10 +09:00
Michael Paquier	bd5ddbe866	Fix ALTER EXTENSION SET SCHEMA with objects outside an extension's schema As coded, the code would use as a base comparison the namespace OID from the first object scanned in pg_depend when switching its namespace dependency entry to the new one, and use it as a base of comparison for any follow-up checks. It would also be used as the old namespace OID to switch from for the extension's pg_depend entry. Hence, if the first object scanned has a namespace different than the one stored in the extension, we would finish by: - Not checking that the extension objects map with the extension's schema. - Not switching the extension -> namespace dependency entry to the new namespace provided by the user, making ALTER EXTENSION ineffective. This issue exists since this command has been introduced in `d9572c4` for relocatable extension, so backpatch all the way down to 11. The test case has been provided by Heikki, that I have tweaked a bit to show the effects on pg_depend for the extension. Reported-by: Heikki Linnakangas Author: Michael Paquier, Heikki Linnakangas Discussion: https://postgr.es/m/20eea594-a05b-4c31-491b-007b6fceef28@iki.fi Backpatch-through: 11	2023-07-10 09:40:07 +09:00
Peter Eisentraut	f8d03ea727	Remove unnecessary unbind in LDAP search+bind mode Comments in src/backend/libpq/auth.c say: (after successfully finding the final DN to check the user-supplied password against) /* Unbind and disconnect from the LDAP server / and later / * Need to re-initialize the LDAP connection, so that we can bind to * it with a different username. */ But the protocol actually permits multiple subsequent authentications ("binds") over a single connection. So, it seems like the whole connection re-initialization thing was just a confusion and can be safely removed, thus saving quite a few network round-trips, especially for the case of ldaps/starttls. Author: Anatoly Zaretsky <anatoly.zaretsky@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CALbq6kmJ-1+58df4B51ctPfTOSyPbY8Qi2=ct8oR=i4TamkUoQ@mail.gmail.com	2023-07-09 08:51:46 +02:00
Thomas Munro	8d9a9f034e	All supported systems have locale_t. locale_t is defined by POSIX.1-2008 and SUSv4, and available on all targeted systems. For Windows, win32_port.h redirects to a partial implementation called _locale_t. We can now remove a lot of compile-time tests for HAVE_LOCALE_T, and associated comments and dead code branches that were needed for older computers. Since configure + MinGW builds didn't detect locale_t but now we assume that all systems have it, further inconsistencies among the 3 Windows build systems were revealed. With this commit, we no longer define HAVE_WCSTOMBS_L and HAVE_MBSTOWCS_L on any Windows build system, but we have logic to deal with that so that replacements are available where appropriate. Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Tristan Partin <tristan@neon.tech> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/CA%2BhUKGLg7_T2GKwZFAkEf0V7vbnur-NfCjZPKZb%3DZfAXSV1ORw%40mail.gmail.com	2023-07-09 11:55:18 +12:00
Peter Eisentraut	5edf438eeb	Make some indentation in gram.y consistent Reviewed-by: Amit Langote <amitlangote09@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com	2023-07-08 15:56:01 +02:00
Nathan Bossart	151c22deee	Revert MAINTAIN privilege and pg_maintain predefined role. This reverts the following commits: `4dbdb82513`, `c2122aae63`, `5b1a879943`, `9e1e9d6560`, `ff9618e82a`, `60684dd834`, `4441fc704d`, and `b5d6382496`. A role with the MAINTAIN privilege may be able to use search_path tricks to escalate privileges to the table owner. Unfortunately, it is too late in the v16 development cycle to apply the proposed fix, i.e., restricting search_path when running maintenance commands. Bumps catversion. Reviewed-by: Jeff Davis Discussion: https://postgr.es/m/E1q7j7Y-000z1H-Hr%40gemulon.postgresql.org Backpatch-through: 16	2023-07-07 11:25:13 -07:00
Tomas Vondra	ec99d6e9c8	Document relaxed HOT for summarizing indexes Commit `19d8e2308b` allowed a weaker check for HOT with summarizing indexes, but it did not update README.HOT. So do that now. Patch by Matthias van de Meent, minor changes by me. Backpatch to 16, where the optimization was introduced. Author: Matthias van de Meent Reviewed-by: Tomas Vondra Backpatch-through: 16 Discussion: https://postgr.es/m/CAEze2WiEOm8V+c9kUeYp2BPhbEc5s473fUf51xNeqvSFGv44Ew@mail.gmail.com	2023-07-07 19:04:53 +02:00
Heikki Linnakangas	3142a8845b	WAL-log the creation of the init fork of unlogged indexes. We create a file, so we better WAL-log it. In practice, all the built-in index AMs and all extensions that I'm aware of write a metapage to the init fork, which is WAL-logged, and replay of the metapage implicitly creates the fork too. But if ambuildempty() didn't write any page, we would miss it. This can be seen with dummy_index_am. Set up replication, create a 'dummy_index_am' index on an unlogged table, and look at the files created in the replica: the init fork is not created on the replica. Dummy_index_am doesn't do anything with the relation files, however, so it doesn't lead to any user-visible errors. Backpatch to all supported versions. Reviewed-by: Robert Haas Discussion: https://www.postgresql.org/message-id/6e5bbc08-cdfc-b2b3-9e23-1a914b9850a9%40iki.fi	2023-07-06 17:25:29 +03:00
Amit Kapila	69a674a170	Fix code indentation vioaltion introduced in commit `cc32ec24fd`. Per buildfarm member koel Discussion: https://postgr.es/m/CAA4eK1K2Y_iegdXRUNbsghY9+b-4cSOrxYt9T8TtwXkkdWMVJA@mail.gmail.com	2023-07-06 11:49:18 +05:30
Michael Paquier	a14354cac0	Add GUC parameter "huge_pages_status" This is useful to show the allocation state of huge pages when setting up a server with "huge_pages = try", where allocating huge pages would be attempted but the server would continue its startup sequence even if the allocation fails. The effective status of huge pages is not easily visible without OS-level tools (or for instance, a lookup at /proc/N/smaps), and the environments where Postgres runs may not authorize that. Like the other GUCs related to huge pages, this works for Linux and Windows. This GUC can report as values: - "on", if huge pages were allocated. - "off", if huge pages were not allocated. - "unknown", a special state that could only be seen when using for example postgres -C because it is only possible to know if the shared memory allocation worked after we can check for the GUC values, even if checking a runtime-computed GUC. This value should never be seen when querying for the GUC on a running server. An assertion is added to check that. The discussion has also turned around having a new function to grab this status, but this would have required more tricks for -DEXEC_BACKEND, something that GUCs already handle. Noriyoshi Shinoda has initiated the thread that has led to the result of this commit. Author: Justin Pryzby Reviewed-by: Nathan Bossart, Kyotaro Horiguchi, Michael Paquier Discussion: https://postgr.es/m/TU4PR8401MB1152EBB0D271F827E2E37A01EECC9@TU4PR8401MB1152.NAMPRD84.PROD.OUTLOOK.COM	2023-07-06 14:42:36 +09:00
Michael Paquier	cf05113eb0	Add newline at the end of header generated by generate-wait_event_types.pl The header file wait_event_types.h was generated without a newline at its end, which was inconsistent with all the other things generated automatically. Per offline gripe from Nathan Bossart.	2023-07-06 13:35:50 +09:00
Amit Kapila	cc32ec24fd	Revert the commits related to allowing page lock to conflict among parallel group members. This commit reverts the work done by commits `3ba59ccc89` and `72e78d831a`. Those commits were incorrect in asserting that we never acquire any other heavy-weight lock after acquring page lock other than relation extension lock. We can acquire a lock on catalogs while doing catalog look up after acquring page lock. This won't impact any existing feature but we need to think some other way to achieve this before parallelizing other write operations or even improving the parallelism in vacuum (like allowing multiple workers for an index). Reported-by: Jaime Casanova Author: Amit Kapila Backpatch-through: 13 Discussion: https://postgr.es/m/CAJKUy5jffnRKNvRHKQ0LynRb0RJC-o4P8Ku3x9vGAVLwDBWumQ@mail.gmail.com	2023-07-06 08:52:10 +05:30
Michael Paquier	ae6d06f096	Handle \v as a whitespace character in parsers This commit comes as a continuation of the discussion that has led to `d522b05`, as \v was handled inconsistently when parsing array values or anything going through the parsers, and changing a parser behavior in stable branches is a scary thing to do. The parsing of array values now uses the more central scanner_isspace() and array_isspace() is removed. As pointing out by Peter Eisentraut, fix a confusing reference to horizontal space in the parsers with the term "horiz_space". \f was included in this set since `3cfdd8f` from 2000, but it is not horizontal. "horiz_space" is renamed to "non_newline_space", to refer to all whitespace characters except newlines. The changes impact the parsers for the backend, psql, seg, cube, ecpg and replication commands. Note that JSON should not escape \v, as per RFC 7159, so these are not touched. Reviewed-by: Peter Eisentraut, Tom Lane Discussion: https://postgr.es/m/ZJKcjNwWHHvw9ksQ@paquier.xyz	2023-07-06 08:16:24 +09:00
Heikki Linnakangas	4f4d73466d	Fix leak of LLVM "fatal-on-oom" section counter. llvm_release_context() called llvm_enter_fatal_on_oom(), but was missing the corresponding llvm_leave_fatal_on_oom() call. As a result, if JIT was used at all, we were almost always in the "fatal-on-oom" state. It only makes a difference if you use an extension written in C++, and run out of memory in a C++ 'new' call. In that case, you would get a PostgreSQL FATAL error, instead of the default behavior of throwing a C++ exception. Back-patch to all supported versions. Reviewed-by: Daniel Gustafsson Discussion: https://www.postgresql.org/message-id/54b78cca-bc84-dad8-4a7e-5b56f764fab5@iki.fi	2023-07-05 13:13:13 +03:00
Heikki Linnakangas	5e8068f04e	Change example in pgindent README on "/-----" comments. Most, but not all, of our "/----" style comments have an end-guard line with dashes at the end of the comment. However, pgindent doesn't care about the end-guards, so they mostly just waste screen space. Going forward, let's not require end-guards. Remove a broken end-guard in a comment in walsender.c that led me to think about this. Remove the end guard from another comment in walsender.c for consistency, so that we use the same style in all comments in the file. However, we have thousands of existing "/*----" comments the repository, so it's not worth the code churn to change them all. Discussion: https://www.postgresql.org/message-id/fb083c91-d490-3b65-25f3-05e9118b6b0d%40iki.fi	2023-07-05 10:02:15 +03:00
Daniel Gustafsson	a77d39d5f4	Rename EVT cache hash to make context name unique BuildEventTriggerCache sets up a context "EventTriggerCache" which house a hash named "Event Trigger Cache", which in turn creates a context with the table name. This generated log output for memory context dumps like below: LOG: level: 2; EventTriggerCache: 8192 total in 1 blocks; 7928 free (4 chunks); 264 used LOG: level: 3; Event Trigger Cache: 8192 total in 1 blocks; 2616 free (0 chunks); 5576 used This renames the hash to ensure that the hash context has a unique name for easier log reading and debugging. Discussion: https://postgr.es/m/5EDC969E-CAE3-4CBD-965E-3B8A1294CFA4@yesql.se	2023-07-05 08:53:09 +02:00
Masahiko Sawada	68a59f9e99	pgstat: fix subscription stats entry leak. Commit `7b64e4b3` taught DropSubscription() to drop stats entry of subscription that is not associated with a replication slot for apply worker at DROP SUBSCRIPTION but missed covering the case where the subscription is not associated with replication slots for both apply worker and tablesync worker. Also add a test to verify that the stats for slot-less subscription is removed at DROP SUBSCRIPTION time. Backpatch down to 15. Author: Masahiko Sawada Reviewed-by: Nathan Bossart, Hayato Kuroda, Melih Mutlu, Amit Kapila Discussion: https://postgr.es/m/CAD21AoB71zkP7uPT7JDPsZcvp0749ExEQnOJxeNKPDFisHar+w@mail.gmail.com Backpatch-through: 15	2023-07-05 14:49:46 +09:00
Michael Paquier	fa88928470	Generate automatically code and documentation related to wait events The documentation and the code is generated automatically from a new file called wait_event_names.txt, formatted in sections dedicated to each wait event class (Timeout, Lock, IO, etc.) with three tab-separated fields: - C symbol in enums - Format in the system views - Description in the docs Using this approach has several advantages, as we have proved to be rather bad in maintaining this area of the tree across the years: - The order of each item in the documentation and the code, which should be alphabetical, has become incorrect multiple times, and the script generating the code and documentation has a few rules to enforce that, making the maintenance a no-brainer. - Some wait events were added to the code, but not documented, so this cannot be missed now. - The order of the tables for each wait event class is enforced in the documentation (the input .txt file does so as well for clarity, though this is not mandatory). - Less code, shaving 1.2k lines from the tree, with 1/3 of the savings coming from the code, the rest from the documentation. The wait event types "Lock" and "LWLock" still have their own code path for their code, hence only the documentation is created for them. These classes are listed with a special marker called WAIT_EVENT_DOCONLY in the input file. Adding a new wait event now requires only an update of wait_event_names.txt, with "Lock" and "LWLock" treated as exceptions. This commit has been tested with configure/Makefile, the CI and VPATH build. clean, distclean and maintainer-clean were working fine. Author: Bertrand Drouvot, Michael Paquier Discussion: https://postgr.es/m/77a86b3a-c4a8-5f5d-69b9-d70bbf2e9b98@gmail.com	2023-07-05 10:53:11 +09:00
Daniel Gustafsson	48efb2302b	Fix assertion failure in snapshot building Clear any potential stale next_phase_at value from the snapshot builder which otherwise may trip an assertion check ensuring that there is no next_phase_at value. This can be reproduced by running 80 concurrent sessions like the below where $c is a loop counter (assumes there has been 1..$c databases created) : echo " CREATE TABLE replication_example(id SERIAL PRIMARY KEY, somedata int, text varchar(120)); SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_$c', 'test_decoding'); SELECT data FROM pg_logical_slot_get_changes('regression_slot_$c', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); " \| psql -d regress_$c >>psql.log & Backpatch down to v16. Bug: #17695 Author: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Alexander Lakhin <exclusion@gmail.com> Reported-by: bowenshi <zxwsbg@qq.com> Discussion: https://postgr.es/m/17695-6be9277c9295985f@postgresql.org Backpatch-through: v16	2023-07-04 17:36:13 +02:00
Heikki Linnakangas	4b4798e138	Ensure that creation of an empty relfile is fsync'd at checkpoint. If you create a table and don't insert any data into it, the relation file is never fsync'd. You don't lose data, because an empty table doesn't have any data to begin with, but if you crash and lose the file, subsequent operations on the table will fail with "could not open file" error. To fix, register an fsync request in mdcreate(), like we do for mdwrite(). Per discussion, we probably should also fsync the containing directory after creating a new file. But that's a separate and much wider issue. Backpatch to all supported versions. Reviewed-by: Andres Freund, Thomas Munro Discussion: https://www.postgresql.org/message-id/d47d8122-415e-425c-d0a2-e0160829702d%40iki.fi	2023-07-04 17:57:03 +03:00
David Rowley	625d5b3ca0	Allow Incremental Sorts on GiST and SP-GiST indexes Previously an "amcanorderbyop" index would only be used when the index could provide sorted results which satisfied all query_pathkeys. Here we relax this so that we also allow these indexes to be considered by the planner when they only provide partially sorted results. This allows the planner to later consider making use of an Incremental Sort to satisfy the remaining pathkeys. This change is particularly useful for KNN-type queries which contain a LIMIT clause and an additional ORDER BY clause for a non-indexed column. Author: Miroslav Bendik Reviewed-by: Richard Guo, David Rowley Discussion: https://postgr.es/m/CAPoEpV0QYDtzjwamwWUBqyWpaCVbJV2d6qOD7Uy09bWn47PJtw%40mail.gmail.com	2023-07-04 23:08:52 +12:00
Thomas Munro	03f80daac8	Re-bin segment when memory pages are freed. It's OK to be lazy about re-binning memory segments when allocating, because that can only leave segments in a bin that's too high. We'll search higher bins if necessary while allocating next time, and also eventually re-bin, so no memory can become unreachable that way. However, when freeing memory, the largest contiguous range of free pages might go up, so we should re-bin eagerly to make sure we don't leave the segment in a bin that is too low for get_best_segment() to find. The re-binning code is moved into a function of its own, so it can be called whenever free pages are returned to the segment's free page map. Back-patch to all supported releases. Author: Dongming Liu <ldming101@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier version) Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CAL1p7e8LzB2LSeAXo2pXCW4%2BRya9s0sJ3G_ReKOU%3DAjSUWjHWQ%40mail.gmail.com	2023-07-04 15:16:47 +12:00
David Rowley	a8c09daa8b	Remove trailing zero words from Bitmapsets Prior to this, Bitmapsets could contain trailing words which had no members set. Many places in bitmapset.c had to loop over trailing words to check for empty words. If we ensure we always remove these trailing zero words then we can perform various optimizations such as fast pathing bms_is_subset to return false when 'a' has more words than 'b'. A similar optimization is possible in bms_equal. Both of these together can yield quite significant performance increases in the query planner when querying a partitioned table with around 100 or more partitions. While we're at it, since the minimum number of words a Bitmapset can contain is 1, we can make use of do/while loops instead of for loops when looping over all words in a set. This means checking the loop condition 1 less time, which for single-word sets cuts the loop condition checks in half. Author: David Rowley Reviewed-by: Yuya Watari Discussion: https://postgr.es/m/CAApHDvr5O41MuUjw0DQKqmAnv7QqvmLqXReEd5o4nXTzWp8-+w@mail.gmail.com	2023-07-04 12:34:48 +12:00
Nathan Bossart	957845789b	Increase size of bgw_library_name. This commit increases the size of the bgw_library_name member of the BackgroundWorker struct from BGW_MAXLEN (96) bytes to MAXPGPATH (default of 1024) bytes so that it can store longer file names (e.g., absolute paths). Author: Yurii Rashkovskii Reviewed-by: Daniel Gustafsson, Aleksander Alekseev Discussion: https://postgr.es/m/CA%2BRLCQyjFV5Y8tG5QgUb6gjteL4S3p%2B1gcyqWTqigyM93WZ9Pg%40mail.gmail.com	2023-07-03 15:02:16 -07:00
Thomas Munro	126552c85c	Fix race in SSI interaction with gin fast path. The ginfast.c code previously checked for conflicts in before locking the relevant buffer, leaving a window where a RW conflict could be missed. Re-order. There was also a place where buffer ID and block number were confused while trying to predicate-lock a page, noted by visual inspection. Back-patch to all supported releases. Fixes one more problem discovered with the reproducer from bug #17949, in this case when Dmitry tried other index types. Reported-by: Artem Anisimov <artem.anisimov.255@gmail.com> Reported-by: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/17949-a0f17035294a55e2%40postgresql.org	2023-07-04 09:07:31 +12:00
Thomas Munro	bcc93a389c	Fix race in SSI interaction with bitmap heap scan. When performing a bitmap heap scan, we don't want to miss concurrent writes that occurred after we observed the heap's rs_nblocks, but before we took predicate locks on index pages. Therefore, we can't skip fetching any heap tuples that are referenced by the index, because we need to test them all with CheckForSerializableConflictOut(). The old optimization that would ignore any references to blocks >= rs_nblocks gets in the way of that requirement, because it means that concurrent writes in that window are ignored. Removing that optimization shouldn't affect correctness at any isolation level, because any new tuples shouldn't be visible to an MVCC snapshot. There also shouldn't be any error-causing references to heap blocks past the end, because we should have held at least an AccessShareLock on the table before the index scan. It can't get smaller while our transaction is running. For now, though, we'll keep the optimization at lower levels to avoid making unnecessary changes in a bug fix. Back-patch to all supported releases. In release 11, the code is in a different place but not fundamentally different. Fixes one aspect of bug #17949. Reported-by: Artem Anisimov <artem.anisimov.255@gmail.com> Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/17949-a0f17035294a55e2%40postgresql.org	2023-07-04 09:07:31 +12:00
Thomas Munro	f9b7fc651a	Fix race in SSI interaction with empty btrees. When predicate-locking btrees, we have a special case for completely empty btrees, since there is no page to lock. This was racy, because, without buffer lock held, a matching key could be inserted between the _bt_search() and the PredicateLockRelation() calls. Fix, by rechecking _bt_search() after taking the relation-level SIREAD lock, if using SERIALIZABLE isolation and an empty btree is discovered. Back-patch to all supported releases. Fixes one aspect of bug #17949. Reported-by: Artem Anisimov <artem.anisimov.255@gmail.com> Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/17949-a0f17035294a55e2%40postgresql.org	2023-07-04 09:07:31 +12:00
Nathan Bossart	562bee0fc1	Don't truncate database and user names in startup packets. Unlike commands such as CREATE DATABASE, ProcessStartupPacket() does not perform multibyte-aware truncation of overlength names. This means that connection attempts might fail even if the user provides the same overlength names that were used in CREATE DATABASE, CREATE ROLE, etc. Ideally, we'd do the same multibyte- aware truncation in both code paths, but it doesn't seem worth the added complexity of trying to discover the encoding of the names. Instead, let's simply skip truncating the names in the startup packet and let the user/database lookup fail later on. With this change, users must provide the exact names stored in the catalogs, even if the names were truncated. This reverts commit `d18c1d1f51`. Author: Bertrand Drouvot Reviewed-by: Kyotaro Horiguchi, Tom Lane Discussion: https://postgr.es/m/07436793-1426-29b2-f924-db7422a05fb7%40gmail.com	2023-07-03 13:18:05 -07:00
Tomas Vondra	29cf61ade3	Consider fillfactor when estimating relation size When table_block_relation_estimate_size() estimated the number of tuples in a relation without statistics (e.g. right after load), it did not consider fillfactor when calculating density. With non-default fillfactor values, this may result in significant overestimate of the number of tuples - up to 10x with the minimum 10% fillfactor. This may have unexpected consequences, e.g. when creating hash indexes. This considers the current fillfactor value in the "no statistics" code path. If the fillfactor changes after loading data into the table, the estimate may be off. But that seems much less likely than changing the fillfactor before the data load. Reviewed-by: Corey Huinker, Peter Eisentraut Discussion: https://postgr.es/m/cf154ef9-6bac-d268-b735-67a3443debba@enterprisedb.com	2023-07-03 18:55:31 +02:00
Tomas Vondra	a4cfeeca5a	Fix code indentation violations Commits `ce5aaea8cd`, `2b8b2852bb` and `28d03feac3` violated the expected code indentation rules, upsetting the new buildfarm member "koel." Discussion: https://postgr.es/m/ZKIU4mhWpgJOM0W0%40paquier.xyz	2023-07-03 12:47:49 +02:00
Peter Eisentraut	6d56c501a7	A minor simplification for List manipulation Fix one place that was using lfirst(list_head(list)) by using linitial(list) instead. They are equivalent but the latter is simpler. We did the same in `9d299a49`. Author: Richard Guo <guofenglinux@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAMbWs49dJnpezDQDDxCPKq7+=_3NyqLqGqnhqCjd+dYe4MS15w@mail.gmail.com	2023-07-03 11:39:03 +02:00
Peter Eisentraut	c69bdf837f	Take pg_attribute out of VacAttrStats The VacAttrStats structure contained the whole Form_pg_attribute for a column, but it actually only needs attstattarget from there. So remove the Form_pg_attribute field and make a separate field for attstattarget. This simplifies some code for extended statistics that doesn't deal with a column but an expression, which had to fake up pg_attribute rows to satisfy internal APIs. Also, we can remove some comments that essentially said "don't look at pg_attribute directly". Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/d6069765-5971-04d3-c10d-e4f7b2e9c459%40eisentraut.org	2023-07-03 07:18:57 +02:00
Peter Eisentraut	7a7f60aef8	Add macro for maximum statistics target The number of places where 10000 was hardcoded had grown a bit beyond the comfort level. Introduce a macro MAX_STATISTICS_TARGET instead. Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/d6069765-5971-04d3-c10d-e4f7b2e9c459%40eisentraut.org	2023-07-03 07:18:57 +02:00
Peter Eisentraut	3ee2f25d21	Change type of pg_statistic_ext.stxstattarget Change from int32 to int16, to match attstattarget (changed in `90189eefc1`). Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/d6069765-5971-04d3-c10d-e4f7b2e9c459%40eisentraut.org	2023-07-03 07:18:57 +02:00
Michael Paquier	8e278b6576	Remove support for OpenSSL 1.0.1 Here are some notes about this change: - As X509_get_signature_nid() should always exist (OpenSSL and LibreSSL), hence HAVE_X509_GET_SIGNATURE_NID is now gone. - OPENSSL_API_COMPAT is bumped to 0x10002000L. - One comment related to 1.0.1e introduced by `74242c2` is removed. Upstream OpenSSL still provides long-term support for 1.0.2 in a closed fashion, so removing it is out of scope for a few years, at least. Reviewed-by: Jacob Champion, Daniel Gustafsson Discussion: https://postgr.es/m/ZG3JNursG69dz1lr@paquier.xyz	2023-07-03 13:20:27 +09:00
Michael Paquier	2aeaf80e57	Refactor some code related to wait events "BufferPin" and "Extension" The following changes are done: - Addition of WaitEventBufferPin and WaitEventExtension, that hold a list of wait events related to each category. - Addition of two functions that encapsulate the list of wait events for each category. - Rename BUFFER_PIN to BUFFERPIN (only this wait event class used an underscore, requiring a specific rule in the automation script). These changes make a bit easier the automatic generation of all the code and documentation related to wait events, as all the wait event categories are now controlled by consistent structures and functions. Author: Bertrand Drouvot Discussion: https://postgr.es/m/c6f35117-4b20-4c78-1df5-d3056010dcf5@gmail.com Discussion: https://postgr.es/m/77a86b3a-c4a8-5f5d-69b9-d70bbf2e9b98@gmail.com	2023-07-03 11:01:02 +09:00
David Rowley	c65102006b	Remove redundant PARTITION BY columns from WindowClauses Here we adjust the query planner to have it remove items from a window clause's PARTITION BY clause in cases where the pathkey for a column in the PARTITION BY clause is redundant. Doing this allows the optimization added in `9d9c02ccd` to stop window aggregation early rather than going into "pass-through" mode to find tuples belonging to the next partition. Also, when we manage to remove all PARTITION BY columns, we now no longer needlessly check that the current tuple belongs to the same partition as the last tuple in nodeWindowAgg.c. If the pathkey was redundant then all tuples must contain the same value for the given redundant column, so there's no point in checking that during execution. Author: David Rowley Reviewed-by: Richard Guo Discussion: https://postgr.es/m/CAApHDvo2ji+hdxrxfXtRtsfSVw3to2o1nCO20qimw0dUGK8hcQ@mail.gmail.com	2023-07-03 12:49:43 +12:00
Thomas Munro	4637a6ac0b	Silence "missing contrecord" error. Commit `dd38ff28ad` added a new error message "missing contrecord" when we fail to reassemble a record. Unfortunately that caused noisy messages to be logged by pg_waldump at end of segment, and by walsender when asked to shut down on a segment boundary. Remove the new error message, so that this condition signals end-of- WAL without a message. It's arguably a reportable condition that should not be silenced while performing crash recovery, but fixing that without introducing noise in the other cases will require more research. Back-patch to 15. Reported-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://postgr.es/m/6a1df56e-4656-b3ce-4b7a-a9cb41df8189%40enterprisedb.com	2023-07-03 11:16:27 +12:00
Tomas Vondra	ce5aaea8cd	Fix oversight in handling of modifiedCols since `f24523672d` Commit `f24523672d` fixed a memory leak by moving the modifiedCols bitmap into the per-row memory context. In the case of AFTER UPDATE triggers, the bitmap is however referenced from an event kept until the end of the query, resulting in a use-after-free bug. Fixed by copying the bitmap into the AfterTriggerEvents memory context, which is the one where we keep the trigger events. There's only one place that needs to do the copy, but the memory context may not exist yet. Doing that in a separate function seems more readable. Report by Alexander Pyhalov, fix by me. Backpatch to 13, where the bitmap was added to the event by commit `71d60e2aa0`. Reported-by: Alexander Pyhalov Backpatch-through: 13 Discussion: https://postgr.es/m/acddb17c89b0d6cb940eaeda18c08bbe@postgrespro.ru	2023-07-02 22:21:02 +02:00
Tomas Vondra	98640f960e	Fix memory leak in Incremental Sort rescans The Incremental Sort had a couple issues, resulting in leaking memory during rescans, possibly triggering OOM. The code had a couple of related flaws: 1. During rescans, the sort states were reset but then also set to NULL (despite the comment saying otherwise). ExecIncrementalSort then sees NULL and initializes a new sort state, leaking the memory used by the old one. 2. Initializing the sort state also automatically rebuilt the info about presorted keys, leaking the already initialized info. presorted_keys was also unnecessarily reset to NULL. Patch by James Coleman, based on patches by Laurenz Albe and Tom Lane. Backpatch to 13, where Incremental Sort was introduced. Author: James Coleman, Laurenz Albe, Tom Lane Reported-by: Laurenz Albe, Zu-Ming Jiang Backpatch-through: 13 Discussion: https://postgr.es/m/b2bd02dff61af15e3526293e2771f874cf2a3be7.camel%40cybertec.at Discussion: https://postgr.es/m/db03c582-086d-e7cd-d4a1-3bc722f81765%40inf.ethz.ch	2023-07-02 20:03:30 +02:00
Tomas Vondra	0457109344	Improve BRIN minmax-multi opclass test coverage Per the code coverage report, the existing regression tests did not exercice some a couple important BRIN minmax-multi code paths. - The tests focused on testing planning with a range of scan key strategies, but not the execution. Fixed by adding queries that actually test query execution for both equality and inequality. - All tests created indexes after inserting data, but this only exercises the CREATE INDEX strategy that sees all values at once, not incremental summary updates. The new tests flip the order and create the index before adding data. - The assert check(s) validating correctness of expanded ranges were present only in the "union" code path, which is not covered by regression tests at all (as it requires concurrency etc.). Fixed by adding the asserts to a couple more places. Reviewed-by: Heikki Linnakangas Discussion: https://postgr.es/m/57020b2e-d9c9-9bc7-4892-b36d9bb07563%40enterprisedb.com	2023-07-02 10:33:38 +02:00
Tomas Vondra	2b8b2852bb	Introduce bloom_filter_size for BRIN bloom opclass Move the calculation of Bloom filter parameters (for BRIN indexes) into a separate function to make reuse easier. At the moment we only call it from one place, but that may change and it's easier to read anyway. Reviewed-by: Heikki Linnakangas Discussion: https://postgr.es/m/0e1f3350-c9cf-ab62-43a5-5dae314de89c%40enterprisedb.com	2023-07-02 10:24:29 +02:00
Tomas Vondra	28d03feac3	Minor cleanups in the BRIN code BRIN bloom and minmax-multi opclasses were somewhat inconsistent when dealing with bool variables, assigning to them Datum values etc. While not a bug, it makes the code harder to understand, so fix that. While at it, update an incorrect comment copied to bloom opclass from minmax, talking about strategies not supported by bloom. Reviewed-by: Heikki Linnakangas Discussion: https://postgr.es/m/0e1f3350-c9cf-ab62-43a5-5dae314de89c%40enterprisedb.com	2023-07-02 10:21:17 +02:00
Thomas Munro	4f49b3f849	Trust signalfd on illumos, again. Commit `3ab4fc5d` avoided choosing signalfd by default on illumos, because it triggered kernel panics. That was fixed, so we can remove a kludge from our code. Users/packagers can still override the default choice at compile time if desired, and we'll leave the back-branches unchanged so they keep choosing self-pipe by default, but we'll default to signalfd (like we do for Linux) in 17. Fixed kernels should be everywhere by the time 17 ships. The illumos issues were: * https://www.illumos.org/issues/13700 * https://www.illumos.org/issues/14892 Discussion: https://postgr.es/m/CA+hUKG+NK-K_G_i1H3OpDTwYPEsiwQi_jw58PGcW2H+-N2eVCA@mail.gmail.com	2023-07-02 15:28:48 +12:00
Heikki Linnakangas	e251e780bf	Remove redundant check for fast_forward. We already checked for it earlier in the function. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/1ba2899e-77f8-7866-79e5-f3b7d1251a3e@iki.fi	2023-06-30 18:31:10 +03:00
Heikki Linnakangas	a0dd4c95b9	Improve comment on why we need ctid->(cmin,cmax) mapping. Combocids are only part of the problem. Explain the problem in more detail. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/1ba2899e-77f8-7866-79e5-f3b7d1251a3e@iki.fi	2023-06-30 18:30:32 +03:00
Michael Paquier	cfc43aeb38	Fix marking of indisvalid for partitioned indexes at creation The logic that introduced partitioned indexes missed a few things when invalidating a partitioned index when these are created, still the code is written to handle recursions: 1) If created from scratch because a mapping index could not be found, the new index created could be itself invalid, if for example it was a partitioned index with one of its leaves invalid. 2) A CCI was missing when indisvalid is set for a parent index, leading to inconsistent trees when recursing across more than one level for a partitioned index creation if an invalidation of the parent was required. This could lead to the creation of a partition index tree where some of the partitioned indexes are marked as invalid, but some of the parents are marked valid, which is not something that should happen (as validatePartitionedIndex() defines, indisvalid is switched to true for a partitioned index iff all its partitions are themselves valid). This patch makes sure that indisvalid is set to false on a partitioned index if at least one of its partition is invalid. The flag is set to true if all its partitions are valid. The regression test added in this commit abuses of a failed concurrent index creation, marked as invalid, that maps with an index created on its partitioned table afterwards. Reported-by: Alexander Lakhin Reviewed-by: Alexander Lakhin Discussion: https://postgr.es/m/14987634-43c0-0cb3-e075-94d423607e08@gmail.com Backpatch-through: 11	2023-06-30 13:54:48 +09:00
Michael Paquier	23d8624fe5	Use named captures in Catalog::ParseHeader() Using at least perl 5.14 is required since `4c15327`, meaning that it is possible to use named captures and the %+ hash instead of having to count parenthesis groups manually. While on it, CATALOG is made more flexible in its handling of whitespaces for parameter lists (see the addition of \s* in this case). The generated postgres.bki remains exactly the same before and after this commit. Author: Dagfinn Ilmari Mannsåker Reviewed-by: John Naylor Discussion: https://postgr.es/m/87y1l3s7o9.fsf@wibble.ilmari.org	2023-06-30 09:16:27 +09:00
Michael Paquier	97d8910104	Fix pg_depend entry to AMs after ALTER TABLE .. SET ACCESS METHOD ALTER TABLE .. SET ACCESS METHOD was not registering a dependency to the new access method with the relation altered in its rewrite phase, making possible the drop of an access method even if there are relations that depend on it. During the rewrite, a temporary relation is created to build the new relation files before swapping the new and old files, and, while the temporary relation was registering a correct dependency to the new AM, the old relation did not do that. A dependency on the access method is added when the relation files are swapped, which is the point where pg_class is updated. Materialized views and tables use the same code path, hence both were impacted. Backpatch down to 15, where this command has been introduced. Reported-by: Alexander Lakhin Reviewed-by: Nathan Bossart, Andres Freund Discussion: https://postgr.es/m/18000-9145c25b1af475ca@postgresql.org Backpatch-through: 15	2023-06-30 07:49:01 +09:00
Tom Lane	a798660ebe	Defend against bogus parameterization of join input paths. An outer join cannot be formed using an input path that is parameterized by a value that is supposed to be nulled by the outer join. This is obviously nonsensical, and it could lead to a bad plan being selected; although currently it seems that we'll hit various sanity-check assertions first. I think that such cases were formerly prevented by the delay_upper_joins mechanism, but now that that's gone we need an explicit check. (Perhaps we should avoid generating baserel paths that could lead to this situation in the first place; but it seems like having a defense at the join level would be a good idea anyway.) Richard Guo and Tom Lane, per report from Jaime Casanova Discussion: https://postgr.es/m/CAJKUy5g2uZRrUDZJ8p-=giwcSHVUn0c9nmdxPSY0jF0Ov8VoEA@mail.gmail.com	2023-06-29 12:12:52 -04:00
Tom Lane	43af714def	Fix order of operations in ExecEvalFieldStoreDeForm(). If the given composite datum is toasted out-of-line, DatumGetHeapTupleHeader will perform database accesses to detoast it. That can invalidate the result of get_cached_rowtype, as documented (perhaps not plainly enough) in that function's API spec; which leads to strange errors or crashes when we try to use the TupleDesc to read the tuple. In short then, trying to update a field of a composite column could fail intermittently if the overall column value is wide enough to require toasting. We can fix the bug at no cost by just changing the order of operations, since we don't need the TupleDesc until after detoasting. (Other callers of get_cached_rowtype appear to get this right already, so there's only one bug.) Note that the added regression test case reveals this bug reliably only with debug_discard_caches/CLOBBER_CACHE_ALWAYS. Per bug #17994 from Alexander Lakhin. Sadly, this patch does not fix the missing-values issue revealed in the bug discussion; we'll need some more work to cover that. Discussion: https://postgr.es/m/17994-5c7100b51b4790e9@postgresql.org	2023-06-29 10:19:10 -04:00
Peter Eisentraut	efcf55f8fe	Remove inappropriate raw_expression_tree_walker() code It was walking into the ColumnDef->compression field, which is not a node but a string. This code is currently not reachable (because the compression field is only set in situations that don't go through raw_expression_tree_walker()), but if it had been, this could have behaved erratically.	2023-06-29 10:34:53 +02:00
Peter Eisentraut	39a584dc90	Error message wording improvements	2023-06-29 09:14:55 +02:00
Peter Eisentraut	046c8c5c8f	Reword error messages for consistency	2023-06-28 19:30:26 +02:00
Michael Paquier	fc55c7ff8d	Ignore invalid indexes when enforcing index rules in ALTER TABLE ATTACH PARTITION A portion of ALTER TABLE .. ATTACH PARTITION is to ensure that the partition being attached to the partitioned table has a correct set of indexes, so as there is a consistent index mapping between the partitioned table and its new-to-be partition. However, as introduced in `8b08f7d`, the current logic could choose an invalid index as a match, which is something that can exist when dealing with more than two levels of partitioning, like attaching a partitioned table (that has partitions, with an index created by CREATE INDEX ON ONLY) to another partitioned table. A partitioned index with indisvalid set to false is equivalent to an incomplete partition tree, meaning that an invalid partitioned index does not have indexes defined in all its partitions. Hence, choosing an invalid partitioned index can create inconsistent partition index trees, where the parent attaching to is valid, but its partition may be invalid. In the report from Alexander Lakhin, this showed up as an assertion failure when validating an index. Without assertions enabled, the partition index tree would be actually broken, as indisvalid should be switched to true for a partitioned index once all its partitions are themselves valid. With two levels of partitioning, the top partitioned table used a valid index and was able to link to an invalid index stored on its partition, itself a partitioned table. I have studied a few options here (like the possibility to switch indisvalid to false for the parent), but came down to the conclusion that we'd better rely on a simple rule: invalid indexes had better never be chosen, so as the partition attached uses and creates indexes that the parent expects. Some regression tests are added to provide some coverage. Note that the existing coverage is not impacted. This is a problem since partitioned indexes exist, so backpatch all the way down to v11. Reported-by: Alexander Lakhin Discussion: https://postgr.es/14987634-43c0-0cb3-e075-94d423607e08@gmail.com Backpatch-through: 11	2023-06-28 15:57:31 +09:00
Michael Paquier	2ecbb0a493	Remove dependency to query text in JumbleQuery() Since `3db72eb`, the query ID of utilities is generated using the Query structure, making the use of the query string in JumbleQuery() unnecessary. This commit removes the argument "querytext" from JumbleQuery(). Reported-by: Joe Conway Reviewed-by: Nathan Bossart Discussion: https://postgr.es/m/ZJlQAWE4COFqHuAV@paquier.xyz	2023-06-28 08:59:36 +09:00
Peter Eisentraut	3f0199da5a	Translation updates Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git Source-Git-Hash: ab77975e9d2cde44da796c18af3ec1a66f0df7ae	2023-06-26 12:02:02 +02:00
Heikki Linnakangas	b491a15f8c	Change "..." to cstring in old input/output function comments. It was not clear what the "..." meant. Author: Steve Chavez Discussion: https://www.postgresql.org/message-id/CAGRrpzZzeh7zC3yaVG9di%3DydJ%2BiHwdXnFPB3evGFKvC1zf6ajA@mail.gmail.com	2023-06-26 11:52:31 +03:00
Tom Lane	691594acd6	Check for interrupts and stack overflow in TParserGet(). TParserGet() recurses for some token types, meaning it's possible to drive it to stack overflow. Since this is a minority behavior, I chose to add the check_stack_depth() call to the two places that recurse rather than doing it during every single call. While at it, add CHECK_FOR_INTERRUPTS(), because this can run unpleasantly long for long inputs. Per bug #17995 from Zuming Jiang. This is old, so back-patch to all supported branches. Discussion: https://postgr.es/m/17995-9f20ff3e6389db4c@postgresql.org	2023-06-24 17:18:08 -04:00
Peter Eisentraut	3ad5f07c0f	Error message refactoring Take some untranslatable things out of the message and replace by format placeholders, to reduce translatable strings and reduce translation mistakes.	2023-06-23 16:36:17 +02:00
Nathan Bossart	4dbdb82513	Fix cache lookup hazards introduced by `ff9618e82a`. `ff9618e82a` introduced has_partition_ancestor_privs(), which is used to check whether a user has MAINTAIN on any partition ancestors. This involves syscache lookups, and presently this function does not take any relation locks, so it is likely subject to the same kind of cache lookup failures that were fixed by `19de0ab23c`. To fix this problem, this commit partially reverts `ff9618e82a`. Specifically, it removes the partition-related changes, including the has_partition_ancestor_privs() function mentioned above. This means that MAINTAIN on a partitioned table is no longer sufficient to perform maintenance commands on its partitions. This is more like how privileges for maintenance commands work on supported versions. Privileges are checked for each partition, so a command that flows down to all partitions might refuse to process them (e.g., if the current user doesn't have MAINTAIN on the partition). In passing, adjust a few related comments and error messages, and add a test for the privilege checks for CLUSTER on a partitioned table. Reviewed-by: Michael Paquier, Jeff Davis Discussion: https://postgr.es/m/20230613211246.GA219055%40nathanxps13	2023-06-22 15:48:20 -07:00
Peter Geoghegan	5f0762f147	nbtree VACUUM: cope with topparent inconsistencies. Avoid "right sibling %u of block %u is not next child" errors when vacuuming a corrupt nbtree index. Just LOG the issue and press on. That way VACUUM will have a decent chance of finishing off all required processing for the index (and for the table as a whole). This is similar to recent work from commit `5abff197`, as well as work from commit `5b861baa` (later backpatched as commit `43e409ce`), which taught nbtree VACUUM to keep going when its "re-find" check fails. The hardening added by this commit takes place directly after the "re-find" check, right before the critical section for the first stage of page deletion. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-Wz=dayg0vjs4+er84TS9ami=csdzjpuiCGbEw=idhwqhzQ@mail.gmail.com Backpatch: 11- (all supported versions).	2023-06-21 17:41:58 -07:00
Jeff Davis	f3a01af29b	ICU: do not convert locale 'C' to 'en-US-u-va-posix'. Older versions of ICU canonicalize "C" to "en-US-u-va-posix"; but starting in ICU version 64, the "C" locale is considered obsolete. Postgres commit `ea1db8ae70` introduced code to always canonicalize "C" to "en-US-u-va-posix" for consistency and convenience, but it was deemed too confusing. This commit removes that code, so that "C" is treated like other ICU locale names: canonicalization is attempted, and if it fails, the behavior is controlled by icu_validation_level. A similar change was previously committed as `f7faa9976c`, then reverted due to an ICU-version-dependent test failure. This commit un-reverts it, omitting the test because we now expect the behavior to depend on the version of ICU being used. Discussion: https://postgr.es/m/3a200aca-4672-4b37-fc91-5d198a323503%40eisentraut.org Discussion: https://postgr.es/m/f83f089ee1e9acd5dbbbf3353294d24e1f196e95.camel@j-davis.com Discussion: https://postgr.es/m/37520ec1ae9591f83132f82dbd625f3fc2d69c16.camel@j-davis.com	2023-06-21 13:18:25 -07:00
Tom Lane	555b929bbe	Avoid Assert failure when processing empty statement in aborted xact. exec_parse_message() wants to create a cached plan in all cases, including for empty input. The empty-input path does not have a test for being in an aborted transaction, making it possible that plancache.c will fail due to trying to do database lookups even though there's no real work to do. One solution would be to throw an aborted-transaction error in this path too, but it's not entirely clear whether the lack of such an error was intentional or whether some clients might be relying on non-error behavior. Instead, let's hack plancache.c so that it treats empty statements with the same logic it already had for transaction control commands, ensuring that it can soldier through even in an already-aborted transaction. Per bug #17983 from Alexander Lakhin. Back-patch to all supported branches. Discussion: https://postgr.es/m/17983-da4569fcb878672e@postgresql.org	2023-06-21 11:07:24 -04:00
Amit Kapila	a734caa25f	Fix the errhint message and docs for drop subscription failure. The existing errhint message and docs were missing the fact that we can't disassociate from the slot unless the subscription is disabled. Author: Robert Sjöblom, Peter Smith Reviewed-by: Peter Eisentraut, Amit Kapila Backpatch-through: 11 Discussion: https://postgr.es/m/807bdf85-61ea-88e2-5712-6d9fcd4eabff@fortnox.se	2023-06-21 10:36:09 +05:30
Nathan Bossart	5b1a879943	Move bool parameter for vacuum_rel() to option bits. `ff9618e82a` introduced the skip_privs parameter, which is used to skip privilege checks when recursing to a relation's TOAST table. This parameter should have been added as a flag bit in VacuumParams->options instead. Suggested-by: Michael Paquier Reviewed-by: Michael Paquier, Jeff Davis Discussion: https://postgr.es/m/ZIj4v1CwqlDVJZfB%40paquier.xyz	2023-06-20 15:14:58 -07:00
Tom Lane	45392626c9	Fix hash join when inner hashkey expressions contain Params. If the inner-side expressions contain PARAM_EXEC Params, we must re-hash whenever the values of those Params change. The executor mechanism for that exists already, but we failed to invoke it because finalize_plan() neglected to search the Hash.hashkeys field for Params. This allowed a previous scan's hash table to be re-used when it should not be, leading to rows missing from the join's output. (I believe incorrectly-included join rows are impossible however, since checking the real hashclauses would reject false matches.) This bug is very ancient, dating probably to `d24d75ff1` of 7.4. Sadly, this simple fix depends on the plan representational changes made by `2abd7ae9b`, so it will only work back to v12. I thought about trying to make some kind of hack for v11, but I'm leery of putting code significantly different from what is used in the newer branches into a nearly-EOL branch. Seeing that the bug escaped detection for a full twenty years, problematic cases must be rare; so I don't feel too awful about leaving v11 as-is. Per bug #17985 from Zuming Jiang. Back-patch to v12. Discussion: https://postgr.es/m/17985-748b66607acd432e@postgresql.org	2023-06-20 17:47:53 -04:00
Tom Lane	3af87736bf	Fix another cause of "wrong varnullingrels" planner failures. I removed the delay_upper_joins mechanism in commit `b448f1c8d`, reasoning that it was only needed when we have a single-table (SELECT ... WHERE) as the immediate RHS child of a left join, and we could get rid of that by hoisting the WHERE condition into the parent join's quals. However that new code missed a case: we could have "foo LEFT JOIN ((SELECT ... WHERE) LEFT JOIN bar)", and if the two left joins can be commuted then we now have the problematic query shape. We can fix this too easily enough, by allowing the syntactically-lower left join to pass through its parent qual location pointer recursively. That lets prepjointree.c discard the SELECT by temporarily hoisting the WHERE condition into the ancestor join's qual. Per bug #17978 from Zuming Jiang. Discussion: https://postgr.es/m/17978-12f3d93a55297266@postgresql.org	2023-06-20 11:09:56 -04:00
Tom Lane	efeb12ef0b	Don't include outer join relids in lateral_relids bitmapsets. This avoids an assertion failure when outer joins are rearranged per identity 3. Listing only the baserels from a PlaceHolderVar's ph_lateral set should be enough to ensure that the required values are available when we need to compute the PHV --- it's what we did before inventing nullingrel sets, after all. It's a bit unsatisfying; but with beta2 hard upon us, there's not time to look for an aesthetically cleaner fix. Richard Guo and Tom Lane Discussion: https://postgr.es/m/CAMbWs48Jcw-NvnxT23WiHP324wG44DvzcH1j4hc0Zn+3sR9cfg@mail.gmail.com	2023-06-20 10:29:57 -04:00
Tom Lane	0655c03ef9	Centralize fixups for mismatched nullingrels in nestloop params. It turns out that the fixes we applied in commits `bfd332b3f` and `63e4f13d2` were not nearly enough to solve the problem. We'd focused narrowly on subquery RTEs with lateral references, but lateral references can occur in several other RTE kinds such as function RTEs. Putting the same hack into half a dozen code paths seems quite unattractive. Hence, revert the code changes (but not the test cases) from those commits and instead solve it centrally in identify_current_nestloop_params(), as Richard proposed originally. This is a bit annoying because it could mask erroneous nullingrels in nestloop params that are generated from non-LATERAL parameterized paths; but on balance I don't see a better way. Maybe at some future time we'll be motivated to find a more rigorous approach to nestloop params, but that's not happening for beta2. Richard Guo and Tom Lane Discussion: https://postgr.es/m/CAMbWs48Jcw-NvnxT23WiHP324wG44DvzcH1j4hc0Zn+3sR9cfg@mail.gmail.com	2023-06-20 10:22:52 -04:00
Tom Lane	b334612b8a	Pre-beta2 mechanical code beautification. Run pgindent and pgperltidy. It seems we're still some ways away from all committers doing this automatically. Now that we have a buildfarm animal that will whine about poorly-indented code, we'll try to keep the tree more tidy. Discussion: https://postgr.es/m/3156045.1687208823@sss.pgh.pa.us	2023-06-20 09:50:43 -04:00
Michael Paquier	68cb5af46c	Enable archiving in recovery TAP test 009_twophase.pl This is a follow-up of `f663b00`, that has been committed to v13 and v14, tweaking the TAP test for two-phase transactions so as it provides coverage for the bug that has been fixed. This change is done in its own commit for clarity, as v15 and HEAD did not show the problematic behavior, still missed coverage for it. While on it, this adds a comment about the dependency of the last partial segment rename and RecoverPreparedTransactions() at the end of recovery, as that can be easy to miss. Author: Michael Paquier Reviewed-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/743b9b45a2d4013bd90b6a5cba8d6faeb717ee34.camel@cybertec.at Backpatch-through: 13	2023-06-20 10:25:27 +09:00
Andres Freund	0d369ac650	fd.c: Retry after EINTR in more places Starting with `4d330a61bb` we can use posix_fallocate() to extend files. Unfortunately in some situation, e.g. on tmpfs filesystems, EINTR may be returned. See also `4518c798b2`. To fix, add a retry path to FileFallocate(). In contrast to `4518c798b2` the amount we extend by is limited and the extending may happen at a high frequency, so disabling signals does not appear to be the correct path here. Also add retry paths to other file operations currently lacking them (around fdatasync(), fsync(), ftruncate(), posix_fadvise(), sync_file_range(), truncate()) - they are all documented or have been observed to return EINTR. Even though most of these functions used in the back branches, it does not seem worth the risk to backpatch - outside of the new-to-16 case of posix_fallocate() I am not aware of problem reports due to the lack of retries. Reported-by: Christoph Berg <myon@debian.org> Discussion: https://postgr.es/m/ZEZDj1H61ryrmY9o@msg.df7cb.de Backpatch: -	2023-06-19 14:11:32 -07:00
David Rowley	7fcd7ef2a9	Don't use partial unique indexes for unique proofs in the planner Here we adjust relation_has_unique_index_for() so that it no longer makes use of partial unique indexes as uniqueness proofs. It is incorrect to use these as the predicates used by check_index_predicates() to set predOK makes use of not only baserestrictinfo quals as proofs, but also qual from join conditions. For relation_has_unique_index_for()'s case, we need to know the relation is unique for a given set of columns before any joins are evaluated, so if predOK was only set to true due to some join qual, then it's unsafe to use such indexes in relation_has_unique_index_for(). The final plan may not even make use of that index, which could result in reading tuples that are not as unique as the planner previously expected them to be. Bug: #17975 Reported-by: Tor Erik Linnerud Backpatch-through: 11, all supported versions Discussion: https://postgr.es/m/17975-98a90c156f25c952%40postgresql.org	2023-06-19 13:00:42 +12:00
Jeff Davis	a14e75eb0b	CREATE DATABASE: make LOCALE apply to all collation providers. For CREATE DATABASE, make LOCALE parameter apply regardless of the provider used. Also affects initdb and createdb --locale arguments. Previously, LOCALE (and --locale) only affected the database default collation when using the libc provider. Discussion: https://postgr.es/m/1a63084d-221e-4075-619e-6b3e590f673e@enterprisedb.com Reviewed-by: Peter Eisentraut	2023-06-16 10:27:32 -07:00
Amit Langote	160c23b5fa	Fix typo in comment. Back-patch down to 11. Author: Sho Kato (<kato-sho@fujitsu.com>) Discussion: https://postgr.es/m/TYCPR01MB68499042A33BC32241193AAF9F5BA%40TYCPR01MB6849.jpnprd01.prod.outlook.com	2023-06-16 10:17:15 +09:00
Tom Lane	f4c00d138f	When removing a left join, clean out references in EquivalenceClasses. Since commit `b448f1c8d`, we've been able to remove left joins (that are otherwise removable) even when they are underneath other left joins, a case that was previously prevented by a delay_upper_joins check. This is a clear improvement, but it has a surprising side-effect: it's now possible that there are EquivalenceClasses whose relid sets mention the removed baserel and/or outer join. If we fail to clean those up, we may drop essential join quals due to not having any join level that appears to satisfy their relid sets. (It's not quite 100% clear that this was impossible before. But the lack of complaints since we added join removal a dozen years ago strongly suggests that it was impossible.) Richard Guo and Tom Lane, per bug #17976 from Zuming Jiang Discussion: https://postgr.es/m/17976-4b638b525e9a983b@postgresql.org	2023-06-15 15:24:50 -04:00
Amit Langote	5c8c8079b0	Remove outdated reference to a removed file parse_jsontable.c was removed as part of `2f2b18bd3f`, though its mention in src/backend/parser/README was not. Fix that. Discussion: https://postgr.es/m/CA%2BHiwqHDzw8AP8p_dEkFr0xg458ZTf58zbivAHhK4UeNrx9Tdg%40mail.gmail.com	2023-06-15 22:35:42 +09:00
Masahiko Sawada	a54fc892ad	Replace GUC_UNIT_MEMORY\|GUC_UNIT_TIME with GUC_UNIT. We used (GUC_UNIT_MEMORY \| GUC_UNIT_TIME) instead of GUC_UNIT some places but we already define it in guc.h. This commit replaces them with GUC_UNIT for better consistency with their surrounding code. Author: Japin Li Reviewed-by: Richard Guo, Michael Paquier, Masahiko Sawada Discussion: https://postgr.es/m/MEYP282MB1669EC0FED922F7A151673ACB65AA@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM	2023-06-15 17:04:19 +09:00
Amit Kapila	b5c517379a	Fix possible crash in tablesync worker. Commit `c3afe8cf5a` added a new password_required option but forgot that you need database access to check whether an arbitrary role ID is a superuser. Commit `e7e7da2f8d` fixed a similar bug in apply worker, and this patch fixes a similar bug in tablesync worker. Author: Hou Zhijie Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/OS0PR01MB571607F5A9D723755268D36294759@OS0PR01MB5716.jpnprd01.prod.outlook.com	2023-06-15 08:37:48 +05:30
Noah Misch	f9f31aa91f	Make parseNodeString() C idiom compatible with Visual Studio 2015. Between v15 and now, this function's "else if" chain grew from 252 lines to 592 lines, exceeding a compiler limit that manifests as "fatal error C1026: parser stack overflow, program too complex (compiling source file src/backend/nodes/readfuncs.c)". Use "if (...) return ...;" instead. Reviewed by Tom Lane, Peter Eisentraut and Michael Paquier. Not all reviewers endorse this. Discussion: https://postgr.es/m/20230607185458.GA1334487@rfd.leadboat.com	2023-06-14 05:31:54 -07:00
Masahiko Sawada	4327f6c748	Fix typo in comment. Introduced in `4d330a61bb`. Author: Masahiko Sawada Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/CAD21AoDg8rTWJkrNJg9UTP89vS8smfib2c55DVqKrCn8zR-GYA@mail.gmail.com	2023-06-14 13:28:41 +09:00
Amit Langote	0f8cfaf892	Retain relkind too in RTE_SUBQUERY entries for views. `47bb9db75` modified the ApplyRetrieveRule()'s conversion of a view's original RTE_RELATION entry into an RTE_SUBQUERY one to retain relid, rellockmode, and perminfoindex so that the executor can lock the view and check its permissions. It seems better to also retain relkind for cross-checking that the exception of an RTE_SUBQUERY entry being allowed to carry relation details only applies to views, so do so. Bump catversion because this changes the output format of RTE_SUBQUERY RTEs. Suggested-by: David Steele <david@pgmasters.net> Reviewed-by: David Steele <david@pgmasters.net> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/3953179e-9540-e5d1-a743-4bef368785b0%40pgmasters.net	2023-06-14 12:00:10 +09:00
Tom Lane	63e4f13d2a	Fix "wrong varnullingrels" for Memoize's lateral references, too. The issue fixed in commit `bfd332b3f` can also bite Memoize plans, because of the separate copies of lateral reference Vars made by paraminfo_get_equal_hashops. Apply the same hacky fix there. (In passing, clean up shaky grammar in the existing comments for this function.) Richard Guo Discussion: https://postgr.es/m/CAMbWs4-krwk0Wbd6WdufMAupuou_Ua73ijQ4XQCr1Mb5BaVtKQ@mail.gmail.com	2023-06-13 18:01:33 -04:00
Tom Lane	792213f2e9	Correctly update hasSubLinks while mutating a rule action. rewriteRuleAction neglected to check for SubLink nodes in the securityQuals of range table entries. This could lead to failing to convert such a SubLink to a SubPlan, resulting in assertion crashes or weird errors later in planning. In passing, fix some poor coding in rewriteTargetView: we should not pass the source parsetree's hasSubLinks field to ReplaceVarsFromTargetList's outer_hasSubLinks. ReplaceVarsFromTargetList knows enough to ignore that when a Query node is passed, but it's still confusing and bad precedent: if we did try to update that flag we'd be updating a stale copy of the parsetree. Per bug #17972 from Alexander Lakhin. This has been broken since we added RangeTblEntry.securityQuals (although the presented test case only fails back to `215b43cdc`), so back-patch all the way. Discussion: https://postgr.es/m/17972-f422c094237847d0@postgresql.org	2023-06-13 15:58:43 -04:00
Andres Freund	e3cb1a586c	Report stats when replaying XLOG_RUNNING_XACTS Previously stats in the startup process would only get reported during shutdown of the startup process. It has been that way for a long time, but became a lot more noticeable with the new pg_stat_io view, which separates out IO done by different backend types... While replaying after every XLOG_RUNNING_XACTS isn't the prettiest approach, it has the advantage of being quite easy. Given that we're well past feature freeze... It's not a problem that we don't report stats more frequently with wal_level=minimal, in that case stats can't be read before the stats process has shut down. Besides the above, this commit also changes pgstat_report_stat() to acquire the timestamp with GetCurrentTimestamp() instead of GetCurrentTransactionStopTimestamp(). Thanks to Melih Mutlu, Kyotaro Horiguchi for prototypes of other approaches to solving this issue. Reported-by: Fujii Masao <masao.fujii@oss.nttdata.com> Discussion: https://postgr.es/m/5315aedc-fbca-1556-c5de-dc2e00b23a14@oss.nttdata.com	2023-06-12 15:06:39 -07:00
Tom Lane	7398e27224	Accept fractional seconds in jsonpath's datetime() method. Commit `927d9abb6` purported to make datetime() accept any string that could be output for a datetime value by to_jsonb(). But it overlooked the possibility of fractional seconds being present, so that cases as simple as to_jsonb(now()) would defeat it. Fix by adding formats that include ".US" to the list in executeDateTimeMethod(). (Note that while this is nominally microseconds, it'll do the right thing for fractions with fewer than six digits.) In passing, re-order the list to restore the datatype ordering specified in its comment. The violation accidentally did not break anything; but the next edit might be less lucky, so add more comments. Per report from Tim Field. Back-patch to v13 where datetime() was added, like the previous patch. Discussion: https://postgr.es/m/014A028B-5CE6-4FDF-AC24-426CA6FC9CEE@mohiohio.com	2023-06-12 10:54:44 -04:00
Noah Misch	8c7ad6f156	Add win32ver data to meson-built postgres.exe. As in the older build systems, the resources object is not an input to postgres.def. Reviewed by Andres Freund. Discussion: https://postgr.es/m/20230607231407.GC1334487@rfd.leadboat.com	2023-06-12 07:40:38 -07:00
Noah Misch	04411cbfdb	Give postgres.exe the icon of other executables. We had left it icon-free since users won't achieve much by opening it from Windows Explorer. Subsequent to that decision, Task Manager started to show the icon. That shifts the balance in favor of attaching the icon, so do so. No back-patch, but make this late addition to v16. Reviewed by Andres Freund and Magnus Hagander. Discussion: https://postgr.es/m/20230608014507.GD1334487@rfd.leadboat.com	2023-06-12 07:40:38 -07:00
Tom Lane	bfd332b3fd	Fix "wrong varnullingrels" for subquery nestloop parameters. If we apply outer join identity 3 when relation C is a subquery having lateral references to relation B, then the lateral references within C continue to bear the original syntactically-correct varnullingrels marks, but that won't match what is available from the outer side of the nestloop. Compensate for that in process_subquery_nestloop_params(). This is a slightly hacky fix, but we certainly don't want to re-plan C in toto for each possible outer join order, so there's not a lot of better alternatives. Richard Guo and Tom Lane, per report from Markus Winand Discussion: https://postgr.es/m/DFBB2D25-DE97-49CA-A60E-07C881EA59A7@winand.at	2023-06-12 10:01:26 -04:00
Heikki Linnakangas	548d726030	Remove a few unused global variables and declarations. - Commit `3eb77eba5a`, which moved the pending ops queue from md.c to sync.c, introduced a duplicate, unused 'pendingOpsCxt' variable. (I'm surprised none of the compilers or static analysis tools have complained about that.) - Commit `c2fe139c20` moved the 'synchronize_seqscans' variable and introduced an extern declaration in tableam.h, making the one in guc_tables.c unnecessary. - Commit `6f0cf87872` removed the 'pgstat_temp_directory' GUC, but forgot to remove the corresponding global variable. - Commit `1b4e729eaa` removed the 'pg_krb_realm' GUC, and its global variable, but forgot the declaration in auth.h. Spotted all these by reading the code.	2023-06-12 16:25:37 +03:00
Peter Geoghegan	d088ba5a5a	nbtree: Allocate new pages in separate function. Split nbtree's _bt_getbuf function is two: code that read locks or write locks existing pages remains in _bt_getbuf, while code that deals with allocating new pages is moved to a new, dedicated function called _bt_allocbuf. This simplifies most _bt_getbuf callers, since it is no longer necessary for them to pass a heaprel argument. Many of the changes to nbtree from commit `61b313e4` can be reverted. This minimizes the divergence between HEAD/PostgreSQL 16 and earlier release branches. _bt_allocbuf replaces the previous nbtree idiom of passing P_NEW to _bt_getbuf. There are only 3 affected call sites, all of which continue to pass a heaprel for recovery conflict purposes. Note that nbtree's use of P_NEW was superficial; nbtree never actually relied on the P_NEW code paths in bufmgr.c, so this change is strictly mechanical. GiST already took the same approach; it has a dedicated function for allocating new pages called gistNewBuffer(). That factor allowed commit `61b313e4` to make much more targeted changes to GiST. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CAH2-Wz=8Z9qY58bjm_7TAHgtW6RzZ5Ke62q5emdCEy9BAzwhmg@mail.gmail.com	2023-06-10 14:08:25 -07:00
Jeff Davis	2fcc7ee7af	Revert "Fix search_path to a safe value during maintenance operations." This reverts commit `05e1737351`.	2023-06-10 08:11:41 -07:00
Jeff Davis	05e1737351	Fix search_path to a safe value during maintenance operations. While executing maintenance operations (ANALYZE, CLUSTER, REFRESH MATERIALIZED VIEW, REINDEX, or VACUUM), set search_path to 'pg_catalog, pg_temp' to prevent inconsistent behavior. Functions that are used for functional indexes, in index expressions, or in materialized views and depend on a different search path must be declared with CREATE FUNCTION ... SET search_path='...'. This change addresses a security risk introduced in commit `60684dd834`, where a role with MAINTAIN privileges on a table may be able to escalate privileges to the table owner. That commit is not yet part of any release, so no need to backpatch. Discussion: https://postgr.es/m/e44327179e5c9015c8dda67351c04da552066017.camel%40j-davis.com Reviewed-by: Greg Stark Reviewed-by: Nathan Bossart	2023-06-09 11:20:47 -07:00
Nathan Bossart	9aee26a491	Fix missing word in nbtree/README. Reported-by: Daniel Westermann Author: Gurjeet Singh Reviewed-by: Richard Guo Discussion: https://postgr.es/m/ZR0P278MB0427F0E0CE4ED140F52D1923D250A%40ZR0P278MB0427.CHEP278.PROD.OUTLOOK.COM	2023-06-08 21:20:24 -07:00
Masahiko Sawada	a83edeaf68	Honor run_as_owner option in tablesync worker. Commit `482675987` introduced "run_as_owner" subscription option so that subscription runs with either the permissions of the subscription owner or the permission of the table owner. However, tablesync workers did not use this option for the initial data copy. With this change, tablesync workers run with appropriate permissions based on "run_as_owner" option. Ajin Cherian, with changes and regression tests added by me. Reported-By: Amit Kapila Author: Ajin Cherian, Masahiko Sawada Reviewed-by: Ajin Cherian, Amit Kapila Discussion: https://postgr.es/m/CAA4eK1L=qzRHPEn+qeMoKQGFBzqGoLBzt_ov0A89iFFiut+ppA@mail.gmail.com	2023-06-09 10:43:03 +09:00
Tom Lane	9a2dbc614e	Fix oversight in outer join removal. A placeholder that references the outer join's relid in ph_eval_at is logically "above" the join, and therefore we can't remove its PlaceHolderInfo: it might still be used somewhere in the query. This was not an issue pre-v16 because we failed to remove the join at all in such cases. The new outer-join-aware-Var infrastructure permits deducing that it's okay to remove the join, but then we have to clean up correctly afterwards. Report and fix by Richard Guo Discussion: https://postgr.es/m/CAMbWs4_tuVn9EwwMcggGiZJWWstdXX_ci8FeEU17vs+4nLgw3w@mail.gmail.com	2023-06-08 17:10:04 -04:00
Tom Lane	fbf80421ea	Re-allow INDEX_VAR as rt_index in ChangeVarNodes(). Apparently some extensions are in the habit of calling ChangeVarNodes() with INDEX_VAR as the rt_index to replace. That worked before `2489d76c4`, at least as long as there were not PlaceHolderVars in the expression; but now it fails because bms_is_member spits up. Add a test to avoid that. Per report from Anton Melnikov, though this is not his proposed patch. Discussion: https://postgr.es/m/5b370a46-f6d2-373d-9dbc-0d55250e82c1@inbox.ru	2023-06-08 13:11:49 -04:00
Tom Lane	d98ed080bb	Fix small overestimation of base64 encoding output length. pg_base64_enc_len() and its clones overestimated the output length by up to 2 bytes, as a result of sloppy thinking about where to divide. No callers require a precise estimate, so this has no consequences worse than palloc'ing a byte or two more than necessary. We might as well get it right though. This bug is very ancient, dating to commit `79d78bb26` which added encode.c. (The other instances were presumably copied from there.) Still, it doesn't quite seem worth back-patching. Oleg Tselebrovskiy Discussion: https://postgr.es/m/f94da55286a63022150bc266afdab754@postgrespro.ru	2023-06-08 11:24:31 -04:00
Tomas Vondra	f24523672d	Use per-tuple context in ExecGetAllUpdatedCols Commit `fc22b6623b` (generated columns) replaced ExecGetUpdatedCols() with ExecGetAllUpdatedCols() in a couple places handling UPDATE (triggers and lock mode). However, ExecGetUpdatedCols() did exec_rt_fetch() while ExecGetAllUpdatedCols() also allocates memory through bms_union() without paying attention to the memory context and happened to use the long-lived ExecutorState, leaking the memory until the end of the query. The amount of leaked memory is proportional to the number of (updated) attributes, types of UPDATE triggers, and the number of processed rows (which for UPDATE ... FROM ... may be much higher than updated rows). Fixed by switching to the per-tuple context in GetAllUpdatedColumns(). This is fine for all in-core callers, but external callers may need to copy the result. But we're not aware of any such callers. Note the issue was introduced by `fc22b6623b`, but the macros were later renamed by `f50e888990`. Backpatch to 12, where the issue was introduced. Reported-by: Tomas Vondra Reviewed-by: Andres Freund, Tom Lane, Jakub Wartak Backpatch-through: 12 Discussion: https://postgr.es/m/222a3442-7f7d-246c-ed9b-a76209d19239@enterprisedb.com	2023-06-07 18:54:34 +02:00
Peter Eisentraut	b0f6c43716	Remove read-only server settings lc_collate and lc_ctype The GUC settings lc_collate and lc_ctype are from a time when those locale settings were cluster-global. When those locale settings were made per-database (PG 8.4), the settings were kept as read-only. As of PG 15, you can use ICU as the per-database locale provider, so examining these settings is already less meaningful and possibly confusing, since you need to look into pg_database to find out what is really happening, and they would likely become fully obsolete in the future anyway. Reviewed-by: Jeff Davis <pgsql@j-davis.com> Discussion: https://www.postgresql.org/message-id/696054d1-bc88-b6ab-129a-18b8bce6a6f0@enterprisedb.com	2023-06-07 16:57:06 +02:00
Amit Kapila	d64e6468f4	Reload configuration more frequently in apply worker. The apply worker was not reloading the configuration while processing messages if there is a continuous flow of messages from upstream. It was also not reloading the configuration if there is a change in the configuration after it has waited for the message and before receiving the new replication message. This can lead to failure in tests because we expect that after reload, the behavior of apply worker to respect the changed GUCs. We found this while analyzing a rare buildfarm failure. Author: Hou Zhijie Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/OS0PR01MB5716AF9079CC0755CD015322947E9@OS0PR01MB5716.jpnprd01.prod.outlook.com	2023-06-07 09:19:17 +05:30
Heikki Linnakangas	95f0340c3b	Initialize 'recordXtime' to silence compiler warning. In reality, recordXtime will always be set by the getRecordTimestamp call, but the compiler doesn't necessarily see that. Back-patch to all supported versions. Author: Tristan Partin Discussion: https://www.postgresql.org/message-id/CT5MN8E11U0M.1NYNCHXYUHY41@gonk	2023-06-06 20:30:53 +03:00
Tom Lane	7a844c77ec	Fix joinclause removal logic to cope with cloned clauses. When we're deleting a no-op LEFT JOIN from the query, we must remove the join's joinclauses from surviving relations' joininfo lists. The invention of "cloned" clauses in `2489d76c4` broke the logic for that; it'd fail to remove clones that include OJ relids outside the doomed join's min relid sets, which could happen if that join was previously discovered to commute with some other join. This accidentally failed to cause problems in the majority of cases, because we'd never decide that such a cloned clause was evaluatable at any surviving join. However, Richard Guo discovered a case where that did happen, leading to "no relation entry for relid" errors later. Also, adding assertions that a non-removed clause contains no Vars from the doomed join exposes that there are quite a few existing regression test cases where the problem happens but is accidentally not exposed. The fix for this is just to include the target join's commute_above_r and commute_below_l sets in the relid set we test against when deciding whether a join clause is "pushed down" and thus not removable. While at it, do a little refactoring: the join's relid set can be computed inside remove_rel_from_query rather than in the caller. Patch by me; thanks to Richard Guo for review. Discussion: https://postgr.es/m/CAMbWs4_PHrRqTKDNnTRsxxQy6BtYCVKsgXm1_gdN2yQ=kmcO5g@mail.gmail.com	2023-05-26 12:13:19 -04:00
Peter Geoghegan	5abff197cc	nbtree VACUUM: cope with right sibling link corruption. Avoid "right sibling's left-link doesn't match" errors when vacuuming a corrupt nbtree index. Just LOG the issue and press on. That way VACUUM will have a decent chance of finishing off all required processing for the index (and for the table as a whole). This error was seen in the field from time to time (it's more than a theoretical risk), so giving VACUUM the ability to press on like this has real value. Nothing short of a REINDEX is expected to fix the underlying index corruption, so giving up (by throwing an error) risks making a bad situation far worse. Anything that blocks forward progress by VACUUM like this might go unnoticed for a long time. This could eventually lead to a wraparound/xidStopLimit outage. Note that _bt_unlink_halfdead_page() has always been able to bail on page deletion when the target page's left sibling page was in an inconsistent state. It now does the same thing (returns false to back out of the second phase of deletion) when it notices sibling link corruption in the target page's right sibling page. This is similar to the work from commit `5b861baa` (later backpatched as commit `43e409ce`), which taught nbtree to press on with vacuuming an index when page deletion fails to "re-find" a downlink in the target page's parent page. The "re-find" check seems to make VACUUM bail on page deletion more often in practice, but there is no reason to take any chances here. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CAH2-Wzko2q2kP1+UvgJyP9g0mF4hopK0NtQZcxwvMv9_ytGhkQ@mail.gmail.com Backpatch: 11- (all supported versions).	2023-05-25 15:33:00 -07:00
Tom Lane	991a3df227	Fix filtering of "cloned" outer-join quals some more. We've had multiple issues with the clause_is_computable_at logic that I introduced in `2489d76c4`: it's been known to accept more than one clone of the same qual at the same plan node, and also to accept no clones at all. It's looking impractical to get it 100% right on the basis of the currently-stored information, so fix it by introducing a new RestrictInfo field "incompatible_relids" that explicitly shows which outer joins a given clone mustn't be pushed above. In principle we could populate this field in every RestrictInfo, but that would cost space and there doesn't presently seem to be a need for it in general. Also, while deconstruct_distribute_oj_quals can easily fill the field with the remaining members of the commutative join set that it's considering, computing it in the general case seems again pretty complicated. So for now, just fill it for clone quals. Along the way, fix a bug that may or may not be only latent: equivclass.c was generating replacement clauses with is_pushed_down and has_clone/is_clone markings that didn't match their required_relids. This led me to conclude that leaving the clone flags out of make_restrictinfo's purview wasn't such a great idea after all, so add them. Per report from Richard Guo. Discussion: https://postgr.es/m/CAMbWs48EYi_9-pSd0ORes1kTmTeAjT4Q3gu49hJtYCbSn2JyeA@mail.gmail.com	2023-05-25 10:28:33 -04:00
Tom Lane	5df5bea290	Fix the install rule for snowball_create.sql. This file could be in the current (build) directory if we just built it. However, when installing from a VPATH build from a tarball, it will exist in the source directory and gmake will therefore not rebuild it. Use the $< macro to find out where gmake found it. Oversight in `b3a0d8324`, which also exposes a buildfarm testing gap: we test install from VPATH builds from bare source trees, but not from tarballs. Per report from Christoph Berg. Discussion: https://postgr.es/m/ZGzEAqjxkkoY3ooH@msg.df7cb.de	2023-05-23 11:15:57 -04:00
Peter Eisentraut	0ffbe6e591	Use lower case for icu_validation_level values Similar to client_min_messages etc.	2023-05-23 15:19:33 +02:00
Peter Eisentraut	dfe0169988	Punctuation improvement in postgresql.conf.sample	2023-05-23 15:19:12 +02:00
Peter Eisentraut	473e02f6f9	Translation updates Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git Source-Git-Hash: 642d41265b1ea68ae71a66ade5c5440ba366a890	2023-05-22 12:44:31 +02:00
Tom Lane	b9c755a2f6	In clause_is_computable_at(), test required_relids for clone clauses. Use the clause's required_relids not clause_relids for testing whether it is computable at the current join level, if it is a clone clause generated by deconstruct_distribute_oj_quals(). Arguably, this is more correct and we should do it for all clauses; that would at least remove the handwavy claim that we are doing it to save cycles compared to inspecting Vars individually. However, attempting to do that exposes that we are not being careful to compute an accurate value for required_relids in all cases. I'm unsure whether it's a good idea to attempt to do that for v16, or leave it as future clean-up. In the meantime, this quick hack demonstrably fixes some cases, so let's squeeze it in for beta1. Patch by me, but great thanks to Richard Guo for investigation and testing. The new test cases are all modeled on his examples. Discussion: https://postgr.es/m/CAMbWs4-_vwkBij4XOQ5ukxUvLgwTm0kS5_DO9CicUeKbEfKjUw@mail.gmail.com	2023-05-21 15:25:52 -04:00
Andres Freund	eabb22525e	Remove over-eager assertion in ExtendBufferedRelTo() The assertion checked that the size of the relation is not "too large" - but the code is explicitly dealing with the possibility of another backend extending the relation concurrently. In that case the new relation size could be bigger than what the current backend needs, wrongly triggering an assertion failure. Unfortunately it is hard to write a reliable and affordable regression tests for this, as a lot of concurrency is needed to encounter the bug. Introduced in `31966b151e`. Reported-by: Melanie Plageman <melanieplageman@gmail.com>	2023-05-21 09:53:49 -07:00
Andres Freund	bc971f4025	Optimize walsender wake up logic using condition variables WalSndWakeup() currently loops through all the walsenders slots, with a spinlock acquisition and release for every iteration, to wake up waiting walsenders. This commonly was not a problem before `e101dfac3a`. But, to allow logical decoding on standbys, we need to wake up logical walsenders after every WAL record is applied on the standby, rather just when flushing WAL or switching timelines. This causes a performance regression for workloads replaying a lot of WAL records. To solve this, we use condition variable (CV) to efficiently wake up walsenders in WalSndWakeup(). Every walsender prepares to sleep on a shared memory CV. Note that it just prepares to sleep on the CV (i.e., adds itself to the CV's waitlist), but does not actually wait on the CV (IOW, it never calls ConditionVariableSleep()). It still uses WaitEventSetWait() for waiting, because CV infrastructure doesn't handle FeBe socket events currently. The processes (startup process, walreceiver etc.) wanting to wake up walsenders use ConditionVariableBroadcast(), which in turn calls SetLatch(), helping walsenders come out of WaitEventSetWait(). We use separate shared memory CVs for physical and logical walsenders for selective wake ups, see WalSndWakeup() for more details. This approach is simple and reasonably efficient. But not very elegant. But for 16 it seems to be a better path than a larger redesign of the CV mechanism. A desirable future improvement would be to add support for CVs into WaitEventSetWait(). This still leaves us with a small regression in very extreme workloads (due to the spinlock acquisition in ConditionVariableBroadcast() when there are no waiters) - but that seems acceptable. Reported-by: Andres Freund <andres@anarazel.de> Suggested-by: Andres Freund <andres@anarazel.de> Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com> Discussion: https://www.postgresql.org/message-id/20230509190247.3rrplhdgem6su6cg%40awork3.anarazel.de	2023-05-21 09:44:55 -07:00
Tom Lane	a2eb99a01e	Expand some more uses of "deleg" to "delegation" or "delegated". Complete the task begun in `9c0a0e2ed`: we don't want to use the abbreviation "deleg" for GSS delegation in any user-visible places. (For consistency, this also changes most internal uses too.) Abhijit Menon-Sen and Tom Lane Discussion: https://postgr.es/m/949048.1684639317@sss.pgh.pa.us	2023-05-21 10:55:18 -04:00
Nathan Bossart	f4001a5537	Fix remaining references to gss_accept_deleg. These were missed in `9c0a0e2ed9`. Discussion: https://postgr.es/m/20230521031757.GA3835667%40nathanxps13	2023-05-20 20:32:56 -07:00
Bruce Momjian	9c0a0e2ed9	rename "gss_accept_deleg" to "gss_accept_delegation". This is more consistent with existing GUC spelling. Discussion: https://postgr.es/m/ZGdnEsGtNj7+fZoa@momjian.us	2023-05-20 21:32:54 -04:00
Tom Lane	0245f8db36	Pre-beta mechanical code beautification. Run pgindent, pgperltidy, and reformat-dat-files. This set of diffs is a bit larger than typical. We've updated to pg_bsd_indent 2.1.2, which properly indents variable declarations that have multi-line initialization expressions (the continuation lines are now indented one tab stop). We've also updated to perltidy version 20230309 and changed some of its settings, which reduces its desire to add whitespace to lines to make assignments etc. line up. Going forward, that should make for fewer random-seeming changes to existing code. Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql	2023-05-19 17:24:48 -04:00
Tom Lane	d0f952691f	Fix thinko in join removal. In commit `9df8f903e` I (tgl) switched join_is_removable() from using the min relid sets of the join under consideration to using its full syntactic relid sets. This was a mistake, as it allowed join removal in cases where a reference to the join output would survive in some syntactically-lower join condition. Revert to the former coding. Richard Guo Discussion: https://postgr.es/m/CAMbWs4-EU9uBGSP7G-iTwLBhRQ=rnZKvFDhD+n+xhajokyPCKg@mail.gmail.com	2023-05-19 15:24:07 -04:00
Tom Lane	70b42f2790	Fix misbehavior of EvalPlanQual checks with multiple result relations. The idea of EvalPlanQual is that we replace the query's scan of the result relation with a single injected tuple, and see if we get a tuple out, thereby implying that the injected tuple still passes the query quals. (In join cases, other relations in the query are still scanned normally.) This logic was not updated when commit `86dc90056` made it possible for a single DML query plan to have multiple result relations, when the query target relation has inheritance or partition children. We replaced the output for the current result relation successfully, but other result relations were still scanned normally; thus, if any other result relation contained a tuple satisfying the quals, we'd think the EPQ check passed, even if it did not pass for the injected tuple itself. This would lead to update or delete actions getting performed when they should have been skipped due to a conflicting concurrent update in READ COMMITTED isolation mode. Fix by blocking all sibling result relations from emitting tuples during an EvalPlanQual recheck. In the back branches, the fix is complicated a bit by the need to not change the size of struct EPQState (else we'd have ABI-breaking changes in offsets in struct ModifyTableState). Like the back-patches of `3f7836ff6` and `4b3e37993`, add a separately palloc'd struct to avoid that. The logic is the same as in HEAD otherwise. This is only a live bug back to v14 where `86dc90056` came in. However, I chose to back-patch the test cases further, on the grounds that this whole area is none too well tested. I skipped doing so in v11 though because none of the test applied cleanly, and it didn't quite seem worth extra work for a branch with only six months to live. Per report from Ante Krešić (via Aleksander Alekseev) Discussion: https://postgr.es/m/CAJ7c6TMBTN3rcz4=AjYhLPD_w3FFT0Wq_C15jxCDn8U4tZnH1g@mail.gmail.com	2023-05-19 14:26:40 -04:00
Peter Eisentraut	8e7912e73d	Message style improvements	2023-05-19 18:45:29 +02:00
Tomas Vondra	8c4040edf4	Allocate hash join files in a separate memory context Should a hash join exceed memory limit, the hashtable is split up into multiple batches. The number of batches is doubled each time a given batch is determined not to fit in memory. Each batch file is allocated with a block-sized buffer for buffering tuples and parallel hash join has additional sharedtuplestore accessor buffers. In some pathological cases requiring a lot of batches, often with skewed data, bad stats, or very large datasets, users can run out-of-memory solely from the memory overhead of all the batch files' buffers. Batch files were allocated in the ExecutorState memory context, making it very hard to identify when this batch explosion was the source of an OOM. This commit allocates the batch files in a dedicated memory context, making it easier to identify the cause of an OOM and work to avoid it. Based on initial draft by Tomas Vondra, with significant reworks and improvements by Jehan-Guillaume de Rorthais. Author: Jehan-Guillaume de Rorthais <jgdr@dalibo.com> Author: Tomas Vondra <tomas.vondra@enterprisedb.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20190421114618.z3mpgmimc3rmubi4@development Discussion: https://postgr.es/m/20230504193006.1b5b9622%40karst#273020ff4061fc7a2fbb1ba96b281f17	2023-05-19 17:17:58 +02:00
Tomas Vondra	507615fc53	Describe hash join implementation Add a high level description of our implementation of the hybrid hash join algorithm to the block comment in nodeHashjoin.c. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Reviewed-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com> Discussion: https://postgr.es/m/20230516160051.4267a800%40karst	2023-05-19 17:17:58 +02:00
Peter Eisentraut	803b4a26ca	Remove stray mid-sentence tabs in comments	2023-05-19 16:13:16 +02:00
Peter Eisentraut	4c9deebd37	Move mdwriteback() to better place The previous order in the file didn't make sense and matched neither the header file nor the smgr API. Discussion: https://www.postgresql.org/message-id/flat/22fed8ba-01c3-2008-a256-4ea912d68fab%40enterprisedb.com	2023-05-19 13:42:06 +02:00
Peter Eisentraut	0b8ace8d77	Reindent some comments Most (older) comments in md.c and smgr.c are indented with a leading tab on all lines, which isn't the current style and makes updating the comments a bit annoying. This reindents all these lines with a single space, as is the normal style. This issue exists in various shapes throughout the code but it's pretty consistent here, and since there is a patch pending to refresh some of the comments in these files, it seems sensible to clean this up here separately. Discussion: https://www.postgresql.org/message-id/flat/22fed8ba-01c3-2008-a256-4ea912d68fab%40enterprisedb.com	2023-05-19 10:52:04 +02:00
Michael Paquier	e7bff46e50	pageinspect: Fix gist_page_items() with included columns Non-leaf pages of GiST indexes contain key attributes, leaf pages contain both key and non-key attributes, and gist_page_items() ignored the handling of non-key attributes. This caused a few problems when using gist_page_items() on a GiST index with INCLUDE: - On a non-leaf page, the function would crash. - On a leaf page, the function would work, but miss to display all the values for included attributes. This commit fixes gist_page_items() to handle such cases in a more appropriate way, and now displays the values of key and non-key attributes for each item separately in a style consistent with what ruleutils.c would generate for the attribute list, depending on the page type dealt with. In a way similar to how a record is displayed, values would be double-quoted for key or non-key attributes if required. ruleutils.c did not provide a routine able to control if non-key attributes should be displayed, so an extended() routine for index definitions is added to work around the leaf and non-leaf page differences. While on it, this commit fixes a third problem related to the amount of data reported for key attributes. The code originally relied on BuildIndexValueDescription() (used for error reports on constraints) that would not print all the data stored in the index but the index opclass's input type, so this limited the amount of information available. This switch makes gist_page_items() much cheaper as there is no need to run ACL checks for each item printed, which is not an issue anyway as superuser rights are required to execute the functions of pageinspect. Opclasses whose data cannot be displayed can rely on gist_page_items_bytea(). The documentation of this function was slightly incorrect for the output results generated on HEAD and v15, so adjust it on these branches. Author: Alexander Lakhin, Michael Paquier Discussion: https://postgr.es/m/17884-cb8c326522977acb@postgresql.org Backpatch-through: 14	2023-05-19 12:37:58 +09:00
Tomas Vondra	3581cbdcd6	Fix handling of empty ranges and NULLs in BRIN BRIN indexes did not properly distinguish between summaries for empty (no rows) and all-NULL ranges, treating them as essentially the same thing. Summaries were initialized with allnulls=true, and opclasses simply reset allnulls to false when processing the first non-NULL value. This however produces incorrect results if the range starts with a NULL value (or a sequence of NULL values), in which case we forget the range contains NULL values when adding the first non-NULL value. This happens because the allnulls flag is used for two separate purposes - to mark empty ranges (not representing any rows yet) and ranges containing only NULL values. Opclasses don't know which of these cases it is, and so don't know whether to set hasnulls=true. Setting the flag in both cases would make it correct, but it would also make BRIN indexes useless for queries with IS NULL clauses. All ranges start empty (and thus allnulls=true), so all ranges would end up with either allnulls=true or hasnulls=true. The severity of the issue is somewhat reduced by the fact that it only happens when adding values to an existing summary with allnulls=true. This can happen e.g. for small tables (because a summary for the first range exists for all BRIN indexes), or for tables with large fraction of NULL values in the indexed columns. Bulk summarization (e.g. during CREATE INDEX or automatic summarization) that processes all values at once is not affected by this issue. In this case the flags were updated in a slightly different way, not forgetting the NULL values. To identify empty ranges we use a new flag, stored in an unused bit in the BRIN tuple header so the on-disk format remains the same. A matching flag is added to BrinMemTuple, into a 3B gap after bt_placeholder. That means there's no risk of ABI breakage, although we don't actually pass the BrinMemTuple to any public API. We could also skip storing index tuples for empty summaries, but then we'd have to always process such ranges - even if there are no rows in large parts of the table (e.g. after a bulk DELETE), it would still require reading the pages etc. So we store them, but ignore them when building the bitmap. Backpatch to 11. The issue exists since BRIN indexes were introduced in 9.5, but older releases are already EOL. Backpatch-through: 11 Reviewed-by: Justin Pryzby, Matthias van de Meent, Alvaro Herrera Discussion: https://postgr.es/m/402430e4-7d9d-6cf1-09ef-464d80afff3b@enterprisedb.com	2023-05-19 01:29:44 +02:00
Tomas Vondra	3ec8a3bfb5	Fix handling of NULLs when merging BRIN summaries When merging BRIN summaries, union_tuples() did not correctly update the target hasnulls/allnulls flags. When merging all-NULL summary into a summary without any NULL values, the result had both flags set to false (instead of having hasnulls=true). This happened because the code only considered the hasnulls flags, ignoring the possibility the source summary has allnulls=true. Discovered while investigating issues with handling empty BRIN ranges and handling of NULL values, but it's a separate problem (has nothing to do with empty ranges). Fixed by considering both flags on the source summary, and updating the hasnulls flag on the target summary. Backpatch to 11. The bug exists since 9.5 (where BRIN indexes were introduced), but those releases are EOL already. Discussion: https://postgr.es/m/9d993d0d-e431-2196-9ccc-0554d0e60154%40enterprisedb.com	2023-05-18 23:33:23 +02:00
Tom Lane	8a2523ff35	Tweak API of new function clause_is_computable_at(). Pass it the RestrictInfo under consideration, not just the clause_relids. This should save some trivial amount of code at the call sites, and it gives us more flexibility about what clause_is_computable_at() does. There's no actual functional change here, though. Discussion: https://postgr.es/m/3564467.1684352557@sss.pgh.pa.us	2023-05-18 10:39:16 -04:00
Jeff Davis	1c634f6647	ICU: check for U_STRING_NOT_TERMINATED_WARNING. Fixes memory error in cases where the length of the language name returned by uloc_getLanguage() is exactly ULOC_LANG_CAPACITY, in which case the status is set to U_STRING_NOT_TERMINATED_WARNING. Also check in call sites for other ICU functions that are expected to return a C string to be safe (no bug is known at these other call sites). Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/2098874d-c111-41e4-9063-30bcf135226b@gmail.com	2023-05-17 14:18:45 -07:00
Jeff Davis	6de31ce446	Reduce icu_validation_level default to WARNING. Discussion: https://postgr.es/m/daa9f060aa2349ebc84444515efece49e7b32c5d.camel@j-davis.com	2023-05-17 13:18:40 -07:00
Andres Freund	093e5c57d5	Add writeback to pg_stat_io `28e626bde0` added the concept of IOOps but neglected to include writeback operations. `ac8d53dae5` added time spent doing these I/O operations. Without counting writeback, checkpointer write time in the log often differed substantially from that in pg_stat_io. To fix this, add IOOp IOOP_WRITEBACK and track writeback in pg_stat_io. Bumps catversion. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reported-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20230419172326.dhgyo4wrrhulovt6%40awork3.anarazel.de	2023-05-17 11:18:35 -07:00
Andres Freund	52676dc2e0	Update parameter name context to wb_context For clarity of review, renaming the function parameter "context" in ScheduleBufferTagForWriteback() and IssuePendingWritebacks() to "wb_context" is a separate commit. The next commit adds an "io_context" parameter and "wb_context" makes it more clear which is which. Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAAKRu_acc6iL4M3hvOTeztf_ZPpsB3Pqio5aVHgZ5q=Pi3BZKg@mail.gmail.com	2023-05-17 11:18:30 -07:00
Alexander Korotkov	b9a7a82272	Revert "Add USER SET parameter values for pg_db_role_setting" This reverts commit `096dd80f3c` and its fixups `beecbe8e50`, `afdd9f7f0e`, `529da086ba`, `db93e739ac`. Catversion is bumped. Discussion: https://postgr.es/m/d46f9265-ff3c-6743-2278-6772598233c2%40pgmasters.net	2023-05-17 20:28:57 +03:00
Tom Lane	69c430626b	Track tlist_vinfo.varnullingrels even in non-Assert builds. Oversight in commit `867be9c07` (which should get reverted along with that, if we ever do revert it). Per buildfarm.	2023-05-17 11:46:15 -04:00
Tom Lane	9df8f903eb	Fix some issues with improper placement of outer join clauses. After applying outer-join identity 3 in the forward direction, it was possible for the planner to mistakenly apply a qual clause from above the two outer joins at the now-lower join level. This can give the wrong answer, since a value that would get nulled by the now-upper join might not yet be null. To fix, when we perform such a transformation, consider that the now-lower join hasn't really completed the outer join it's nominally responsible for and thus its relid set should not include that OJ's relid (nor should its output Vars have that nullingrel bit set). Instead we add those bits when the now-upper join is performed. The existing rules for qual placement then suffice to prevent higher qual clauses from dropping below the now-upper join. There are a few complications from needing to consider transitive closures in case multiple pushdowns have happened, but all in all it's not a very complex patch. This is all new logic (from `2489d76c4`) so no need to back-patch. The added test cases all have the same results as in v15. Tom Lane and Richard Guo Discussion: https://postgr.es/m/0b819232-4b50-f245-1c7d-c8c61bf41827@postgrespro.ru	2023-05-17 11:14:04 -04:00
Tom Lane	867be9c073	Convert nullingrels match checks from Asserts to test-and-elog. It seems like the code that these checks are backstopping may have a few bugs left in it. Use a test-and-elog so that the tests are performed even in non-assert builds, and so that we get something more informative than "server closed the connection" on failure. Committed separately with the idea that eventually we'll revert this. It might be awhile though. Discussion: https://postgr.es/m/3014965.1684293045@sss.pgh.pa.us	2023-05-17 11:14:04 -04:00
Michael Paquier	d8c3106bb6	Add back SQLValueFunction for SQL keywords This is equivalent to a revert of `f193883` and `fb32748`, with the addition that the declaration of the SQLValueFunction node needs to gain a couple of node_attr for query jumbling. The performance impact of removing the function call inlining is proving to be too huge for some workloads where these are used. A worst-case test case of involving only simple SELECT queries with a SQL keyword is proving to lead to a reduction of 10% in TPS via pgbench and prepared queries on a high-end machine. None of the tests I ran back for this set of changes saw such a huge gap, but Alexander Lakhin and Andres Freund have found that this can be noticeable. Keeping the older performance would mean to do more inlining in the executor when using COERCE_SQL_SYNTAX for a function expression, similarly to what SQLValueFunction does. This requires more redesign work and there is little time until 16beta1 is released, so for now reverting the change is the best way forward, bringing back the previous performance. Bump catalog version. Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/b32bed1b-0746-9b20-1472-4bdc9ca66d52@gmail.com	2023-05-17 10:19:17 +09:00
Alvaro Herrera	c44b59fad4	Mark internal messages as no longer translatable The problem that these messages protect against can only occur because a corrupted hash spill file was written, i.e., a Postgres bug. There's no reason to have them as translatable. Backpatch to 15, where these messages were changed by commit `c4649cce39`. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/20230510175407.dwa5v477pw62ikyx@alvherre.pgsql	2023-05-16 11:47:25 +02:00
Thomas Munro	63932a6d38	Fix wal_writer_flush_after initializer value. Commit `a73952b795` (new in 16) required default values in guc_table.c and C variable initializers to match. This one only matched when XLOG_BLCKSZ == 8kB. Fix by using the same expression in both places with a new DEFAULT_XXX macro, as done for other GUCs. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGLNmLV=VrT==5MqnbARgx2ifRSFtdd8ofdfrdSLL3yv5A@mail.gmail.com	2023-05-15 11:19:54 +12:00
Thomas Munro	319bae9a8d	Rename io_direct to debug_io_direct. Give the new GUC introduced by `d4e71df6` a name that is clearly not intended for mainstream use quite yet. Future proposals would drop the prefix only after adding infrastructure to make it efficient. Having the switch in the tree sooner is good because it might lead to new discoveries about the hazards awaiting us on a wide range of systems, but that name was too enticing and could lead to cross-version confusion in future, per complaints from Noah and Justin. Suggested-by: Noah Misch <noah@leadboat.com> Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> (the idea, not the patch) Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (ditto) Discussion: https://postgr.es/m/20230430041106.GA2268796%40rfd.leadboat.com	2023-05-15 10:31:14 +12:00
Nathan Bossart	4d5105a684	Improve error message for pg_create_subscription. `c3afe8cf5a` updated this error message, but it didn't use the new style established in `de4d456b40`. Reviewed-by: Tom Lane Discussion: https://postgr.es/m/20230512203721.GA2644063%40nathanxps13.home	2023-05-12 14:16:56 -07:00
Tom Lane	c8b881d21f	Undo faulty attempt at not relying on RINFO_IS_PUSHED_DOWN. I've had a bee in my bonnet for some time about getting rid of RestrictInfo.is_pushed_down, because it's squishily defined and requires not-inexpensive extra tests to use (cf RINFO_IS_PUSHED_DOWN). In commit `2489d76c4`, I tried to make remove_rel_from_query() not depend on that macro; but the replacement test is buggy, as exposed by a report from Rushabh Lathia and Robert Haas. That change was pretty incidental to the main goal of `2489d76c4`, so let's just revert it for now. (Getting rid of is_pushed_down is still far away, anyway.) Discussion: https://postgr.es/m/CA+TgmoYco=hmg+iX1CW9Y1_CzNoSL81J03wUG-d2_3=rue+L2A@mail.gmail.com	2023-05-11 13:44:25 -04:00
Alvaro Herrera	c39f2f68e9	Fix publication syntax error message There was some odd wording in corner-case gram.y error messages "some error ... at or near", which appears to have been modeled after "syntax error" messages. However, they don't work that way, and they're just wrong. They're also uncovered by tests. Remove the trailing words, and also add tests. They were introduced with 5a2832465fd8; backpatch to 15. Author: Álvaro Herrera <alvherre@alvh.no-ip.org>	2023-05-10 18:26:10 +02:00
Peter Eisentraut	d8bcce1b5e	Add missing gettext triggers due to the changes in commit `dac048f71e`	2023-05-10 13:51:51 +02:00
Michael Paquier	605994651b	Fix assertion failure when updating stats_fetch_consistency in a transaction An update of the GUC stats_fetch_consistency in a transaction would be able to trigger an assertion when doing cache->snapshot. In this case, when retrieving a pgstat entry after the switch, a new snapshot would be rebuilt, confusing pgstat_build_snapshot() because a snapshot is already cached with an unexpected mode ("cache"). In order to fix this problem, this commit adds a flag to force a snapshot clear each time this GUC is changed. Some tests are added to check, while on it. Some optimizations in avoiding the snapshot clear should be possible depending on what is cached and the current GUC value, I guess, but this solution is simple, and ensures that the state of the cache is updated each time a new pgstat entry is fetched, hence being consistent with the level wanted by the client that has set the GUC. Note that cache->none and snapshot->none would not cause issues, as fetching a pgstat entry would be retrieved from shared memory on the second attempt, however a snapshot would still be cached. Similarly, none->snapshot and none->cache would build a new snapshot on the second fetch attempt. Finally, snapshot->cache would cache a new snapshot on the second attempt. Reported-by: Alexander Lakhin Author: Kyotaro Horiguchi Discussion: https://postgr.es/m/17804-2a118cd046f2d0e5@postgresql.org backpatch-through: 15	2023-05-10 11:24:30 +09:00
Michael Paquier	4d47eff99c	Document values of stats_fetch_consistency in postgresql.conf.sample Issue noted while looking at a patch related to that. Discussion: https://postgr.es/m/ZE9LiFc7JdNHokz/@paquier.xyz	2023-05-10 10:19:57 +09:00
Amit Kapila	3d144c6c86	Fix invalid memory access during the shutdown of the parallel apply worker. The callback function pa_shutdown() accesses MyLogicalRepWorker which may not be initialized if there is an error during the initialization of the parallel apply worker. The other problem is that by the time it is invoked even after the initialization of the worker, the MyLogicalRepWorker will be reset by another callback logicalrep_worker_onexit. So, it won't have the required information. To fix this, register the shutdown callback after we are attached to the worker slot. After this fix, we observed another issue which is that sometimes the leader apply worker tries to receive the message from the error queue that might already be detached by the parallel apply worker leading to an error. To prevent such an error, we ensure that the leader apply worker detaches from the parallel apply worker's error queue before stopping it. Reported-by: Sawada Masahiko Author: Hou Zhijie Reviewed-by: Sawada Masahiko, Amit Kapila Discussion: https://postgr.es/m/CAD21AoDo+yUwNq6nTrvE2h9bB2vZfcag=jxWc7QxuWCmkDAqcA@mail.gmail.com	2023-05-09 09:28:06 +05:30
Jeff Davis	455f948b0d	Revert "ICU: do not convert locale 'C' to 'en-US-u-va-posix'." This reverts commit `f7faa9976c`. Discussion: https://postgr.es/m/483826.1683582475@sss.pgh.pa.us	2023-05-08 20:50:51 -07:00
Jeff Davis	f7faa9976c	ICU: do not convert locale 'C' to 'en-US-u-va-posix'. The conversion was intended to be for convenience, but it's more likely to be confusing than useful. The user can still directly specify 'en-US-u-va-posix' if desired. Discussion: https://postgr.es/m/f83f089ee1e9acd5dbbbf3353294d24e1f196e95.camel@j-davis.com Discussion: https://postgr.es/m/37520ec1ae9591f83132f82dbd625f3fc2d69c16.camel@j-davis.com	2023-05-08 10:34:51 -07:00
Tom Lane	ca73753b09	Handle RLS dependencies in inlined set-returning functions properly. If an SRF in the FROM clause references a table having row-level security policies, and we inline that SRF into the calling query, we neglected to mark the plan as potentially dependent on which role is executing it. This could lead to later executions in the same session returning or hiding rows that should have been hidden or returned instead. Our thanks to Wolfgang Walther for reporting this problem. Stephen Frost and Tom Lane Security: CVE-2023-2455	2023-05-08 10:12:44 -04:00
Noah Misch	681d9e4621	Replace last PushOverrideSearchPath() call with set_config_option(). The two methods don't cooperate, so set_config_option("search_path", ...) has been ineffective under non-empty overrideStack. This defect enabled an attacker having database-level CREATE privilege to execute arbitrary code as the bootstrap superuser. While that particular attack requires v13+ for the trusted extension attribute, other attacks are feasible in all supported versions. Standardize on the combination of NewGUCNestLevel() and set_config_option("search_path", ...). It is newer than PushOverrideSearchPath(), more-prevalent, and has no known disadvantages. The "override" mechanism remains for now, for compatibility with out-of-tree code. Users should update such code, which likely suffers from the same sort of vulnerability closed here. Back-patch to v11 (all supported versions). Alexander Lakhin. Reported by Alexander Lakhin. Security: CVE-2023-2454	2023-05-08 06:14:07 -07:00
Tom Lane	41e2c52fd6	Add ruleutils support for decompiling MERGE commands. This was overlooked when MERGE was added, but it's essential support for MERGE in new-style SQL functions. Alvaro Herrera Discussion: https://postgr.es/m/3579737.1683293801@sss.pgh.pa.us	2023-05-07 11:01:15 -04:00
Michael Paquier	58f5edf849	Fix typo with wait event for SLRU buffer of commit timestamps This wait event was documented as "CommitTsBuffer" since its introduction, but the code named it "CommitTSBuffer". This commit fixes the code to follow the term documented, which is also more consistent with the naming of the other wait events used for commit timestamps. Introduced by `5da1493`. Author: Alexander Lakhin Discussion: https://postgr.es/m/e8c38840-596a-83d6-bd8d-cebc51111572@gmail.com Backpatch-through: 13	2023-05-05 21:25:44 +09:00
Alvaro Herrera	f75cec4fff	Fix ExecCheckPermissions call in RI_Initial_Check RI_Initial_Check was setting up a list of RTEPermissionInfo for ExecCheckPermissions() wrong, and the problem is subtle enough that it doesn't have any immediate effect in core code. However, if an extension is using the ExecutorCheckPerms_hook, then it would get the wrong parameters and perhaps arrive at a wrong conclusion, or outright malfunction. Fix by constructing that list and the RTE list more honestly. We also add an assertion check to verify that these lists match. This new assertion would have caught this bug. Co-authored-by: Олег Целебровский (Oleg Tselebrovskii) <o.tselebrovskiy@postgrespro.ru> Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/3722b7a2cbe27a1796ee40824bd86dd1@postgrespro.ru	2023-05-04 19:55:56 +02:00
Tom Lane	4c40995f61	In array_position()/array_positions(), beware of empty input array. These functions incautiously fetched the array's first lower bound even when the array is zero-dimensional, thus fetching the word after the allocated array space. While almost always harmless, with very bad luck this could result in SIGSEGV. Fix by adding an early exit for empty input. Per bug #17920 from Alexander Lakhin. Discussion: https://postgr.es/m/17920-f7c228c627b6d02e%40postgresql.org	2023-05-04 11:48:23 -04:00
Alvaro Herrera	5472743d9e	Revert "Move PartitionPruneInfo out of plan nodes into PlannedStmt" This reverts commit `ec38694894` and its fixup `589bb81649`. This change was intended to support query planning avoiding acquisition of locks on partitions that were going to be pruned; however, the overall project took a different direction at [1] and this bit is no longer needed. Put things back the way they were as agreed in [2], to avoid unnecessary complexity. Discussion: [1] https://postgr.es/m/4191508.1674157166@sss.pgh.pa.us Discussion: [2] https://postgr.es/m/20230502175409.kcoirxczpdha26wt@alvherre.pgsql	2023-05-04 12:09:59 +02:00
Amit Kapila	de63f8dade	Fix assertion failure in apply worker. During exit, the logical replication apply worker tries to release session level locks, if any. However, if the apply worker exits due to an error before its connection is initialized, trying to release locks can lead to assertion failure. The locks will be acquired once the worker is initialized, so we don't need to release them till the worker initialization is complete. Reported-by: Alexander Lakhin Author: Hou Zhijie based on inputs from Sawada Masahiko and Amit Kapila Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/2185d65f-5aae-3efa-c48f-fb42b173ef5c@gmail.com	2023-05-03 10:17:49 +05:30
Peter Eisentraut	e0bb5d0c5b	Update SQL features Some updates for SQL:2023 and some new features in PostgreSQL 16.	2023-05-02 10:59:21 +02:00
Michael Paquier	8961cb9a03	Fix typos in comments The changes done in this commit impact comments with no direct user-visible changes, with fixes for incorrect function, variable or structure names. Author: Alexander Lakhin Discussion: https://postgr.es/m/e8c38840-596a-83d6-bd8d-cebc51111572@gmail.com	2023-05-02 12:23:08 +09:00
Michael Paquier	4dadd660f0	Fix crashes with CREATE SCHEMA AUTHORIZATION and schema elements CREATE SCHEMA AUTHORIZATION with appended schema elements can lead to crashes when comparing the schema name of the query with the schemas used in the qualification of some clauses in the elements' queries. The origin of the problem is that the transformation routine for the elements listed in a CREATE SCHEMA query uses as new, expected, schema name the one listed in CreateSchemaStmt itself. However, depending on the query, CreateSchemaStmt.schemaname may be NULL, being computed instead from the role specification of the query given by the AUTHORIZATION clause, that could be either: - A user name string, with the new schema name being set to the same value as the role given. - Guessed from CURRENT_ROLE, SESSION_ROLE or CURRENT_ROLE, with a new schema name computed from the security context where CREATE SCHEMA is running. Regression tests are added for CREATE SCHEMA with some appended elements (some of them with schema qualifications), covering also some role specification patterns. While on it, this simplifies the context structure used during the transformation of the elements listed in a CREATE SCHEMA query by removing the fields for the role specification and the role type. They were not used, and for the role specification this could be confusing as the schema name may by extracted from that at the beginning of CreateSchemaCommand(). This issue exists for a long time, so backpatch down to all the versions supported. Reported-by: Song Hongyu Author: Michael Paquier Reviewed-by: Richard Guo Discussion: https://postgr.es/m/17909-f65c12dfc5f0451d@postgresql.org Backpatch-through: 11	2023-04-28 19:29:12 +09:00
Daniel Gustafsson	4a6603cd46	Fix assertion failure in heap_vacuum_rel Commit `7d71d3dd08` changed resetting the VacuumFailsafeActive flag to an assertion since the flag is reset before starting vacuuming a relation. This however failed to take recursive calls of vacuum_rel() and vacuum of TOAST tables into consideration. Fix by reverting back to resettting the flag. Author: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reported-by: John Naylor <john.naylor@enterprisedb.com> Discussion: https://postgr.es/m/CAFBsxsFz=GqaG5Ens5aNgVYoV2Y+pfMUijX0ku+CCkWfALwiqg@mail.gmail.com	2023-04-28 10:30:05 +02:00
Masahiko Sawada	b72f564d87	Add unit to vacuum_buffer_usage_limit value in postgresql.conf.sample. Also adjust the indentation of the comment to the surrounding parameters. Author: Masahiko Sawada Reviewed-by: David Rowley, Daniel Gustafsson, Melanie Plageman Discussion: https://postgr.es/m/CAD21AoCBSqmqOKVH4Q256DeCC_UE50gu1sgixcjLFZGLEbABVA@mail.gmail.com	2023-04-28 15:40:12 +09:00
Nathan Bossart	b72623671d	Prevent underflow in KeepLogSeg(). The call to XLogGetReplicationSlotMinimumLSN() might return a greater LSN than the one given to the function. Subsequent segment number calculations might then underflow, which could result in unexpected behavior when removing or recyling WAL files. This was introduced with max_slot_wal_keep_size in `c655077639`. To fix, skip the block of code for replication slots if the LSN is greater. Reported-by: Xu Xingwang Author: Kyotaro Horiguchi Reviewed-by: Junwang Zhao Discussion: https://postgr.es/m/17903-4288d439dee856c6%40postgresql.org Backpatch-through: 13	2023-04-27 14:31:17 -07:00
Alexander Korotkov	db93e739ac	Fix wrong construct_array_builtin() call in GUCArrayDelete() The current code unintentionally uses the wrong datum to construct an array. The bug was introduced by `096dd80f3c`, so no backpatching is needed. Reported-by: David Steele Discussion: https://postgr.es/m/d46f9265-ff3c-6743-2278-6772598233c2%40pgmasters.net Author: Nathan Bossart Reviewed-by: David Steele, Tom Lane	2023-04-27 22:06:14 +03:00
Michael Paquier	84cc142674	Re-add tracking of wait event SLRUFlushSync SLRUFlushSync has been accidently removed during `dee663f`, that has moved the flush of the SLRU files to the checkpointer, so add it back. The issue has been noticed by Thomas when checking for orphaned wait events. Author: Thomas Munro Reviewed-by: Bharath Rupireddy Discussion: https://postgr.es/m/CA+hUKGK6tqm59KuF1z+h5Y8fsWcu5v8+84kduSHwRzwjB2aa_A@mail.gmail.com	2023-04-26 07:10:06 +09:00
Daniel Gustafsson	bfac8f8bc4	Fix vacuum_cost_delay check for balance calculation. Commit `1021bd6a89` excluded autovacuum workers from cost-limit balance calculations when per-relation options were set. The code checks for limit and cost_delay being greater than zero, but since cost_delay can be set to -1 the test needs to check for greater than or zero. Backpatch to all supported branches since `1021bd6a89` was backpatched all the way at the time. Author: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CAD21AoBS7o6Ljt_vfqPQPf67AhzKu3fR0iqk8B=vVYczMugKMQ@mail.gmail.com Backpatch-through: v11 (all supported branches)	2023-04-25 13:54:10 +02:00
Michael Paquier	806fad7573	Fix buffer refcount leak with FDW bulk inserts The leak would show up when using batch inserts with foreign tables included in a partition tree, as the slots used in the batch were not reset once processed. In order to fix this problem, some ExecClearTuple() are added to clean up the slots used once a batch is filled and processed, mapping with the number of slots currently in use as tracked by the counter ri_NumSlots. This buffer refcount leak has been introduced in `b676ac4` with the addition of the executor facility to improve bulk inserts for FDWs, so backpatch down to 14. Alexander has provided the patch (slightly modified by me). The test for postgres_fdw comes from me, based on the test case that the author has sent in the report. Author: Alexander Pyhalov Discussion: https://postgr.es/m/b035780a740efd38dc30790c76927255@postgrespro.ru Backpatch-through: 14	2023-04-25 09:42:19 +09:00
Andres Freund	1118cd37eb	Remove vacuum_defer_cleanup_age vacuum_defer_cleanup_age was introduced before hot_standby_feedback and replication slots existed. It is hard to use reasonably - commonly it will either be set too low (not preventing recovery conflicts, while still causing some bloat), or too high (causing a lot of bloat). The alternatives do not have that issue. That on its own might not be sufficient reason to remove vacuum_defer_cleanup_age, but it also complicates computation of xid horizons. See e.g. the bug fixed in `be504a3e97`. It also is untested. This commit removes TransactionIdRetreatSafely(), as there are no users anymore. There might be potential future users, hence noting that here. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20230317230930.nhsgk3qfk7f4axls@awork3.anarazel.de	2023-04-24 12:21:02 -07:00
Tom Lane	fce3b26e97	Rename ExecAggTransReparent, and improve its documentation. The name of this function suggests that it ought to reparent R/W expanded objects to be children of the persistent aggcontext, instead of copying them. In fact it does no such thing, and if you try to make it do so you will see multiple regression failures. Rename it to the less-misleading ExecAggCopyTransValue, and add commentary about why that attractive-sounding optimization won't work. Also adjust comments at call sites, some of which were describing logic that has since been moved into ExecAggCopyTransValue. Discussion: https://postgr.es/m/3004282.1681930251@sss.pgh.pa.us	2023-04-24 13:01:33 -04:00
Peter Eisentraut	5a8366ad75	doc: Update SQL features names Some feature names have been adjusted over time upstream. This pulls in those changes.	2023-04-24 15:39:54 +02:00
Daniel Gustafsson	69537f5d17	Remove duplicate lines of code Commit `6df7a9698b` accidentally included two identical prototypes for default_multirange_selectivi() and commit `086cf1458c` added a break; statement where one was already present, thus duplicating it. While there is no bug caused by this, fix by removing the duplicated lines as they provide no value. Backpatch the fix for duplicate prototypes to v14 and the duplicate break statement fix to all supported branches to avoid backpatching hazards due to the removal. Reported-by: Anton Voloshin <a.voloshin@postgrespro.ru> Discussion: https://postgr.es/m/0e69cb60-0176-f6d0-7e15-6478b7d85724@postgrespro.ru	2023-04-24 11:16:17 +02:00
Masahiko Sawada	781ac42d43	Use elog to report unexpected action in handle_streamed_transaction(). An oversight in commit `216a784829`. Author: Masahiko Sawada Reviewed-by: Kyotaro Horiguchi, Amit Kapila Discussion: https://postgr.es/m/CAD21AoDDbM8_HJt-nMCvcjTK8K9hPzXWqJj7pyaUvR4mm_NrSg@mail.gmail.com	2023-04-24 15:37:14 +09:00
Alexander Korotkov	cd115c3530	Fix custom validators call in build_local_reloptions() We need to call them only when validate == true. Backpatch to 13, where opclass options were introduced. Reported-by: Tom Lane Discussion: https://postgr.es/m/2656633.1681831542%40sss.pgh.pa.us Reviewed-by: Tom Lane, Pavel Borisov Backpatch-through: 13	2023-04-23 13:58:41 +03:00
Jeff Davis	c04c6c5d6f	Avoid character classification in regex escape parsing. For regex escape sequences, just test directly for the relevant ASCII characters rather than using locale-sensitive character classification. This fixes an assertion failure when a locale considers a non-ASCII character, such as "൧", to be a digit. Reported-by: Richard Guo Discussion: https://postgr.es/m/CAMbWs49Q6UoKGeT8pBkMtJGJd+16CBFZaaWUk9Du+2ERE5g_YA@mail.gmail.com Backpatch-through: 11	2023-04-21 08:19:41 -07:00
David Rowley	d91d163529	Fix incorrect function name reference This function was renamed in `0c9d84427` but this comment wasn't updated. Author: Alexander Lakhin Discussion: https://postgr.es/m/699beab4-a6ca-92c9-f152-f559caf6dc25@gmail.com	2023-04-21 10:46:08 +12:00
Michael Paquier	0ecb87e1fa	Remove io prefix from pg_stat_io columns `a9c70b46` added the statistics view pg_stat_io which contained columns "io_context" and "io_object". Given that the columns are in the pg_stat_io view, the "io" prefix is somewhat redundant, so remove it. The code variables referring to these fields are kept unchanged so as they can keep their context about I/O. Bump catalog version. Author: Melanie Plageman Reviewed-by: Kyotaro Horiguchi, Fabrízio de Royes Mello Discussion: https://postgr.es/m/CAAKRu_aAQoJWrvT2BYYQvJChFKra_O-5ra3jhzKJZqWsTR1CPQ@mail.gmail.com	2023-04-21 07:21:50 +09:00
Daniel Gustafsson	a9781ae11b	Fix autovacuum cost debug logging Commit `7d71d3dd0` introduced finer grained updates of autovacuum option changes by increasing the frequency of reading the configuration file. The debug logging of cost parameter was however changed such that some initial values weren't logged. Fix by changing logging to use the old frequency of logging regardless of them changing. Also avoid taking a log for rendering the log message unless the set loglevel is such that the log entry will be emitted. Author: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAD21AoBS7o6Ljt_vfqPQPf67AhzKu3fR0iqk8B=vVYczMugKMQ@mail.gmail.com	2023-04-20 15:45:44 +02:00
Amit Kapila	c1cc4e688b	Restart the apply worker if the 'password_required' option is changed. The apply worker is restarted if any subscription option that affects the remote connection was changed. In commit `c3afe8cf5a`, we added the option 'password_required' which can affect the remote connection, so we should restart the worker if it was changed. Author: Amit Kapila Reviewed-by: Robert Haas Discussion: https://postgr.es/m/CAA4eK1+z9UDFEynXLsWeMMuUZc1iQkRwj2HNDtxUHTPo-u1F4A@mail.gmail.com Discussion: https://postgr.es/m/9DFC88D3-1300-4DE8-ACBC-4CEF84399A53@enterprisedb.com	2023-04-20 08:56:18 +05:30
Thomas Munro	7d3d72b55e	Remove obsolete defense against strxfrm() bugs. Old versions of Solaris and illumos had buffer overrun bugs in their strxfrm() implementations. The bugs were fixed more than a decade ago and the relevant releases are long out of vendor support. It's time to remove the defense added by commit `be8b06c3`. Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA+hUKGJ-ZPJwKHVLbqye92-ZXeLoCHu5wJL6L6HhNP7FkJ=meA@mail.gmail.com	2023-04-20 13:20:14 +12:00
David Rowley	e35ded2956	Fix list_copy_head() with empty Lists list_copy_head() given an empty List would crash from trying to dereference the List to obtain its length. Since NIL is how we represent an empty List, we should just be returning another empty List in this case. list_copy_head() is new to v16, so let's fix it now before too many people start coding around the buggy NIL behavior. Reported-by: Miroslav Bendik Discussion: https://postgr.es/m/CAPoEpV02WhawuWnmnKet6BqU63bEu7oec0pJc=nKMtPsHMzTXQ@mail.gmail.com	2023-04-20 10:34:46 +12:00
Peter Geoghegan	2584639653	Use nbtdesc "level" field name consistently. The "lev" name that appeared in NEWROOT nbtree record desc output was inconsistent with the symbol name from the underlying C struct. It was also inconsistent with nbtdesc output for other nearby record types with similar level fields. Standardize on "level" to make everything consistent. Follow-up to commit `1c453cfd`.	2023-04-19 12:15:15 -07:00
Peter Geoghegan	50547a3fae	Fix wal_consistency_checking enhanced desc output. Recent enhancements to rmgr desc routines that made the output summarize certain block data (added by commits `7d8219a4` and `1c453cfd`) dealt with records that lack relevant block data (and so have nothing to give a more detailed summary of) by testing !DecodedBkpBlock.has_image. As a result, more detailed descriptions of block data were not output when wal_consistency_checking was enabled. This bug affected records with summarizable block data that also happened to have an FPI that the REDO routine isn't supposed to apply (FPIs used for consistency checking purposes only). The presence of such an FPI was incorrectly taken to indicate the absence of block data. To fix, test DecodedBkpBlock.has_data, not !DecodedBkpBlock.has_image. This is the exact condition that we care about, not an inexact proxy. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-Wzm5Sc9cBg1qWV_cEBfLNJCrW9FjS-SoHVt8FLA7Ldn8yg@mail.gmail.com	2023-04-19 10:42:39 -07:00
Peter Eisentraut	77dedeb2c4	Remove some tabs in SQL code in C string literals This is not handled uniformly throughout the code, but at least nearby code can be consistent.	2023-04-19 09:29:43 +02:00
David Rowley	3f58a4e296	Fix various typos and incorrect/outdated name references Author: Alexander Lakhin Discussion: https://postgr.es/m/699beab4-a6ca-92c9-f152-f559caf6dc25@gmail.com	2023-04-19 13:50:33 +12:00
Peter Geoghegan	06e0652750	Remove useless argument from nbtree dedup function. _bt_dedup_pass()'s heapRel argument hasn't been needed or used since commit `cf2acaf4dc` made deleting any existing LP_DEAD index tuples the caller's responsibility.	2023-04-18 10:33:15 -07:00
Robert Haas	363e8f9115	Fix pg_basebackup with in-place tablespaces some more. Commit `c6f2f01611` purported to make this work, but problems remained. In a plain-format backup, the files from an in-place tablespace got included in the tar file for the main tablespace, which is wrong but it's not clear that it has any user-visible consequences. In a tar-format backup, the TABLESPACE_MAP option is used, and so we never iterated over pg_tblspc and thus never backed up the in-place tablespaces anywhere at all. To fix this, reverse the changes in that commit, so that when we scan pg_tblspc during a backup, we create tablespaceinfo objects even for in-place tablespaces. We set the field that would normally contain the absolute pathname to the relative path pg_tblspc/${TSOID}, and that's good enough to make basebackup.c happy without any further changes. However, pg_basebackup needs a couple of adjustments to make it work. First, it needs to understand that a relative path for a tablespace means it's an in-place tablespace. Second, it needs to tolerate the situation where restoring the main tablespace tries to create pg_tblspc or a subdirectory and finds that it already exists, because we restore user-defined tablespaces before the main tablespace. Since in-place tablespaces are only intended for use in development and testing, no back-patch. Patch by me, reviewed by Thomas Munro and Michael Paquier. Discussion: http://postgr.es/m/CA+TgmobwvbEp+fLq2PykMYzizcvuNv0a7gPMJtxOTMOuuRLMHg@mail.gmail.com	2023-04-18 11:23:34 -04:00
David Rowley	eef231e816	Fix some typos and some incorrectly duplicated words Author: Justin Pryzby Reviewed-by: David Rowley Discussion: https://postgr.es/m/ZD3D1QxoccnN8A1V@telsasoft.com	2023-04-18 14:03:49 +12:00
David Rowley	b4dbf3e924	Fix various typos This fixes many spelling mistakes in comments, but a few references to invalid parameter names, function names and option names too in comments and also some in string constants Also, fix an #undef that was undefining the incorrect definition Author: Alexander Lakhin Reviewed-by: Justin Pryzby Discussion: https://postgr.es/m/d5f68d19-c0fc-91a9-118d-7c6a5a3f5fad@gmail.com	2023-04-18 13:23:23 +12:00
Jeff Davis	e39d512f3e	Comment fix for `60684dd834`. Reported-by: Peter Eisentraut Discussion: https://postgr.es/m/766f3799-0269-162f-ba63-4cae34a5534f@enterprisedb.com	2023-04-17 13:45:50 -07:00
Tom Lane	3e383f9b68	Avoid trying to write an empty WAL record in log_newpage_range(). If the last few pages in the specified range are empty (all zero), then log_newpage_range() could try to emit an empty WAL record containing no FPIs. This at least upsets an Assert in ReserveXLogInsertLocation, and might perhaps have bad real-world consequences in non-assert builds. This has been broken since log_newpage_range() was introduced, but the case was hard if not impossible to hit before commit `3d6a98457` decided it was okay to leave VM and FSM pages intentionally zero. Nonetheless, it seems prudent to back-patch. log_newpage_range() was added in v12 but later back-patched, so this affects all supported branches. Matthias van de Meent, per report from Justin Pryzby Discussion: https://postgr.es/m/ZD1daibg4RF50IOj@telsasoft.com	2023-04-17 14:22:26 -04:00
Peter Eisentraut	1c77de9801	doc: Add additional SQL features codes from SQL:2023 These were mysteriously omitted in `c9f57541d9`.	2023-04-17 16:06:41 +02:00
Tom Lane	78d5952dd0	Ensure result of an aggregate's finalfunc is made read-only. The finalfunc might return a read-write expanded object. If we de-duplicate multiple call sites for the aggregate, any function(s) receiving the aggregate result earlier could alter or destroy the value that reaches the ones called later. This is a brown-paper-bag bug in commit `42b746d4c`, because we actually considered the need for read-only-ness but failed to realize that it applied to the case with a finalfunc as well as the case without. Per report from Justin Pryzby. New error in HEAD, no need for back-patch. Discussion: https://postgr.es/m/ZDm5TuKsh3tzoEjz@telsasoft.com	2023-04-16 14:16:40 -04:00
Tom Lane	064eb89e83	Fix assignment to array of domain over composite, redux. Commit `3e310d837` taught isAssignmentIndirectionExpr() to look through CoerceToDomain nodes. That's not sufficient, because since commit `04fe805a1` it's been possible for the planner to simplify CoerceToDomain to RelabelType when the domain has no constraints to enforce. So we need to look through RelabelType too. Per bug #17897 from Alexander Lakhin. Although `3e310d837` was back-patched to v11, it seems sufficient to apply this change to v12 and later, since `04fe805a1` came in in v12. Dmitry Dolgov Discussion: https://postgr.es/m/17897-4216c546c3874044@postgresql.org	2023-04-15 12:01:39 -04:00
David Rowley	414d66220a	Adjust Valgrind macro usage to protect chunk headers Prior to this commit we only ever protected MemoryChunk's requested_size field with Valgrind NOACCESS. This means that if the hdrmask field is ever accessed accidentally then we're not going to get any warnings from Valgrind about it. Valgrind would have warned us about the problem fixed in `92957ed98` had we already been doing this. Per suggestion from Tom Lane Reviewed-by: Richard Guo Discussion: https://postgr.es/m/1650235.1672694719@sss.pgh.pa.us Discussion: https://postgr.es/m/CAApHDvr=FZNGbj252Z6M9BSFKoq6BMxgkQ2yEAGUYoo7RquqZg@mail.gmail.com	2023-04-15 11:59:52 +12:00
Andres Freund	43a33ef54e	Support RBM_ZERO_AND_CLEANUP_LOCK in ExtendBufferedRelTo(), add tests For some reason I had not implemented RBM_ZERO_AND_CLEANUP_LOCK support in ExtendBufferedRelTo(), likely thinking it not being reachable. But it is reachable, e.g. when replaying a WAL record for a page in a relation that subsequently is truncated (likely only reachable when doing crash recovery or PITR, not during ongoing streaming replication). As now all of the RBM_* modes are supported, remove assertions checking mode. As we had no test coverage for this scenario, add a new TAP test. There's plenty more that ought to be tested in this area... Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Reported-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/392271.1681238924%40sss.pgh.pa.us Discussion: https://postgr.es/m/0b5eb82b-cb99-e0a4-b932-3dc60e2e3926@gmail.com	2023-04-14 11:30:33 -07:00
Tom Lane	e4d905f772	NULL is not an ideal way to spell bool "false". Thinko in commit `6633cfb21`, detected by buildfarm member hamerkop.	2023-04-14 13:31:51 -04:00
David Rowley	e0693faf79	Fix incorrect partition pruning logic for boolean partitioned tables The partition pruning logic assumed that "b IS NOT true" was exactly the same as "b IS FALSE". This is not the case when considering NULL values. Fix this so we correctly include any partition which could hold NULL values for the NOT case. Additionally, this fixes a bug in the partition pruning code which handles partitioned tables partitioned like ((NOT boolcol)). This is a seemingly unlikely schema design, and it was untested and also broken. Here we add tests for the ((NOT boolcol)) case and insert some actual data into those tables and verify we do get the correct rows back when running queries. I've also adjusted the existing boolpart tests to include some data and verify we get the correct results too. Both the bugs being fixed here could lead to incorrect query results with fewer rows being returned than expected. No additional rows could have been returned accidentally. In passing, remove needless ternary expression. It's more simple just to pass !is_not_clause to makeBoolConst(). It makes sense to do this so the code is consistent with the bug fix in the "else if" condition just below. David Kimura did submit a patch to fix the first of the issues here, but that's not what's being committed here. Reported-by: David Kimura Reviewed-by: Richard Guo, David Kimura Discussion: https://postgr.es/m/CAHnPFjQ5qxs6J_p+g8=ww7GQvfn71_JE+Tygj0S7RdRci1uwPw@mail.gmail.com Backpatch-through: 11, all supported versions	2023-04-14 16:20:27 +12:00
Thomas Munro	558c9d75fe	Fix PHJ match bit initialization. Hash join tuples reuse the HOT status bit to indicate match status during hash join execution. Correct reuse requires clearing the bit in all tuples. Serial hash join and parallel multi-batch hash join do so upon inserting the tuple into the hashtable. Single batch parallel hash join and batch 0 of unexpected multi-batch hash joins forgot to do this. It hadn't come up before because hashtable tuple match bits are only used for right and full outer joins and parallel ROJ and FOJ were unsupported. `11c2d6fdf5` introduced support for parallel ROJ/FOJ but neglected to ensure the match bits were reset. Author: Melanie Plageman <melanieplageman@gmail.com> Reported-by: Richard Guo <guofenglinux@gmail.com> Discussion: https://postgr.es/m/flat/CAMbWs48Nde1Mv%3DBJv6_vXmRKHMuHZm2Q_g4F6Z3_pn%2B3EV6BGQ%40mail.gmail.com	2023-04-14 11:02:38 +12:00
Michael Paquier	a282697088	Remove code in charge of freeing regexps generation by Lab.c `bea3d7e` has redesigned the regexp engine so as all the allocations go through palloc() with a dedicated memory context. hba.c had to cope with the past memory management logic by going through all the HBA and ident lines generated, then directly free all the regexps found in AuthTokens to ensure that no leaks would happen. Such leaks could happen for example in the postmaster after a SIGHUP, in the event of an HBA and/or ident reload failure where all the new content parsed must be discarded, including all the regexps that may have been compiled. Now that regexps are palloc()'d in their own memory context, MemoryContextDelete() is enough to ensure that all the compiled regexps are properly gone. Simplifying this logic in hba.c has the effect to only remove code. Most of it is new in v16, except the part for regexps compiled in ident entries for the system username, so doing this cleanup now rather than when v17 opens for business will reduce future diffs with the upcoming REL_16_STABLE. Some comments were incorrect since `bea3d7e`, now fixed to reflect the reality. Reviewed-by: Bertrand Drouvot, Álvaro Herrera Discussion: https://postgr.es/m/ZDdJ289Ky2qEj4h+@paquier.xyz	2023-04-14 07:27:44 +09:00
David Rowley	0981846b9c	Remove old GUC name mapping for "force_parallel_mode" This GUC was renamed to debug_parallel_query in `5352ca22e`. That commit added an entry into map_old_guc_names[] to allow the old name still to work. That was done to allow a transition time where the buildfarm configs could be swapped over to use debug_parallel_query instead. That work is now complete. Here we remove the old name with the intention of breaking any user code which is using force_parallel_query. As mentioned in the commit message for `5352ca22e`, it appeared many users were misled into thinking that setting this GUC was doing something useful for them to make queries run more quickly. Discussion: https://postgr.es/m/CAApHDvoR7EOz7Tvyzrd18FO-Dw2Cp4Uyq25TEWguK+XyCJtzOw@mail.gmail.com	2023-04-14 10:19:45 +12:00
Peter Geoghegan	d6f0f95a6b	Harmonize some more function parameter names. Make sure that function declarations use names that exactly match the corresponding names from function definitions in a few places. These inconsistencies were all introduced relatively recently, after the code base had parameter name mismatches fixed in bulk (see commits starting with commits `4274dc22` and `035ce1fe`). pg_bsd_indent still has a couple of similar inconsistencies, which I (pgeoghegan) have left untouched for now. Like all earlier commits that cleaned up function parameter names, this commit was written with help from clang-tidy.	2023-04-13 10:15:20 -07:00
Stephen Frost	f7431bca8b	Explicitly require MIT Kerberos for GSSAPI WHen building with GSSAPI support, explicitly require MIT Kerberos and check for gssapi_ext.h in configure.ac and meson.build. Also add documentation explicitly stating that we now require MIT Kerberos when building with GSSAPI support. Reveiwed by: Johnathan Katz Discussion: https://postgr.es/m/abcc73d0-acf7-6896-e0dc-f5bc12a61bb1@postgresql.org	2023-04-13 08:55:13 -04:00
Stephen Frost	6633cfb216	De-Revert "Add support for Kerberos credential delegation" This reverts commit `3d03b24c3` (Revert Add support for Kerberos credential delegation) which was committed on the grounds of concern about portability, but on further review and discussion, it's clear that we are better off explicitly requiring MIT Kerberos as that appears to be the only GSSAPI library currently that's under proper maintenance and ongoing development. The API used for storing credentials was added to MIT Kerberos over a decade ago while for the other libraries which appear to be mainly based on Heimdal, which exists explicitly to be a re-implementation of MIT Kerberos, the API never made it to a released version (even though it was added to the Heimdal git repo over 5 years ago..). This post-feature-freeze change was approved by the RMT. Discussion: https://postgr.es/m/ZDDO6jaESKaBgej0%40tamriel.snowman.net	2023-04-13 08:55:07 -04:00
Thomas Munro	b37d051b0e	Remove overzealous assertion from PHJ. We can't assert that we're the only process attached to a barrier after BarrierArriveAndDetachExceptLast(). Although that'll be true almost always, a late-starting parallel worker can attach very briefly (that is, immediately detach after checking the phase) right at that moment. BarrierArriveAndDetachExceptLast() already contains an assertion like that, but it holds a spinlock preventing the race. This thinko caused a one-off failure on build farm animal chimaera. Diagnosed-by: Melanie Plageman <melanieplageman@gmail.com> Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/3590249.1680971629@sss.pgh.pa.us	2023-04-13 09:37:54 +12:00
Andres Freund	5ec69b71f1	Improve error messages introduced in `be87200efd` and `0fdab27ad6` Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/20230411.120301.93333867350615278.horikyota.ntt@gmail.com Discussion: https://postgr.es/m/20230412174244.6njadz4uoiez3l74@awork3.anarazel.de	2023-04-12 11:00:37 -07:00
Alvaro Herrera	9ce04b50e1	Revert "Catalog NOT NULL constraints" and fallout This reverts commit `e056c557ae` and minor later fixes thereof. There's a few problems in this new feature -- most notably regarding pg_upgrade behavior, but others as well. This new feature is not in any way critical on its own, so instead of scrambling to fix it we revert it and try again in early 17 with these issues in mind. Discussion: https://postgr.es/m/3801207.1681057430@sss.pgh.pa.us	2023-04-12 19:29:21 +02:00
Tom Lane	88ceac5d77	Fix parallel-safety marking when moving initplans to another node. Our policy since commit `ab77a5a45` has been that a plan node having any initplans is automatically not parallel-safe. (This could be relaxed, but not today.) clean_up_removed_plan_level neglected this, and could attach initplans to a parallel-safe child plan node without clearing the plan's parallel-safe flag. That could lead to "subplan was not initialized" errors at runtime, in case an initplan referenced another one and only the referencing one got transmitted to parallel workers. The fix in clean_up_removed_plan_level is trivial enough. materialize_finished_plan also moves initplans from one node to another, but it's okay because it already copies the source node's parallel_safe flag. The other place that does this kind of thing is standard_planner's hack to inject a top-level Gather when debug_parallel_query is active. But that's actually dead code given that we're correctly enforcing the "initplans aren't parallel safe" rule, so just replace it with an Assert that there are no initplans. Also improve some related comments. Normally we'd add a regression test case for this sort of bug. The mistake itself is already reached by existing tests, but there is accidentally no visible problem. The only known test case that creates an actual failure seems too indirect and fragile to justify keeping it as a regression test (not least because it fails to fail in v11, though the bug is clearly present there too). Per report from Justin Pryzby. Back-patch to all supported branches. Discussion: https://postgr.es/m/ZDVt6MaNWkRDO1LQ@telsasoft.com	2023-04-12 10:46:38 -04:00
Peter Eisentraut	5f38a2034e	Fix incorrect format placeholders	2023-04-12 10:05:50 +02:00
Peter Geoghegan	c03c2eae0a	Refine the guidelines for rmgrdesc authors. Clarify the goals of the recently added guidelines for rmgrdesc authors: to avoid gratuitous inconsistencies across resource managers, and to make it reasonably easy to write a reusable custom parser. Beyond that, the guidelines leave rmgrdesc authors with a significant amount of leeway. This even includes the leeway to invent custom conventions (in cases where it's warranted). Follow-up to commit `7d8219a4`. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkbYuvwYKm-Y-72QEh6SPMQcAo9uONv+mR3bMGcu9E_Cg@mail.gmail.com	2023-04-11 15:26:24 -07:00
Peter Geoghegan	96149a180d	Fix Heap rmgr's desc output for infobits arrays. Make heap desc routines that output status bit as arrays of constants avoid outputting array literals that contain superfluous punctuation characters that complicate parsing the output. Also make sure that no heap desc routine repeats the same key name (at the same nesting level), for the same reason. Arguably, these were both oversights in commit `7d8219a4`. In passing, make the desc output code (which covers Heap's DELETE, UPDATE, HOT_UPDATE, LOCK, and LOCK_UPDATED record types) consistent in terms of the output order of each field. This order also matches WAL record struct order. Heap's DELETE desc output now shows the record's xmax field for the first time (just like UPDATE/HOT_UPDATE records). Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAH2-Wz=pNYtxiJ2Jx5Lj=fKo1OEZ4GE0p_kct+ugAUTqBwU46g@mail.gmail.com	2023-04-11 15:25:02 -07:00
Peter Geoghegan	e944063294	Fix xl_heap_lock WAL record field's data type. Make xl_heap_lock's infobits_set field of type uint8, not int8. Using int8 isn't appropriate given that the field just holds status bits. This fixes an oversight in commit `0ac5ad5134`. In passing rename the nearby TransactionId field to "xmax" to make things consistency with related records, such as xl_heap_lock_updated. Deliberately avoid a bump in XLOG_PAGE_MAGIC. No backpatch, either. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WzkCd3kOS8b7Rfxw7Mh1_6jvX=Nzo-CWR1VBTiOtVZkWHA@mail.gmail.com	2023-04-11 14:07:54 -07:00
David Rowley	4c8a1b4769	Fix uninitialized variable in transformTableLikeClause() process_notnull_constraints should be set to false until we discover a NOT NULL column. Discovered while running Valgrind. Discussion: https://postgr.es/m/CAApHDvoMyiZVi1KW5WVdqMRzWsWkD3F7n6QD+BbAO6WTeAWsUQ@mail.gmail.com	2023-04-11 23:01:12 +12:00
David Rowley	68a2a437f4	Improve ereports for VACUUM's BUFFER_USAGE_LIMIT option There's no need to check if opt->arg is NULL since defGetString() already does that and raises an ERROR if it is. Let's just remove that check. Also, combine the two remaining ERRORs into a single check. It seems better to give an indication about what sort of values we're looking for rather than just to state that the value given isn't valid. Make BUFFER_USAGE_LIMIT uppercase in this ERROR message too. It's already upper case in one other error message, so make that consistent. Reported-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/20230411.102335.1643720544536884844.horikyota.ntt@gmail.com	2023-04-11 19:36:34 +12:00
Peter Geoghegan	26e65ebdb2	Clarify nbtree posting list update desc issue. Per complaint from Melanie Plageman. Follow-up to commit `5d6728e5`. Reported-By: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20230411002315.oyaicmcqrq2hb3ek@liskov	2023-04-10 17:55:23 -07:00
Peter Geoghegan	5d6728e588	Fix nbtree posting list update desc output. We cannot use the generic array_desc approach with per-tuple nbtree posting list update metadata because array_desc can only deal with fixed width elements (e.g., page offset numbers). Using array_desc led to incorrect rmgr descriptions for updates from nbtree DELETE/VACUUM WAL records. To fix, add specialized code to describe the update metadata as array elements in desc output. We now iterate over the update metadata using an approach that matches related REDO routines. Also stop showing the updates offset number array separately in nbtree DELETE/VACUUM desc output. It's redundant information, since the same page offset numbers appear in the description of each individual update element. Also make some small tweaks to the way that we format arrays in all desc routines (not just nbtree desc routines) to make arrays a little less verbose. Oversight in commit `1c453cfd`, which enhanced the nbtree rmgr desc routines. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WzkbYuvwYKm-Y-72QEh6SPMQcAo9uONv+mR3bMGcu9E_Cg@mail.gmail.com	2023-04-10 11:15:41 -07:00
Stephen Frost	3d03b24c35	Revert "Add support for Kerberos credential delegation" This reverts commit `3d4fa227bc`. Per discussion and buildfarm, this depends on APIs that seem to not be available on at least one platform (NetBSD). Should be certainly possible to rework to be optional on that platform if necessary but bit late for that at this point. Discussion: https://postgr.es/m/3286097.1680922218@sss.pgh.pa.us	2023-04-08 07:21:35 -04:00
Thomas Munro	db4f21e4a3	Redesign interrupt/cancel API for regex engine. Previously, a PostgreSQL-specific callback checked by the regex engine had a way to trigger a special error code REG_CANCEL if it detected that the next call to CHECK_FOR_INTERRUPTS() would certainly throw via ereport(). A later proposed bugfix aims to move some complex logic out of signal handlers, so that it won't run until the next CHECK_FOR_INTERRUPTS(), which makes the above design impossible unless we split CHECK_FOR_INTERRUPTS() into two phases, one to run logic and another to ereport(). We may develop such a system in the future, but for the regex code it is no longer necessary. An earlier commit moved regex memory management over to our MemoryContext system. Given that the purpose of the two-phase interrupt checking was to free memory before throwing, something we don't need to worry about anymore, it seems simpler to inject CHECK_FOR_INTERRUPTS() directly into cancelation points, and just let it throw. Since the plan is to keep PostgreSQL-specific concerns separate from the main regex engine code (with a view to bein able to stay in sync with other projects), do this with a new macro INTERRUPT(), customizable in regcustom.h and defaulting to nothing. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com	2023-04-08 22:10:39 +12:00
Thomas Munro	4f51429dd7	Update tsearch regex memory management. Now that our regex engine uses palloc(), it's not necessary to set up a special memory context callback to free compiled regexes. The regex has no resources other than the memory that is already going to be freed in bulk. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com	2023-04-08 22:09:17 +12:00
Thomas Munro	bea3d7e383	Use MemoryContext API for regex memory management. Previously, regex_t objects' memory was managed with malloc() and free() directly. Switch to palloc()-based memory management instead. Advantages: * memory used by cached regexes is now visible with MemoryContext observability tools * cleanup can be done automatically in certain failure modes (something that later commits will take advantage of) * cleanup can be done in bulk On the downside, there may be more fragmentation (wasted memory) due to per-regex MemoryContext objects. This is a problem shared with other cached objects in PostgreSQL and can probably be improved with later tuning. Thanks to Noah Misch for suggesting this general approach, which unblocks later work on interrupts. Suggested-by: Noah Misch <noah@leadboat.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com	2023-04-08 22:08:41 +12:00
Andres Freund	0fdab27ad6	Allow logical decoding on standbys Unsurprisingly, this requires wal_level = logical to be set on the primary and standby. The infrastructure added in `26669757b6` ensures that slots are invalidated if the primary's wal_level is lowered. Creating a slot on a standby waits for a xl_running_xact record to be processed. If the primary is idle (and thus not emitting xl_running_xact records), that can take a while. To make that faster, this commit also introduces the pg_log_standby_snapshot() function. By executing it on the primary, completion of slot creation on the standby can be accelerated. Note that logical decoding on a standby does not itself enforce that required catalog rows are not removed. The user has to use physical replication slots + hot_standby_feedback or other measures to prevent that. If catalog rows required for a slot are removed, the slot is invalidated. See `6af1793954` for an overall design of logical decoding on a standby. Bumps catversion, for the addition of the pg_log_standby_snapshot() function. Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Author: Andres Freund <andres@anarazel.de> (in an older version) Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version) Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: FabrÌzio de Royes Mello <fabriziomello@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com>	2023-04-08 02:20:05 -07:00
Andres Freund	e101dfac3a	For cascading replication, wake physical and logical walsenders separately Physical walsenders can't send data until it's been flushed; logical walsenders can't decode and send data until it's been applied. On the standby, the WAL is flushed first, which will only wake up physical walsenders; and then applied, which will only wake up logical walsenders. Previously, all walsenders were awakened when the WAL was flushed. That was fine for logical walsenders on the primary; but on the standby the flushed WAL would have been not applied yet, so logical walsenders were awakened too early. Per idea from Jeff Davis and Amit Kapila. Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-By: Jeff Davis <pgsql@j-davis.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAA4eK1+zO5LUeisabX10c81LU-fWMKO4M9Wyg1cdkbW7Hqh6vQ@mail.gmail.com	2023-04-08 01:06:00 -07:00
Andres Freund	26669757b6	Handle logical slot conflicts on standby During WAL replay on the standby, when a conflict with a logical slot is identified, invalidate such slots. There are two sources of conflicts: 1) Using the information added in `6af1793954`, logical slots are invalidated if required rows are removed 2) wal_level on the primary server is reduced to below logical Uses the infrastructure introduced in the prior commit. FIXME: add commit reference. Change InvalidatePossiblyObsoleteSlot() to use a recovery conflict to interrupt use of a slot, if called in the startup process. The new recovery conflict is added to pg_stat_database_conflicts, as confl_active_logicalslot. See `6af1793954` for an overall design of logical decoding on a standby. Bumps catversion for the addition of the pg_stat_database_conflicts column. Bumps PGSTAT_FILE_FORMAT_ID for the same reason. Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Author: Andres Freund <andres@anarazel.de> Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version) Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de	2023-04-08 00:05:44 -07:00
Andres Freund	be87200efd	Support invalidating replication slots due to horizon and wal_level Needed for logical decoding on a standby. Slots need to be invalidated because of the horizon if rows required for logical decoding are removed. If the primary's wal_level is lowered from 'logical', logical slots on the standby need to be invalidated. The new invalidation methods will be used in a subsequent commit. Logical slots that have been invalidated can be identified via the new pg_replication_slots.conflicting column. See `6af1793954` for an overall design of logical decoding on a standby. Bumps catversion for the addition of the new pg_replication_slots column. Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Author: Andres Freund <andres@anarazel.de> Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version) Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de	2023-04-07 22:40:27 -07:00
Andres Freund	4397abd0a2	Prevent use of invalidated logical slot in CreateDecodingContext() Previously we had checks for this in multiple places. Support for logical decoding on standbys will add other forms of invalidation, making it worth while to centralize the checks. This slightly changes the error message for both the walsender and SQL interface. Particularly the SQL interface error was inaccurate, as the "This slot has never previously reserved WAL" portion was unreachable. Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de	2023-04-07 22:19:05 -07:00
Andres Freund	15f8203a59	Replace replication slot's invalidated_at LSN with an enum This is mainly useful because the upcoming logical-decoding-on-standby feature adds further reasons for invalidating slots, and we don't want to end up with multiple invalidated_* fields, or check different attributes. Eventually we should consider not resetting restart_lsn when invalidating a slot due to max_slot_wal_keep_size. But that's a user visible change, so left for later. Increases SLOT_VERSION, due to the changed field (with a different alignment, no less). Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de	2023-04-07 21:47:25 -07:00
Thomas Munro	d4e71df6d7	Add io_direct setting (developer-only). Provide a way to ask the kernel to use O_DIRECT (or local equivalent) where available for data and WAL files, to avoid or minimize kernel caching. This hurts performance currently and is not intended for end users yet. Later proposed work would introduce our own I/O clustering, read-ahead, etc to replace the facilities the kernel disables with this option. The only user-visible change, if the developer-only GUC is not used, is that this commit also removes the obscure logic that would activate O_DIRECT for the WAL when wal_sync_method=open_[data]sync and wal_level=minimal (which also requires max_wal_senders=0). Those are non-default and unlikely settings, and this behavior wasn't (correctly) documented. The same effect can be achieved with io_direct=wal. Author: Thomas Munro <thomas.munro@gmail.com> Author: Andres Freund <andres@anarazel.de> Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGK1X532hYqJ_MzFWt0n1zt8trz980D79WbjwnT-yYLZpg%40mail.gmail.com	2023-04-08 16:35:07 +12:00
Thomas Munro	faeedbcefd	Introduce PG_IO_ALIGN_SIZE and align all I/O buffers. In order to have the option to use O_DIRECT/FILE_FLAG_NO_BUFFERING in a later commit, we need the addresses of user space buffers to be well aligned. The exact requirements vary by OS and file system (typically sectors and/or memory pages). The address alignment size is set to 4096, which is enough for currently known systems: it matches modern sectors and common memory page size. There is no standard governing O_DIRECT's requirements so we might eventually have to reconsider this with more information from the field or future systems. Aligning I/O buffers on memory pages is also known to improve regular buffered I/O performance. Three classes of I/O buffers for regular data pages are adjusted: (1) Heap buffers are now allocated with the new palloc_aligned() or MemoryContextAllocAligned() functions introduced by commit `439f6175`. (2) Stack buffers now use a new struct PGIOAlignedBlock to respect PG_IO_ALIGN_SIZE, if possible with this compiler. (3) The buffer pool is also aligned in shared memory. WAL buffers were already aligned on XLOG_BLCKSZ. It's possible for XLOG_BLCKSZ to be configured smaller than PG_IO_ALIGNED_SIZE and thus for O_DIRECT WAL writes to fail to be well aligned, but that's a pre-existing condition and will be addressed by a later commit. BufFiles are not yet addressed (there's no current plan to use O_DIRECT for those, but they could potentially get some incidental speedup even in plain buffered I/O operations through better alignment). If we can't align stack objects suitably using the compiler extensions we know about, we disable the use of O_DIRECT by setting PG_O_DIRECT to 0. This avoids the need to consider systems that have O_DIRECT but can't align stack objects the way we want; such systems could in theory be supported with more work but we don't currently know of any such machines, so it's easier to pretend there is no O_DIRECT support instead. That's an existing and tested class of system. Add assertions that all buffers passed into smgrread(), smgrwrite() and smgrextend() are correctly aligned, unless PG_O_DIRECT is 0 (= stack alignment tricks may be unavailable) or the block size has been set too small to allow arrays of buffers to be all aligned. Author: Thomas Munro <thomas.munro@gmail.com> Author: Andres Freund <andres@anarazel.de> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/CA+hUKGK1X532hYqJ_MzFWt0n1zt8trz980D79WbjwnT-yYLZpg@mail.gmail.com	2023-04-08 16:34:50 +12:00
Stephen Frost	3d4fa227bc	Add support for Kerberos credential delegation Support GSSAPI/Kerberos credentials being delegated to the server by a client. With this, a user authenticating to PostgreSQL using Kerberos (GSSAPI) credentials can choose to delegate their credentials to the PostgreSQL server (which can choose to accept them, or not), allowing the server to then use those delegated credentials to connect to another service, such as with postgres_fdw or dblink or theoretically any other service which is able to be authenticated using Kerberos. Both postgres_fdw and dblink are changed to allow non-superuser password-less connections but only when GSSAPI credentials have been delegated to the server by the client and GSSAPI is used to authenticate to the remote system. Authors: Stephen Frost, Peifeng Qiu Reviewed-By: David Christensen Discussion: https://postgr.es/m/CO1PR05MB8023CC2CB575E0FAAD7DF4F8A8E29@CO1PR05MB8023.namprd05.prod.outlook.com	2023-04-07 21:58:04 -04:00
Andres Freund	ac8d53dae5	Track IO times in pg_stat_io `a9c70b46db` and 8aaa04b32S added counting of IO operations to a new view, pg_stat_io. Now, add IO timing for reads, writes, extends, and fsyncs to pg_stat_io as well. This combines the tracking for pgBufferUsage with the tracking for pg_stat_io into a new function pgstat_count_io_op_time(). This should make it a bit easier to avoid the somewhat costly instr_time conversion done for pgBufferUsage. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_ay5iKmnbXZ3DsauViF3eMxu4m1oNnJXqV_HyqYeg55Ww%40mail.gmail.com	2023-04-07 17:04:56 -07:00
Peter Geoghegan	1c453cfd89	Show more detail in nbtree rmgr descriptions. Show a detailed description of the page offset number arrays that appear in certain nbtree WAL records. Also brings nbtree desc routines in line with the guidelines established by recent commit `7d8219a4`. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-By: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/flat/20230109215842.fktuhesvayno6o4g%40awork3.anarazel.de	2023-04-07 16:46:23 -07:00
Peter Geoghegan	7d8219a444	Show more detail in heapam rmgr descriptions. Add helper functions that output arrays in a standard format, and use the functions inside heapdesc routines. This allows tools like pg_walinspect to show a detailed description of the page offset number arrays for records like PRUNE and VACUUM (unless there was an FPI). Also document the conventions that desc routines should follow. Only the heapdesc routines follow the conventions for now, so they're just guidelines for the time being. Based on a suggestion from Andres Freund. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-By: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/flat/20230109215842.fktuhesvayno6o4g%40awork3.anarazel.de	2023-04-07 16:08:52 -07:00
Andres Freund	704261ecc6	Improve IO accounting for temp relation writes Both pgstat_database and pgBufferUsage count IO timing for reads of temporary relation blocks into local buffers. However, both failed to count write IO timing for flushes of dirty local buffers. Fix. Additionally, FlushRelationBuffers() seems to have omitted counting write IO (both count and timing) stats for both pgstat_database and pgBufferUsage. Fix. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20230321023451.7rzy4kjj2iktrg2r%40awork3.anarazel.de	2023-04-07 13:24:26 -07:00
Alvaro Herrera	e056c557ae	Catalog NOT NULL constraints We now create pg_constaint rows for NOT NULL constraints with contype='n'. We propagate these constraints during operations such as adding inheritance relationships, creating and attaching partitions, creating tables LIKE other tables. We mostly follow the well-known rules of conislocal and coninhcount that we have for CHECK constraints, with some adaptations; for example, as opposed to CHECK constraints, we don't match NOT NULL ones by name when descending a hierarchy to alter it; instead we match by column number. This means we don't require the constraint names to be identical across a hierarchy. For now, we omit them from system catalogs. Maybe this is worth reconsidering. We don't support NOT VALID nor DEFERRABLE clauses either; these can be added as separate features later (this patch is already large and complicated enough.) This has been very long in the making. The first patch was written by Bernd Helmle in 2010 to add a new pg_constraint.contype value ('n'), which I (Álvaro) then hijacked in 2011 and 2012, until that one was killed by the realization that we ought to use contype='c' instead: manufactured CHECK constraints. However, later SQL standard development, as well as nonobvious emergent properties of that design (mostly, failure to distinguish them from "normal" CHECK constraints as well as the performance implication of having to test the CHECK expression) led us to reconsider this choice, so now the current implementation uses contype='n' again. In 2016 Vitaly Burovoy also worked on this feature[1] but found no consensus for his proposed approach, which was claimed to be closer to the letter of the standard, requiring additional pg_attribute columns to track the OID of the NOT NULL constraint for that column. [1] https://postgr.es/m/CAKOSWNkN6HSyatuys8xZxzRCR-KL1OkHS5-b9qd9bf1Rad3PLA@mail.gmail.com Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Author: Bernd Helmle <mailings@oopsware.de> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Discussion: https://postgr.es/m/CACA0E642A0267EDA387AF2B%40%5B172.26.14.62%5D Discussion: https://postgr.es/m/AANLkTinLXMOEMz+0J29tf1POokKi4XDkWJ6-DDR9BKgU@mail.gmail.com Discussion: https://postgr.es/m/20110707213401.GA27098@alvh.no-ip.org Discussion: https://postgr.es/m/1343682669-sup-2532@alvh.no-ip.org Discussion: https://postgr.es/m/CAKOSWNkN6HSyatuys8xZxzRCR-KL1OkHS5-b9qd9bf1Rad3PLA@mail.gmail.com Discussion: https://postgr.es/m/20220817181249.q7qvj3okywctra3c@alvherre.pgsql	2023-04-07 19:59:57 +02:00
Tom Lane	ff245a3788	Doc: improve descriptions of max_[pred_]locks_per_transaction GUCs. The old wording described these as being multiplied by max_connections plus max_prepared_transactions, which hasn't been exactly right for some time thanks to the addition of various auxiliary processes. Moreover, exactness here is a bit pointless given that the lock tables can expand into the initially-unallocated "slop" space in shared memory. Rather than trying to track exactly what the code is doing, let's just use the term "server processes". Likewise adjust these GUCs' description strings in guc_tables.c. Wang Wei, reviewed by Nathan Bossart and myself Discussion: https://postgr.es/m/OS3PR01MB6275BDD09C9B875C65FCC5AB9EA39@OS3PR01MB6275.jpnprd01.prod.outlook.com	2023-04-07 13:29:29 -04:00
Tom Lane	888f2ea0a8	Add array_sample() and array_shuffle() functions. These are useful in Monte Carlo applications. Martin Kalcher, reviewed/adjusted by Daniel Gustafsson and myself Discussion: https://postgr.es/m/9d160a44-7675-51e8-60cf-6d64b76db831@aboutsource.net	2023-04-07 11:47:07 -04:00
Andres Freund	21d7c05a5c	Fix copy-paste bug in `12f3867f55` triggering an assert after a write error The same condition accidentally was copied to both branches. Manual testing confirms that otherwise the error recovery path works fine. Found while reviewing the logical-decoding-on-standby patch.	2023-04-07 01:02:46 -07:00
Michael Paquier	8fcb32db98	Add more protections in WAL record APIs against overflows This commit adds a limit to the size of an XLogRecord at 1020MB, based on a suggestion by Heikki Linnakangas. This counts for the overhead needed by the XLogReader when allocating the memory it needs to read a record in DecodeXLogRecordRequiredSpace(), based on the record size. An assertion based on that is added to detect that any additions in the XLogReader facilities would not cause any overflows. If that's ever the case, the upper bound allowed would need to be adjusted. Before this, it was possible for an external module to create WAL records large enough to be assembled but not replayable, causing failures when replaying such WAL records on standbys. One case mentioned where this is possible is the in-core function pg_logical_emit_message() (wrapper for LogLogicalMessage), that allows to emit WAL records with an arbitrary amount of data potentially higher than the replay limit of approximately 1GB (limit of a palloc, minus the overhead needed by a XLogReader). This commit is a follow-up of `ffd1b6b` that has added similar protections for the block-level data. Here, the checks are extended to the whole record length, mainrdata_len being extended from uint32 to uint64 with the routines registering buffer and record data still limited to uint32 to minimize the checks when assembling a record. All the error messages related to overflow checks are improved to provide more context about the error happening. Author: Matthias van de Meent Reviewed-by: Andres Freund, Heikki Linnakangas, Michael Paquier Discussion: https://postgr.es/m/CAEze2WgGiw+LZt+vHf8tWqB_6VxeLsMeoAuod0N=ij1q17n5pw@mail.gmail.com	2023-04-07 10:10:17 +09:00
Andres Freund	26158b852d	Use ExtendBufferedRelTo() in XLogReadBufferExtended() Instead of extending the relation block-by-block, use ExtendBufferedRelTo(), introduced in `31966b151e`. This is faster and simpler. This also somewhat reduces the danger that disconnected segments pose (which can be "discovered" once the previous segment reaches SEGSIZE), as ExtendBufferedRelTo() won't extend past the block it has been asked. However, the risk of the content of such a disconnected segment being invalid remains. Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de Discussion: https://postgr.es/m/20230223010147.32oir7sb66slqnjk@awork3.anarazel.de	2023-04-06 17:56:17 -07:00
David Rowley	ae78cae3be	Add --buffer-usage-limit option to vacuumdb `1cbbee033` added BUFFER_USAGE_LIMIT to the VACUUM and ANALYZE commands, so here we permit that option to be specified in vacuumdb. In passing, adjust the documents for vacuum_buffer_usage_limit and the BUFFER_USAGE_LIMIT VACUUM option to mention "kB" rather than "KB". Do the same for the ERROR message in ExecVacuum() and check_vacuum_buffer_usage_limit(). Without that we might tell a user that the valid minimum value is 128 KB only to reject that because we accept only "kB" and not "KB". Also, add a small reminder comment in vacuum.h to try to trigger the memory of anyone adding new fields to VacuumParams that they might want to consider if vacuumdb needs to grow a new option too. Author: Melanie Plageman Reviewed-by: Justin Pryzby Reviewed-by: David Rowley Discussion: https://postgr.es/m/ZAzTg3iEnubscvbf@telsasoft.com	2023-04-07 12:47:10 +12:00
Andres Freund	00d1e02be2	hio: Use ExtendBufferedRelBy() to extend tables more efficiently While we already had some form of bulk extension for relations, it was fairly limited. It only amortized the cost of acquiring the extension lock, the relation itself was still extended one-by-one. Bulk extension was also solely triggered by contention, not by the amount of data inserted. To address this, use ExtendBufferedRelBy(), introduced in `31966b151e`, to extend the relation. We try to extend the relation by multiple blocks in two situations: 1) The caller tells RelationGetBufferForTuple() that it will need multiple pages. For now that's only used by heap_multi_insert(), see commit FIXME. 2) If there is contention on the extension lock, use the number of waiters for the lock as a multiplier for the number of blocks to extend by. This is similar to what we already did. Previously we additionally multiplied the numbers of waiters by 20, but with the new relation extension infrastructure I could not see a benefit in doing so. Using the freespacemap to provide empty pages can cause significant contention, and adds measurable overhead, even if there is no contention. To reduce that, remember the blocks the relation was extended by in the BulkInsertState, in the extending backend. In case 1) from above, the blocks the extending backend needs are not entered into the FSM, as we know that we will need those blocks. One complication with using the FSM to record empty pages, is that we need to insert blocks into the FSM, when we already hold a buffer content lock. To avoid doing IO while holding a content lock, release the content lock before recording free space. Currently that opens a small window in which another backend could fill the block, if a concurrent VACUUM records the free space. If that happens, we retry, similar to the already existing case when otherBuffer is provided. In the future it might be worth closing the race by preventing VACUUM from recording the space in newly extended pages. This change provides very significant wins (3x at 16 clients, on my workstation) for concurrent COPY into a single relation. Even single threaded COPY is measurably faster, primarily due to not dirtying pages while extending, if supported by the operating system (see commit `4d330a61bb`). Even single-row INSERTs benefit, although to a much smaller degree, as the relation extension lock rarely is the primary bottleneck. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-06 16:53:17 -07:00
David Rowley	1cbbee0338	Add VACUUM/ANALYZE BUFFER_USAGE_LIMIT option Add new options to the VACUUM and ANALYZE commands called BUFFER_USAGE_LIMIT to allow users more control over how large to make the buffer access strategy that is used to limit the usage of buffers in shared buffers. Larger rings can allow VACUUM to run more quickly but have the drawback of VACUUM possibly evicting more buffers from shared buffers that might be useful for other queries running on the database. Here we also add a new GUC named vacuum_buffer_usage_limit which controls how large to make the access strategy when it's not specified in the VACUUM/ANALYZE command. This defaults to 256KB, which is the same size as the access strategy was prior to this change. This setting also controls how large to make the buffer access strategy for autovacuum. Per idea by Andres Freund. Author: Melanie Plageman Reviewed-by: David Rowley Reviewed-by: Andres Freund Reviewed-by: Justin Pryzby Reviewed-by: Bharath Rupireddy Discussion: https://postgr.es/m/20230111182720.ejifsclfwymw2reb@awork3.anarazel.de	2023-04-07 11:40:31 +12:00
Andres Freund	5279e9db8e	heapam: Pass number of required pages to RelationGetBufferForTuple() A future commit will use this information to determine how aggressively to extend the relation by. In heap_multi_insert() we know accurately how many pages we need once we need to extend the relation, providing an accurate lower bound for how much to extend. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-06 16:17:16 -07:00
Daniel Gustafsson	7d71d3dd08	Refresh cost-based delay params more frequently in autovacuum Allow autovacuum to reload the config file more often so that cost-based delay parameters can take effect while VACUUMing a relation. Previously, autovacuum workers only reloaded the config file once per relation vacuumed, so config changes could not take effect until beginning to vacuum the next table. Now, check if a reload is pending roughly once per block, when checking if we need to delay. In order for autovacuum workers to safely update their own cost delay and cost limit parameters without impacting performance, we had to rethink when and how these values were accessed. Previously, an autovacuum worker's wi_cost_limit was set only at the beginning of vacuuming a table, after reloading the config file. Therefore, at the time that autovac_balance_cost() was called, workers vacuuming tables with no cost-related storage parameters could still have different values for their wi_cost_limit_base and wi_cost_delay. Now that the cost parameters can be updated while vacuuming a table, workers will (within some margin of error) have no reason to have different values for cost limit and cost delay (in the absence of cost-related storage parameters). This removes the rationale for keeping cost limit and cost delay in shared memory. Balancing the cost limit requires only the number of active autovacuum workers vacuuming a table with no cost-based storage parameters. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ZngzqnEODc7LmS1NH04Kt6Y9huSjz5pp7%2BDXhrjDA0gw%40mail.gmail.com	2023-04-07 01:00:21 +02:00
Daniel Gustafsson	a85c60a945	Separate vacuum cost variables from GUCs Vacuum code run both by autovacuum workers and a backend doing VACUUM/ANALYZE previously inspected VacuumCostLimit and VacuumCostDelay, which are the global variables backing the GUCs vacuum_cost_limit and vacuum_cost_delay. Autovacuum workers needed to override these variables with their own values, derived from autovacuum_vacuum_cost_limit and autovacuum_vacuum_cost_delay and worker cost limit balancing logic. This led to confusing code which, in some cases, both derived and set a new value of VacuumCostLimit from VacuumCostLimit. In preparation for refreshing these GUC values more often, introduce new, independent global variables and add a function to update them using the GUCs and existing logic. Per suggestion by Kyotaro Horiguchi Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ZngzqnEODc7LmS1NH04Kt6Y9huSjz5pp7%2BDXhrjDA0gw%40mail.gmail.com	2023-04-07 00:54:53 +02:00
Daniel Gustafsson	71a825194f	Make vacuum failsafe_active globally visible While vacuuming a table in failsafe mode, VacuumCostActive should not be re-enabled. This currently isn't a problem because vacuum cost parameters are only refreshed in between vacuuming tables and failsafe status is reset for every table. In preparation for allowing vacuum cost parameters to be updated more frequently, elevate LVRelState->failsafe_active to a global, VacuumFailsafeActive, which will be checked when determining whether or not to re-enable vacuum cost-related delays. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ZngzqnEODc7LmS1NH04Kt6Y9huSjz5pp7%2BDXhrjDA0gw%40mail.gmail.com	2023-04-07 00:54:08 +02:00
Tom Lane	029dea882a	Fix ts_headline() edge cases for empty query and empty search text. tsquery's GETQUERY() macro is only safe to apply to a tsquery that is known non-empty; otherwise it gives a pointer to garbage. Before commit `5a617d75d`, ts_headline() avoided this pitfall, but only in a very indirect, nonobvious way. (hlCover could not reach its TS_execute call, because if the query contains no lexemes then hlFirstIndex would surely return -1.) After that commit, it fell into the trap, resulting in weird errors such as "unrecognized operator" and/or valgrind complaints. In HEAD, fix this by not calling TS_execute_locations() at all for an empty query. In the back branches, add a defensive check to hlCover() --- that's not fixing any live bug, but I judge the code a bit too fragile as-is. Also, both mark_hl_fragments() and mark_hl_words() were careless about the possibility of empty search text: in the cases where no match has been found, they'd end up telling mark_fragment() to mark from word indexes 0 to 0 inclusive, even when there is no word 0. This is harmless since we over-allocated the prs->words array, but it does annoy valgrind. Fix so that the end index is -1 and thus mark_fragment() will do nothing in such cases. Bottom line is that this fixes a live bug in HEAD, but in the back branches it's only getting rid of a valgrind nitpick. Back-patch anyway. Per report from Alexander Lakhin. Discussion: https://postgr.es/m/c27f642d-020b-01ff-ae61-086af287c4fd@gmail.com	2023-04-06 15:52:44 -04:00
Andres Freund	18103b7c5f	hio: Don't pin the VM while holding buffer lock while extending Starting with commit `7db0cd2145`, RelationGetBufferForTuple() did a visibilitymap_pin() while holding an exclusive buffer content lock on a newly extended page, when using COPY FREEZE. We elsewhere try hard to avoid to doing IO while holding a content lock. And until `14f98e0af9`, that happened while holding the relation extension lock. Practically, this isn't a huge issue, because COPY FREEZE is restricted to relations created or truncated in the current session, so it's unlikely there's a lot of contention. We can't avoid doing IO while holding the content lock by pinning the VM earlier, because we don't know which page it will be on. While we could just ignore the issue in this case, a future commit will add bulk relation extension, which needs to enter pages into the FSM while also trying to hold onto a buffer lock. To address this issue, use visibilitymap_pin_ok() to see if the relevant buffer is already pinned. If not, release the buffer, pin the VM buffer, and acquire the lock again. This opens up a small window for other backends to insert data onto the page - as the page is not entered into the freespacemap, other backends won't see it normally, but a concurrent vacuum could enter the page, if it started just after the relation is extended. In case the page is used by another backend, retry. This is very similar to how locking "otherBuffer" is already dealt with. Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: http://postgr.es/m/20230325025740.wzvchp2kromw4zqz@awork3.anarazel.de	2023-04-06 11:11:13 -07:00
Andres Freund	bba9003b62	hio: Relax rules for calling GetVisibilityMapPins() GetVisibilityMapPins() insisted on the buffer1/buffer2 being in a specific order. This required checks at the callsite. As a subsequent patch will add another callsite, move related logic into GetVisibilityMapPins(). Discussion: https://postgr.es/m/20230403190030.fk2frxv6faklrseb@awork3.anarazel.de	2023-04-06 10:36:30 -07:00
Tomas Vondra	2820adf775	Support long distance matching for zstd compression zstd compression supports a special mode for finding matched in distant past, which may result in better compression ratio, at the expense of using more memory (the window size is 128MB). To enable this optional mode, use the "long" keyword when specifying the compression method (--compress=zstd:long). Author: Justin Pryzby Reviewed-by: Tomas Vondra, Jacob Champion Discussion: https://postgr.es/m/20230224191840.GD1653@telsasoft.com Discussion: https://postgr.es/m/20220327205020.GM28503@telsasoft.com	2023-04-06 17:18:42 +02:00
David Rowley	b9b125b9c1	Move various prechecks from vacuum() into ExecVacuum() vacuum() is used for both the VACUUM command and for autovacuum. There were many prechecks being done inside vacuum() that were just not relevant to autovacuum. Let's move the bulk of these into ExecVacuum() so that they're only executed when running the VACUUM command. This removes a small amount of overhead when autovacuum vacuums a table. While we are at it, allocate VACUUM's BufferAccessStrategy in ExecVacuum() and pass it into vacuum() instead of expecting vacuum() to make it if it's not already made by the calling function. To make this work, we need to create the vacuum memory context slightly earlier, so we now need to pass that down to vacuum() so that it's available for use in other memory allocations. Author: Melanie Plageman Reviewed-by: David Rowley Discussion: https://postgr.es/m/20230405211534.4skgskbilnxqrmxg@awork3.anarazel.de	2023-04-06 15:44:52 +12:00
Andres Freund	acab1b0914	Convert many uses of ReadBuffer[Extended](P_NEW) to ExtendBufferedRel() A few places are not converted. Some because they are tackled in later commits (e.g. hio.c, xlogutils.c), some because they are more complicated (e.g. brin_pageops.c). Having a few users of ReadBuffer(P_NEW) is good anyway, to ensure the backward compat path stays working. Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-05 18:57:29 -07:00
Andres Freund	fcdda1e4b5	Use ExtendBufferedRelTo() in {vm,fsm}_extend() This uses ExtendBufferedRelTo(), introduced in `31966b151e`, to extend the visibilitymap and freespacemap to the size needed. It also happens to fix a warning introduced in `3d6a98457d`, reported by Tom Lane. Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de Discussion: https://postgr.es/m/2194723.1680736788@sss.pgh.pa.us	2023-04-05 17:50:09 -07:00
David Rowley	bccd6908ca	Always make a BufferAccessStrategy for ANALYZE `32fbe0239` changed things so we didn't bother allocating the BufferAccessStrategy during VACUUM (ONLY_DATABASE_STATS); and VACUUM (FULL), however, it forgot to consider that VACUUM (FULL, ANALYZE) is a possible combination. That change would have resulted in such a command allowing ANALYZE to make full use of shared buffers, which wasn't intended, so fix that. Reported-by: Melanie Plageman Discussion: https://postgr.es/m/CAAKRu_bJRKe+v_=OqwC+5sA3j5qv8rqdAwy3+yHaO3wmtfrCRg@mail.gmail.com	2023-04-06 12:37:03 +12:00
Michael Paquier	1d477a907e	Fix row tracking in pg_stat_statements with extended query protocol pg_stat_statements relies on EState->es_processed to count the number of rows processed by ExecutorRun(). This proves to be a problem under the extended query protocol when the result of a query is fetched through more than one call of ExecutorRun(), as es_processed is reset each time ExecutorRun() is called. This causes pg_stat_statements to report the number of rows calculated in the last execute fetch, rather than the global sum of all the rows processed. As pquery.c tells, this is a problem when a portal does not use holdStore. For example, DMLs with RETURNING would report a correct tuple count as these do one execution cycle when the query is first executed to fill in the portal's store with one ExecutorRun(), feeding on the portal's store for each follow-up execute fetch depending on the fetch size requested by the client. The fix proposed for this issue is simple with the addition of an extra counter in EState that's preserved across multiple ExecutorRun() calls, incremented with the value calculated in es_processed. This approach is not back-patchable, unfortunately. Note that libpq does not currently give any way to control the fetch size when using the extended v3 protocol, meaning that in-core testing is not possible yet. This issue can be easily verified with the JDBC driver, though, with autocommit disabled. Hence, having in-core tests requires more features, left for future discussion: - At least two new libpq routines splitting PQsendQueryGuts(), one for the bind/describe and a second for a series of execute fetches with a custom fetch size, likely in a fashion similar to what JDBC does. - A psql meta-command for the execute phase. This part is not strictly mandatory, still it could be handy. Reported-by: Andrew Dunstan (original discovery by Simon Siggs) Author: Sami Imseih Reviewed-by: Tom Lane, Michael Paquier Discussion: https://postgr.es/m/EBE6C507-9EB6-4142-9E4D-38B1673363A7@amazon.com Discussion: https://postgr.es/m/c90890e7-9c89-c34f-d3c5-d5c763a34bd8@dunslane.net	2023-04-06 09:29:03 +09:00
Andres Freund	31966b151e	bufmgr: Introduce infrastructure for faster relation extension The primary bottlenecks for relation extension are: 1) The extension lock is held while acquiring a victim buffer for the new page. Acquiring a victim buffer can require writing out the old page contents including possibly needing to flush WAL. 2) When extending via ReadBuffer() et al, we write a zero page during the extension, and then later write out the actual page contents. This can nearly double the write rate. 3) The existing bulk relation extension infrastructure in hio.c just amortized the cost of acquiring the relation extension lock, but none of the other costs. Unfortunately 1) cannot currently be addressed in a central manner as the callers to ReadBuffer() need to acquire the extension lock. To address that, this this commit moves the responsibility for acquiring the extension lock into bufmgr.c functions. That allows to acquire the relation extension lock for just the required time. This will also allow us to improve relation extension further, without changing callers. The reason we write all-zeroes pages during relation extension is that we hope to get ENOSPC errors earlier that way (largely works, except for CoW filesystems). It is easier to handle out-of-space errors gracefully if the page doesn't yet contain actual tuples. This commit addresses 2), by using the recently introduced smgrzeroextend(), which extends the relation, without dirtying the kernel page cache for all the extended pages. To address 3), this commit introduces a function to extend a relation by multiple blocks at a time. There are three new exposed functions: ExtendBufferedRel() for extending the relation by a single block, ExtendBufferedRelBy() to extend a relation by multiple blocks at once, and ExtendBufferedRelTo() for extending a relation up to a certain size. To avoid duplicating code between ReadBuffer(P_NEW) and the new functions, ReadBuffer(P_NEW) now implements relation extension with ExtendBufferedRel(), using a flag to tell ExtendBufferedRel() that the relation lock is already held. Note that this commit does not yet lead to a meaningful performance or scalability improvement - for that uses of ReadBuffer(P_NEW) will need to be converted to ExtendBuffered*(), which will be done in subsequent commits. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-05 16:21:09 -07:00
Andres Freund	12f3867f55	bufmgr: Support multiple in-progress IOs by using resowner A future patch will add support for extending relations by multiple blocks at once. To be concurrency safe, the buffers for those blocks need to be marked as BM_IO_IN_PROGRESS. Until now we only had infrastructure for recovering from an IO error for a single buffer. This commit extends that infrastructure to multiple buffers by using the resource owner infrastructure. This commit increases the size of the ResourceOwnerData struct, which appears to have a just about measurable overhead in very extreme workloads. Medium term we are planning to substantially shrink the size of ResourceOwnerData. Short term the increase is small enough to not worry about it for now. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de Discussion: https://postgr.es/m/20221029200025.w7bvlgvamjfo6z44@awork3.anarazel.de	2023-04-05 14:17:55 -07:00
Tom Lane	16dc2703c5	Support "Right Anti Join" plan shapes. Merge and hash joins can support antijoin with the non-nullable input on the right, using very simple combinations of their existing logic for right join and anti join. This gives the planner more freedom about how to order the join. It's particularly useful for hash join, since we may now have the option to hash the smaller table instead of the larger. Richard Guo, reviewed by Ronan Dunklau and myself Discussion: https://postgr.es/m/CAMbWs48xh9hMzXzSy3VaPzGAz+fkxXXTUbCLohX1_L8THFRm2Q@mail.gmail.com	2023-04-05 16:59:09 -04:00
Andres Freund	dad50f677c	bufmgr: Acquire and clean victim buffer separately Previously we held buffer locks for two buffer mapping partitions at the same time to change the identity of buffers. Particularly for extending relations needing to hold the extension lock while acquiring a victim buffer is painful.But it also creates a bottleneck for workloads that just involve reads. Now we instead first acquire a victim buffer and write it out, if necessary. Then we remove that buffer from the old partition with just the old partition's partition lock held and insert it into the new partition with just that partition's lock held. By separating out the victim buffer acquisition, future commits will be able to change relation extensions to scale better. On my workstation, a micro-benchmark exercising buffered reads strenuously and under a lot of concurrency, sees a >2x improvement. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-05 13:47:46 -07:00
Tom Lane	65eb2d00c6	Acquire locks on views in AcquirePlannerLocks, too. Commit `47bb9db75` taught AcquireExecutorLocks to re-acquire locks on views using data from their RTE_SUBQUERY replacements, but it now seems like we should make AcquirePlannerLocks do the same. In this way, if a view has been redefined, we will notice that a bit earlier while checking validity of a cached plan and thereby avoid some wasted work. Report and patch by Amit Langote. Discussion: https://postgr.es/m/CA+HiwqH0xZOQ+GQAdKeckY1R4NOeHdzhtfxkAMJLSchpapNk5w@mail.gmail.com	2023-04-05 15:56:35 -04:00
Andres Freund	794f259447	bufmgr: Add Pin/UnpinLocalBuffer() So far these were open-coded in quite a few places, without a good reason. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-05 10:42:17 -07:00
Andres Freund	819b69a81d	bufmgr: Add some more error checking [infrastructure] around pinning This adds a few more assertions against a buffer being local in places we don't expect, and extracts the check for a buffer being pinned exactly once from LockBufferForCleanup() into its own function. Later commits will use this function. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: http://postgr.es/m/419312fd-9255-078c-c3e3-f0525f911d7f@iki.fi	2023-04-05 10:42:17 -07:00
Andres Freund	4d330a61bb	Add smgrzeroextend(), FileZero(), FileFallocate() smgrzeroextend() uses FileFallocate() to efficiently extend files by multiple blocks. When extending by a small number of blocks, use FileZero() instead, as using posix_fallocate() for small numbers of blocks is inefficient for some file systems / operating systems. FileZero() is also used as the fallback for FileFallocate() on platforms / filesystems that don't support fallocate. A big advantage of using posix_fallocate() is that it typically won't cause dirty buffers in the kernel pagecache. So far the most common pattern in our code is that we smgrextend() a page full of zeroes and put the corresponding page into shared buffers, from where we later write out the actual contents of the page. If the kernel, e.g. due to memory pressure or elapsed time, already wrote back the all-zeroes page, this can lead to doubling the amount of writes reaching storage. There are no users of smgrzeroextend() as of this commit. That will follow in future commits. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: John Naylor <john.naylor@enterprisedb.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-05 10:06:39 -07:00
Tom Lane	4766eef317	Fix another issue with ENABLE/DISABLE TRIGGER on partitioned tables. In v13 and v14, the ENABLE/DISABLE TRIGGER USER variant malfunctioned on cloned triggers, failing to find the clones because it thought they were system triggers. Other variants of ENABLE/DISABLE TRIGGER would improperly apply a superuserness check. Fix by adjusting the is-it- a-system-trigger check to match reality in those branches. (As far as I can find, this is the only place that got it wrong.) There's no such bug in v15/HEAD, because we revised the catalog representation of system triggers to be what this code was expecting. However, add the test case to these branches anyway, because this area is visibly pretty fragile. Also remove an obsoleted comment. The recent v15/HEAD commit `6949b921d` fixed a nearby bug. I now see that my commit message for that was inaccurate: the behavior of recursing to clone triggers is older than v15, but it didn't apply to the case in v13/v14 because in those branches parent partitioned tables have no pg_trigger entries for foreign-key triggers. But add the test case from that commit to v13/v14, just to show what is happening there. Per bug #17886 from DzmitryH. Discussion: https://postgr.es/m/17886-5406d5d828aa4aa3@postgresql.org	2023-04-05 12:56:32 -04:00
Andres Freund	3d6a98457d	Don't initialize page in {vm,fsm}_extend(), not needed The read path needs to be able to initialize pages anyway, as relation extensions are not durable. By avoiding initializing pages, we can, in a future patch, extend the relation by multiple blocks at once. Using smgrextend() for {vm,fsm}_extend() is not a good idea in general, as at least one page of the VM/FSM will be read immediately after, always causing a cache miss, requiring us to read content we just wrote. Discussion: https://postgr.es/m/20230301223515.pucbj7nb54n4i4nv@awork3.anarazel.de	2023-04-05 08:19:39 -07:00
Robert Haas	86a3fc7ec8	Fix wrong word in comment. Reported by Peter Smith. Discussion: http://postgr.es/m/CAHut+Pt52ueOEAO-G5qeZiiPv1p9pBT_W5Vj3BTYfC8sD8LFxw@mail.gmail.com	2023-04-05 09:50:08 -04:00
Peter Eisentraut	f275af8cb8	Update information_schema for SQL:2023 This is mainly a light renumbering to match the sections in the standard. The comments for SQL_IMPLEMENTATION_INFO and SQL_SIZING are no longer applicable because the required information has been moved into part 9.	2023-04-05 09:57:44 +02:00
Peter Eisentraut	c9f57541d9	doc: Update SQL features/conformance information to SQL:2023 Optional subfeatures have been changed to top-level features, so there is a bit of a churn in the list for that. Some existing functions have been added to the standard, so they are moved from the "other" to the "standard" lists in their sections. Discussion: https://www.postgresql.org/message-id/flat/63f285d9-4ec8-0c9e-4bf5-e76334ddc0af@enterprisedb.com	2023-04-05 09:20:25 +02:00
Peter Eisentraut	c209d317e9	Fix minor signed/unsigned mixup The chunk header is unsigned, and the output format takes unsigned, so casting it to signed in between is incorrect.	2023-04-05 07:34:52 +02:00
Andres Freund	3f695b3117	sequences: Lock buffer before initializing page fill_seq_fork_with_data(), used to initialize a new sequence relation, only locked the buffer after calling PageInit(), even though PageInit() modifies page contents. This is unlikely to cause real-world issues, as the relation is exclusively locked at that point, and the buffer not yet marked dirty, so other processes should not access the buffer. This issue looks to have been present since the introduction of sequences in `e8647c45d6`. Given the low risk, it does not seem worth backpatching the fix. Discussion: https://postgr.es/m/20230404185501.wdkmo3k7bedlx6qk@awork3.anarazel.de	2023-04-04 16:42:52 -07:00
Jeff Davis	36320cbc16	Fix MSVC warning introduced in `ea1db8ae70`. Discussion: https://postgr.es/m/CA+hUKGJR1BhCORa5WdvwxztD3arhENcwaN1zEQ1Upg20BwjKWA@mail.gmail.com Reported-by: Thomas Munro	2023-04-04 15:43:18 -07:00
Thomas Munro	f303ec6210	Remove comment obsoleted by `11c2d6fd`. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/1604497.1680637072%40sss.pgh.pa.us	2023-04-05 09:40:34 +12:00
Jeff Davis	ea1db8ae70	Canonicalize ICU locale names to language tags. Convert to BCP47 language tags before storing in the catalog, except during binary upgrade or when the locale comes from an existing collation or template database. The resulting language tags can vary slightly between ICU versions. For instance, "@colBackwards=yes" is converted to "und-u-kb-true" in older versions of ICU, and to the simpler (but equivalent) "und-u-kb" in newer versions. The process of canonicalizing to a language tag also understands more input locale string formats than ucol_open(). For instance, "fr_CA.UTF-8" is misinterpreted by ucol_open() and the region is ignored; effectively treating it the same as the locale "fr" and opening the wrong collator. Canonicalization properly interprets the language and region, resulting in the language tag "fr-CA", which can then be understood by ucol_open(). This commit fixes a problem in prior versions due to ucol_open() misinterpreting locale strings as described above. For instance, creating an ICU collation with locale "fr_CA.UTF-8" would store that string directly in the catalog, which would later be passed to (and misinterpreted by) ucol_open(). After this commit, the locale string will be canonicalized to language tag "fr-CA" in the catalog, which will be properly understood by ucol_open(). Because this fix affects the resulting collator, we cannot change the locale string stored in the catalog for existing databases or collations; otherwise we'd risk corrupting indexes. Therefore, only canonicalize locales for newly-created (not upgraded) collations/databases. For similar reasons, do not backport. Discussion: https://postgr.es/m/8c7af6820aed94dc7bc259d2aa7f9663518e6137.camel@j-davis.com Reviewed-by: Peter Eisentraut	2023-04-04 10:38:58 -07:00
Robert Haas	482675987b	Add a run_as_owner option to subscriptions. This option is normally false, but can be set to true to obtain the legacy behavior where the subscription runs with the permissions of the subscription owner rather than the permissions of the table owner. The advantages of this mode are (1) it doesn't require that the subscription owner have permission to SET ROLE to each table owner and (2) since no role switching occurs, the SECURITY_RESTRICTED_OPERATION restrictions do not apply. On the downside, it allows any table owner to easily usurp the privileges of the subscription owner - basically, to take over their account. Because that's generally quite undesirable, we don't make this mode the default, but we do make it available, just in case the new behavior causes too many problems for someone. Discussion: http://postgr.es/m/CA+TgmoZ-WEeG6Z14AfH7KhmpX2eFh+tZ0z+vf0=eMDdbda269g@mail.gmail.com	2023-04-04 12:03:03 -04:00
Robert Haas	1e10d49b65	Perform logical replication actions as the table owner. Up until now, logical replication actions have been performed as the subscription owner, who will generally be a superuser. Commit `cec57b1a0f` documented hazards associated with that situation, namely, that any user who owns a table on the subscriber side could assume the privileges of the subscription owner by attaching a trigger, expression index, or some other kind of executable code to it. As a remedy, it suggested not creating configurations where users who are not fully trusted own tables on the subscriber. Although that will work, it basically precludes using logical replication in the way that people typically want to use it, namely, to replicate a database from one node to another without necessarily having any restrictions on which database users can own tables. So, instead, change logical replication to execute INSERT, UPDATE, DELETE, and TRUNCATE operations as the table owner when they are replicated. Since this involves switching the active user frequently within a session that is authenticated as the subscription user, also impose SECURITY_RESTRICTED_OPERATION restrictions on logical replication code. As an exception, if the table owner can SET ROLE to the subscription owner, these restrictions have no security value, so don't impose them in that case. Subscription owners are now required to have the ability to SET ROLE to every role that owns a table that the subscription is replicating. If they don't, replication will fail. Superusers, who normally own subscriptions, satisfy this property by default. Non-superusers users who own subscriptions will need to be granted the roles that own relevant tables. Patch by me, reviewed (but not necessarily in its entirety) by Jelte Fennema, Jeff Davis, and Noah Misch. Discussion: http://postgr.es/m/CA+TgmoaSCkg9ww9oppPqqs+9RVqCexYCE6Aq=UsYPfnOoDeFkw@mail.gmail.com	2023-04-04 11:25:23 -04:00
Alvaro Herrera	71bfd1543f	Code review for recent SQL/JSON commits - At the last minute and for no particularly good reason, I changed the WITHOUT token to be marked especially for lookahead, from the one in WITHOUT TIME to the one in WITHOUT UNIQUE. Study of upcoming patches (where a new WITHOUT ARRAY WRAPPER clause is added) showed me that the former was better, so put it back the way the original patch had it. - update exprTypmod() for JsonConstructorExpr to return the typmod of the RETURNING clause, as a comment there suggested. Perhaps it's possible for this to make a difference with datetime types, but I didn't try to build a test case. - The nodeFuncs.c support code for new nodes was calling walker() directly instead of the WALK() macro as introduced by commit `1c27d16e6e`. Modernize that. Also add exprLocation() support for a couple of nodes that missed it. Lastly, reorder the code more sensibly. The WITHOUT_LA -> WITHOUT change means that stored rules containing either WITHOUT TIME ZONE or WITHOUT UNIQUE KEYS would change representation. Therefore, bump catversion. Discussion: https://postgr.es/m/20230329181708.e64g2tpy7jyufqkr@alvherre.pgsql	2023-04-04 14:04:30 +02:00
Andres Freund	8a2b1b1477	bufmgr: Remove buffer-write-dirty tracepoints The trace point was using the relfileno / fork / block for the to-be-read-in buffer. Some upcoming work would make that more expensive to provide. We still have buffer-flush-start/done, which can serve most tracing needs that buffer-write-dirty could serve. Discussion: https://postgr.es/m/f5164e7a-eef6-8972-75a3-8ac622ed0c6e@iki.fi	2023-04-03 18:02:41 -07:00
Peter Geoghegan	05a304a855	Make SP-GiST redirect cleanup more aggressive. Commit `61b313e4` made VACUUM pass down a heaprel to index AM bulkdelete and vacuumcleanup routines. Although this was primarily intended as preparation for logical decoding on standbys, it also made it easy to correct an old deficiency in how we determine how to cleanup SP-GiST redirect and placeholder tuples. Pass the heaprel to GlobalVisTestFor() during cleanup of redirect and placeholder tuples, rather than pessimistically passing NULL. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/02392033-f030-a3c8-c7d0-5c27eb529fec@gmail.com	2023-04-03 11:47:48 -07:00
Peter Geoghegan	e48c817395	Recycle deleted nbtree pages more aggressively. Commit `61b313e4` made nbtree consistently pass down a heaprel to low level routines like _bt_getbuf(). Although this was primarily intended as preparation for logical decoding on standbys, it also made it easy to correct an old deficiency in how nbtree VACUUM determines whether or not it's now safe to recycle deleted pages. Pass the heaprel to GlobalVisTestFor() in nbtree routines that deal with recycle safety. nbtree now makes less pessimistic assumptions about recycle safety within non-catalog relations. This enhancement complements the recycling enhancement added by commit `9dd963ae25`. nbtree remains just as pessimistic as ever when it comes to recycle safety within indexes on catalog relations. There is no fundamental reason why we need to treat catalog relations differently, though. The behavioral inconsistency is a consequence of the way that nbtree uses nextXID values to implement what Lanin and Shasha call "the drain technique". Note in particular that it has nothing to do with whether or not index tuples might still be required for an older MVCC snapshot. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkaiDxCje0yPuH=3Uh2p1V_2pFGY==xfbZoZu7Ax_NB8g@mail.gmail.com	2023-04-03 11:31:43 -07:00
Peter Geoghegan	a349b86603	Move heaprel struct field next to index rel field. Commit `61b313e4` added a heaprel struct member to IndexVacuumInfo, but placed it last. Move the heaprel struct member next to the index struct member to improve the code's readability. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WznG=TV6S9d3VA=y0vBHbXwnLs9_LLdiML=aNJuHeriwxg@mail.gmail.com	2023-04-03 11:01:11 -07:00
Robert Haas	e7e7da2f8d	Fix possible logical replication crash. Commit `c3afe8cf5a` added a new password_required option but forgot that you need database access to check whether an arbitrary role ID is a superuser. Report and patch by Hou Zhijie. I added a comment. Thanks to Alexander Lakhin for devising a way to reproduce the crash. Discussion: http://postgr.es/m/OS0PR01MB5716BFD7EC44284C89F40808948F9@OS0PR01MB5716.jpnprd01.prod.outlook.com	2023-04-03 13:54:21 -04:00
Tom Lane	a8a00124f1	When using valgrind, log the current query after an error is detected. In USE_VALGRIND builds, add code to print the text of the current query to the valgrind log whenever the valgrind error count has increased during the query. Valgrind will already have printed its report, if the error is distinct from ones already seen, so that this works out fairly well as a way of annotating the log. Onur Tirtir and Tom Lane Discussion: https://postgr.es/m/AM9PR83MB0498531E804DC8DF8CFF0E8FE9D09@AM9PR83MB0498.EURPRD83.prod.outlook.com	2023-04-03 10:18:38 -04:00
Alexander Korotkov	b0b91ced16	Revert `764da7710b` Discussion: https://postgr.es/m/20230323003003.plgaxjqahjgkuxrk%40awork3.anarazel.de	2023-04-03 16:55:09 +03:00
Alexander Korotkov	2b65bf046d	Revert `11470f544e` Discussion: https://postgr.es/m/20230323003003.plgaxjqahjgkuxrk%40awork3.anarazel.de	2023-04-03 16:54:31 +03:00
David Rowley	8d928e3a9f	Rename BufferAccessStrategyData.ring_size to nbuffers The new name better reflects what the field is - the size of the buffers[] array. ring_size sounded more like it is in units of bytes. An upcoming commit allows a BufferAccessStrategy of custom sizes, so it seems relevant to improve this beforehand. Author: Melanie Plageman Reviewed-by: David Rowley Discussion: https://postgr.es/m/CAAKRu_YefVbhg4rAxU2frYFdTap61UftH-kUNQZBvAs%2BYK81Zg%40mail.gmail.com	2023-04-03 23:31:16 +12:00
David Rowley	4830f10243	Disable vacuum's use of a buffer access strategy during failsafe Traditionally, vacuum always makes use of a buffer access strategy 32 buffers in size. This means that running vacuums tend not to cause too many shared buffers to become dirty, however, this can cause vacuums to run much more slowly than they otherwise could as WAL flushes will occur more frequently due to having to flush WAL out to the LSN of the dirty page before that page can be written to disk. When we are performing failsafe VACUUMs (as added in `1e55e7d17`), we really want to make the vacuum work go as quickly as possible, so here we disable the buffer access strategy when entering failsafe mode while vacuuming a relation. Per idea and analyis from Andres Freund. In passing, also include some changes I had intended for `32fbe0239`. Author: Melanie Plageman Reviewed-by: Justin Pryzby, David Rowley Discussion: https://postgr.es/m/20230111182720.ejifsclfwymw2reb%40awork3.anarazel.de	2023-04-03 23:05:58 +12:00
David Rowley	32fbe0239b	Only make buffer strategy for vacuum when it's likely needed VACUUM FULL and VACUUM ONLY_DATABASE_STATS will not use the vacuum strategy ring created in vacuum(), so don't waste effort making it in those cases. There are other conceivable cases where the buffer strategy also won't be used, but those are probably less common and not worth troubling over too much. For example VACUUM (PROCESS_MAIN false, PROCESS_TOAST false). There are other cases too, but many of these are only discovered once inside vacuum_rel(). Author: Melanie Plageman Reviewed-by: David Rowley Discussion: https://postgr.es/m/CAAKRu_ZLRuzkM3zKogiZAz2hUony37yLY4aaLb8fPf8fgqs5Og@mail.gmail.com	2023-04-03 19:19:45 +12:00
David Rowley	3f476c9534	Remove some global variables from vacuum.c Using global variables because we don't want to pass these values around as parameters does not really seem like a great idea, so let's remove these two global variables and adjust a few functions to accept these values as parameters instead. This is part of a wider patch which intends to allow the size of the buffer access strategy that vacuum uses to be adjusted. Author: Melanie Plageman Reviewed-by: Bharath Rupireddy Discussion: https://postgr.es/m/CAAKRu_b1q_07uquUtAvLqTM%3DW9nzee7QbtzHwA4XdUo7KX_Cnw%40mail.gmail.com	2023-04-03 15:07:25 +12:00
Andres Freund	6af1793954	Add info in WAL records in preparation for logical slot conflict handling This commit only implements one prerequisite part for allowing logical decoding. The commit message contains an explanation of the overall design, which later commits will refer back to. Overall design: 1. We want to enable logical decoding on standbys, but replay of WAL from the primary might remove data that is needed by logical decoding, causing error(s) on the standby. To prevent those errors, a new replication conflict scenario needs to be addressed (as much as hot standby does). 2. Our chosen strategy for dealing with this type of replication slot is to invalidate logical slots for which needed data has been removed. 3. To do this we need the latestRemovedXid for each change, just as we do for physical replication conflicts, but we also need to know whether any particular change was to data that logical replication might access. That way, during WAL replay, we know when there is a risk of conflict and, if so, if there is a conflict. 4. We can't rely on the standby's relcache entries for this purpose in any way, because the startup process can't access catalog contents. 5. Therefore every WAL record that potentially removes data from the index or heap must carry a flag indicating whether or not it is one that might be accessed during logical decoding. Why do we need this for logical decoding on standby? First, let's forget about logical decoding on standby and recall that on a primary database, any catalog rows that may be needed by a logical decoding replication slot are not removed. This is done thanks to the catalog_xmin associated with the logical replication slot. But, with logical decoding on standby, in the following cases: - hot_standby_feedback is off - hot_standby_feedback is on but there is no a physical slot between the primary and the standby. Then, hot_standby_feedback will work, but only while the connection is alive (for example a node restart would break it) Then, the primary may delete system catalog rows that could be needed by the logical decoding on the standby (as it does not know about the catalog_xmin on the standby). So, it’s mandatory to identify those rows and invalidate the slots that may need them if any. Identifying those rows is the purpose of this commit. Implementation: When a WAL replay on standby indicates that a catalog table tuple is to be deleted by an xid that is greater than a logical slot's catalog_xmin, then that means the slot's catalog_xmin conflicts with the xid, and we need to handle the conflict. While subsequent commits will do the actual conflict handling, this commit adds a new field isCatalogRel in such WAL records (and a new bit set in the xl_heap_visible flags field), that is true for catalog tables, so as to arrange for conflict handling. The affected WAL records are the ones that already contain the snapshotConflictHorizon field, namely: - gistxlogDelete - gistxlogPageReuse - xl_hash_vacuum_one_page - xl_heap_prune - xl_heap_freeze_page - xl_heap_visible - xl_btree_reuse_page - xl_btree_delete - spgxlogVacuumRedirect Due to this new field being added, xl_hash_vacuum_one_page and gistxlogDelete do now contain the offsets to be deleted as a FLEXIBLE_ARRAY_MEMBER. This is needed to ensure correct alignment. It's not needed on the others struct where isCatalogRel has been added. This commit just introduces the WAL format changes mentioned above. Handling the actual conflicts will follow in future commits. Bumps XLOG_PAGE_MAGIC as the several WAL records are changed. Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Author: Andres Freund <andres@anarazel.de> (in an older version) Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version) Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>	2023-04-02 12:32:19 -07:00
Andres Freund	61b313e47e	Pass down table relation into more index relation functions This is done in preparation for logical decoding on standby, which needs to include whether visibility affecting WAL records are about a (user) catalog table. Which is only known for the table, not the indexes. It's also nice to be able to pass the heap relation to GlobalVisTestFor() in vacuumRedirectAndPlaceholder(). Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/21b700c3-eecf-2e05-a699-f8c78dd31ec7@gmail.com	2023-04-01 20:18:29 -07:00
Andres Freund	a88a18b125	Assert only valid flag bits are passed to visibilitymap_set() If visibilitymap_set() is called with flags containing a higher bit than VISIBILITYMAP_ALL_FROZEN, the state of neighboring pages is affected. While there was an assertion that some valid bits were set, it did not check that only valid bits were. Change that. Discussion: https://postgr.es/m/20230331043300.gux3s5wzrapqi4oe@awork3.anarazel.de	2023-04-01 18:00:19 -07:00
Andres Freund	14f98e0af9	hio: Release extension lock before initializing page / pinning VM PageInit() while holding the extension lock is unnecessary after `0d1fe9f74e` started to use RBM_ZERO_AND_LOCK - nobody can look at the new page before we release the page lock. PageInit() zeroes the page, which isn't that cheap, so deferring it until after the extension lock is released seems like a good idea. Doing visibilitymap_pin() while holding the extension lock, introduced in `7db0cd2145`, looks like an accident. Due to the restrictions on HEAP_INSERT_FROZEN it's unlikely to be a performance issue, but it still seems better to move it out. We also are doing the visibilitymap_pin() while holding the buffer lock, which will be fixed in a separate commit. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: http://postgr.es/m/419312fd-9255-078c-c3e3-f0525f911d7f@iki.fi	2023-04-01 17:50:18 -07:00

... 10 11 12 13 14 ...

25191 Commits