postgresql

Commit Graph

Author	SHA1	Message	Date
Andres Freund	be87200efd	Support invalidating replication slots due to horizon and wal_level Needed for logical decoding on a standby. Slots need to be invalidated because of the horizon if rows required for logical decoding are removed. If the primary's wal_level is lowered from 'logical', logical slots on the standby need to be invalidated. The new invalidation methods will be used in a subsequent commit. Logical slots that have been invalidated can be identified via the new pg_replication_slots.conflicting column. See `6af1793954` for an overall design of logical decoding on a standby. Bumps catversion for the addition of the new pg_replication_slots column. Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Author: Andres Freund <andres@anarazel.de> Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version) Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de	2023-04-07 22:40:27 -07:00
Andres Freund	2ed16aacf1	Fix underspecified sort order in inherit.sql Introduced in `e056c557ae`. Per buildfarm member prion.	2023-04-07 22:25:46 -07:00
Andres Freund	4397abd0a2	Prevent use of invalidated logical slot in CreateDecodingContext() Previously we had checks for this in multiple places. Support for logical decoding on standbys will add other forms of invalidation, making it worth while to centralize the checks. This slightly changes the error message for both the walsender and SQL interface. Particularly the SQL interface error was inaccurate, as the "This slot has never previously reserved WAL" portion was unreachable. Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de	2023-04-07 22:19:05 -07:00
Andres Freund	15f8203a59	Replace replication slot's invalidated_at LSN with an enum This is mainly useful because the upcoming logical-decoding-on-standby feature adds further reasons for invalidating slots, and we don't want to end up with multiple invalidated_* fields, or check different attributes. Eventually we should consider not resetting restart_lsn when invalidating a slot due to max_slot_wal_keep_size. But that's a user visible change, so left for later. Increases SLOT_VERSION, due to the changed field (with a different alignment, no less). Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de	2023-04-07 21:47:25 -07:00
Thomas Munro	d4e71df6d7	Add io_direct setting (developer-only). Provide a way to ask the kernel to use O_DIRECT (or local equivalent) where available for data and WAL files, to avoid or minimize kernel caching. This hurts performance currently and is not intended for end users yet. Later proposed work would introduce our own I/O clustering, read-ahead, etc to replace the facilities the kernel disables with this option. The only user-visible change, if the developer-only GUC is not used, is that this commit also removes the obscure logic that would activate O_DIRECT for the WAL when wal_sync_method=open_[data]sync and wal_level=minimal (which also requires max_wal_senders=0). Those are non-default and unlikely settings, and this behavior wasn't (correctly) documented. The same effect can be achieved with io_direct=wal. Author: Thomas Munro <thomas.munro@gmail.com> Author: Andres Freund <andres@anarazel.de> Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGK1X532hYqJ_MzFWt0n1zt8trz980D79WbjwnT-yYLZpg%40mail.gmail.com	2023-04-08 16:35:07 +12:00
Thomas Munro	faeedbcefd	Introduce PG_IO_ALIGN_SIZE and align all I/O buffers. In order to have the option to use O_DIRECT/FILE_FLAG_NO_BUFFERING in a later commit, we need the addresses of user space buffers to be well aligned. The exact requirements vary by OS and file system (typically sectors and/or memory pages). The address alignment size is set to 4096, which is enough for currently known systems: it matches modern sectors and common memory page size. There is no standard governing O_DIRECT's requirements so we might eventually have to reconsider this with more information from the field or future systems. Aligning I/O buffers on memory pages is also known to improve regular buffered I/O performance. Three classes of I/O buffers for regular data pages are adjusted: (1) Heap buffers are now allocated with the new palloc_aligned() or MemoryContextAllocAligned() functions introduced by commit `439f6175`. (2) Stack buffers now use a new struct PGIOAlignedBlock to respect PG_IO_ALIGN_SIZE, if possible with this compiler. (3) The buffer pool is also aligned in shared memory. WAL buffers were already aligned on XLOG_BLCKSZ. It's possible for XLOG_BLCKSZ to be configured smaller than PG_IO_ALIGNED_SIZE and thus for O_DIRECT WAL writes to fail to be well aligned, but that's a pre-existing condition and will be addressed by a later commit. BufFiles are not yet addressed (there's no current plan to use O_DIRECT for those, but they could potentially get some incidental speedup even in plain buffered I/O operations through better alignment). If we can't align stack objects suitably using the compiler extensions we know about, we disable the use of O_DIRECT by setting PG_O_DIRECT to 0. This avoids the need to consider systems that have O_DIRECT but can't align stack objects the way we want; such systems could in theory be supported with more work but we don't currently know of any such machines, so it's easier to pretend there is no O_DIRECT support instead. That's an existing and tested class of system. Add assertions that all buffers passed into smgrread(), smgrwrite() and smgrextend() are correctly aligned, unless PG_O_DIRECT is 0 (= stack alignment tricks may be unavailable) or the block size has been set too small to allow arrays of buffers to be all aligned. Author: Thomas Munro <thomas.munro@gmail.com> Author: Andres Freund <andres@anarazel.de> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/CA+hUKGK1X532hYqJ_MzFWt0n1zt8trz980D79WbjwnT-yYLZpg@mail.gmail.com	2023-04-08 16:34:50 +12:00
Tom Lane	db6957bae8	Add missing .gitignore entry. Seems an oversight in `7d8219a44`. Fix before somebody commits a generated file.	2023-04-07 23:32:49 -04:00
Stephen Frost	3d4fa227bc	Add support for Kerberos credential delegation Support GSSAPI/Kerberos credentials being delegated to the server by a client. With this, a user authenticating to PostgreSQL using Kerberos (GSSAPI) credentials can choose to delegate their credentials to the PostgreSQL server (which can choose to accept them, or not), allowing the server to then use those delegated credentials to connect to another service, such as with postgres_fdw or dblink or theoretically any other service which is able to be authenticated using Kerberos. Both postgres_fdw and dblink are changed to allow non-superuser password-less connections but only when GSSAPI credentials have been delegated to the server by the client and GSSAPI is used to authenticate to the remote system. Authors: Stephen Frost, Peifeng Qiu Reviewed-By: David Christensen Discussion: https://postgr.es/m/CO1PR05MB8023CC2CB575E0FAAD7DF4F8A8E29@CO1PR05MB8023.namprd05.prod.outlook.com	2023-04-07 21:58:04 -04:00
Andres Freund	ac8d53dae5	Track IO times in pg_stat_io `a9c70b46db` and 8aaa04b32S added counting of IO operations to a new view, pg_stat_io. Now, add IO timing for reads, writes, extends, and fsyncs to pg_stat_io as well. This combines the tracking for pgBufferUsage with the tracking for pg_stat_io into a new function pgstat_count_io_op_time(). This should make it a bit easier to avoid the somewhat costly instr_time conversion done for pgBufferUsage. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_ay5iKmnbXZ3DsauViF3eMxu4m1oNnJXqV_HyqYeg55Ww%40mail.gmail.com	2023-04-07 17:04:56 -07:00
Peter Geoghegan	1c453cfd89	Show more detail in nbtree rmgr descriptions. Show a detailed description of the page offset number arrays that appear in certain nbtree WAL records. Also brings nbtree desc routines in line with the guidelines established by recent commit `7d8219a4`. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-By: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/flat/20230109215842.fktuhesvayno6o4g%40awork3.anarazel.de	2023-04-07 16:46:23 -07:00
Stephen Frost	ce5e234085	For Kerberos testing, disable DNS lookups Similar to `8dff2f224`, this disables DNS lookups by the Kerberos library to look up the KDC and the realm while the Kerberos tests are running. In some environments, these lookups can take a long time and end up timing out and causing tests to fail. Further, since this isn't really our domain, we shouldn't be sending out these DNS requests during our tests.	2023-04-07 19:36:46 -04:00
Peter Geoghegan	7d8219a444	Show more detail in heapam rmgr descriptions. Add helper functions that output arrays in a standard format, and use the functions inside heapdesc routines. This allows tools like pg_walinspect to show a detailed description of the page offset number arrays for records like PRUNE and VACUUM (unless there was an FPI). Also document the conventions that desc routines should follow. Only the heapdesc routines follow the conventions for now, so they're just guidelines for the time being. Based on a suggestion from Andres Freund. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-By: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/flat/20230109215842.fktuhesvayno6o4g%40awork3.anarazel.de	2023-04-07 16:08:52 -07:00
Andres Freund	728015a470	Fix table name clash in recently introduced test A few buildfarm animals recently started complaining about the "child" relation already existing. `e056c557ae` added a new child table to inherit.sql, but triggers.sql, running in the same parallel group, also uses a child table. Rename the new table to inh_child. It maybe worth renaming child, parent in other tests as well, but that's work for another day. Discussion: https://postgr.es/m/20230407204530.52q3v5cu5x6dj676@awork3.anarazel.de	2023-04-07 14:02:46 -07:00
Andres Freund	704261ecc6	Improve IO accounting for temp relation writes Both pgstat_database and pgBufferUsage count IO timing for reads of temporary relation blocks into local buffers. However, both failed to count write IO timing for flushes of dirty local buffers. Fix. Additionally, FlushRelationBuffers() seems to have omitted counting write IO (both count and timing) stats for both pgstat_database and pgBufferUsage. Fix. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20230321023451.7rzy4kjj2iktrg2r%40awork3.anarazel.de	2023-04-07 13:24:26 -07:00
Daniel Gustafsson	bf5a894c55	Test SCRAM iteration changes with psql \password A version of this test was included in the original patch for altering SCRAM iteration count, but was omitted due to how interactive psql TAP sessions worked before being refactored. Discussion: https://postgr.es/m/20230130194350.zj5v467x4jgqt3d6@awork3.anarazel.de Discussion: https://postgr.es/m/F72E7BC7-189F-4B17-BF47-9735EB72C364@yesql.se	2023-04-07 22:14:23 +02:00
Daniel Gustafsson	664d757531	Refactor background psql TAP functions This breaks out the background and interactive psql functionality into a new class, PostgreSQL::Test::BackgroundPsql. Sessions are still initiated via PostgreSQL::Test::Cluster, but once started they can be manipulated by the new helper functions which intend to make querying easier. A sample session for a command which can be expected to finish at a later time can be seen below. my $session = $node->background_psql('postgres'); $bsession->query_until(qr/start/, q( \echo start CREATE INDEX CONCURRENTLY idx ON t(a); )); $bsession->quit; Patch by Andres Freund with some additional hacking by me. Author: Andres Freund <andres@anarazel.de> Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Discussion: https://postgr.es/m/20230130194350.zj5v467x4jgqt3d6@awork3.anarazel.de	2023-04-07 22:14:20 +02:00
Alvaro Herrera	32bc0d022d	Fix underspecified sort order in test query Fail in `e056c557ae`.	2023-04-07 20:30:04 +02:00
Alvaro Herrera	e056c557ae	Catalog NOT NULL constraints We now create pg_constaint rows for NOT NULL constraints with contype='n'. We propagate these constraints during operations such as adding inheritance relationships, creating and attaching partitions, creating tables LIKE other tables. We mostly follow the well-known rules of conislocal and coninhcount that we have for CHECK constraints, with some adaptations; for example, as opposed to CHECK constraints, we don't match NOT NULL ones by name when descending a hierarchy to alter it; instead we match by column number. This means we don't require the constraint names to be identical across a hierarchy. For now, we omit them from system catalogs. Maybe this is worth reconsidering. We don't support NOT VALID nor DEFERRABLE clauses either; these can be added as separate features later (this patch is already large and complicated enough.) This has been very long in the making. The first patch was written by Bernd Helmle in 2010 to add a new pg_constraint.contype value ('n'), which I (Álvaro) then hijacked in 2011 and 2012, until that one was killed by the realization that we ought to use contype='c' instead: manufactured CHECK constraints. However, later SQL standard development, as well as nonobvious emergent properties of that design (mostly, failure to distinguish them from "normal" CHECK constraints as well as the performance implication of having to test the CHECK expression) led us to reconsider this choice, so now the current implementation uses contype='n' again. In 2016 Vitaly Burovoy also worked on this feature[1] but found no consensus for his proposed approach, which was claimed to be closer to the letter of the standard, requiring additional pg_attribute columns to track the OID of the NOT NULL constraint for that column. [1] https://postgr.es/m/CAKOSWNkN6HSyatuys8xZxzRCR-KL1OkHS5-b9qd9bf1Rad3PLA@mail.gmail.com Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Author: Bernd Helmle <mailings@oopsware.de> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Discussion: https://postgr.es/m/CACA0E642A0267EDA387AF2B%40%5B172.26.14.62%5D Discussion: https://postgr.es/m/AANLkTinLXMOEMz+0J29tf1POokKi4XDkWJ6-DDR9BKgU@mail.gmail.com Discussion: https://postgr.es/m/20110707213401.GA27098@alvh.no-ip.org Discussion: https://postgr.es/m/1343682669-sup-2532@alvh.no-ip.org Discussion: https://postgr.es/m/CAKOSWNkN6HSyatuys8xZxzRCR-KL1OkHS5-b9qd9bf1Rad3PLA@mail.gmail.com Discussion: https://postgr.es/m/20220817181249.q7qvj3okywctra3c@alvherre.pgsql	2023-04-07 19:59:57 +02:00
Tom Lane	ff245a3788	Doc: improve descriptions of max_[pred_]locks_per_transaction GUCs. The old wording described these as being multiplied by max_connections plus max_prepared_transactions, which hasn't been exactly right for some time thanks to the addition of various auxiliary processes. Moreover, exactness here is a bit pointless given that the lock tables can expand into the initially-unallocated "slop" space in shared memory. Rather than trying to track exactly what the code is doing, let's just use the term "server processes". Likewise adjust these GUCs' description strings in guc_tables.c. Wang Wei, reviewed by Nathan Bossart and myself Discussion: https://postgr.es/m/OS3PR01MB6275BDD09C9B875C65FCC5AB9EA39@OS3PR01MB6275.jpnprd01.prod.outlook.com	2023-04-07 13:29:29 -04:00
Tom Lane	888f2ea0a8	Add array_sample() and array_shuffle() functions. These are useful in Monte Carlo applications. Martin Kalcher, reviewed/adjusted by Daniel Gustafsson and myself Discussion: https://postgr.es/m/9d160a44-7675-51e8-60cf-6d64b76db831@aboutsource.net	2023-04-07 11:47:07 -04:00
Tom Lane	cd82e5c79d	Fix locale-dependent test case. psql parses the interval argument of \watch with locale-dependent strtod(). In commit `00beecfe8` I added a test case that exercises a fractional interval, but I hard-coded 0.01 which doesn't work in locales where the radix point isn't ".". We don't want to change this longstanding parsing behavior, so fix the test case to generate a suitably locale-aware spelling. Report and patch by Alexander Korotkov. Discussion: https://postgr.es/m/CAPpHfdv+10Uk6FWjsh3+ju7kHYr76LaRXbYayXmrM7FBU-=Hgg@mail.gmail.com	2023-04-07 10:35:11 -04:00
Andres Freund	21d7c05a5c	Fix copy-paste bug in `12f3867f55` triggering an assert after a write error The same condition accidentally was copied to both branches. Manual testing confirms that otherwise the error recovery path works fine. Found while reviewing the logical-decoding-on-standby patch.	2023-04-07 01:02:46 -07:00
Amit Kapila	96c498d2f8	Add tab-completion for newly added SUBSCRIPTION options. Commits `c3afe8cf5a` and `482675987b` added new subscription options "password_required" and "run_as_owner". This patch adds tab-completion for these newly added options. Author: Peter Smith Discussion: https://postgr.es/m/CAHut+Pu=pnJf=SS1583pknSQ3CbOqLCkWcJCQYt6zxTagHEdmw@mail.gmail.com	2023-04-07 10:32:36 +05:30
Michael Paquier	8fcb32db98	Add more protections in WAL record APIs against overflows This commit adds a limit to the size of an XLogRecord at 1020MB, based on a suggestion by Heikki Linnakangas. This counts for the overhead needed by the XLogReader when allocating the memory it needs to read a record in DecodeXLogRecordRequiredSpace(), based on the record size. An assertion based on that is added to detect that any additions in the XLogReader facilities would not cause any overflows. If that's ever the case, the upper bound allowed would need to be adjusted. Before this, it was possible for an external module to create WAL records large enough to be assembled but not replayable, causing failures when replaying such WAL records on standbys. One case mentioned where this is possible is the in-core function pg_logical_emit_message() (wrapper for LogLogicalMessage), that allows to emit WAL records with an arbitrary amount of data potentially higher than the replay limit of approximately 1GB (limit of a palloc, minus the overhead needed by a XLogReader). This commit is a follow-up of `ffd1b6b` that has added similar protections for the block-level data. Here, the checks are extended to the whole record length, mainrdata_len being extended from uint32 to uint64 with the routines registering buffer and record data still limited to uint32 to minimize the checks when assembling a record. All the error messages related to overflow checks are improved to provide more context about the error happening. Author: Matthias van de Meent Reviewed-by: Andres Freund, Heikki Linnakangas, Michael Paquier Discussion: https://postgr.es/m/CAEze2WgGiw+LZt+vHf8tWqB_6VxeLsMeoAuod0N=ij1q17n5pw@mail.gmail.com	2023-04-07 10:10:17 +09:00
Andres Freund	26158b852d	Use ExtendBufferedRelTo() in XLogReadBufferExtended() Instead of extending the relation block-by-block, use ExtendBufferedRelTo(), introduced in `31966b151e`. This is faster and simpler. This also somewhat reduces the danger that disconnected segments pose (which can be "discovered" once the previous segment reaches SEGSIZE), as ExtendBufferedRelTo() won't extend past the block it has been asked. However, the risk of the content of such a disconnected segment being invalid remains. Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de Discussion: https://postgr.es/m/20230223010147.32oir7sb66slqnjk@awork3.anarazel.de	2023-04-06 17:56:17 -07:00
David Rowley	ae78cae3be	Add --buffer-usage-limit option to vacuumdb `1cbbee033` added BUFFER_USAGE_LIMIT to the VACUUM and ANALYZE commands, so here we permit that option to be specified in vacuumdb. In passing, adjust the documents for vacuum_buffer_usage_limit and the BUFFER_USAGE_LIMIT VACUUM option to mention "kB" rather than "KB". Do the same for the ERROR message in ExecVacuum() and check_vacuum_buffer_usage_limit(). Without that we might tell a user that the valid minimum value is 128 KB only to reject that because we accept only "kB" and not "KB". Also, add a small reminder comment in vacuum.h to try to trigger the memory of anyone adding new fields to VacuumParams that they might want to consider if vacuumdb needs to grow a new option too. Author: Melanie Plageman Reviewed-by: Justin Pryzby Reviewed-by: David Rowley Discussion: https://postgr.es/m/ZAzTg3iEnubscvbf@telsasoft.com	2023-04-07 12:47:10 +12:00
Andres Freund	00d1e02be2	hio: Use ExtendBufferedRelBy() to extend tables more efficiently While we already had some form of bulk extension for relations, it was fairly limited. It only amortized the cost of acquiring the extension lock, the relation itself was still extended one-by-one. Bulk extension was also solely triggered by contention, not by the amount of data inserted. To address this, use ExtendBufferedRelBy(), introduced in `31966b151e`, to extend the relation. We try to extend the relation by multiple blocks in two situations: 1) The caller tells RelationGetBufferForTuple() that it will need multiple pages. For now that's only used by heap_multi_insert(), see commit FIXME. 2) If there is contention on the extension lock, use the number of waiters for the lock as a multiplier for the number of blocks to extend by. This is similar to what we already did. Previously we additionally multiplied the numbers of waiters by 20, but with the new relation extension infrastructure I could not see a benefit in doing so. Using the freespacemap to provide empty pages can cause significant contention, and adds measurable overhead, even if there is no contention. To reduce that, remember the blocks the relation was extended by in the BulkInsertState, in the extending backend. In case 1) from above, the blocks the extending backend needs are not entered into the FSM, as we know that we will need those blocks. One complication with using the FSM to record empty pages, is that we need to insert blocks into the FSM, when we already hold a buffer content lock. To avoid doing IO while holding a content lock, release the content lock before recording free space. Currently that opens a small window in which another backend could fill the block, if a concurrent VACUUM records the free space. If that happens, we retry, similar to the already existing case when otherBuffer is provided. In the future it might be worth closing the race by preventing VACUUM from recording the space in newly extended pages. This change provides very significant wins (3x at 16 clients, on my workstation) for concurrent COPY into a single relation. Even single threaded COPY is measurably faster, primarily due to not dirtying pages while extending, if supported by the operating system (see commit `4d330a61bb`). Even single-row INSERTs benefit, although to a much smaller degree, as the relation extension lock rarely is the primary bottleneck. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-06 16:53:17 -07:00
David Rowley	1cbbee0338	Add VACUUM/ANALYZE BUFFER_USAGE_LIMIT option Add new options to the VACUUM and ANALYZE commands called BUFFER_USAGE_LIMIT to allow users more control over how large to make the buffer access strategy that is used to limit the usage of buffers in shared buffers. Larger rings can allow VACUUM to run more quickly but have the drawback of VACUUM possibly evicting more buffers from shared buffers that might be useful for other queries running on the database. Here we also add a new GUC named vacuum_buffer_usage_limit which controls how large to make the access strategy when it's not specified in the VACUUM/ANALYZE command. This defaults to 256KB, which is the same size as the access strategy was prior to this change. This setting also controls how large to make the buffer access strategy for autovacuum. Per idea by Andres Freund. Author: Melanie Plageman Reviewed-by: David Rowley Reviewed-by: Andres Freund Reviewed-by: Justin Pryzby Reviewed-by: Bharath Rupireddy Discussion: https://postgr.es/m/20230111182720.ejifsclfwymw2reb@awork3.anarazel.de	2023-04-07 11:40:31 +12:00
Andres Freund	5279e9db8e	heapam: Pass number of required pages to RelationGetBufferForTuple() A future commit will use this information to determine how aggressively to extend the relation by. In heap_multi_insert() we know accurately how many pages we need once we need to extend the relation, providing an accurate lower bound for how much to extend. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-06 16:17:16 -07:00
Daniel Gustafsson	7d71d3dd08	Refresh cost-based delay params more frequently in autovacuum Allow autovacuum to reload the config file more often so that cost-based delay parameters can take effect while VACUUMing a relation. Previously, autovacuum workers only reloaded the config file once per relation vacuumed, so config changes could not take effect until beginning to vacuum the next table. Now, check if a reload is pending roughly once per block, when checking if we need to delay. In order for autovacuum workers to safely update their own cost delay and cost limit parameters without impacting performance, we had to rethink when and how these values were accessed. Previously, an autovacuum worker's wi_cost_limit was set only at the beginning of vacuuming a table, after reloading the config file. Therefore, at the time that autovac_balance_cost() was called, workers vacuuming tables with no cost-related storage parameters could still have different values for their wi_cost_limit_base and wi_cost_delay. Now that the cost parameters can be updated while vacuuming a table, workers will (within some margin of error) have no reason to have different values for cost limit and cost delay (in the absence of cost-related storage parameters). This removes the rationale for keeping cost limit and cost delay in shared memory. Balancing the cost limit requires only the number of active autovacuum workers vacuuming a table with no cost-based storage parameters. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ZngzqnEODc7LmS1NH04Kt6Y9huSjz5pp7%2BDXhrjDA0gw%40mail.gmail.com	2023-04-07 01:00:21 +02:00
Daniel Gustafsson	a85c60a945	Separate vacuum cost variables from GUCs Vacuum code run both by autovacuum workers and a backend doing VACUUM/ANALYZE previously inspected VacuumCostLimit and VacuumCostDelay, which are the global variables backing the GUCs vacuum_cost_limit and vacuum_cost_delay. Autovacuum workers needed to override these variables with their own values, derived from autovacuum_vacuum_cost_limit and autovacuum_vacuum_cost_delay and worker cost limit balancing logic. This led to confusing code which, in some cases, both derived and set a new value of VacuumCostLimit from VacuumCostLimit. In preparation for refreshing these GUC values more often, introduce new, independent global variables and add a function to update them using the GUCs and existing logic. Per suggestion by Kyotaro Horiguchi Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ZngzqnEODc7LmS1NH04Kt6Y9huSjz5pp7%2BDXhrjDA0gw%40mail.gmail.com	2023-04-07 00:54:53 +02:00
Daniel Gustafsson	71a825194f	Make vacuum failsafe_active globally visible While vacuuming a table in failsafe mode, VacuumCostActive should not be re-enabled. This currently isn't a problem because vacuum cost parameters are only refreshed in between vacuuming tables and failsafe status is reset for every table. In preparation for allowing vacuum cost parameters to be updated more frequently, elevate LVRelState->failsafe_active to a global, VacuumFailsafeActive, which will be checked when determining whether or not to re-enable vacuum cost-related delays. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ZngzqnEODc7LmS1NH04Kt6Y9huSjz5pp7%2BDXhrjDA0gw%40mail.gmail.com	2023-04-07 00:54:08 +02:00
Tom Lane	5499706bdf	Stabilize just-added regression test cases. The tests added by commits `029dea882` et al turn out to produce different output under -DRANDOMIZE_ALLOCATED_MEMORY. This is not a bug exactly: that flag causes coerce_type() to invoke the input function twice when coercing an unknown-type literal to a specific type. So you get tsqueryin's bleat about an empty tsquery twice. Revise the test query to avoid that. Discussion: https://postgr.es/m/20230406213813.uep7plg6lvcywujo@awork3.anarazel.de	2023-04-06 18:13:49 -04:00
Tom Lane	31ae2aa9d2	psql: set SHELL_ERROR and SHELL_EXIT_CODE in more places. Make the \g, \o, \w, and \copy commands set these variables when closing a pipe. We missed doing this in commit `b0d8f2d98`, but it seems like a good idea. There are some remaining places in psql that intentionally don't update these variables after running a child program: * pager invocations * backtick evaluation within a prompt * \e (edit query buffer) Corey Huinker and Tom Lane Discussion: https://postgr.es/m/CADkLM=eSKwRGF-rnRqhtBORRtL49QsjcVUCa-kLxKTqxypsakw@mail.gmail.com	2023-04-06 17:33:38 -04:00
Tom Lane	029dea882a	Fix ts_headline() edge cases for empty query and empty search text. tsquery's GETQUERY() macro is only safe to apply to a tsquery that is known non-empty; otherwise it gives a pointer to garbage. Before commit `5a617d75d`, ts_headline() avoided this pitfall, but only in a very indirect, nonobvious way. (hlCover could not reach its TS_execute call, because if the query contains no lexemes then hlFirstIndex would surely return -1.) After that commit, it fell into the trap, resulting in weird errors such as "unrecognized operator" and/or valgrind complaints. In HEAD, fix this by not calling TS_execute_locations() at all for an empty query. In the back branches, add a defensive check to hlCover() --- that's not fixing any live bug, but I judge the code a bit too fragile as-is. Also, both mark_hl_fragments() and mark_hl_words() were careless about the possibility of empty search text: in the cases where no match has been found, they'd end up telling mark_fragment() to mark from word indexes 0 to 0 inclusive, even when there is no word 0. This is harmless since we over-allocated the prs->words array, but it does annoy valgrind. Fix so that the end index is -1 and thus mark_fragment() will do nothing in such cases. Bottom line is that this fixes a live bug in HEAD, but in the back branches it's only getting rid of a valgrind nitpick. Back-patch anyway. Per report from Alexander Lakhin. Discussion: https://postgr.es/m/c27f642d-020b-01ff-ae61-086af287c4fd@gmail.com	2023-04-06 15:52:44 -04:00
Andres Freund	18103b7c5f	hio: Don't pin the VM while holding buffer lock while extending Starting with commit `7db0cd2145`, RelationGetBufferForTuple() did a visibilitymap_pin() while holding an exclusive buffer content lock on a newly extended page, when using COPY FREEZE. We elsewhere try hard to avoid to doing IO while holding a content lock. And until `14f98e0af9`, that happened while holding the relation extension lock. Practically, this isn't a huge issue, because COPY FREEZE is restricted to relations created or truncated in the current session, so it's unlikely there's a lot of contention. We can't avoid doing IO while holding the content lock by pinning the VM earlier, because we don't know which page it will be on. While we could just ignore the issue in this case, a future commit will add bulk relation extension, which needs to enter pages into the FSM while also trying to hold onto a buffer lock. To address this issue, use visibilitymap_pin_ok() to see if the relevant buffer is already pinned. If not, release the buffer, pin the VM buffer, and acquire the lock again. This opens up a small window for other backends to insert data onto the page - as the page is not entered into the freespacemap, other backends won't see it normally, but a concurrent vacuum could enter the page, if it started just after the relation is extended. In case the page is used by another backend, retry. This is very similar to how locking "otherBuffer" is already dealt with. Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: http://postgr.es/m/20230325025740.wzvchp2kromw4zqz@awork3.anarazel.de	2023-04-06 11:11:13 -07:00
Andres Freund	bba9003b62	hio: Relax rules for calling GetVisibilityMapPins() GetVisibilityMapPins() insisted on the buffer1/buffer2 being in a specific order. This required checks at the callsite. As a subsequent patch will add another callsite, move related logic into GetVisibilityMapPins(). Discussion: https://postgr.es/m/20230403190030.fk2frxv6faklrseb@awork3.anarazel.de	2023-04-06 10:36:30 -07:00
Tom Lane	00beecfe83	psql: add an optional execution-count limit to \watch. \watch can now be told to stop after N executions of the query. With the idea that we might want to add more options to \watch in future, this patch generalizes the command's syntax to a list of name=value options, with the interval allowed to omit the name for backwards compatibility. Andrey Borodin, reviewed by Kyotaro Horiguchi, Nathan Bossart, Michael Paquier, Yugo Nagata, and myself Discussion: https://postgr.es/m/CAAhFRxiZ2-n_L1ErMm9AZjgmUK=qS6VHb+0SaMn8sqqbhF7How@mail.gmail.com	2023-04-06 13:18:14 -04:00
Tomas Vondra	2820adf775	Support long distance matching for zstd compression zstd compression supports a special mode for finding matched in distant past, which may result in better compression ratio, at the expense of using more memory (the window size is 128MB). To enable this optional mode, use the "long" keyword when specifying the compression method (--compress=zstd:long). Author: Justin Pryzby Reviewed-by: Tomas Vondra, Jacob Champion Discussion: https://postgr.es/m/20230224191840.GD1653@telsasoft.com Discussion: https://postgr.es/m/20220327205020.GM28503@telsasoft.com	2023-04-06 17:18:42 +02:00
David Rowley	b9b125b9c1	Move various prechecks from vacuum() into ExecVacuum() vacuum() is used for both the VACUUM command and for autovacuum. There were many prechecks being done inside vacuum() that were just not relevant to autovacuum. Let's move the bulk of these into ExecVacuum() so that they're only executed when running the VACUUM command. This removes a small amount of overhead when autovacuum vacuums a table. While we are at it, allocate VACUUM's BufferAccessStrategy in ExecVacuum() and pass it into vacuum() instead of expecting vacuum() to make it if it's not already made by the calling function. To make this work, we need to create the vacuum memory context slightly earlier, so we now need to pass that down to vacuum() so that it's available for use in other memory allocations. Author: Melanie Plageman Reviewed-by: David Rowley Discussion: https://postgr.es/m/20230405211534.4skgskbilnxqrmxg@awork3.anarazel.de	2023-04-06 15:44:52 +12:00
Andres Freund	acab1b0914	Convert many uses of ReadBuffer[Extended](P_NEW) to ExtendBufferedRel() A few places are not converted. Some because they are tackled in later commits (e.g. hio.c, xlogutils.c), some because they are more complicated (e.g. brin_pageops.c). Having a few users of ReadBuffer(P_NEW) is good anyway, to ensure the backward compat path stays working. Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-05 18:57:29 -07:00
Andres Freund	fcdda1e4b5	Use ExtendBufferedRelTo() in {vm,fsm}_extend() This uses ExtendBufferedRelTo(), introduced in `31966b151e`, to extend the visibilitymap and freespacemap to the size needed. It also happens to fix a warning introduced in `3d6a98457d`, reported by Tom Lane. Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de Discussion: https://postgr.es/m/2194723.1680736788@sss.pgh.pa.us	2023-04-05 17:50:09 -07:00
David Rowley	bccd6908ca	Always make a BufferAccessStrategy for ANALYZE `32fbe0239` changed things so we didn't bother allocating the BufferAccessStrategy during VACUUM (ONLY_DATABASE_STATS); and VACUUM (FULL), however, it forgot to consider that VACUUM (FULL, ANALYZE) is a possible combination. That change would have resulted in such a command allowing ANALYZE to make full use of shared buffers, which wasn't intended, so fix that. Reported-by: Melanie Plageman Discussion: https://postgr.es/m/CAAKRu_bJRKe+v_=OqwC+5sA3j5qv8rqdAwy3+yHaO3wmtfrCRg@mail.gmail.com	2023-04-06 12:37:03 +12:00
Michael Paquier	1d477a907e	Fix row tracking in pg_stat_statements with extended query protocol pg_stat_statements relies on EState->es_processed to count the number of rows processed by ExecutorRun(). This proves to be a problem under the extended query protocol when the result of a query is fetched through more than one call of ExecutorRun(), as es_processed is reset each time ExecutorRun() is called. This causes pg_stat_statements to report the number of rows calculated in the last execute fetch, rather than the global sum of all the rows processed. As pquery.c tells, this is a problem when a portal does not use holdStore. For example, DMLs with RETURNING would report a correct tuple count as these do one execution cycle when the query is first executed to fill in the portal's store with one ExecutorRun(), feeding on the portal's store for each follow-up execute fetch depending on the fetch size requested by the client. The fix proposed for this issue is simple with the addition of an extra counter in EState that's preserved across multiple ExecutorRun() calls, incremented with the value calculated in es_processed. This approach is not back-patchable, unfortunately. Note that libpq does not currently give any way to control the fetch size when using the extended v3 protocol, meaning that in-core testing is not possible yet. This issue can be easily verified with the JDBC driver, though, with autocommit disabled. Hence, having in-core tests requires more features, left for future discussion: - At least two new libpq routines splitting PQsendQueryGuts(), one for the bind/describe and a second for a series of execute fetches with a custom fetch size, likely in a fashion similar to what JDBC does. - A psql meta-command for the execute phase. This part is not strictly mandatory, still it could be handy. Reported-by: Andrew Dunstan (original discovery by Simon Siggs) Author: Sami Imseih Reviewed-by: Tom Lane, Michael Paquier Discussion: https://postgr.es/m/EBE6C507-9EB6-4142-9E4D-38B1673363A7@amazon.com Discussion: https://postgr.es/m/c90890e7-9c89-c34f-d3c5-d5c763a34bd8@dunslane.net	2023-04-06 09:29:03 +09:00
Andres Freund	31966b151e	bufmgr: Introduce infrastructure for faster relation extension The primary bottlenecks for relation extension are: 1) The extension lock is held while acquiring a victim buffer for the new page. Acquiring a victim buffer can require writing out the old page contents including possibly needing to flush WAL. 2) When extending via ReadBuffer() et al, we write a zero page during the extension, and then later write out the actual page contents. This can nearly double the write rate. 3) The existing bulk relation extension infrastructure in hio.c just amortized the cost of acquiring the relation extension lock, but none of the other costs. Unfortunately 1) cannot currently be addressed in a central manner as the callers to ReadBuffer() need to acquire the extension lock. To address that, this this commit moves the responsibility for acquiring the extension lock into bufmgr.c functions. That allows to acquire the relation extension lock for just the required time. This will also allow us to improve relation extension further, without changing callers. The reason we write all-zeroes pages during relation extension is that we hope to get ENOSPC errors earlier that way (largely works, except for CoW filesystems). It is easier to handle out-of-space errors gracefully if the page doesn't yet contain actual tuples. This commit addresses 2), by using the recently introduced smgrzeroextend(), which extends the relation, without dirtying the kernel page cache for all the extended pages. To address 3), this commit introduces a function to extend a relation by multiple blocks at a time. There are three new exposed functions: ExtendBufferedRel() for extending the relation by a single block, ExtendBufferedRelBy() to extend a relation by multiple blocks at once, and ExtendBufferedRelTo() for extending a relation up to a certain size. To avoid duplicating code between ReadBuffer(P_NEW) and the new functions, ReadBuffer(P_NEW) now implements relation extension with ExtendBufferedRel(), using a flag to tell ExtendBufferedRel() that the relation lock is already held. Note that this commit does not yet lead to a meaningful performance or scalability improvement - for that uses of ReadBuffer(P_NEW) will need to be converted to ExtendBuffered*(), which will be done in subsequent commits. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-05 16:21:09 -07:00
Daniel Gustafsson	8eda731465	Allow to use system CA pool for certificate verification This adds a new option to libpq's sslrootcert, "system", which will load the system trusted CA roots for certificate verification. This is a more convenient way to achieve this than pointing to the system CA roots manually since the location can differ by installation and be locally adjusted by env vars in OpenSSL. When sslrootcert is set to system, sslmode is forced to be verify-full as weaker modes aren't providing much security for public CAs. Changing the location of the system roots by setting environment vars is not supported by LibreSSL so the tests will use a heuristic to determine if the system being tested is LibreSSL or OpenSSL. The workaround in .cirrus.yml is required to handle a strange interaction between homebrew and the openssl@3 formula; hopefully this can be removed in the near future. The original patch was written by Thomas Habets, which was later revived by Jacob Champion. Author: Jacob Champion <jchampion@timescale.com> Author: Thomas Habets <thomas@habets.se> Reviewed-by: Jelte Fennema <postgres@jeltef.nl> Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Reviewed-by: Magnus Hagander <magnus@hagander.net> Discussion: https://www.postgresql.org/message-id/flat/CA%2BkHd%2BcJwCUxVb-Gj_0ptr3_KZPwi3%2B67vK6HnLFBK9MzuYrLA%40mail.gmail.com	2023-04-05 23:22:17 +02:00
Andres Freund	12f3867f55	bufmgr: Support multiple in-progress IOs by using resowner A future patch will add support for extending relations by multiple blocks at once. To be concurrency safe, the buffers for those blocks need to be marked as BM_IO_IN_PROGRESS. Until now we only had infrastructure for recovering from an IO error for a single buffer. This commit extends that infrastructure to multiple buffers by using the resource owner infrastructure. This commit increases the size of the ResourceOwnerData struct, which appears to have a just about measurable overhead in very extreme workloads. Medium term we are planning to substantially shrink the size of ResourceOwnerData. Short term the increase is small enough to not worry about it for now. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de Discussion: https://postgr.es/m/20221029200025.w7bvlgvamjfo6z44@awork3.anarazel.de	2023-04-05 14:17:55 -07:00
Tom Lane	16dc2703c5	Support "Right Anti Join" plan shapes. Merge and hash joins can support antijoin with the non-nullable input on the right, using very simple combinations of their existing logic for right join and anti join. This gives the planner more freedom about how to order the join. It's particularly useful for hash join, since we may now have the option to hash the smaller table instead of the larger. Richard Guo, reviewed by Ronan Dunklau and myself Discussion: https://postgr.es/m/CAMbWs48xh9hMzXzSy3VaPzGAz+fkxXXTUbCLohX1_L8THFRm2Q@mail.gmail.com	2023-04-05 16:59:09 -04:00
Andres Freund	dad50f677c	bufmgr: Acquire and clean victim buffer separately Previously we held buffer locks for two buffer mapping partitions at the same time to change the identity of buffers. Particularly for extending relations needing to hold the extension lock while acquiring a victim buffer is painful.But it also creates a bottleneck for workloads that just involve reads. Now we instead first acquire a victim buffer and write it out, if necessary. Then we remove that buffer from the old partition with just the old partition's partition lock held and insert it into the new partition with just that partition's lock held. By separating out the victim buffer acquisition, future commits will be able to change relation extensions to scale better. On my workstation, a micro-benchmark exercising buffered reads strenuously and under a lot of concurrency, sees a >2x improvement. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-05 13:47:46 -07:00
Tom Lane	65eb2d00c6	Acquire locks on views in AcquirePlannerLocks, too. Commit `47bb9db75` taught AcquireExecutorLocks to re-acquire locks on views using data from their RTE_SUBQUERY replacements, but it now seems like we should make AcquirePlannerLocks do the same. In this way, if a view has been redefined, we will notice that a bit earlier while checking validity of a cached plan and thereby avoid some wasted work. Report and patch by Amit Langote. Discussion: https://postgr.es/m/CA+HiwqH0xZOQ+GQAdKeckY1R4NOeHdzhtfxkAMJLSchpapNk5w@mail.gmail.com	2023-04-05 15:56:35 -04:00
Tomas Vondra	84adc8e20f	pg_dump: Add support for zstd compression Allow pg_dump to use the zstd compression, in addition to gzip/lz4. Bulk of the new compression method is implemented in compress_zstd.{c,h}, covering the pg_dump compression APIs. The rest of the patch adds test and makes various places aware of the new compression method. The zstd library (which this patch relies on) supports multithreaded compression since version 1.5. We however disallow that feature for now, as it might interfere with parallel backups on platforms that rely on threads (e.g. Windows). This can be improved / relaxed in the future. This also fixes a minor issue in InitDiscoverCompressFileHandle(), which was not updated to check if the file already has the .lz4 extension. Adding zstd compression was originally proposed in 2020 (see the second thread), but then was reworked to use the new compression API introduced in `e9960732a9`. I've considered both threads when compiling the list of reviewers. Author: Justin Pryzby Reviewed-by: Tomas Vondra, Jacob Champion, Andreas Karlsson Discussion: https://postgr.es/m/20230224191840.GD1653@telsasoft.com Discussion: https://postgr.es/m/20201221194924.GI30237@telsasoft.com	2023-04-05 21:39:33 +02:00
Andres Freund	794f259447	bufmgr: Add Pin/UnpinLocalBuffer() So far these were open-coded in quite a few places, without a good reason. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-05 10:42:17 -07:00
Andres Freund	819b69a81d	bufmgr: Add some more error checking [infrastructure] around pinning This adds a few more assertions against a buffer being local in places we don't expect, and extracts the check for a buffer being pinned exactly once from LockBufferForCleanup() into its own function. Later commits will use this function. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: http://postgr.es/m/419312fd-9255-078c-c3e3-f0525f911d7f@iki.fi	2023-04-05 10:42:17 -07:00
Andres Freund	4d330a61bb	Add smgrzeroextend(), FileZero(), FileFallocate() smgrzeroextend() uses FileFallocate() to efficiently extend files by multiple blocks. When extending by a small number of blocks, use FileZero() instead, as using posix_fallocate() for small numbers of blocks is inefficient for some file systems / operating systems. FileZero() is also used as the fallback for FileFallocate() on platforms / filesystems that don't support fallocate. A big advantage of using posix_fallocate() is that it typically won't cause dirty buffers in the kernel pagecache. So far the most common pattern in our code is that we smgrextend() a page full of zeroes and put the corresponding page into shared buffers, from where we later write out the actual contents of the page. If the kernel, e.g. due to memory pressure or elapsed time, already wrote back the all-zeroes page, this can lead to doubling the amount of writes reaching storage. There are no users of smgrzeroextend() as of this commit. That will follow in future commits. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: John Naylor <john.naylor@enterprisedb.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-05 10:06:39 -07:00
Tom Lane	4766eef317	Fix another issue with ENABLE/DISABLE TRIGGER on partitioned tables. In v13 and v14, the ENABLE/DISABLE TRIGGER USER variant malfunctioned on cloned triggers, failing to find the clones because it thought they were system triggers. Other variants of ENABLE/DISABLE TRIGGER would improperly apply a superuserness check. Fix by adjusting the is-it- a-system-trigger check to match reality in those branches. (As far as I can find, this is the only place that got it wrong.) There's no such bug in v15/HEAD, because we revised the catalog representation of system triggers to be what this code was expecting. However, add the test case to these branches anyway, because this area is visibly pretty fragile. Also remove an obsoleted comment. The recent v15/HEAD commit `6949b921d` fixed a nearby bug. I now see that my commit message for that was inaccurate: the behavior of recursing to clone triggers is older than v15, but it didn't apply to the case in v13/v14 because in those branches parent partitioned tables have no pg_trigger entries for foreign-key triggers. But add the test case from that commit to v13/v14, just to show what is happening there. Per bug #17886 from DzmitryH. Discussion: https://postgr.es/m/17886-5406d5d828aa4aa3@postgresql.org	2023-04-05 12:56:32 -04:00
Andres Freund	3d6a98457d	Don't initialize page in {vm,fsm}_extend(), not needed The read path needs to be able to initialize pages anyway, as relation extensions are not durable. By avoiding initializing pages, we can, in a future patch, extend the relation by multiple blocks at once. Using smgrextend() for {vm,fsm}_extend() is not a good idea in general, as at least one page of the VM/FSM will be read immediately after, always causing a cache miss, requiring us to read content we just wrote. Discussion: https://postgr.es/m/20230301223515.pucbj7nb54n4i4nv@awork3.anarazel.de	2023-04-05 08:19:39 -07:00
Robert Haas	86a3fc7ec8	Fix wrong word in comment. Reported by Peter Smith. Discussion: http://postgr.es/m/CAHut+Pt52ueOEAO-G5qeZiiPv1p9pBT_W5Vj3BTYfC8sD8LFxw@mail.gmail.com	2023-04-05 09:50:08 -04:00
Peter Eisentraut	f275af8cb8	Update information_schema for SQL:2023 This is mainly a light renumbering to match the sections in the standard. The comments for SQL_IMPLEMENTATION_INFO and SQL_SIZING are no longer applicable because the required information has been moved into part 9.	2023-04-05 09:57:44 +02:00
Peter Eisentraut	c9f57541d9	doc: Update SQL features/conformance information to SQL:2023 Optional subfeatures have been changed to top-level features, so there is a bit of a churn in the list for that. Some existing functions have been added to the standard, so they are moved from the "other" to the "standard" lists in their sections. Discussion: https://www.postgresql.org/message-id/flat/63f285d9-4ec8-0c9e-4bf5-e76334ddc0af@enterprisedb.com	2023-04-05 09:20:25 +02:00
Peter Eisentraut	c209d317e9	Fix minor signed/unsigned mixup The chunk header is unsigned, and the output format takes unsigned, so casting it to signed in between is incorrect.	2023-04-05 07:34:52 +02:00
Amit Kapila	8df0d3d530	Add Copyright notice in 001_basic.pl and 002_pg_upgrade.pl. Author: Kuroda Hayato Discussion: https://postgr.es/m/TYCPR01MB587073D91E372B8EF719931EF5929@TYCPR01MB5870.jpnprd01.prod.outlook.com	2023-04-05 09:20:14 +05:30
Andres Freund	3f695b3117	sequences: Lock buffer before initializing page fill_seq_fork_with_data(), used to initialize a new sequence relation, only locked the buffer after calling PageInit(), even though PageInit() modifies page contents. This is unlikely to cause real-world issues, as the relation is exclusively locked at that point, and the buffer not yet marked dirty, so other processes should not access the buffer. This issue looks to have been present since the introduction of sequences in `e8647c45d6`. Given the low risk, it does not seem worth backpatching the fix. Discussion: https://postgr.es/m/20230404185501.wdkmo3k7bedlx6qk@awork3.anarazel.de	2023-04-04 16:42:52 -07:00
Jeff Davis	36320cbc16	Fix MSVC warning introduced in `ea1db8ae70`. Discussion: https://postgr.es/m/CA+hUKGJR1BhCORa5WdvwxztD3arhENcwaN1zEQ1Upg20BwjKWA@mail.gmail.com Reported-by: Thomas Munro	2023-04-04 15:43:18 -07:00
Thomas Munro	f303ec6210	Remove comment obsoleted by `11c2d6fd`. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/1604497.1680637072%40sss.pgh.pa.us	2023-04-05 09:40:34 +12:00
Jeff Davis	ea1db8ae70	Canonicalize ICU locale names to language tags. Convert to BCP47 language tags before storing in the catalog, except during binary upgrade or when the locale comes from an existing collation or template database. The resulting language tags can vary slightly between ICU versions. For instance, "@colBackwards=yes" is converted to "und-u-kb-true" in older versions of ICU, and to the simpler (but equivalent) "und-u-kb" in newer versions. The process of canonicalizing to a language tag also understands more input locale string formats than ucol_open(). For instance, "fr_CA.UTF-8" is misinterpreted by ucol_open() and the region is ignored; effectively treating it the same as the locale "fr" and opening the wrong collator. Canonicalization properly interprets the language and region, resulting in the language tag "fr-CA", which can then be understood by ucol_open(). This commit fixes a problem in prior versions due to ucol_open() misinterpreting locale strings as described above. For instance, creating an ICU collation with locale "fr_CA.UTF-8" would store that string directly in the catalog, which would later be passed to (and misinterpreted by) ucol_open(). After this commit, the locale string will be canonicalized to language tag "fr-CA" in the catalog, which will be properly understood by ucol_open(). Because this fix affects the resulting collator, we cannot change the locale string stored in the catalog for existing databases or collations; otherwise we'd risk corrupting indexes. Therefore, only canonicalize locales for newly-created (not upgraded) collations/databases. For similar reasons, do not backport. Discussion: https://postgr.es/m/8c7af6820aed94dc7bc259d2aa7f9663518e6137.camel@j-davis.com Reviewed-by: Peter Eisentraut	2023-04-04 10:38:58 -07:00
Tom Lane	d3d53f955c	Add a way to get the current function's OID in pl/pgsql. Invent "GET DIAGNOSTICS oid_variable = PG_ROUTINE_OID". This is useful for avoiding the maintenance nuisances that come with embedding a function's name in its body, as one might do for logging purposes for example. Typically users would cast the result to regproc or regprocedure to get something human-readable, but we won't pre-judge whether that's appropriate. Pavel Stehule, reviewed by Kirk Wolak and myself Discussion: https://postgr.es/m/CAFj8pRA4zMd5pY-B89Gm64bDLRt-L+akOd34aD1j4PEstHHSVQ@mail.gmail.com	2023-04-04 13:33:18 -04:00
Robert Haas	482675987b	Add a run_as_owner option to subscriptions. This option is normally false, but can be set to true to obtain the legacy behavior where the subscription runs with the permissions of the subscription owner rather than the permissions of the table owner. The advantages of this mode are (1) it doesn't require that the subscription owner have permission to SET ROLE to each table owner and (2) since no role switching occurs, the SECURITY_RESTRICTED_OPERATION restrictions do not apply. On the downside, it allows any table owner to easily usurp the privileges of the subscription owner - basically, to take over their account. Because that's generally quite undesirable, we don't make this mode the default, but we do make it available, just in case the new behavior causes too many problems for someone. Discussion: http://postgr.es/m/CA+TgmoZ-WEeG6Z14AfH7KhmpX2eFh+tZ0z+vf0=eMDdbda269g@mail.gmail.com	2023-04-04 12:03:03 -04:00
Robert Haas	1e10d49b65	Perform logical replication actions as the table owner. Up until now, logical replication actions have been performed as the subscription owner, who will generally be a superuser. Commit `cec57b1a0f` documented hazards associated with that situation, namely, that any user who owns a table on the subscriber side could assume the privileges of the subscription owner by attaching a trigger, expression index, or some other kind of executable code to it. As a remedy, it suggested not creating configurations where users who are not fully trusted own tables on the subscriber. Although that will work, it basically precludes using logical replication in the way that people typically want to use it, namely, to replicate a database from one node to another without necessarily having any restrictions on which database users can own tables. So, instead, change logical replication to execute INSERT, UPDATE, DELETE, and TRUNCATE operations as the table owner when they are replicated. Since this involves switching the active user frequently within a session that is authenticated as the subscription user, also impose SECURITY_RESTRICTED_OPERATION restrictions on logical replication code. As an exception, if the table owner can SET ROLE to the subscription owner, these restrictions have no security value, so don't impose them in that case. Subscription owners are now required to have the ability to SET ROLE to every role that owns a table that the subscription is replicating. If they don't, replication will fail. Superusers, who normally own subscriptions, satisfy this property by default. Non-superusers users who own subscriptions will need to be granted the roles that own relevant tables. Patch by me, reviewed (but not necessarily in its entirety) by Jelte Fennema, Jeff Davis, and Noah Misch. Discussion: http://postgr.es/m/CA+TgmoaSCkg9ww9oppPqqs+9RVqCexYCE6Aq=UsYPfnOoDeFkw@mail.gmail.com	2023-04-04 11:25:23 -04:00
Alvaro Herrera	71bfd1543f	Code review for recent SQL/JSON commits - At the last minute and for no particularly good reason, I changed the WITHOUT token to be marked especially for lookahead, from the one in WITHOUT TIME to the one in WITHOUT UNIQUE. Study of upcoming patches (where a new WITHOUT ARRAY WRAPPER clause is added) showed me that the former was better, so put it back the way the original patch had it. - update exprTypmod() for JsonConstructorExpr to return the typmod of the RETURNING clause, as a comment there suggested. Perhaps it's possible for this to make a difference with datetime types, but I didn't try to build a test case. - The nodeFuncs.c support code for new nodes was calling walker() directly instead of the WALK() macro as introduced by commit `1c27d16e6e`. Modernize that. Also add exprLocation() support for a couple of nodes that missed it. Lastly, reorder the code more sensibly. The WITHOUT_LA -> WITHOUT change means that stored rules containing either WITHOUT TIME ZONE or WITHOUT UNIQUE KEYS would change representation. Therefore, bump catversion. Discussion: https://postgr.es/m/20230329181708.e64g2tpy7jyufqkr@alvherre.pgsql	2023-04-04 14:04:30 +02:00
Andres Freund	8a2b1b1477	bufmgr: Remove buffer-write-dirty tracepoints The trace point was using the relfileno / fork / block for the to-be-read-in buffer. Some upcoming work would make that more expensive to provide. We still have buffer-flush-start/done, which can serve most tracing needs that buffer-write-dirty could serve. Discussion: https://postgr.es/m/f5164e7a-eef6-8972-75a3-8ac622ed0c6e@iki.fi	2023-04-03 18:02:41 -07:00
Peter Geoghegan	05a304a855	Make SP-GiST redirect cleanup more aggressive. Commit `61b313e4` made VACUUM pass down a heaprel to index AM bulkdelete and vacuumcleanup routines. Although this was primarily intended as preparation for logical decoding on standbys, it also made it easy to correct an old deficiency in how we determine how to cleanup SP-GiST redirect and placeholder tuples. Pass the heaprel to GlobalVisTestFor() during cleanup of redirect and placeholder tuples, rather than pessimistically passing NULL. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/02392033-f030-a3c8-c7d0-5c27eb529fec@gmail.com	2023-04-03 11:47:48 -07:00
Peter Geoghegan	e48c817395	Recycle deleted nbtree pages more aggressively. Commit `61b313e4` made nbtree consistently pass down a heaprel to low level routines like _bt_getbuf(). Although this was primarily intended as preparation for logical decoding on standbys, it also made it easy to correct an old deficiency in how nbtree VACUUM determines whether or not it's now safe to recycle deleted pages. Pass the heaprel to GlobalVisTestFor() in nbtree routines that deal with recycle safety. nbtree now makes less pessimistic assumptions about recycle safety within non-catalog relations. This enhancement complements the recycling enhancement added by commit `9dd963ae25`. nbtree remains just as pessimistic as ever when it comes to recycle safety within indexes on catalog relations. There is no fundamental reason why we need to treat catalog relations differently, though. The behavioral inconsistency is a consequence of the way that nbtree uses nextXID values to implement what Lanin and Shasha call "the drain technique". Note in particular that it has nothing to do with whether or not index tuples might still be required for an older MVCC snapshot. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkaiDxCje0yPuH=3Uh2p1V_2pFGY==xfbZoZu7Ax_NB8g@mail.gmail.com	2023-04-03 11:31:43 -07:00
Peter Geoghegan	a349b86603	Move heaprel struct field next to index rel field. Commit `61b313e4` added a heaprel struct member to IndexVacuumInfo, but placed it last. Move the heaprel struct member next to the index struct member to improve the code's readability. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WznG=TV6S9d3VA=y0vBHbXwnLs9_LLdiML=aNJuHeriwxg@mail.gmail.com	2023-04-03 11:01:11 -07:00
Robert Haas	e7e7da2f8d	Fix possible logical replication crash. Commit `c3afe8cf5a` added a new password_required option but forgot that you need database access to check whether an arbitrary role ID is a superuser. Report and patch by Hou Zhijie. I added a comment. Thanks to Alexander Lakhin for devising a way to reproduce the crash. Discussion: http://postgr.es/m/OS0PR01MB5716BFD7EC44284C89F40808948F9@OS0PR01MB5716.jpnprd01.prod.outlook.com	2023-04-03 13:54:21 -04:00
Tom Lane	a8a00124f1	When using valgrind, log the current query after an error is detected. In USE_VALGRIND builds, add code to print the text of the current query to the valgrind log whenever the valgrind error count has increased during the query. Valgrind will already have printed its report, if the error is distinct from ones already seen, so that this works out fairly well as a way of annotating the log. Onur Tirtir and Tom Lane Discussion: https://postgr.es/m/AM9PR83MB0498531E804DC8DF8CFF0E8FE9D09@AM9PR83MB0498.EURPRD83.prod.outlook.com	2023-04-03 10:18:38 -04:00
Alexander Korotkov	b0b91ced16	Revert `764da7710b` Discussion: https://postgr.es/m/20230323003003.plgaxjqahjgkuxrk%40awork3.anarazel.de	2023-04-03 16:55:09 +03:00
Alexander Korotkov	2b65bf046d	Revert `11470f544e` Discussion: https://postgr.es/m/20230323003003.plgaxjqahjgkuxrk%40awork3.anarazel.de	2023-04-03 16:54:31 +03:00
David Rowley	8d928e3a9f	Rename BufferAccessStrategyData.ring_size to nbuffers The new name better reflects what the field is - the size of the buffers[] array. ring_size sounded more like it is in units of bytes. An upcoming commit allows a BufferAccessStrategy of custom sizes, so it seems relevant to improve this beforehand. Author: Melanie Plageman Reviewed-by: David Rowley Discussion: https://postgr.es/m/CAAKRu_YefVbhg4rAxU2frYFdTap61UftH-kUNQZBvAs%2BYK81Zg%40mail.gmail.com	2023-04-03 23:31:16 +12:00
David Rowley	4830f10243	Disable vacuum's use of a buffer access strategy during failsafe Traditionally, vacuum always makes use of a buffer access strategy 32 buffers in size. This means that running vacuums tend not to cause too many shared buffers to become dirty, however, this can cause vacuums to run much more slowly than they otherwise could as WAL flushes will occur more frequently due to having to flush WAL out to the LSN of the dirty page before that page can be written to disk. When we are performing failsafe VACUUMs (as added in `1e55e7d17`), we really want to make the vacuum work go as quickly as possible, so here we disable the buffer access strategy when entering failsafe mode while vacuuming a relation. Per idea and analyis from Andres Freund. In passing, also include some changes I had intended for `32fbe0239`. Author: Melanie Plageman Reviewed-by: Justin Pryzby, David Rowley Discussion: https://postgr.es/m/20230111182720.ejifsclfwymw2reb%40awork3.anarazel.de	2023-04-03 23:05:58 +12:00
Daniel Gustafsson	525fb0a171	Fix typo in CI README s/fron/from/	2023-04-03 10:50:17 +02:00
David Rowley	32fbe0239b	Only make buffer strategy for vacuum when it's likely needed VACUUM FULL and VACUUM ONLY_DATABASE_STATS will not use the vacuum strategy ring created in vacuum(), so don't waste effort making it in those cases. There are other conceivable cases where the buffer strategy also won't be used, but those are probably less common and not worth troubling over too much. For example VACUUM (PROCESS_MAIN false, PROCESS_TOAST false). There are other cases too, but many of these are only discovered once inside vacuum_rel(). Author: Melanie Plageman Reviewed-by: David Rowley Discussion: https://postgr.es/m/CAAKRu_ZLRuzkM3zKogiZAz2hUony37yLY4aaLb8fPf8fgqs5Og@mail.gmail.com	2023-04-03 19:19:45 +12:00
Peter Eisentraut	1980d3585e	pg_basebackup: Correct type of WalSegSz The pg_basebackup code had WalSegSz as uint32, whereas the rest of the code has it as int. This seems confusing, and using the extra range wouldn't actually work. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://www.postgresql.org/message-id/flat/1bf15c7a-0acd-1864-081e-7a28814310fe%40enterprisedb.com	2023-04-03 07:21:06 +02:00
David Rowley	3f476c9534	Remove some global variables from vacuum.c Using global variables because we don't want to pass these values around as parameters does not really seem like a great idea, so let's remove these two global variables and adjust a few functions to accept these values as parameters instead. This is part of a wider patch which intends to allow the size of the buffer access strategy that vacuum uses to be adjusted. Author: Melanie Plageman Reviewed-by: Bharath Rupireddy Discussion: https://postgr.es/m/CAAKRu_b1q_07uquUtAvLqTM%3DW9nzee7QbtzHwA4XdUo7KX_Cnw%40mail.gmail.com	2023-04-03 15:07:25 +12:00
Tom Lane	2e6ba13152	Doc: update pgindent/README. I missed updating this when we pulled pg_bsd_indent into the tree.	2023-04-02 20:01:34 -04:00
Andres Freund	6af1793954	Add info in WAL records in preparation for logical slot conflict handling This commit only implements one prerequisite part for allowing logical decoding. The commit message contains an explanation of the overall design, which later commits will refer back to. Overall design: 1. We want to enable logical decoding on standbys, but replay of WAL from the primary might remove data that is needed by logical decoding, causing error(s) on the standby. To prevent those errors, a new replication conflict scenario needs to be addressed (as much as hot standby does). 2. Our chosen strategy for dealing with this type of replication slot is to invalidate logical slots for which needed data has been removed. 3. To do this we need the latestRemovedXid for each change, just as we do for physical replication conflicts, but we also need to know whether any particular change was to data that logical replication might access. That way, during WAL replay, we know when there is a risk of conflict and, if so, if there is a conflict. 4. We can't rely on the standby's relcache entries for this purpose in any way, because the startup process can't access catalog contents. 5. Therefore every WAL record that potentially removes data from the index or heap must carry a flag indicating whether or not it is one that might be accessed during logical decoding. Why do we need this for logical decoding on standby? First, let's forget about logical decoding on standby and recall that on a primary database, any catalog rows that may be needed by a logical decoding replication slot are not removed. This is done thanks to the catalog_xmin associated with the logical replication slot. But, with logical decoding on standby, in the following cases: - hot_standby_feedback is off - hot_standby_feedback is on but there is no a physical slot between the primary and the standby. Then, hot_standby_feedback will work, but only while the connection is alive (for example a node restart would break it) Then, the primary may delete system catalog rows that could be needed by the logical decoding on the standby (as it does not know about the catalog_xmin on the standby). So, it’s mandatory to identify those rows and invalidate the slots that may need them if any. Identifying those rows is the purpose of this commit. Implementation: When a WAL replay on standby indicates that a catalog table tuple is to be deleted by an xid that is greater than a logical slot's catalog_xmin, then that means the slot's catalog_xmin conflicts with the xid, and we need to handle the conflict. While subsequent commits will do the actual conflict handling, this commit adds a new field isCatalogRel in such WAL records (and a new bit set in the xl_heap_visible flags field), that is true for catalog tables, so as to arrange for conflict handling. The affected WAL records are the ones that already contain the snapshotConflictHorizon field, namely: - gistxlogDelete - gistxlogPageReuse - xl_hash_vacuum_one_page - xl_heap_prune - xl_heap_freeze_page - xl_heap_visible - xl_btree_reuse_page - xl_btree_delete - spgxlogVacuumRedirect Due to this new field being added, xl_hash_vacuum_one_page and gistxlogDelete do now contain the offsets to be deleted as a FLEXIBLE_ARRAY_MEMBER. This is needed to ensure correct alignment. It's not needed on the others struct where isCatalogRel has been added. This commit just introduces the WAL format changes mentioned above. Handling the actual conflicts will follow in future commits. Bumps XLOG_PAGE_MAGIC as the several WAL records are changed. Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Author: Andres Freund <andres@anarazel.de> (in an older version) Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version) Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>	2023-04-02 12:32:19 -07:00
Noah Misch	eaa1dd131c	Use PG_TEST_TIMEOUT_DEFAULT in 019_replslot_limit.pl. Oversight in `f2698ea02c`, which introduced the variable. This lowers some 1000s timeouts to the configurable default of 180s, due to a lack of evidence for needing the longer timeout. The alternative was 6*PG_TEST_TIMEOUT_DEFAULT, which we can adopt if the need arises. Given the lack of observed trouble with these timeouts, no back-patch.	2023-04-02 09:31:09 -07:00
Andres Freund	61b313e47e	Pass down table relation into more index relation functions This is done in preparation for logical decoding on standby, which needs to include whether visibility affecting WAL records are about a (user) catalog table. Which is only known for the table, not the indexes. It's also nice to be able to pass the heap relation to GlobalVisTestFor() in vacuumRedirectAndPlaceholder(). Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/21b700c3-eecf-2e05-a699-f8c78dd31ec7@gmail.com	2023-04-01 20:18:29 -07:00
Andres Freund	a88a18b125	Assert only valid flag bits are passed to visibilitymap_set() If visibilitymap_set() is called with flags containing a higher bit than VISIBILITYMAP_ALL_FROZEN, the state of neighboring pages is affected. While there was an assertion that some valid bits were set, it did not check that only valid bits were. Change that. Discussion: https://postgr.es/m/20230331043300.gux3s5wzrapqi4oe@awork3.anarazel.de	2023-04-01 18:00:19 -07:00
Andres Freund	14f98e0af9	hio: Release extension lock before initializing page / pinning VM PageInit() while holding the extension lock is unnecessary after `0d1fe9f74e` started to use RBM_ZERO_AND_LOCK - nobody can look at the new page before we release the page lock. PageInit() zeroes the page, which isn't that cheap, so deferring it until after the extension lock is released seems like a good idea. Doing visibilitymap_pin() while holding the extension lock, introduced in `7db0cd2145`, looks like an accident. Due to the restrictions on HEAP_INSERT_FROZEN it's unlikely to be a performance issue, but it still seems better to move it out. We also are doing the visibilitymap_pin() while holding the buffer lock, which will be fixed in a separate commit. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: http://postgr.es/m/419312fd-9255-078c-c3e3-f0525f911d7f@iki.fi	2023-04-01 17:50:18 -07:00
Tomas Vondra	0070b66fef	pg_dump: Use only LZ4 frame format for compression After `0da243fed0` got committed, it was reported that in some cases the compression ratio is rather poor - particularly for custom format with narrow tables - due to writing the LZ4 header/footer for each row. This commit switches to LZ4F (LZ4 frame format), eliminating most of the overhead and greatly improving the compression ratio. This makes the compressed size about the same for plain and custom formats (just like for gzip, for example). LZ4F is now used by both compression APIs, which allowed refactoring and reusing more of the code. For consistency this also renames the LZ4File struct to LZ4State, and a number of functions are now prefixed with LZ4Stream_ (instead of LZ4File_). Patch by Georgios Kokolatos, based on report and initial patch by Justin Pryzby. Review and minor cleanups by me. Author: Georgios Kokolatos, Justin Pryzby Reported-by: Justin Pryzby Reviewed-by: Tomas Vondra Discussion: https://postgr.es/m/20230227044910.GO1653%40telsasoft.com	2023-04-01 00:54:50 +02:00
Alvaro Herrera	6ee30209a6	SQL/JSON: support the IS JSON predicate This patch introduces the SQL standard IS JSON predicate. It operates on text and bytea values representing JSON, as well as on the json and jsonb types. Each test has IS and IS NOT variants and supports a WITH UNIQUE KEYS flag. The tests are: IS JSON [VALUE] IS JSON ARRAY IS JSON OBJECT IS JSON SCALAR These should be self-explanatory. The WITH UNIQUE KEYS flag makes these return false when duplicate keys exist in any object within the value, not necessarily directly contained in the outermost object. Author: Nikita Glukhov <n.gluhov@postgrespro.ru> Author: Teodor Sigaev <teodor@sigaev.ru> Author: Oleg Bartunov <obartunov@gmail.com> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Amit Langote <amitlangote09@gmail.com> Author: Andrew Dunstan <andrew@dunslane.net> Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby. Discussion: https://postgr.es/m/CAF4Au4w2x-5LTnN_bxky-mq4=WOqsGsxSpENCzHRAzSnEd8+WQ@mail.gmail.com Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org	2023-03-31 22:34:04 +02:00
Tom Lane	a2a0c7c29e	Further tweaking of width_bucket() edge cases. I realized that the third overflow case I posited in commit `b0e9e4d76` actually should be handled in a different way: rather than tolerating the idea that the quotient could round to 1, we should clamp so that the output cannot be more than "count" when we know that the operand is less than bound2. That being the case, we don't need an overflow-aware increment in that code path, which leads me to revert the movement of the pg_add_s32_overflow() call. (The diff in width_bucket_float8 might be easier to read by comparing against b0e9e4d76^.) What's more, width_bucket_numeric also has this problem of the quotient potentially rounding to 1, so add a clamp there too. As before, I'm not quite convinced that a back-patch is warranted. Discussion: https://postgr.es/m/391415.1680268470@sss.pgh.pa.us	2023-03-31 16:29:55 -04:00
Tom Lane	f0d65c0eaf	Reject system columns as elements of foreign keys. Up through v11 it was sensible to use the "oid" system column as a foreign key column, but since that was removed there's no visible usefulness in making any of the remaining system columns a foreign key. Moreover, since the TupleTableSlot rewrites in v12, such cases actively fail because of implicit assumptions that only user columns appear in foreign keys. The lack of complaints about that seems like good evidence that no one is trying to do it. Hence, rather than trying to repair those assumptions (of which there are at least two, maybe more), let's just forbid the case up front. Per this patch, a system column in either the referenced or referencing side of a foreign key will draw this error; however, putting one in the referenced side would have failed later anyway, since we don't allow unique indexes to be made on system columns. Per bug #17877 from Alexander Lakhin. Back-patch to v12; the case still appears to work in v11, so we shouldn't break it there. Discussion: https://postgr.es/m/17877-4bcc658e33df6de1@postgresql.org	2023-03-31 11:18:49 -04:00
Tom Lane	c2d7d679c1	Ensure acquire_inherited_sample_rows sets its output parameters. The totalrows/totaldeadrows outputs were left uninitialized in cases where we find no analyzable child tables of a partitioned table. This could lead to setting the partitioned table's pg_class.reltuples value to garbage. It's not clear that that would have any very bad effects in practice, but fix it anyway because it's making valgrind unhappy. Reported and diagnosed by Alexander Lakhin (bug #17880). Discussion: https://postgr.es/m/17880-9282037c923d856e@postgresql.org	2023-03-31 10:08:40 -04:00
Daniel Gustafsson	558fff0adf	pg_regress: Emit TAP compliant output This converts pg_regress output format to emit TAP compliant output while keeping it as human readable as possible for use without TAP test harnesses. As verbose harness related information isn't really supported by TAP this also reduces the verbosity of pg_regress runs which makes scrolling through log output in buildfarm/CI runs a bit easier as well. As the meson TAP parser conumes whitespace, the leading indentation for differentiating parallel tests from sequential tests has been changed to a single character prefix. This patch has been around for an extended period of time, reviewers listed below may have been involved in reviewing a version quite different from the version in this commit. The original idea for this patch was a hacking session with Jinbao Chen. TAP format testing is also enabled in meson as of this. Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Nikolay Shaplov <dhyan@nataraj.su> Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Discussion: https://postgr.es/m/BD4B107D-7E53-4794-ACBA-275BEB4327C9@yesql.se Discussion: https://postgr.es/m/20220221164736.rq3ornzjdkmwk2wo@alap3.anarazel.de	2023-03-31 13:00:02 +02:00
Alvaro Herrera	9b058f6b0d	Move ExecEvalJsonConstructor new function to a more natural place Commit `7081ac46ac` put it at the end of the file, but that doesn't look very nice.	2023-03-31 12:55:25 +02:00
Alvaro Herrera	47a9709846	No need to add FORMAT to the keyword precedence list Commit `7081ac46ac` put it there. Remove it.	2023-03-31 11:16:43 +02:00
Andres Freund	f95c1cd6b2	Bump PGSTAT_FILE_FORMAT_ID, omitted in `8aaa04b32d` I forgot to do so in the referenced commit. While the consequences of omitting the version change are likely to be harmless (besides discarding stats, as a PGSTAT_FILE_FORMAT_ID bump also does), it still seems worth doing.	2023-03-30 19:48:01 -07:00
Andres Freund	8aaa04b32d	Track shared buffer hits in pg_stat_io Among other things, this should make it easier to calculate a useful cache hit ratio by excluding buffer reads via buffer access strategies. As buffer access strategies reuse buffers (and thus evict the prior buffer contents), it is normal to see reads on repeated scans of the same data. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAAKRu_beMa9Hzih40%3DXPYqhDVz6tsgUGTrhZXRo%3Dunp%2Bszb%3DUA%40mail.gmail.com	2023-03-30 19:24:21 -07:00
David Rowley	6c3b697b19	Fix List memory issue in transformColumnDefinition When calling generateSerialExtraStmts(), we would pass in the constraint->options. In some cases, generateSerialExtraStmts() would modify the referenced List to remove elements from it, but doing so is invalid without assigning the list back to all variables that point to it. In the particular reported problem case, the List became empty, in which cases it became NIL, but the passed in constraint->options didn't get to find out about that and was left pointing to free'd memory. To fix this, just perform a list_copy() inside generateSerialExtraStmts(). We could just do a list_copy() just before we perform the delete from the list, however, that seems less robust. Let's make sure the generated CreateSeqStmt gets a completely different copy of the list to be safe. Bug: #17879 Reported-by: Fei Changhong Diagnosed-by: Fei Changhong Discussion: https://postgr.es/m/17879-b7dfb5debee58ff5@postgresql.org Backpatch-through: 11, all supported versions	2023-03-31 12:13:05 +13:00
Thomas Munro	11c2d6fdf5	Parallel Hash Full Join. Full and right outer joins were not supported in the initial implementation of Parallel Hash Join because of deadlock hazards (see discussion). Therefore FULL JOIN inhibited parallelism, as the other join strategies can't do that in parallel either. Add a new PHJ phase PHJ_BATCH_SCAN that scans for unmatched tuples on the inner side of one batch's hash table. For now, sidestep the deadlock problem by terminating parallelism there. The last process to arrive at that phase emits the unmatched tuples, while others detach and are free to go and work on other batches, if there are any, but otherwise they finish the join early. That unfairness is considered acceptable for now, because it's better than no parallelism at all. The build and probe phases are run in parallel, and the new scan-for-unmatched phase, while serial, is usually applied to the smaller of the two relations and is either limited by some multiple of work_mem, or it's too big and is partitioned into batches and then the situation is improved by batch-level parallelism. Author: Melanie Plageman <melanieplageman@gmail.com> Author: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKG%2BA6ftXPz4oe92%2Bx8Er%2BxpGZqto70-Q_ERwRaSyA%3DafNg%40mail.gmail.com	2023-03-31 11:34:03 +13:00
Andres Freund	ca7b3c4c00	pg_stat_wal: Accumulate time as instr_time instead of microseconds In instr_time.h it is stated that: * When summing multiple measurements, it's recommended to leave the * running sum in instr_time form (ie, use INSTR_TIME_ADD or * INSTR_TIME_ACCUM_DIFF) and convert to a result format only at the end. The reason for that is that converting to microseconds is not cheap, and can loose precision. Therefore this commit changes 'PendingWalStats' to use 'instr_time' instead of 'PgStat_Counter' while accumulating 'wal_write_time' and 'wal_sync_time'. Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/1feedb83-7aa9-cb4b-5086-598349d3f555@gmail.com	2023-03-30 14:23:14 -07:00
Alvaro Herrera	63cc20205c	Simplify transformJsonAggConstructor() API There's no need for callers to pass aggregate names so that the function can resolve them to OIDs, when callers can just pass aggregate OIDs directly to begin with.	2023-03-30 21:07:24 +02:00
Alvaro Herrera	60966f56c3	Fix inconsistencies and style issues in new SQL/JSON code Reported by Alexander Lakhin. Discussion: https://postgr.es/m/60483139-5c34-851d-baee-6c0d014e1710@gmail.com	2023-03-30 21:06:31 +02:00
Alvaro Herrera	589bb81649	Fix setrefs.c code for adjusting partPruneInfos We were transferring partPruneInfos from PlannerInfo into PlannerGlobal wrong, essentially relying on all of them being transferred, and adjusting their list indexes based on that. But apparently it's possible that some of them are skipped, so that strategy leads to a corrupted execution tree. Instead, adjust each Append/MergeAppend's partpruneinfo index as we copy from one list to the other, which seems safer anyway. This requires adjusting the RT offset of the RTE referenced in each partPruneInfo ahead of actually adjusting the RTE itself, which seems a bit too ad-hoc. This problem was introduced by commit `ec38694894`. However, it may be that we no longer require the change introduced there, so perhaps we should revert both the present commit and that one. Problem noticed by sqlsmith. Author: Amit Langote <amitlangote09@gmail.com Discussion: https://postgr.es/m/CA+HiwqG6tbc2oadsbyyy24b2AL295XHQgyLRWghmA7u_SL1K8A@mail.gmail.com	2023-03-30 21:06:31 +02:00
Andres Freund	f9054b7a7c	Fix format code in fd.c debugging infrastructure These were not sufficiently adjusted in `2d4f1ba6cf`.	2023-03-30 10:26:10 -07:00
Andres Freund	558cf80387	bufmgr: Fix undefined behaviour with, unrealistically, large temp_buffers Quoting Melanie: > Since if buffer is INT_MAX, then the -(buffer + 1) version invokes > undefined behavior while the -buffer - 1 version doesn't. All other places were already using the correct version. I (Andres), copied the code into more places in a patch. Melanie caught it in review, but to prevent more people from copying the bad code, fix it. Even if it is a theoretical issue. We really ought to wrap these accesses in a helper function... As this is a theoretical issue, don't backpatch. Reported-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAAKRu_aW2SX_LWtwHgfnqYpBrunMLfE9PD6-ioPpkh92XH0qpg@mail.gmail.com	2023-03-30 10:26:10 -07:00
Tom Lane	e9d202a149	Clean up role created in new subscription test. This oversight broke repeated runs of "make installcheck".	2023-03-30 13:07:04 -04:00
Robert Haas	c3afe8cf5a	Add new predefined role pg_create_subscription. This role can be granted to non-superusers to allow them to issue CREATE SUBSCRIPTION. The non-superuser must additionally have CREATE permissions on the database in which the subscription is to be created. Most forms of ALTER SUBSCRIPTION, including ALTER SUBSCRIPTION .. SKIP, now require only that the role performing the operation own the subscription, or inherit the privileges of the owner. However, to use ALTER SUBSCRIPTION ... RENAME or ALTER SUBSCRIPTION ... OWNER TO, you also need CREATE permission on the database. This is similar to what we do for schemas. To change the owner of a schema, you must also have permission to SET ROLE to the new owner, similar to what we do for other object types. Non-superusers are required to specify a password for authentication and the remote side must use the password, similar to what is required for postgres_fdw and dblink. A superuser who wants a non-superuser to own a subscription that does not rely on password authentication may set the new password_required=false property on that subscription. A non-superuser may not set password_required=false and may not modify a subscription that already has password_required=false. This new password_required subscription property works much like the eponymous postgres_fdw property. In both cases, the actual semantics are that a password is not required if either (1) the property is set to false or (2) the relevant user is the superuser. Patch by me, reviewed by Andres Freund, Jeff Davis, Mark Dilger, and Stephen Frost (but some of those people did not fully endorse all of the decisions that the patch makes). Discussion: http://postgr.es/m/CA+TgmoaDH=0Xj7OBiQnsHTKcF2c4L+=gzPBUKSJLh8zed2_+Dg@mail.gmail.com	2023-03-30 11:37:19 -04:00
Tom Lane	b0e9e4d76c	Avoid overflow in width_bucket_float8(). The original coding of this function paid little attention to the possibility of overflow. There were actually three different hazards: 1. The range from bound1 to bound2 could exceed DBL_MAX, which on IEEE-compliant machines produces +Infinity in the subtraction. At best we'd lose all precision in the result, and at worst produce NaN due to dividing Inf/Inf. The range can't exceed twice DBL_MAX though, so we can fix this case by scaling all the inputs by 0.5. 2. We computed count * (operand - bound1), which is also at risk of float overflow, before dividing. Safer is to do the division first, producing a quotient that should be in [0,1), and even after allowing for roundoff error can't be outside [0,1]; then multiplying by count can't produce a result overflowing an int. (width_bucket_numeric does the multiplication first on the grounds that that improves accuracy of its result, but I don't think that a similar argument can be made in float arithmetic.) 3. If the division result does round to 1, and count is INT_MAX, the final addition of 1 would overflow an int. We took care of that in the operand >= bound2 case but did not consider that it could be possible in the main path. Fix that by moving the overflow-aware addition of 1 so it is done that way in all cases. The fix for point 2 creates a possibility that values very close to a bucket boundary will be rounded differently than they were before. I'm not troubled by that for HEAD, but it is an argument against putting this into the stable branches. Given that the cases being fixed here are fairly extreme and unlikely to be hit in normal use, it seems best not to back-patch. Mats Kindahl and Tom Lane Discussion: https://postgr.es/m/17876-61f280d1601f978d@postgresql.org	2023-03-30 11:27:36 -04:00
Daniel Gustafsson	2fe7a6df94	Fix pointer cast for seed calculation on 32-bit systems The fallback seed for when pg_strong_random cannot generate a high quality seed mixes in the address of the conn object, but the cast failed to take the word size into consideration. Fix by casting to a uintptr_t instead. The seed calculation was added in `7f5b19817e`. The code as it stood generated the following warning on mamba and lapwing in the buildfarm: fe-connect.c: In function 'libpq_prng_init': fe-connect.c:1048:11: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast] 1048 \| rseed = ((uint64) conn) ^ \| ^ Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Discussion: https://postgr.es/m/TYAPR01MB58665250EDCD551CCA9AD117F58E9@TYAPR01MB5866.jpnprd01.prod.outlook.com	2023-03-30 10:53:15 +02:00
Peter Eisentraut	261cf8962b	Fix incorrect format placeholders	2023-03-30 08:33:43 +02:00
Amit Kapila	da324d6cd4	Refactor pgoutput_change(). Instead of mostly-duplicate code for different operation (insert/update/delete) types, write a common code to compute old/new tuples, and check the row filter. Author: Hou Zhijie Reviewed-by: Peter Smith, Amit Kapila Discussion: https://postgr.es/m/OS0PR01MB5716194A47FFA8D91133687D94BF9@OS0PR01MB5716.jpnprd01.prod.outlook.com	2023-03-30 11:10:38 +05:30
David Rowley	902ecd3bd4	Fix outdated comments regarding TupleTableSlots The tts_flag is named TTS_FLAG_SHOULDFREE, so use that instead of TTS_SHOULDFREE, which is the name of the macro that checks for that flag. Additionally, `4da597edf` got rid of the TupleTableSlot.tts_tuple field but forgot to update a comment which referenced that field. Fix that. Reported-by: Zhen Mingyang <zhenmingyang@yeah.net> Reported-by: Richard Guo <guofenglinux@gmail.com> Discussion: https://postgr.es/m/1a96696c.9d3.187193989c3.Coremail.zhenmingyang@yeah.net	2023-03-30 16:37:03 +13:00
Daniel Gustafsson	7f5b19817e	Support connection load balancing in libpq This adds support for load balancing connections with libpq using a connection parameter: load_balance_hosts=<string>. When setting the param to random, hosts and addresses will be connected to in random order. This then results in load balancing across these addresses and hosts when multiple clients or frequent connection setups are used. The randomization employed performs two levels of shuffling: 1. The given hosts are randomly shuffled, before resolving them one-by-one. 2. Once a host its addresses get resolved, the returned addresses are shuffled, before trying to connect to them one-by-one. Author: Jelte Fennema <postgres@jeltef.nl> Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Reviewed-by: Michael Banck <mbanck@gmx.net> Reviewed-by: Andrey Borodin <amborodin86@gmail.com> Discussion: https://postgr.es/m/PR3PR83MB04768E2FF04818EEB2179949F7A69@PR3PR83MB0476.EURPRD83.prod.outlook.	2023-03-29 21:53:38 +02:00
Daniel Gustafsson	44d85ba5a3	Copy and store addrinfo in libpq-owned private memory This refactors libpq to copy addrinfos returned by getaddrinfo to memory owned by libpq such that future improvements can alter for example the order of entries. As a nice side effect of this refactor the mechanism for iteration over addresses in PQconnectPoll is now identical to its iteration over hosts. Author: Jelte Fennema <postgres@jeltef.nl> Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Reviewed-by: Michael Banck <mbanck@gmx.net> Reviewed-by: Andrey Borodin <amborodin86@gmail.com> Discussion: https://postgr.es/m/PR3PR83MB04768E2FF04818EEB2179949F7A69@PR3PR83MB0476.EURPRD83.prod.outlook.com	2023-03-29 21:41:27 +02:00
Tom Lane	8e5eef50c5	Fix dereference of dangling pointer in GiST index buffering build. gistBuildCallback tried to fetch the size of an index tuple that might have already been freed by gistProcessEmptyingQueue. While this seems to usually be harmless in production builds, in principle it could result in a SIGSEGV, or more likely a bogus value for indtuplesSize leading to poor page-split decisions later in the build. The memory management here is confusing and could stand to be refactored, but for the moment it seems to be enough to fetch the tuple size sooner. AFAICT the indtuples[Size] totals aren't used in between these places; even if they were, the updated values shouldn't be any worse to use. So just move the incrementing of the totals up. It's not very clear why our valgrind-using buildfarm animals haven't noticed this problem, because the relevant code path does seem to be exercised according to the code coverage report. I think the reason that we didn't fix this bug after the first report is that I'd wanted to try to understand that better. However, now that it's been re-discovered let's just be pragmatic and fix it already. Original report by Alexander Lakhin (bug #16329), later rediscovered by Egor Chindyaskin (bug #17874). Patch by Alexander Lakhin (commentary by Pavel Borisov and me). Back-patch to all supported branches. Discussion: https://postgr.es/m/16329-7a6aa9b6fa1118a1@postgresql.org Discussion: https://postgr.es/m/17874-63ca6c7ce42d2103@postgresql.org	2023-03-29 11:31:30 -04:00
Tom Lane	3aa961378b	Add missing .gitignore entries. Oversight in commit `7081ac46ac`.	2023-03-29 09:16:53 -04:00
Tom Lane	58c9600a9f	Remove empty function BufmgrCommit(). This function has been a no-op for over a decade. Even if bufmgr regains a need to be called during commit, it seems unlikely that the most appropriate call points would be precisely here, so it's not doing us much good as a placeholder either. Now, removing it probably doesn't save any noticeable number of cycles --- but the main call is inside the commit critical section, and the less work done there the better. Matthias van de Meent Discussion: https://postgr.es/m/CAEze2Wi1=tLKbxZnXzcD+8fYKyKqBtivVakLQC_mYBsP4Y8qVA@mail.gmail.com	2023-03-29 09:13:57 -04:00
Alvaro Herrera	7081ac46ac	SQL/JSON: add standard JSON constructor functions This commit introduces the SQL/JSON standard-conforming constructors for JSON types: JSON_ARRAY() JSON_ARRAYAGG() JSON_OBJECT() JSON_OBJECTAGG() Most of the functionality was already present in PostgreSQL-specific functions, but these include some new functionality such as the ability to skip or include NULL values, and to allow duplicate keys or throw error when they are found, as well as the standard specified syntax to specify output type and format. Author: Nikita Glukhov <n.gluhov@postgrespro.ru> Author: Teodor Sigaev <teodor@sigaev.ru> Author: Oleg Bartunov <obartunov@gmail.com> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Amit Langote <amitlangote09@gmail.com> Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby. Discussion: https://postgr.es/m/CAF4Au4w2x-5LTnN_bxky-mq4=WOqsGsxSpENCzHRAzSnEd8+WQ@mail.gmail.com Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org	2023-03-29 12:11:36 +02:00
Peter Eisentraut	38b7437b90	Fix some section numbers in information_schema.sql Some of the section numbers that appeared multiple times were not updated completely by previous changes `d61d9aa750` and `eb3a1376c9`.	2023-03-29 11:34:37 +02:00
Peter Eisentraut	563f21cda8	Move definition of standard collations from initdb to pg_collation.dat The standard collations "ucs_basic" and "unicode" were defined in initdb, even though pg_collation.dat seems like the correct place for them. It seems this was just forgotten during various reorganizations of initdb and pg_collation.dat/.h over time. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/08b58ecd-0d50-9395-ed51-dc8294e3fd2b%40enterprisedb.com	2023-03-29 09:45:21 +02:00
Peter Eisentraut	0d15afc875	Simplify useless 0L constants In ancient times, these belonged to arguments or fields that were actually of type long, but now they are not anymore, so this "L" decoration is just confusing. (Some other 0L and other "L" constants remain, where they are actually associated with a long type.)	2023-03-29 08:25:12 +02:00
Amit Kapila	062a844424	Avoid syncing data twice for the 'publish_via_partition_root' option. When there are multiple publications for a subscription and one of those publishes via the parent table by using publish_via_partition_root and the other one directly publishes the child table, we end up copying the same data twice during initial synchronization. The reason for this was that we get both the parent and child tables from the publisher and try to copy the data for both of them. This patch extends the function pg_get_publication_tables() to take a publication list as its input parameter. This allows us to exclude a partition table whose ancestor is published by the same publication list. This problem does exist in back-branches but we decide to fix it there in a separate commit if required. The fix for back-branches requires quite complicated changes to fetch the required table information from the publisher as we can't update the function pg_get_publication_tables() in back-branches. We are not sure whether we want to deviate and complicate the code in back-branches for this problem as there are no field reports yet. Author: Wang wei Reviewed-by: Peter Smith, Jacob Champion, Kuroda Hayato, Vignesh C, Osumi Takamichi, Amit Kapila Discussion: https://postgr.es/m/OS0PR01MB57167F45D481F78CDC5986F794B99@OS0PR01MB5716.jpnprd01.prod.outlook.com	2023-03-29 10:46:58 +05:30
Tomas Vondra	00d9dcf5be	pg_dump: Fix gzip compression of empty data The pg_dump Compressor API has three basic callbacks - Allocate, Write and End. The gzip implementation (since `e9960732a`) wrongly assumed the Write function would always be called, and deferred the initialization of the internal compression system until the first such call. But when there's no data to compress (e.g. for empty LO), this would result in not finalizing the compression state (because it was not actually initialized), producing invalid dump. Fixed by initializing the internal compression system in the Allocate call, whenever the caller provides the Write. For decompression the state is not needed, so we leave the private_data member unpopulated. Introduces a pg_dump TAP test compressing an empty large object. This also rearranges the functions to their original order, to make diffs against older code simpler to understand. Finally, replace an unreachable pg_fatal() with a simple assert check. Reported-by: Justin Pryzby Author: Justin Pryzby, Georgios Kokolatos Reviewed-by: Georgios Kokolatos, Tomas Vondra https://postgr.es/m/20230228235834.GC30529%40telsasoft.com	2023-03-29 02:34:48 +02:00
Jeff Davis	1671f990dd	Validate ICU locales. For ICU collations, ensure that the locale's language exists in ICU, and that the locale can be opened. Basic validation helps avoid minor mistakes and misspellings, which often fall back to the root locale instead of the intended locale. It's even more important to avoid such mistakes in ICU versions 54 and earlier, where the same (misspelled) locale string could fall back to different locales depending on the environment. Discussion: https://postgr.es/m/11b1eeb7e7667fdd4178497aeb796c48d26e69b9.camel@j-davis.com Discussion: https://postgr.es/m/df2efad0cae7c65180df8e5ebb709e5eb4f2a82b.camel@j-davis.com Reviewed-by: Peter Eisentraut	2023-03-28 16:34:29 -07:00
Tom Lane	326a33a289	Fix corner-case planner failure for MERGE. MERGE planning could fail with "variable not found in subplan target list" if the target table is partitioned and all its partitions are excluded at plan time, or in the case where it has no partitions but used to have some. This happened because distribute_row_identity_vars thought it didn't need to make the target table's reltarget list fully valid; but if we generate a join plan then that is required because the dummy Result node's tlist will be made from the reltarget. The same logic appears in distribute_row_identity_vars in v14, but AFAICS the problem is unreachable in that branch for lack of MERGE. In other updating statements, the target table is always inner-joined to any other tables, so if the target is known dummy then the whole plan reduces to dummy, so no join nodes are created. So I'll refrain from back-patching this code change to v14 for now. Per report from Alvaro Herrera. Discussion: https://postgr.es/m/20230328112248.6as34mlx5sr4kltg@alvherre.pgsql	2023-03-28 11:39:24 -04:00
Jeff Davis	c1f1c1f87f	initdb: emit message when using default ICU locale. Helpful to determine from test logs whether the locale came from the environment or a command-line option. Discussion: https://postgr.es/m/04182066-7655-344a-b8b7-040b1b2490fb%40enterprisedb.com Reviewed-by: Peter Eisentraut	2023-03-28 08:24:43 -07:00
Jeff Davis	f8ca22295e	initdb: replace check_icu_locale() with default_icu_locale(). The extra checks done in check_icu_locale() are not necessary. An existing comment already pointed out that the checks would be done during post-bootstrap initialization, when the locale is opened by the backend. This was a mistake in commit `27b62377b4`. This commit creates a simpler function default_icu_locale() to just return the locale of the default collator. Discussion: https://postgr.es/m/04182066-7655-344a-b8b7-040b1b2490fb%40enterprisedb.com Reviewed-by: Peter Eisentraut	2023-03-28 08:24:21 -07:00
Jeff Davis	8b3eb0c584	Fix error inconsistency in older ICU versions. To support older ICU versions, we rely on icu_set_collation_attributes() to do error checking that is handled directly by ucol_open() in newer ICU versions. Commit `3b50275b12` introduced a slight inconsistency, where the error report includes the fixed-up locale string, rather than the locale string passed to pg_ucol_open(). Refactor slightly so that pg_ucol_open() handles the errors from both ucol_open() and icu_set_collation_attributes(), making it easier to see any differences between the error reports. It also makes pg_ucol_open() responsible for closing the UCollator on error, which seems like the right place. Discussion: https://postgr.es/m/04182066-7655-344a-b8b7-040b1b2490fb%40enterprisedb.com Reviewed-by: Peter Eisentraut	2023-03-28 08:24:18 -07:00
Peter Eisentraut	90189eefc1	Save a few bytes in pg_attribute Change the columns attndims, attstattarget, and attinhcount from int32 to int16, and reorder a bit. This saves some space (currently 4 bytes) in pg_attribute and tuple descriptors, which translates into small performance benefits and/or room for new columns in pg_attribute needed by future features. attndims and attinhcount are never realistically used with values larger than int16. Just to be sure, add some overflow checks. attstattarget is currently limited explicitly to 10000. For consistency, pg_constraint.coninhcount is also changed like attinhcount. Discussion: https://www.postgresql.org/message-id/flat/d07ffc2b-e0e8-77f7-38fb-be921dff71af%40enterprisedb.com	2023-03-28 10:05:56 +02:00
Michael Paquier	4efd0bf7ea	Generate a few more functions of pgstatfuncs.c with macros Two new macros are added with their respective functions switched to use them. These are for functions with millisecond stats, with and without "xact" in their names (for the stats that can be tracked within a transaction). While on it, prefix the macro for float8 on database entries with "_MS", as it does a us->ms conversion, based on a suggestion from Andres Freund. Author: Bertrand Drouvot Discussion: https://postgr.es/m/6e2efb4f-6fd0-807e-f6bf-94207db8183a@gmail.com	2023-03-28 07:35:33 +09:00
Tom Lane	a3c9d35ae1	Reject attempts to alter composite types used in indexes. find_composite_type_dependencies() ignored indexes, which is a poor decision because an expression index could have a stored column of a composite (or other container) type even when the underlying table does not. Teach it to detect such cases and error out. We have to work a bit harder than for other relations because the pg_depend entry won't identify the specific index column of concern, but it's not much new code. This does not address bug #17872's original complaint that dropping a column in such a type might lead to violations of the uniqueness property that a unique index is supposed to ensure. That seems of much less concern to me because it won't lead to crashes. Per bug #17872 from Alexander Lakhin. Back-patch to all supported branches. Discussion: https://postgr.es/m/17872-d0fbb799dc3fd85d@postgresql.org	2023-03-27 15:04:15 -04:00
Robert Haas	c87aff065c	amcheck: Generalize one of the recently-added update chain checks. Commit `bbc1376b39` checked that if a redirected line pointer pointed to a tuple, the tuple should be marked both HEAP_ONLY_TUPLE and HEAP_UPDATED. But Andres Freund pointed out that any tuple that is marked HEAP_ONLY_TUPLE should be marked HEAP_UPDATED, not just one that is the target of a redirected line pointer. Do that instead. To see why this is better, consider a redirect line pointer A which points to a heap-only tuple B which points (via CTID) to another heap-only tuple C. With the old code, we'd complain if B was not marked HEAP_UPDATED, but with this change, we'll complain if either B or C is not marked HEAP_UPDATED. (Note that, with or without this commit, if either B or C were not marked HEAP_ONLY_TUPLE, we would also complain about that.) Discussion: http://postgr.es/m/CA%2BTgmobLypZx%3DcOH%2ByY1GZmCruaoucHm77A6y_-Bo%3Dh-_3H28g%40mail.gmail.com	2023-03-27 13:37:16 -04:00
Daniel Gustafsson	b577743000	Make SCRAM iteration count configurable Replace the hardcoded value with a GUC such that the iteration count can be raised in order to increase protection against brute-force attacks. The hardcoded value for SCRAM iteration count was defined to be 4096, which is taken from RFC 7677, so set the default for the GUC to 4096 to match. In RFC 7677 the recommendation is at least 15000 iterations but 4096 is listed as a SHOULD requirement given that it's estimated to yield a 0.5s processing time on a mobile handset of the time of RFC writing (late 2015). Raising the iteration count of SCRAM will make stored passwords more resilient to brute-force attacks at a higher computational cost during connection establishment. Lowering the count will reduce computational overhead during connections at the tradeoff of reducing strength against brute-force attacks. There are however platforms where even a modest iteration count yields a too high computational overhead, with weaker password encryption schemes chosen as a result. In these situations, SCRAM with a very low iteration count still gives benefits over weaker schemes like md5, so we allow the iteration count to be set to one at the low end. The new GUC is intentionally generically named such that it can be made to support future SCRAM standards should they emerge. At that point the value can be made into key:value pairs with an undefined key as a default which will be backwards compatible with this. Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org> Discussion: https://postgr.es/m/F72E7BC7-189F-4B17-BF47-9735EB72C364@yesql.se	2023-03-27 09:46:29 +02:00
Michael Paquier	850f4b4c8c	Generate pg_stat_get_xact*() functions for relations using macros This change replaces seven functions definitions by macros. This is the same idea as `8018ffb` or `83a1a1b`, taking advantage of the variable rename done in `8089517` for relation entries. Author: Bertrand Drouvot Discussion: https://postgr.es/m/631e3084-c5d9-8463-7540-fcff4674caa5@gmail.com	2023-03-27 09:57:41 +09:00
Tom Lane	554841699f	Fix oversights in array manipulation. The nested-arrays code path in ExecEvalArrayExpr() used palloc to allocate the result array, whereas every other array-creating function has used palloc0 since `18c0b4ecc`. This mostly works, but unused bits past the end of the nulls bitmap may end up undefined. That causes valgrind complaints with -DWRITE_READ_PARSE_PLAN_TREES, and could cause planner misbehavior as cited in `18c0b4ecc`. There seems no very good reason why we should strive to avoid palloc0 in just this one case, so fix it the easy way with s/palloc/palloc0/. While looking at that I noted that we also failed to check for overflow of "nbytes" and "nitems" while summing the sizes of the sub-arrays, potentially allowing a crash due to undersized output allocation. For "nbytes", follow the policy used by other array-munging code of checking for overflow after each addition. (As elsewhere, the last addition of the array's overhead space doesn't need an extra check, since palloc itself will catch a value between 1Gb and 2Gb.) For "nitems", there's no very good reason to sum the inputs at all, since we can perfectly well use ArrayGetNItems' result instead of ignoring it. Per discussion of this bug, also remove redundant zeroing of the nulls bitmap in array_set_element and array_set_slice. Patch by Alexander Lakhin and myself, per bug #17858 from Alexander Lakhin; thanks also to Richard Guo. These bugs are a dozen years old, so back-patch to all supported branches. Discussion: https://postgr.es/m/17858-8fd287fd3663d051@postgresql.org	2023-03-26 13:41:06 -04:00
Daniel Gustafsson	d435f15fff	Add SysCacheGetAttrNotNull for guaranteed not-null attrs When extracting an attr from a cached tuple in the syscache with SysCacheGetAttr the isnull parameter must be checked in case the attr cannot be NULL. For cases when this is known beforehand, a wrapper is introduced which perform the errorhandling internally on behalf of the caller, invoking an elog in case of a NULL attr. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/AD76405E-DB45-46B6-941F-17B1EB3A9076@yesql.se	2023-03-25 22:49:33 +01:00
Noah Misch	e33967b13b	Comment on expectations for AutoVacuumWorkItem handlers. This might prevent a repeat of the brin_summarize_range() vulnerability that commit `a117cebd63` fixed.	2023-03-25 13:00:27 -07:00
Tom Lane	27f5c712b2	Fix CREATE INDEX progress reporting for multi-level partitioning. The "partitions_total" and "partitions_done" fields were updated as though the current level of partitioning was the only one. In multi-level cases, not only could partitions_total change over the course of the command, but partitions_done could go backwards or exceed the currently-reported partitions_total. Fix by setting partitions_total to the total number of direct and indirect children once at command start, and then just incrementing partitions_done at appropriate points. Invent a new progress monitoring function "pgstat_progress_incr_param" to simplify doing the latter. We can avoid adding cost for the former when doing CREATE INDEX, because ProcessUtility already enumerates the children and it's pretty easy to pass the count down to DefineIndex. In principle the same could be done in ALTER TABLE, but that's structurally difficult; for now, just eat the cost of an extra find_all_inheritors scan in that case. Ilya Gladyshev and Justin Pryzby Discussion: https://postgr.es/m/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel@gmail.com	2023-03-25 15:34:03 -04:00
Jeff Davis	81a6d57e33	Fix abbreviated keys bug introduced in `d87d548cd0`. Discussion: http://postgr.es/m/CAMkU=1z17XJatF-rMCY3Cjqcxer-Kyn57x6h3OSCpJ0LpAp0ig@mail.gmail.com Reported-by: Jeff Janes	2023-03-25 11:08:32 -07:00
Tom Lane	3c05284d83	Invent GENERIC_PLAN option for EXPLAIN. This provides a very simple way to see the generic plan for a parameterized query. Without this, it's necessary to define a prepared statement and temporarily change plan_cache_mode, which is a bit tedious. One thing that's a bit of a hack perhaps is that we disable execution-time partition pruning when the GENERIC_PLAN option is given. That's because the pruning code may attempt to fetch the value of one of the parameters, which would fail. Laurenz Albe, reviewed by Julien Rouhaud, Christoph Berg, Michel Pelletier, Jim Jones, and myself Discussion: https://postgr.es/m/0a29b954b10b57f0d135fe12aa0909bd41883eb0.camel@cybertec.at	2023-03-24 17:07:22 -04:00
Jeff Davis	a03b3b6b4a	Avoid potential UCollator leak for older ICU versions. ICU versions 53 and earlier rely on icu_set_collation_attributes() to process the attributes in the locale string. Avoid leaking the already-opened UCollator object if an error is encountered. Discussion: https://postgr.es/m/04182066-7655-344a-b8b7-040b1b2490fb%40enterprisedb.com Reviewed-by: Peter Eisentraut	2023-03-24 08:48:03 -07:00
Jeff Davis	9a24289915	pg_locale.c: change ereport() to elog(). Discussion: https://postgr.es/m/73553013-3926-0f34-0fb8-f37909fe4902@enterprisedb.com Reported-by: Peter Eisentraut	2023-03-24 08:47:42 -07:00
Daniel Gustafsson	a04761ac77	Fix typo in header comment Commit `4c04be9b0` accidentally left off the _id portion of the function name in the header comment. Author: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CAEG8a3LP+ytnAXSzR=yiEaQrde+iCybMHsuPn9n=UN3puV_1tw@mail.gmail.com	2023-03-24 09:03:31 +01:00
Peter Eisentraut	a9bc04b211	Fix incorrect format placeholders The fields of NLSVERSIONINFOEX are of type DWORD, which is unsigned long, so the results of the computations being printed are also of type unsigned long.	2023-03-24 07:21:40 +01:00
Michael Paquier	36f40ce2dc	libpq: Add sslcertmode option to control client certificates The sslcertmode option controls whether the server is allowed and/or required to request a certificate from the client. There are three modes: - "allow" is the default and follows the current behavior, where a configured client certificate is sent if the server requests one (via one of its default locations or sslcert). With the current implementation, will happen whenever TLS is negotiated. - "disable" causes the client to refuse to send a client certificate even if sslcert is configured or if a client certificate is available in one of its default locations. - "require" causes the client to fail if a client certificate is never sent and the server opens a connection anyway. This doesn't add any additional security, since there is no guarantee that the server is validating the certificate correctly, but it may helpful to troubleshoot more complicated TLS setups. sslcertmode=require requires SSL_CTX_set_cert_cb(), available since OpenSSL 1.0.2. Note that LibreSSL does not include it. Using a connection parameter different than require_auth has come up as the simplest design because certificate authentication does not rely directly on any of the AUTH_REQ_* codes, and one may want to require a certificate to be sent in combination of a given authentication method, like SCRAM-SHA-256. TAP tests are added in src/test/ssl/, some of them relying on sslinfo to check if a certificate has been set. These are compatible across all the versions of OpenSSL supported on HEAD (currently down to 1.0.1). Author: Jacob Champion Reviewed-by: Aleksander Alekseev, Peter Eisentraut, David G. Johnston, Michael Paquier Discussion: https://postgr.es/m/9e5a8ccddb8355ea9fa4b75a1e3a9edc88a70cd3.camel@vmware.com	2023-03-24 13:34:26 +09:00
Andres Freund	e522049f23	meson: add install-{quiet, world} targets To define our own install target, we need dependencies on the i18n targets, which we did not collect so far. Discussion: https://postgr.es/m/3fc3bb9b-f7f8-d442-35c1-ec82280c564a@enterprisedb.com	2023-03-23 21:20:18 -07:00
Andres Freund	614c5f5f52	meson: make install_test_files more generic, rename to install_files Now it supports installing directories and directory contents as well. This will be used in a subsequent patch to install documentation. Discussion: https://postgr.es/m/3fc3bb9b-f7f8-d442-35c1-ec82280c564a@enterprisedb.com	2023-03-23 21:20:18 -07:00
Michael Paquier	bcaa1fafc8	Rewrite error message related to sslmode in libpq The same error message will be used for a different option, to be introduced in a separate patch. Reshaping the error message as done here saves in translation. Extracted from a larger patch by the same author. Author: Jacob Champion Discussion: https://postgr.es/m/9e5a8ccddb8355ea9fa4b75a1e3a9edc88a70cd3.camel@vmware.com	2023-03-24 10:14:33 +09:00

1 2 3 4 5 ...

41160 Commits