Commit Graph

58306 Commits

Heikki Linnakangas 1169920ff7 Add tests for libpq gssencmode and sslmode options
Test all combinations of gssencmode, sslmode, whether the server
supports SSL and/or GSSAPI encryption, and whether they are accepted
by pg_hba.conf. This is in preparation for refactoring that code in
libpq, and for adding a new option for "direct SSL" connections, which
adds another dimension to the logic.

If we add even more options in the future, testing all combinations
will become unwieldy and we'll need to rethink this, but for now an
exhaustive test is nice.

Author: Heikki Linnakangas, Matthias van de Meent
Reviewed-by: Jacob Champion
Discussion: https://www.postgresql.org/message-id/a3af4070-3556-461d-aec8-a8d794f94894@iki.fi
2024-04-08 02:49:32 +03:00
Heikki Linnakangas 9f899562d4 Move Kerberos module
So that we can reuse it in new tests.

Discussion: https://www.postgresql.org/message-id/a3af4070-3556-461d-aec8-a8d794f94894@iki.fi
Reviewed-by: Jacob Champion, Matthias van de Meent
2024-04-08 02:49:30 +03:00
Michael Paquier 997db123c0 Make GIN test using injection points repeatable
As written, the test would fail when run repeatedly because one of the
injection points attached was not detached.  This would not matter if
the test were rewritten to be concurrency-safe, but let's be clean, as
it is good practice.

Oversight in 6a1ea02c49.

Discussion: https://postgr.es/m/ZfP7IDs9TvrKe49x@paquier.xyz
2024-04-08 08:45:04 +09:00
David Rowley 705ec05653 Fix incorrect KeeperBlock macro in bump.c
The macro was missing a MAXALIGN around the sizeof(BumpContext) which
would cause problems detecting the keeper block on 32-bit systems that
have a MAXALIGN value of 8.
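
As a rough standalone illustration (hypothetical header struct, not the
actual bump.c definitions), the keeper block sits directly after the
context header, so its offset must be rounded up to MAXALIGN:

    #include <stdint.h>
    #include <stdio.h>

    /* Simplified stand-in for MAXALIGN with an 8-byte maximum alignment. */
    #define ALIGN8(len) (((uintptr_t) (len) + 7) & ~((uintptr_t) 7))

    /* Hypothetical 12-byte context header. */
    typedef struct Header
    {
        uint32_t    a;
        uint32_t    b;
        uint32_t    c;
    } Header;

    int
    main(void)
    {
        /* Without the rounding, the keeper block would start at offset 12,
         * which is not a valid MAXALIGN'd offset on such a platform. */
        printf("raw offset %zu, aligned offset %zu\n",
               sizeof(Header), (size_t) ALIGN8(sizeof(Header)));
        return 0;
    }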

Thank you to Andres Freund, Tomas Vondra and Tom Lane for investigating
and testing.

Reported-by: Melanie Plageman, Tomas Vondra
Discussion: https://postgr.es/m/CAAKRu_Y6dZjiJEZghgNZp0Gjar1JVq-CH7XGDqExDVHnPgDjuw@mail.gmail.com
Discussion: https://postgr.es/m/a4a10b89-6ba8-4abd-b449-019aafff04fc@enterprisedb.com
2024-04-08 11:06:31 +12:00
Alexander Korotkov beabea6c20 Fix usage of same ListCell in transform_or_to_any()'s nested loops
Discussion: https://postgr.es/m/CAAKRu_b4SXNW4GAM0bv3e6wcL5ODSXg1ZdRCn6uyLLjSPbveBg%40mail.gmail.com
Author: Melanie Plageman
2024-04-08 01:38:37 +03:00
Alexander Korotkov 72bd38cc99 Transform OR clauses to ANY expression
Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...])
at the preliminary stage of optimization, when we are still working with the
expression tree.

Here Cn is the n-th constant expression, 'expr' is a non-constant
expression, and 'op' is an operator which returns a boolean result and has
a commutator (for the case of reversed order of the constant and
non-constant parts of the expression, like 'Cn op expr').

Sometimes this transformation can lead to a less optimal plan.  This is why
there is an or_to_any_transform_limit GUC.  It specifies the threshold
number of arguments in an OR expression that triggers the OR-to-ANY
transformation.  Generally, the more groupable OR arguments there are, the
more likely the transformation is to win than to lose.

Discussion: https://postgr.es/m/567ED6CA.2040504%40sigaev.ru
Author: Alena Rybakina <lena.ribackina@yandex.ru>
Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Ranier Vilela <ranier.vf@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
2024-04-08 01:27:52 +03:00
Daniel Gustafsson 75a47b6a0d Change debug printing to log filename
When restarting the cluster fails, the code introduced in 33774978c7
printed the full log contents to aid debugging.  For cases where the
logfile is large, this adds unnecessary overhead.  Reduce this to printing
the logfile path instead.

Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20240406214439.2n4zf2w7ukhf7dsy@awork3.anarazel.de
2024-04-08 00:24:20 +02:00
Tom Lane 626603d463 Doc: clarify behavior of boolean options in replication protocol commands.
Same idea as ec7e053a9, but applying to the walsender commands
described in protocol.sgml.

Peter Smith

Discussion: https://postgr.es/m/CAHut+PvwjZfdGt2R8HTXgSZft=jZKymrS8KUg31pS7zqaaWKKw@mail.gmail.com
2024-04-07 17:16:32 -04:00
Tom Lane 0c66a164e7 Remove useless duplicate call of defGetBoolean().
Seems to be a copy-and-paste error dating to dc2123400.
Noted while reviewing a related documentation patch.
2024-04-07 17:08:06 -04:00
Tom Lane 2daeba6a4e Doc: show how to get the equivalent of LIMIT for UPDATE/DELETE.
Add examples showing use of a CTE and a self-join to perform
partial UPDATEs and DELETEs.

Corey Huinker, reviewed by Laurenz Albe

Discussion: https://postgr.es/m/CADkLM=caNEQsUwPWnfi2jR4ix99E0EJM_3jtcE-YjnEQC7Rssw@mail.gmail.com
2024-04-07 16:26:47 -04:00
Tom Lane 1973d9fb31 Doc: update documentation about EXCLUDE constraint elements.
What the documentation calls an exclude_element is an index_elem
according to gram.y, and it allows all the same options that
a CREATE INDEX column specification does.  The COLLATE patch
neglected to update the CREATE/ALTER TABLE docs about that,
and later the opclass-parameters patch made the same oversight.
Add those options to the syntax synopses, and polish the
associated text a bit.

Back-patch to v13 where opclass parameters came in.  We could
update v12 with just the COLLATE omission, but it doesn't quite
seem worth the trouble at this point.

Shihao Zhong, reviewed by Daniel Vérité, Shubham Khanna and myself

Discussion: https://postgr.es/m/CAGRkXqShbVyB8E3gapfdtuwiWTiK=Q67Qb9qwxu=+-w0w46EBA@mail.gmail.com
2024-04-07 15:36:11 -04:00
Alvaro Herrera a0e0fb1ba5 Use conditional variable to wait for next MultiXact offset
In one multixact.c edge case, we need a mechanism to wait for one
multixact offset to be written before being allowed to read the next
one.  We used to handle this case by sleeping for one millisecond and
retrying, but such sleeps have been reported as problematic in
production cases.  We can avoid the problem by using a condition
variable: readers sleep on it and then every creator of multixacts
broadcasts into the CV when creation is sufficiently far along.
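
As a rough standalone sketch of the change in waiting strategy (written
with POSIX threads rather than the server's ConditionVariable API; all
names below are illustrative only):

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  offset_written = PTHREAD_COND_INITIALIZER;
    static long            written_upto = 0;

    /* Reader: instead of "sleep 1ms and retry", sleep on the CV. */
    static void
    wait_for_offset(long wanted)
    {
        pthread_mutex_lock(&lock);
        while (written_upto < wanted)
            pthread_cond_wait(&offset_written, &lock);
        pthread_mutex_unlock(&lock);
    }

    /* Creator: whenever an offset has been written, broadcast so that
     * any waiting readers wake up and re-check their condition. */
    static void *
    creator(void *arg)
    {
        (void) arg;
        for (long n = 1; n <= 5; n++)
        {
            usleep(1000);               /* pretend to write offset n */
            pthread_mutex_lock(&lock);
            written_upto = n;
            pthread_cond_broadcast(&offset_written);
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int
    main(void)
    {
        pthread_t   tid;

        pthread_create(&tid, NULL, creator, NULL);
        wait_for_offset(5);
        printf("offset 5 has been written; safe to read the next one\n");
        pthread_join(tid, NULL);
        return 0;
    }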

Author: Kyotaro Horiguchi <horikyotajntt@gmail.com>
Reviewed-by: Andrey Borodin <amborodin@acm.org>
Discussion: https://postgr.es/m/47A598F4-B4E7-4029-8FEC-A06A6C3CB4B5@yandex-team.ru
Discussion: https://postgr.es/m/20200515.090333.24867479329066911.horikyota.ntt
2024-04-07 20:33:45 +02:00
Peter Geoghegan 473411fc51 Avoid extra lookups with nbtree array inequalities.
nbtree index scans with SAOP inequalities (but no SAOP equalities)
performed extra ORDER proc lookups for any remaining equality strategy
scan keys.  This could waste cycles, and caused assertion failures.
Keeping around a separate ORDER proc is only necessary for a scan's
non-array/non-SAOP equality scan keys when the scan has at least one
other SAOP equality strategy key (a SAOP inequality shouldn't count).

To fix, replace _bt_preprocess_array_keys_final's assertion with a test
that makes the function return early when the scan has no SAOP equality
scan keys.

Oversight in commit 1b134ca5, which enhanced nbtree ScalarArrayOp
execution.

Reported-By: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/0539d3d3-a402-0a49-ed5e-26429dffc4bd@gmail.com
2024-04-07 14:15:54 -04:00
Heikki Linnakangas a475a2fa3b Don't clobber test exit code at cleanup in LDAP/Kerberos tests
If the test script die()d before running the first test, the whole test
was interpreted as SKIPped rather than failed. The PostgreSQL::Cluster
module got this right.

Backpatch to all supported versions.

Discussion: https://www.postgresql.org/message-id/fb898a70-3a88-4629-88e9-f2375020061d@iki.fi
2024-04-07 20:21:27 +03:00
Heikki Linnakangas e13b586d7c Improve check in LDAP test to find the OpenLDAP installation
If the OpenLDAP installation directory is not found, set $setup to 0
so that the LDAP tests are skipped. The macOS checks were already
doing that, but the checks on other OSes were not. While we're at it,
improve the error message when the tests are skipped, to specify
whether the OS is supported at all, or if we just didn't find the
installation directory.

This was accidentally "working" without this fix, i.e. we were skipping
the tests if the OpenLDAP installation was not found, because of a bug
in the LdapServer test module: the END block clobbered the exit code,
so if the script die()d before running the first subtest, the whole
test script was marked as SKIPped. The next commit will fix that bug,
but we need to fix the setup code first.

These checks should probably go into configure/meson, but this is
better than nothing and allows fixing the bug in the END block.

Backpatch to all supported versions.

Discussion: https://www.postgresql.org/message-id/fb898a70-3a88-4629-88e9-f2375020061d@iki.fi
2024-04-07 20:21:21 +03:00
Thomas Munro b7b0f3f272 Use streaming I/O in sequential scans.
Instead of calling ReadBuffer() for each block, heap sequential scans
and TID range scans now use the streaming API introduced in b5a9b18cd0.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg%3DgEQ%40mail.gmail.com
2024-04-08 01:53:57 +12:00
David Rowley 6ed83d5fa5 Use bump memory context for tuplesorts
29f6a959c added a bump allocator type for efficient compact allocations.
Here we make use of this for non-bounded tuplesorts to store tuples.
This is very space-efficient when storing narrow tuples, since bump.c
does not add chunk headers.  This means we can fit more tuples in work_mem
before spilling to disk, or perform an in-memory sort touching fewer
cachelines.

Author: David Rowley
Reviewed-by: Nathan Bossart
Reviewed-by: Matthias van de Meent
Reviewed-by: Tomas Vondra
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAApHDvqGSpCU95TmM=Bp=6xjL_nLys4zdZOpfNyWBk97Xrdj2w@mail.gmail.com
2024-04-08 00:32:26 +12:00
Alvaro Herrera f3ff7bf83b Add XLogCtl->logInsertResult
This tracks the position of WAL that's been fully copied into WAL
buffers by all processes emitting WAL.  (For some reason we call that
"WAL insertion").  This is updated using atomic monotonic advance during
WaitXLogInsertionsToFinish, which is not when the insertions actually
occur, but it's the only place where we know that all the insertions
have completed.
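
A standalone sketch of the atomic monotonic advance, written with C11
atomics rather than the server's pg_atomic primitives: the shared value
only ever moves forward, and the CAS loop gives up as soon as somebody
else has advanced it further.

    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Advance *target to at least new_value, never moving it backwards.
     * Returns whatever value ended up being current. */
    static uint64_t
    monotonic_advance(_Atomic uint64_t *target, uint64_t new_value)
    {
        uint64_t    cur = atomic_load(target);

        while (cur < new_value)
        {
            /* If another process advanced it past new_value, keep theirs. */
            if (atomic_compare_exchange_weak(target, &cur, new_value))
                return new_value;
        }
        return cur;
    }

    int
    main(void)
    {
        _Atomic uint64_t insert_result = 100;

        printf("%llu\n", (unsigned long long) monotonic_advance(&insert_result, 200));
        printf("%llu\n", (unsigned long long) monotonic_advance(&insert_result, 150));
        return 0;
    }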

This value is useful in WALReadFromBuffers, which can verify that
callers don't try to read past what has been inserted.  (However, more
infrastructure is needed in order to actually use WAL after the flush
point, since it could be lost.)

The value is also useful in WaitXLogInsertionsToFinish() itself, since
we can now exit quickly when all WAL has been already inserted, without
even having to take any locks.
2024-04-07 14:06:30 +02:00
David Rowley 29f6a959cf Introduce a bump memory allocator
This introduces a bump MemoryContext type.  The bump context is best
suited for short-lived memory contexts which require only allocations
of memory and never a pfree or repalloc, which are unsupported.

Memory palloc'd into a bump context has no chunk header.  This makes
bump a useful context type when lots of small allocations need to be
done without any need to pfree those allocations.  Allocation sizes are
rounded up to the next MAXALIGN boundary, so with this and no chunk
header, allocations are very compact indeed.

Allocations are also very fast as bump does not check any freelists to
try and make use of previously free'd chunks.  It just checks if there
is enough room on the current block, and if so it bumps the freeptr
beyond this chunk and returns the value that the freeptr was previously
pointing to.  Simple and fast.  A new block is malloc'd when there's not
enough space in the current block.
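
As a rough standalone sketch of the technique (not the real bump.c, which
plugs into the MemoryContext machinery), the allocator is little more than
a pointer bump within the current block:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define BLOCK_SIZE   8192
    #define ALIGN8(sz)   (((sz) + 7) & ~((size_t) 7))

    typedef struct BumpBlockSketch
    {
        char       *freeptr;        /* next free byte in this block */
        char       *endptr;         /* one past the end of the block */
        char        data[];
    } BumpBlockSketch;

    static BumpBlockSketch *
    bump_block_new(void)
    {
        BumpBlockSketch *blk = malloc(sizeof(BumpBlockSketch) + BLOCK_SIZE);

        blk->freeptr = blk->data;
        blk->endptr = blk->data + BLOCK_SIZE;
        return blk;
    }

    /* Allocate size bytes with no chunk header; returns NULL when the block
     * is full (a real implementation would malloc a new block here). */
    static void *
    bump_alloc(BumpBlockSketch *blk, size_t size)
    {
        size_t      chunk = ALIGN8(size);
        void       *ret;

        if ((size_t) (blk->endptr - blk->freeptr) < chunk)
            return NULL;
        ret = blk->freeptr;
        blk->freeptr += chunk;      /* the entire "bookkeeping" */
        return ret;
    }

    int
    main(void)
    {
        BumpBlockSketch *blk = bump_block_new();
        char       *s = bump_alloc(blk, 6);

        memcpy(s, "hello", 6);
        printf("%s (block now has %td bytes used)\n", s, blk->freeptr - blk->data);
        free(blk);
        return 0;
    }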

Code using the bump allocator must take care never to call any functions
which could try to call realloc() (or variants), pfree(),
GetMemoryChunkContext() or GetMemoryChunkSpace() on a bump allocated
chunk.  Due to lack of chunk headers, these operations are unsupported.
To increase the chances of catching such issues, when compiled with
MEMORY_CONTEXT_CHECKING, bump allocated chunks are given a header and
any attempt to perform an unsupported operation will result in an ERROR.
Without MEMORY_CONTEXT_CHECKING, code attempting an unsupported
operation could result in a segfault.

A follow-on commit will implement the first user of bump.

Author: David Rowley
Reviewed-by: Nathan Bossart
Reviewed-by: Matthias van de Meent
Reviewed-by: Tomas Vondra
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAApHDvqGSpCU95TmM=Bp=6xjL_nLys4zdZOpfNyWBk97Xrdj2w@mail.gmail.com
2024-04-08 00:02:43 +12:00
David Rowley 0ba8b75e7e Enlarge bit-space for MemoryContextMethodID
Reserve 4 bits for MemoryContextMethodID rather than 3.  3 bits did
technically allow a maximum of 8 memory context types; however, we've
opted to reserve some bit patterns which left us with only 4 slots, all
of which were used.

Here we add another bit which frees up 8 slots for future memory context
types.
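
The encoding trick, sketched with made-up values rather than the real
chunk header layout: the method ID occupies a few low-order bits of a
header word, so going from 3 to 4 bits doubles the number of representable
context types.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical layout: low 4 bits = method ID, remaining bits = size. */
    #define METHODID_BITS  4
    #define METHODID_MASK  ((UINT64_C(1) << METHODID_BITS) - 1)

    static uint64_t
    encode_header(uint64_t size, unsigned method_id)
    {
        return (size << METHODID_BITS) | (method_id & METHODID_MASK);
    }

    int
    main(void)
    {
        uint64_t    hdr = encode_header(128, 9);    /* ID 9 needs the 4th bit */

        printf("method id = %u, size = %llu\n",
               (unsigned) (hdr & METHODID_MASK),
               (unsigned long long) (hdr >> METHODID_BITS));
        return 0;
    }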

In passing, adjust the enum names in MemoryContextMethodID to make it
more clear which ones can be used and which ones are reserved.

Author: Matthias van de Meent, David Rowley
Discussion: https://postgr.es/m/CAApHDvqGSpCU95TmM=Bp=6xjL_nLys4zdZOpfNyWBk97Xrdj2w@mail.gmail.com
2024-04-07 23:32:00 +12:00
David Rowley c4ab7da606 Avoid needless large memcpys in libpq socket writing
Until now, when calling pq_putmessage to write new data to a libpq
socket, all writes are copied into a buffer and that buffer gets flushed
when full to avoid having to perform small writes to the socket.

There are cases where we must write large amounts of data to the socket,
sometimes larger than the size of the buffer.  In this case, it's
wasteful to memcpy this data into the buffer and flush it out; instead,
we can send it directly from the memory location where the data is already
stored.

Here we adjust internal_putbytes() so that after having just flushed the
buffer to the socket, if the number of remaining bytes to send is as big
as or bigger than the buffer size, we send directly rather than needlessly
copying into the PqSendBuffer buffer first.
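
A standalone sketch of that decision (plain write(2) in place of the real
secure_write()/PqSendBuffer machinery): once the buffer has been flushed,
payloads at least as large as the buffer bypass the memcpy entirely.

    #include <string.h>
    #include <unistd.h>

    #define SEND_BUF_SIZE 8192

    static char   send_buf[SEND_BUF_SIZE];
    static size_t send_len = 0;

    static int
    flush_buf(int sock)
    {
        size_t      off = 0;

        while (off < send_len)
        {
            ssize_t     n = write(sock, send_buf + off, send_len - off);

            if (n < 0)
                return -1;
            off += (size_t) n;
        }
        send_len = 0;
        return 0;
    }

    /* Queue bytes for sending, flushing when the buffer fills up. */
    static int
    put_bytes(int sock, const char *data, size_t len)
    {
        while (len > 0)
        {
            if (send_len == SEND_BUF_SIZE && flush_buf(sock) < 0)
                return -1;

            /* Buffer is empty and the payload is at least buffer-sized:
             * write it straight from the caller's memory, no memcpy. */
            if (send_len == 0 && len >= SEND_BUF_SIZE)
            {
                ssize_t     n = write(sock, data, len);

                if (n < 0)
                    return -1;
                data += n;
                len -= (size_t) n;
                continue;
            }

            size_t      avail = SEND_BUF_SIZE - send_len;
            size_t      chunk = len < avail ? len : avail;

            memcpy(send_buf + send_len, data, chunk);
            send_len += chunk;
            data += chunk;
            len -= chunk;
        }
        return 0;
    }

    int
    main(void)
    {
        const char  msg[] = "hello\n";

        put_bytes(STDOUT_FILENO, msg, sizeof(msg) - 1);
        flush_buf(STDOUT_FILENO);
        return 0;
    }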

Examples of operations that write large amounts of data in one message
are: outputting large tuples with SELECT or COPY TO STDOUT, and
pg_basebackup.

Author: Melih Mutlu
Reviewed-by: Heikki Linnakangas
Reviewed-by: Jelte Fennema-Nio
Reviewed-by: David Rowley
Reviewed-by: Ranier Vilela
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/CAGPVpCR15nosj0f6xe-c2h477zFR88q12e6WjEoEZc8ZYkTh3Q@mail.gmail.com
2024-04-07 21:20:18 +12:00
Andres Freund a97bbe1f1d Reduce branches in heapgetpage()'s per-tuple loop
Until now, heapgetpage()'s loop over all tuples performed some conditional
checks for each tuple, even though the conditions did not change across
the loop.

This commit fixes that by moving the loop into an inline function. By calling
it with different constant arguments, the compiler can generate an optimized
loop for the different conditions, at the price of two per-page checks.
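
A standalone sketch of the specialization pattern with made-up names: the
per-tuple loop lives in an inline helper whose invariant conditions are
parameters, and each call site passes constants so the compiler can emit a
branch-free loop per combination (the real code uses
pg_attribute_always_inline; plain inline here).

    #include <stdbool.h>
    #include <stdio.h>

    static inline int
    count_visible(const int *xids, int ntuples, int snapshot_xmin,
                  bool all_visible, bool check_serializable)
    {
        int         nvis = 0;

        for (int i = 0; i < ntuples; i++)
        {
            bool        vis = all_visible || xids[i] < snapshot_xmin;

            if (check_serializable && vis)
            {
                /* per-tuple predicate-lock bookkeeping would go here */
            }
            nvis += vis;
        }
        return nvis;
    }

    int
    main(void)
    {
        int         xids[] = {10, 20, 30, 40};

        /* Two constant-argument calls; each gets its own optimized loop. */
        printf("%d\n", count_visible(xids, 4, 0, true, false));
        printf("%d\n", count_visible(xids, 4, 25, false, true));
        return 0;
    }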

For cases of all-visible tables and an isolation level other than
serializable, speedups of up to 25% have been measured.

Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Zhang Mingli <zmlpostgres@gmail.com>
Tested-by: Quan Zongliang <quanzongliang@yeah.net>
Discussion: https://postgr.es/m/20230716015656.xjvemfbp5fysjiea@awork3.anarazel.de
Discussion: https://postgr.es/m/2ef7ff1b-3d18-2283-61b1-bbd25fc6c7ce@yeah.net
2024-04-06 23:52:26 -07:00
Nathan Bossart 41c51f0c68 Optimize visibilitymap_count() with AVX-512 instructions.
Commit 792752af4e added infrastructure for using AVX-512 intrinsic
functions, and this commit uses that infrastructure to optimize
visibilitymap_count().  Specifically, a new pg_popcount_masked()
function is introduced that applies a bitmask to every byte in the
buffer prior to calculating the population count, which is used to
filter out the all-visible or all-frozen bits as needed.  Platforms
without AVX-512 support should also see a nice speedup due to the
reduced number of calls to a function pointer.
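
A portable standalone sketch of the masked-popcount idea (the bit layout
below is hypothetical, and the real function also has the AVX-512 path):
the same one-byte mask is applied to every byte before counting bits.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Count the bits that remain set after ANDing every byte with mask. */
    static uint64_t
    popcount_masked(const unsigned char *buf, size_t len, unsigned char mask)
    {
        uint64_t    cnt = 0;
        uint64_t    mask64 = (uint64_t) mask * UINT64_C(0x0101010101010101);

        for (; len >= sizeof(uint64_t);
             len -= sizeof(uint64_t), buf += sizeof(uint64_t))
        {
            uint64_t    word;

            memcpy(&word, buf, sizeof(word));
            cnt += (uint64_t) __builtin_popcountll(word & mask64);
        }
        for (; len > 0; len--, buf++)
            cnt += (uint64_t) __builtin_popcount((unsigned) (*buf & mask));
        return cnt;
    }

    int
    main(void)
    {
        /* Each visibility map byte covers several heap pages; masking with a
         * per-byte pattern (hypothetical layout) keeps one bit per page. */
        unsigned char vm[16] = {0xFF, 0x05, 0x50, 0x00};

        printf("%llu\n", (unsigned long long) popcount_masked(vm, sizeof(vm), 0x55));
        return 0;
    }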

Co-authored-by: Ants Aasma
Discussion: https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com
2024-04-06 22:58:23 -05:00
Nathan Bossart 792752af4e Optimize pg_popcount() with AVX-512 instructions.
Presently, pg_popcount() processes data in 32-bit or 64-bit chunks
when possible.  Newer hardware that supports AVX-512 instructions
can use 512-bit chunks, which provides a nice speedup, especially
for larger buffers.  This commit introduces the infrastructure
required to detect compiler and CPU support for the required
AVX-512 intrinsic functions, and it adds a new pg_popcount()
implementation that uses these functions.  If CPU support for this
optimized implementation is detected at runtime, a function pointer
is updated so that it is used by subsequent calls to pg_popcount().
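
A standalone sketch of the function-pointer dispatch (the CPU probe below
is a placeholder; the real code checks compiler and CPU support at
runtime): the first call goes through a chooser that repoints the pointer,
and subsequent calls pay only the indirect call.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    static uint64_t popcount_choose(const char *buf, size_t len);

    /* All callers go through this pointer; it starts at the chooser. */
    static uint64_t (*popcount_impl)(const char *buf, size_t len) =
        popcount_choose;

    static uint64_t
    popcount_portable(const char *buf, size_t len)
    {
        uint64_t    cnt = 0;

        for (size_t i = 0; i < len; i++)
            cnt += (uint64_t) __builtin_popcount((unsigned char) buf[i]);
        return cnt;
    }

    /* Placeholder for the real CPU feature probing. */
    static bool
    cpu_has_avx512_popcount(void)
    {
        return false;
    }

    /* Runs once: pick an implementation, then call it. */
    static uint64_t
    popcount_choose(const char *buf, size_t len)
    {
        popcount_impl = popcount_portable;
        if (cpu_has_avx512_popcount())
        {
            /* popcount_impl = popcount_avx512;  (the optimized variant) */
        }
        return popcount_impl(buf, len);
    }

    int
    main(void)
    {
        const char  buf[] = {0x0f, (char) 0xff};

        printf("%llu\n", (unsigned long long) popcount_impl(buf, sizeof(buf)));
        return 0;
    }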

Most of the existing in-tree calls to pg_popcount() should benefit
from these instructions, and calls with smaller buffers should at
least not regress compared to v16.  The new infrastructure
introduced by this commit can also be used to optimize
visibilitymap_count(), but that is left for a follow-up commit.

Co-authored-by: Paul Amonson, Ants Aasma
Reviewed-by: Matthias van de Meent, Tom Lane, Noah Misch, Akash Shankaran, Alvaro Herrera, Andres Freund, David Rowley
Discussion: https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com
2024-04-06 21:56:23 -05:00
Thomas Munro 158f581923 Fix if/while thinko in read_stream.c edge case.
When we determine that a wanted block can't be combined with the current
pending read, it's time to start that read to get it out of the way.  An
"if" in that code path should have been a "while", because it might take
more than one go in case of partial reads.  This was only broken for
smaller ranges, as the more common case of io_combine_limit-sized ranges
is handled earlier in the code and knows how to loop, hiding the bug for
a while.

Discovered while testing large parallel sequential scans of partially
cached tables.  The ramp-up-and-down block allocator for parallel scans
could hit the problem case and skip some blocks near the end that should
have been streamed.

Defect in commit b5a9b18c.

Discussion: https://postgr.es/m/CA%2BhUKG%2Bh8Whpv0YsJqjMVkjYX%2B80fTVc6oi-V%2BzxJvykLpLHYQ%40mail.gmail.com
2024-04-07 14:51:33 +12:00
Tom Lane beb012b42f Disable parallel query in psql error-with-FETCH_COUNT test.
The buildfarm members using debug_parallel_query = regress are mostly
unhappy with this test.  I guess what is happening is that rows
generated by a parallel worker are buffered, and might or might not
get to the leader before the expected error occurs.  We did not see
any variability in the old version of this test because each FETCH
would succeed or fail atomically, leading to a predictable number of
rows emitted before failure.  I don't find this to be a bug, just
unspecified behavior, so let's disable parallel query for this one
test case to make the results stable.
2024-04-06 21:49:24 -04:00
Tom Lane 90f5178211 Re-implement psql's FETCH_COUNT feature atop libpq's chunked mode.
Formerly this was done with a cursor, which is problematic since
not all result-set-returning query types can be put into a cursor.
The new implementation is better integrated into other psql
features, too.

Daniel Vérité, reviewed by Laurenz Albe and myself (and whacked
around a bit by me, so any remaining bugs are my fault)

Discussion: https://postgr.es/m/CAKZiRmxsVTkO928CM+-ADvsMyePmU3L9DQCa9NwqjvLPcEe5QA@mail.gmail.com
2024-04-06 20:45:11 -04:00
Tom Lane 4643a2b265 Support retrieval of results in chunks with libpq.
This patch generalizes libpq's existing single-row mode to allow
individual partial-result PGresults to contain up to N rows, rather
than always one row.  This reduces malloc overhead compared to plain
single-row mode, and it is very useful for psql's FETCH_COUNT feature,
since otherwise we'd have to add code (and cycles) to either merge
single-row PGresults into a bigger one or teach psql's
results-printing logic to accept arrays of PGresults.

To avoid API breakage, PQsetSingleRowMode() remains the same, and we
add a new function PQsetChunkedRowsMode() to invoke the more general
case.  Also, PGresults obtained the old way continue to carry the
PGRES_SINGLE_TUPLE status code, while if PQsetChunkedRowsMode() is
used then their status code is PGRES_TUPLES_CHUNK.  The underlying
logic is the same either way, though.
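
A sketch of how a client might consume chunked results; PQsetChunkedRowsMode()
and PGRES_TUPLES_CHUNK come from this commit, while the exact signature (an
integer chunk size) and the final zero-row PGRES_TUPLES_OK result are
assumed here by analogy with single-row mode.

    #include <stdio.h>
    #include <libpq-fe.h>

    static void
    fetch_in_chunks(PGconn *conn)
    {
        if (!PQsendQuery(conn, "SELECT generate_series(1, 1000000)"))
            return;

        /* Assumed signature: hand results back at most 1000 rows at a time;
         * like single-row mode, this must be set before the first PQgetResult. */
        if (!PQsetChunkedRowsMode(conn, 1000))
            return;

        for (PGresult *res; (res = PQgetResult(conn)) != NULL; PQclear(res))
        {
            ExecStatusType status = PQresultStatus(res);

            if (status == PGRES_TUPLES_CHUNK)
                printf("got a chunk of %d rows\n", PQntuples(res));
            else if (status == PGRES_TUPLES_OK)
                printf("query complete\n");     /* final zero-row result */
            else
                fprintf(stderr, "%s", PQerrorMessage(conn));
        }
    }

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("");

        if (PQstatus(conn) == CONNECTION_OK)
            fetch_in_chunks(conn);
        PQfinish(conn);
        return 0;
    }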

Daniel Vérité, reviewed by Laurenz Albe and myself (and whacked
around a bit by me, so any remaining bugs are my fault)

Discussion: https://postgr.es/m/CAKZiRmxsVTkO928CM+-ADvsMyePmU3L9DQCa9NwqjvLPcEe5QA@mail.gmail.com
2024-04-06 20:45:11 -04:00
Tomas Vondra 92641d8d65 Change BitmapAdjustPrefetchIterator to accept BlockNumber
BitmapAdjustPrefetchIterator() only used the blockno member of the
passed in TBMIterateResult to ensure that the prefetch iterator and
regular iterator stay in sync. Pass it the BlockNumber only, so that we
can move away from using the TBMIterateResult outside of table AM
specific code.

Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
2024-04-07 01:25:15 +02:00
Tomas Vondra 1fdb0ce9b1 BitmapHeapScan: Use correct recheck flag for skip_fetch
As of 7c70996ebf, BitmapPrefetch() used the recheck flag for
the current block to determine whether or not it should skip prefetching
the proposed prefetch block. As explained in the comment, this assumed
the index AM will report the same recheck value for the future page as
it did for the current page - but there's no guarantee.

This only affects prefetching - if the recheck flag changes, we may
prefetch blocks unnecessarily and not prefetch blocks that will be
needed. But we don't need to rely on that assumption - we know the
recheck flag for the block we're considering prefetching, so we can
use that.

The impact is very limited in practice - the opclass would need to
assign different recheck flags to different blocks, but none of the
built-in opclasses seems to do that.

Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Tom Lane
Discussion: https://postgr.es/m/1939305.1712415547%40sss.pgh.pa.us
2024-04-07 00:51:03 +02:00
Tomas Vondra 04e72ed617 BitmapHeapScan: Push skip_fetch optimization into table AM
Commit 7c70996ebf introduced an optimization to allow bitmap
scans to operate like index-only scans by not fetching a block from the
heap if none of the underlying data is needed and the block is marked
all visible in the visibility map.

With the introduction of table AMs, a FIXME was added to this code
indicating that the skip_fetch logic should be pushed into the table
AM-specific code, as not all table AMs may use a visibility map in the
same way.

This commit resolves this FIXME for the current block. The layering
violation is still present in BitmapHeapScans's prefetching code, which
uses the visibility map to decide whether or not to prefetch a block.
However, this can be addressed independently.

Author: Melanie Plageman
Reviewed-by: Andres Freund, Heikki Linnakangas, Tomas Vondra, Mark Dilger
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
2024-04-07 00:24:14 +02:00
Alexander Korotkov 87c21bb941 Implement ALTER TABLE ... SPLIT PARTITION ... command
This new DDL command splits a single partition into several partitions.
Just like the ALTER TABLE ... MERGE PARTITIONS ... command, new partitions
are created using the createPartitionTable() function with the parent
partition as the template.

This commit comprises a quite naive implementation which works in a single
process and holds an ACCESS EXCLUSIVE lock on the parent table during all
the operations, including the tuple routing.  This is why this new DDL
command can't be recommended for large partitioned tables under high load.
However, this implementation comes in handy in certain cases even as is.
Also, it could be used as a foundation for future implementations with
less locking and possibly parallelism.

Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru
Author: Dmitry Koval
Reviewed-by: Matthias van de Meent, Laurenz Albe, Zhihong Yu, Justin Pryzby
Reviewed-by: Alvaro Herrera, Robert Haas, Stephane Tachoires
2024-04-07 01:18:44 +03:00
Alexander Korotkov 1adf16b8fb Implement ALTER TABLE ... MERGE PARTITIONS ... command
This new DDL command merges several partitions into one partition of the
target table.  The target partition is created using the new
createPartitionTable() function with the parent partition as the template.

This commit comprises a quite naive implementation which works in a single
process and holds an ACCESS EXCLUSIVE lock on the parent table during all
the operations, including the tuple routing.  This is why this new DDL
command can't be recommended for large partitioned tables under high load.
However, this implementation comes in handy in certain cases even as is.
Also, it could be used as a foundation for future implementations with
less locking and possibly parallelism.

Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru
Author: Dmitry Koval
Reviewed-by: Matthias van de Meent, Laurenz Albe, Zhihong Yu, Justin Pryzby
Reviewed-by: Alvaro Herrera, Robert Haas, Stephane Tachoires
2024-04-07 01:18:43 +03:00
Tomas Vondra fe1431e39c BitmapHeapScan: postpone setting can_skip_fetch
Set BitmapHeapScanState->can_skip_fetch in BitmapHeapNext() instead of
in ExecInitBitmapHeapScan(). This is a preliminary step to pushing the
skip fetch optimization into heap AM code.

Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
2024-04-06 23:56:49 +02:00
Alexander Korotkov 74eaf66f98 Call WaitLSNCleanup() in AbortTransaction()
Even though waiting for a replay LSN happens without an explicit
transaction, AbortTransaction() is responsible for the cleanup of the
shared memory if an error is thrown in a stored procedure.  So, we need to
do WaitLSNCleanup() there to clean up after an unexpected error happens
while waiting for a replay LSN.

Discussion: https://postgr.es/m/202404051815.eri4u5q6oj26%40alvherre.pgsql
Author: Alvaro Herrera
2024-04-07 00:49:53 +03:00
Alexander Korotkov ee79928441 Clarify what is protected by WaitLSNLock
Not just WaitLSNState.waitersHeap, but also WaitLSNState.procInfos and
the updating of WaitLSNState.minWaitedLSN are protected by WaitLSNLock.
There is one exception, now documented: the fast-path check of the
WaitLSNProcInfo.inHeap flag.

Discussion: https://postgr.es/m/202404030658.hhj3vfxeyhft%40alvherre.pgsql
2024-04-07 00:49:53 +03:00
Alexander Korotkov 25f42429e2 Use an LWLock instead of a spinlock in waitlsn.c
This should prevent busy-waiting when the number of waiting processes is high.

Discussion: https://postgr.es/m/202404030658.hhj3vfxeyhft%40alvherre.pgsql
Author: Alvaro Herrera
2024-04-07 00:49:53 +03:00
Tomas Vondra 1577081e96 BitmapHeapScan: begin scan after bitmap creation
Change the order so that the table scan is initialized only after
initializing the index scan and building the bitmap.

This is mostly a cosmetic change for now, but later commits will need
to pass parameters to table_beginscan_bm() that are unavailable in
ExecInitBitmapHeapScan().

Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
2024-04-06 22:58:04 +02:00
Noah Misch 06558f4952 Backport IPC::Run optimization to src/test/perl.
This one-liner makes the TAP portion of "make check-world" 7% faster on
a non-Windows machine.

Discussion: https://postgr.es/m/20240331050310.09@rfd.leadboat.com
2024-04-06 09:27:55 -07:00
Peter Geoghegan 5bf748b86b Enhance nbtree ScalarArrayOp execution.
Commit 9e8da0f7 taught nbtree to handle ScalarArrayOpExpr quals
natively.  This works by pushing down the full context (the array keys)
to the nbtree index AM, enabling it to execute multiple primitive index
scans that the planner treats as one continuous index scan/index path.
This earlier enhancement enabled nbtree ScalarArrayOp index-only scans.
It also allowed scans with ScalarArrayOp quals to return ordered results
(with some notable restrictions, described further down).

Take this general approach a lot further: teach nbtree SAOP index scans
to decide how to execute ScalarArrayOp scans (when and where to start
the next primitive index scan) based on physical index characteristics.
This can be far more efficient.  All SAOP scans will now reliably avoid
duplicative leaf page accesses (just like any other nbtree index scan).
SAOP scans whose array keys are naturally clustered together now require
far fewer index descents, since we'll reliably avoid starting a new
primitive scan just to get to a later offset from the same leaf page.

The scan's arrays now advance using binary searches for the array
element that best matches the next tuple's attribute value.  Required
scan key arrays (i.e. arrays from scan keys that can terminate the scan)
ratchet forward in lockstep with the index scan.  Non-required arrays
(i.e. arrays from scan keys that can only exclude non-matching tuples)
"advance" without the process ever rolling over to a higher-order array.

Naturally, only required SAOP scan keys trigger skipping over leaf pages
(non-required arrays cannot safely end or start primitive index scans).
Consequently, even index scans of a composite index with a high-order
inequality scan key (which we'll mark required) and a low-order SAOP
scan key (which we won't mark required) now avoid repeating leaf page
accesses -- that benefit isn't limited to simpler equality-only cases.
In general, all nbtree index scans now output tuples as if they were one
continuous index scan -- even scans that mix a high-order inequality
with lower-order SAOP equalities reliably output tuples in index order.
This allows us to remove a couple of special cases that were applied
when building index paths with SAOP clauses during planning.

Bugfix commit 807a40c5 taught the planner to avoid generating unsafe
path keys: path keys on a multicolumn index path, with a SAOP clause on
any attribute beyond the first/most significant attribute.  These cases
are now all safe, so we go back to generating path keys without regard
for the presence of SAOP clauses (just like with any other clause type).
Affected queries can now exploit scan output order in all the usual ways
(e.g., certain "ORDER BY ... LIMIT n" queries can now terminate early).

Also undo changes from follow-up bugfix commit a4523c5a, which taught
the planner to produce alternative index paths, with path keys, but
without low-order SAOP index quals (filter quals were used instead).
We'll no longer generate these alternative paths, since they can no
longer offer any meaningful advantages over standard index qual paths.
Affected queries thereby avoid all of the disadvantages that come from
using filter quals within index scan nodes.  They can avoid extra heap
page accesses from using filter quals to exclude non-matching tuples
(index quals will never have that problem).  They can also skip over
irrelevant sections of the index in more cases (though only when nbtree
determines that starting another primitive scan actually makes sense).

There is a theoretical risk that removing restrictions on SAOP index
paths from the planner will break compatibility with amcanorder-based
index AMs maintained as extensions.  Such an index AM could have the
same limitations around ordered SAOP scans as nbtree had up until now.
Adding a pro forma incompatibility item about the issue to the Postgres
17 release notes seems like a good idea.

Author: Peter Geoghegan <pg@bowt.ie>
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://postgr.es/m/CAH2-Wz=ksvN_sjcnD1+Bt-WtifRA5ok48aDYnq3pkKhxgMQpcw@mail.gmail.com
2024-04-06 11:47:10 -04:00
Tom Lane ddd9e43a92 Remove obsolete comment in CopyReadLineText().
When this bit of commentary was written, it was alluding to the
fact that we looked for newlines and EOD markers in the raw
(not yet encoding-converted) input data.  We don't do that anymore,
preferring to batch the conversion of larger chunks of input and
split it into lines later.  Hence there's no longer any need for
assumptions about the relevant characters being encoding-invariant,
and we should remove this comment saying we assume that.

Discussion: https://postgr.es/m/1461688.1712347668@sss.pgh.pa.us
2024-04-06 11:16:27 -04:00
John Naylor a365d9e2e8 Speed up tail processing when hashing aligned C strings, take two
After encountering the NUL terminator, the word-at-a-time loop exits
and we must hash the remaining bytes. Previously we calculated
the terminator's position and re-loaded the remaining bytes from
the input string. This was slower than the unaligned case for very
short strings. We already have all the data we need in a register,
so let's just mask off the bytes we need and hash them immediately.
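
A standalone sketch of that masking step, assuming a little-endian load and
ignoring the valgrind subtleties mentioned below: zero out the terminator
and everything after it, then hash the word that is already in the
register.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* 'word' holds the final 8 input bytes (little-endian load) and the NUL
     * terminator is the byte at offset 'nbytes'.  Keep only the bytes before
     * it; no need to go back to memory. */
    static uint64_t
    mask_tail(uint64_t word, unsigned nbytes)
    {
        if (nbytes >= sizeof(uint64_t))
            return word;
        return word & ((UINT64_C(1) << (nbytes * 8)) - 1);
    }

    int
    main(void)
    {
        /* "abc", its terminator, then unrelated trailing bytes. */
        const char  buf[8] = {'a', 'b', 'c', '\0', 'X', 'X', 'X', 'X'};
        uint64_t    word;

        memcpy(&word, buf, sizeof(word));
        printf("%016llx\n", (unsigned long long) mask_tail(word, 3));
        return 0;
    }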

In addition to endianness issues, the previous attempt upset valgrind
in the way it computed the mask. Whether by accident or by wisdom,
the author's proposed method passes locally with valgrind 3.22.

Ants Aasma, with cosmetic adjustments by me

Discussion: https://postgr.es/m/CANwKhkP7pCiW_5fAswLhs71-JKGEz1c1%2BPC0a_w1fwY4iGMqUA%40mail.gmail.com
2024-04-06 17:14:28 +07:00
John Naylor 0c25fee359 Teach fasthash_accum to use platform endianness for bytewise loads
This function previously used a mix of word-wise loads and bytewise
loads. The bytewise loads happened to be little-endian regardless of
platform. This in itself is not a problem. However, a future commit
will require the same result whether A) the input is loaded as a
word with the relevant bytes masked off, or B) the input is loaded
one byte at a time.

While at it, improve debuggability of the internal hash state.

Discussion: https://postgr.es/m/CANWCAZZpuV1mES1mtSpAq8tWJewbrv4gEz6R_k4gzNG8GZ5gag%40mail.gmail.com
2024-04-06 17:14:28 +07:00
Thomas Munro 98f320eb2e Increase default vacuum_buffer_usage_limit to 2MB.
The BAS_VACUUM ring size has been 256kB since commit d526575f introduced
the mechanism 17 years ago.  Commit 1cbbee03 recently made it
configurable but retained the traditional default.  The correct default
size has been debated for years, but 256kB is certainly very small.
VACUUM soon needs to write back data it dirtied only 32 blocks ago,
which usually requires flushing the WAL.  New experiments in prefetching
pages for VACUUM exacerbated the problem by crashing into dirty data
even sooner.  Let's make the default 2MB.  That's 1.6% of the default
toy buffer pool size, and 0.2% of 1GB, which would be considered a
small shared_buffers setting for a real system these days.  Users are
still free to set the GUC to a different value.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20240403221257.md4gfki3z75cdyf6%40awork3.anarazel.de
Discussion: https://postgr.es/m/CA%2BhUKGLY4Q4ZY4f1rvnFtv6%2BPkjNf8MejdPkcju3Qii9DYqqcQ%40mail.gmail.com
2024-04-06 23:12:03 +13:00
Thomas Munro 3bd8439ed6 Allow BufferAccessStrategy to limit pin count.
While pinning extra buffers to look ahead, users of strategies are in
danger of using too many buffers.  For some strategies, that means
"escaping" from the ring, and in others it means forcing dirty data to
disk very frequently with associated WAL flushing.  Since external code
has no insight into any of that, allow individual strategy types to
expose a clamp that should be applied when deciding how many buffers to
pin at once.

Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_aJXnqsyZt6HwFLnxYEBgE17oypkxbKbT1t1geE_wvH2Q%40mail.gmail.com
2024-04-06 23:11:45 +13:00
John Naylor f956ecd035 Convert uses of hash_string_pointer to fasthash equivalent
Remove duplicate hash_string_pointer() function definitions by creating
a new inline function hash_string() for this purpose.

This has the added advantage of avoiding strlen() calls when doing hash
lookup. It's not clear how many of these are performance-sensitive
enough to benefit from that, but the simplification is worth it on
its own.

Reviewed by Jeff Davis

Discussion: https://postgr.es/m/CANWCAZbg_XeSeY0a_PqWmWqeRATvzTzUNYRLeT%2Bbzs%2BYQdC92g%40mail.gmail.com
2024-04-06 12:20:40 +07:00
John Naylor db17594ad7 Add macro to disable address safety instrumentation
fasthash_accum_cstring_aligned() uses a technique, found in various
strlen() implementations, to detect a string's NUL terminator by
reading a word at a time. That triggers failures when testing with
"-fsanitize=address", at least with frontend code. To enable using
this function anywhere, add a function attribute macro to disable
such testing.
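
A standalone sketch of the idea; the attribute spelling below is what GCC
and Clang accept, while the macro name and the configure-time detection
used in the tree are assumptions here:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical wrapper macro around the compiler attribute. */
    #if defined(__GNUC__) || defined(__clang__)
    #define NO_SANITIZE_ADDRESS __attribute__((no_sanitize_address))
    #else
    #define NO_SANITIZE_ADDRESS
    #endif

    /* Word-at-a-time scan for the NUL terminator: may read aligned bytes
     * past the end of the string, which upsets AddressSanitizer, so the
     * whole function is exempted from instrumentation. */
    NO_SANITIZE_ADDRESS
    static size_t
    aligned_strlen(const char *s)
    {
        const uint64_t *p = (const uint64_t *) s;   /* caller ensures alignment */

        /* Loop until the current word contains a zero byte. */
        while (!((*p - UINT64_C(0x0101010101010101)) & ~*p &
                 UINT64_C(0x8080808080808080)))
            p++;
        return (size_t) ((const char *) p - s) + strlen((const char *) p);
    }

    int
    main(void)
    {
        static _Alignas(8) const char buf[16] = "hello, world";

        printf("%zu\n", aligned_strlen(buf));
        return 0;
    }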

Reviewed by Jeff Davis

Discussion: https://postgr.es/m/CANWCAZbwvp7oUEkbw-xP4L0_S_WNKq-J-ucP4RCNDPJnrakUPw%40mail.gmail.com
2024-04-06 12:20:40 +07:00
John Naylor 4b968e2027 Fix incorrect return type
fasthash32() calculates a 32-bit hashcode, but the return
type was uint64. Change to uint32.

Noted by Jeff Davis

Discussion: https://postgr.es/m/b16c93e6c736a422d4de668343515375664eb05d.camel%40j-davis.com
2024-04-06 12:20:40 +07:00
Thomas Munro aa1e8c2064 Improve read_stream.c's fast path.
The "fast path" for well cached scans that don't do any I/O was
accidentally coded in a way that could only be triggered by pg_prewarm's
usage pattern, which starts out with a higher distance because of the
flags it passes in.  We want it to work for streaming sequential scans
too, once that patch is committed.  Adjust.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGKXZALJ%3D6aArUsXRJzBm%3Dqvc4AWp7%3DiJNXJQqpbRLnD_w%40mail.gmail.com
2024-04-06 18:00:35 +13:00
Andres Freund 9e7386924e Fix headerscheck violation introduced in f8ce4ed78c
Per ci.
2024-04-05 14:27:45 -07:00