Properly update the number of bits set in the bitmap after merging the
filters in brin_bloom_union.
This is mostly harmless, as the counter is used only in the output
function, which means pageinspect may show incorrect information about
the BRIN summary. The counter does not affect correctness.
Discovered while adding a regression test comparing indexes built with
and without parallelism. The parallel index builds exercise the union
procedure when merging results from workers, which is otherwise very
hard to do in a test. Which is why this went unnoticed until now.
Backpatch through 14, where the BRIN bloom opclasses were introduced.
Backpatch-through: 14
Discussion: https://postgr.es/m/1df00a66-db5a-4e66-809a-99b386a06d86%40enterprisedb.com
GetPageWithFreeSpace() callers assume the returned block exists in the
main fork, failing with "could not read block" errors if that doesn't
hold. Make that assumption reliable now. It hadn't been guaranteed,
due to the weak WAL and data ordering of participating components. Most
operations on the fsm fork are not WAL-logged. Relation extension is
not WAL-logged. Hence, an fsm-fork block on disk can reference a
main-fork block that no WAL record has initialized. That could happen
after an OS crash, a replica promote, or a PITR restore. wal_log_hints
makes the trouble easier to hit; a replica promote or PITR ending just
after a relevant fsm-fork FPI_FOR_HINT may yield this broken state. The
v16 RelationAddBlocks() mechanism also makes the trouble easier to hit,
since it bulk-extends even without extension lock waiters. Commit
917dc7d239 stopped trouble around
truncation, but vectors involving PageIsNew() pages remained.
This implementation adds a RelationGetNumberOfBlocks() call when the
cached relation size doesn't confirm a block exists. We've been unable
to identify a benchmark that slows materially, but this may show up as
additional time in lseek(). An alternative without that overhead would
be a new ReadBufferMode such that ReadBufferExtended() returns NULL
after a 0-byte read, with all other errors handled normally. However,
each GetFreeIndexPage() caller would then need code for the return-NULL
case. Back-patch to v14, due to earlier versions not caching relation
size and the absence of a pre-v16 problem report.
Ronan Dunklau. Reported by Ronan Dunklau.
Discussion: https://postgr.es/m/1878547.tdWV9SEqCh%40aivenlaptop
In the libpq encryption negotiation tests, don't run the GSSAPI tests
unless PG_TEST_EXTRA='kerberos' is also set. That makes it possible to
still run most of the tests when GSSAPI support is compiled in, but
there's no MIT Kerberos installation.
The test targets libpq's options, so 'src/test/interfaces/libpq/t' is
a more natural place for it.
While doing this, I noticed that I had missed adding the
libpq_encryption subdir to the Makefile. That's why this commit only
needs to remove it from the meson.build file.
Per Peter Eisentraut's suggestion.
Discussion: https://www.postgresql.org/message-id/09d4bf5d-d0fa-4c66-a1d7-5ec757609646@eisentraut.org
It correctly skipped tests involving SSL in the server when SSL
support was not compiled in, but even when SSL is not enabled in the
server and the connection is established without SSL, libpq behaves
differently in many of the test scenarios when libpq is compiled
without SSL support. For example, with sslmode=prefer, if libpq is
compiled with SSL support it will attempt to use SSL, but without SSL
support it will try authenticating in plaintext mode directly. The
expected test output didn't take that into account.
Discussion: https://www.postgresql.org/message-id/CA%2BhUKG%2BHRTtB%2Bx%2BKKKj_cfX6sNhbeGuqmGxjGMwdVPG7YGFP8w@mail.gmail.com
Per gripes from Michael Paquier
Discussion: https://postgr.es/m/ZhTQ6_w1vwOhqTQI@paquier.xyz
Along the way, also clean up a handful of typos in 3311ea86ed and
ea7b4e9a2a, found by Alexander Lakhin, and a couple of stylistic
snafus noted by Daniel Westermann and Daniel Gustafsson.
Coverity complained about not freeing some memory associated with
incrementally parsing backup manifests. To fix that, provide and use a new
shutdown function for the JsonManifestParseIncrementalState object, in
line with a suggestion from Tom Lane.
While analysing the problem, I noticed a buglet in freeing memory for
incremental json lexers. To fix that remove a bogus condition on
freeing the memory allocated for them.
Ensure that when updating the catalog_xmin of the synced slots, it is
first written to disk before changing the in-memory value
(effective_catalog_xmin). This is to prevent a scenario where the
in-memory value change triggers a vacuum to remove catalog tuples before
the catalog_xmin is written to disk. In the event of a crash before the
catalog_xmin is persisted, we would not know that some required catalog
tuples have been removed and the synced slot would be invalidated.
Change the sanity check to ensure that remote_slot's confirmed_flush LSN
can't precede the local/synced slot during slot sync. Note that the
restart_lsn of the synced/local slot can be ahead of remote_slot. This can
happen when slot advancing machinery finds a running xacts record after
reaching the consistent state at a later point than the primary where it
serializes the snapshot and updates the restart_lsn.
Make the check to sync slots robust by allowing to sync only when the
confirmed_lsn, restart_lsn, or catalog_xmin of the remote slot is ahead of
the synced/local slot.
Reported-by: Amit Kapila and Shveta Malik
Author: Hou Zhijie, Shveta Malik
Reviewed-by: Amit Kapila, Bertrand Drouvot
Discussion: https://postgr.es/m/OS0PR01MB57162B67D3CB01B2756FBA6D94062@OS0PR01MB5716.jpnprd01.prod.outlook.com
Discussion: https://postgr.es/m/CAJpy0uCSS5zmdyUXhvw41HSdTbRqX1hbYqkOfHNj7qQ+2zn0AQ@mail.gmail.com
b262ad440 added code to have the planner remove redundant IS NOT NULL
quals and eliminate needless scans for IS NULL quals on tables where the
qual's column has a NOT NULL constraint.
That commit failed to consider that an inheritance parent table could
have differing NOT NULL constraints between the parent and the child.
This caused issues as if we eliminated a qual on the parent, when
applying the quals to child tables in apply_child_basequals(), the qual
might not have been added to the parent's baserestrictinfo.
Here we fix this by not applying the optimization to remove redundant
quals to RelOptInfos belonging to inheritance parents and applying the
optimization again in apply_child_basequals(). Effectively, this means
that the parent and child are considered independently as the parent has
both an inh=true and inh=false RTE and we still apply the optimization
to the RelOptInfo corresponding to the inh=false RTE.
We're able to still apply the optimization in add_base_clause_to_rel()
for partitioned tables as the NULLability of partitions must match that
of their parent. And, if we ever expand restriction_is_always_false()
and restriction_is_always_true() to handle partition constraints then we
can apply the same logic as, even in multi-level partitioned tables,
there's no way to route values to a partition when the qual does not
match the partition qual of the partitioned table's parent partition.
The same is true for CHECK constraints as those must also match between
arent partitioned tables and their partitions.
Author: Richard Guo, David Rowley
Discussion: https://postgr.es/m/CAMbWs4930gQSZmjR7aANzEapdy61gCg6z8dT-kAEYD0sYWKPdQ@mail.gmail.com
A pairing heap can perform the same operations as the binary heap +
index, with as good or better algorithmic complexity, and that's an
existing data structure so that we don't need to invent anything new
compared to v16. This commit makes the new binaryheap functionality
that was added in commits b840508644 and bcb14f4abc unnecessary, but
they will be reverted separately.
Remove the optimization to only build and maintain the heap when the
amount of memory used is close to the limit, becuase the bookkeeping
overhead with the pairing heap seems to be small enough that it
doesn't matter in practice.
Reported-by: Jeff Davis
Author: Heikki Linnakangas
Reviewed-by: Michael Paquier, Hayato Kuroda, Masahiko Sawada
Discussion: https://postgr.es/m/12747c15811d94efcc5cda72d6b35c80d7bf3443.camel%40j-davis.com
The user can set RELSEG_SIZE to a high number at compile time, so we
can't use it to control the size of an array on the stack: it could be
many gigabytes in size. On closer inspection, we don't really need that
intermediate array anyway. Let's just write directly into the output
array, and then perform the absolute->relative adjustment in place.
This fixes new code from commit dc21234005.
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2B2hZ0sBztPW4mkLfng0qfkNtAHFUfxOMLizJ0BPmi5%2Bg%40mail.gmail.com
aa5edbe379 has tweaked _hash_freeovflpage() so as the write buffer's
LSN is updated only when necessary, when REGBUF_NO_CHANGE is not used.
The replay code was not consistent with that, causing the write buffer's
LSN to be updated and its page to be marked as dirty even if the buffer
was registered in a "clean" state. This was possible for the case of a
squeeze record when there are no tuples to add to the write buffer, for
(is_prim_bucket_same_wrt && !is_prev_bucket_same_wrt).
I have performed some validation of this commit with
wal_consistency_checking and a change in WAL that logs REGBUF_NO_CHANGE
to a new BKPIMAGE_*. Thanks to that, it is possible to know at replay
if a buffer was clean when it was registered, then cross-checked the LSN
of the "clean" page copy coming from WAL with the LSN of the block once
the record has been replayed. This eats one bit in bimg_info, which is
not acceptable to be integrated as-is, but it could become handy in the
future. I didn't spot other areas than the one fixed by this commit at
the extent of what the main regression test suite covers.
As this is an oversight in aa5edbe379, no backpatch is required.
Reported-by: Zubeyr Eryilmaz
Author: Hayato Kuroda
Reviewed-by: Amit Kapila, Michael Paquier
Discussion: https://postgr.es/m/ZbyVVG_7eW3YD5-A@paquier.xyz
Up to now, read_sql_construct() has collected all the source text from
the statement or expression's initial token up to the character just
before the "until" token. It normally tries to strip trailing
whitespace from that, largely for neatness. If there was a "-- text"
comment after the expression, this resulted in removing the newline
that terminates the comment, which creates a hazard if we try to paste
the collected text into a larger SQL construct without inserting a
newline after it. In particular this caused our handling of CASE
constructs to fail if there's a comment after a WHEN expression.
Commit 4adead1d2 noticed a similar problem with cursor arguments,
and worked around it through the rather crude hack of suppressing
the whitespace-trimming behavior for those. Rather than do that
and leave the hazard open for future hackers to trip over, let's
fix it properly. pl_scanner.c already has enough infrastructure
to report the end location of the expression's last token, so
we can copy up to that location and never collect any trailing
whitespace or comment to begin with.
Erik Wienhold and Tom Lane, per report from Michal Bartak.
Back-patch to all supported branches.
Discussion: https://postgr.es/m/CAAVzF_FjRoi8fOVuLCZhQJx6HATQ7MKm=aFOHWZODFnLmjX-xA@mail.gmail.com
f587338dec has introduced in the test module injection_points a SQL
function called injection_points_set_local(), that can be used to make
all the injection points linked to the process where they are attached,
discarded automatically if any remain once the process exits.
e2e3b8ae9e has added a NO_INSTALLCHECK to the test module to prevent
the use of installcheck. Now that there is a way to make the test
concurrent-safe, let's use it and remove the installcheck restriction.
Concurrency issues could be easily reproduced by running in a tight
loop a command like this one, in src/test/modules/gin/ (hardcoding
pg_sleep() after attaching injection points enlarges the race window)
and a second test suite like contrib/btree_gin/:
make installcheck USE_MODULE_DB=1
Reviewed-by: Andrey Borodin
Discussion: https://postgr.es/m/ZhNG4Io9uYOgwv3F@paquier.xyz
Wait for the standby to catch up before syncing the slots with
pg_sync_replication_slots(), otherwise, the logical slot could be ahead
and the sync would fail.
The other way to fix the test is to change it to use slotsync worker and
poll for the sync to get finished but the current approach is better as
this is a predictable way to write the test.
Per buildfarm
Author: Hou Zhijie
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/OS0PR01MB571665359F2F5DCD3ADABC9F94002@OS0PR01MB5716.jpnprd01.prod.outlook.com
Commit 72559438 started copying more attributes from AttributeTemplate
to the functions we generate on the fly. In the case of deform
functions, which return void, this meant that "noundef", from
AttributeTemplate's return value (a Datum) was copied to a void type.
Older LLVM releases were OK with that, but LLVM 18 crashes.
Update our llvm_copy_attributes() function to skip copying the attribute
for the return value, if the target function returns void.
Thanks to Dmitry Dolgov for help chasing this down.
Back-patch to all supported releases, like 72559438.
Reported-by: Pavel Stehule <pavel.stehule@gmail.com>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Discussion: https://postgr.es/m/CAFj8pRACpVFr7LMdVYENUkScG5FCYMZDDdSGNU-tch%2Bw98OxYg%40mail.gmail.com
This adjusts various appendStringInfo* function calls to use a more
appropriate and efficient function with the same behavior. For example,
use appendStringInfoChar() when appending a single character rather than
appendStringInfo() and appendStringInfoString() when no formatting is
required rather than using appendStringInfo().
All adjustments made here are in code that's new to v17, so it makes
sense to fix these now rather than wait a few years and make
backpatching harder.
Discussion: https://postgr.es/m/CAApHDvojY2UvMiO+9_55ArTj10P1LBNJyyoGB+C65BLDNT0GsQ@mail.gmail.com
Reviewed-by: Nathan Bossart, Tom Lane
The bump allocator was recently added in 29f6a959c. Our other
allocators have a similar macro to this, but seemingly the version of
the macro for those allocators is only used in places where the chunk
header is decoded. Since the bump allocator has no chunk header, none
of those functions exist for bump therefore macro is unused. Remove it.
Reported-by: Peter Eisentraut
Discussion: https://postgr.es/m/5f724fb2-96e1-4f36-b65b-47b337ad432e@eisentraut.org
This is a C11 feature, and we require C99. While at it, go the further
step and get rid of the surrounding union (with uintptr_t) entirely,
as there is currently no use case for this file to access the header of
BlocktableEntry as a uintptr_t, and there are no additional alignment
requirements. The least invasive way seems to be to transfer the old
union name to this struct.
Reported by Pavel Borisov and Andres Freund, per buildfarm member mylodon
Reviewed by Pavel Borisov
Discussion: https://postgr.es/m/CALT9ZEH11NYV8AOzKb1bWhCf6J0H=H31f0MgT9xX+HdqvcA1rw@mail.gmail.com
The module relies on a shmem exit callback to clean up any injection
points linked to a specific process. One of the tests checks for the
case of an injection point name reused in a second connection where the
first connection should clean it up, but it did not count for the fact
that the shmem exit callback of the first connection may not have run
when the second connection begins its work.
The regress library includes a wait_pid() that can be used for this
purpose, instead of a custom wait logic, so let's rely on it to wait for
the first connection to exit before working with the second connection.
The module gains a REGRESS_OPTS to be able to look at the regress
library's dlpath.
This issue could be reproduced with a hardcoded sleep() in the shmem
exit callback, and the CI has been able to trigger it sporadically.
Oversight in f587338dec.
Reported-by: Bharath Rupireddy
Reviewed-by: Andrey Borodin
Discussion: https://postgr.es/m/ZhOd3NXAutteokGL@paquier.xyz
After a query cancel, the tail end of ExecQueryAndProcessResults
took care to clear any not-yet-read PGresults; but it forgot about
the one it has already read. There would only be such a result
when handling a multi-command string made with "\;", so that you'd
have to cancel an earlier command in such a string to reach the
bug at all. Even then, there would only be leakage of a single
PGresult per cancel, so it's not surprising nobody noticed this.
But a leak is a leak.
Noted while re-reviewing 90f517821, but this is independent of that:
it dates to 7844c9918. Back-patch to v15 where that came in.
Alexander Lakhin noted an obsolete comment, which led me to revisit
some other important comments in the patch, and that study turned up a
couple of unintended ways in which the chunked-fetch code path didn't
match the normal code path in ExecQueryAndProcessResults. The only
nontrivial problem is that it didn't call PrintQueryStatus, so that
we'd not print the final status result from DML ... RETURNING
commands. To avoid code duplication, move the filter for whether a
result is from RETURNING from PrintQueryResult to PrintQueryStatus.
Discussion: https://postgr.es/m/0023bea5-79c0-476e-96c8-dad599cc3ad8@gmail.com
Previously, the decision to store values in leaves or within the child
pointer was made at compile time, with variable length values using
leaves by necessity. This commit allows introspecting the length of
variable length values at runtime for that decision. This requires
the ability to tell whether the last-level child pointer is actually
a value, so we use a pointer tag in the lowest level bit.
Use this in TID store. This entails adding a byte to the header to
reserve space for the tag. Commit f35bd9bf3 stores up to three offsets
within the header with no bitmap, and now the header can be embedded
as above. This reduces worst-case memory usage when TIDs are sparse.
Reviewed (in an earlier version) by Masahiko Sawada
Discussion: https://postgr.es/m/CANWCAZYw+_KAaUNruhJfE=h6WgtBKeDG32St8vBJBEY82bGVRQ@mail.gmail.com
Discussion: https://postgr.es/m/CAD21AoBci3Hujzijubomo1tdwH3XtQ9F89cTNQ4bsQijOmqnEw@mail.gmail.com
While keeping API the same, this commit provides a way for block-level table
AMs to re-use existing acquire_sample_rows() by providing custom callbacks
for getting the next block and the next tuple.
Reported-by: Andres Freund
Discussion: https://postgr.es/m/20240407214001.jgpg5q3yv33ve6y3%40awork3.anarazel.de
Reviewed-by: Pavel Borisov
Let table AM define custom reloptions for its tables. This allows specifying
AM-specific parameters by the WITH clause when creating a table.
The reloptions, which could be used outside of table AM, are now extracted
into the CommonRdOptions data structure. These options could be by decision
of table AM directly specified by a user or calculated in some way.
The new test module test_tam_options evaluates the ability to set up custom
reloptions and calculate fields of CommonRdOptions on their base.
The code may use some parts from prior work by Hao Wu.
Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
Discussion: https://postgr.es/m/AMUA1wBBBxfc3tKRLLdU64rb.1.1683276279979.Hmail.wuhao%40hashdata.cn
Reviewed-by: Reviewed-by: Pavel Borisov, Matthias van de Meent, Jess Davis
It is possible that even if the primary waits for the subscriber to catch
up and then disables the subscription, the XLOG_RUNNING_XACTS record gets
inserted between the two steps by bgwriter and walsender processes it.
This can move the restart_lsn of the corresponding slot in an
unpredictable way which further leads to slot sync failure.
To ensure predictable behaviour, we drop the subscription and manually
create the slot before the test. The other idea we discussed to write a
predictable test is to use injection points to control the bgwriter
logging XLOG_RUNNING_XACTS but that needs more analysis. We can add a
separate test using injection points.
Per buildfarm
Author: Hou Zhijie
Reviewed-by: Amit Kapila, Shveta Malik
Discussion: https://postgr.es/m/CAA4eK1JD8h_XLRsK_o_Xh=5MhTzm+6d4Cb4_uPgFJ2wSQDah=g@mail.gmail.com
A NESTED path allows to extract data from nested levels of JSON
objects given by the parent path expression, which are projected as
columns specified using a nested COLUMNS clause, just like the parent
COLUMNS clause. Rows comprised from a NESTED columns are "joined"
to the row comprised from the parent columns. If a particular NESTED
path evaluates to 0 rows, then the nested COLUMNS will emit NULLs,
making it an OUTER join.
NESTED columns themselves may include NESTED paths to allow
extracting data from arbitrary nesting levels, which are likewise
joined against the rows at the parent level.
Multiple NESTED paths at a given level are called "sibling" paths
and their rows are combined by UNIONing them, that is, after being
joined against the parent row as described above.
Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Author: Teodor Sigaev <teodor@sigaev.ru>
Author: Oleg Bartunov <obartunov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Andrew Dunstan <andrew@dunslane.net>
Author: Amit Langote <amitlangote09@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Reviewers have included (in no particular order):
Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup,
Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson,
Justin Pryzby, Álvaro Herrera, Jian He
Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org
Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
Currently, get_json_expr_options() does not emit the default values
for QUOTES (KEEP QUOTES) and WRAPPER (WITHOUT WRAPPER). That causes
the deparsed JSON_TABLE() columns, such as those contained in a a
view's query, to behave differently when executed than the original
definition. That's because the rules encoded in
transformJsonTableColumns() will choose either JSON_VALUE() or
JSON_QUERY() as implementation to execute a given column's path
expression depending on the QUOTES and WRAPPER specificationd and
they have slightly different semantics.
Reported-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxEqhqsfrg_p7EMyo5zak3d767iFDL8vz_4%3DZBHpOtrghw%40mail.gmail.com
Currently, transformJsonFuncExpr() enforces some restrictions on
the combinations of QUOTES and WRAPPER clauses that can be specified
in JSON_QUERY(). The intent was to only prevent the useless
combination WITH WRAPPER OMIT QUOTES, but the coding prevented KEEP
QUOTES too, which is not helpful. Fix that.
When testing buffer pool logic, it is useful to be able to evict
arbitrary blocks. This function can be used in SQL queries over the
pg_buffercache view to set up a wide range of buffer pool states. Of
course, buffer mappings might change concurrently so you might evict a
block other than the one you had in mind, and another session might
bring it back in at any time. That's OK for the intended purpose of
setting up developer testing scenarios, and more complicated interlocking
schemes to give stronger guararantees about that would likely be less
flexible for actual testing work anyway. Superuser-only.
Author: Palak Chaturvedi <chaturvedipalak1911@gmail.com>
Author: Thomas Munro <thomas.munro@gmail.com> (docs, small tweaks)
Reviewed-by: Nitin Jadhav <nitinjadhavpostgres@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Cary Huang <cary.huang@highgo.ca>
Reviewed-by: Cédric Villemain <cedric.villemain+pgsql@abcsql.com>
Reviewed-by: Jim Nasby <jim.nasby@gmail.com>
Reviewed-by: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CALfch19pW48ZwWzUoRSpsaV9hqt0UPyaBPC4bOZ4W+c7FF566A@mail.gmail.com
This adds tab completion of "GRANT" and "REVOKE [GRANT OPTION FOR]"
for ALTER DEFAULT PRIVILEGES, and adds "WITH GRANT OPTION" for
ALTER DEFAULT PRIVILEGES ... GRANT ... TO role.
Author: Vignesh C, with cosmetic adjustments by me
Reviewed-by: Shubham Khanna, Masahiko Sawada
Discussion: https://postgr.es/m/CALDaNm1aEdJb-QJi%3DGWStkfj_%2BEDUK_VtDkn%2BTjQ2z7HyU0MBw%40mail.gmail.com
One of the assertions was the subject of a false positive complaint from
Coverity, but none of the assertions added much, so get rid of them.
Reported-By: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3000247.1712537309@sss.pgh.pa.us
While SH_STAT() is only used for debugging, the allocated array can be large,
and therefore should be freed.
It's unclear why coverity started warning now.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Coverity
Discussion: https://postgr.es/m/3005248.1712538233@sss.pgh.pa.us
Backpatch: 12-
libpq now always tries to send ALPN. With the traditional negotiated
SSL connections, the server accepts the ALPN, and refuses the
connection if it's not what we expect, but connecting without ALPN is
still OK. With the new direct SSL connections, ALPN is mandatory.
NOTE: This uses "TBD-pgsql" as the protocol ID. We must register a
proper one with IANA before the release!
Author: Greg Stark, Heikki Linnakangas
Reviewed-by: Matthias van de Meent, Jacob Champion
By skipping SSLRequest, you can eliminate one round-trip when
establishing a TLS connection. It is also more friendly to generic TLS
proxies that don't understand the PostgreSQL protocol.
This is disabled by default in libpq, because the direct TLS handshake
will fail with old server versions. It can be enabled with the
sslnegotation=direct option. It will still fall back to the negotiated
TLS handshake if the server rejects the direct attempt, either because
it is an older version or the server doesn't support TLS at all, but
the fallback can be disabled with the sslnegotiation=requiredirect
option.
Author: Greg Stark, Heikki Linnakangas
Reviewed-by: Matthias van de Meent, Jacob Champion
This fixes the few corner cases noted in commit 705843d294, as shown
by the changes in the test.
Author: Heikki Linnakangas, Matthias van de Meent
Reviewed-by: Jacob Champion
The ANALYZE command prefetches and reads sample blocks chosen by a
BlockSampler algorithm. Instead of calling [Prefetch|Read]Buffer() for
each block, ANALYZE now uses the streaming API introduced in b5a9b18cd0.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/flat/CAN55FZ0UhXqk9v3y-zW_fp4-WCp43V8y0A72xPmLkOM%2B6M%2BmJg%40mail.gmail.com
This adds a new SQL function injection_points_set_local() that can be
used to force injection points to be run only in the process where they
are attached. This is handy for SQL tests to:
- Detach automatically injection points when the process exits.
- Allow tests with injection points to run concurrently with other test
suites, so as such modules do not have to be marked with
NO_INSTALLCHECK.
Currently, the only condition that can be registered is for a PID.
This could be extended to more kinds later, if required, like database
names/OIDs, roles, or more concepts I did not consider.
Using a single function for SQL scripts is an idea from Heikki
Linnakangas.
Reviewed-by: Andrey Borodin
Discussion: https://postgr.es/m/ZfP7IDs9TvrKe49x@paquier.xyz
The new "log_connection_negotiation" server option causes the server
to print messages to the log when it receives a SSLRequest or
GSSENCRequest packet from the client. Together with "log_connections",
it gives a trace of how a connection and encryption is
negotiatated. Use the option in the libpq_encryption test, to verify
in more detail how libpq negotiates encryption with different
gssencmode and sslmode options.
This revealed a couple of cases where libpq retries encryption or
authentication, when it should already know that it cannot succeed. I
marked them with XXX comments in the test tables. They only happen
when the connection was going to fail anyway, and only with rare
combinations of options, so they're not serious.
Discussion: https://www.postgresql.org/message-id/CAEze2Wja8VUoZygCepwUeiCrWa4jP316k0mvJrOW4PFmWP0Tcw@mail.gmail.com
Previously, libpq would establish the TCP connection, and then
immediately disconnect if the credentials were not available. The
same thing happened if you tried to use a Unix domain socket with
gssencmode=require. Check those conditions before establishing the TCP
connection.
This is a very minor issue, but my motivation to do this now is that
I'm about to add more detail to the tests for encryption negotiation.
This makes the case of gssencmode=require but no credentials
configured fail at the same stage as with gssencmode=require and
GSSAPI support not compiled at all. That avoids having to deal with
variations in expected output depending on build options.
Discussion: https://www.postgresql.org/message-id/CAEze2Wja8VUoZygCepwUeiCrWa4jP316k0mvJrOW4PFmWP0Tcw@mail.gmail.com
Test all combinations of gssencmode, sslmode, whether the server
supports SSL and/or GSSAPI encryption, and whether they are accepted
by pg_hba.conf. This is in preparation for refactoring that code in
libpq, and for adding a new option for "direct SSL" connections, which
adds another dimension to the logic.
If we add even more options in the future, testing all combinations
will become unwieldy and we'll need to rethink this, but for now an
exhaustive test is nice.
Author: Heikki Linnakangas, Matthias van de Meent
Reviewed-by: Jacob Champion
Discussion: https://www.postgresql.org/message-id/a3af4070-3556-461d-aec8-a8d794f94894@iki.fi
As written, the test would fail when run repeatedly because one of the
injection points attached was not detached. This would not matter if
the test is rewritten to be concurrently safe, but let's be clean and
it is a good practice.
Oversight in 6a1ea02c49.
Discussion: https://postgr.es/m/ZfP7IDs9TvrKe49x@paquier.xyz
Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...])
on the preliminary stage of optimization when we are still working with the
expression tree.
Here Cn is a n-th constant expression, 'expr' is non-constant expression, 'op'
is an operator which returns boolean result and has a commuter (for the case
of reverse order of constant and non-constant parts of the expression,
like 'Cn op expr').
Sometimes it can lead to not optimal plan. This is why there is a
or_to_any_transform_limit GUC. It specifies a threshold value of length of
arguments in an OR expression that triggers the OR-to-ANY transformation.
Generally, more groupable OR arguments mean that transformation will be more
likely to win than to lose.
Discussion: https://postgr.es/m/567ED6CA.2040504%40sigaev.ru
Author: Alena Rybakina <lena.ribackina@yandex.ru>
Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Ranier Vilela <ranier.vf@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
When restarting the cluster fails the code introduced in 33774978c7
printed the full log contents to aid debugging. For cases when the
logfile is large this adds unnecessary overhead. Reduce to printing
the logfile path instead.
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20240406214439.2n4zf2w7ukhf7dsy@awork3.anarazel.de
In one multixact.c edge case, we need a mechanism to wait for one
multixact offset to be written before being allowed to read the next
one. We used to handle this case by sleeping for one millisecond and
retrying, but such sleeps have been reported as problematic in
production cases. We can avoid the problem by using a condition
variable: readers sleep on it and then every creator of multixacts
broadcasts into the CV when creation is sufficiently far along.
Author: Kyotaro Horiguchi <horikyotajntt@gmail.com>
Reviewed-by: Andrey Borodin <amborodin@acm.org>
Discussion: https://postgr.es/m/47A598F4-B4E7-4029-8FEC-A06A6C3CB4B5@yandex-team.ru
Discussion: https://postgr.es/m/20200515.090333.24867479329066911.horikyota.ntt
nbtree index scans with SAOP inequalities (but no SAOP equalities)
performed extra ORDER proc lookups for any remaining equality strategy
scan keys. This could waste cycles, and caused assertion failures.
Keeping around a separate ORDER proc is only necessary for a scan's
non-array/non-SAOP equality scan keys when the scan has at least one
other SAOP equality strategy key (a SAOP inequality shouldn't count).
To fix, replace _bt_preprocess_array_keys_final's assertion with a test
that makes the function return early when the scan has no SAOP equality
scan keys.
Oversight in commit 1b134ca5, which enhanced nbtree ScalarArrayOp
execution.
Reported-By: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/0539d3d3-a402-0a49-ed5e-26429dffc4bd@gmail.com
If the OpenLDAP installation directory is not found, set $setup to 0
so that the LDAP tests are skipped. The macOS checks were already
doing that, but the checks on other OS's were not. While we're at it,
improve the error message when the tests are skipped, to specify
whether the OS is supported at all, or if we just didn't find the
installation directory.
This was accidentally "working" without this, i.e. we were skipping
the tests if the OpenLDAP installation was not found, because of a bug
in the LdapServer test module: the END block clobbered the exit code
so if the script die()s before running the first subtest, the whole
test script was marked as SKIPped. The next commit will fix that bug,
but we need to fix the setup code first.
These checks should probably go into configure/meson, but this is
better than nothing and allows fixing the bug in the END block.
Backpatch to all supported versions.
Discussion: https://www.postgresql.org/message-id/fb898a70-3a88-4629-88e9-f2375020061d@iki.fi
29f6a959c added a bump allocator type for efficient compact allocations.
Here we make use of this for non-bounded tuplesorts to store tuples.
This is very space efficient when storing narrow tuples due to bump.c
not having chunk headers. This means we can fit more tuples in work_mem
before spilling to disk, or perform an in-memory sort touching fewer
cacheline.
Author: David Rowley
Reviewed-by: Nathan Bossart
Reviewed-by: Matthias van de Meent
Reviewed-by: Tomas Vondra
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAApHDvqGSpCU95TmM=Bp=6xjL_nLys4zdZOpfNyWBk97Xrdj2w@mail.gmail.com
This tracks the position of WAL that's been fully copied into WAL
buffers by all processes emitting WAL. (For some reason we call that
"WAL insertion"). This is updated using atomic monotonic advance during
WaitXLogInsertionsToFinish, which is not when the insertions actually
occur, but it's the only place where we know where have all the
insertions have completed.
This value is useful in WALReadFromBuffers, which can verify that
callers don't try to read past what has been inserted. (However, more
infrastructure is needed in order to actually use WAL after the flush
point, since it could be lost.)
The value is also useful in WaitXLogInsertionsToFinish() itself, since
we can now exit quickly when all WAL has been already inserted, without
even having to take any locks.
This introduces a bump MemoryContext type. The bump context is best
suited for short-lived memory contexts which require only allocations
of memory and never a pfree or repalloc, which are unsupported.
Memory palloc'd into a bump context has no chunk header. This makes
bump a useful context type when lots of small allocations need to be
done without any need to pfree those allocations. Allocation sizes are
rounded up to the next MAXALIGN boundary, so with this and no chunk
header, allocations are very compact indeed.
Allocations are also very fast as bump does not check any freelists to
try and make use of previously free'd chunks. It just checks if there
is enough room on the current block, and if so it bumps the freeptr
beyond this chunk and returns the value that the freeptr was previously
pointing to. Simple and fast. A new block is malloc'd when there's not
enough space in the current block.
Code using the bump allocator must take care never to call any functions
which could try to call realloc() (or variants), pfree(),
GetMemoryChunkContext() or GetMemoryChunkSpace() on a bump allocated
chunk. Due to lack of chunk headers, these operations are unsupported.
To increase the chances of catching such issues, when compiled with
MEMORY_CONTEXT_CHECKING, bump allocated chunks are given a header and
any attempt to perform an unsupported operation will result in an ERROR.
Without MEMORY_CONTEXT_CHECKING, code attempting an unsupported
operation could result in a segfault.
A follow-on commit will implement the first user of bump.
Author: David Rowley
Reviewed-by: Nathan Bossart
Reviewed-by: Matthias van de Meent
Reviewed-by: Tomas Vondra
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAApHDvqGSpCU95TmM=Bp=6xjL_nLys4zdZOpfNyWBk97Xrdj2w@mail.gmail.com
Reserve 4 bits for MemoryContextMethodID rather than 3. 3 bits did
technically allow a maximum of 8 memory context types, however, we've
opted to reserve some bit patterns which left us with only 4 slots, all
of which were used.
Here we add another bit which frees up 8 slots for future memory context
types.
In passing, adjust the enum names in MemoryContextMethodID to make it
more clear which ones can be used and which ones are reserved.
Author: Matthias van de Meent, David Rowley
Discussion: https://postgr.es/m/CAApHDvqGSpCU95TmM=Bp=6xjL_nLys4zdZOpfNyWBk97Xrdj2w@mail.gmail.com
Until now, when calling pq_putmessage to write new data to a libpq
socket, all writes are copied into a buffer and that buffer gets flushed
when full to avoid having to perform small writes to the socket.
There are cases where we must write large amounts of data to the socket,
sometimes larger than the size of the buffer. In this case, it's
wasteful to memcpy this data into the buffer and flush it out, instead,
we can send it directly from the memory location that the data is already
stored in.
Here we adjust internal_putbytes() so that after having just flushed the
buffer to the socket, if the remaining bytes to send is as big or bigger
than the buffer size, we just send directly rather than needlessly
copying into the PqSendBuffer buffer first.
Examples of operations that write large amounts of data in one message
are; outputting large tuples with SELECT or COPY TO STDOUT and
pg_basebackup.
Author: Melih Mutlu
Reviewed-by: Heikki Linnakangas
Reviewed-by: Jelte Fennema-Nio
Reviewed-by: David Rowley
Reviewed-by: Ranier Vilela
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/CAGPVpCR15nosj0f6xe-c2h477zFR88q12e6WjEoEZc8ZYkTh3Q@mail.gmail.com
Until now, heapgetpage()'s loop over all tuples performed some conditional
checks for each tuple, even though condition did not change across the loop.
This commit fixes that by moving the loop into an inline function. By calling
it with different constant arguments, the compiler can generate an optimized
loop for the different conditions, at the price of two per-page checks.
For cases of all-visible tables and an isolation level other than
serializable, speedups of up to 25% have been measured.
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Zhang Mingli <zmlpostgres@gmail.com>
Tested-by: Quan Zongliang <quanzongliang@yeah.net>
Discussion: https://postgr.es/m/20230716015656.xjvemfbp5fysjiea@awork3.anarazel.de
Discussion: https://postgr.es/m/2ef7ff1b-3d18-2283-61b1-bbd25fc6c7ce@yeah.net
Commit 792752af4e added infrastructure for using AVX-512 intrinsic
functions, and this commit uses that infrastructure to optimize
visibilitymap_count(). Specificially, a new pg_popcount_masked()
function is introduced that applies a bitmask to every byte in the
buffer prior to calculating the population count, which is used to
filter out the all-visible or all-frozen bits as needed. Platforms
without AVX-512 support should also see a nice speedup due to the
reduced number of calls to a function pointer.
Co-authored-by: Ants Aasma
Discussion: https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com
Presently, pg_popcount() processes data in 32-bit or 64-bit chunks
when possible. Newer hardware that supports AVX-512 instructions
can use 512-bit chunks, which provides a nice speedup, especially
for larger buffers. This commit introduces the infrastructure
required to detect compiler and CPU support for the required
AVX-512 intrinsic functions, and it adds a new pg_popcount()
implementation that uses these functions. If CPU support for this
optimized implementation is detected at runtime, a function pointer
is updated so that it is used by subsequent calls to pg_popcount().
Most of the existing in-tree calls to pg_popcount() should benefit
from these instructions, and calls with smaller buffers should at
least not regress compared to v16. The new infrastructure
introduced by this commit can also be used to optimize
visibilitymap_count(), but that is left for a follow-up commit.
Co-authored-by: Paul Amonson, Ants Aasma
Reviewed-by: Matthias van de Meent, Tom Lane, Noah Misch, Akash Shankaran, Alvaro Herrera, Andres Freund, David Rowley
Discussion: https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com
When we determine that a wanted block can't be combined with the current
pending read, it's time to start that read to get it out of the way. An
"if" in that code path should have been a "while", because it might take
more than one go in case of partial reads. This was only broken for
smaller ranges, as the more common case of io_combine_limit-sized ranges
is handled earlier in the code and knows how to loop, hiding the bug for
a while.
Discovered while testing large parallel sequential scans of partially
cached tables. The ramp-up-and-down block allocator for parallel scans
could hit the problem case and skip some blocks near the end that should
have been streamed.
Defect in commit b5a9b18c.
Discussion: https://postgr.es/m/CA%2BhUKG%2Bh8Whpv0YsJqjMVkjYX%2B80fTVc6oi-V%2BzxJvykLpLHYQ%40mail.gmail.com
The buildfarm members using debug_parallel_query = regress are mostly
unhappy with this test. I guess what is happening is that rows
generated by a parallel worker are buffered, and might or might not
get to the leader before the expected error occurs. We did not see
any variability in the old version of this test because each FETCH
would succeed or fail atomically, leading to a predictable number of
rows emitted before failure. I don't find this to be a bug, just
unspecified behavior, so let's disable parallel query for this one
test case to make the results stable.
Formerly this was done with a cursor, which is problematic since
not all result-set-returning query types can be put into a cursor.
The new implementation is better integrated into other psql
features, too.
Daniel Vérité, reviewed by Laurenz Albe and myself (and whacked
around a bit by me, so any remaining bugs are my fault)
Discussion: https://postgr.es/m/CAKZiRmxsVTkO928CM+-ADvsMyePmU3L9DQCa9NwqjvLPcEe5QA@mail.gmail.com
This patch generalizes libpq's existing single-row mode to allow
individual partial-result PGresults to contain up to N rows, rather
than always one row. This reduces malloc overhead compared to plain
single-row mode, and it is very useful for psql's FETCH_COUNT feature,
since otherwise we'd have to add code (and cycles) to either merge
single-row PGresults into a bigger one or teach psql's
results-printing logic to accept arrays of PGresults.
To avoid API breakage, PQsetSingleRowMode() remains the same, and we
add a new function PQsetChunkedRowsMode() to invoke the more general
case. Also, PGresults obtained the old way continue to carry the
PGRES_SINGLE_TUPLE status code, while if PQsetChunkedRowsMode() is
used then their status code is PGRES_TUPLES_CHUNK. The underlying
logic is the same either way, though.
Daniel Vérité, reviewed by Laurenz Albe and myself (and whacked
around a bit by me, so any remaining bugs are my fault)
Discussion: https://postgr.es/m/CAKZiRmxsVTkO928CM+-ADvsMyePmU3L9DQCa9NwqjvLPcEe5QA@mail.gmail.com
BitmapAdjustPrefetchIterator() only used the blockno member of the
passed in TBMIterateResult to ensure that the prefetch iterator and
regular iterator stay in sync. Pass it the BlockNumber only, so that we
can move away from using the TBMIterateResult outside of table AM
specific code.
Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
As of 7c70996ebf, BitmapPrefetch() used the recheck flag for
the current block to determine whether or not it should skip prefetching
the proposed prefetch block. As explained in the comment, this assumed
the index AM will report the same recheck value for the future page as
it did for the current page - but there's no guarantee.
This only affects prefetching - if the recheck flag changes, we may
prefetch blocks unecessarily and not prefetch blocks that will be
needed. But we don't need to rely on that assumption - we know the
recheck flag for the block we're considering prefetching, so we can
use that.
The impact is very limited in practice - the opclass would need to
assign different recheck flags to different blocks, but none of the
built-in opclasses seems to do that.
Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Tom Lane
Discussion: https://postgr.es/m/1939305.1712415547%40sss.pgh.pa.us
Commit 7c70996ebf introduced an optimization to allow bitmap
scans to operate like index-only scans by not fetching a block from the
heap if none of the underlying data is needed and the block is marked
all visible in the visibility map.
With the introduction of table AMs, a FIXME was added to this code
indicating that the skip_fetch logic should be pushed into the table
AM-specific code, as not all table AMs may use a visibility map in the
same way.
This commit resolves this FIXME for the current block. The layering
violation is still present in BitmapHeapScans's prefetching code, which
uses the visibility map to decide whether or not to prefetch a block.
However, this can be addressed independently.
Author: Melanie Plageman
Reviewed-by: Andres Freund, Heikki Linnakangas, Tomas Vondra, Mark Dilger
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
This new DDL command splits a single partition into several parititions.
Just like ALTER TABLE ... MERGE PARTITIONS ... command, new patitions are
created using createPartitionTable() function with parent partition as the
template.
This commit comprises quite naive implementation which works in single process
and holds the ACCESS EXCLUSIVE LOCK on the parent table during all the
operations including the tuple routing. This is why this new DDL command
can't be recommended for large partitioned tables under a high load. However,
this implementation come in handy in certain cases even as is.
Also, it could be used as a foundation for future implementations with lesser
locking and possibly parallel.
Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru
Author: Dmitry Koval
Reviewed-by: Matthias van de Meent, Laurenz Albe, Zhihong Yu, Justin Pryzby
Reviewed-by: Alvaro Herrera, Robert Haas, Stephane Tachoires