Run pgindent, pgperltidy, and reformat-dat-files.
This set of diffs is a bit larger than typical. We've updated to
pg_bsd_indent 2.1.2, which properly indents variable declarations that
have multi-line initialization expressions (the continuation lines are
now indented one tab stop). We've also updated to perltidy version
20230309 and changed some of its settings, which reduces its desire to
add whitespace to lines to make assignments etc. line up. Going
forward, that should make for fewer random-seeming changes to existing
code.
Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
In commit 9df8f903e I (tgl) switched join_is_removable() from
using the min relid sets of the join under consideration to
using its full syntactic relid sets. This was a mistake,
as it allowed join removal in cases where a reference to the
join output would survive in some syntactically-lower join
condition. Revert to the former coding.
Richard Guo
Discussion: https://postgr.es/m/CAMbWs4-EU9uBGSP7G-iTwLBhRQ=rnZKvFDhD+n+xhajokyPCKg@mail.gmail.com
The idea of EvalPlanQual is that we replace the query's scan of the
result relation with a single injected tuple, and see if we get a
tuple out, thereby implying that the injected tuple still passes the
query quals. (In join cases, other relations in the query are still
scanned normally.) This logic was not updated when commit 86dc90056
made it possible for a single DML query plan to have multiple result
relations, when the query target relation has inheritance or partition
children. We replaced the output for the current result relation
successfully, but other result relations were still scanned normally;
thus, if any other result relation contained a tuple satisfying the
quals, we'd think the EPQ check passed, even if it did not pass for
the injected tuple itself. This would lead to update or delete
actions getting performed when they should have been skipped due to
a conflicting concurrent update in READ COMMITTED isolation mode.
Fix by blocking all sibling result relations from emitting tuples
during an EvalPlanQual recheck. In the back branches, the fix is
complicated a bit by the need to not change the size of struct
EPQState (else we'd have ABI-breaking changes in offsets in
struct ModifyTableState). Like the back-patches of 3f7836ff6
and 4b3e37993, add a separately palloc'd struct to avoid that.
The logic is the same as in HEAD otherwise.
This is only a live bug back to v14 where 86dc90056 came in.
However, I chose to back-patch the test cases further, on the
grounds that this whole area is none too well tested. I skipped
doing so in v11 though because none of the test applied cleanly,
and it didn't quite seem worth extra work for a branch with only
six months to live.
Per report from Ante Krešić (via Aleksander Alekseev)
Discussion: https://postgr.es/m/CAJ7c6TMBTN3rcz4=AjYhLPD_w3FFT0Wq_C15jxCDn8U4tZnH1g@mail.gmail.com
Should a hash join exceed memory limit, the hashtable is split up into
multiple batches. The number of batches is doubled each time a given
batch is determined not to fit in memory. Each batch file is allocated
with a block-sized buffer for buffering tuples and parallel hash join
has additional sharedtuplestore accessor buffers.
In some pathological cases requiring a lot of batches, often with skewed
data, bad stats, or very large datasets, users can run out-of-memory
solely from the memory overhead of all the batch files' buffers.
Batch files were allocated in the ExecutorState memory context, making
it very hard to identify when this batch explosion was the source of an
OOM. This commit allocates the batch files in a dedicated memory
context, making it easier to identify the cause of an OOM and work to
avoid it.
Based on initial draft by Tomas Vondra, with significant reworks and
improvements by Jehan-Guillaume de Rorthais.
Author: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
Author: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/20190421114618.z3mpgmimc3rmubi4@development
Discussion: https://postgr.es/m/20230504193006.1b5b9622%40karst#273020ff4061fc7a2fbb1ba96b281f17
Add a high level description of our implementation of the hybrid hash
join algorithm to the block comment in nodeHashjoin.c.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
Discussion: https://postgr.es/m/20230516160051.4267a800%40karst
Most (older) comments in md.c and smgr.c are indented with a leading
tab on all lines, which isn't the current style and makes updating the
comments a bit annoying. This reindents all these lines with a single
space, as is the normal style. This issue exists in various shapes
throughout the code but it's pretty consistent here, and since there
is a patch pending to refresh some of the comments in these files, it
seems sensible to clean this up here separately.
Discussion: https://www.postgresql.org/message-id/flat/22fed8ba-01c3-2008-a256-4ea912d68fab%40enterprisedb.com
Non-leaf pages of GiST indexes contain key attributes, leaf pages
contain both key and non-key attributes, and gist_page_items() ignored
the handling of non-key attributes. This caused a few problems when
using gist_page_items() on a GiST index with INCLUDE:
- On a non-leaf page, the function would crash.
- On a leaf page, the function would work, but miss to display all the
values for included attributes.
This commit fixes gist_page_items() to handle such cases in a more
appropriate way, and now displays the values of key and non-key
attributes for each item separately in a style consistent with what
ruleutils.c would generate for the attribute list, depending on the page
type dealt with. In a way similar to how a record is displayed, values
would be double-quoted for key or non-key attributes if required.
ruleutils.c did not provide a routine able to control if non-key
attributes should be displayed, so an extended() routine for index
definitions is added to work around the leaf and non-leaf page
differences.
While on it, this commit fixes a third problem related to the amount of
data reported for key attributes. The code originally relied on
BuildIndexValueDescription() (used for error reports on constraints)
that would not print all the data stored in the index but the index
opclass's input type, so this limited the amount of information
available. This switch makes gist_page_items() much cheaper as there is
no need to run ACL checks for each item printed, which is not an issue
anyway as superuser rights are required to execute the functions of
pageinspect. Opclasses whose data cannot be displayed can rely on
gist_page_items_bytea().
The documentation of this function was slightly incorrect for the
output results generated on HEAD and v15, so adjust it on these
branches.
Author: Alexander Lakhin, Michael Paquier
Discussion: https://postgr.es/m/17884-cb8c326522977acb@postgresql.org
Backpatch-through: 14
BRIN indexes did not properly distinguish between summaries for empty
(no rows) and all-NULL ranges, treating them as essentially the same
thing. Summaries were initialized with allnulls=true, and opclasses
simply reset allnulls to false when processing the first non-NULL value.
This however produces incorrect results if the range starts with a NULL
value (or a sequence of NULL values), in which case we forget the range
contains NULL values when adding the first non-NULL value.
This happens because the allnulls flag is used for two separate
purposes - to mark empty ranges (not representing any rows yet) and
ranges containing only NULL values.
Opclasses don't know which of these cases it is, and so don't know
whether to set hasnulls=true. Setting the flag in both cases would make
it correct, but it would also make BRIN indexes useless for queries with
IS NULL clauses. All ranges start empty (and thus allnulls=true), so all
ranges would end up with either allnulls=true or hasnulls=true.
The severity of the issue is somewhat reduced by the fact that it only
happens when adding values to an existing summary with allnulls=true.
This can happen e.g. for small tables (because a summary for the first
range exists for all BRIN indexes), or for tables with large fraction of
NULL values in the indexed columns.
Bulk summarization (e.g. during CREATE INDEX or automatic summarization)
that processes all values at once is not affected by this issue. In this
case the flags were updated in a slightly different way, not forgetting
the NULL values.
To identify empty ranges we use a new flag, stored in an unused bit in
the BRIN tuple header so the on-disk format remains the same. A matching
flag is added to BrinMemTuple, into a 3B gap after bt_placeholder.
That means there's no risk of ABI breakage, although we don't actually
pass the BrinMemTuple to any public API.
We could also skip storing index tuples for empty summaries, but then
we'd have to always process such ranges - even if there are no rows in
large parts of the table (e.g. after a bulk DELETE), it would still
require reading the pages etc. So we store them, but ignore them when
building the bitmap.
Backpatch to 11. The issue exists since BRIN indexes were introduced in
9.5, but older releases are already EOL.
Backpatch-through: 11
Reviewed-by: Justin Pryzby, Matthias van de Meent, Alvaro Herrera
Discussion: https://postgr.es/m/402430e4-7d9d-6cf1-09ef-464d80afff3b@enterprisedb.com
When merging BRIN summaries, union_tuples() did not correctly update the
target hasnulls/allnulls flags. When merging all-NULL summary into a
summary without any NULL values, the result had both flags set to false
(instead of having hasnulls=true).
This happened because the code only considered the hasnulls flags,
ignoring the possibility the source summary has allnulls=true.
Discovered while investigating issues with handling empty BRIN ranges
and handling of NULL values, but it's a separate problem (has nothing to
do with empty ranges).
Fixed by considering both flags on the source summary, and updating the
hasnulls flag on the target summary.
Backpatch to 11. The bug exists since 9.5 (where BRIN indexes were
introduced), but those releases are EOL already.
Discussion: https://postgr.es/m/9d993d0d-e431-2196-9ccc-0554d0e60154%40enterprisedb.com
Pass it the RestrictInfo under consideration, not just the
clause_relids. This should save some trivial amount of
code at the call sites, and it gives us more flexibility
about what clause_is_computable_at() does. There's no
actual functional change here, though.
Discussion: https://postgr.es/m/3564467.1684352557@sss.pgh.pa.us
Fixes memory error in cases where the length of the language name
returned by uloc_getLanguage() is exactly ULOC_LANG_CAPACITY, in which
case the status is set to U_STRING_NOT_TERMINATED_WARNING.
Also check in call sites for other ICU functions that are expected to
return a C string to be safe (no bug is known at these other call
sites).
Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/2098874d-c111-41e4-9063-30bcf135226b@gmail.com
28e626bde0 added the concept of IOOps but neglected to include writeback
operations. ac8d53dae5 added time spent doing these I/O operations. Without
counting writeback, checkpointer write time in the log often differed
substantially from that in pg_stat_io. To fix this, add IOOp IOOP_WRITEBACK
and track writeback in pg_stat_io.
Bumps catversion.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20230419172326.dhgyo4wrrhulovt6%40awork3.anarazel.de
For clarity of review, renaming the function parameter "context" in
ScheduleBufferTagForWriteback() and IssuePendingWritebacks() to
"wb_context" is a separate commit. The next commit adds an "io_context"
parameter and "wb_context" makes it more clear which is which.
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_acc6iL4M3hvOTeztf_ZPpsB3Pqio5aVHgZ5q=Pi3BZKg@mail.gmail.com
After applying outer-join identity 3 in the forward direction,
it was possible for the planner to mistakenly apply a qual clause
from above the two outer joins at the now-lower join level.
This can give the wrong answer, since a value that would get nulled
by the now-upper join might not yet be null.
To fix, when we perform such a transformation, consider that the
now-lower join hasn't really completed the outer join it's nominally
responsible for and thus its relid set should not include that OJ's
relid (nor should its output Vars have that nullingrel bit set).
Instead we add those bits when the now-upper join is performed.
The existing rules for qual placement then suffice to prevent
higher qual clauses from dropping below the now-upper join.
There are a few complications from needing to consider transitive
closures in case multiple pushdowns have happened, but all in all
it's not a very complex patch.
This is all new logic (from 2489d76c4) so no need to back-patch.
The added test cases all have the same results as in v15.
Tom Lane and Richard Guo
Discussion: https://postgr.es/m/0b819232-4b50-f245-1c7d-c8c61bf41827@postgrespro.ru
It seems like the code that these checks are backstopping may have
a few bugs left in it. Use a test-and-elog so that the tests are
performed even in non-assert builds, and so that we get something
more informative than "server closed the connection" on failure.
Committed separately with the idea that eventually we'll revert
this. It might be awhile though.
Discussion: https://postgr.es/m/3014965.1684293045@sss.pgh.pa.us
This is equivalent to a revert of f193883 and fb32748, with the addition
that the declaration of the SQLValueFunction node needs to gain a couple
of node_attr for query jumbling. The performance impact of removing the
function call inlining is proving to be too huge for some workloads
where these are used. A worst-case test case of involving only simple
SELECT queries with a SQL keyword is proving to lead to a reduction of
10% in TPS via pgbench and prepared queries on a high-end machine.
None of the tests I ran back for this set of changes saw such a huge
gap, but Alexander Lakhin and Andres Freund have found that this can be
noticeable. Keeping the older performance would mean to do more
inlining in the executor when using COERCE_SQL_SYNTAX for a function
expression, similarly to what SQLValueFunction does. This requires more
redesign work and there is little time until 16beta1 is released, so for
now reverting the change is the best way forward, bringing back the
previous performance.
Bump catalog version.
Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/b32bed1b-0746-9b20-1472-4bdc9ca66d52@gmail.com
The problem that these messages protect against can only occur because
a corrupted hash spill file was written, i.e., a Postgres bug. There's
no reason to have them as translatable.
Backpatch to 15, where these messages were changed by commit c4649cce39.
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/20230510175407.dwa5v477pw62ikyx@alvherre.pgsql
Commit a73952b795 (new in 16) required default values in guc_table.c
and C variable initializers to match. This one only matched when
XLOG_BLCKSZ == 8kB. Fix by using the same expression in both places
with a new DEFAULT_XXX macro, as done for other GUCs.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGLNmLV=VrT==5MqnbARgx2ifRSFtdd8ofdfrdSLL3yv5A@mail.gmail.com
Give the new GUC introduced by d4e71df6 a name that is clearly not
intended for mainstream use quite yet.
Future proposals would drop the prefix only after adding infrastructure
to make it efficient. Having the switch in the tree sooner is good
because it might lead to new discoveries about the hazards awaiting us
on a wide range of systems, but that name was too enticing and could
lead to cross-version confusion in future, per complaints from Noah and
Justin.
Suggested-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> (the idea, not the patch)
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (ditto)
Discussion: https://postgr.es/m/20230430041106.GA2268796%40rfd.leadboat.com
I've had a bee in my bonnet for some time about getting rid of
RestrictInfo.is_pushed_down, because it's squishily defined and
requires not-inexpensive extra tests to use (cf RINFO_IS_PUSHED_DOWN).
In commit 2489d76c4, I tried to make remove_rel_from_query() not
depend on that macro; but the replacement test is buggy,
as exposed by a report from Rushabh Lathia and Robert Haas.
That change was pretty incidental to the main goal of 2489d76c4,
so let's just revert it for now. (Getting rid of is_pushed_down
is still far away, anyway.)
Discussion: https://postgr.es/m/CA+TgmoYco=hmg+iX1CW9Y1_CzNoSL81J03wUG-d2_3=rue+L2A@mail.gmail.com
There was some odd wording in corner-case gram.y error messages "some
error ... at or near", which appears to have been modeled after "syntax
error" messages. However, they don't work that way, and they're just
wrong. They're also uncovered by tests. Remove the trailing words,
and also add tests.
They were introduced with 5a2832465fd8; backpatch to 15.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
An update of the GUC stats_fetch_consistency in a transaction would be
able to trigger an assertion when doing cache->snapshot. In this case,
when retrieving a pgstat entry after the switch, a new snapshot would be
rebuilt, confusing pgstat_build_snapshot() because a snapshot is already
cached with an unexpected mode ("cache").
In order to fix this problem, this commit adds a flag to force a
snapshot clear each time this GUC is changed. Some tests are added to
check, while on it.
Some optimizations in avoiding the snapshot clear should be possible
depending on what is cached and the current GUC value, I guess, but this
solution is simple, and ensures that the state of the cache is updated
each time a new pgstat entry is fetched, hence being consistent with the
level wanted by the client that has set the GUC.
Note that cache->none and snapshot->none would not cause issues, as
fetching a pgstat entry would be retrieved from shared memory on the
second attempt, however a snapshot would still be cached. Similarly,
none->snapshot and none->cache would build a new snapshot on the second
fetch attempt. Finally, snapshot->cache would cache a new snapshot on
the second attempt.
Reported-by: Alexander Lakhin
Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/17804-2a118cd046f2d0e5@postgresql.org
backpatch-through: 15
The callback function pa_shutdown() accesses MyLogicalRepWorker which may
not be initialized if there is an error during the initialization of the
parallel apply worker. The other problem is that by the time it is invoked
even after the initialization of the worker, the MyLogicalRepWorker will
be reset by another callback logicalrep_worker_onexit. So, it won't have
the required information.
To fix this, register the shutdown callback after we are attached to the
worker slot.
After this fix, we observed another issue which is that sometimes the
leader apply worker tries to receive the message from the error queue that
might already be detached by the parallel apply worker leading to an
error. To prevent such an error, we ensure that the leader apply worker
detaches from the parallel apply worker's error queue before stopping it.
Reported-by: Sawada Masahiko
Author: Hou Zhijie
Reviewed-by: Sawada Masahiko, Amit Kapila
Discussion: https://postgr.es/m/CAD21AoDo+yUwNq6nTrvE2h9bB2vZfcag=jxWc7QxuWCmkDAqcA@mail.gmail.com
If an SRF in the FROM clause references a table having row-level
security policies, and we inline that SRF into the calling query,
we neglected to mark the plan as potentially dependent on which
role is executing it. This could lead to later executions in the
same session returning or hiding rows that should have been hidden
or returned instead.
Our thanks to Wolfgang Walther for reporting this problem.
Stephen Frost and Tom Lane
Security: CVE-2023-2455
The two methods don't cooperate, so set_config_option("search_path",
...) has been ineffective under non-empty overrideStack. This defect
enabled an attacker having database-level CREATE privilege to execute
arbitrary code as the bootstrap superuser. While that particular attack
requires v13+ for the trusted extension attribute, other attacks are
feasible in all supported versions.
Standardize on the combination of NewGUCNestLevel() and
set_config_option("search_path", ...). It is newer than
PushOverrideSearchPath(), more-prevalent, and has no known
disadvantages. The "override" mechanism remains for now, for
compatibility with out-of-tree code. Users should update such code,
which likely suffers from the same sort of vulnerability closed here.
Back-patch to v11 (all supported versions).
Alexander Lakhin. Reported by Alexander Lakhin.
Security: CVE-2023-2454
This wait event was documented as "CommitTsBuffer" since its
introduction, but the code named it "CommitTSBuffer". This commit fixes
the code to follow the term documented, which is also more consistent
with the naming of the other wait events used for commit timestamps.
Introduced by 5da1493.
Author: Alexander Lakhin
Discussion: https://postgr.es/m/e8c38840-596a-83d6-bd8d-cebc51111572@gmail.com
Backpatch-through: 13
RI_Initial_Check was setting up a list of RTEPermissionInfo for
ExecCheckPermissions() wrong, and the problem is subtle enough that it
doesn't have any immediate effect in core code. However, if an
extension is using the ExecutorCheckPerms_hook, then it would get the
wrong parameters and perhaps arrive at a wrong conclusion, or outright
malfunction. Fix by constructing that list and the RTE list more
honestly.
We also add an assertion check to verify that these lists match. This
new assertion would have caught this bug.
Co-authored-by: Олег Целебровский (Oleg Tselebrovskii) <o.tselebrovskiy@postgrespro.ru>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/3722b7a2cbe27a1796ee40824bd86dd1@postgrespro.ru
These functions incautiously fetched the array's first lower bound
even when the array is zero-dimensional, thus fetching the word
after the allocated array space. While almost always harmless,
with very bad luck this could result in SIGSEGV. Fix by adding
an early exit for empty input.
Per bug #17920 from Alexander Lakhin.
Discussion: https://postgr.es/m/17920-f7c228c627b6d02e%40postgresql.org
During exit, the logical replication apply worker tries to release session
level locks, if any. However, if the apply worker exits due to an error
before its connection is initialized, trying to release locks can lead to
assertion failure. The locks will be acquired once the worker is
initialized, so we don't need to release them till the worker
initialization is complete.
Reported-by: Alexander Lakhin
Author: Hou Zhijie based on inputs from Sawada Masahiko and Amit Kapila
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/2185d65f-5aae-3efa-c48f-fb42b173ef5c@gmail.com
CREATE SCHEMA AUTHORIZATION with appended schema elements can lead to
crashes when comparing the schema name of the query with the schemas
used in the qualification of some clauses in the elements' queries.
The origin of the problem is that the transformation routine for the
elements listed in a CREATE SCHEMA query uses as new, expected, schema
name the one listed in CreateSchemaStmt itself. However, depending on
the query, CreateSchemaStmt.schemaname may be NULL, being computed
instead from the role specification of the query given by the
AUTHORIZATION clause, that could be either:
- A user name string, with the new schema name being set to the same
value as the role given.
- Guessed from CURRENT_ROLE, SESSION_ROLE or CURRENT_ROLE, with a new
schema name computed from the security context where CREATE SCHEMA is
running.
Regression tests are added for CREATE SCHEMA with some appended elements
(some of them with schema qualifications), covering also some role
specification patterns.
While on it, this simplifies the context structure used during the
transformation of the elements listed in a CREATE SCHEMA query by
removing the fields for the role specification and the role type. They
were not used, and for the role specification this could be confusing as
the schema name may by extracted from that at the beginning of
CreateSchemaCommand().
This issue exists for a long time, so backpatch down to all the versions
supported.
Reported-by: Song Hongyu
Author: Michael Paquier
Reviewed-by: Richard Guo
Discussion: https://postgr.es/m/17909-f65c12dfc5f0451d@postgresql.org
Backpatch-through: 11
Commit 7d71d3dd08 changed resetting the VacuumFailsafeActive flag to an
assertion since the flag is reset before starting vacuuming a relation.
This however failed to take recursive calls of vacuum_rel() and vacuum
of TOAST tables into consideration. Fix by reverting back to resettting
the flag.
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reported-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/CAFBsxsFz=GqaG5Ens5aNgVYoV2Y+pfMUijX0ku+CCkWfALwiqg@mail.gmail.com
The call to XLogGetReplicationSlotMinimumLSN() might return a
greater LSN than the one given to the function. Subsequent segment
number calculations might then underflow, which could result in
unexpected behavior when removing or recyling WAL files. This was
introduced with max_slot_wal_keep_size in c655077639. To fix, skip
the block of code for replication slots if the LSN is greater.
Reported-by: Xu Xingwang
Author: Kyotaro Horiguchi
Reviewed-by: Junwang Zhao
Discussion: https://postgr.es/m/17903-4288d439dee856c6%40postgresql.org
Backpatch-through: 13