Commit Graph

10961 Commits

Author SHA1 Message Date
Tom Lane 991a3df227 Fix filtering of "cloned" outer-join quals some more.
We've had multiple issues with the clause_is_computable_at logic that
I introduced in 2489d76c4: it's been known to accept more than one
clone of the same qual at the same plan node, and also to accept no
clones at all.  It's looking impractical to get it 100% right on the
basis of the currently-stored information, so fix it by introducing a
new RestrictInfo field "incompatible_relids" that explicitly shows
which outer joins a given clone mustn't be pushed above.

In principle we could populate this field in every RestrictInfo, but
that would cost space and there doesn't presently seem to be a need
for it in general.  Also, while deconstruct_distribute_oj_quals can
easily fill the field with the remaining members of the commutative
join set that it's considering, computing it in the general case
seems again pretty complicated.  So for now, just fill it for
clone quals.

Along the way, fix a bug that may or may not be only latent:
equivclass.c was generating replacement clauses with is_pushed_down
and has_clone/is_clone markings that didn't match their
required_relids.  This led me to conclude that leaving the clone flags
out of make_restrictinfo's purview wasn't such a great idea after all,
so add them.

Per report from Richard Guo.

Discussion: https://postgr.es/m/CAMbWs48EYi_9-pSd0ORes1kTmTeAjT4Q3gu49hJtYCbSn2JyeA@mail.gmail.com
2023-05-25 10:28:33 -04:00
Andres Freund bc971f4025 Optimize walsender wake up logic using condition variables
WalSndWakeup() currently loops through all the walsenders slots, with a
spinlock acquisition and release for every iteration, to wake up waiting
walsenders.

This commonly was not a problem before e101dfac3a. But, to allow logical
decoding on standbys, we need to wake up logical walsenders after every WAL
record is applied on the standby, rather just when flushing WAL or switching
timelines.  This causes a performance regression for workloads replaying a lot
of WAL records.

To solve this, we use condition variable (CV) to efficiently wake up
walsenders in WalSndWakeup().

Every walsender prepares to sleep on a shared memory CV. Note that it just
prepares to sleep on the CV (i.e., adds itself to the CV's waitlist), but does
not actually wait on the CV (IOW, it never calls ConditionVariableSleep()). It
still uses WaitEventSetWait() for waiting, because CV infrastructure doesn't
handle FeBe socket events currently. The processes (startup process,
walreceiver etc.)  wanting to wake up walsenders use
ConditionVariableBroadcast(), which in turn calls SetLatch(), helping
walsenders come out of WaitEventSetWait().

We use separate shared memory CVs for physical and logical walsenders for
selective wake ups, see WalSndWakeup() for more details.

This approach is simple and reasonably efficient. But not very elegant. But
for 16 it seems to be a better path than a larger redesign of the CV
mechanism.  A desirable future improvement would be to add support for CVs
into WaitEventSetWait().

This still leaves us with a small regression in very extreme workloads (due to
the spinlock acquisition in ConditionVariableBroadcast() when there are no
waiters) - but that seems acceptable.

Reported-by: Andres Freund <andres@anarazel.de>
Suggested-by: Andres Freund <andres@anarazel.de>
Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Discussion: https://www.postgresql.org/message-id/20230509190247.3rrplhdgem6su6cg%40awork3.anarazel.de
2023-05-21 09:44:55 -07:00
Tom Lane a2eb99a01e Expand some more uses of "deleg" to "delegation" or "delegated".
Complete the task begun in 9c0a0e2ed: we don't want to use the
abbreviation "deleg" for GSS delegation in any user-visible places.
(For consistency, this also changes most internal uses too.)

Abhijit Menon-Sen and Tom Lane

Discussion: https://postgr.es/m/949048.1684639317@sss.pgh.pa.us
2023-05-21 10:55:18 -04:00
Bruce Momjian 9c0a0e2ed9 rename "gss_accept_deleg" to "gss_accept_delegation".
This is more consistent with existing GUC spelling.

Discussion: https://postgr.es/m/ZGdnEsGtNj7+fZoa@momjian.us
2023-05-20 21:32:54 -04:00
Tom Lane 0245f8db36 Pre-beta mechanical code beautification.
Run pgindent, pgperltidy, and reformat-dat-files.

This set of diffs is a bit larger than typical.  We've updated to
pg_bsd_indent 2.1.2, which properly indents variable declarations that
have multi-line initialization expressions (the continuation lines are
now indented one tab stop).  We've also updated to perltidy version
20230309 and changed some of its settings, which reduces its desire to
add whitespace to lines to make assignments etc. line up.  Going
forward, that should make for fewer random-seeming changes to existing
code.

Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
2023-05-19 17:24:48 -04:00
Tom Lane 722541ead1 Do pre-release housekeeping on catalog data.
Run renumber_oids.pl to move high-numbered OIDs down, as per pre-beta
tasks specified by RELEASE_CHANGES.  For reference, the command was

./renumber_oids.pl --first-mapped-oid 8000 --target-oid 6200
2023-05-19 16:36:38 -04:00
Tom Lane 70b42f2790 Fix misbehavior of EvalPlanQual checks with multiple result relations.
The idea of EvalPlanQual is that we replace the query's scan of the
result relation with a single injected tuple, and see if we get a
tuple out, thereby implying that the injected tuple still passes the
query quals.  (In join cases, other relations in the query are still
scanned normally.)  This logic was not updated when commit 86dc90056
made it possible for a single DML query plan to have multiple result
relations, when the query target relation has inheritance or partition
children.  We replaced the output for the current result relation
successfully, but other result relations were still scanned normally;
thus, if any other result relation contained a tuple satisfying the
quals, we'd think the EPQ check passed, even if it did not pass for
the injected tuple itself.  This would lead to update or delete
actions getting performed when they should have been skipped due to
a conflicting concurrent update in READ COMMITTED isolation mode.

Fix by blocking all sibling result relations from emitting tuples
during an EvalPlanQual recheck.  In the back branches, the fix is
complicated a bit by the need to not change the size of struct
EPQState (else we'd have ABI-breaking changes in offsets in
struct ModifyTableState).  Like the back-patches of 3f7836ff6
and 4b3e37993, add a separately palloc'd struct to avoid that.
The logic is the same as in HEAD otherwise.

This is only a live bug back to v14 where 86dc90056 came in.
However, I chose to back-patch the test cases further, on the
grounds that this whole area is none too well tested.  I skipped
doing so in v11 though because none of the test applied cleanly,
and it didn't quite seem worth extra work for a branch with only
six months to live.

Per report from Ante Krešić (via Aleksander Alekseev)

Discussion: https://postgr.es/m/CAJ7c6TMBTN3rcz4=AjYhLPD_w3FFT0Wq_C15jxCDn8U4tZnH1g@mail.gmail.com
2023-05-19 14:26:40 -04:00
Tomas Vondra 8c4040edf4 Allocate hash join files in a separate memory context
Should a hash join exceed memory limit, the hashtable is split up into
multiple batches. The number of batches is doubled each time a given
batch is determined not to fit in memory. Each batch file is allocated
with a block-sized buffer for buffering tuples and parallel hash join
has additional sharedtuplestore accessor buffers.

In some pathological cases requiring a lot of batches, often with skewed
data, bad stats, or very large datasets, users can run out-of-memory
solely from the memory overhead of all the batch files' buffers.

Batch files were allocated in the ExecutorState memory context, making
it very hard to identify when this batch explosion was the source of an
OOM. This commit allocates the batch files in a dedicated memory
context, making it easier to identify the cause of an OOM and work to
avoid it.

Based on initial draft by Tomas Vondra, with significant reworks and
improvements by Jehan-Guillaume de Rorthais.

Author: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
Author: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by:  Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/20190421114618.z3mpgmimc3rmubi4@development
Discussion: https://postgr.es/m/20230504193006.1b5b9622%40karst#273020ff4061fc7a2fbb1ba96b281f17
2023-05-19 17:17:58 +02:00
Peter Eisentraut 803b4a26ca Remove stray mid-sentence tabs in comments 2023-05-19 16:13:16 +02:00
Michael Paquier e7bff46e50 pageinspect: Fix gist_page_items() with included columns
Non-leaf pages of GiST indexes contain key attributes, leaf pages
contain both key and non-key attributes, and gist_page_items() ignored
the handling of non-key attributes.  This caused a few problems when
using gist_page_items() on a GiST index with INCLUDE:
- On a non-leaf page, the function would crash.
- On a leaf page, the function would work, but miss to display all the
values for included attributes.

This commit fixes gist_page_items() to handle such cases in a more
appropriate way, and now displays the values of key and non-key
attributes for each item separately in a style consistent with what
ruleutils.c would generate for the attribute list, depending on the page
type dealt with.  In a way similar to how a record is displayed, values
would be double-quoted for key or non-key attributes if required.

ruleutils.c did not provide a routine able to control if non-key
attributes should be displayed, so an extended() routine for index
definitions is added to work around the leaf and non-leaf page
differences.

While on it, this commit fixes a third problem related to the amount of
data reported for key attributes.  The code originally relied on
BuildIndexValueDescription() (used for error reports on constraints)
that would not print all the data stored in the index but the index
opclass's input type, so this limited the amount of information
available.  This switch makes gist_page_items() much cheaper as there is
no need to run ACL checks for each item printed, which is not an issue
anyway as superuser rights are required to execute the functions of
pageinspect.  Opclasses whose data cannot be displayed can rely on
gist_page_items_bytea().

The documentation of this function was slightly incorrect for the
output results generated on HEAD and v15, so adjust it on these
branches.

Author: Alexander Lakhin, Michael Paquier
Discussion: https://postgr.es/m/17884-cb8c326522977acb@postgresql.org
Backpatch-through: 14
2023-05-19 12:37:58 +09:00
Tomas Vondra 3581cbdcd6 Fix handling of empty ranges and NULLs in BRIN
BRIN indexes did not properly distinguish between summaries for empty
(no rows) and all-NULL ranges, treating them as essentially the same
thing. Summaries were initialized with allnulls=true, and opclasses
simply reset allnulls to false when processing the first non-NULL value.
This however produces incorrect results if the range starts with a NULL
value (or a sequence of NULL values), in which case we forget the range
contains NULL values when adding the first non-NULL value.

This happens because the allnulls flag is used for two separate
purposes - to mark empty ranges (not representing any rows yet) and
ranges containing only NULL values.

Opclasses don't know which of these cases it is, and so don't know
whether to set hasnulls=true. Setting the flag in both cases would make
it correct, but it would also make BRIN indexes useless for queries with
IS NULL clauses. All ranges start empty (and thus allnulls=true), so all
ranges would end up with either allnulls=true or hasnulls=true.

The severity of the issue is somewhat reduced by the fact that it only
happens when adding values to an existing summary with allnulls=true.
This can happen e.g. for small tables (because a summary for the first
range exists for all BRIN indexes), or for tables with large fraction of
NULL values in the indexed columns.

Bulk summarization (e.g. during CREATE INDEX or automatic summarization)
that processes all values at once is not affected by this issue. In this
case the flags were updated in a slightly different way, not forgetting
the NULL values.

To identify empty ranges we use a new flag, stored in an unused bit in
the BRIN tuple header so the on-disk format remains the same. A matching
flag is added to BrinMemTuple, into a 3B gap after bt_placeholder.
That means there's no risk of ABI breakage, although we don't actually
pass the BrinMemTuple to any public API.

We could also skip storing index tuples for empty summaries, but then
we'd have to always process such ranges - even if there are no rows in
large parts of the table (e.g. after a bulk DELETE), it would still
require reading the pages etc. So we store them, but ignore them when
building the bitmap.

Backpatch to 11. The issue exists since BRIN indexes were introduced in
9.5, but older releases are already EOL.

Backpatch-through: 11
Reviewed-by: Justin Pryzby, Matthias van de Meent, Alvaro Herrera
Discussion: https://postgr.es/m/402430e4-7d9d-6cf1-09ef-464d80afff3b@enterprisedb.com
2023-05-19 01:29:44 +02:00
Tom Lane 8a2523ff35 Tweak API of new function clause_is_computable_at().
Pass it the RestrictInfo under consideration, not just the
clause_relids.  This should save some trivial amount of
code at the call sites, and it gives us more flexibility
about what clause_is_computable_at() does.  There's no
actual functional change here, though.

Discussion: https://postgr.es/m/3564467.1684352557@sss.pgh.pa.us
2023-05-18 10:39:16 -04:00
Andres Freund 093e5c57d5 Add writeback to pg_stat_io
28e626bde0 added the concept of IOOps but neglected to include writeback
operations. ac8d53dae5 added time spent doing these I/O operations. Without
counting writeback, checkpointer write time in the log often differed
substantially from that in pg_stat_io. To fix this, add IOOp IOOP_WRITEBACK
and track writeback in pg_stat_io.

Bumps catversion.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20230419172326.dhgyo4wrrhulovt6%40awork3.anarazel.de
2023-05-17 11:18:35 -07:00
Andres Freund 52676dc2e0 Update parameter name context to wb_context
For clarity of review, renaming the function parameter "context" in
ScheduleBufferTagForWriteback() and IssuePendingWritebacks() to
"wb_context" is a separate commit. The next commit adds an "io_context"
parameter and "wb_context" makes it more clear which is which.

Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_acc6iL4M3hvOTeztf_ZPpsB3Pqio5aVHgZ5q=Pi3BZKg@mail.gmail.com
2023-05-17 11:18:30 -07:00
Alexander Korotkov b9a7a82272 Revert "Add USER SET parameter values for pg_db_role_setting"
This reverts commit 096dd80f3c and its fixups beecbe8e50, afdd9f7f0e,
529da086ba, db93e739ac.

Catversion is bumped.

Discussion: https://postgr.es/m/d46f9265-ff3c-6743-2278-6772598233c2%40pgmasters.net
2023-05-17 20:28:57 +03:00
Tom Lane 9df8f903eb Fix some issues with improper placement of outer join clauses.
After applying outer-join identity 3 in the forward direction,
it was possible for the planner to mistakenly apply a qual clause
from above the two outer joins at the now-lower join level.
This can give the wrong answer, since a value that would get nulled
by the now-upper join might not yet be null.

To fix, when we perform such a transformation, consider that the
now-lower join hasn't really completed the outer join it's nominally
responsible for and thus its relid set should not include that OJ's
relid (nor should its output Vars have that nullingrel bit set).
Instead we add those bits when the now-upper join is performed.
The existing rules for qual placement then suffice to prevent
higher qual clauses from dropping below the now-upper join.
There are a few complications from needing to consider transitive
closures in case multiple pushdowns have happened, but all in all
it's not a very complex patch.

This is all new logic (from 2489d76c4) so no need to back-patch.
The added test cases all have the same results as in v15.

Tom Lane and Richard Guo

Discussion: https://postgr.es/m/0b819232-4b50-f245-1c7d-c8c61bf41827@postgrespro.ru
2023-05-17 11:14:04 -04:00
Michael Paquier d8c3106bb6 Add back SQLValueFunction for SQL keywords
This is equivalent to a revert of f193883 and fb32748, with the addition
that the declaration of the SQLValueFunction node needs to gain a couple
of node_attr for query jumbling.  The performance impact of removing the
function call inlining is proving to be too huge for some workloads
where these are used.  A worst-case test case of involving only simple
SELECT queries with a SQL keyword is proving to lead to a reduction of
10% in TPS via pgbench and prepared queries on a high-end machine.

None of the tests I ran back for this set of changes saw such a huge
gap, but Alexander Lakhin and Andres Freund have found that this can be
noticeable.  Keeping the older performance would mean to do more
inlining in the executor when using COERCE_SQL_SYNTAX for a function
expression, similarly to what SQLValueFunction does.  This requires more
redesign work and there is little time until 16beta1 is released, so for
now reverting the change is the best way forward, bringing back the
previous performance.

Bump catalog version.

Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/b32bed1b-0746-9b20-1472-4bdc9ca66d52@gmail.com
2023-05-17 10:19:17 +09:00
Thomas Munro 63932a6d38 Fix wal_writer_flush_after initializer value.
Commit a73952b795 (new in 16) required default values in guc_table.c
and C variable initializers to match.  This one only matched when
XLOG_BLCKSZ == 8kB.  Fix by using the same expression in both places
with a new DEFAULT_XXX macro, as done for other GUCs.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGLNmLV=VrT==5MqnbARgx2ifRSFtdd8ofdfrdSLL3yv5A@mail.gmail.com
2023-05-15 11:19:54 +12:00
Peter Eisentraut e32701b8d2 initdb: Set collversion for standard collation UNICODE
Since the behavior of the UNICODE collation can change with new
ICU/Unicode versions, we need to apply the versioning mechanism to it.
We do this with an UPDATE command in initdb; this is similar to how we
put the collation version into pg_database already.

Reported-by: Daniel Verite <daniel@manitou-mail.org>
Discussion: https://www.postgresql.org/message-id/49417853-7bdd-4b23-a4e9-04c7aff33821@manitou-mail.org
2023-05-12 10:03:05 +02:00
Michael Paquier 605994651b Fix assertion failure when updating stats_fetch_consistency in a transaction
An update of the GUC stats_fetch_consistency in a transaction would be
able to trigger an assertion when doing cache->snapshot.  In this case,
when retrieving a pgstat entry after the switch, a new snapshot would be
rebuilt, confusing pgstat_build_snapshot() because a snapshot is already
cached with an unexpected mode ("cache").

In order to fix this problem, this commit adds a flag to force a
snapshot clear each time this GUC is changed.  Some tests are added to
check, while on it.

Some optimizations in avoiding the snapshot clear should be possible
depending on what is cached and the current GUC value, I guess, but this
solution is simple, and ensures that the state of the cache is updated
each time a new pgstat entry is fetched, hence being consistent with the
level wanted by the client that has set the GUC.

Note that cache->none and snapshot->none would not cause issues, as
fetching a pgstat entry would be retrieved from shared memory on the
second attempt, however a snapshot would still be cached.  Similarly,
none->snapshot and none->cache would build a new snapshot on the second
fetch attempt.  Finally, snapshot->cache would cache a new snapshot on
the second attempt.

Reported-by: Alexander Lakhin
Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/17804-2a118cd046f2d0e5@postgresql.org
backpatch-through: 15
2023-05-10 11:24:30 +09:00
Amit Kapila 3d144c6c86 Fix invalid memory access during the shutdown of the parallel apply worker.
The callback function pa_shutdown() accesses MyLogicalRepWorker which may
not be initialized if there is an error during the initialization of the
parallel apply worker. The other problem is that by the time it is invoked
even after the initialization of the worker, the MyLogicalRepWorker will
be reset by another callback logicalrep_worker_onexit. So, it won't have
the required information.

To fix this, register the shutdown callback after we are attached to the
worker slot.

After this fix, we observed another issue which is that sometimes the
leader apply worker tries to receive the message from the error queue that
might already be detached by the parallel apply worker leading to an
error. To prevent such an error, we ensure that the leader apply worker
detaches from the parallel apply worker's error queue before stopping it.

Reported-by: Sawada Masahiko
Author: Hou Zhijie
Reviewed-by: Sawada Masahiko, Amit Kapila
Discussion: https://postgr.es/m/CAD21AoDo+yUwNq6nTrvE2h9bB2vZfcag=jxWc7QxuWCmkDAqcA@mail.gmail.com
2023-05-09 09:28:06 +05:30
Alvaro Herrera 5472743d9e
Revert "Move PartitionPruneInfo out of plan nodes into PlannedStmt"
This reverts commit ec38694894 and its fixup 589bb81649.

This change was intended to support query planning avoiding acquisition
of locks on partitions that were going to be pruned; however, the
overall project took a different direction at [1] and this bit is no
longer needed.  Put things back the way they were as agreed in [2], to
avoid unnecessary complexity.

Discussion: [1] https://postgr.es/m/4191508.1674157166@sss.pgh.pa.us
Discussion: [2] https://postgr.es/m/20230502175409.kcoirxczpdha26wt@alvherre.pgsql
2023-05-04 12:09:59 +02:00
Amit Kapila de63f8dade Fix assertion failure in apply worker.
During exit, the logical replication apply worker tries to release session
level locks, if any. However, if the apply worker exits due to an error
before its connection is initialized, trying to release locks can lead to
assertion failure. The locks will be acquired once the worker is
initialized, so we don't need to release them till the worker
initialization is complete.

Reported-by: Alexander Lakhin
Author: Hou Zhijie based on inputs from Sawada Masahiko and Amit Kapila
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/2185d65f-5aae-3efa-c48f-fb42b173ef5c@gmail.com
2023-05-03 10:17:49 +05:30
Michael Paquier 8961cb9a03 Fix typos in comments
The changes done in this commit impact comments with no direct
user-visible changes, with fixes for incorrect function, variable or
structure names.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/e8c38840-596a-83d6-bd8d-cebc51111572@gmail.com
2023-05-02 12:23:08 +09:00
Michael Paquier 4dadd660f0 Fix crashes with CREATE SCHEMA AUTHORIZATION and schema elements
CREATE SCHEMA AUTHORIZATION with appended schema elements can lead to
crashes when comparing the schema name of the query with the schemas
used in the qualification of some clauses in the elements' queries.

The origin of the problem is that the transformation routine for the
elements listed in a CREATE SCHEMA query uses as new, expected, schema
name the one listed in CreateSchemaStmt itself.  However, depending on
the query, CreateSchemaStmt.schemaname may be NULL, being computed
instead from the role specification of the query given by the
AUTHORIZATION clause, that could be either:
- A user name string, with the new schema name being set to the same
value as the role given.
- Guessed from CURRENT_ROLE, SESSION_ROLE or CURRENT_ROLE, with a new
schema name computed from the security context where CREATE SCHEMA is
running.

Regression tests are added for CREATE SCHEMA with some appended elements
(some of them with schema qualifications), covering also some role
specification patterns.

While on it, this simplifies the context structure used during the
transformation of the elements listed in a CREATE SCHEMA query by
removing the fields for the role specification and the role type.  They
were not used, and for the role specification this could be confusing as
the schema name may by extracted from that at the beginning of
CreateSchemaCommand().

This issue exists for a long time, so backpatch down to all the versions
supported.

Reported-by: Song Hongyu
Author: Michael Paquier
Reviewed-by: Richard Guo
Discussion: https://postgr.es/m/17909-f65c12dfc5f0451d@postgresql.org
Backpatch-through: 11
2023-04-28 19:29:12 +09:00
Thomas Munro 828e93a6f2 Remove bogus #include added by d4e71df6d7.
The recently added inclusion of guc.h in smgr.h is not necessary and
introduces more server-related stuff. Removing the directive helps
avoid potential issues with including sgmr.h in frontends.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20230425.115748.2130383825066921512.horikyota.ntt%40gmail.com
2023-04-26 10:43:53 +12:00
Andres Freund 1118cd37eb Remove vacuum_defer_cleanup_age
vacuum_defer_cleanup_age was introduced before hot_standby_feedback and
replication slots existed. It is hard to use reasonably - commonly it will
either be set too low (not preventing recovery conflicts, while still causing
some bloat), or too high (causing a lot of bloat). The alternatives do not
have that issue.

That on its own might not be sufficient reason to remove
vacuum_defer_cleanup_age, but it also complicates computation of xid
horizons. See e.g. the bug fixed in be504a3e97. It also is untested.

This commit removes TransactionIdRetreatSafely(), as there are no users
anymore. There might be potential future users, hence noting that here.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20230317230930.nhsgk3qfk7f4axls@awork3.anarazel.de
2023-04-24 12:21:02 -07:00
Tom Lane fce3b26e97 Rename ExecAggTransReparent, and improve its documentation.
The name of this function suggests that it ought to reparent R/W
expanded objects to be children of the persistent aggcontext, instead
of copying them.  In fact it does no such thing, and if you try to
make it do so you will see multiple regression failures.  Rename it
to the less-misleading ExecAggCopyTransValue, and add commentary
about why that attractive-sounding optimization won't work.  Also
adjust comments at call sites, some of which were describing logic
that has since been moved into ExecAggCopyTransValue.

Discussion: https://postgr.es/m/3004282.1681930251@sss.pgh.pa.us
2023-04-24 13:01:33 -04:00
Michael Paquier 0ecb87e1fa Remove io prefix from pg_stat_io columns
a9c70b46 added the statistics view pg_stat_io which contained columns
"io_context" and "io_object".  Given that the columns are in the
pg_stat_io view, the "io" prefix is somewhat redundant, so remove it.

The code variables referring to these fields are kept unchanged so as
they can keep their context about I/O.

Bump catalog version.

Author: Melanie Plageman
Reviewed-by: Kyotaro Horiguchi, Fabrízio de Royes Mello
Discussion: https://postgr.es/m/CAAKRu_aAQoJWrvT2BYYQvJChFKra_O-5ra3jhzKJZqWsTR1CPQ@mail.gmail.com
2023-04-21 07:21:50 +09:00
Thomas Munro 7d3d72b55e Remove obsolete defense against strxfrm() bugs.
Old versions of Solaris and illumos had buffer overrun bugs in their
strxfrm() implementations.  The bugs were fixed more than a decade ago
and the relevant releases are long out of vendor support.  It's time to
remove the defense added by commit be8b06c3.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA+hUKGJ-ZPJwKHVLbqye92-ZXeLoCHu5wJL6L6HhNP7FkJ=meA@mail.gmail.com
2023-04-20 13:20:14 +12:00
Peter Geoghegan 50547a3fae Fix wal_consistency_checking enhanced desc output.
Recent enhancements to rmgr desc routines that made the output summarize
certain block data (added by commits 7d8219a4 and 1c453cfd) dealt with
records that lack relevant block data (and so have nothing to give a
more detailed summary of) by testing !DecodedBkpBlock.has_image.  As a
result, more detailed descriptions of block data were not output when
wal_consistency_checking was enabled.

This bug affected records with summarizable block data that also
happened to have an FPI that the REDO routine isn't supposed to apply
(FPIs used for consistency checking purposes only).  The presence of
such an FPI was incorrectly taken to indicate the absence of block data.

To fix, test DecodedBkpBlock.has_data, not !DecodedBkpBlock.has_image.
This is the exact condition that we care about, not an inexact proxy.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wzm5Sc9cBg1qWV_cEBfLNJCrW9FjS-SoHVt8FLA7Ldn8yg@mail.gmail.com
2023-04-19 10:42:39 -07:00
David Rowley 3f58a4e296 Fix various typos and incorrect/outdated name references
Author: Alexander Lakhin
Discussion: https://postgr.es/m/699beab4-a6ca-92c9-f152-f559caf6dc25@gmail.com
2023-04-19 13:50:33 +12:00
Peter Geoghegan 06e0652750 Remove useless argument from nbtree dedup function.
_bt_dedup_pass()'s heapRel argument hasn't been needed or used since
commit cf2acaf4dc made deleting any existing LP_DEAD index tuples the
caller's responsibility.
2023-04-18 10:33:15 -07:00
David Rowley eef231e816 Fix some typos and some incorrectly duplicated words
Author: Justin Pryzby
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/ZD3D1QxoccnN8A1V@telsasoft.com
2023-04-18 14:03:49 +12:00
David Rowley b4dbf3e924 Fix various typos
This fixes many spelling mistakes in comments, but a few references to
invalid parameter names, function names and option names too in comments
and also some in string constants

Also, fix an #undef that was undefining the incorrect definition

Author: Alexander Lakhin
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/d5f68d19-c0fc-91a9-118d-7c6a5a3f5fad@gmail.com
2023-04-18 13:23:23 +12:00
Peter Geoghegan cd7cdc550c Fix incorrect comment about nbtree WAL record.
The nbtree VACUUM WAL record stores its page offset number payload in
blk 0 (just like the closely related nbtree DELETE WAL record).  Commit
ebd551f5 fixed a similar issue with the DELETE WAL record, but missed
this one.
2023-04-17 09:58:18 -07:00
Tom Lane d48ac0070c Further cleanup of autoconf output files for GSSAPI changes.
Running autoheader was missed in f7431bca8.  This is cosmetic since
we aren't using these HAVE_ symbols, but let's get everything in
sync while we're looking at this.

Discussion: https://postgr.es/m/2422362.1681741814@sss.pgh.pa.us
2023-04-17 11:21:50 -04:00
Peter Geoghegan d6f0f95a6b Harmonize some more function parameter names.
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in a few places.  These
inconsistencies were all introduced relatively recently, after the code
base had parameter name mismatches fixed in bulk (see commits starting
with commits 4274dc22 and 035ce1fe).

pg_bsd_indent still has a couple of similar inconsistencies, which I
(pgeoghegan) have left untouched for now.

Like all earlier commits that cleaned up function parameter names, this
commit was written with help from clang-tidy.
2023-04-13 10:15:20 -07:00
Stephen Frost 6633cfb216 De-Revert "Add support for Kerberos credential delegation"
This reverts commit 3d03b24c3 (Revert Add support for Kerberos
credential delegation) which was committed on the grounds of concern
about portability, but on further review and discussion, it's clear that
we are better off explicitly requiring MIT Kerberos as that appears to
be the only GSSAPI library currently that's under proper maintenance
and ongoing development.  The API used for storing credentials was added
to MIT Kerberos over a decade ago while for the other libraries which
appear to be mainly based on Heimdal, which exists explicitly to be a
re-implementation of MIT Kerberos, the API never made it to a released
version (even though it was added to the Heimdal git repo over 5 years
ago..).

This post-feature-freeze change was approved by the RMT.

Discussion: https://postgr.es/m/ZDDO6jaESKaBgej0%40tamriel.snowman.net
2023-04-13 08:55:07 -04:00
Alvaro Herrera 9ce04b50e1
Revert "Catalog NOT NULL constraints" and fallout
This reverts commit e056c557ae and minor later fixes thereof.

There's a few problems in this new feature -- most notably regarding
pg_upgrade behavior, but others as well.  This new feature is not in any
way critical on its own, so instead of scrambling to fix it we revert it
and try again in early 17 with these issues in mind.

Discussion: https://postgr.es/m/3801207.1681057430@sss.pgh.pa.us
2023-04-12 19:29:21 +02:00
Michael Paquier a923e21631 Fix detection of unseekable files for fseek() and ftello() with MSVC
Calling fseek() or ftello() on a handle to a non-seeking device such as
a pipe or a communications device is not supported.  Unfortunately,
MSVC's flavor of these routines, _fseeki64() and _ftelli64(), do not
return an error when given a pipe as handle.  Some of the logic of
pg_dump and restore relies on these routines to check if a handle is
seekable, causing failures when passing the contents of pg_dump to
pg_restore through a pipe, for example.

This commit introduces wrappers for fseeko() and ftello() on MSVC so as
any callers are able to properly detect the cases of non-seekable
handles.  This relies mainly on GetFileType(), sharing a bit of code
with the MSVC port for fstat().  The code in charge of getting a file
type is refactored into a new file called win32common.c, shared by
win32stat.c and the new win32fseek.c.  It includes the MSVC ports for
fseeko() and ftello().

Like 765f5df, this is backpatched down to 14, where the fstat()
implementation for MSVC is able to understand about files larger than
4GB in size.  Using a TAP test for that is proving to be tricky as
IPC::Run handles the pipes by itself, still I have been able to check
the fix manually.

Reported-by: Daniel Watzinger
Author: Juan José Santamaría Flecha, Michael Paquier
Discussion: https://postgr.es/m/CAC+AXB26a4EmxM2suXxPpJaGrqAdxracd7hskLg-zxtPB50h7A@mail.gmail.com
Backpatch-through: 14
2023-04-12 09:09:38 +09:00
Peter Geoghegan c03c2eae0a Refine the guidelines for rmgrdesc authors.
Clarify the goals of the recently added guidelines for rmgrdesc authors:
to avoid gratuitous inconsistencies across resource managers, and to
make it reasonably easy to write a reusable custom parser.

Beyond that, the guidelines leave rmgrdesc authors with a significant
amount of leeway.  This even includes the leeway to invent custom
conventions (in cases where it's warranted).

Follow-up to commit 7d8219a4.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzkbYuvwYKm-Y-72QEh6SPMQcAo9uONv+mR3bMGcu9E_Cg@mail.gmail.com
2023-04-11 15:26:24 -07:00
Peter Geoghegan e944063294 Fix xl_heap_lock WAL record field's data type.
Make xl_heap_lock's infobits_set field of type uint8, not int8.  Using
int8 isn't appropriate given that the field just holds status bits.
This fixes an oversight in commit 0ac5ad5134.

In passing rename the nearby TransactionId field to "xmax" to make
things consistency with related records, such as xl_heap_lock_updated.

Deliberately avoid a bump in XLOG_PAGE_MAGIC.  No backpatch, either.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkCd3kOS8b7Rfxw7Mh1_6jvX=Nzo-CWR1VBTiOtVZkWHA@mail.gmail.com
2023-04-11 14:07:54 -07:00
Peter Geoghegan 5d6728e588 Fix nbtree posting list update desc output.
We cannot use the generic array_desc approach with per-tuple nbtree
posting list update metadata because array_desc can only deal with fixed
width elements (e.g., page offset numbers).  Using array_desc led to
incorrect rmgr descriptions for updates from nbtree DELETE/VACUUM WAL
records.

To fix, add specialized code to describe the update metadata as array
elements in desc output.  We now iterate over the update metadata using
an approach that matches related REDO routines.

Also stop showing the updates offset number array separately in nbtree
DELETE/VACUUM desc output.  It's redundant information, since the same
page offset numbers appear in the description of each individual update
element.  Also make some small tweaks to the way that we format arrays
in all desc routines (not just nbtree desc routines) to make arrays a
little less verbose.

Oversight in commit 1c453cfd, which enhanced the nbtree rmgr desc
routines.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkbYuvwYKm-Y-72QEh6SPMQcAo9uONv+mR3bMGcu9E_Cg@mail.gmail.com
2023-04-10 11:15:41 -07:00
Stephen Frost 3d03b24c35 Revert "Add support for Kerberos credential delegation"
This reverts commit 3d4fa227bc.

Per discussion and buildfarm, this depends on APIs that seem to not
be available on at least one platform (NetBSD).  Should be certainly
possible to rework to be optional on that platform if necessary but bit
late for that at this point.

Discussion: https://postgr.es/m/3286097.1680922218@sss.pgh.pa.us
2023-04-08 07:21:35 -04:00
Thomas Munro db4f21e4a3 Redesign interrupt/cancel API for regex engine.
Previously, a PostgreSQL-specific callback checked by the regex engine
had a way to trigger a special error code REG_CANCEL if it detected that
the next call to CHECK_FOR_INTERRUPTS() would certainly throw via
ereport().

A later proposed bugfix aims to move some complex logic out of signal
handlers, so that it won't run until the next CHECK_FOR_INTERRUPTS(),
which makes the above design impossible unless we split
CHECK_FOR_INTERRUPTS() into two phases, one to run logic and another to
ereport().  We may develop such a system in the future, but for the
regex code it is no longer necessary.

An earlier commit moved regex memory management over to our
MemoryContext system.  Given that the purpose of the two-phase interrupt
checking was to free memory before throwing, something we don't need to
worry about anymore, it seems simpler to inject CHECK_FOR_INTERRUPTS()
directly into cancelation points, and just let it throw.

Since the plan is to keep PostgreSQL-specific concerns separate from the
main regex engine code (with a view to bein able to stay in sync with
other projects), do this with a new macro INTERRUPT(), customizable in
regcustom.h and defaulting to nothing.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
2023-04-08 22:10:39 +12:00
Thomas Munro 4f51429dd7 Update tsearch regex memory management.
Now that our regex engine uses palloc(), it's not necessary to set up a
special memory context callback to free compiled regexes.  The regex has
no resources other than the memory that is already going to be freed in
bulk.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
2023-04-08 22:09:17 +12:00
Thomas Munro bea3d7e383 Use MemoryContext API for regex memory management.
Previously, regex_t objects' memory was managed with malloc() and free()
directly.  Switch to palloc()-based memory management instead.
Advantages:

 * memory used by cached regexes is now visible with MemoryContext
   observability tools

 * cleanup can be done automatically in certain failure modes
   (something that later commits will take advantage of)

 * cleanup can be done in bulk

On the downside, there may be more fragmentation (wasted memory) due to
per-regex MemoryContext objects.  This is a problem shared with other
cached objects in PostgreSQL and can probably be improved with later
tuning.

Thanks to Noah Misch for suggesting this general approach, which
unblocks later work on interrupts.

Suggested-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
2023-04-08 22:08:41 +12:00
Andres Freund 0fdab27ad6 Allow logical decoding on standbys
Unsurprisingly, this requires wal_level = logical to be set on the primary and
standby. The infrastructure added in 26669757b6 ensures that slots are
invalidated if the primary's wal_level is lowered.

Creating a slot on a standby waits for a xl_running_xact record to be
processed. If the primary is idle (and thus not emitting xl_running_xact
records), that can take a while.  To make that faster, this commit also
introduces the pg_log_standby_snapshot() function. By executing it on the
primary, completion of slot creation on the standby can be accelerated.

Note that logical decoding on a standby does not itself enforce that required
catalog rows are not removed. The user has to use physical replication slots +
hot_standby_feedback or other measures to prevent that. If catalog rows
required for a slot are removed, the slot is invalidated.

See 6af1793954 for an overall design of logical decoding on a standby.

Bumps catversion, for the addition of the pg_log_standby_snapshot() function.

Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Author: Andres Freund <andres@anarazel.de> (in an older version)
Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version)
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: FabrÌzio de Royes Mello <fabriziomello@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
2023-04-08 02:20:05 -07:00
Andres Freund e101dfac3a For cascading replication, wake physical and logical walsenders separately
Physical walsenders can't send data until it's been flushed; logical
walsenders can't decode and send data until it's been applied. On the
standby, the WAL is flushed first, which will only wake up physical
walsenders; and then applied, which will only wake up logical
walsenders.

Previously, all walsenders were awakened when the WAL was flushed. That
was fine for logical walsenders on the primary; but on the standby the
flushed WAL would have been not applied yet, so logical walsenders were
awakened too early.

Per idea from Jeff Davis and Amit Kapila.

Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Reviewed-By: Jeff Davis <pgsql@j-davis.com>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1+zO5LUeisabX10c81LU-fWMKO4M9Wyg1cdkbW7Hqh6vQ@mail.gmail.com
2023-04-08 01:06:00 -07:00