Commit Graph

53041 Commits

Author SHA1 Message Date
Heikki Linnakangas d85bf0719e Ensure that creation of an empty relfile is fsync'd at checkpoint.
If you create a table and don't insert any data into it, the relation file
is never fsync'd. You don't lose data, because an empty table doesn't have
any data to begin with, but if you crash and lose the file, subsequent
operations on the table will fail with "could not open file" error.

To fix, register an fsync request in mdcreate(), like we do for mdwrite().

Per discussion, we probably should also fsync the containing directory
after creating a new file. But that's a separate and much wider issue.

Backpatch to all supported versions.

Reviewed-by: Andres Freund, Thomas Munro
Discussion: https://www.postgresql.org/message-id/d47d8122-415e-425c-d0a2-e0160829702d%40iki.fi
2023-07-04 18:07:46 +03:00
Peter Eisentraut 070bf5cda5 Adjust kerberos and ldap tests for Homebrew on ARM
The Homebrew package manager changed its default installation prefix
for the new architecture, so a couple of tests need tweaks to find
binaries.

This is a partial backpatch of dc513bc654.
2023-07-04 11:14:53 +02:00
Thomas Munro b7ec66731d Re-bin segment when memory pages are freed.
It's OK to be lazy about re-binning memory segments when allocating,
because that can only leave segments in a bin that's too high.  We'll
search higher bins if necessary while allocating next time, and
also eventually re-bin, so no memory can become unreachable that way.

However, when freeing memory, the largest contiguous range of free pages
might go up, so we should re-bin eagerly to make sure we don't leave the
segment in a bin that is too low for get_best_segment() to find.

The re-binning code is moved into a function of its own, so it can be
called whenever free pages are returned to the segment's free page map.

Back-patch to all supported releases.

Author: Dongming Liu <ldming101@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier version)
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CAL1p7e8LzB2LSeAXo2pXCW4%2BRya9s0sJ3G_ReKOU%3DAjSUWjHWQ%40mail.gmail.com
2023-07-04 15:26:42 +12:00
Thomas Munro fb663f3879 Fix race in SSI interaction with gin fast path.
The ginfast.c code previously checked for conflicts in before locking
the relevant buffer, leaving a window where a RW conflict could be
missed.  Re-order.

There was also a place where buffer ID and block number were confused
while trying to predicate-lock a page, noted by visual inspection.

Back-patch to all supported releases.  Fixes one more problem discovered
with the reproducer from bug #17949, in this case when Dmitry tried
other index types.

Reported-by: Artem Anisimov <artem.anisimov.255@gmail.com>
Reported-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/17949-a0f17035294a55e2%40postgresql.org
2023-07-04 09:14:16 +12:00
Thomas Munro 3f7d3a77e1 Fix race in SSI interaction with bitmap heap scan.
When performing a bitmap heap scan, we don't want to miss concurrent
writes that occurred after we observed the heap's rs_nblocks, but before
we took predicate locks on index pages.  Therefore, we can't skip
fetching any heap tuples that are referenced by the index, because we
need to test them all with CheckForSerializableConflictOut().  The
old optimization that would ignore any references to blocks >=
rs_nblocks gets in the way of that requirement, because it means that
concurrent writes in that window are ignored.

Removing that optimization shouldn't affect correctness at any isolation
level, because any new tuples shouldn't be visible to an MVCC snapshot.
There also shouldn't be any error-causing references to heap blocks past
the end, because we should have held at least an AccessShareLock on the
table before the index scan.  It can't get smaller while our transaction
is running.  For now, though, we'll keep the optimization at lower
levels to avoid making unnecessary changes in a bug fix.

Back-patch to all supported releases.  In release 11, the code is in a
different place but not fundamentally different.  Fixes one aspect of
bug #17949.

Reported-by: Artem Anisimov <artem.anisimov.255@gmail.com>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/17949-a0f17035294a55e2%40postgresql.org
2023-07-04 09:14:16 +12:00
Thomas Munro ae6d536ed0 Fix race in SSI interaction with empty btrees.
When predicate-locking btrees, we have a special case for completely
empty btrees, since there is no page to lock.  This was racy, because,
without buffer lock held, a matching key could be inserted between the
_bt_search() and the PredicateLockRelation() calls.

Fix, by rechecking _bt_search() after taking the relation-level SIREAD
lock, if using SERIALIZABLE isolation and an empty btree is discovered.

Back-patch to all supported releases.  Fixes one aspect of bug #17949.

Reported-by: Artem Anisimov <artem.anisimov.255@gmail.com>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/17949-a0f17035294a55e2%40postgresql.org
2023-07-04 09:14:16 +12:00
Tomas Vondra 5396b188c9 Remove expensive test of postgres_fdw batch inserts
The test inserted 70k rows into a foreign table, in order to verify
correct behavior with more than 65535 parameters, and was added in
response to a bug report.

However, this is rather expensive, especially when running the tests
under valgrind, CLOBBER_CACHE_ALWAYS etc. It doesn't seem worth it to
keep running the test, so remove it from all branches (14+).

Backpatch-through: 14
Discussion: https://postgr.es/m/2131017.1623451468@sss.pgh.pa.us
2023-07-03 18:40:44 +02:00
Andrew Dunstan 8d3e1718d5 Use older package name in pg_basebackup test
Commit 83ed4de20f inadvertently used the new package names. In version
14 or older, use TestLib intead of using PostgreSQL::Test::Utils
2023-07-03 10:46:49 -04:00
Andrew Dunstan 83ed4de20f Improve pg_basebackup long file name test Windows robustness
Creation of a file with a very long name can create problems on Windows
due to its file path limits. Work around that by creating the file via a
symlink with a shorter name.

Error displayed by buildfarm animal fairywren.o

Backpatch to all live branches
2023-07-03 10:07:24 -04:00
Michael Paquier c8987ea90c Make PG_TEST_NOCLEAN work for temporary directories in TAP tests
When set, this environment variable was only effective for data
directories but not for all the other temporary files created by
PostgreSQL::Test::Utils.  Keeping the temporary files after a successful
run can be useful for debugging purposes.

The documentation is updated to reflect the new behavior, with contents
available in doc/ since v16 and in src/test/perl/README since v15.

Author: Jacob Champion
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/CAAWbhmgHtDH1SGZ+Fw05CsXtE0mzTmjbuUxLB9mY9iPKgM6cUw@mail.gmail.com
Discussion: https://postgr.es/m/YyPd9unV14SX2bLF@paquier.xyz
Backpatch-through: 11
2023-07-03 10:06:16 +09:00
Tomas Vondra 260dbf19a5 Fix oversight in handling of modifiedCols since f24523672d
Commit f24523672d fixed a memory leak by moving the modifiedCols bitmap
into the per-row memory context. In the case of AFTER UPDATE triggers,
the bitmap is however referenced from an event kept until the end of the
query, resulting in a use-after-free bug.

Fixed by copying the bitmap into the AfterTriggerEvents memory context,
which is the one where we keep the trigger events. There's only one
place that needs to do the copy, but the memory context may not exist
yet. Doing that in a separate function seems more readable.

Report by Alexander Pyhalov, fix by me. Backpatch to 13, where the
bitmap was added to the event by commit 71d60e2aa0.

Reported-by: Alexander Pyhalov
Backpatch-through: 13
Discussion: https://postgr.es/m/acddb17c89b0d6cb940eaeda18c08bbe@postgrespro.ru
2023-07-02 22:23:04 +02:00
Tomas Vondra c1affa38c7 Fix memory leak in Incremental Sort rescans
The Incremental Sort had a couple issues, resulting in leaking memory
during rescans, possibly triggering OOM. The code had a couple of
related flaws:

1. During rescans, the sort states were reset but then also set to NULL
   (despite the comment saying otherwise). ExecIncrementalSort then
   sees NULL and initializes a new sort state, leaking the memory used
   by the old one.

2. Initializing the sort state also automatically rebuilt the info about
   presorted keys, leaking the already initialized info. presorted_keys
   was also unnecessarily reset to NULL.

Patch by James Coleman, based on patches by Laurenz Albe and Tom Lane.
Backpatch to 13, where Incremental Sort was introduced.

Author: James Coleman, Laurenz Albe, Tom Lane
Reported-by: Laurenz Albe, Zu-Ming Jiang
Backpatch-through: 13
Discussion: https://postgr.es/m/b2bd02dff61af15e3526293e2771f874cf2a3be7.camel%40cybertec.at
Discussion: https://postgr.es/m/db03c582-086d-e7cd-d4a1-3bc722f81765%40inf.ethz.ch
2023-07-02 20:05:14 +02:00
Bruce Momjian 49d1d3c2c8 doc: PG _14_ relnotes, remove duplicate commit comment
Backpatch-through: 14 only
2023-06-30 08:37:15 -04:00
Michael Paquier 663b35f2df Fix marking of indisvalid for partitioned indexes at creation
The logic that introduced partitioned indexes missed a few things when
invalidating a partitioned index when these are created, still the code
is written to handle recursions:
1) If created from scratch because a mapping index could not be found,
the new index created could be itself invalid, if for example it was a
partitioned index with one of its leaves invalid.
2) A CCI was missing when indisvalid is set for a parent index, leading
to inconsistent trees when recursing across more than one level for a
partitioned index creation if an invalidation of the parent was
required.

This could lead to the creation of a partition index tree where some of
the partitioned indexes are marked as invalid, but some of the parents
are marked valid, which is not something that should happen (as
validatePartitionedIndex() defines, indisvalid is switched to true for a
partitioned index iff all its partitions are themselves valid).

This patch makes sure that indisvalid is set to false on a partitioned
index if at least one of its partition is invalid.  The flag is set to
true if *all* its partitions are valid.

The regression test added in this commit abuses of a failed concurrent
index creation, marked as invalid, that maps with an index created on
its partitioned table afterwards.

Reported-by: Alexander Lakhin
Reviewed-by: Alexander Lakhin
Discussion: https://postgr.es/m/14987634-43c0-0cb3-e075-94d423607e08@gmail.com
Backpatch-through: 11
2023-06-30 13:54:56 +09:00
Tom Lane 0789b82a97 Fix order of operations in ExecEvalFieldStoreDeForm().
If the given composite datum is toasted out-of-line,
DatumGetHeapTupleHeader will perform database accesses to detoast it.
That can invalidate the result of get_cached_rowtype, as documented
(perhaps not plainly enough) in that function's API spec; which leads
to strange errors or crashes when we try to use the TupleDesc to read
the tuple.  In short then, trying to update a field of a composite
column could fail intermittently if the overall column value is wide
enough to require toasting.

We can fix the bug at no cost by just changing the order of
operations, since we don't need the TupleDesc until after detoasting.
(Other callers of get_cached_rowtype appear to get this right already,
so there's only one bug.)

Note that the added regression test case reveals this bug reliably
only with debug_discard_caches/CLOBBER_CACHE_ALWAYS.

Per bug #17994 from Alexander Lakhin.  Sadly, this patch does not fix
the missing-values issue revealed in the bug discussion; we'll need
some more work to cover that.

Discussion: https://postgr.es/m/17994-5c7100b51b4790e9@postgresql.org
2023-06-29 10:19:10 -04:00
Peter Eisentraut 6bc7873da1 Remove inappropriate raw_expression_tree_walker() code
It was walking into the ColumnDef->compression field, which is not a
node but a string.  This code is currently not reachable (because the
compression field is only set in situations that don't go through
raw_expression_tree_walker()), but if it had been, this could have
behaved erratically.
2023-06-29 10:35:53 +02:00
Michael Paquier 7e8349cbd7 pg_stat_statements: Fix second comment related to entry resets
This should have been part of dc73db6, but it got lost in the mix.
Oversight in 6b4d23f.

Author: Japin Li
Discussion: https://postgr.es/m/MEYP282MB1669FC91C764E277821936D3B624A@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
Backpatch-through: 14
2023-06-29 09:17:34 +09:00
Michael Paquier aa4b11e8be pg_stat_statements: Fix incorrect comment with entry resets
Oversight in 6b4d23f.

Author: Japin Li, Richard Guo
Discussion: https://postgr.es/m/MEYP282MB1669FC91C764E277821936D3B624A@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
Backpatch-through: 14
2023-06-29 08:05:10 +09:00
Michael Paquier 6160e221d5 Ignore invalid indexes when enforcing index rules in ALTER TABLE ATTACH PARTITION
A portion of ALTER TABLE .. ATTACH PARTITION is to ensure that the
partition being attached to the partitioned table has a correct set of
indexes, so as there is a consistent index mapping between the
partitioned table and its new-to-be partition.  However, as introduced
in 8b08f7d, the current logic could choose an invalid index as a match,
which is something that can exist when dealing with more than two levels
of partitioning, like attaching a partitioned table (that has
partitions, with an index created by CREATE INDEX ON ONLY) to another
partitioned table.

A partitioned index with indisvalid set to false is equivalent to an
incomplete partition tree, meaning that an invalid partitioned index
does not have indexes defined in all its partitions.  Hence, choosing an
invalid partitioned index can create inconsistent partition index trees,
where the parent attaching to is valid, but its partition may be
invalid.

In the report from Alexander Lakhin, this showed up as an assertion
failure when validating an index.  Without assertions enabled, the
partition index tree would be actually broken, as indisvalid should
be switched to true for a partitioned index once all its partitions are
themselves valid.  With two levels of partitioning, the top partitioned
table used a valid index and was able to link to an invalid index stored
on its partition, itself a partitioned table.

I have studied a few options here (like the possibility to switch
indisvalid to false for the parent), but came down to the conclusion
that we'd better rely on a simple rule: invalid indexes had better never
be chosen, so as the partition attached uses and creates indexes that
the parent expects.  Some regression tests are added to provide some
coverage.  Note that the existing coverage is not impacted.

This is a problem since partitioned indexes exist, so backpatch all the
way down to v11.

Reported-by: Alexander Lakhin
Discussion: https://postgr.es/14987634-43c0-0cb3-e075-94d423607e08@gmail.com
Backpatch-through: 11
2023-06-28 15:57:48 +09:00
Heikki Linnakangas 0c3fb8ac5f Fix comment on clearing padding.
Author: Japin Li
Discussion: https://www.postgresql.org/message-id/MEYP282MB16696317B5DA7D0D92306149B627A@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
2023-06-27 10:15:26 +03:00
Tom Lane 4c61afa47c Check for interrupts and stack overflow in TParserGet().
TParserGet() recurses for some token types, meaning it's possible
to drive it to stack overflow.  Since this is a minority behavior,
I chose to add the check_stack_depth() call to the two places that
recurse rather than doing it during every single call.

While at it, add CHECK_FOR_INTERRUPTS(), because this can run
unpleasantly long for long inputs.

Per bug #17995 from Zuming Jiang.  This is old, so back-patch
to all supported branches.

Discussion: https://postgr.es/m/17995-9f20ff3e6389db4c@postgresql.org
2023-06-24 17:18:08 -04:00
Bruce Momjian c1589923c6 doc: rename "decades" to be more generic
Reported-by: Michael Paquier

Discussion: https://postgr.es/m/ZJTzwD2rTbHWWQ9g@paquier.xyz

Backpatch-through: 11
2023-06-23 22:50:55 -04:00
Michael Paquier 451ca5c1e6 Fix incorrect error message in libpq_pipeline
One of the tests for the pipeline mode with portal description expects a
non-NULL PQgetResult, but used an incorrect error message on failure,
telling that PQgetResult being NULL was the expected result.

Author: Jelte Fennema
Discussion: https://postgr.es/m/CAGECzQTkShHecFF+EZrm94Lbsu2ej569T=bz+PjMbw9Aiioxuw@mail.gmail.com
Backpatch-through: 14
2023-06-23 17:50:28 +09:00
Amit Kapila 991983fcf6 Doc: Clarify the behavior of triggers/rules in a logical subscriber.
By default, triggers and rules do not fire on a logical replication
subscriber based on the "session_replication_role" GUC being set to
"replica". However, the docs in the logical replication section assumed
that the reader understood how this GUC worked. This modifies the docs to
be more explicit and links back to the GUC itself.

Author: Jonathan Katz, Peter Smith
Reviewed-by: Vignesh C, Euler Taveira
Backpatch-through: 11
Discussion: https://postgr.es/m/5bb2c9a2-499f-e1a2-6e33-5ce96b35cc4a@postgresql.org
2023-06-22 12:16:51 +05:30
David Rowley 8145b5e138 Doc: mention that extended stats aren't used for joins
Statistics defined by the CREATE STATISTICS command are only used to
assist with the selectivity estimations of base relations, never for
joins.  Here we mention this fact in the notes section of the CREATE
STATISTICS command.

Discussion: https://postgr.es/m/CAApHDvrMuVgDOrmg_EtFDZ=AOovq6EsJNnHH1ddyZ8EqL4yzMw@mail.gmail.com
Backpatch-through: 11
2023-06-22 12:47:53 +12:00
Peter Geoghegan 63fa0deb31 nbtree VACUUM: cope with topparent inconsistencies.
Avoid "right sibling %u of block %u is not next child" errors when
vacuuming a corrupt nbtree index.  Just LOG the issue and press on.
That way VACUUM will have a decent chance of finishing off all required
processing for the index (and for the table as a whole).

This is similar to recent work from commit 5abff197, as well as work
from commit 5b861baa (later backpatched as commit 43e409ce), which
taught nbtree VACUUM to keep going when its "re-find" check fails.  The
hardening added by this commit takes place directly after the "re-find"
check, right before the critical section for the first stage of page
deletion.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wz=dayg0vjs4+er84TS9ami=csdzjpuiCGbEw=idhwqhzQ@mail.gmail.com
Backpatch: 11- (all supported versions).
2023-06-21 17:41:54 -07:00
Bruce Momjian 43b28fc39f doc: update PG history as over "three decades"
Reported-by: Pierre <pbaumard@gmail.com>

Discussion: https://postgr.es/m/168724660637.399156.7642965215720120947@wrigleys.postgresql.org

Backpatch-through: 11
2023-06-21 19:20:07 -04:00
Tom Lane 120ea65b8a Avoid Assert failure when processing empty statement in aborted xact.
exec_parse_message() wants to create a cached plan in all cases,
including for empty input.  The empty-input path does not have
a test for being in an aborted transaction, making it possible
that plancache.c will fail due to trying to do database lookups
even though there's no real work to do.

One solution would be to throw an aborted-transaction error in
this path too, but it's not entirely clear whether the lack of
such an error was intentional or whether some clients might be
relying on non-error behavior.  Instead, let's hack plancache.c
so that it treats empty statements with the same logic it
already had for transaction control commands, ensuring that it
can soldier through even in an already-aborted transaction.

Per bug #17983 from Alexander Lakhin.  Back-patch to all
supported branches.

Discussion: https://postgr.es/m/17983-da4569fcb878672e@postgresql.org
2023-06-21 11:07:11 -04:00
Michael Paquier 2634926906 Disable use of archiving in 009_twophase.pl
This partially reverts 68cb5af, as using archiving to enforce the
rename of the last partial segment of the old timeline at promotion to
use .partial as suffix is impacting the tests when it does switchovers.
As showed by the logs gathered by the CI in the tests that failed, a new
standby may fail to find the WAL segment it needs to follow a promoted
instance with its timeline jump, as it got renamed to .partial.

This problem would manifest as a run timeout with 009_twophase.pl, as
the new standby repeatedly requests a segment from the promoted primary
that it would not find.

Reported-by: Nathan Bossart
Discussion: https://postgr.es/m/20230621043345.GA787473@nathanxps13
Backpatch-through: 13
2023-06-21 16:16:24 +09:00
Amit Kapila 0b79042701 Fix the errhint message and docs for drop subscription failure.
The existing errhint message and docs were missing the fact that we can't
disassociate from the slot unless the subscription is disabled.

Author: Robert Sjöblom, Peter Smith
Reviewed-by: Peter Eisentraut, Amit Kapila
Backpatch-through: 11
Discussion: https://postgr.es/m/807bdf85-61ea-88e2-5712-6d9fcd4eabff@fortnox.se
2023-06-21 10:23:09 +05:30
Tom Lane d911dce14d Fix hash join when inner hashkey expressions contain Params.
If the inner-side expressions contain PARAM_EXEC Params, we must
re-hash whenever the values of those Params change.  The executor
mechanism for that exists already, but we failed to invoke it because
finalize_plan() neglected to search the Hash.hashkeys field for
Params.  This allowed a previous scan's hash table to be re-used
when it should not be, leading to rows missing from the join's output.
(I believe incorrectly-included join rows are impossible however,
since checking the real hashclauses would reject false matches.)

This bug is very ancient, dating probably to d24d75ff1 of 7.4.
Sadly, this simple fix depends on the plan representational changes
made by 2abd7ae9b, so it will only work back to v12.  I thought
about trying to make some kind of hack for v11, but I'm leery
of putting code significantly different from what is used in the
newer branches into a nearly-EOL branch.  Seeing that the bug
escaped detection for a full twenty years, problematic cases
must be rare; so I don't feel too awful about leaving v11 as-is.

Per bug #17985 from Zuming Jiang.  Back-patch to v12.

Discussion: https://postgr.es/m/17985-748b66607acd432e@postgresql.org
2023-06-20 17:47:36 -04:00
Michael Paquier e6317d9b50 Enable archiving in recovery TAP test 009_twophase.pl
This is a follow-up of f663b00, that has been committed to v13 and v14,
tweaking the TAP test for two-phase transactions so as it provides
coverage for the bug that has been fixed.  This change is done in its
own commit for clarity, as v15 and HEAD did not show the problematic
behavior, still missed coverage for it.

While on it, this adds a comment about the dependency of the last
partial segment rename and RecoverPreparedTransactions() at the end of
recovery, as that can be easy to miss.

Author: Michael Paquier
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/743b9b45a2d4013bd90b6a5cba8d6faeb717ee34.camel@cybertec.at
Backpatch-through: 13
2023-06-20 10:25:45 +09:00
Michael Paquier f663b00918 Fix failure at promotion with 2PC transactions and archiving enabled
When archiving is enabled, a promotion request would fail with the
following error when some 2PC transaction needs to be recovered from
WAL, preventing the promotion to complete:
FATAL:  requested WAL segment pg_wal/000000010000000000000001 has already been removed

The origin of the problem is that the last partial segment of the old
timeline is renamed before recovering the 2PC data via
RecoverPreparedTransactions() at the end of recovery, causing the FATAL
because the segment wanted is now renamed with a .partial suffix.  This
commit reorders a bit the end-of-recovery actions so as the execution of
recovery_end_command, the cleanup of the old segments of the old
timeline (RemoveNonParentXlogFiles) and the last partial segment rename
are done after the 2PC transaction data is recovered with
RecoverPreparedTransactions().  This makes the order of these
end-of-recovery actions more consistent with ~15, at the exception of
the end-of-recovery checkpoint that still needs to happen before all the
actions reordered here in v13 and v14, contrary to what 15~ does.

v15 and newer versions have "fixed" this problem somewhat accidentally
with 811051c, where the end-of-recovery actions got reordered.  In this
case, the recovery of 2PC transactions happens before the renaming of
the last partial segment of the old timeline.

v13 and v14 are the versions that can easily see this problem as per the
refactoring of 38a95731 where XLogReaderState is reset in
XLogBeginRead() before reading the 2PC transaction data.  v11 and v12
could also see this problem, but may finish by reading the 2PC data from
some of the WAL buffers instead.  Perhaps something could be done for
these two branches, but I am not really excited about doing something on
these per the lack of complaints and per the fact that v11 is soon going
to be EOL'd soon (there is always a risk of breaking something).

Note that the TAP test 009_twophase.pl is able to exhibit the issue if
it enables archiving on the primary node, which does not impact the test
coverage as restore_command would remain unused.  This is something that
should be changed on v15 and HEAD as well, so this will be changed in a
separate commit for clarity.

Author: Julian Markwort
Reviewed-by: Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/743b9b45a2d4013bd90b6a5cba8d6faeb717ee34.camel@cybertec.at
Backpatch-through: 13
2023-06-20 09:36:35 +09:00
David Rowley 73f1c17fc3 Don't use partial unique indexes for unique proofs in the planner
Here we adjust relation_has_unique_index_for() so that it no longer makes
use of partial unique indexes as uniqueness proofs.  It is incorrect to
use these as the predicates used by check_index_predicates() to set
predOK makes use of not only baserestrictinfo quals as proofs, but also
qual from join conditions.  For relation_has_unique_index_for()'s case, we
need to know the relation is unique for a given set of columns before any
joins are evaluated, so if predOK was only set to true due to some join
qual, then it's unsafe to use such indexes in
relation_has_unique_index_for().  The final plan may not even make use
of that index, which could result in reading tuples that are not as
unique as the planner previously expected them to be.

Bug: #17975
Reported-by: Tor Erik Linnerud
Backpatch-through: 11, all supported versions
Discussion: https://postgr.es/m/17975-98a90c156f25c952%40postgresql.org
2023-06-19 13:01:58 +12:00
Amit Langote 3f157d085b Fix typo in comment.
Back-patch down to 11.

Author: Sho Kato (<kato-sho@fujitsu.com>)
Discussion: https://postgr.es/m/TYCPR01MB68499042A33BC32241193AAF9F5BA%40TYCPR01MB6849.jpnprd01.prod.outlook.com
2023-06-16 10:19:33 +09:00
Michael Paquier 019a40d619 intarray: Prevent out-of-bound memory reads with gist__int_ops
As gist__int_ops stands in intarray, it is possible to store GiST
entries for leaf pages that can cause corruptions when decompressed.
Leaf nodes are stored as decompressed all the time by the compression
method, and the decompression method should map with that, retrieving
the contents of the page without doing any decompression.  However, the
code authorized the insertion of leaf page data with a higher number of
array items than what can be supported, generating a NOTICE message to
inform about this matter (199 for a 8k page, for reference).  When
calling the decompression method, a decompression would be attempted on
this leaf node item but the contents should be retrieved as they are.

The NOTICE message generated when dealing with the compression of a leaf
page and too many elements in the input array for gist__int_ops has been
introduced by 08ee64e, removing the marker stored in the array to track
if this is actually a leaf node.  However, it also missed the fact that
the decompression path should do nothing for a leaf page.  Hence, as the
code stand, a too-large array would be stored as uncompressed but the
decompression path would attempt a decompression rather that retrieving
the contents as they are.

This leads to various problems.  First, even if 08ee64e tried to address
that, it is possible to do out-of-bound chunk writes with a large input
array, with the backend informing about that with WARNINGs.  On
decompression, retrieving the stored leaf data would lead to incorrect
memory reads, leading to crashes or even worse.

Perhaps somebody would be interested in expanding the number of array
items that can be handled in a leaf page for this operator in the
future, which would require revisiting the choice done in 08ee64e, but
based on the lack of reports about this problem since 2005 it does not
look so.  For now, this commit prevents the insertion of data for leaf
pages when using more array items that the code can handle on
decompression, switching the NOTICE message to an ERROR.  If one wishes
to use more array items, gist__intbig_ops is an optional choice.

While on it, use ERRCODE_PROGRAM_LIMIT_EXCEEDED as error code when a
limit is reached, because that's what the module is facing in such
cases.

Author: Ankit Kumar Pandey, Alexander Lakhin
Reviewed-by: Richard Guo, Michael Paquier
Discussion: https://postgr.es/m/796b65c3-57b7-bddf-b0d5-a8afafb8b627@gmail.com
Discussion: https://postgr.es/m/17888-f72930e6b5ce8c14@postgresql.org
Backpatch-through: 11
2023-06-15 13:45:40 +09:00
Tom Lane d1423c52e3 Correctly update hasSubLinks while mutating a rule action.
rewriteRuleAction neglected to check for SubLink nodes in the
securityQuals of range table entries.  This could lead to failing
to convert such a SubLink to a SubPlan, resulting in assertion
crashes or weird errors later in planning.

In passing, fix some poor coding in rewriteTargetView:
we should not pass the source parsetree's hasSubLinks
field to ReplaceVarsFromTargetList's outer_hasSubLinks.
ReplaceVarsFromTargetList knows enough to ignore that
when a Query node is passed, but it's still confusing
and bad precedent: if we did try to update that flag
we'd be updating a stale copy of the parsetree.

Per bug #17972 from Alexander Lakhin.  This has been broken since
we added RangeTblEntry.securityQuals (although the presented test
case only fails back to 215b43cdc), so back-patch all the way.

Discussion: https://postgr.es/m/17972-f422c094237847d0@postgresql.org
2023-06-13 15:58:37 -04:00
Tom Lane 5eaa05f637 Accept fractional seconds in jsonpath's datetime() method.
Commit 927d9abb6 purported to make datetime() accept any string
that could be output for a datetime value by to_jsonb().  But it
overlooked the possibility of fractional seconds being present,
so that cases as simple as to_jsonb(now()) would defeat it.

Fix by adding formats that include ".US" to the list in
executeDateTimeMethod().  (Note that while this is nominally
microseconds, it'll do the right thing for fractions with
fewer than six digits.)

In passing, re-order the list to restore the datatype ordering
specified in its comment.  The violation accidentally did not
break anything; but the next edit might be less lucky, so add
more comments.

Per report from Tim Field.  Back-patch to v13 where datetime()
was added, like the previous patch.

Discussion: https://postgr.es/m/014A028B-5CE6-4FDF-AC24-426CA6FC9CEE@mohiohio.com
2023-06-12 10:54:28 -04:00
Michael Paquier e0e6829459 hstore: Tighten key/value parsing check for whitespaces
isspace() can be locale-sensitive depending on the platform, causing
hstore to consider as whitespaces characters it should not see as such.
For example, U+0105, being decoded as 0xC4 0x85 in UTF-8, would be
discarded from the input given.

This problem is similar to 9ae2661, though it was missed that hstore
can also manipulate non-ASCII inputs, so replace the existing isspace()
calls with scanner_isspace().

This problem exists for a long time, so backpatch all the way down.

Author: Evan Jones
Discussion: https://postgr.es/m/CA+HWA9awUW0+RV_gO9r1ABZwGoZxPztcJxPy8vMFSTbTfi4jig@mail.gmail.com
Backpatch-through: 11
2023-06-12 09:14:14 +09:00
Michael Paquier c6043fcbb2 Fix missing initializations of MyProc.delayChkptEnd
This commit fixes an oversight introduced in 10520f4, that has added
delayChkptEnd to PGPROC to avoid ABI breakages on stable branches, where
two spots have missed to initialize this variable (delayChkpt was
switched back from int to bool, and it was initialized as 0 so there was
no consequences for it):
- InitProcess(), where the per-process data structures of a backend are
initialized.
- InitAuxiliaryProcess(), same but for auxiliary processes.

An interruption during relation truncation while this flag is set could
cause an assertion failure when a follow-up process does a relation
truncation while reusing the same PGPROC entry.  A second effect could
be incorrect checkpoint end delays.

While on it, add an assertion in ProcArrayClearTransaction() for
delayChkptEnd to be in line with 5788e25.  This is needed only for v14.

This issue affects v11~v14, but not v15~, as we use a single field
called delayChkptFlags to delay checkpoints there.

Author: suyu.cmj (mengjuan.cmj@alibaba-inc.com)
Reviewed-by: Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/9c3d2a49-db5f-43cb-840b-d58f9a684295.mengjuan.cmj@alibaba-inc.com
Backpatch-through: 11
2023-06-11 10:33:46 +09:00
Michael Paquier 28af91b4e7 Refactor routine to find single log content pattern in TAP tests
The same routine to check if a specific pattern can be found in the
server logs was copied over four different test scripts.  This refactors
the whole to use a single routine located in PostgreSQL::Test::Cluster,
named log_contains, to grab the contents of the server logs and check
for a specific pattern.

On HEAD, the code previously used assumed that slurp_file() could not
handle an undefined offset, setting it to zero, but slurp_file() does
do an extra fseek() before retrieving the log contents only if an offset
is defined.  In two places, the test was retrieving the full log
contents with slurp_file() after calling substr() to apply an offset,
ignoring that slurp_file() would be able to handle that.

Backpatch all the way down to ease the introduction of new tests that
could rely on the new routine.

Author: Vignesh C
Reviewed-by: Andrew Dunstan, Dagfinn Ilmari Mannsåker, Michael Paquier
Discussion: https://postgr.es/m/CALDaNm0YSiLpjCmajwLfidQrFOrLNKPQir7s__PeVvh9U3uoTQ@mail.gmail.com
Backpatch-through: 11
2023-06-09 11:56:41 +09:00
Michael Paquier 30469a6ed4 Refactor log check logic for connect_ok/fails in PostgreSQL::Test::Cluster
This commit refactors a bit the code in charge of checking for log
patterns when connections fail or succeed, by moving the log pattern
checks into their own routine, for clarity.  This has come up as
something to improve while discussing the refactoring of find_in_log().

Backpatch down to 14 where these routines are used, to ease the
introduction of new tests that could rely on them.

Author: Vignesh C, Michael Paquier
Discussion: https://postgr.es/m/CALDaNm0YSiLpjCmajwLfidQrFOrLNKPQir7s__PeVvh9U3uoTQ@mail.gmail.com
Backpatch-through: 14
2023-06-09 09:37:34 +09:00
Fujii Masao fd3def3950 doc: Fix example command for ALTER FOREIGN TABLE ... OPTIONS.
In the documentation, previously the example command for
ALTER FOREIGN TABLE ... OPTIONS incorrectly included both
the option name and value with the DROP operation.
The correct syntax for the DROP operation requires only
the name of the option to be specified. This commit fixes
the example by removing the option value from the DROP operation.

Back-patch to all supported versions.

Author: Mehmet Emin KARAKAS <emin100@gmail.com>
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/CANQrdXAHzbcEYhjGoe5A42OmfvdQhHFJzyKj9gJvHuDKyOF5Ng@mail.gmail.com
2023-06-08 20:14:10 +09:00
Tomas Vondra 7f528e96c5 Use per-tuple context in ExecGetAllUpdatedCols
Commit fc22b6623b (generated columns) replaced ExecGetUpdatedCols() with
ExecGetAllUpdatedCols() in a couple places handling UPDATE (triggers and
lock mode). However, ExecGetUpdatedCols() did exec_rt_fetch() while
ExecGetAllUpdatedCols() also allocates memory through bms_union()
without paying attention to the memory context and happened to use the
long-lived ExecutorState, leaking the memory until the end of the query.

The amount of leaked memory is proportional to the number of (updated)
attributes, types of UPDATE triggers, and the number of processed rows
(which for UPDATE ... FROM ... may be much higher than updated rows).

Fixed by switching to the per-tuple context in GetAllUpdatedColumns().
This is fine for all in-core callers, but external callers may need to
copy the result. But we're not aware of any such callers.

Note the issue was introduced by fc22b6623b, but the macros were later
renamed by f50e888990.

Backpatch to 12, where the issue was introduced.

Reported-by: Tomas Vondra
Reviewed-by: Andres Freund, Tom Lane, Jakub Wartak
Backpatch-through: 12
Discussion: https://postgr.es/m/222a3442-7f7d-246c-ed9b-a76209d19239@enterprisedb.com
2023-06-07 18:53:04 +02:00
Heikki Linnakangas 525ec837e1 Initialize 'recordXtime' to silence compiler warning.
In reality, recordXtime will always be set by the getRecordTimestamp
call, but the compiler doesn't necessarily see that.

Back-patch to all supported versions.

Author: Tristan Partin
Discussion: https://www.postgresql.org/message-id/CT5MN8E11U0M.1NYNCHXYUHY41@gonk
2023-06-06 20:32:02 +03:00
Tom Lane 1b322c1fae Doc: explain about dependency tracking for new-style SQL functions.
5.14 Dependency Tracking was not updated when we added new-style
SQL functions.  Improve that.

Noted by Sami Imseih.  Back-patch to v14 where
new-style SQL functions came in.

Discussion: https://postgr.es/m/2C1933AB-C2F8-499B-9D18-4AC1882256A0@amazon.com
2023-06-04 13:27:34 -04:00
Tom Lane d6f549d7a6 Fix pg_dump's failure to honor dependencies of SQL functions.
A new-style SQL function can contain a parse-time dependency
on a unique index, much as views and matviews can (such cases
arise from GROUP BY and ON CONFLICT clauses, for example).
To dump and restore such a function successfully, pg_dump must
postpone the function until after the unique index is created,
which will happen in the post-data part of the dump.  Therefore
we have to remove the normal constraint that functions are
dumped in pre-data.  Add code similar to the existing logic
that handles this for matviews.  I added test cases for both
as well, since code coverage tests showed that we weren't
testing the matview logic.

Per report from Sami Imseih.  Back-patch to v14 where
new-style SQL functions came in.

Discussion: https://postgr.es/m/2C1933AB-C2F8-499B-9D18-4AC1882256A0@amazon.com
2023-06-04 13:05:54 -04:00
Bruce Momjian 3d8aefece4 doc: add missing "the" in LATERAL sentence.
Backpatch-through: 11
2023-06-01 10:22:16 -04:00
Peter Geoghegan 322c9b340a nbtree VACUUM: cope with right sibling link corruption.
Avoid "right sibling's left-link doesn't match" errors when vacuuming a
corrupt nbtree index.  Just LOG the issue and press on.  That way VACUUM
will have a decent chance of finishing off all required processing for
the index (and for the table as a whole).

This error was seen in the field from time to time (it's more than a
theoretical risk), so giving VACUUM the ability to press on like this
has real value.  Nothing short of a REINDEX is expected to fix the
underlying index corruption, so giving up (by throwing an error) risks
making a bad situation far worse.  Anything that blocks forward progress
by VACUUM like this might go unnoticed for a long time.  This could
eventually lead to a wraparound/xidStopLimit outage.

Note that _bt_unlink_halfdead_page() has always been able to bail on
page deletion when the target page's left sibling page was in an
inconsistent state.  It now does the same thing (returns false to back
out of the second phase of deletion) when it notices sibling link
corruption in the target page's right sibling page.

This is similar to the work from commit 5b861baa (later backpatched as
commit 43e409ce), which taught nbtree to press on with vacuuming an
index when page deletion fails to "re-find" a downlink in the target
page's parent page.  The "re-find" check seems to make VACUUM bail on
page deletion more often in practice, but there is no reason to take any
chances here.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CAH2-Wzko2q2kP1+UvgJyP9g0mF4hopK0NtQZcxwvMv9_ytGhkQ@mail.gmail.com
Backpatch: 11- (all supported versions).
2023-05-25 15:32:53 -07:00
Tom Lane f8320cc72d Fix misbehavior of EvalPlanQual checks with multiple result relations.
The idea of EvalPlanQual is that we replace the query's scan of the
result relation with a single injected tuple, and see if we get a
tuple out, thereby implying that the injected tuple still passes the
query quals.  (In join cases, other relations in the query are still
scanned normally.)  This logic was not updated when commit 86dc90056
made it possible for a single DML query plan to have multiple result
relations, when the query target relation has inheritance or partition
children.  We replaced the output for the current result relation
successfully, but other result relations were still scanned normally;
thus, if any other result relation contained a tuple satisfying the
quals, we'd think the EPQ check passed, even if it did not pass for
the injected tuple itself.  This would lead to update or delete
actions getting performed when they should have been skipped due to
a conflicting concurrent update in READ COMMITTED isolation mode.

Fix by blocking all sibling result relations from emitting tuples
during an EvalPlanQual recheck.  In the back branches, the fix is
complicated a bit by the need to not change the size of struct
EPQState (else we'd have ABI-breaking changes in offsets in
struct ModifyTableState).  Like the back-patches of 3f7836ff6
and 4b3e37993, add a separately palloc'd struct to avoid that.
The logic is the same as in HEAD otherwise.

This is only a live bug back to v14 where 86dc90056 came in.
However, I chose to back-patch the test cases further, on the
grounds that this whole area is none too well tested.  I skipped
doing so in v11 though because none of the test applied cleanly,
and it didn't quite seem worth extra work for a branch with only
six months to live.

Per report from Ante Krešić (via Aleksander Alekseev)

Discussion: https://postgr.es/m/CAJ7c6TMBTN3rcz4=AjYhLPD_w3FFT0Wq_C15jxCDn8U4tZnH1g@mail.gmail.com
2023-05-19 14:26:34 -04:00