Commit Graph

22324 Commits

Author SHA1 Message Date
Robert Haas e997a0c642 Remove all use of ThisTimeLineID global variable outside of xlog.c
All such code deals with this global variable in one of three ways.
Sometimes the same functions use it in more than one of these ways
at the same time.

First, sometimes it's an implicit argument to one or more functions
being called in xlog.c or elsewhere, and must be set to the
appropriate value before calling those functions lest they
misbehave. In those cases, it is now passed as an explicit argument
instead.

Second, sometimes it's used to obtain the current timeline after
the end of recovery, i.e. the timeline to which WAL is being
written and flushed. Such code now calls GetWALInsertionTimeLine()
or relies on the new out parameter added to GetFlushRecPtr().

Third, sometimes it's used during recovery to store the current
replay timeline. That can change, so such code must generally
update the value before each use. It can still do that, but must
now use a local variable instead.

The net effect of these changes is to reduce by a fair amount the
amount of code that is directly accessing this global variable.
That's good, because history has shown that we don't always think
clearly about which timeline ID it's supposed to contain at any
given point in time, or indeed, whether it has been or needs to
be initialized at any given point in the code.

Patch by me, reviewed and tested by Michael Paquier, Amul Sul, and
Álvaro Herrera.

Discussion: https://postgr.es/m/CA+TgmobfAAqhfWa1kaFBBFvX+5CjM=7TE=n4r4Q1o2bjbGYBpA@mail.gmail.com
2021-11-05 12:50:01 -04:00
Robert Haas caf1f675b8 Don't set ThisTimeLineID when there's no reason to do so.
In slotfuncs.c, pg_replication_slot_advance() needs to determine
the LSN up to which the slot should be advanced, but that doesn't
require us to update ThisTimeLineID, because none of the code called
from here depends on it. If the replication slot is logical,
pg_logical_replication_slot_advance will call read_local_xlog_page,
which does use ThisTimeLineID, but also takes care of making sure
it's up to date. If the replication slot is physical, the timeline
isn't used for anything at all.

In logicalfuncs.c, pg_logical_slot_get_changes_guts() has the same
issue: the only code we're going to run that cares about timelines
is in or downstream of read_local_xlog_page, which already makes
sure that the correct value gets set. Hence, don't do it here.

Patch by me, reviewed and tested by Michael Paquier, Amul Sul, and
Álvaro Herrera.

Discussion: https://postgr.es/m/CA+TgmobfAAqhfWa1kaFBBFvX+5CjM=7TE=n4r4Q1o2bjbGYBpA@mail.gmail.com
2021-11-05 12:43:04 -04:00
Alvaro Herrera d74b54b3dd
Avoid crash in rare case of concurrent DROP
When a role being dropped contains is referenced by catalog objects that
are concurrently also being dropped, a crash can result while trying to
construct the string that describes the objects.  Suppress that by
ignoring objects whose descriptions are returned as NULL.

The majority of relevant codesites were already cautious about this
already; we had just missed a couple.

This is an old bug, so backpatch all the way back.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/17126-21887f04508cb5c8@postgresql.org
2021-11-05 12:29:35 -03:00
Robert Haas bef47ff85d Introduce 'bbsink' abstraction to modularize base backup code.
The base backup code has accumulated a healthy number of new
features over the years, but it's becoming increasingly difficult
to maintain and further enhance that code because there's no
real separation of concerns. For example, the code that
understands knows the details of how we send data to the client
using the libpq protocol is scattered throughout basebackup.c,
rather than being centralized in one place.

To try to improve this situation, introduce a new 'bbsink' object
which acts as a recipient for archives generated during the base
backup progress and also for the backup manifest. This commit
introduces three types of bbsink: a 'copytblspc' bbsink forwards the
backup to the client using one COPY OUT operation per tablespace and
another for the manifest, a 'progress' bbsink performs command
progress reporting, and a 'throttle' bbsink performs rate-limiting.
The 'progress' and 'throttle' bbsink types also forward the data to a
successor bbsink; at present, the last bbsink in the chain will
always be of type 'copytblspc'. There are plans to add more types
of 'bbsink' in future commits.

This abstraction is a bit leaky in the case of progress reporting,
but this still seems cleaner than what we had before.

Patch by me, reviewed and tested by Andres Freund, Sumanta Mukherjee,
Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja, Mark Dilger,
and Jeevan Ladhe.

Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 10:08:30 -04:00
Peter Geoghegan e7428a99a1 Add hardening to catch invalid TIDs in indexes.
Add hardening to the heapam index tuple deletion path to catch TIDs in
index pages that point to a heap item that index tuples should never
point to.  The corruption we're trying to catch here is particularly
tricky to detect, since it typically involves "extra" (corrupt) index
tuples, as opposed to the absence of required index tuples in the index.

For example, a heap TID from an index page that turns out to point to an
LP_UNUSED item in the heap page has a good chance of being caught by one
of the new checks.  There is a decent chance that the recently fixed
parallel VACUUM bug (see commit 9bacec15) would have been caught had
that particular check been in place for Postgres 14.  No backpatch of
this extra hardening for now, though.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-Wzk-4_raTzawWGaiqNvkpwDXxv3y1AQhQyUeHfkU=tFCeA@mail.gmail.com
2021-11-04 19:54:05 -07:00
Peter Geoghegan 5cd7eb1f1c Add various assertions to heap pruning code.
These assertions document (and verify) our high level assumptions about
how pruning can and cannot affect existing items from target heap pages.
For example, one of the new assertions verifies that pruning does not
set a heap-only tuple to LP_DEAD.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-Wz=vhvBx1GjF+oueHh8YQcHoQYrMi0F0zFMHEr8yc4sCoA@mail.gmail.com
2021-11-04 19:07:54 -07:00
Heikki Linnakangas 6b1b405ebf Fix snapshot reference leak if lo_export fails.
If lo_export() fails to open the target file or to write to it, it leaks
the created LargeObjectDesc and its snapshot in the top-transaction
context and resource owner. That's pretty harmless, it's a small leak
after all, but it gives the user a "Snapshot reference leak" warning.

Fix by using a short-lived memory context and no resource owner for
transient LargeObjectDescs that are opened and closed within one function
call. The leak is easiest to reproduce with lo_export() on a directory
that doesn't exist, but in principle the other lo_* functions could also
fail.

Backpatch to all supported versions.

Reported-by: Andrew B
Reviewed-by: Alvaro Herrera
Discussion: https://www.postgresql.org/message-id/32bf767a-2d65-71c4-f170-122f416bab7e@iki.fi
2021-11-03 10:52:38 +02:00
Peter Geoghegan c59278a1aa Fix parallel amvacuumcleanup safety bug.
Commit b4af70cb inverted the return value of the function
parallel_processing_is_safe(), but missed the amvacuumcleanup test.
Index AMs that don't support parallel cleanup at all were affected.

The practical consequences of this bug were not very serious.  Hash
indexes are affected, but since they just return the number of blocks
during hashvacuumcleanup anyway, it can't have had much impact.

Author: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoA-Em+aeVPmBbL_s1V-ghsJQSxYL-i3JP8nTfPiD1wjKw@mail.gmail.com
Backpatch: 14-, where commit b4af70cb appears.
2021-11-02 19:52:11 -07:00
Tom Lane 24f9e49e43 Blind attempt to silence SSL compile failures on hamerkop.
Buildfarm member hamerkop has been failing for the last few days
with errors that look like OpenSSL's X509-related symbols have
not been imported into be-secure-openssl.c.  It's unclear why
this should be, but let's try adding an explicit #include of
<openssl/x509v3.h>, as there has long been in fe-secure-openssl.c.

Discussion: https://postgr.es/m/1051867.1635720347@sss.pgh.pa.us
2021-11-02 15:18:07 -04:00
Peter Geoghegan 9bacec15b6 Don't overlook indexes during parallel VACUUM.
Commit b4af70cb, which simplified state managed by VACUUM, performed
refactoring of parallel VACUUM in passing.  Confusion about the exact
details of the tasks that the leader process is responsible for led to
code that made it possible for parallel VACUUM to miss a subset of the
table's indexes entirely.  Specifically, indexes that fell under the
min_parallel_index_scan_size size cutoff were missed.  These indexes are
supposed to be vacuumed by the leader (alongside any parallel unsafe
indexes), but weren't vacuumed at all.  Affected indexes could easily
end up with duplicate heap TIDs, once heap TIDs were recycled for new
heap tuples.  This had generic symptoms that might be seen with almost
any index corruption involving structural inconsistencies between an
index and its table.

To fix, make sure that the parallel VACUUM leader process performs any
required index vacuuming for indexes that happen to be below the size
cutoff.  Also document the design of parallel VACUUM with these
below-size-cutoff indexes.

It's unclear how many users might be affected by this bug.  There had to
be at least three indexes on the table to hit the bug: a smaller index,
plus at least two additional indexes that themselves exceed the size
cutoff.  Cases with just one additional index would not run into
trouble, since the parallel VACUUM cost model requires two
larger-than-cutoff indexes on the table to apply any parallel
processing.  Note also that autovacuum was not affected, since it never
uses parallel processing.

Test case based on tests from a larger patch to test parallel VACUUM by
Masahiko Sawada.

Many thanks to Kamigishi Rei for her invaluable help with tracking this
problem down.

Author: Peter Geoghegan <pg@bowt.ie>
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Reported-By: Kamigishi Rei <iijima.yun@koumakan.jp>
Reported-By: Andrew Gierth <andrew@tao11.riddles.org.uk>
Diagnosed-By: Andres Freund <andres@anarazel.de>
Bug: #17245
Discussion: https://postgr.es/m/17245-ddf06aaf85735f36@postgresql.org
Discussion: https://postgr.es/m/20211030023740.qbnsl2xaoh2grq3d@alap3.anarazel.de
Backpatch: 14-, where the refactoring commit appears.
2021-11-02 12:06:17 -07:00
Tom Lane f3d4019da5 Ensure consistent logical replication of datetime and float8 values.
In walreceiver, set the publisher's relevant GUCs (datestyle,
intervalstyle, extra_float_digits) to the same values that pg_dump uses,
and for the same reason: we need the output to be read the same way
regardless of the receiver's settings.  Without this, it's possible
for subscribers to misinterpret transmitted values.

Although this is clearly a bug fix, it's not without downsides:
subscribers that are storing values into some other datatype, such as
text, could get different results than before, and perhaps be unhappy
about that.  Given the lack of previous complaints, it seems best
to change this only in HEAD, and to call it out as an incompatible
change in v15.

Japin Li, per report from Sadhuprasad Patro

Discussion: https://postgr.es/m/CAFF0-CF=D7pc6st-3A9f1JnOt0qmc+BcBPVzD6fLYisKyAjkGA@mail.gmail.com
2021-11-02 14:28:50 -04:00
Tom Lane 01fc652703 Fix variable lifespan in ExecInitCoerceToDomain().
This undoes a mistake in 1ec7679f1: domainval and domainnull were
meant to live across loop iterations, but they were incorrectly
moved inside the loop.  The effect was only to emit useless extra
EEOP_MAKE_READONLY steps, so it's not a big deal; nonetheless,
back-patch to v13 where the mistake was introduced.

Ranier Vilela

Discussion: https://postgr.es/m/CAEudQAqXuhbkaAp-sGH6dR6Nsq7v28_0TPexHOm6FiDYqwQD-w@mail.gmail.com
2021-11-02 13:36:47 -04:00
Tom Lane 65c6cab136 Avoid O(N^2) behavior in SyncPostCheckpoint().
As in commits 6301c3ada and e9d9ba2a4, avoid doing repetitive
list_delete_first() operations, since that would be expensive when
there are many files waiting to be unlinked.  This is a slightly
larger change than in those cases.  We have to keep the list state
valid for calls to AbsorbSyncRequests(), so it's necessary to invent a
"canceled" field instead of immediately deleting PendingUnlinkEntry
entries.  Also, because we might not be able to process all the
entries, we need a new list primitive list_delete_first_n().

list_delete_first_n() is almost list_copy_tail(), but it modifies the
input List instead of making a new copy.  I found a couple of existing
uses of the latter that could profitably use the new function.  (There
might be more, but the other callers look like they probably shouldn't
overwrite the input List.)

As before, back-patch to v13.

Discussion: https://postgr.es/m/CD2F0E7F-9822-45EC-A411-AE56F14DEA9F@amazon.com
2021-11-02 11:31:54 -04:00
Amit Kapila 335397456b Move MarkCurrentTransactionIdLoggedIfAny() out of the critical section.
We don't modify any shared state in this function which could cause
problems for any concurrent session. This will make it look similar to the
other updates for the same structure (TransactionState) which avoids
confusion for future readers of code.

Author: Dilip Kumar
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/E1mSoYz-0007Fh-D9@gemulon.postgresql.org
2021-11-02 09:11:05 +05:30
Amit Kapila 71db6459e6 Replace XLOG_INCLUDE_XID flag with a more localized flag.
Commit 0bead9af48 introduced XLOG_INCLUDE_XID flag to indicate that the
WAL record contains subXID-to-topXID association. It uses that flag later
to mark in CurrentTransactionState that top-xid is logged so that we
should not try to log it again with the next WAL record in the current
subtransaction. However, we can use a localized variable to pass that
information.

In passing, change the related function and variable names to make them
consistent with what the code is actually doing.

Author: Dilip Kumar
Reviewed-by: Alvaro Herrera, Amit Kapila
Discussion: https://postgr.es/m/E1mSoYz-0007Fh-D9@gemulon.postgresql.org
2021-11-02 08:35:29 +05:30
Daniel Gustafsson 43a134f28b Replace unicode characters in comments with ascii
The unicode characters, while in comments and not code, caused MSVC
to emit compiler warning C4819:

  The file contains a character that cannot be represented in the
  current code page (number).  Save the file in Unicode format to
  prevent data loss.

Fix by replacing the characters in print.c with descriptive comments
containing the codepoints and symbol names, and remove the character
in brin_bloom.c which was a footnote reference copied from the paper
citation.

Per report from hamerkop in the buildfarm.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/340E4118-0D0C-4E85-8141-8C40EB22DA3A@yesql.se
2021-11-01 22:42:49 +01:00
Tom Lane e9d9ba2a4d Avoid some other O(N^2) hazards in list manipulation.
In the same spirit as 6301c3ada, fix some more places where we were
using list_delete_first() in a loop and thereby risking O(N^2)
behavior.  It's not clear that the lists manipulated in these spots
can get long enough to be really problematic ... but it's not clear
that they can't, either, and the fixes are simple enough.

As before, back-patch to v13.

Discussion: https://postgr.es/m/CD2F0E7F-9822-45EC-A411-AE56F14DEA9F@amazon.com
2021-11-01 16:24:39 -04:00
Alvaro Herrera 40c516bba8
Handle XLOG_OVERWRITE_CONTRECORD in DecodeXLogOp
Failing to do so results in inability of logical decoding to process the
WAL stream.  Handle it by doing nothing.

Backpatch all the way back.

Reported-by: Petr Jelínek <petr.jelinek@enterprisedb.com>
2021-11-01 13:07:23 -03:00
Michael Paquier add5cf28d4 Preserve opclass parameters across REINDEX CONCURRENTLY
The opclass parameter Datums from the old index are fetched in the same
way as for predicates and expressions, by grabbing them directly from
the system catalogs.  They are then copied into the new IndexInfo that
will be used for the creation of the new copy.

This caused the new index to be rebuilt with default parameters rather
than the ones pre-defined by a user.  The only way to get back a new
index with correct opclass parameters would be to recreate a new index
from scratch.

The issue has been introduced by 911e702.

Author: Michael Paquier
Reviewed-by: Zhihong Yu
Discussion: https://postgr.es/m/YX0CG/QpLXcPr8HJ@paquier.xyz
Backpatch-through: 13
2021-11-01 11:38:23 +09:00
Tom Lane 6301c3adab Avoid O(N^2) behavior when the standby process releases many locks.
When replaying a transaction that held many exclusive locks on the
primary, a standby server's startup process would expend O(N^2)
effort on manipulating the list of locks.  This code was fine when
written, but commit 1cff1b95a made repetitive list_delete_first()
calls inefficient, as explained in its commit message.  Fix by just
iterating the list normally, and releasing storage only when done.
(This'd be inadequate if we needed to recover from an error occurring
partway through; but we don't.)

Back-patch to v13 where 1cff1b95a came in.

Nathan Bossart

Discussion: https://postgr.es/m/CD2F0E7F-9822-45EC-A411-AE56F14DEA9F@amazon.com
2021-10-31 15:31:29 -04:00
Robert Haas 5ccceb2946 Fix race condition in startup progress reporting.
Commit 9ce346eabf added startup
progress reporting, but begin_startup_progress_phase has a race
condition: the timeout for the previous phase might fire just
before we reschedule the interrupt for the next phase.

To avoid the race, disable the timeout, clear the flag, and then
re-enable the timeout.

Patch by me, reviewed by Nitin Jadhav.

Discussion: https://postgr.es/m/CA+TgmoYq38i6iAzfRLVxA6Cm+wMCf4WM8wC3o_a+X_JvWC8bJg@mail.gmail.com
2021-10-29 14:40:15 -04:00
Robert Haas 2f5c4397c3 When fetching WAL for a basebackup, report errors with a sensible TLI.
The previous code used ThisTimeLineID, which need not even be
initialized here, although it usually was in practice, because
pg_basebackup issues IDENTIFY_SYSTEM before calling BASE_BACKUP,
and that initializes ThisTimeLineID as a side effect. That's not
really good enough, though, not only because we shoudn't be counting
on side effects like that, but also because the TLI could change
meanwhile. Fortunately, we have convenient access to more meaningful
TLI values, so use those instead.

Because of the way this logic is coded, the consequences of using
a possibly-incorrect TLI here are no worse than a slightly confusing
error message, I don't want to take any risk here, so no back-patch
at least for now.

Patch by me, reviewed by Kyotaro Horiguchi and Michael Paquier

Discussion: http://postgr.es/m/CA+TgmoZRNWGWYDX9RgTXMG6_nwSdB=PB-PPRUbvMUTGfmL2sHQ@mail.gmail.com
2021-10-29 14:00:32 -04:00
Peter Geoghegan 5f55fc5a34 Demote pg_unreachable() in heapam to an assertion.
Commit d168b66682, which overhauled index deletion, added a
pg_unreachable() to the end of a sort comparator used when sorting heap
TIDs from an index page.  This allows the compiler to apply
optimizations that assume that the heap TIDs from the index AM must
always be unique.

That doesn't seem like a good idea now, given recent reports of
corruption involving duplicate TIDs in indexes on Postgres 14.  Demote
to an assertion, just in case.

Backpatch: 14-, where index deletion was overhauled.
2021-10-29 10:53:48 -07:00
Peter Geoghegan 4c6afd805b Remove obsolete nbtree LP_DEAD item comments.
Comments above _bt_findinsertloc() that talk about LP_DEAD items are now
out of place.  We already discuss index tuple deletion at an earlier
point in the same comment block.

Oversight in commit d168b666.
2021-10-27 14:35:21 -07:00
Jeff Davis 77ea4f9439 Grant memory views to pg_read_all_stats.
Grant privileges on views pg_backend_memory_contexts and
pg_shmem_allocations to the role pg_read_all_stats. Also grant on the
underlying functions that those views depend on.

Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Nathan Bossart <bossartn@amazon.com>
Discussion: https://postgr.es/m/CALj2ACWAZo3Ar_EVsn2Zf9irG+hYK3cmh1KWhZS_Od45nd01RA@mail.gmail.com
2021-10-27 14:06:30 -07:00
Daniel Gustafsson 8af57ad815 Fix typos in comments
Author: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/CAHut+PsN_gmKu-KfeEb9NDARoTPbs4AN4PPu=6LZXFZRJ13SEw@mail.gmail.com
2021-10-27 22:38:38 +02:00
Peter Geoghegan c2381b5104 Fix ordering of items in nbtree error message.
Oversight in commit a5213adf.

Backpatch: 13-, just like commit a5213adf.
2021-10-27 13:09:24 -07:00
Peter Geoghegan a5213adf3d Further harden nbtree posting split code.
Add more defensive checks around posting list split code.  These should
detect corruption involving duplicate table TIDs earlier and more
reliably than any existing check.

Follow up to commit 8f72bbac.

Discussion: https://postgr.es/m/CAH2-WzkrSY_kjyd1_M5xJK1uM0govJXMxPn8JUSvwcUOiHuWVw@mail.gmail.com
Backpatch: 13-, where nbtree deduplication was introduced.
2021-10-27 12:10:47 -07:00
Amit Kapila 5a2832465f Allow publishing the tables of schema.
A new option "FOR ALL TABLES IN SCHEMA" in Create/Alter Publication allows
one or more schemas to be specified, whose tables are selected by the
publisher for sending the data to the subscriber.

The new syntax allows specifying both the tables and schemas. For example:
CREATE PUBLICATION pub1 FOR TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2;
OR
ALTER PUBLICATION pub1 ADD TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2;

A new system table "pg_publication_namespace" has been added, to maintain
the schemas that the user wants to publish through the publication.
Modified the output plugin (pgoutput) to publish the changes if the
relation is part of schema publication.

Updates pg_dump to identify and dump schema publications. Updates the \d
family of commands to display schema publications and \dRp+ variant will
now display associated schemas if any.

Author: Vignesh C, Hou Zhijie, Amit Kapila
Syntax-Suggested-by: Tom Lane, Alvaro Herrera
Reviewed-by: Greg Nancarrow, Masahiko Sawada, Hou Zhijie, Amit Kapila, Haiying Tang, Ajin Cherian, Rahila Syed, Bharath Rupireddy, Mark Dilger
Tested-by: Haiying Tang
Discussion: https://www.postgresql.org/message-id/CALDaNm0OANxuJ6RXqwZsM1MSY4s19nuH3734j4a72etDwvBETQ@mail.gmail.com
2021-10-27 07:44:52 +05:30
Jeff Davis f0b051e322 Allow GRANT on pg_log_backend_memory_contexts().
Remove superuser check, allowing any user granted permissions on
pg_log_backend_memory_contexts() to log the memory contexts of any
backend.

Note that this could allow a privileged non-superuser to log the
memory contexts of a superuser backend, but as discussed, that does
not seem to be a problem.

Reviewed-by: Nathan Bossart, Bharath Rupireddy, Michael Paquier, Kyotaro Horiguchi, Andres Freund
Discussion: https://postgr.es/m/e5cf6684d17c8d1ef4904ae248605ccd6da03e72.camel@j-davis.com
2021-10-26 13:31:38 -07:00
Fujii Masao 5fedf7417b Improve HINT message that FDW reports when there are no valid options.
The foreign data wrapper's validator function provides a HINT message with
list of valid options for the object specified in CREATE or ALTER command,
when the option given in the command is invalid. Previously
postgresql_fdw_validator() and the validator functions for postgres_fdw and
dblink_fdw worked in that way even there were no valid options in the object,
which could lead to the HINT message with empty list (because there were
no valid options). For example, ALTER FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (format 'csv') reported the following ERROR and HINT messages.
This behavior was confusing.

    ERROR: invalid option "format"
    HINT: Valid options in this context are:

There is no such issue in file_fdw. The validator function for file_fdw
reports the HINT message "There are no valid options in this context."
instead in that case.

This commit improves postgresql_fdw_validator() and the validator functions
for postgres_fdw and dblink_fdw so that they do likewise. For example,
this change causes the above ALTER FOREIGN DATA WRAPPER command to
report the following messages.

    ERROR:  invalid option "nonexistent"
    HINT:  There are no valid options in this context.

Author: Kosei Masumura
Reviewed-by: Bharath Rupireddy, Fujii Masao
Discussion: https://postgr.es/m/557d06cebe19081bfcc83ee2affc98d3@oss.nttdata.com
2021-10-27 00:46:52 +09:00
Daniel Gustafsson e63ce9e8d6 Ensure that slots are zeroed before use
The previous coding relied on the memory for the slots being zeroed
elsewhere, which while it was true in this case is not an contract
which is guaranteed to hold.  Explicitly clear the tts_isnull array
to ensure that the slots are filled from a known state.

Backpatch to v14 where the catalog multi-inserts were introduced.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAJ7c6TP0AowkUgNL6zcAK-s5HYsVHVBRWfu69FRubPpfwZGM9A@mail.gmail.com
Backpatch-through: 14
2021-10-26 10:40:08 +02:00
Thomas Munro 8781b0ce25 Reject huge_pages=on if shared_memory_type=sysv.
It doesn't work (it could, but hasn't been implemented).
Back-patch to 12, where shared_memory_type arrived.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/163271880203.22789.1125998876173795966@wrigleys.postgresql.org
2021-10-26 12:54:55 +13:00
Robert Haas a030a0c5cc Initialize variable to placate compiler.
Per Nathan Bossart.

Discussion: http://postgr.es/m/FECEE7FC-CB74-45A9-BB24-89FEE52A9585@amazon.com
2021-10-25 16:31:00 -04:00
Robert Haas 9ce346eabf Report progress of startup operations that take a long time.
Users sometimes get concerned whe they start the server and it
emits a few messages and then doesn't emit any more messages for
a long time. Generally, what's happening is either that the
system is taking a long time to apply WAL, or it's taking a
long time to reset unlogged relations, or it's taking a long
time to fsync the data directory, but it's not easy to tell
which is the case.

To fix that, add a new 'log_startup_progress_interval' setting,
by default 10s. When an operation that is known to be potentially
long-running takes more than this amount of time, we'll log a
status update each time this interval elapses.

To avoid undesirable log chatter, don't log anything about WAL
replay when in standby mode.

Nitin Jadhav and Robert Haas, reviewed by Amul Sul, Bharath
Rupireddy, Justin Pryzby, Michael Paquier, and Álvaro Herrera.

Discussion: https://postgr.es/m/CA+TgmoaHQrgDFOBwgY16XCoMtXxsrVGFB2jNCvb7-ubuEe1MGg@mail.gmail.com
Discussion: https://postgr.es/m/CAMm1aWaHF7VE69572_OLQ+MgpT5RUiUDgF1x5RrtkJBLdpRj3Q@mail.gmail.com
2021-10-25 11:51:57 -04:00
Robert Haas 732e6677a6 Add enable_timeout_every() to fire the same timeout repeatedly.
enable_timeout_at() and enable_timeout_after() can still be used
when you want to fire a timeout just once.

Patch by me, per a suggestion from Tom Lane.

Discussion: http://postgr.es/m/2992585.1632938816@sss.pgh.pa.us
Discussion: http://postgr.es/m/CA+TgmoYqSF5sCNrgTom9r3Nh=at4WmYFD=gsV-omStZ60S0ZUQ@mail.gmail.com
2021-10-25 11:33:44 -04:00
Robert Haas 902a2c2800 Remove useless code from CreateReplicationSlot.
According to the comments, we initialize sendTimeLineIsHistoric
and sendTimeLine here for the benefit of WalSndSegmentOpen.
However, the only way that can happen is if logical_read_xlog_page
calls WALRead. And since logical_read_xlog_page initializes the
same global variables internally, we don't need to also do it here.

These initializations have been here since replication slots were
introduced in commit 858ec11858. They
were certainly useless at that time, too, because logical decoding
didn't yet exist then, and physical replication doesn't examine any
WAL at the time of slot creation. I haven't checked all the
intermediate versions, but I suspect there's no point at which
this code ever did anything useful.

To reduce future confusion, remove the code. Since there's no
functional defect, no back-patch.

Discussion: http://postgr.es/m/CA+TgmobSWzacEs+r6C-7DrOPDHoDar4i9gzxB3SCBr5qjnLmVQ@mail.gmail.com
2021-10-25 10:57:12 -04:00
Robert Haas 18e0913a42 StartupXLOG: Don't repeatedly disable/enable local xlog insertion.
All the code that runs in the startup process to write WAL records
before that's allowed generally is now consecutive, so there's no
reason to shut the facility to write WAL locally off and then turn
it on again three times in a row.

Unfortunately, this requires a slight kludge in the checkpointer,
which needs to separately enable writing WAL in order to write the
checkpoint record. Because that code might run in the same process
as StartupXLOG() if we are in single-user mode, we must save/restore
the state of the LocalXLogInsertAllowed flag. Hopefully, we'll be
able to eliminate this wart in further refactoring, but it's
not too bad anyway.

Amul Sul, with modifications by me.

Discussion: http://postgr.es/m/CAAJ_b97fysj6sRSQEfOHj-y8Jfd5uPqOgO74qast89B4WfD+TA@mail.gmail.com
2021-10-25 10:16:28 -04:00
Robert Haas a75dbf7f9e StartupXLOG: Call CleanupAfterArchiveRecovery after XLogReportParameters.
This does a better job grouping related operations together, since
all of the WAL records that we need to write prior to allowing WAL
writes generally and written by a single uninterrupted stretch of code.

Since CleanupAfterArchiveRecovery() just (1) runs recovery_end_command,
(2) removes non-parent xlog files, and (3) archives any final partial
segment, this should be safe, because all of those things are pretty
much unrelated to the WAL record written by XLogReportParameters().

Amul Sul, per a suggestion from me

Discussion: http://postgr.es/m/CAAJ_b97fysj6sRSQEfOHj-y8Jfd5uPqOgO74qast89B4WfD+TA@mail.gmail.com
2021-10-25 10:02:36 -04:00
Heikki Linnakangas 166f94377c Clarify the logic in a few places in the new balanced merge code.
In selectnewtape(), use 'nOutputTapes' rather than 'nOutputRuns' in the
check for whether to start a new tape or to append a new run to an
existing tape. Until 'maxTapes' is reached, nOutputTapes is always equal
to nOutputRuns, so it doesn't change the logic, but it seems more logical
to compare # of tapes with # of tapes. Also, currently maxTapes is never
modified after the merging begins, but written this way, the code would
still work if it was. (Although the nOutputRuns == nOutputTapes assertion
would need to be removed and using nOutputRuns % nOutputTapes to
distribute the runs evenly across the tapes wouldn't do a good job
anymore).

Similarly in mergeruns(), change to USEMEM(state->tape_buffer_mem) to
account for the memory used for tape buffers. It's equal to availMem
currently, but tape_buffer_mem is more direct and future-proof. For
example, if we changed the logic to only allocate half of the remaining
memory to tape buffers, USEMEM(state->tape_buffer_mem) would still be
correct.

Coverity complained about these. Hopefully this patch helps it to
understand the logic better. Thanks to Tom Lane for initial analysis.
2021-10-25 09:30:49 +03:00
Michael Paquier b4ada4e19f Add replication command READ_REPLICATION_SLOT
The command is supported for physical slots for now, and returns the
type of slot, its restart_lsn and its restart_tli.

This will be useful for an upcoming patch related to pg_receivewal, to
allow the tool to be able to stream from the position of a slot, rather
than the last WAL position flushed by the backend (as reported by
IDENTIFY_SYSTEM) if the archive directory is found as empty, which would
be an advantage in the case of switching to a different archive
locations with the same slot used to avoid holes in WAL segment
archives.

Author: Ronan Dunklau
Reviewed-by: Kyotaro Horiguchi, Michael Paquier, Bharath Rupireddy
Discussion: https://postgr.es/m/18708360.4lzOvYHigE@aivenronan
2021-10-25 07:40:42 +09:00
Noah Misch 3cd9c3b921 Fix CREATE INDEX CONCURRENTLY for the newest prepared transactions.
The purpose of commit 8a54e12a38 was to
fix this, and it sufficed when the PREPARE TRANSACTION completed before
the CIC looked for lock conflicts.  Otherwise, things still broke.  As
before, in a cluster having used CIC while having enabled prepared
transactions, queries that use the resulting index can silently fail to
find rows.  It may be necessary to reindex to recover from past
occurrences; REINDEX CONCURRENTLY suffices.  Fix this for future index
builds by making CIC wait for arbitrarily-recent prepared transactions
and for ordinary transactions that may yet PREPARE TRANSACTION.  As part
of that, have PREPARE TRANSACTION transfer locks to its dummy PGPROC
before it calls ProcArrayClearTransaction().  Back-patch to 9.6 (all
supported versions).

Andrey Borodin, reviewed (in earlier versions) by Andres Freund.

Discussion: https://postgr.es/m/01824242-AA92-4FE9-9BA7-AEBAFFEA3D0C@yandex-team.ru
2021-10-23 18:36:38 -07:00
Noah Misch fdd965d074 Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURRENTLY.
CIC and REINDEX CONCURRENTLY assume backends see their catalog changes
no later than each backend's next transaction start.  That failed to
hold when a backend absorbed a relevant invalidation in the middle of
running RelationBuildDesc() on the CIC index.  Queries that use the
resulting index can silently fail to find rows.  Fix this for future
index builds by making RelationBuildDesc() loop until it finishes
without accepting a relevant invalidation.  It may be necessary to
reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices.
Back-patch to 9.6 (all supported versions).

Noah Misch and Andrey Borodin, reviewed (in earlier versions) by Andres
Freund.

Discussion: https://postgr.es/m/20210730022548.GA1940096@gust.leadboat.com
2021-10-23 18:36:38 -07:00
Amit Kapila 1607cd0b6c Remove unused wait events.
Commit 464824323e introduced the wait events which were neither used by
that commit nor by follow-up commits for that work.

Author: Masahiro Ikeda
Backpatch-through: 14, where it was introduced
Discussion: https://postgr.es/m/ff077840-3ab2-04dd-bbe4-4f5dfd2ad481@oss.nttdata.com
2021-10-21 08:01:25 +05:30
Michael Paquier 98ec35b0bb Fix corruption of pg_shdepend when copying deps from template database
Using for a new database a template database with shared dependencies
that need to be copied over was causing a corruption of pg_shdepend
because of an off-by-one computation error of the index number used for
the values inserted with a slot.

Issue introduced by e3931d0.  Monitoring the rest of the code, there are
no similar mistakes.

Reported-by: Sven Klemm
Author: Aleksander Alekseev
Reviewed-by: Daniel Gustafsson, Michael Paquier
Discussion: https://postgr.es/m/CAJ7c6TP0AowkUgNL6zcAK-s5HYsVHVBRWfu69FRubPpfwZGM9A@mail.gmail.com
Backpatch-through: 14
2021-10-21 10:39:01 +09:00
Alvaro Herrera c2c618ff11
Ensure correct lock level is used in ALTER ... RENAME
Commit 1b5d797cd4 intended to relax the lock level used to rename
indexes, but inadvertently allowed *any* relation to be renamed with a
lowered lock level, as long as the command is spelled ALTER INDEX.
That's undesirable for other relation types, so retry the operation with
the higher lock if the relation turns out not to be an index.

After this fix, ALTER INDEX <sometable> RENAME will require access
exclusive lock, which it didn't before.

Author: Nathan Bossart <bossartn@amazon.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Onder Kalaci <onderk@microsoft.com>
Discussion: https://postgr.es/m/PH0PR21MB1328189E2821CDEC646F8178D8AE9@PH0PR21MB1328.namprd21.prod.outlook.com
2021-10-19 19:08:45 -03:00
Tom Lane 3e310d837a Fix assignment to array of domain over composite.
An update such as "UPDATE ... SET fld[n].subfld = whatever"
failed if the array elements were domains rather than plain
composites.  That's because isAssignmentIndirectionExpr()
failed to cope with the CoerceToDomain node that would appear
in the expression tree in this case.  The result would typically
be a crash, and even if we accidentally didn't crash, we'd not
correctly preserve other fields of the same array element.

Per report from Onder Kalaci.  Back-patch to v11 where arrays of
domains came in.

Discussion: https://postgr.es/m/PH0PR21MB132823A46AA36F0685B7A29AD8BD9@PH0PR21MB1328.namprd21.prod.outlook.com
2021-10-19 13:54:45 -04:00
Tom Lane 697dd1925f Remove bogus assertion in transformExpressionList().
I think when I added this assertion (in commit 8f889b108), I was only
thinking of the use of transformExpressionList at top level of INSERT
and VALUES.  But it's also called by transformRowExpr(), which can
certainly occur in an UPDATE targetlist, so it's inappropriate to
suppose that p_multiassign_exprs must be empty.  Besides, since the
input is not expected to contain ResTargets, there's no reason it
should contain MultiAssignRefs either.  Hence this code need not
be concerned about the state of p_multiassign_exprs, and we should
just drop the assertion.

Per bug #17236 from ocean_li_996.  It's been wrong for years,
so back-patch to all supported branches.

Discussion: https://postgr.es/m/17236-3210de9bcba1d7ca@postgresql.org
2021-10-19 11:35:15 -04:00
Michael Paquier fdd8857145 Block ALTER INDEX/TABLE index_name ALTER COLUMN colname SET (options)
The grammar of this command run on indexes with column names has always
been authorized by the parser, and it has never been documented.

Since 911e702, it is possible to define opclass parameters as of CREATE
INDEX, which actually broke the old case of ALTER INDEX/TABLE where
relation-level parameters n_distinct and n_distinct_inherited could be
defined for an index (see 76a47c0 and its thread where this point has
been touched, still remained unused).  Attempting to do that in v13~
would cause the index to become unusable, as there is a new dedicated
code path to load opclass parameters instead of the relation-level ones
previously available.  Note that it is possible to fix things with a
manual catalog update to bring the relation back online.

This commit disables this command for now as the use of column names for
indexes does not make sense anyway, particularly when it comes to index
expressions where names are automatically computed.  One way to properly
support this case properly in the future would be to use column numbers
when it comes to indexes, in the same way as ALTER INDEX .. ALTER COLUMN
.. SET STATISTICS.

Partitioned indexes were already blocked, but not indexes.  Some tests
are added for both cases.

There was some code in ANALYZE to enforce n_distinct to be used for an
index expression if the parameter was defined, but just remove it for
now until/if there is support for this (note that index-level parameters
never had support in pg_dump either, previously), so this was just dead
code.

Reported-by: Matthijs van der Vleuten
Author: Nathan Bossart, Michael Paquier
Reviewed-by: Vik Fearing, Dilip Kumar
Discussion: https://postgr.es/m/17220-15d684c6c2171a83@postgresql.org
Backpatch-through: 13
2021-10-19 11:03:52 +09:00
Alvaro Herrera d6f1e16c8f
Invalidate partitions of table being attached/detached
Failing to do that, any direct inserts/updates of those partitions
would fail to enforce the correct constraint, that is, one that
considers the new partition constraint of their parent table.

Backpatch to 10.

Reported by: Hou Zhijie <houzj.fnst@fujitsu.com>
Author: Amit Langote <amitlangote09@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Nitin Jadhav <nitinjadhavpostgres@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>

Discussion: https://postgr.es/m/OS3PR01MB5718DA1C4609A25186D1FBF194089%40OS3PR01MB5718.jpnprd01.prod.outlook.com
2021-10-18 19:08:25 -03:00