Commit Graph

2334 Commits

Author SHA1 Message Date
Tom Lane edc0a8d82a Detect integer overflow while computing new array dimensions.
array_set_element() and related functions allow an array to be
enlarged by assigning to subscripts outside the current array bounds.
While these places were careful to check that the new bounds are
allowable, they neglected to consider the risk of integer overflow
in computing the new bounds.  In edge cases, we could compute new
bounds that are invalid but get past the subsequent checks,
allowing bad things to happen.  Memory stomps that are potentially
exploitable for arbitrary code execution are possible, and so is
disclosure of server memory.

To fix, perform the hazardous computations using overflow-detecting
arithmetic routines, which fortunately exist in all still-supported
branches.

The test cases added for this generate (after patching) errors that
mention the value of MaxArraySize, which is platform-dependent.
Rather than introduce multiple expected-files, use psql's VERBOSITY
parameter to suppress the printing of the message text.  v11 psql
lacks that parameter, so omit the tests in that branch.

Our thanks to Pedro Gallegos for reporting this problem.

Security: CVE-2023-5869
2023-11-06 10:56:43 -05:00
Tom Lane 47c0f00cf9 Be more wary about NULL values for GUC string variables.
get_explain_guc_options() crashed if a string GUC marked GUC_EXPLAIN
has a NULL boot_val.  Nosing around found a couple of other places
that seemed insufficiently cautious about NULL string values, although
those are likely unreachable in practice.  Add some commentary
defining the expectations for NULL values of string variables,
in hopes of forestalling future additions of more such bugs.

Xing Guo, Aleksander Alekseev, Tom Lane

Discussion: https://postgr.es/m/CACpMh+AyDx5YUpPaAgzVwC1d8zfOL4JoD-uyFDnNSa1z0EsDQQ@mail.gmail.com
2023-11-02 11:47:33 -04:00
Nathan Bossart 54fc9dca5b Avoid calling proc_exit() in processes forked by system().
The SIGTERM handler for the startup process immediately calls
proc_exit() for the duration of the restore_command, i.e., a call
to system().  This system() call forks a new process to execute the
shell command, and this child process inherits the parent's signal
handlers.  If both the parent and child processes receive SIGTERM,
both will attempt to call proc_exit().  This can end badly.  For
example, both processes will try to remove themselves from the
PGPROC shared array.

To fix this problem, this commit adds a check in
StartupProcShutdownHandler() to see whether MyProcPid == getpid().
If they match, this is the parent process, and we can proc_exit()
like before.  If they do not match, this is a child process, and we
just emit a message to STDERR (in a signal safe manner) and
_exit(), thereby skipping any problematic exit callbacks.

This commit also adds checks in proc_exit(), ProcKill(), and
AuxiliaryProcKill() that verify they are not being called within
such child processes.

Suggested-by: Andres Freund
Reviewed-by: Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/Y9nGDSgIm83FHcad%40paquier.xyz
Discussion: https://postgr.es/m/20230223231503.GA743455%40nathanxps13
Backpatch-through: 11
2023-10-17 10:42:12 -05:00
Michael Paquier e72580232c pageinspect: Fix gist_page_items() with included columns
Non-leaf pages of GiST indexes contain key attributes, leaf pages
contain both key and non-key attributes, and gist_page_items() ignored
the handling of non-key attributes.  This caused a few problems when
using gist_page_items() on a GiST index with INCLUDE:
- On a non-leaf page, the function would crash.
- On a leaf page, the function would work, but miss to display all the
values for included attributes.

This commit fixes gist_page_items() to handle such cases in a more
appropriate way, and now displays the values of key and non-key
attributes for each item separately in a style consistent with what
ruleutils.c would generate for the attribute list, depending on the page
type dealt with.  In a way similar to how a record is displayed, values
would be double-quoted for key or non-key attributes if required.

ruleutils.c did not provide a routine able to control if non-key
attributes should be displayed, so an extended() routine for index
definitions is added to work around the leaf and non-leaf page
differences.

While on it, this commit fixes a third problem related to the amount of
data reported for key attributes.  The code originally relied on
BuildIndexValueDescription() (used for error reports on constraints)
that would not print all the data stored in the index but the index
opclass's input type, so this limited the amount of information
available.  This switch makes gist_page_items() much cheaper as there is
no need to run ACL checks for each item printed, which is not an issue
anyway as superuser rights are required to execute the functions of
pageinspect.  Opclasses whose data cannot be displayed can rely on
gist_page_items_bytea().

The documentation of this function was slightly incorrect for the
output results generated on HEAD and v15, so adjust it on these
branches.

Author: Alexander Lakhin, Michael Paquier
Discussion: https://postgr.es/m/17884-cb8c326522977acb@postgresql.org
Backpatch-through: 14
2023-05-19 12:38:18 +09:00
Tom Lane 21c058648e Make our back branches build under -fkeep-inline-functions.
Add "#ifndef FRONTEND" where necessary to make pg_waldump build
on compilers that don't elide unused static-inline functions.

This back-patches relevant parts of commit 3e9ca5260, fixing build
breakage from dc7420c2c and back-patching of f10f0ae42.

Per recently-resurrected buildfarm member castoroides.  We aren't
expecting castoroides to build anything newer than v11, but we
might as well clean up the intermediate branches while at it.
2023-01-20 11:58:12 -05:00
Tom Lane 32d5a4974c Replace RelationOpenSmgr() with RelationGetSmgr().
This is a back-patch of the v15-era commit f10f0ae42 into older
supported branches.  The idea is to design out bugs in which an
ill-timed relcache flush clears rel->rd_smgr partway through
some code sequence that wasn't expecting that.  We had another
report today of a corner case that reliably crashes v14 under
debug_discard_caches (nee CLOBBER_CACHE_ALWAYS), and therefore
would crash once in a blue moon in the field.  We're unlikely
to get rid of all such code paths unless we adopt the more
rigorous coding rules instituted by f10f0ae42.  Therefore,
even though this is a bit invasive, it's time to back-patch.
Some comfort can be taken in the fact that f10f0ae42 has been
in v15 for 16 months without problems.

I left the RelationOpenSmgr macro present in the back branches,
even though no core code should use it anymore, in order to not break
third-party extensions in minor releases.  Such extensions might opt
to start using RelationGetSmgr instead, to reduce their code
differential between v15 and earlier branches.  This carries a hazard
of failing to compile against headers from existing minor releases.
However, once compiled the extension should work fine even with such
releases, because RelationGetSmgr is a "static inline" function so
it creates no link-time dependency.  So depending on distribution
practices, that might be an OK tradeoff.

Per report from Spyridon Dimitrios Agathos.  Original patch
by Amul Sul.

Discussion: https://postgr.es/m/CAFM5RaqdgyusQvmWkyPYaWMwoK5gigdtW-7HcgHgOeAw7mqJ_Q@mail.gmail.com
Discussion: https://postgr.es/m/CANiYTQsU7yMFpQYnv=BrcRVqK_3U3mtAzAsJCaqtzsDHfsUbdQ@mail.gmail.com
2022-11-17 16:54:30 -05:00
Peter Eisentraut b7f37af7c1 Expand palloc/pg_malloc API for more type safety
This adds additional variants of palloc, pg_malloc, etc. that
encapsulate common usage patterns and provide more type safety.

Specifically, this adds palloc_object(), palloc_array(), and
repalloc_array(), which take the type name of the object to be
allocated as its first argument and cast the return as a pointer to
that type.  There are also palloc0_object() and palloc0_array()
variants for initializing with zero, and pg_malloc_*() variants of all
of the above.

Inspired by the talloc library.

This is backpatched from master so that future backpatchable code can
make use of these APIs.  This patch by itself does not contain any
users of these APIs.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/bb755632-2a43-d523-36f8-a1e7a389a907@enterprisedb.com
2022-09-14 06:08:34 +02:00
Thomas Munro 99504ff826 Fix relptr's encoding of the base address.
Previously, we encoded both NULL and the first byte at the base address
as 0.  That confusion led to the assertion in commit e07d4ddc, which
failed when min_dynamic_shared_memory was used.  Give them distinct
encodings, by switching to 1-based offsets for non-NULL pointers.  Also
improve macro hygiene in passing (missing/misplaced parentheses), and
remove open-coded access to the raw offset value from freepage.c/h.

Although e07d4ddc was back-patched to 10, the only code that actually
makes use of relptr at the base address arrived in 84b1c63a, so no need
to back-patch further than 14 for now.

Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/20220519193839.GT19626%40telsasoft.com
2022-06-27 11:45:03 +12:00
Peter Eisentraut d480ae069e Remove obsolete comment
accidentally left behind by 4cb658af70
2022-04-02 07:28:11 +02:00
Tom Lane 0144c9c7e7 Suppress compiler warning in relptr_store().
clang 13 with -Wextra warns that "performing pointer subtraction with
a null pointer has undefined behavior" in the places where freepage.c
tries to set a relptr variable to constant NULL.  This appears to be
a compiler bug, but it's unlikely to get fixed instantly.  Fortunately,
we can work around it by introducing an inline support function, which
seems like a good change anyway because it removes the macro's existing
double-evaluation hazard.

Backpatch to v10 where this code was introduced.

Patch by me, based on an idea of Andres Freund's.

Discussion: https://postgr.es/m/48826.1648310694@sss.pgh.pa.us
2022-03-26 14:29:40 -04:00
Thomas Munro 1396b5c6ed Fix waiting in RegisterSyncRequest().
If we run out of space in the checkpointer sync request queue (which is
hopefully rare on real systems, but common with very small buffer pool),
we wait for it to drain.  While waiting, we should report that as a wait
event so that users know what is going on, and also handle postmaster
death, since otherwise the loop might never terminate if the
checkpointer has exited.

Back-patch to 12.  Although the problem exists in earlier releases too,
the code is structured differently before 12 so I haven't gone any
further for now, in the absence of field complaints.

Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220226213942.nb7uvb2pamyu26dj%40alap3.anarazel.de
2022-03-16 15:35:42 +13:00
Thomas Munro 78c0f85e43 Wake up for latches in CheckpointWriteDelay().
The checkpointer shouldn't ignore its latch.  Other backends may be
waiting for it to drain the request queue.  Hopefully real systems don't
have a full queue often, but the condition is reached easily when
shared_buffers is small.

This involves defining a new wait event, which will appear in the
pg_stat_activity view often due to spread checkpoints.

Back-patch only to 14.  Even though the problem exists in earlier
branches too, it's hard to hit there.  In 14 we stopped using signal
handlers for latches on Linux, *BSD and macOS, which were previously
hiding this problem by interrupting the sleep (though not reliably, as
the signal could arrive before the sleep begins; precisely the problem
latches address).

Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220226213942.nb7uvb2pamyu26dj%40alap3.anarazel.de
2022-03-16 13:57:07 +13:00
Michael Paquier 627c79a1e8 Add compute_query_id = regress
"regress" is a new mode added to compute_query_id aimed at facilitating
regression testing when a module computing query IDs is loaded into the
backend, like pg_stat_statements.  It works the same way as "auto",
meaning that query IDs are computed if a module enables it, except that
query IDs are hidden in EXPLAIN outputs to ensure regression output
stability.

Like any GUCs of the kind (force_parallel_mode, etc.), this new
configuration can be added to an instance's postgresql.conf, or just
passed down with PGOPTIONS at command level.  compute_query_id uses an
enum for its set of option values, meaning that this addition ensures
ABI compatibility.

Using this new configuration mode allows installcheck-world to pass when
running the tests on an instance with pg_stat_statements enabled,
stabilizing the test output while checking the paths doing query ID
computations.

Reported-by: Anton Melnikov
Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/1634283396.372373993@f75.i.mail.ru
Discussion: https://postgr.es/m/YgHlxgc/OimuPYhH@paquier.xyz
Backpatch-through: 14
2022-02-22 10:23:49 +09:00
Tomas Vondra fb2f8e534a Fix ordering of XIDs in ProcArrayApplyRecoveryInfo
Commit 8431e296ea reworked ProcArrayApplyRecoveryInfo to sort XIDs
before adding them to KnownAssignedXids. But the XIDs are sorted using
xidComparator, which compares the XIDs simply as uint32 values, not
logically. KnownAssignedXidsAdd() however expects XIDs in logical order,
and calls TransactionIdFollowsOrEquals() to enforce that. If there are
XIDs for which the two orderings disagree, an error is raised and the
recovery fails/restarts.

Hitting this issue is fairly easy - you just need two transactions, one
started before the 4B limit (e.g. XID 4294967290), the other sometime
after it (e.g. XID 1000). Logically (4294967290 <= 1000) but when
compared using xidComparator we try to add them in the opposite order.
Which makes KnownAssignedXidsAdd() fail with an error like this:

  ERROR: out-of-order XID insertion in KnownAssignedXids

This only happens during replica startup, while processing RUNNING_XACTS
records to build the snapshot. Once we reach STANDBY_SNAPSHOT_READY, we
skip these records. So this does not affect already running replicas,
but if you restart (or create) a replica while there are transactions
with XIDs for which the two orderings disagree, you may hit this.

Long-running transactions and frequent replica restarts increase the
likelihood of hitting this issue. Once the replica gets into this state,
it can't be started (even if the old transactions are terminated).

Fixed by sorting the XIDs logically - this is fine because we're dealing
with normal XIDs (because it's XIDs assigned to backends) and from the
same wraparound epoch (otherwise the backends could not be running at
the same time on the primary node). So there are no problems with the
triangle inequality, which is why xidComparator compares raw values.

Investigation and root cause analysis by Abhijit Menon-Sen. Patch by me.

This issue is present in all releases since 9.4, however releases up to
9.6 are EOL already so backpatch to 10 only.

Reviewed-by: Abhijit Menon-Sen
Reviewed-by: Alvaro Herrera
Backpatch-through: 10
Discussion: https://postgr.es/m/36b8a501-5d73-277c-4972-f58a4dce088a%40enterprisedb.com
2022-01-27 20:15:37 +01:00
David Rowley 6c32c09777 Allow Memoize to operate in binary comparison mode
Memoize would always use the hash equality operator for the cache key
types to determine if the current set of parameters were the same as some
previously cached set.  Certain types such as floating points where -0.0
and +0.0 differ in their binary representation but are classed as equal by
the hash equality operator may cause problems as unless the join uses the
same operator it's possible that whichever join operator is being used
would be able to distinguish the two values.  In which case we may
accidentally return in the incorrect rows out of the cache.

To fix this here we add a binary mode to Memoize to allow it to the
current set of parameters to previously cached values by comparing
bit-by-bit rather than logically using the hash equality operator.  This
binary mode is always used for LATERAL joins and it's used for normal
joins when any of the join operators are not hashable.

Reported-by: Tom Lane
Author: David Rowley
Discussion: https://postgr.es/m/3004308.1632952496@sss.pgh.pa.us
Backpatch-through: 14, where Memoize was added
2021-11-24 10:07:38 +13:00
Amit Kapila a6a0ae127e Revert "Remove unused wait events."
This reverts commit 671eb8f344. The removed
wait events are used by some extensions and removal of these would force a
recompile of those extensions. We don't want that for released branches.

Discussion: https://postgr.es/m/E1mdOBY-0005j2-QL@gemulon.postgresql.org
2021-10-26 08:19:33 +05:30
Noah Misch dde966efb2 Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURRENTLY.
CIC and REINDEX CONCURRENTLY assume backends see their catalog changes
no later than each backend's next transaction start.  That failed to
hold when a backend absorbed a relevant invalidation in the middle of
running RelationBuildDesc() on the CIC index.  Queries that use the
resulting index can silently fail to find rows.  Fix this for future
index builds by making RelationBuildDesc() loop until it finishes
without accepting a relevant invalidation.  It may be necessary to
reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices.
Back-patch to 9.6 (all supported versions).

Noah Misch and Andrey Borodin, reviewed (in earlier versions) by Andres
Freund.

Discussion: https://postgr.es/m/20210730022548.GA1940096@gust.leadboat.com
2021-10-23 18:36:42 -07:00
Amit Kapila 671eb8f344 Remove unused wait events.
Commit 464824323e introduced the wait events which were neither used by
that commit nor by follow-up commits for that work.

Author: Masahiro Ikeda
Backpatch-through: 14, where it was introduced
Discussion: https://postgr.es/m/ff077840-3ab2-04dd-bbe4-4f5dfd2ad481@oss.nttdata.com
2021-10-21 08:07:08 +05:30
Tom Lane e6adaa1795 Fix Portal snapshot tracking to handle subtransactions properly.
Commit 84f5c2908 forgot to consider the possibility that
EnsurePortalSnapshotExists could run inside a subtransaction with
lifespan shorter than the Portal's.  In that case, the new active
snapshot would be popped at the end of the subtransaction, leaving
a dangling pointer in the Portal, with mayhem ensuing.

To fix, make sure the ActiveSnapshot stack entry is marked with
the same subtransaction nesting level as the associated Portal.
It's certainly safe to do so since we won't be here at all unless
the stack is empty; hence we can't create an out-of-order stack.

Let's also apply this logic in the case where PortalRunUtility
sets portalSnapshot, just to be sure that path can't cause similar
problems.  It's slightly less clear that that path can't create
an out-of-order stack, so add an assertion guarding it.

Report and patch by Bertrand Drouvot (with kibitzing by me).
Back-patch to v11, like the previous commit.

Discussion: https://postgr.es/m/ff82b8c5-77f4-3fe7-6028-fcf3303e82dd@amazon.com
2021-10-01 11:10:12 -04:00
Alexander Korotkov 7186f07189 Split macros from visibilitymap.h into a separate header
That allows to include just visibilitymapdefs.h from file.c, and in turn,
remove include of postgres.h from relcache.h.

Reported-by: Andres Freund
Discussion: https://postgr.es/m/20210913232614.czafiubr435l6egi%40alap3.anarazel.de
Author: Alexander Korotkov
Reviewed-by: Andres Freund, Tom Lane, Alvaro Herrera
Backpatch-through: 13
2021-09-23 19:59:11 +03:00
Peter Eisentraut a599109e59 Fix typo 2021-08-25 10:15:05 +02:00
Michael Paquier 1900c14055 Revert refactoring of hex code to src/common/
This is a combined revert of the following commits:
- c3826f8, a refactoring piece that moved the hex decoding code to
src/common/.  This code was cleaned up by aef8948, as it originally
included no overflow checks in the same way as the base64 routines in
src/common/ used by SCRAM, making it unsafe for its purpose.
- aef8948, a more advanced refactoring of the hex encoding/decoding code
to src/common/ that added sanity checks on the result buffer for hex
decoding and encoding.  As reported by Hans Buschmann, those overflow
checks are expensive, and it is possible to see a performance drop in
the decoding/encoding of bytea or LOs the longer they are.  Simple SQLs
working on large bytea values show a clear difference in perf profile.
- ccf4e27, a cleanup made possible by aef8948.

The reverts of all those commits bring back the performance of hex
decoding and encoding back to what it was in ~13.  Fow now and
post-beta3, this is the simplest option.

Reported-by: Hans Buschmann
Discussion: https://postgr.es/m/1629039545467.80333@nidsa.net
Backpatch-through: 14
2021-08-19 09:20:19 +09:00
Tom Lane 6201fa3c16 Rename debug_invalidate_system_caches_always to debug_discard_caches.
The name introduced by commit 4656e3d66 was agreed to be unreasonably
long.  To match this change, rename initdb's recently-added
--clobber-cache option to --discard-caches.

Discussion: https://postgr.es/m/1374320.1625430433@sss.pgh.pa.us
2021-07-13 15:01:01 -04:00
Andrew Dunstan e1c1c30f63
Pre branch pgindent / pgperltidy run
Along the way make a slight adjustment to
src/include/utils/queryjumble.h to avoid an unused typedef.
2021-06-28 11:05:54 -04:00
Peter Geoghegan 3499df0dee Support disabling index bypassing by VACUUM.
Generalize the INDEX_CLEANUP VACUUM parameter (and the corresponding
reloption): make it into a ternary style boolean parameter.  It now
exposes a third option, "auto".  The "auto" option (which is now the
default) enables the "bypass index vacuuming" optimization added by
commit 1e55e7d1.

"VACUUM (INDEX_CLEANUP TRUE)" is redefined to once again make VACUUM
simply do any required index vacuuming, regardless of how few dead
tuples are encountered during the first scan of the target heap relation
(unless there are exactly zero).  This gives users a way of opting out
of the "bypass index vacuuming" optimization, if for whatever reason
that proves necessary.  It is also expected to be used by PostgreSQL
developers as a testing option from time to time.

"VACUUM (INDEX_CLEANUP FALSE)" does the same thing as it always has: it
forcibly disables both index vacuuming and index cleanup.  It's not
expected to be used much in PostgreSQL 14.  The failsafe mechanism added
by commit 1e55e7d1 addresses the same problem in a simpler way.
INDEX_CLEANUP can now be thought of as a testing and compatibility
option.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-By: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/CAH2-WznrBoCST4_Gxh_G9hA8NzGUbeBGnOUC8FcXcrhqsv6OHQ@mail.gmail.com
2021-06-18 20:04:07 -07:00
Tom Lane 84f5c2908d Restore the portal-level snapshot after procedure COMMIT/ROLLBACK.
COMMIT/ROLLBACK necessarily destroys all snapshots within the session.
The original implementation of intra-procedure transactions just
cavalierly did that, ignoring the fact that this left us executing in
a rather different environment than normal.  In particular, it turns
out that handling of toasted datums depends rather critically on there
being an outer ActiveSnapshot: otherwise, when SPI or the core
executor pop whatever snapshot they used and return, it's unsafe to
dereference any toasted datums that may appear in the query result.
It's possible to demonstrate "no known snapshots" and "missing chunk
number N for toast value" errors as a result of this oversight.

Historically this outer snapshot has been held by the Portal code,
and that seems like a good plan to preserve.  So add infrastructure
to pquery.c to allow re-establishing the Portal-owned snapshot if it's
not there anymore, and add enough bookkeeping support that we can tell
whether it is or not.

We can't, however, just re-establish the Portal snapshot as part of
COMMIT/ROLLBACK.  As in normal transaction start, acquiring the first
snapshot should wait until after SET and LOCK commands.  Hence, teach
spi.c about doing this at the right time.  (Note that this patch
doesn't fix the problem for any PLs that try to run intra-procedure
transactions without using SPI to execute SQL commands.)

This makes SPI's no_snapshots parameter rather a misnomer, so in HEAD,
rename that to allow_nonatomic.

replication/logical/worker.c also needs some fixes, because it wasn't
careful to hold a snapshot open around AFTER trigger execution.
That code doesn't use a Portal, which I suspect someday we're gonna
have to fix.  But for now, just rearrange the order of operations.
This includes back-patching the recent addition of finish_estate()
to centralize the cleanup logic there.

This also back-patches commit 2ecfeda3e into v13, to improve the
test coverage for worker.c (it was that test that exposed that
worker.c's snapshot management is wrong).

Per bug #15990 from Andreas Wicht.  Back-patch to v11 where
intra-procedure COMMIT was added.

Discussion: https://postgr.es/m/15990-eee2ac466b11293d@postgresql.org
2021-05-21 14:03:59 -04:00
Alvaro Herrera cafde58b33
Allow compute_query_id to be set to 'auto' and make it default
Allowing only on/off meant that all either all existing configuration
guides would become obsolete if we disabled it by default, or that we
would have to accept a performance loss in the default config if we
enabled it by default.  By allowing 'auto' as a middle ground, the
performance cost is only paid by those who enable pg_stat_statements and
similar modules.

I only edited the release notes to comment-out a paragraph that is now
factually wrong; further edits are probably needed to describe the
related change in more detail.

Author: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20210513002623.eugftm4nk2lvvks3@nol
2021-05-15 14:13:09 -04:00
Tom Lane def5b065ff Initial pgindent and pgperltidy run for v14.
Also "make reformat-dat-files".

The only change worthy of note is that pgindent messed up the formatting
of launcher.c's struct LogicalRepWorkerId, which led me to notice that
that struct wasn't used at all anymore, so I just took it out.
2021-05-12 13:14:10 -04:00
Tom Lane f02b9085ad Prevent integer overflows in array subscripting calculations.
While we were (mostly) careful about ensuring that the dimensions of
arrays aren't large enough to cause integer overflow, the lower bound
values were generally not checked.  This allows situations where
lower_bound + dimension overflows an integer.  It seems that that's
harmless so far as array reading is concerned, except that array
elements with subscripts notionally exceeding INT_MAX are inaccessible.
However, it confuses various array-assignment logic, resulting in a
potential for memory stomps.

Fix by adding checks that array lower bounds aren't large enough to
cause lower_bound + dimension to overflow.  (Note: this results in
disallowing cases where the last subscript position would be exactly
INT_MAX.  In principle we could probably allow that, but there's a lot
of code that computes lower_bound + dimension and would need adjustment.
It seems doubtful that it's worth the trouble/risk to allow it.)

Somewhat independently of that, array_set_element() was careless
about possible overflow when checking the subscript of a fixed-length
array, creating a different route to memory stomps.  Fix that too.

Security: CVE-2021-32027
2021-05-10 10:44:38 -04:00
Thomas Munro c2dc19342e Revert recovery prefetching feature.
This set of commits has some bugs with known fixes, but at this late
stage in the release cycle it seems best to revert and resubmit next
time, along with some new automated test coverage for this whole area.

Commits reverted:

dc88460c: Doc: Review for "Optionally prefetch referenced data in recovery."
1d257577: Optionally prefetch referenced data in recovery.
f003d9f8: Add circular WAL decoding buffer.
323cbe7c: Remove read_page callback from XLogReader.

Remove the new GUC group WAL_RECOVERY recently added by a55a9847, as the
corresponding section of config.sgml is now reverted.

Discussion: https://postgr.es/m/CAOuzzgrn7iKnFRsB4MHp3UisEQAGgZMbk_ViTN4HV4-Ksq8zCg%40mail.gmail.com
2021-05-10 16:06:09 +12:00
Tom Lane a55a98477b Sync guc.c and postgresql.conf.sample with the SGML docs.
It seems that various people have moved GUCs around in the config.sgml
listing without bothering to make the code agree.  Ensure that the
config_group codes assigned to GUCs match where they are listed in
config.sgml.  Likewise ensure that postgresql.conf.sample lists GUCs
in the same sub-section and same ordering as they appear in config.sgml.

(I've got some doubts about some of these choices, but for the purposes
of this patch, we'll treat config.sgml as gospel.)

Notably, this requires adding a WAL_RECOVERY config_group value,
because 1d257577e didn't.  As long as we're renumbering that enum
anyway, let's take out the values corresponding to major groups
that are divided into sub-groups.  No GUC should be assigned to the
major group itself, so those values just create a temptation to
do the wrong thing, while adding work for translators.

In passing, adjust the short_desc strings for PRESET_OPTIONS GUCs
to uniformly use the phrasing "Shows XYZ.", removing the impression
some of these strings left that you can set the value.

While some of these errors are old, no back-patch, as changing the
contents of the pg_settings view in stable branches seems more likely
to be seen as a compatibility break than anything helpful.

Bharath Rupireddy, Justin Pryzby, Tom Lane

Discussion: https://postgr.es/m/16997-ff16127f6e0d1390@postgresql.org
Discussion: https://postgr.es/m/20210413123139.GE6091@telsasoft.com
2021-05-08 12:13:33 -04:00
Thomas Munro ec48314708 Revert per-index collation version tracking feature.
Design problems were discovered in the handling of composite types and
record types that would cause some relevant versions not to be recorded.
Misgivings were also expressed about the use of the pg_depend catalog
for this purpose.  We're out of time for this release so we'll revert
and try again.

Commits reverted:

1bf946bd: Doc: Document known problem with Windows collation versions.
cf002008: Remove no-longer-relevant test case.
ef387bed: Fix bogus collation-version-recording logic.
0fb0a050: Hide internal error for pg_collation_actual_version(<bad OID>).
ff942057: Suppress "warning: variable 'collcollate' set but not used".
d50e3b1f: Fix assertion in collation version lookup.
f24b1569: Rethink extraction of collation dependencies.
257836a7: Track collation versions for indexes.
cd6f479e: Add pg_depend.refobjversion.
7d1297df: Remove pg_collation.collversion.

Discussion: https://postgr.es/m/CA%2BhUKGLhj5t1fcjqAu8iD9B3ixJtsTNqyCCD4V0aTO9kAKAjjA%40mail.gmail.com
2021-05-07 21:10:11 +12:00
Alvaro Herrera 3fe773b149
Track detached partitions more accurately in partdescs
In d6b8d29419 I (Álvaro) was sloppy about recording whether a
partition descripor does or does not include detached partitions, when
the snapshot checking does not see the pg_inherits row marked detached.
In that case no partition was omitted, yet in the relcache entry we were
saving the partdesc as omitting partitions.  Flip that (so we save it as
a partdesc not omitting partitions, which indeed it doesn't), which
hopefully makes the code easier to reason about.

Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqE7GxGU4VdzwZzfiz+Ont5SsopoFkgtrZGEdPqWRL+biA@mail.gmail.com
2021-05-06 12:47:30 -04:00
Alvaro Herrera d6b8d29419
Allow a partdesc-omitting-partitions to be cached
Makes partition descriptor acquisition faster during the transient
period in which a partition is in the process of being detached.

This also adds the restriction that only one partition can be in
pending-detach state for a partitioned table.

While at it, return find_inheritance_children() API to what it was
before 71f4c8c6f7, and create a separate
find_inheritance_children_extended() that returns detailed info about
detached partitions.

(This incidentally fixes a bug in 8aba932251 whereby a memory context
holding a transient partdesc is reparented to a NULL PortalContext,
leading to permanent leak of that memory.  The fix is to no longer rely
on reparenting contexts to PortalContext.   Reported by Amit Langote.)

Per gripe from Amit Langote
Discussion: https://postgr.es/m/CA+HiwqFgpP1LxJZOBYGt9rpvTjXXkg5qG2+Xch2Z1Q7KrqZR1A@mail.gmail.com
2021-04-28 15:44:35 -04:00
Amit Kapila e7eea52b2d Fix Logical Replication of Truncate in synchronous commit mode.
The Truncate operation acquires an exclusive lock on the target relation
and indexes. It then waits for logical replication of the operation to
finish at commit. Now because we are acquiring the shared lock on the
target index to get index attributes in pgoutput while sending the
changes for the Truncate operation, it leads to a deadlock.

Actually, we don't need to acquire a lock on the target index as we build
the cache entry using a historic snapshot and all the later changes are
absorbed while decoding WAL. So, we wrote a special purpose function for
logical replication to get a bitmap of replica identity attribute numbers
where we get that information without locking the target index.

We decided not to backpatch this as there doesn't seem to be any field
complaint about this issue since it was introduced in commit 5dfd1e5a in
v11.

Reported-by: Haiying Tang
Author: Takamichi Osumi, test case by Li Japin
Reviewed-by: Amit Kapila, Ajin Cherian
Discussion: https://postgr.es/m/OS0PR01MB6113C2499C7DC70EE55ADB82FB759@OS0PR01MB6113.jpnprd01.prod.outlook.com
2021-04-27 08:28:26 +05:30
Bruce Momjian 9660834dd8 adjust query id feature to use pg_stat_activity.query_id
Previously, it was pg_stat_activity.queryid to match the
pg_stat_statements queryid column.  This is an adjustment to patch
4f0b0966c8.  This also adjusts some of the internal function calls to
match.  Catversion bumped.

Reported-by: Álvaro Herrera, Julien Rouhaud

Discussion: https://postgr.es/m/20210408032704.GA7498@alvherre.pgsql
2021-04-20 12:22:26 -04:00
Tom Lane d7cff12c4c Add macro PGWARNING, and make PGERROR available on all platforms.
We'd previously noted the need for coping with Windows headers
that provide some other definition of macro "ERROR" than elog.h
does.  It turns out that R also wants to define ERROR, and
WARNING too.  PL/R has been working around this in a hacky way
that broke when we recently changed the numeric value of ERROR.
To let them have a more future-proof solution, provide an
alternate macro PGWARNING for WARNING, and make PGERROR visible
always, not only when #ifdef WIN32.

Discussion: https://postgr.es/m/CADK3HHK6iMChd1yoOqssxBn5Z14Zar8Ztr3G-N_fuG7F8YTP3w@mail.gmail.com
2021-04-11 13:22:56 -04:00
Michael Paquier 609b0652af Fix typos and grammar in documentation and code comments
Comment fixes are applied on HEAD, and documentation improvements are
applied on back-branches where needed.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20210408164008.GJ6592@telsasoft.com
Backpatch-through: 9.6
2021-04-09 13:53:07 +09:00
Fujii Masao 8ff1c94649 Allow TRUNCATE command to truncate foreign tables.
This commit introduces new foreign data wrapper API for TRUNCATE.
It extends TRUNCATE command so that it accepts foreign tables as
the targets to truncate and invokes that API. Also it extends postgres_fdw
so that it can issue TRUNCATE command to foreign servers, by adding
new routine for that TRUNCATE API.

The information about options specified in TRUNCATE command, e.g.,
ONLY, CACADE, etc is passed to FDW via API. The list of foreign tables to
truncate is also passed to FDW. FDW truncates the foreign data sources
that the passed foreign tables specify, based on those information.
For example, postgres_fdw constructs TRUNCATE command using them
and issues it to the foreign server.

For performance, TRUNCATE command invokes the FDW routine for
TRUNCATE once per foreign server that foreign tables to truncate belong to.

Author: Kazutaka Onishi, Kohei KaiGai, slightly modified by Fujii Masao
Reviewed-by: Bharath Rupireddy, Michael Paquier, Zhihong Yu, Alvaro Herrera, Stephen Frost, Ashutosh Bapat, Amit Langote, Daniel Gustafsson, Ibrar Ahmed, Fujii Masao
Discussion: https://postgr.es/m/CAOP8fzb_gkReLput7OvOK+8NHgw-RKqNv59vem7=524krQTcWA@mail.gmail.com
Discussion: https://postgr.es/m/CAJuF6cMWDDqU-vn_knZgma+2GMaout68YUgn1uyDnexRhqqM5Q@mail.gmail.com
2021-04-08 20:56:08 +09:00
Thomas Munro 1d257577e0 Optionally prefetch referenced data in recovery.
Introduce a new GUC recovery_prefetch, disabled by default.  When
enabled, look ahead in the WAL and try to initiate asynchronous reading
of referenced data blocks that are not yet cached in our buffer pool.
For now, this is done with posix_fadvise(), which has several caveats.
Better mechanisms will follow in later work on the I/O subsystem.

The GUC maintenance_io_concurrency is used to limit the number of
concurrent I/Os we allow ourselves to initiate, based on pessimistic
heuristics used to infer that I/Os have begun and completed.

The GUC wal_decode_buffer_size is used to limit the maximum distance we
are prepared to read ahead in the WAL to find uncached blocks.

Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> (parts)
Reviewed-by: Andres Freund <andres@anarazel.de> (parts)
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> (parts)
Tested-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com>
Tested-by: Dmitry Dolgov <9erthalion6@gmail.com>
Tested-by: Sait Talha Nisanci <Sait.Nisanci@microsoft.com>
Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
2021-04-08 23:20:42 +12:00
Magnus Hagander aaf0432572 Add functions to wait for backend termination
This adds a function, pg_wait_for_backend_termination(), and a new
timeout argument to pg_terminate_backend(), which will wait for the
backend to actually terminate (with or without signaling it to do so
depending on which function is called). The default behaviour of
pg_terminate_backend() remains being timeout=0 which does not waiting.
For pg_wait_for_backend_termination() the default wait is 5 seconds.

Author: Bharath Rupireddy
Reviewed-By: Fujii Masao, David Johnston, Muhammad Usama,
             Hou Zhijie, Magnus Hagander
Discussion: https://postgr.es/m/CALj2ACUBpunmyhYZw-kXCYs5NM+h6oG_7Df_Tn4mLmmUQifkqA@mail.gmail.com
2021-04-08 11:40:54 +02:00
Bruce Momjian 4f0b0966c8 Make use of in-core query id added by commit 5fd9dfa5f5
Use the in-core query id computation for pg_stat_activity,
log_line_prefix, and EXPLAIN VERBOSE.

Similar to other fields in pg_stat_activity, only the queryid from the
top level statements are exposed, and if the backends status isn't
active then the queryid from the last executed statements is displayed.

Add a %Q placeholder to include the queryid in log_line_prefix, which
will also only expose top level statements.

For EXPLAIN VERBOSE, if a query identifier has been computed, either by
enabling compute_query_id or using a third-party module, display it.

Bump catalog version.

Discussion: https://postgr.es/m/20210407125726.tkvjdbw76hxnpwfi@nol

Author: Julien Rouhaud

Reviewed-by: Alvaro Herrera, Nitin Jadhav, Zhihong Yu
2021-04-07 14:04:06 -04:00
Bruce Momjian 5fd9dfa5f5 Move pg_stat_statements query jumbling to core.
Add compute_query_id GUC to control whether a query identifier should be
computed by the core (off by default).  It's thefore now possible to
disable core queryid computation and use pg_stat_statements with a
different algorithm to compute the query identifier by using a
third-party module.

To ensure that a single source of query identifier can be used and is
well defined, modules that calculate a query identifier should throw an
error if compute_query_id specified to compute a query id and if a query
idenfitier was already calculated.

Discussion: https://postgr.es/m/20210407125726.tkvjdbw76hxnpwfi@nol

Author: Julien Rouhaud

Reviewed-by: Alvaro Herrera, Nitin Jadhav, Zhihong Yu
2021-04-07 13:06:56 -04:00
Peter Eisentraut a2da77cdb4 Change return type of EXTRACT to numeric
The previous implementation of EXTRACT mapped internally to
date_part(), which returned type double precision (since it was
implemented long before the numeric type existed).  This can lead to
imprecise output in some cases, so returning numeric would be
preferrable.  Changing the return type of an existing function is a
bit risky, so instead we do the following:  We implement a new set of
functions, which are now called "extract", in parallel to the existing
date_part functions.  They work the same way internally but use
numeric instead of float8.  The EXTRACT construct is now mapped by the
parser to these new extract functions.  That way, dumps of views
etc. from old versions (which would use date_part) continue to work
unchanged, but new uses will map to the new extract functions.

Additionally, the reverse compilation of EXTRACT now reproduces the
original syntax, using the new mechanism introduced in
40c24bfef9.

The following minor changes of behavior result from the new
implementation:

- The column name from an isolated EXTRACT call is now "extract"
  instead of "date_part".

- Extract from date now rejects inappropriate field names such as
  HOUR.  It was previously mapped internally to extract from
  timestamp, so it would silently accept everything appropriate for
  timestamp.

- Return values when extracting fields with possibly fractional
  values, such as second and epoch, now have the full scale that the
  value has internally (so, for example, '1.000000' instead of just
  '1').

Reported-by: Petr Fedorov <petr.fedorov@phystech.edu>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/42b73d2d-da12-ba9f-570a-420e0cce19d9@phystech.edu
2021-04-06 07:20:42 +02:00
Fujii Masao 43620e3286 Add function to log the memory contexts of specified backend process.
Commit 3e98c0bafb added pg_backend_memory_contexts view to display
the memory contexts of the backend process. However its target process
is limited to the backend that is accessing to the view. So this is
not so convenient when investigating the local memory bloat of other
backend process. To improve this situation, this commit adds
pg_log_backend_memory_contexts() function that requests to log
the memory contexts of the specified backend process.

This information can be also collected by calling
MemoryContextStats(TopMemoryContext) via a debugger. But
this technique cannot be used in some environments because no debugger
is available there. So, pg_log_backend_memory_contexts() allows us to
see the memory contexts of specified backend more easily.

Only superusers are allowed to request to log the memory contexts
because allowing any users to issue this request at an unbounded rate
would cause lots of log messages and which can lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its memory contexts at LOG_SERVER_ONLY level,
so that these memory contexts will appear in the server log but not
be sent to the client. It logs one message per memory context.
Because if it buffers all memory contexts into StringInfo to log them
as one message, which may require the buffer to be enlarged very much
and lead to OOM error since there can be a large number of memory
contexts in a backend.

When a backend process is consuming huge memory, logging all its
memory contexts might overrun available disk space. To prevent this,
now this patch limits the number of child contexts to log per parent
to 100. As with MemoryContextStats(), it supposes that practical cases
where the log gets long will typically be huge numbers of siblings
under the same parent context; while the additional debugging value
from seeing details about individual siblings beyond 100 will not be large.

There was another proposed patch to add the function to return
the memory contexts of specified backend as the result sets,
instead of logging them, in the discussion. However that patch is
not included in this commit because it had several issues to address.

Thanks to Tatsuhito Kasahara, Andres Freund, Tom Lane, Tomas Vondra,
Michael Paquier, Kyotaro Horiguchi and Zhihong Yu for the discussion.

Bump catalog version.

Author: Atsushi Torikoshi
Reviewed-by: Kyotaro Horiguchi, Zhihong Yu, Fujii Masao
Discussion: https://postgr.es/m/0271f440ac77f2a4180e0e56ebd944d1@oss.nttdata.com
2021-04-06 13:44:15 +09:00
Andres Freund 225a22b19e Improve efficiency of wait event reporting, remove proc.h dependency.
pgstat_report_wait_start() and pgstat_report_wait_end() required two
conditional branches so far. One to check if MyProc is NULL, the other to
check if pgstat_track_activities is set. As wait events are used around
comparatively lightweight operations, and are inlined (reducing branch
predictor effectiveness), that's not great.

The dependency on MyProc has a second disadvantage: Low-level subsystems, like
storage/file/fd.c, report wait events, but architecturally it is preferable
for them to not depend on inter-process subsystems like proc.h (defining
PGPROC).  After this change including pgstat.h (nor obviously its
sub-components like backend_status.h, wait_event.h, ...) does not pull in IPC
related headers anymore.

These goals, efficiency and abstraction, are achieved by having
pgstat_report_wait_start/end() not interact with MyProc, but instead a new
my_wait_event_info variable. At backend startup it points to a local variable,
removing the need to check for MyProc being NULL. During process
initialization my_wait_event_info is redirected to MyProc->wait_event_info. At
shutdown this is reversed. Because wait event reporting now does not need to
know about where the wait event is stored, it does not need to know about
PGPROC anymore.

The removal of the branch for checking pgstat_track_activities is simpler:
Don't check anymore. The cost due to the branch are often higher than the
store - and even if not, pgstat_track_activities is rarely disabled.

The main motivator to commit this work now is that removing the (indirect)
pgproc.h include from pgstat.h simplifies a patch to move statistics reporting
to shared memory (which still has a chance to get into 14).

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210402194458.2vu324hkk2djq6ce@alap3.anarazel.de
2021-04-03 12:03:45 -07:00
Andres Freund e1025044cd Split backend status and progress related functionality out of pgstat.c.
Backend status (supporting pg_stat_activity) and command
progress (supporting pg_stat_progress*) related code is largely
independent from the rest of pgstat.[ch] (supporting views like
pg_stat_all_tables that accumulate data over time). See also
a333476b92.

This commit doesn't rename the function names to make the distinction
from the rest of pgstat_ clearer - that'd be more invasive and not
clearly beneficial. If we were to decide to do such a rename at some
point, it's better done separately from moving the code as well.

Robert's review was of an earlier version.

Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/20210316195440.twxmlov24rr2nxrg@alap3.anarazel.de
2021-04-03 11:42:52 -07:00
Michael Paquier e6bdfd9700 Refactor HMAC implementations
Similarly to the cryptohash implementations, this refactors the existing
HMAC code into a single set of APIs that can be plugged with any crypto
libraries PostgreSQL is built with (only OpenSSL currently).  If there
is no such libraries, a fallback implementation is available.  Those new
APIs are designed similarly to the existing cryptohash layer, so there
is no real new design here, with the same logic around buffer bound
checks and memory handling.

HMAC has a dependency on cryptohashes, so all the cryptohash types
supported by cryptohash{_openssl}.c can be used with HMAC.  This
refactoring is an advantage mainly for SCRAM, that included its own
implementation of HMAC with SHA256 without relying on the existing
crypto libraries even if PostgreSQL was built with their support.

This code has been tested on Windows and Linux, with and without
OpenSSL, across all the versions supported on HEAD from 1.1.1 down to
1.0.1.  I have also checked that the implementations are working fine
using some sample results, a custom extension of my own, and doing
cross-checks across different major versions with SCRAM with the client
and the backend.

Author: Michael Paquier
Reviewed-by: Bruce Momjian
Discussion: https://postgr.es/m/X9m0nkEJEzIPXjeZ@paquier.xyz
2021-04-03 17:30:49 +09:00
Andres Freund a333476b92 Split wait event related code from pgstat.[ch] into wait_event.[ch].
The wait event related code is independent from the rest of the
pgstat.[ch] code, of nontrivial size and changes on a regular
basis. Put it into its own set of files.

As there doesn't seem to be a good pre-existing directory for code
like this, add src/backend/utils/activity.

Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/20210316195440.twxmlov24rr2nxrg@alap3.anarazel.de
2021-04-02 20:02:26 -07:00
Thomas Munro c30f54ad73 Detect POLLHUP/POLLRDHUP while running queries.
Provide a new GUC check_client_connection_interval that can be used to
check whether the client connection has gone away, while running very
long queries.  It is disabled by default.

For now this uses a non-standard Linux extension (also adopted by at
least one other OS).  POLLRDHUP is not defined by POSIX, and other OSes
don't have a reliable way to know if a connection was closed without
actually trying to read or write.

In future we might consider trying to send a no-op/heartbeat message
instead, but that could require protocol changes.

Author: Sergey Cherkashin <s.cherkashin@postgrespro.ru>
Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Tatsuo Ishii <ishii@sraoss.co.jp>
Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Maksim Milyutin <milyutinma@gmail.com>
Reviewed-by: Tsunakawa, Takayuki/綱川 貴之 <tsunakawa.takay@fujitsu.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (much earlier version)
Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru
2021-04-03 09:02:41 +13:00