Commit Graph

34034 Commits

Author SHA1 Message Date
Tom Lane b1cb77bdf4 Update time zone data files to tzdata release 2022f.
DST law changes in Chile, Fiji, Iran, Jordan, Mexico, Palestine,
and Syria.  Historical corrections for Chile, Crimea, Iran, and
Mexico.

Also, the Europe/Kiev zone has been renamed to Europe/Kyiv
(retaining the old name as a link).

The following zones have been merged into nearby, more-populous zones
whose clocks have agreed since 1970: Antarctica/Vostok, Asia/Brunei,
Asia/Kuala_Lumpur, Atlantic/Reykjavik, Europe/Amsterdam,
Europe/Copenhagen, Europe/Luxembourg, Europe/Monaco, Europe/Oslo,
Europe/Stockholm, Indian/Christmas, Indian/Cocos, Indian/Kerguelen,
Indian/Mahe, Indian/Reunion, Pacific/Chuuk, Pacific/Funafuti,
Pacific/Majuro, Pacific/Pohnpei, Pacific/Wake and Pacific/Wallis.
(This indirectly affects zones that were already links to one of
these: Arctic/Longyearbyen, Atlantic/Jan_Mayen, Iceland,
Pacific/Ponape, Pacific/Truk, and Pacific/Yap.)  America/Nipigon,
America/Rainy_River, America/Thunder_Bay, Europe/Uzhgorod, and
Europe/Zaporozhye were also merged into nearby zones after discovering
that their claimed post-1970 differences from those zones seem to have
been errors.

While the IANA crew have been working on merging zones that have no
post-1970 differences for some time, this batch of changes affects
some zones that are significantly more populous than those merged
in the past, notably parts of Europe.  The loss of pre-1970 timezone
history for those zones may be troublesome for applications
expecting consistency of timestamptz display.  As an example, the
stored value '1944-06-01 12:00 UTC' would previously display as
'1944-06-01 13:00:00+01' if the Europe/Stockholm zone is selected,
but now it will read out as '1944-06-01 14:00:00+02'.

There exists a "packrat" option that will build the timezone data
files with this old data preserved, but the problem is that it also
resurrects a bunch of other, far less well-attested data; so much so
that actually more zones' contents change from 2022a with that option
than without it.  I have chosen not to do that here, for that reason
and because it appears that no major OS distributions are using the
"packrat" option, so that doing so would cause Postgres' behavior
to diverge significantly depending on whether it was built with
--with-system-tzdata.  However, for anyone for whom these changes pose
significant problems, there is a solution: build a set of timezone
files with the "packrat" option and use those with Postgres.
2022-11-01 17:09:16 -04:00
Michael Paquier 341fba2a62 Fix ordering issue with WAL operations in GIN fast insert path
Contrary to what is documented in src/backend/access/transam/README,
ginHeapTupleFastInsert() had a few ordering issues with the way it does
its WAL operations when inserting items in its fast path.

First, when using a separate list, XLogBeginInsert() was being always
called before START_CRIT_SECTION(), and in this case a second thing was
wrong when merging lists, as an exclusive lock was taken on the tail
page *before* calling XLogBeginInsert().  Finally, when inserting items
into a tail page, the order of XLogBeginInsert() and
START_CRIT_SECTION() was reversed.  This commit addresses all these
issues by moving the calls of XLogBeginInsert() after all the pages
logged are locked and pinned, within a critical section.

This has been applied first only on HEAD as of 56b6625, but as per
discussion with Tom Lane and Álvaro Herrera, a backpatch is preferred to
keep all the branches consistent and to respect the transam's README
where we can.

Author:  Matthias van de Meent, Zhang Mingli
Discussion: https://postgr.es/m/CAEze2WhL8uLMqynnnCu1LAPwxD5RKEo0nHV+eXGg_N6ELU88HQ@mail.gmail.com
Backpatch-through: 10
2022-10-26 09:41:28 +09:00
Robert Haas 38214dabd4 pg_basebackup: Fix cross-platform tablespace relocation.
Specifically, when pg_basebackup is invoked with -Tx=y, don't error
out if x could plausibly be an absolute path either on Windows or on
non-Windows systems. We don't know whether the remote system is
running the same OS as the local system, so it's not appropriate to
assume that our local rule about absolute pathnames is the same as
the rule on the remote system.

Patch by me, reviewed by Tom Lane, Andrew Dunstan, and
Davinder Singh.

Discussion: http://postgr.es/m/CA+TgmoY+jC3YiskomvYKDPK3FbrmsDU7_8+wMHt02HOdJeRb0g@mail.gmail.com
2022-10-21 09:05:57 -04:00
Amit Kapila 5c51afe23d Add CHECK_FOR_INTERRUPTS while restoring changes during decoding.
Previously in commit 42681dffaf, we added CFI during decoding changes but
missed another similar case that can happen while restoring changes
spilled to disk back into memory in a loop.

Reported-by: Robert Haas
Author: Amit Kapila
Backpatch-through: 10
Discussion: https://postgr.es/m/CA+TgmoaLObg0QbstbC8ykDwOdD1bDkr4AbPpB=0DPgA2JW0mFg@mail.gmail.com
2022-10-21 12:08:14 +05:30
Amit Kapila 216af69aec Fix executing invalidation messages generated by subtransactions during decoding.
This problem has been introduced by commit 272248a0c1 where we started
assigning the subtransactions to the top-level transaction when we mark
both the top-level transaction and its subtransactions as containing
catalog changes. After we assign subtransactions to the top-level
transaction, we were not allowed to execute any invalidations associated
with it when we decide to skip the transaction.

The reason to assign the subtransactions to the top-level transaction was
to avoid the assertion failure in AssertTXNLsnOrder() as they have the
same LSN when we sometimes start accumulating transaction changes for
partial transactions after the restart. Now that with commit 64ff0fe4e8,
we skip this assertion check until we reach the LSN at which we start
decoding the contents of the transaction, so, there is no reason for such
an assignment anymore.

The assignment change was introduced in 15 and prior versions but this bug
doesn't exist in branches prior to 14 since we don't add invalidation
messages to subtransactions. We decided to backpatch through 11 for
consistency but not for 10 since its final release is near.

Reported-by: Kuroda Hayato
Author: Masahiko Sawada
Reviewed-by: Amit Kapila
Backpatch-through: 11
Discussion: https://postgr.es/m/TYAPR01MB58660803BCAA7849C8584AA4F57E9%40TYAPR01MB5866.jpnprd01.prod.outlook.com
Discussion: https://postgr.es/m/a89b46b6-0239-2fd5-71a9-b19b1f7a7145%40enterprisedb.com
2022-10-21 09:22:20 +05:30
Amit Kapila 5f7076cb60 Fix assertion failures while processing NEW_CID record in logical decoding.
When the logical decoding restarts from NEW_CID, since there is no
association between the top transaction and its subtransaction, both are
created as top transactions and have the same LSN. This caused the
assertion failure in AssertTXNLsnOrder().

This patch skips the assertion check until we reach the LSN at which we
start decoding the contents of the transaction, specifically
start_decoding_at LSN in SnapBuild. This is okay because we don't
guarantee to make the association between top transaction and
subtransaction until we try to decode the actual contents of transaction.
The ordering of the records prior to the start_decoding_at LSN should have
been checked before the restart.

The other assertion failure is due to the reason that we forgot to track
that we have considered top-level transaction id in the list of catalog
changing transactions that were committed when one of its subtransactions
is marked as containing catalog change.

Reported-by: Tomas Vondra, Osumi Takamichi
Author: Masahiko Sawada, Kuroda Hayato
Reviewed-by: Amit Kapila, Dilip Kumar, Kuroda Hayato, Kyotaro Horiguchi, Masahiko Sawada
Backpatch-through: 10
Discussion: https://postgr.es/m/a89b46b6-0239-2fd5-71a9-b19b1f7a7145%40enterprisedb.com
Discussion: https://postgr.es/m/TYCPR01MB83733C6CEAE47D0280814D5AED7A9%40TYCPR01MB8373.jpnprd01.prod.outlook.com
2022-10-20 09:07:04 +05:30
Thomas Munro da3a6825eb Track LLVM 15 changes.
Per https://llvm.org/docs/OpaquePointers.html, support for non-opaque
pointers still exists and we can request that on our context.  We have
until LLVM 16 to move to opaque pointers, a much larger change.

Back-patch to 11, where LLVM support arrived.

Author: Thomas Munro <thomas.munro@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAMHz58Sf_xncdyqsekoVsNeKcruKootLtVH6cYXVhhUR1oKPCg%40mail.gmail.com
2022-10-19 22:49:25 +13:00
Tom Lane e9377e3e53 Reject non-ON-SELECT rules that are named "_RETURN".
DefineQueryRewrite() has long required that ON SELECT rules be named
"_RETURN".  But we overlooked the converse case: we should forbid
non-ON-SELECT rules that are named "_RETURN".  In particular this
prevents using CREATE OR REPLACE RULE to overwrite a view's _RETURN
rule with some other kind of rule, thereby breaking the view.

Per bug #17646 from Kui Liu.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/17646-70c93cfa40365776@postgresql.org
2022-10-17 12:14:39 -04:00
Tom Lane 6618c276b7 Rename parser token REF to REF_P to avoid a symbol conflict.
In the latest version of Apple's macOS SDK, <sys/socket.h>
fails to compile if "REF" is #define'd as something.
Apple may or may not agree that this is a bug, and even if
they do accept the bug report I filed, they probably won't
fix it very quickly.  In the meantime, our back branches will all
fail to compile gram.y.  v15 and HEAD currently escape the problem
thanks to the refactoring done in 98e93a1fc, but that's purely
accidental.  Moreover, since that patch removed a widely-visible
inclusion of <netdb.h>, back-patching it seems too likely to break
third-party code.

Instead, change the token's code name to REF_P, following our usual
convention for naming parser tokens that are likely to have symbol
conflicts.  The effects of that should be localized to the grammar
and immediately surrounding files, so it seems like a safer answer.

Per project policy that we want to keep recently-out-of-support
branches buildable on modern systems, back-patch all the way to 9.2.

Discussion: https://postgr.es/m/1803927.1665938411@sss.pgh.pa.us
2022-10-16 15:27:04 -04:00
Tom Lane 6c1de98bad Harden pmsignal.c against clobbered shared memory.
The postmaster is not supposed to do anything that depends
fundamentally on shared memory contents, because that creates
the risk that a backend crash that trashes shared memory will
take the postmaster down with it, preventing automatic recovery.
In commit 969d7cd43 I lost sight of this principle and coded
AssignPostmasterChildSlot() in such a way that it could fail
or even crash if the shared PMSignalState structure became
corrupted.  Remarkably, we've not seen field reports of such
crashes; but I managed to induce one while testing the recent
changes around palloc chunk headers.

To fix, make a semi-duplicative state array inside the postmaster
so that we need consult only local state while choosing a "child
slot" for a new backend.  Ensure that other postmaster-executed
routines in pmsignal.c don't have critical dependencies on the
shared state, either.  Corruption of PMSignalState might now
lead ReleasePostmasterChildSlot() to conclude that backend X
failed, when actually backend Y was the one that trashed things.
But that doesn't matter, because we'll force a cluster-wide reset
regardless.

Back-patch to all supported branches, since this is an old bug.

Discussion: https://postgr.es/m/3436789.1665187055@sss.pgh.pa.us
2022-10-11 18:54:31 -04:00
Tom Lane addde9bc6c Yet further fixes for multi-row VALUES lists for updatable views.
DEFAULT markers appearing in an INSERT on an updatable view
could be mis-processed if they were in a multi-row VALUES clause.
This would lead to strange errors such as "cache lookup failed
for type NNNN", or in older branches even to crashes.

The cause is that commit 41531e42d tried to re-use rewriteValuesRTE()
to remove any SetToDefault nodes (that hadn't previously been replaced
by the view's own default values) appearing in "product" queries,
that is DO ALSO queries.  That's fundamentally wrong because the
DO ALSO queries might not even be INSERTs; and even if they are,
their targetlists don't necessarily match the view's column list,
so that almost all the logic in rewriteValuesRTE() is inapplicable.

What we want is a narrow focus on replacing any such nodes with NULL
constants.  (That is, in this context we are interpreting the defaults
as being strictly those of the view itself; and we already replaced
any that aren't NULL.)  We could add still more !force_nulls tests
to further lobotomize rewriteValuesRTE(); but it seems cleaner to
split out this case to a new function, restoring rewriteValuesRTE()
to the charter it had before.

Per bug #17633 from jiye_sw.  Patch by me, but thanks to
Richard Guo and Japin Li for initial investigation.
Back-patch to all supported branches, as the previous fix was.

Discussion: https://postgr.es/m/17633-98cc85e1fa91e905@postgresql.org
2022-10-11 18:24:15 -04:00
Alvaro Herrera dd82638734
Ensure all perl test modules are installed
PostgreSQL::Test::Cluster and ::Utils were not being installed.  This is
very hard to notice, as it only seems to affect external modules that
want to run tests from 15 back in earlier versions.  Oversight in
b235d41d96.

This applies only to branches 14 and back, because 15 had already been
made correct in commit b3b4d8e68a.

Discussion: https://postgr.es/m/20221010093415.poplkyn7pjeiv2y7@alvherre.pgsql
2022-10-11 09:56:13 +02:00
Alvaro Herrera 1f2f50bc1b
Change some errdetail() to errdetail_internal()
This prevents marking the argument string for translation for gettext,
and it also prevents the given string (which is already translated) from
being translated at runtime.

Also, mark the strings used as arguments to check_rolespec_name for
translation.

Backpatch all the way back as appropriate.  None of this is caught by
any tests (necessarily so), so I verified it manually.
2022-09-28 17:14:53 +02:00
Alvaro Herrera c9a9eae569
Add missing source files to pg_waldump/nls.mk 2022-09-25 17:48:03 +02:00
Tom Lane bb8dbc9f25 Suppress more variable-set-but-not-used warnings from clang 15.
Mop up assorted set-but-not-used warnings in the back branches.
This includes back-patching relevant fixes from commit 152c9f7b8
the rest of the way, but there are also several cases that did not
appear in HEAD.  Some of those we'd fixed in a retail way but not
back-patched, and others I think just got rewritten out of existence
during nearby refactoring.

While here, also back-patch b1980f6d0 (PL/Tcl: Fix compiler warnings
with Tcl 8.6) into 9.2, so that that branch compiles warning-free
with modern Tcl.

Per project policy, this is a candidate for back-patching into
out-of-support branches: it suppresses annoying compiler warnings
but changes no behavior.  Hence, back-patch all the way to 9.2.

Discussion: https://postgr.es/m/514615.1663615243@sss.pgh.pa.us
2022-09-21 13:52:38 -04:00
Tom Lane 6ae8aee0b6 Suppress variable-set-but-not-used warnings from clang 15.
clang 15+ will issue a set-but-not-used warning when the only
use of a variable is in autoincrements (e.g., "foo++;").
That's perfectly sensible, but it detects a few more cases that
we'd not noticed before.  Silence the warnings with our usual
methods, such as PG_USED_FOR_ASSERTS_ONLY, or in one case by
actually removing a useless variable.

One thing that we can't nicely get rid of is that with %pure-parser,
Bison emits "yynerrs" as a local variable that falls foul of this
warning.  To silence those, I inserted "(void) yynerrs;" in the
top-level productions of affected grammars.

Per recently-established project policy, this is a candidate
for back-patching into out-of-support branches: it suppresses
annoying compiler warnings but changes no behavior.  Hence,
back-patch to 9.5, which is as far as these patches go without
issues.  (A preliminary check shows that the prior branches
need some other set-but-not-used cleanups too, so I'll leave
them for another day.)

Discussion: https://postgr.es/m/514615.1663615243@sss.pgh.pa.us
2022-09-20 12:04:37 -04:00
Tom Lane cb7d636e0f Future-proof the recursion inside ExecShutdownNode().
The API contract for planstate_tree_walker() callbacks is that they
take a PlanState pointer and a context pointer.  Somebody figured
they could save a couple lines of code by ignoring that, and passing
ExecShutdownNode itself as the walker even though it has but one
argument.  Somewhat remarkably, we've gotten away with that so far.
However, it seems clear that the upcoming C2x standard means to
forbid such cases, and compilers that actively break such code
likely won't be far behind.  So spend the extra few lines of code
to do it honestly with a separate walker function.

In HEAD, we might as well go further and remove ExecShutdownNode's
useless return value.  I left that as-is in back branches though,
to forestall complaints about ABI breakage.

Back-patch, with the thought that this might become of practical
importance before our stable branches are all out of service.
It doesn't seem to be fixing any live bug on any currently known
platform, however.

Discussion: https://postgr.es/m/208054.1663534665@sss.pgh.pa.us
2022-09-19 12:16:02 -04:00
Peter Geoghegan f01fd89b15 Make check_usermap() parameter names consistent.
The function has a bool argument named "case_insensitive", but that was
spelled "case_sensitive" in the declaration.  Make them consistent now
to avoid confusion in the future.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Michael Paquiër <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
Backpatch: 10-
2022-09-17 16:54:08 -07:00
Tom Lane 7391ab28a6 Improve plpgsql's ability to handle arguments declared as RECORD.
Treat arguments declared as RECORD as if that were a polymorphic type
(which it is, sort of), in that we substitute the actual argument type
while forming the function cache lookup key.  This allows the specific
composite type to be known in some cases where it was not before,
at the cost of making a separate function cache entry for each named
composite type that's passed to the function during a session.  The
particular symptom discussed in bug #17610 could be solved in other
more-efficient ways, but only at the cost of considerable development
work, and there are other cases where we'd still fail without this.

Per bug #17610 from Martin Jurča.  Back-patch to v11 where we first
allowed plpgsql functions to be declared as taking type RECORD.

Discussion: https://postgr.es/m/17610-fb1eef75bf6c2364@postgresql.org
2022-09-16 13:23:01 -04:00
Tom Lane 19bb5e46b9 In back branches, fix conditions for pullup of FROM-less subqueries.
In branches before commit 4be058fe9, we have to prevent flattening
of subqueries with empty jointrees if the subqueries' output
columns might need to be wrapped in PlaceHolderVars.  That's
because the empty jointree would result in empty phrels for the
PlaceHolderVars, meaning we'd be unable to figure out where to
evaluate them.  However, we've failed to keep is_simple_subquery's
check for this hazard in sync with what pull_up_simple_subquery
actually does.  The former is checking "lowest_outer_join != NULL",
whereas the conditions pull_up_simple_subquery actually uses are

	if (lowest_nulling_outer_join != NULL)
	if (containing_appendrel != NULL)
	if (parse->groupingSets)

So the outer-join test is overly conservative, while we missed out
checking for appendrels and groupingSets.  The appendrel omission
is harmless, because in that case we also check is_safe_append_member
which will also reject such subqueries.  The groupingSets omission
is a bug though, leading to assertion failures or planner errors
such as "variable not found in subplan target lists".

is_simple_subquery has access to none of the three variables used
in the correct tests, but its callers do, so I chose to have them
pass down a bool corresponding to the OR of these conditions.
(The need for duplicative conditions would be a maintenance
hazard in actively-developed code, but I'm not too concerned
about it in branches that have only ~ 1 year to live.)

Per bug #17614 from Wei Wei.  Patch v10 and v11 only, since we have a
better answer to this in v12 and later (indeed, the faulty check in
is_simple_subquery is gone entirely).

Discussion: https://postgr.es/m/17614-8ec20c85bdecaa2a@postgresql.org
2022-09-15 15:21:35 -04:00
Michael Paquier f01cc02259 Fix incorrect value for "strategy" with deflateParams() in walmethods.c
The zlib documentation mentions the values supported for the compression
strategy, but this code has been using a hardcoded value of 0 rather
than Z_DEFAULT_STRATEGY.  This commit adjusts the code to use
Z_DEFAULT_STRATEGY.

Backpatch down to where this code has been added to ease the backport of
any future patch touching this area.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/1400032.1662217889@sss.pgh.pa.us
Backpatch-through: 10
2022-09-14 14:52:35 +09:00
Peter Eisentraut e962235fe1 Expand palloc/pg_malloc API for more type safety
This adds additional variants of palloc, pg_malloc, etc. that
encapsulate common usage patterns and provide more type safety.

Specifically, this adds palloc_object(), palloc_array(), and
repalloc_array(), which take the type name of the object to be
allocated as its first argument and cast the return as a pointer to
that type.  There are also palloc0_object() and palloc0_array()
variants for initializing with zero, and pg_malloc_*() variants of all
of the above.

Inspired by the talloc library.

This is backpatched from master so that future backpatchable code can
make use of these APIs.  This patch by itself does not contain any
users of these APIs.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/bb755632-2a43-d523-36f8-a1e7a389a907@enterprisedb.com
2022-09-14 06:08:56 +02:00
Tom Lane fe4e151d4a Fix possible omission of variable storage markers in ECPG.
The ECPG preprocessor converted code such as

static varchar str1[10], str2[20], str3[30];

into

static  struct varchar_1  { int len; char arr[ 10 ]; }  str1 ;
        struct varchar_2  { int len; char arr[ 20 ]; }  str2 ;
        struct varchar_3  { int len; char arr[ 30 ]; }  str3 ;

thus losing the storage attribute for the later variables.
Repeat the declaration for each such variable.

(Note that this occurred only for variables declared "varchar"
or "bytea", which may help explain how it escaped detection
for so long.)

Andrey Sokolov

Discussion: https://postgr.es/m/942241662288242@mail.yandex.ru
2022-09-09 15:34:04 -04:00
Tom Lane 9bcf6fb281 Further fixes for MULTIEXPR_SUBLINK fix.
Some more things I didn't think about in commits 3f7323cbb et al:

MULTIEXPR_SUBLINK subplans might have been converted to initplans
instead of regular subplans, in which case they won't show up in
the modified targetlist.  Fortunately, this would only happen if
they have no input parameters, which means that the problem we
originally needed to fix can't happen with them.  Therefore, there's
no need to clone their output parameters, and thus it doesn't hurt
that we'll fail to see them in the first pass over the targetlist.
Nonetheless, this complicates matters greatly, because now we have
to distinguish output Params of initplans (which shouldn't get
renumbered) from those of regular subplans (which should).
This also breaks the simplistic scheme I used of assuming that the
subplans found in the targetlist have consecutive subLinkIds.
We really can't avoid the need to know the subplans' subLinkIds in
this code.  To fix that, add subLinkId as the last field of SubPlan.
We can get away with that change in back branches because SubPlan
nodes will never be stored in the catalogs, and there's no ABI
break for external code that might be looking at the existing
fields of SubPlan.

Secondly, rewriteTargetListIU might have rolled up multiple
FieldStores or SubscriptingRefs into one targetlist entry,
breaking the assumption that there's at most one Param to fix
per targetlist entry.  (That assumption is OK I think in the
ruleutils.c code I stole the logic from in 18f51083c, because
that only deals with pre-rewrite query trees.  But it's
definitely not OK here.)  Abandon that shortcut and just do a
full tree walk on the targetlist to ensure we find all the
Params we have to change.

Per bug #17606 from Andre Lin.  As before, only v10-v13 need the
patch.

Discussion: https://postgr.es/m/17606-e5c8ad18d31db96a@postgresql.org
2022-09-06 16:38:18 -04:00
Peter Geoghegan a228cca46d Backpatch nbtree page deletion hardening.
Postgres 14 commit 5b861baa taught nbtree VACUUM to tolerate buggy
opclasses.  VACUUM's inability to locate a to-be-deleted page's downlink
in the parent page was logged instead of throwing an error.  VACUUM
could just press on with vacuuming the index, and vacuuming the table as
a whole.

There are now anecdotal reports of this error causing problems that were
much more disruptive than the underlying index corruption ever could be.
Anything that makes VACUUM unable to make forward progress against one
table/index ultimately risks making the system enter xidStopLimit mode.
There is no good reason to take any chances here, so backpatch the
hardening commit.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wzm9HR6Pow=t-iQa57zT8qmX6_M4h14F-pTtb=xFDW5FBA@mail.gmail.com
Backpatch: 10-13 (all supported versions that lacked the hardening)
2022-09-05 11:20:01 -07:00
Tom Lane 56dc442447 Fix oversight in recent MULTIEXPR_SUBLINK fix.
Commits 3f7323cbb et al missed the possibility that the Params
they are looking for could be buried under implicit coercions,
as well as other stuff that processIndirection() could add to
the original targetlist entry.  Copy the code in ruleutils.c
that deals with such cases.  (I thought about refactoring so
that there's just one copy; but seeing that we only need this
in old back branches, it seems not worth the trouble.)

Per off-list report from Andre Lin.  As before, only v10-v13
need the patch.

Discussion: https://postgr.es/m/17596-c5357f61427a81dc@postgresql.org
2022-09-02 14:54:40 -04:00
David Rowley 1b11543967 Fix some possibly latent bugs in slab.c
Primarily, this fixes an incorrect calculation in SlabCheck which was
looking in the wrong byte for the sentinel check.  The reason that we've
never noticed this before in the form of a failing sentinel check is
because the pre-check to this always fails because all current core users
of slab contexts have a chunk size which is already MAXALIGNed, therefore
there's never any space for the sentinel byte.  It is possible that an
extension needs to use a slab context and if they do with a chunk size
that's not MAXALIGNed, then they'll likely get errors about overwritten
sentinel bytes.

Additionally, this patch changes various calculations which are being done
based on the sizeof(SlabBlock).  Currently, sizeof(SlabBlock) is a
multiple of 8, therefore sizeof(SlabBlock) is the same as
MAXALIGN(sizeof(SlabBlock)), however, if we were to ever have to add any
fields to that struct as part of a bug fix, then SlabAlloc could end up
returning a non-MAXALIGNed pointer.  To be safe, let's ensure we always
MAXALIGN sizeof(SlabBlock) before using it in any calculations.

This patch has already been applied to master in d5ee4db0e.

Diagnosed-by: Tomas Vondra, Tom Lane
Author: Tomas Vondra, David Rowley
Discussion: https://postgr.es/m/CAA4eK1%2B1JyW5TiL%3DyV-3Uq1CrfnTyn0Xrk5uArt31Z%3D8rgPhXQ%40mail.gmail.com
Backpatch-through: 10
2022-09-01 19:24:19 +12:00
Tom Lane cc0e506a64 Port regress-python3-mangle.mk to Solaris "sed", redux.
Per experimentation and buildfarm failures, Solaris' "sed"
has got some kind of problem with regexes that use both '*'
and '[[:alpha:]]'.  We can work around that by replacing
'[[:alpha:]]' with '[a-zA-Z]', which is plenty good enough
for our purposes, especially since this is only needed in
long-stable branches.

I chose to flat-out remove the second pattern of this sort,
's/except \([a-zA-Z][a-zA-Z.]*\), *\([a-zA-Z][a-zA-Z]*\):/except \1 as \2:/g'
because we haven't needed it since 8.4.

Follow-on to c3556f6fa, which probably missed catching this
because the problematic pattern was already gone when that
patch was written.

Patch v10-v12 only, as the problem manifests only there.
We have a line of dead code in v13-v14, which isn't worth
changing, and the whole mess is gone as of v15.

Discussion: https://postgr.es/m/165561.1661984701@sss.pgh.pa.us
2022-08-31 21:33:45 -04:00
Tom Lane cb9232d166 Prevent long-term memory leakage in autovacuum launcher.
get_database_list() failed to restore the caller's memory context,
instead leaving current context set to TopMemoryContext which is
how CommitTransactionCommand() leaves it.  The callers both think
they are using short-lived contexts, for the express purpose of
not having to worry about cleaning up individual allocations.
The net effect therefore is that supposedly short-lived allocations
could accumulate indefinitely in the launcher's TopMemoryContext.

Although this has been broken for a long time, it seems we didn't
have any obvious memory leak here until v15's rearrangement of the
stats logic.  I (tgl) am not entirely convinced that there's no
other leak at all, though, and we're surely at risk of adding one
in future back-patched fixes.  So back-patch to all supported
branches, even though this may be only a latent bug in pre-v15.

Reid Thompson

Discussion: https://postgr.es/m/972a4e12b68b0f96db514777a150ceef7dcd2e0f.camel@crunchydata.com
2022-08-31 16:23:20 -04:00
Tom Lane f5aa855cd8 In the Snowball dictionary, don't try to stem excessively-long words.
If the input word exceeds 1000 bytes, don't pass it to the stemmer;
just return it as-is after case folding.  Such an input is surely
not a word in any human language, so whatever the stemmer might
do to it would be pretty dubious in the first place.  Adding this
restriction protects us against a known recursion-to-stack-overflow
problem in the Turkish stemmer, and it seems like good insurance
against any other safety or performance issues that may exist in
the Snowball stemmers.  (I note, for example, that they contain no
CHECK_FOR_INTERRUPTS calls, so we really don't want them running
for a long time.)  The threshold of 1000 bytes is arbitrary.

An alternative definition could have been to treat such words as
stopwords, but that seems like a bigger break from the old behavior.

Per report from Egor Chindyaskin and Alexander Lakhin.
Thanks to Olly Betts for the recommendation to fix it this way.

Discussion: https://postgr.es/m/1661334672.728714027@f473.i.mail.ru
2022-08-31 10:42:05 -04:00
Tom Lane 6fd58ca770 On NetBSD, force dynamic symbol resolution at postmaster start.
The default of lazy symbol resolution means that when the postmaster
first reaches the select() call in ServerLoop, it'll need to resolve
the link to that libc entry point.  NetBSD's dynamic loader takes
an internal lock while doing that, and if a signal interrupts the
operation then there is a risk of self-deadlock should the signal
handler do anything that requires that lock, as several of the
postmaster signal handlers do.  The window for this is pretty narrow,
and timing considerations make it unlikely that a signal would arrive
right then anyway.  But it's semi-repeatable on slow single-CPU
machines, and in principle the race could happen with any hardware.

The least messy solution to this is to force binding of dynamic
symbols at postmaster start, using the "-z now" linker option.
While we're at it, also use "-z relro" so as to provide a small
security gain.

It's not entirely clear whether any other platforms share this
issue, but for now we'll assume it's NetBSD-specific.  (We might
later try to use "-z now" on more platforms for performance
reasons, but that would not likely be something to back-patch.)

Report and patch by me; the idea to fix it this way is from
Andres Freund.

Discussion: https://postgr.es/m/3384826.1661802235@sss.pgh.pa.us
2022-08-30 17:29:17 -04:00
Robert Haas 002fba80eb Prevent WAL corruption after a standby promotion.
When a PostgreSQL instance performing archive recovery but not using
standby mode is promoted, and the last WAL segment that it attempted
to read ended in a partial record, the previous code would create
invalid WAL on the new timeline. The WAL from the previously timeline
would be copied to the new timeline up until the end of the last valid
record, but instead of beginning to write WAL at immediately
afterwards, the promoted server would write an overwrite contrecord at
the beginning of the next segment. The end of the previous segment
would be left as all-zeroes, resulting in failures if anything tried
to read WAL from that file.

The root of the issue is that ReadRecord() decides whether to set
abortedRecPtr and missingContrecPtr based on the value of StandbyMode,
but ReadRecord() switches to a new timeline based on the value of
ArchiveRecoveryRequested. We shouldn't try to write an overwrite
contrecord if we're switching to a new timeline, so change the test in
ReadRecod() to check ArchiveRecoveryRequested instead.

Code fix by Dilip Kumar. Comments by me incorporating suggested
language from Álvaro Herrera. Further review from Kyotaro Horiguchi
and Sami Imseih.

Discussion: http://postgr.es/m/CAFiTN-t7umki=PK8dT1tcPV=mOUe2vNhHML6b3T7W7qqvvajjg@mail.gmail.com
Discussion: http://postgr.es/m/FB0DEA0B-E14E-43A0-811F-C1AE93D00FF3%40amazon.com
2022-08-29 12:06:30 -04:00
Tom Lane d9ebc582fb Repair rare failure of MULTIEXPR_SUBLINK subplans in inherited updates.
Prior to v14, if we have a MULTIEXPR SubPlan (that is, use of the syntax
UPDATE ... SET (c1, ...) = (SELECT ...)) in an UPDATE with an inherited
or partitioned target table, inheritance_planner() will clone the
targetlist and therefore also the MULTIEXPR SubPlan and the Param nodes
referencing it for each child target table.  Up to now, we've allowed
all the clones to share the underlying subplan as well as the output
parameter IDs -- that is, the runtime ParamExecData slots.  That
technique is borrowed from the far older code that supports initplans,
and it works okay in that case because the cloned SubPlan nodes are
essentially identical.  So it doesn't matter which one of the clones
the shared ParamExecData.execPlan field might point to.

However, this fails to hold for MULTIEXPR SubPlans, because they can
have nonempty "args" lists (values to be passed into the subplan), and
those lists could get mutated to different states in the various clones.
In the submitted reproducer, as well as the test case added here, one
clone contains Vars with varno OUTER_VAR where another has INNER_VAR,
because the child tables are respectively on the outer or inner side of
the join.  Sharing the execPlan pointer can result in trying to evaluate
an args list that doesn't match the local execution state, with mayhem
ensuing.  The result often is to trigger consistency checks in the
executor, but I believe this could end in a crash or incorrect updates.

To fix, assign new Param IDs to each of the cloned SubPlans, so that
they don't share ParamExecData slots at runtime.  It still seems fine
for the clones to share the underlying subplan, and extra ParamExecData
slots are cheap enough that this fix shouldn't cost much.

This has been busted since we invented MULTIEXPR SubPlans in 9.5.
Probably the lack of previous reports is because query plans in which
the different clones of a MULTIEXPR mutate to effectively-different
states are pretty rare.  There's no issue in v14 and later, because
without inheritance_planner() there's never a reason to clone
MULTIEXPR SubPlans.

Per bug #17596 from Andre Lin.  Patch v10-v13 only.

Discussion: https://postgr.es/m/17596-c5357f61427a81dc@postgresql.org
2022-08-27 12:11:20 -04:00
Etsuro Fujita 1d40200a98 Fix typo in comment. 2022-08-26 16:55:07 +09:00
Tom Lane 310d734efb Defend against stack overrun in a few more places.
SplitToVariants() in the ispell code, lseg_inside_poly() in geo_ops.c,
and regex_selectivity_sub() in selectivity estimation could recurse
until stack overflow; fix by adding check_stack_depth() calls.
So could next() in the regex compiler, but that case is better fixed by
converting its tail recursion to a loop.  (We probably get better code
that way too, since next() can now be inlined into its sole caller.)

There remains a reachable stack overrun in the Turkish stemmer, but
we'll need some advice from the Snowball people about how to fix that.

Per report from Egor Chindyaskin and Alexander Lakhin.  These mistakes
are old, so back-patch to all supported branches.

Richard Guo and Tom Lane

Discussion: https://postgr.es/m/1661334672.728714027@f473.i.mail.ru
2022-08-24 13:01:40 -04:00
Tom Lane d0371f190a Doc: prefer sysctl to /proc/sys in docs and comments.
sysctl is more portable than Linux's /proc/sys file tree, and
often easier to use too.  That's why most of our docs refer to
sysctl when talking about how to adjust kernel parameters.
Bring the few stragglers into line.

Discussion: https://postgr.es/m/361175.1661187463@sss.pgh.pa.us
2022-08-23 09:42:28 -04:00
Amit Kapila 51e9469a41 Add CHECK_FOR_INTERRUPTS while decoding changes.
While decoding changes in a loop, if we skip all the changes there is no
CFI making the loop uninterruptible.

Reported-by: Whale Song and Andrey Borodin
Bug: 17580
Author: Masahiko Sawada
Reviwed-by: Amit Kapila
Backpatch-through: 10
Discussion: https://postgr.es/m/17580-849c1d5b6d7eb422@postgresql.org
Discussion: https://postgr.es/m/B319ECD6-9A28-4CDF-A8F4-3591E0BF2369@yandex-team.ru
2022-08-23 08:42:51 +05:30
Tom Lane 116f20f926 Fix subtly-incorrect matching of parent and child partitioned indexes.
When creating a partitioned index, DefineIndex tries to identify
any existing indexes on the partitions that match the partitioned
index, so that it can absorb those as child indexes instead of
building new ones.  Part of the matching is to compare IndexInfo
structs --- but that wasn't done quite right.  We're comparing
the IndexInfo built within DefineIndex itself to one made from
existing catalog contents by BuildIndexInfo.  Notably, while
BuildIndexInfo will run index expressions and predicates through
expression preprocessing, that has not happened to DefineIndex's
struct.  The result is failure to match and subsequent creation
of duplicate indexes.

The easiest and most bulletproof fix is to build a new IndexInfo
using BuildIndexInfo, thereby guaranteeing that the processing done
is identical.

While here, let's also extract the opfamily and collation data
from the new partitioned index, removing ad-hoc logic that
duplicated knowledge about how those are constructed.

Per report from Christophe Pettus.  Back-patch to v11 where
we invented partitioned indexes.

Richard Guo and Tom Lane

Discussion: https://postgr.es/m/8864BFAA-81FD-4BF9-8E06-7DEB8D4164ED@thebuild.com
2022-08-18 12:11:47 -04:00
Tom Lane ee4a17e200 Add missing bad-PGconn guards in libpq entry points.
There's a convention that externally-visible libpq functions should
check for a NULL PGconn pointer, and fail gracefully instead of
crashing.  PQflush() and PQisnonblocking() didn't get that memo
though.  Also add a similar check to PQdefaultSSLKeyPassHook_OpenSSL;
while it's not clear that ordinary usage could reach that with a
null conn pointer, it's cheap enough to check, so let's be consistent.

Daniele Varrazzo and Tom Lane

Discussion: https://postgr.es/m/CA+mi_8Zm_mVVyW1iNFgyMd9Oh0Nv8-F+7Y3-BqwMgTMHuo_h2Q@mail.gmail.com
2022-08-15 15:40:07 -04:00
Michael Paquier 6e086bb1db Fix outdated --help message for postgres -f
This option switch supports a total of 8 values, as told by
set_plan_disabling_options() and the documentation, but this was not
reflected in the output generated by --help.

Author: Junwang Zhao
Discussion: https://postgr.es/m/CAEG8a3+pT3cWzyjzKs184L1XMNm8NDnoJLiSjAYSO7XqpRh_vA@mail.gmail.com
Backpatch-through: 10
2022-08-15 13:37:44 +09:00
Tom Lane 84f9691a19 Preserve memory context of VarStringSortSupport buffers.
When enlarging the work buffers of a VarStringSortSupport object,
varstrfastcmp_locale was careful to keep them in the ssup_cxt
memory context; but varstr_abbrev_convert just used palloc().
The latter creates a hazard that the buffers could be freed out
from under the VarStringSortSupport object, resulting in stomping
on whatever gets allocated in that memory later.

In practice, because we only use this code for ICU collations
(cf. 3df9c374e), the problem is confined to use of ICU collations.
I believe it may have been unreachable before the introduction
of incremental sort, too, as traditional sorting usually just
uses one context for the duration of the sort.

We could fix this by making the broken stanzas in varstr_abbrev_convert
match the non-broken ones in varstrfastcmp_locale.  However, it seems
like a better idea to dodge the issue altogether by replacing the
pfree-and-allocate-anew coding with repalloc, which automatically
preserves the chunk's memory context.  This fix does add a few cycles
because repalloc will copy the chunk's content, which the existing
coding assumes is useless.  However, we don't expect that these buffer
enlargement operations are performance-critical.  Besides that, it's
far from obvious that copying the buffer contents isn't required, since
these stanzas make no effort to mark the buffers invalid by resetting
last_returned, cache_blob, etc.  That seems to be safe upon examination,
but it's fragile and could easily get broken in future, which wouldn't
get revealed in testing with short-to-moderate-size strings.

Per bug #17584 from James Inform.  Whether or not the issue is
reachable in the older branches, this code has been broken on its
own terms from its introduction, so patch all the way back.

Discussion: https://postgr.es/m/17584-95c79b4a7d771f44@postgresql.org
2022-08-14 12:05:27 -04:00
Tom Lane b744e13b02 Catch stack overflow when recursing in transformFromClauseItem().
Most parts of the parser can expect that the stack overflow check
in transformExprRecurse() will trigger before things get desperate.
However, transformFromClauseItem() can recurse directly to self
without having analyzed any expressions, so it's possible to drive
it to a stack-overrun crash.  Add a check to prevent that.

Per bug #17583 from Egor Chindyaskin.  Back-patch to all supported
branches.

Richard Guo

Discussion: https://postgr.es/m/17583-33be55b9f981f75c@postgresql.org
2022-08-13 15:21:28 -04:00
Peter Eisentraut db9ec28d64 Add missing fields to _outConstraint()
As of 897795240c, check constraints can
be declared invalid.  But that patch didn't update _outConstraint() to
also show the relevant struct fields (which were only applicable to
foreign keys before that).  This currently only affects debugging
output, so no impact in practice.
2022-08-13 10:37:55 +02:00
Peter Eisentraut e5c2feac0e pg_upgrade: Fix some minor code issues
96ef3b8ff1 accidentally copied a not
applicable comment from the float8_pass_by_value code to the
data_checksums code.  Remove that.

87d3b35a1c changed pg_upgrade to
checking the checksum version rather than just the Boolean presence of
checksums, but didn't change the field type in its ControlData struct
from bool.  So this would not work correctly if there ever is a
checksum version larger than 1.
2022-08-13 00:16:17 +02:00
Peter Eisentraut bc11a7e2f3 Fix _outConstraint() for "identity" constraints
The set of fields printed by _outConstraint() in the CONSTR_IDENTITY
case didn't match the set of fields actually used in that case.  (The
code was probably uncarefully copied from the CONSTR_DEFAULT case.)
Fix that by using the right set of fields.  Since there is no read
support for this node type, this is really just for debugging output
right now, so it doesn't affect anything important.
2022-08-12 08:52:28 +02:00
Amit Kapila 63903546e5 Back-Patch "Add wait_for_subscription_sync for TAP tests."
This was originally done in commit 0c20dd33db for 16 only, to eliminate
duplicate code and as an infrastructure that makes it easier to write
future tests. However, it has been suggested that it would be good to
back-patch this testing infrastructure to aid future tests in
back-branches.

Backpatch to all supported versions.

Author: Masahiko Sawada
Reviewed by: Amit Kapila, Shi yu
Discussion: https://postgr.es/m/CAD21AoC-fvAkaKHa4t1urupwL8xbAcWRePeETvshvy80f6WV1A@mail.gmail.com
Discussion: https://postgr.es/m/E1oJBIf-0006sw-SA@gemulon.postgresql.org
2022-08-12 10:30:04 +05:30
Amit Kapila e721123b7a Fix catalog lookup with the wrong snapshot during logical decoding.
Previously, we relied on HEAP2_NEW_CID records and XACT_INVALIDATION
records to know if the transaction has modified the catalog, and that
information is not serialized to snapshot. Therefore, after the restart,
if the logical decoding decodes only the commit record of the transaction
that has actually modified a catalog, we will miss adding its XID to the
snapshot. Thus, we will end up looking at catalogs with the wrong
snapshot.

To fix this problem, this changes the snapshot builder so that it
remembers the last-running-xacts list of the decoded RUNNING_XACTS record
after restoring the previously serialized snapshot. Then, we mark the
transaction as containing catalog changes if it's in the list of initial
running transactions and its commit record has XACT_XINFO_HAS_INVALS. To
avoid ABI breakage, we store the array of the initial running transactions
in the static variables InitialRunningXacts and NInitialRunningXacts,
instead of storing those in SnapBuild or ReorderBuffer.

This approach has a false positive; we could end up adding the transaction
that didn't change catalog to the snapshot since we cannot distinguish
whether the transaction has catalog changes only by checking the COMMIT
record. It doesn't have the information on which (sub) transaction has
catalog changes, and XACT_XINFO_HAS_INVALS doesn't necessarily indicate
that the transaction has catalog change. But that won't be a problem since
we use snapshot built during decoding only to read system catalogs.

On the master branch, we took a more future-proof approach by writing
catalog modifying transactions to the serialized snapshot which avoids the
above false positive. But we cannot backpatch it because of a change in
the SnapBuild.

Reported-by: Mike Oh
Author: Masahiko Sawada
Reviewed-by: Amit Kapila, Shi yu, Takamichi Osumi, Kyotaro Horiguchi, Bertrand Drouvot, Ahsan Hadi
Backpatch-through: 10
Discussion: https://postgr.es/m/81D0D8B0-E7C4-4999-B616-1E5004DBDCD2%40amazon.com
2022-08-11 08:55:31 +05:30
Tom Lane 442dbd6698 Fix handling of R/W expanded datums that are passed to SQL functions.
fmgr_sql must make expanded-datum arguments read-only, because
it's possible that the function body will pass the argument to
more than one callee function.  If one of those functions takes
the datum's R/W property as license to scribble on it, then later
callees will see an unexpected value, leading to wrong answers.

From a performance standpoint, it'd be nice to skip this in the
common case that the argument value is passed to only one callee.
However, detecting that seems fairly hard, and certainly not
something that I care to attempt in a back-patched bug fix.

Per report from Adam Mackler.  This has been broken since we
invented expanded datums, so back-patch to all supported branches.

Discussion: https://postgr.es/m/WScDU5qfoZ7PB2gXwNqwGGgDPmWzz08VdydcPFLhOwUKZcdWbblbo-0Lku-qhuEiZoXJ82jpiQU4hOjOcrevYEDeoAvz6nR0IU4IHhXnaCA=@mackler.email
Discussion: https://postgr.es/m/187436.1660143060@sss.pgh.pa.us
2022-08-10 13:37:25 -04:00
Tom Lane 866f915688 Stamp 11.17. 2022-08-08 16:49:00 -04:00
Tom Lane 4c1dd68026 Stabilize output of new regression test.
Per buildfarm, the output order of \dx+ isn't consistent across
locales.  Apply NO_LOCALE to force C locale.  There might be a
more localized way, but I'm not seeing it offhand, and anyway
there is nothing in this test module that particularly cares
about locales.

Security: CVE-2022-2625
2022-08-08 12:16:01 -04:00
Tom Lane f52d2fbd8c In extensions, don't replace objects not belonging to the extension.
Previously, if an extension script did CREATE OR REPLACE and there was
an existing object not belonging to the extension, it would overwrite
the object and adopt it into the extension.  This is problematic, first
because the overwrite is probably unintentional, and second because we
didn't change the object's ownership.  Thus a hostile user could create
an object in advance of an expected CREATE EXTENSION command, and would
then have ownership rights on an extension object, which could be
modified for trojan-horse-type attacks.

Hence, forbid CREATE OR REPLACE of an existing object unless it already
belongs to the extension.  (Note that we've always forbidden replacing
an object that belongs to some other extension; only the behavior for
previously-free-standing objects changes here.)

For the same reason, also fail CREATE IF NOT EXISTS when there is
an existing object that doesn't belong to the extension.

Our thanks to Sven Klemm for reporting this problem.

Security: CVE-2022-2625
2022-08-08 11:12:31 -04:00
Alvaro Herrera 97d6614d70
Translation updates
Source-Git-URL: ssh://git@git.postgresql.org/pgtranslation/messages.git
Source-Git-Hash: ca66f38be9dcb1cf18429e1cd62b9f6b7ae7a89e
2022-08-08 12:39:52 +02:00
Alvaro Herrera 61904503b7
Remove unportable use of timezone in recent test
Per buildfarm member snapper

Discussion: https://postgr.es/m/129951.1659812518@sss.pgh.pa.us
2022-08-07 10:19:40 +02:00
Alvaro Herrera 772e6383d1
Improve recently-added test reliability
Commit 59be1c942a already tried to make
src/test/recovery/t/033_replay_tsp_drops more reliable, but it wasn't
enough.  Try to improve on that by making this use of a replication slot
to be more like others.  Also, don't drop the slot.

Make a few other stylistic changes while at it.  It's still quite slow,
which is another thing that we need to fix in this script.

Backpatch to all supported branches.

Discussion: https://postgr.es/m/349302.1659191875@sss.pgh.pa.us
2022-08-06 15:52:10 +02:00
Alvaro Herrera 39e45d3cec
BRIN: mask BRIN_EVACUATE_PAGE for WAL consistency checking
That bit is unlogged and therefore it's wrong to consider it in WAL page
comparison.

Add a test that tickles the case, as branch testing technology allows.

This has been a problem ever since wal consistency checking was
introduced (commit a507b86900 for pg10), so backpatch to all supported
branches.

Author: 王海洋 (Haiyang Wang) <wanghaiyang.001@bytedance.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/CACciXAD2UvLMOhc4jX9VvOKt7DtYLr3OYRBhvOZ-jRxtzc_7Jg@mail.gmail.com
Discussion: https://postgr.es/m/CACciXADOfErX9Bx0nzE_SkdfXr6Bbpo5R=v_B6MUTEYW4ya+cg@mail.gmail.com
2022-08-05 18:00:17 +02:00
Noah Misch cf86fddc1c Add HINT for restartpoint race with KeepFileRestoredFromArchive().
The five commits ending at cc2c7d65fc
closed this race condition for v15+.  For v14 through v10, add a HINT to
discourage studying the cosmetic problem.

Reviewed by Kyotaro Horiguchi and David Steele.

Discussion: https://postgr.es/m/20220731061747.GA3692882@rfd.leadboat.com
2022-08-05 08:31:02 -07:00
Alvaro Herrera 91130dd316
regress: fix test instability
Having additional triggers in a test table made the ORDER BY clauses in
old queries underspecified.  Add another column there for stability.

Per sporadic buildfarm pink.
2022-08-05 11:55:52 +02:00
Alvaro Herrera ce8e066521
Fix ENABLE/DISABLE TRIGGER to handle recursion correctly
Using ATSimpleRecursion() in ATPrepCmd() to do so as bbb927b4db did is
not correct, because ATPrepCmd() can't distinguish between triggers that
may be cloned and those that may not, so would wrongly try to recurse
for the latter category of triggers.

So this commit restores the code in EnableDisableTrigger() that
86f575948c had added to do the recursion, which would do it only for
triggers that may be cloned, that is, row-level triggers.  This also
changes tablecmds.c such that ATExecCmd() is able to pass the value of
ONLY flag down to EnableDisableTrigger() using its new 'recurse'
parameter.

This also fixes what seems like an oversight of 86f575948c that the
recursion to partition triggers would only occur if EnableDisableTrigger()
had actually changed the trigger.  It is more apt to recurse to inspect
partition triggers even if the parent's trigger didn't need to be
changed: only then can we be certain that all descendants share the same
state afterwards.

Backpatch all the way back to 11, like bbb927b4db.  Care is taken not
to break ABI compatibility (and that no catversion bump is needed.)

Co-authored-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Dmitry Koval <d.koval@postgrespro.ru>
Discussion: https://postgr.es/m/CA+HiwqG-cZT3XzGAnEgZQLoQbyfJApVwOTQaCaas1mhpf+4V5A@mail.gmail.com
2022-08-05 09:46:58 +02:00
Tom Lane 2e2f3435af Add CHECK_FOR_INTERRUPTS in ExecInsert's speculative insertion loop.
Ordinarily the functions called in this loop ought to have plenty
of CFIs themselves; but we've now seen a case where no such CFI is
reached, making the loop uninterruptible.  Even though that's from
a recently-introduced bug, it seems prudent to install a CFI at
the loop level in all branches.

Per discussion of bug #17558 from Andrew Kesper (an actual fix for
that bug will follow).

Discussion: https://postgr.es/m/17558-3f6599ffcf52fd4a@postgresql.org
2022-08-04 14:10:06 -04:00
Tom Lane e94c52fca6 Reduce test runtime of src/test/modules/snapshot_too_old.
The sto_using_cursor and sto_using_select tests were coded to exercise
every permutation of their test steps, but AFAICS there is no value in
exercising more than one.  This matters because each permutation costs
about six seconds, thanks to the "pg_sleep(6)".  Perhaps we could
reduce that, but the useless permutations seem worth getting rid of
in any case.  (Note that sto_using_hash_index got it right already.)

While here, clean up some other sloppiness such as an unused table.

This doesn't make too much difference in interactive testing, since the
wasted time is typically masked by parallelization with other tests.
However, the buildfarm runs this as a serial step, which means we can
expect to shave ~40 seconds from every buildfarm run.  That makes it
worth back-patching.

Discussion: https://postgr.es/m/2515192.1659454702@sss.pgh.pa.us
2022-08-03 11:14:55 -04:00
Tom Lane 51d8b52fcf Check maximum number of columns in function RTEs, too.
I thought commit fd96d14d9 had plugged all the holes of this sort,
but no, function RTEs could produce oversize tuples too, either
via long coldeflists or just from multiple functions in one RTE.
(I'm pretty sure the other variants of base RTEs aren't a problem,
because they ultimately refer to either a table or a sub-SELECT,
whose widths are enforced elsewhere.  But we explicitly allow join
RTEs to be overwidth, as long as you don't try to form their
tuple result.)

Per further discussion of bug #17561.  As before, patch all branches.

Discussion: https://postgr.es/m/17561-80350151b9ad2ad4@postgresql.org
2022-08-01 12:22:35 -04:00
Andrew Dunstan 3f9c205367 Fix new recovery test for log_error_verbosity=verbose case
The new test is from commit 9e4f914b5e.

With this setting messages have SQL error numbers included, so that
needs to be provided for in the pattern looked for.

Backpatch to all live branches like the original.
2022-07-29 18:17:49 -04:00
Tom Lane 8dea18372f In transformRowExpr(), check for too many columns in the row.
A RowExpr with more than MaxTupleAttributeNumber columns would fail at
execution anyway, since we cannot form a tuple datum with more than that
many columns.  While heap_form_tuple() has a check for too many columns,
it emerges that there are some intermediate bits of code that don't
check and can be driven to failure with sufficiently many columns.
Checking this at parse time seems like the most appropriate place to
install a defense, since we already check SELECT list length there.

While at it, make the SELECT-list-length error use the same errcode
(TOO_MANY_COLUMNS) as heap_form_tuple does, rather than the generic
PROGRAM_LIMIT_EXCEEDED.

Per bug #17561 from Egor Chindyaskin.  The given test case crashes
in all supported branches (and probably a lot further back),
so patch all.

Discussion: https://postgr.es/m/17561-80350151b9ad2ad4@postgresql.org
2022-07-29 13:30:50 -04:00
Alvaro Herrera fcd72cf295
Fix test instability
On FreeBSD, the new test fails due to a WAL file being removed before
the standby has had the chance to copy it.  Fix by adding a replication
slot to prevent the removal until after the standby has connected.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reported-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAEze2Wj5nau_qpjbwihvmXLfkAWOZ5TKdbnqOc6nKSiRJEoPyQ@mail.gmail.com
2022-07-29 12:50:47 +02:00
Alvaro Herrera 5a10c262fc
Fix replay of create database records on standby
Crash recovery on standby may encounter missing directories
when replaying database-creation WAL records.  Prior to this
patch, the standby would fail to recover in such a case;
however, the directories could be legitimately missing.
Consider the following sequence of commands:

    CREATE DATABASE
    DROP DATABASE
    DROP TABLESPACE

If, after replaying the last WAL record and removing the
tablespace directory, the standby crashes and has to replay the
create database record again, crash recovery must be able to continue.

A fix for this problem was already attempted in 49d9cfc68b, but it
was reverted because of design issues.  This new version is based
on Robert Haas' proposal: any missing tablespaces are created
during recovery before reaching consistency.  Tablespaces
are created as real directories, and should be deleted
by later replay.  CheckRecoveryConsistency ensures
they have disappeared.

The problems detected by this new code are reported as PANIC,
except when allow_in_place_tablespaces is set to ON, in which
case they are WARNING.  Apart from making tests possible, this
gives users an escape hatch in case things don't go as planned.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Asim R Praveen <apraveen@pivotal.io>
Author: Paul Guo <paulguo@gmail.com>
Reviewed-by: Anastasia Lubennikova <lubennikovaav@gmail.com> (older versions)
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> (older versions)
Reviewed-by: Michaël Paquier <michael@paquier.xyz>
Diagnosed-by: Paul Guo <paulguo@gmail.com>
Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27DN5xWJ2P9-ROH16e4JUA@mail.gmail.com
2022-07-28 08:26:05 +02:00
Alvaro Herrera 258c896418
Allow "in place" tablespaces.
This is a backpatch to branches 10-14 of the following commits:

7170f2159f Allow "in place" tablespaces.
c6f2f01611 Fix pg_basebackup with in-place tablespaces.
f6f0db4d62 Fix pg_tablespace_location() with in-place tablespaces
7a7cd84893 doc: Remove mention to in-place tablespaces for pg_tablespace_location()
5344723755 Remove unnecessary Windows-specific basebackup code.

In-place tablespaces were introduced as a testing helper mechanism, but
they are going to be used for a bugfix in WAL replay to be backpatched
to all stable branches.

I (Álvaro) had to adjust some code to account for lack of
get_dirent_type() in branches prior to 14.

Author: Thomas Munro <thomas.munro@gmail.com>
Author: Michaël Paquier <michael@paquier.xyz>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20220722081858.omhn2in5zt3g4nek@alvherre.pgsql
2022-07-27 07:55:12 +02:00
Tom Lane 9e3e1ac458 Force immediate commit after CREATE DATABASE etc in extended protocol.
We have a few commands that "can't run in a transaction block",
meaning that if they complete their processing but then we fail
to COMMIT, we'll be left with inconsistent on-disk state.
However, the existing defenses for this are only watertight for
simple query protocol.  In extended protocol, we didn't commit
until receiving a Sync message.  Since the client is allowed to
issue another command instead of Sync, we're in trouble if that
command fails or is an explicit ROLLBACK.  In any case, sitting
in an inconsistent state while waiting for a client message
that might not come seems pretty risky.

This case wasn't reachable via libpq before we introduced pipeline
mode, but it's always been an intended aspect of extended query
protocol, and likely there are other clients that could reach it
before.

To fix, set a flag in PreventInTransactionBlock that tells
exec_execute_message to force an immediate commit.  This seems
to be the approach that does least damage to existing working
cases while still preventing the undesirable outcomes.

While here, add some documentation to protocol.sgml that explicitly
says how to use pipelining.  That's latent in the existing docs if
you know what to look for, but it's better to spell it out; and it
provides a place to document this new behavior.

Per bug #17434 from Yugo Nagata.  It's been wrong for ages,
so back-patch to all supported branches.

Discussion: https://postgr.es/m/17434-d9f7a064ce2a88a3@postgresql.org
2022-07-26 13:07:03 -04:00
Tom Lane 1078742af0 Fix ruleutils issues with dropped cols in functions-returning-composite.
Due to lack of concern for the case in the dependency code, it's
possible to drop a column of a composite type even though stored
queries have references to the dropped column via functions-in-FROM
that return the composite type.  There are "soft" references,
namely FROM-clause aliases for such columns, and "hard" references,
that is actual Vars referring to them.  The right fix for hard
references is to add dependencies preventing the drop; something
we've known for many years and not done (and this commit still doesn't
address it).  A "soft" reference shouldn't prevent a drop though.
We've been around on this before (cf. 9b35ddce9, 2c4debbd0), but
nobody had noticed that the current behavior can result in dump/reload
failures, because ruleutils.c can print more column aliases than the
underlying composite type now has.  So we need to rejigger the
column-alias-handling code to treat such columns as dropped and not
print aliases for them.

Rather than writing new code for this, I used expandRTE() which already
knows how to figure out which function result columns are dropped.
I'd initially thought maybe we could use expandRTE() in all cases, but
that fails for EXPLAIN's purposes, because the planner strips a lot of
RTE infrastructure that expandRTE() needs.  So this patch just uses it
for unplanned function RTEs and otherwise does things the old way.

If there is a hard reference (Var), then removing the column alias
causes us to fail to print the Var, since there's no longer a name
to print.  Failing seems less desirable than printing a made-up
name, so I made it print "?dropped?column?" instead.

Per report from Timo Stolz.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/5c91267e-3b6d-5795-189c-d15a55d61dbb@nullachtvierzehn.de
2022-07-21 13:56:02 -04:00
Fujii Masao 80c3ea9185 Fix assertion failure and segmentation fault in backup code.
When a non-exclusive backup is canceled, do_pg_abort_backup() is called
and resets some variables set by pg_backup_start (pg_start_backup in v14
or before). But previously it forgot to reset the session state indicating
whether a non-exclusive backup is in progress or not in this session.

This issue could cause an assertion failure when the session running
BASE_BACKUP is terminated after it executed pg_backup_start and
pg_backup_stop (pg_stop_backup in v14 or before). Also it could cause
a segmentation fault when pg_backup_stop is called after BASE_BACKUP
in the same session is canceled.

This commit fixes the issue by making do_pg_abort_backup reset
that session state.

Back-patch to all supported branches.

Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi, Masahiko Sawada, Michael Paquier, Robert Haas
Discussion: https://postgr.es/m/3374718f-9fbf-a950-6d66-d973e027f44c@oss.nttdata.com
2022-07-20 09:54:10 +09:00
Fujii Masao 87e5044877 Prevent BASE_BACKUP in the middle of another backup in the same session.
Multiple non-exclusive backups are able to be run conrrently in different
sessions. But, in the same session, only one non-exclusive backup can be
run at the same moment. If pg_backup_start (pg_start_backup in v14 or before)
is called in the middle of another non-exclusive backup in the same session,
an error is thrown.

However, previously, in logical replication walsender mode, even if that
walsender session had already called pg_backup_start and started
a non-exclusive backup, it could execute BASE_BACKUP command and
start another non-exclusive backup. Which caused subsequent pg_backup_stop
to throw an error because BASE_BACKUP unexpectedly reset the session state
marked by pg_backup_start.

This commit prevents BASE_BACKUP command in the middle of another
non-exclusive backup in the same session.

Back-patch to all supported branches.

Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi, Masahiko Sawada, Michael Paquier, Robert Haas
Discussion: https://postgr.es/m/3374718f-9fbf-a950-6d66-d973e027f44c@oss.nttdata.com
2022-07-20 09:52:36 +09:00
Peter Eisentraut 6d61aef5db Re-add SPICleanup for ABI compatibility in stable branch
This fixes an ABI break introduced by
2f6b8c287b.

Author: Markus Wanner <markus.wanner@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/defd749a-8410-841d-1126-21398686d63d@enterprisedb.com
2022-07-18 19:38:24 +02:00
Thomas Munro 3f2344d4ae Make dsm_impl_posix_resize more future-proof.
Commit 4518c798 blocks signals for a short region of code, but it
assumed that whatever called it had the signal mask set to UnBlockSig on
entry.  That may be true today (or may even not be, in extensions in the
wild), but it would be better not to make that assumption.  We should
save-and-restore the caller's signal mask.

The PG_SETMASK() portability macro couldn't be used for that, which is
why it wasn't done before.  But... considering that commit a65e0864
established back in 9.6 that supported POSIX systems have sigprocmask(),
and that this is POSIX-only code, there is no reason not to use standard
sigprocmask() directly to achieve that.

Back-patch to all supported releases, like 4518c798 and 80845b7c.

Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKGKx6Biq7_UuV0kn9DW%2B8QWcpJC1qwhizdtD9tN-fn0H0g%40mail.gmail.com
2022-07-16 12:23:43 +12:00
Thomas Munro 74a9ee0348 Don't clobber postmaster sigmask in dsm_impl_resize.
Commit 4518c798 intended to block signals in regular backends that
allocate DSM segments, but dsm_impl_resize() is also reached by
dsm_postmaster_startup().  It's not OK to clobber the postmaster's
signal mask, so only manipulate the signal mask when under the
postmaster.

Back-patch to all releases, like 4518c798.

Discussion: https://postgr.es/m/CA%2BhUKGKNpK%3D2OMeea_AZwpLg7Bm4%3DgYWk7eDjZ5F6YbozfOf8w%40mail.gmail.com
2022-07-15 02:07:15 +12:00
Thomas Munro 39683c69a0 Block signals while allocating DSM memory.
On Linux, we call posix_fallocate() on shm_open()'d memory to avoid
later potential SIGBUS (see commit 899bd785).

Based on field reports of systems stuck in an EINTR retry loop there,
there, we made it possible to break out of that loop via slightly odd
coding where the CHECK_FOR_INTERRUPTS() call was somewhat removed from
the loop (see commit 422952ee).

On further reflection, that was not a great choice for at least two
reasons:

1.  If interrupts were held, the CHECK_FOR_INTERRUPTS() would do nothing
and the EINTR error would be surfaced to the user.

2.  If EINTR was reported but neither QueryCancelPending nor
ProcDiePending was set, then we'd dutifully retry, but with a bit more
understanding of how posix_fallocate() works, it's now clear that you
can get into a loop that never terminates.  posix_fallocate() is not a
function that can do some of the job and tell you about progress if it's
interrupted, it has to undo what it's done so far and report EINTR, and
if signals keep arriving faster than it can complete (cf recovery
conflict signals), you're stuck.

Therefore, for now, we'll simply block most signals to guarantee
progress.  SIGQUIT is not blocked (see InitPostmasterChild()), because
its expected handler doesn't return, and unblockable signals like
SIGCONT are not expected to arrive at a high rate.  For good measure,
we'll include the ftruncate() call in the blocked region, and add a
retry loop.

Back-patch to all supported releases.

Reported-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Nicola Contu <nicola.contu@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220701154105.jjfutmngoedgiad3%40alvherre.pgsql
2022-07-14 14:23:03 +12:00
Thomas Munro cd26139a30 Fix lock assertions in dshash.c.
dshash.c previously maintained flags to be able to assert that you
didn't hold any partition lock.  These flags could get out of sync with
reality in error scenarios.

Get rid of all that, and make assertions about the locks themselves
instead.  Since LWLockHeldByMe() loops internally, we don't want to put
that inside another loop over all partition locks.  Introduce a new
debugging-only interface LWLockAnyHeldByMe() to avoid that.

This problem was noted by Tom and Andres while reviewing changes to
support the new shared memory stats system, and later showed up in
reality while working on commit 389869af.

Back-patch to 11, where dshash.c arrived.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220311012712.botrpsikaufzteyt@alap3.anarazel.de
Discussion: https://postgr.es/m/CA%2BhUKGJ31Wce6HJ7xnVTKWjFUWQZPBngxfJVx4q0E98pDr3kAw%40mail.gmail.com
2022-07-11 15:54:24 +12:00
Thomas Munro 21ed12b14a Fix \watch's interaction with libedit on ^C.
When you hit ^C, the terminal driver in Unix-like systems echoes "^C" as
well as sending an interrupt signal (depending on stty settings).  At
least libedit (but maybe also libreadline) is then confused about the
current cursor location, and corrupts the display if you try to scroll
back.  Fix, by moving to a new line before the next prompt is displayed.

Back-patch to all supported released.

Author: Pavel Stehule <pavel.stehule@gmail.com>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3278793.1626198638%40sss.pgh.pa.us
2022-07-10 16:55:18 +12:00
Dean Rasheed e88b1f1e22 Fix alias matching in transformLockingClause().
When locking a specific named relation for a FOR [KEY] UPDATE/SHARE
clause, transformLockingClause() finds the relation to lock by
scanning the rangetable for an RTE with a matching eref->aliasname.
However, it failed to account for the visibility rules of a join RTE.

If a join RTE doesn't have a user-supplied alias, it will have a
generated eref->aliasname of "unnamed_join" that is not visible as a
relation name in the parse namespace. Such an RTE needs to be skipped,
otherwise it might be found in preference to a regular base relation
with a user-supplied alias of "unnamed_join", preventing it from being
locked.

In addition, if a join RTE doesn't have a user-supplied alias, but
does have a join_using_alias, then the RTE needs to be matched using
that alias rather than the generated eref->aliasname, otherwise a
misleading "relation not found" error will be reported rather than a
"join cannot be locked" error.

Backpatch all the way, except for the second part which only goes back
to 14, where JOIN USING aliases were added.

Dean Rasheed, reviewed by Tom Lane.

Discussion: https://postgr.es/m/CAEZATCUY_KOBnqxbTSPf=7fz9HWPnZ5Xgb9SwYzZ8rFXe7nb=w@mail.gmail.com
2022-07-07 13:07:54 +01:00
Noah Misch 1cad30e3ba Fix previous commit's ecpg_clocale for ppc Darwin.
Per buildfarm member prairiedog, this platform rejects uninitialized
global variables in shared libraries.  Back-patch to v10, like the
addition of the variable.

Reviewed by Tom Lane.

Discussion: https://postgr.es/m/20220703030619.GB2378460@rfd.leadboat.com
2022-07-02 21:03:24 -07:00
Noah Misch d68b731a15 ecpglib: call newlocale() once per process.
ecpglib has been calling it once per SQL query and once per EXEC SQL GET
DESCRIPTOR.  Instead, if newlocale() has not succeeded before, call it
while establishing a connection.  This mitigates three problems:
- If newlocale() failed in EXEC SQL GET DESCRIPTOR, the command silently
  proceeded without the intended locale change.
- On AIX, each newlocale()+freelocale() cycle leaked memory.
- newlocale() CPU usage may have been nontrivial.

Fail the connection attempt if newlocale() fails.  Rearrange
ecpg_do_prologue() to validate the connection before its uselocale().

The sort of program that may regress is one running in an environment
where newlocale() fails.  If that program establishes connections
without running SQL statements, it will stop working in response to this
change.  I'm betting against the importance of such an ECPG use case.
Most SQL execution (any using ECPGdo()) has long required newlocale()
success, so there's little a connection could do without newlocale().

Back-patch to v10 (all supported versions).

Reviewed by Tom Lane.  Reported by Guillaume Lelarge.

Discussion: https://postgr.es/m/20220101074055.GA54621@rfd.leadboat.com
2022-07-02 13:00:35 -07:00
Thomas Munro facfd04678 Harden dsm_impl.c against unexpected EEXIST.
Previously, we trusted the OS not to report EEXIST unless we'd passed in
IPC_CREAT | IPC_EXCL or O_CREAT | O_EXCL, as appropriate.  Solaris's
shm_open() can in fact do that, causing us to crash because we didn't
ereport and then we blithely assumed the mapping was successful.

Let's treat EEXIST just like any other error, unless we're actually
trying to create a new segment.  This applies to shm_open(), where this
behavior has been seen, and also to the equivalent operations for our
sysv and mmap modes just on principle.

Based on the underlying reason for the error, namely contention on a
lock file managed by Solaris librt for each distinct name, this problem
is only likely to happen on 15 and later, because the new shared memory
stats system produces shm_open() calls for the same path from
potentially large numbers of backends concurrently during
authentication.  Earlier releases only shared memory segments between a
small number of parallel workers under one Gather node.  You could
probably hit it if you tried hard enough though, and we should have been
more defensive in the first place.  Therefore, back-patch to all
supported releases.

Per build farm animal margay.  This isn't the end of the story, though,
it just changes random crashes into random "File exists" errors; more
work needed for a green build farm.

Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGKqKrCV5xKWfh9rnm%3Do%3DDwZLTLtnsj_XpUi9g5%3DV%2B9oyg%40mail.gmail.com
2022-07-01 13:21:28 +12:00
Heikki Linnakangas b49889f3c0 Fix visibility check when XID is committed in CLOG but not in procarray.
TransactionIdIsInProgress had a fast path to return 'false' if the
single-item CLOG cache said that the transaction was known to be
committed. However, that was wrong, because a transaction is first
marked as committed in the CLOG but doesn't become visible to others
until it has removed its XID from the proc array. That could lead to an
error:

    ERROR:  t_xmin is uncommitted in tuple to be updated

or for an UPDATE to go ahead without blocking, before the previous
UPDATE on the same row was made visible.

The window is usually very short, but synchronous replication makes it
much wider, because the wait for synchronous replica happens in that
window.

Another thing that makes it hard to hit is that it's hard to get such
a commit-in-progress transaction into the single item CLOG cache.
Normally, if you call TransactionIdIsInProgress on such a transaction,
it determines that the XID is in progress without checking the CLOG
and without populating the cache. One way to prime the cache is to
explicitly call pg_xact_status() on the XID. Another way is to use a
lot of subtransactions, so that the subxid cache in the proc array is
overflown, making TransactionIdIsInProgress rely on pg_subtrans and
CLOG checks.

This has been broken ever since it was introduced in 2008, but the race
condition is very hard to hit, especially without synchronous
replication. There were a couple of reports of the error starting from
summer 2021, but no one was able to find the root cause then.

TransactionIdIsKnownCompleted() is now unused. In 'master', remove it,
but I left it in place in backbranches in case it's used by extensions.

Also change pg_xact_status() to check TransactionIdIsInProgress().
Previously, it only checked the CLOG, and returned "committed" before
the transaction was actually made visible to other queries. Note that
this also means that you cannot use pg_xact_status() to reproduce the
bug anymore, even if the code wasn't fixed.

Report and analysis by Konstantin Knizhnik. Patch by Simon Riggs, with
the pg_xact_status() change added by me.

Author: Simon Riggs
Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/flat/4da7913d-398c-e2ad-d777-f752cf7f0bbb%40garret.ru
2022-06-27 08:24:37 +03:00
Noah Misch bf92b73beb Fix PostgreSQL::Test aliasing for Perl v5.10.1.
This Perl segfaults if a declaration of the to-be-aliased package
precedes the aliasing itself.  Per buildfarm members lapwing and wrasse.
Like commit 20911775de, back-patch to v10
(all supported versions).

Discussion: https://postgr.es/m/20220625171533.GA2012493@rfd.leadboat.com
2022-06-25 14:16:00 -07:00
Noah Misch 6d49cc2861 CREATE INDEX: use the original userid for more ACL checks.
Commit a117cebd63 used the original userid
for ACL checks located directly in DefineIndex(), but it still adopted
the table owner userid for more ACL checks than intended.  That broke
dump/reload of indexes that refer to an operator class, collation, or
exclusion operator in a schema other than "public" or "pg_catalog".
Back-patch to v10 (all supported versions), like the earlier commit.

Nathan Bossart and Noah Misch

Discussion: https://postgr.es/m/f8a4105f076544c180a87ef0c4822352@stmuk.bayern.de
2022-06-25 09:07:46 -07:00
Noah Misch c41edb3242 For PostgreSQL::Test compatibility, alias entire package symbol tables.
Remove the need to edit back-branch-specific code sites when
back-patching the addition of a PostgreSQL::Test::Utils symbol.  Replace
per-symbol, incomplete alias lists.  Give old and new package names the
same EXPORT and EXPORT_OK semantics.  Back-patch to v10 (all supported
versions).

Reviewed by Andrew Dunstan.

Discussion: https://postgr.es/m/20220622072144.GD4167527@rfd.leadboat.com
2022-06-25 09:07:45 -07:00
Amit Kapila ed2a7a6bf6 Fix memory leak due to LogicalRepRelMapEntry.attrmap.
When rebuilding the relation mapping on subscribers, we were not releasing
the attribute mapping's memory which was no longer required.

The attribute mapping used in logical tuple conversion was refactored in
PG13 (by commit e1551f96e6) but we forgot to update the related code that
frees the attribute map.

Author: Hou Zhijie
Reviewed-by: Amit Langote, Amit Kapila, Shi yu
Backpatch-through: 10, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
2022-06-23 08:37:40 +05:30
Tom Lane 2f6b8c287b Fix SPI's handling of errors during transaction commit.
SPI_commit previously left it up to the caller to recover from any error
occurring during commit.  Since that's complicated and requires use of
low-level xact.c facilities, it's not too surprising that no caller got
it right.  Let's move the responsibility for cleanup into spi.c.  Doing
that requires redefining SPI_commit as starting a new transaction, so
that it becomes equivalent to SPI_commit_and_chain except that you get
default transaction characteristics instead of preserving the prior
transaction's characteristics.  We can make this pretty transparent
API-wise by redefining SPI_start_transaction() as a no-op.  Callers
that expect to do something in between might be surprised, but
available evidence is that no callers do so.

Having made that API redefinition, we can fix this mess by having
SPI_commit[_and_chain] trap errors and start a new, clean transaction
before re-throwing the error.  Likewise for SPI_rollback[_and_chain].
Some cleanup is also needed in AtEOXact_SPI, which was nowhere near
smart enough to deal with SPI contexts nested inside a committing
context.

While plperl and pltcl need no changes beyond removing their now-useless
SPI_start_transaction() calls, plpython needs some more work because it
hadn't gotten the memo about catching commit/rollback errors in the
first place.  Such an error resulted in longjmp'ing out of the Python
interpreter, which leaks Python stack entries at present and is reported
to crash Python 3.11 altogether.  Add the missing logic to catch such
errors and convert them into Python exceptions.

This is a back-patch of commit 2e517818f.  That's now aged long enough
to reduce the concerns about whether it will break something, and we
do need to ensure that supported branches will work with Python 3.11.

Peter Eisentraut and Tom Lane

Discussion: https://postgr.es/m/3375ffd8-d71c-2565-e348-a597d6e739e3@enterprisedb.com
Discussion: https://postgr.es/m/17416-ed8fe5d7213d6c25@postgresql.org
2022-06-22 12:12:00 -04:00
Tom Lane f7797747fc Avoid ecpglib core dump with out-of-order operations.
If an application executed operations like EXEC SQL PREPARE
without having first established a database connection, it could
get a core dump instead of the expected clean failure.  This
occurred because we did "pthread_getspecific(actual_connection_key)"
without ever having initialized the TSD key actual_connection_key.
The results of that are probably platform-specific, but at least
on Linux it often leads to a crash.

To fix, add calls to ecpg_pthreads_init() in the code paths that
might use actual_connection_key uninitialized.  It's harmless
(and hopefully inexpensive) to do that more than once.

Per bug #17514 from Okano Naoki.  The problem's ancient, so
back-patch to all supported branches.

Discussion: https://postgr.es/m/17514-edd4fad547c5692c@postgresql.org
2022-06-14 18:16:46 -04:00
Tom Lane aaa88d3828 Revert "Fix psql's single transaction mode on client-side errors with -c/-f switches".
This reverts commits a04ccf6df et al. in the back branches only.
There was some disagreement already over whether to back-patch
157f8739a, on the grounds that it is the sort of behavioral
change that we don't like to back-patch.  Furthermore, it now
looks like the logic needs some more work, which we don't have
time for before the upcoming 14.4 release.  Revert for now, and
perhaps reconsider later.

Discussion: https://postgr.es/m/17504-76b68018e130415e@postgresql.org
2022-06-10 16:34:25 -04:00
Tom Lane 199aac8b2f Un-break whole-row Vars referencing domain-over-composite types.
In commit ec62cb0aa, I foolishly replaced ExecEvalWholeRowVar's
lookup_rowtype_tupdesc_domain call with just lookup_rowtype_tupdesc,
because I didn't see how a domain could be involved there, and
there were no regression test cases to jog my memory.  But the
existing code was correct, so revert that change and add a test
case showing why it's necessary.  (Note: per comment in struct
DatumTupleFields, it is correct to produce an output tuple that's
labeled with the base composite type, not the domain; hence just
blindly looking through the domain is correct here.)

Per bug #17515 from Dan Kubb.  Back-patch to v11 where domains over
composites became a thing.

Discussion: https://postgr.es/m/17515-a24737438363aca0@postgresql.org
2022-06-10 10:35:57 -04:00
Peter Eisentraut 6450d15e2e Fix whitespace 2022-06-08 13:23:53 +02:00
Tom Lane d628ce048d Fix off-by-one loop termination condition in pg_stat_get_subscription().
pg_stat_get_subscription scanned one more LogicalRepWorker array entry
than is really allocated.  In the worst case this could lead to SIGSEGV,
if the LogicalRepCtx data structure is near the end of shared memory.
That seems quite unlikely though (thanks to the ordering of calls in
CreateSharedMemoryAndSemaphores) and we've heard no field reports of it.
A more likely misbehavior is one row of garbage data in the function's
result, but even that is not real likely because of the check that the
pid field matches some live backend.

Report and fix by Kuntal Ghosh.  This bug is old, so back-patch
to all supported branches.

Discussion: https://postgr.es/m/CAGz5QCJykEDzW6jQK6Yz7Qh_PMtD=95de_7QoocbVR2Qy8hWZA@mail.gmail.com
2022-06-07 15:34:30 -04:00
Tom Lane d82ed5b2f8 Don't fail on libpq-generated error reports in ecpg_raise_backend().
An error PGresult generated by libpq itself, such as a report of
connection loss, won't have broken-down error fields.
ecpg_raise_backend() blithely assumed that PG_DIAG_MESSAGE_PRIMARY
would always be present, and would end up passing a NULL string
pointer to snprintf when it isn't.  That would typically crash
before 3779ac62d, and it would fail to provide a useful error report
in any case.  Best practice is to substitute PQerrorMessage(conn)
in such cases, so do that.

Per bug #17421 from Masayuki Hirose.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/17421-790ff887e3188874@postgresql.org
2022-06-06 11:20:46 -04:00
Michael Paquier b0bd9327dd Fix psql's single transaction mode on client-side errors with -c/-f switches
psql --single-transaction is able to handle multiple -c and -f switches
in a single transaction since d5563d7d, but this had the surprising
behavior of forcing a transaction COMMIT even if psql failed with an
error in the client (for example incorrect path given to \copy), which
would generate an error, but still commit any changes that were already
applied in the backend.  This commit makes the behavior more consistent,
by enforcing a transaction ROLLBACK if any commands fail, both
client-side and backend-side, so as no changes are applied if one error
happens in any of them.

Some tests are added on HEAD to provide some coverage about all that.
Backend-side errors are unreliable as IPC::Run can complain on SIGPIPE
if psql quits before reading a query result, but that should work
properly in the case where any errors come from psql itself, which is
what the original report is about.

Reported-by: Christoph Berg
Author: Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/17504-76b68018e130415e@postgresql.org
Backpatch-through: 10
2022-06-06 11:07:35 +09:00
Tom Lane 37290f7f5a Silence compiler warnings from some older compilers.
Since a117cebd6, some older gcc versions issue "variable may be used
uninitialized in this function" complaints for brin_summarize_range.
Silence that using the same coding pattern as in bt_index_check_internal;
arguably, a117cebd6 had too narrow a view of which compilers might give
trouble.

Nathan Bossart and Tom Lane.  Back-patch as the previous commit was.

Discussion: https://postgr.es/m/20220601163537.GA2331988@nathanxps13
2022-06-01 17:21:45 -04:00
Tom Lane b5265196e5 Fix pl/perl test case so it will still work under Perl 5.36.
Perl 5.36 has reclassified the warning condition that this test
case used, so that the expected error fails to appear.  Tweak
the test so it instead exercises a case that's handled the same
way in all Perl versions of interest.

This appears to meet our standards for back-patching into
out-of-support branches: it changes no user-visible behavior
but enables testing of old branches with newer tools.
Hence, back-patch as far as 9.2.

Dagfinn Ilmari Mannsåker, per report from Jitka Plesníková.

Discussion: https://postgr.es/m/564579.1654093326@sss.pgh.pa.us
2022-06-01 16:15:47 -04:00
Tom Lane ae758e603d Ensure ParseTzFile() closes the input file after failing.
We hadn't noticed this because (a) few people feed invalid
timezone abbreviation files to the server, and (b) in typical
scenarios guc.c would throw ereport(ERROR) and then transaction
abort handling would silently clean up the leaked file reference.
However, it was possible to observe file leakage warnings if one
breaks an already-active abbreviation file, because guc.c does
not throw ERROR when loading supposedly-validated settings during
session start or SIGHUP processing.

Report and fix by Kyotaro Horiguchi (cosmetic adjustments by me)

Discussion: https://postgr.es/m/20220530.173740.748502979257582392.horikyota.ntt@gmail.com
2022-05-31 14:47:44 -04:00
Michael Paquier c3db8a2e2e Handle NULL for short descriptions of custom GUC variables
If a short description is specified as NULL in one of the various
DefineCustomXXXVariable() functions available to external modules to
define a custom parameter, SHOW ALL would crash.  This change teaches
SHOW ALL to properly handle NULL short descriptions, as well as any code
paths that manipulate it, to gain in flexibility.  Note that
help_config.c was already able to do that, when describing a set of GUCs
for postgres --describe-config.

Author: Steve Chavez
Reviewed by: Nathan Bossart, Andres Freund, Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/CAGRrpzY6hO-Kmykna_XvsTv8P2DshGiU6G3j8yGao4mk0CqjHA%40mail.gmail.com
Backpatch-through: 10
2022-05-28 12:12:58 +09:00
Tom Lane a44bc8b8f4 Remove misguided SSL key file ownership check in libpq.
Commits a59c79564 et al. tried to sync libpq's SSL key file
permissions checks with what we've used for years in the backend.
We did not intend to create any new failure cases, but it turns out
we did: restricting the key file's ownership breaks cases where the
client is allowed to read a key file despite not having the identical
UID.  In particular a client running as root used to be able to read
someone else's key file; and having seen that I suspect that there are
other, less-dubious use cases that this restriction breaks on some
platforms.

We don't really need an ownership check, since if we can read the key
file despite its having restricted permissions, it must have the right
ownership --- under normal conditions anyway, and the point of this
patch is that any additional corner cases where that works should be
deemed allowable, as they have been historically.  Hence, just drop
the ownership check, and rearrange the permissions check to get rid
of its faulty assumption that geteuid() can't be zero.  (Note that the
comparable backend-side code doesn't have to cater for geteuid() == 0,
since the server rejects that very early on.)

This does have the end result that the permissions safety check used
for a root user's private key file is weaker than that used for
anyone else's.  While odd, root really ought to know what she's doing
with file permissions, so I think this is acceptable.

Per report from Yogendra Suralkar.  Like the previous patch,
back-patch to all supported branches.

Discussion: https://postgr.es/m/MW3PR15MB3931DF96896DC36D21AFD47CA3D39@MW3PR15MB3931.namprd15.prod.outlook.com
2022-05-26 14:14:05 -04:00
Tom Lane f3b8d7244a Show 'AS "?column?"' explicitly when it's important.
ruleutils.c was coded to suppress the AS label for a SELECT output
expression if the column name is "?column?", which is the parser's
fallback if it can't think of something better.  This is fine, and
avoids ugly clutter, so long as (1) nothing further up in the parse
tree relies on that column name or (2) the same fallback would be
assigned when the rule or view definition is reloaded.  Unfortunately
(2) is far from certain, both because ruleutils.c might print the
expression in a different form from how it was originally written
and because FigureColname's rules might change in future releases.
So we shouldn't rely on that.

Detecting exactly whether there is any outer-level use of a SELECT
column name would be rather expensive.  This patch takes the simpler
approach of just passing down a flag indicating whether there *could*
be any outer use; for example, the output column names of a SubLink
are not referenceable, and we also do not care about the names exposed
by the right-hand side of a setop.  This is sufficient to suppress
unwanted clutter in all but one case in the regression tests.  That
seems like reasonable evidence that it won't be too much in users'
faces, while still fixing the cases we need to fix.

Per bug #17486 from Nicolas Lutic.  This issue is ancient, so
back-patch to all supported branches.

Discussion: https://postgr.es/m/17486-1ad6fd786728b8af@postgresql.org
2022-05-21 14:45:58 -04:00
Alvaro Herrera 6c6ea6ea81
Fix DDL deparse of CREATE OPERATOR CLASS
When an implicit operator family is created, it wasn't getting reported.
Make it do so.

This has always been missing.  Backpatch to 10.

Author: Masahiko Sawada <sawada.mshk@gmail.com>
Reported-by: Leslie LEMAIRE <leslie.lemaire@developpement-durable.gouv.fr>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Michael Paquiër <michael@paquier.xyz>
Discussion: https://postgr.es/m/f74d69e151b22171e8829551b1159e77@developpement-durable.gouv.fr
2022-05-20 18:52:55 +02:00
Alvaro Herrera daf015fd0c
Backpatch regression tests added by 2d689babe3
A new plpgsql test function was added in 14 and up to cover for a bugfix
that was not backpatchable.  We can add it to older versions as a way to
cover other bits of DDL event triggers, with an exception clause to
avoid the problematic corner case.

Originally authored by Michaël Paquier.

Backpatch: 10 through 13.

Discussion: https://postgr.es/m/202205201523.7m5jbfvyanmj@alvherre.pgsql
2022-05-20 17:52:16 +02:00
Alvaro Herrera 50bf3157a1
Update xml_1.out and xml_2.out
Commit 0fbf011200 should have updated them but didn't.
2022-05-18 23:19:53 +02:00
Alvaro Herrera ba83de8ada
Check column list length in XMLTABLE/JSON_TABLE alias
We weren't checking the length of the column list in the alias clause of
an XMLTABLE or JSON_TABLE function (a "tablefunc" RTE), and it was
possible to make the server crash by passing an overly long one.  Fix it
by throwing an error in that case, like the other places that deal with
alias lists.

In passing, modify the equivalent test used for join RTEs to look like
the other ones, which was different for no apparent reason.

This bug came in when XMLTABLE was born in version 10; backpatch to all
stable versions.

Reported-by: Wang Ke <krking@zju.edu.cn>
Discussion: https://postgr.es/m/17480-1c9d73565bb28e90@postgresql.org
2022-05-18 20:28:31 +02:00
Michael Paquier 4525151d4c Fix control file update done in restartpoints still running after promotion
If a cluster is promoted (aka the control file shows a state different
than DB_IN_ARCHIVE_RECOVERY) while CreateRestartPoint() is still
processing, this function could miss an update of the control file for
"checkPoint" and "checkPointCopy" but still do the recycling and/or
removal of the past WAL segments, assuming that the to-be-updated LSN
values should be used as reference points for the cleanup.  This causes
a follow-up restart attempting crash recovery to fail with a PANIC on a
missing checkpoint record if the end-of-recovery checkpoint triggered by
the promotion did not complete while the cluster abruptly stopped or
crashed before the completion of this checkpoint.  The PANIC would be
caused by the redo LSN referred in the control file as located in a
segment already gone, recycled by the previous restartpoint with
"checkPoint" out-of-sync in the control file.

This commit fixes the update of the control file during restartpoints so
as "checkPoint" and "checkPointCopy" are updated even if the cluster has
been promoted while a restartpoint is running, to be on par with the set
of WAL segments actually recycled in the end of CreateRestartPoint().

7863ee4 has fixed this problem already on master, but the release timing
of the latest point versions did not let me enough time to study and fix
that on all the stable branches.

Reported-by: Fujii Masao, Rui Zhao
Author: Kyotaro Horiguchi
Reviewed-by: Nathan Bossart, Michael Paquier
Discussion: https://postgr.es/m/20220316.102444.2193181487576617583.horikyota.ntt@gmail.com
Backpatch-through: 10
2022-05-16 11:26:36 +09:00
Tom Lane 7f7f1750d4 Make pull_var_clause() handle GroupingFuncs exactly like Aggrefs.
This follows in the footsteps of commit 2591ee8ec by removing one more
ill-advised shortcut from planning of GroupingFuncs.  It's true that
we don't intend to execute the argument expression(s) at runtime, but
we still have to process any Vars appearing within them, or we risk
failure at setrefs.c time (or more fundamentally, in EXPLAIN trying
to print such an expression).  Vars in upper plan nodes have to have
referents in the next plan level, whether we ever execute 'em or not.

Per bug #17479 from Michael J. Sullivan.  Back-patch to all supported
branches.

Richard Guo

Discussion: https://postgr.es/m/17479-6260deceaf0ad304@postgresql.org
2022-05-12 11:31:46 -04:00
Amit Kapila 87c1dd246a Fix the logical replication timeout during large transactions.
The problem is that we don't send keep-alive messages for a long time
while processing large transactions during logical replication where we
don't send any data of such transactions. This can happen when the table
modified in the transaction is not published or because all the changes
got filtered. We do try to send the keep_alive if necessary at the end of
the transaction (via WalSndWriteData()) but by that time the
subscriber-side can timeout and exit.

To fix this we try to send the keepalive message if required after
processing certain threshold of changes.

Reported-by: Fabrice Chapuis
Author: Wang wei and Amit Kapila
Reviewed By: Masahiko Sawada, Euler Taveira, Hou Zhijie, Hayato Kuroda
Backpatch-through: 10
Discussion: https://postgr.es/m/CAA5-nLARN7-3SLU_QUxfy510pmrYK6JJb=bk3hcgemAM_pAv+w@mail.gmail.com
2022-05-11 10:12:23 +05:30
Michael Paquier f0fb2e9509 Improve setup of environment values for commands in MSVC's vcregress.pl
The current setup assumes that commands for lz4, zstd and gzip always
exist by default if not enforced by a user's environment.  However,
vcpkg, as one example, installs libraries but no binaries, so this
default setup to assume that a command should always be present would
cause failures.  This commit improves the detection of such external
commands as follows:
* If a ENV value is available, trust the environment/user and use it.
* If a ENV value is not available, check its execution by looking in the
current PATH, by launching a simple "$command --version" (that should be
portable enough).
** On execution failure, ignore ENV{command}.
** On execution success, set ENV{command} = "$command".

Note that this new rule applies to gzip, lz4 and zstd but not tar that
we assume will always exist.  Those commands are set up in the
environment only when using bincheck and taptest.  The CI includes all
those commands and I have checked that their setup is correct there.  I
have also tested this change in a MSVC environment where we have none of
those commands.

While on it, remove the references to lz4 from the documentation and
vcregress.pl in ~v13.  --with-lz4 has been added in v14~ so there is no
point to have this information in these older branches.

Reported-by: Andrew Dunstan
Reviewed-by: Andrew Dunstan
Discussion: https://postgr.es/m/14402151-376b-a57a-6d0c-10ad12608e12@dunslane.net
Backpatch-through: 10
2022-05-11 10:22:40 +09:00
Tom Lane 6ac9161f69 Stamp 11.16. 2022-05-09 17:20:11 -04:00
Tom Lane 539f8c563c Fix core dump in transformValuesClause when there are no columns.
The parser code that transformed VALUES from row-oriented to
column-oriented lists failed if there were zero columns.
You can't write that straightforwardly (though probably you
should be able to), but the case can be reached by expanding
a "tab.*" reference to a zero-column table.

Per bug #17477 from Wang Ke.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/17477-0af3c6ac6b0a6ae0@postgresql.org
2022-05-09 14:15:37 -04:00
Tom Lane c0406fa768 Revert "Disallow infinite endpoints in generate_series() for timestamps."
This reverts commit eafdf9de06
and its back-branch counterparts.  Corey Huinker pointed out that
we'd discussed this exact change back in 2016 and rejected it,
on the grounds that there's at least one usage pattern with LIMIT
where an infinite endpoint can usefully be used.  Perhaps that
argument needs to be re-litigated, but there's no time left before
our back-branch releases.  To keep our options open, restore the
status quo ante; if we do end up deciding to change things, waiting
one more quarter won't hurt anything.

Rather than just doing a straight revert, I added a new test case
demonstrating the usage with LIMIT.  That'll at least remind us of
the issue if we forget again.

Discussion: https://postgr.es/m/3603504.1652068977@sss.pgh.pa.us
Discussion: https://postgr.es/m/CADkLM=dzw0Pvdqp5yWKxMd+VmNkAMhG=4ku7GnCZxebWnzmz3Q@mail.gmail.com
2022-05-09 11:40:42 -04:00
Noah Misch 34ff15660b In REFRESH MATERIALIZED VIEW, set user ID before running user code.
It intended to, but did not, achieve this.  Adopt the new standard of
setting user ID just after locking the relation.  Back-patch to v10 (all
supported versions).

Reviewed by Simon Riggs.  Reported by Alvaro Herrera.

Security: CVE-2022-1552
2022-05-09 08:35:13 -07:00
Noah Misch 48ca2904c1 Make relation-enumerating operations be security-restricted operations.
When a feature enumerates relations and runs functions associated with
all found relations, the feature's user shall not need to trust every
user having permission to create objects.  BRIN-specific functionality
in autovacuum neglected to account for this, as did pg_amcheck and
CLUSTER.  An attacker having permission to create non-temp objects in at
least one schema could execute arbitrary SQL functions under the
identity of the bootstrap superuser.  CREATE INDEX (not a
relation-enumerating operation) and REINDEX protected themselves too
late.  This change extends to the non-enumerating amcheck interface.
Back-patch to v10 (all supported versions).

Sergey Shinderuk, reviewed (in earlier versions) by Alexander Lakhin.
Reported by Alexander Lakhin.

Security: CVE-2022-1552
2022-05-09 08:35:13 -07:00
Peter Eisentraut d8ab73f3ba Translation updates
Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: a148941910171cea2b5391c98cdd90c5cd980d93
2022-05-09 12:27:34 +02:00
Andres Freund 4caa29414d Disable 031_recovery_conflict.pl until after minor releases.
f40d362a66 disabled part of 031_recovery_conflict.pl due to instability
that's not trivial to fix in the back branches. That fixed most of the
issues. But there was one more failure (on lapwing / REL_10_STABLE).

That failure looks like it might be caused by a genuine problem. Disable the
test until after the set of releases, to avoid packagers etc potentially
having to fight with a test failure they can't do anything about.

Discussion: https://postgr.es/m/3447060.1652032749@sss.pgh.pa.us
Backpatch: 10-14
2022-05-08 18:06:40 -07:00
Andres Freund fb965ab0f6 Temporarily skip recovery deadlock test in back branches.
The recovery deadlock test has a timing issue that was fixed in 5136967f1e in
HEAD. Unfortunately the same fix doesn't quite work in the back branches: 1)
adjust_conf() doesn't exist, which is easy enough to work around 2) a restart
cleares the recovery conflict stats < 15.

These issues can be worked around, but given the upcoming set of minor
releases, skip the problematic test for now. The buildfarm doesn't show
failures in other parts of 031_recovery_conflict.pl.

Discussion: https://postgr.es/m/20220506155827.dfnaheq6ufylwrqf@alap3.anarazel.de
Backpatch: 10-14
2022-05-06 09:07:42 -07:00
Andres Freund 47f47a8d47 Backpatch addition of pump_until() more completely.
In a2ab9c06ea I just backpatched the introduction of pump_until(), without
changing the existing local definitions (as 6da65a3f9a). The necessary
changes seemed more verbose than desirable. However, that leads to warnings,
as I failed to realize...

Backpatch to all versions containing pump_until() calls before
f74496dd61 (there's none in 10).

Discussion: https://postgr.es/m/2808491.1651802860@sss.pgh.pa.us
Discussion: https://postgr.es/m/18b37361-b482-b9d8-f30d-6115cd5ce25c@enterprisedb.com
Backpatch: 11-14
2022-05-06 08:39:14 -07:00
Tom Lane da72ff09b6 Update time zone data files to tzdata release 2022a.
DST law changes in Palestine.  Historical corrections for
Chile and Ukraine.
2022-05-05 14:55:22 -04:00
Andres Freund 8b9991cf07 Revert "Fix timing issue in deadlock recovery conflict test."
This reverts commit 7a1267f9fe.
2022-05-04 14:15:36 -07:00
Andres Freund 7a1267f9fe Fix timing issue in deadlock recovery conflict test.
Per buildfarm members longfin and skink.

Discussion: https://postgr.es/m/20220413002626.udl7lll7f3o7nre7@alap3.anarazel.de
Backpatch: 10-
2022-05-04 12:57:28 -07:00
Andres Freund 25d5494e28 Backpatch 031_recovery_conflict.pl.
The prior commit showed that the introduction of recovery conflict tests was a
good idea. Without these tests it's hard to know that the fix didn't break
something...

031_recovery_conflict.pl was introduced in 9f8a050f68 and extended in
21e184403b.

Discussion: https://postgr.es/m/20220413002626.udl7lll7f3o7nre7@alap3.anarazel.de
Backpatch: 10-14
2022-05-02 18:30:15 -07:00
Andres Freund 9cda785b4e Fix possibility of self-deadlock in ResolveRecoveryConflictWithBufferPin().
The tests added in 9f8a050f68 failed nearly reliably on FreeBSD in CI, and
occasionally on the buildfarm. That turns out to be caused not by a bug in the
test, but by a longstanding bug in recovery conflict handling.

The standby timeout handler, used by ResolveRecoveryConflictWithBufferPin(),
executed SendRecoveryConflictWithBufferPin() inside a signal handler. A bad
idea, because the deadlock timeout handler (or a spurious latch set) could
have interrupted ProcWaitForSignal(). If unlucky that could cause a
self-deadlock on ProcArrayLock, if the deadlock check is in
SendRecoveryConflictWithBufferPin()->CancelDBBackends().

To fix, set a flag in StandbyTimeoutHandler(), and check the flag in
ResolveRecoveryConflictWithBufferPin().

Subsequently the recovery conflict tests will be backpatched.

Discussion: https://postgr.es/m/20220413002626.udl7lll7f3o7nre7@alap3.anarazel.de
Backpatch: 10-
2022-05-02 18:30:15 -07:00
Andres Freund 2adb8debe5 Backpatch addition of wait_for_log(), pump_until().
These were originally introduced in a2ab9c06ea and a2ab9c06ea, as they are
needed by a about-to-be-backpatched test.

Discussion: https://postgr.es/m/20220413002626.udl7lll7f3o7nre7@alap3.anarazel.de
Backpatch: 10-14
2022-05-02 18:09:44 -07:00
Etsuro Fujita 1dd49df783 Fix typo in comment. 2022-05-02 16:45:07 +09:00
Andrew Dunstan b90ce0dd74 Inhibit mingw CRT's auto-globbing of command line arguments
For some reason by default the mingw C Runtime takes it upon itself to
expand program arguments that look like shell globbing characters. That
has caused much scratching of heads and mis-attribution of the causes of
some TAP test failures, so stop doing that.

This removes an inconsistency with Windows binaries built with MSVC,
which have no such behaviour.

Per suggestion from Noah Misch.

Backpatch to all live branches.

Discussion: https://postgr.es/m/20220423025927.GA1274057@rfd.leadboat.com
2022-04-25 15:51:27 -04:00
Andrew Dunstan 8b910d9423 Support new perl module namespace in stable branches
Commit b3b4d8e68a moved our perl test modules to a better namespace
structure, but this has made life hard for people wishing to backpatch
improvements in the TAP tests. Here we alleviate much of that difficulty
by implementing the new module names on top of the old modules, mostly
by using a little perl typeglob aliasing magic, so that we don't have a
dual maintenance burden. This should work both for the case where a new
test is backpatched and the case where a fix to an existing test that
uses the new namespace is backpatched.

Reviewed by Michael Paquier

Per complaint from Andres Freund

Discussion: https://postgr.es/m/20220418141530.nfxtkohefvwnzncl@alap3.anarazel.de

Applied to branches 10 through 14
2022-04-21 09:28:47 -04:00
Peter Geoghegan adb2d84fc2 Fix CLUSTER tuplesorts on abbreviated expressions.
CLUSTER sort won't use the datum1 SortTuple field when clustering
against an index whose leading key is an expression.  This makes it
unsafe to use the abbreviated keys optimization, which was missed by the
logic that sets up SortSupport state.  Affected tuplesorts output tuples
in a completely bogus order as a result (the wrong SortSupport based
comparator was used for the leading attribute).

This issue is similar to the bug fixed on the master branch by recent
commit cc58eecc5d.  But it's a far older issue, that dates back to the
introduction of the abbreviated keys optimization by commit 4ea51cdfe8.

Backpatch to all supported versions.

Author: Peter Geoghegan <pg@bowt.ie>
Author: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CA+hUKG+bA+bmwD36_oDxAoLrCwZjVtST2fqe=b4=qZcmU7u89A@mail.gmail.com
Backpatch: 10-
2022-04-20 17:17:35 -07:00
Tom Lane e7adbd282d Disallow infinite endpoints in generate_series() for timestamps.
Such cases will lead to infinite loops, so they're of no practical
value.  The numeric variant of generate_series() already threw error
for this, so borrow its message wording.

Per report from Richard Wesley.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/91B44E7B-68D5-448F-95C8-B4B3B0F5DEAF@duckdblabs.com
2022-04-20 18:08:15 -04:00
Tom Lane 9130f8cbb9 Fix breakage in AlterFunction().
An ALTER FUNCTION command that tried to update both the function's
proparallel property and its proconfig list failed to do the former,
because it stored the new proparallel value into a tuple that was
no longer the interesting one.  Carelessness in 7aea8e4f2.

(I did not bother with a regression test, because the only likely
future breakage would be for someone to ignore the comment I added
and add some other field update after the heap_modify_tuple step.
A test using existing function properties could not catch that.)

Per report from Bryn Llewellyn.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/8AC9A37F-99BD-446F-A2F7-B89AD0022774@yugabyte.com
2022-04-19 23:03:59 -04:00
Amit Kapila a90de822e4 Fix the check to limit sync workers.
We don't allow to invoke more sync workers once we have reached the sync
worker limit per subscription. But the check to enforce this also doesn't
allow to launch an apply worker if it gets restarted.

This code was introduced by commit de43897122 but we caught the problem
only with the test added by recent commit c91f71b9dc which started failing
occasionally in the buildfarm.

As per buildfarm.
Diagnosed-by: Amit Kapila, Masahiko Sawada, Tomas Vondra
Author: Amit Kapila
Backpatch-through: 10
Discussion: https://postgr.es/m/CAH2L28vddB_NFdRVpuyRBJEBWjz4BSyTB=_ektNRH8NJ1jf95g@mail.gmail.com
	    https://postgr.es/m/f90d2b03-4462-ce95-a524-d91464e797c8@enterprisedb.com
2022-04-19 09:29:34 +05:30
Tom Lane 20af44693c Avoid invalid array reference in transformAlterTableStmt().
Don't try to look at the attidentity field of system attributes,
because they're not there in the TupleDescAttr array.  Sometimes
this is harmless because we accidentally pick up a zero, but
otherwise we'll report "no owned sequence found" from an attempt
to alter a system attribute.  (It seems possible that a SIGSEGV
could occur, too, though I've not seen it in testing.)

It's not in this function's charter to complain that you can't
alter a system column, so instead just hard-wire an assumption
that system attributes aren't identities.  I didn't bother with
a regression test because the appearance of the bug is very
erratic.

Per bug #17465 from Roman Zharkov.  Back-patch to all supported
branches.  (There's not actually a live bug before v12, because
before that get_attidentity() did the right thing anyway.
But for consistency I changed the test in the older branches too.)

Discussion: https://postgr.es/m/17465-f2a554a6cb5740d3@postgresql.org
2022-04-18 12:16:45 -04:00
Michael Paquier 4751a13b63 Fix race in TAP test 002_archiving.pl when restoring history file
This test, introduced in df86e52, uses a second standby to check that
it is able to remove correctly RECOVERYHISTORY and RECOVERYXLOG at the
end of recovery.  This standby uses the archives of the primary to
restore its contents, with some of the archive's contents coming from
the first standby previously promoted.  In slow environments, it was
possible that the test did not check what it should, as the history file
generated by the promotion of the first standby may not be stored yet on
the archives the second standby feeds on.  So, it could be possible that
the second standby selects an incorrect timeline, without restoring a
history file at all.

This commits adds a wait phase to make sure that the history file
required by the second standby is archived before this cluster is
created.  This relies on poll_query_until() with pg_stat_file() and an
absolute path, something not supported in REL_10_STABLE.

While on it, this adds a new test to check that the history file has
been restored by looking at the logs of the second standby.  This
ensures that a RECOVERYHISTORY, whose removal needs to be checked,
is created in the first place.  This should make the test more robust.

This test has been introduced by df86e52, but it came in light as an
effect of the bug fixed by acf1dd42, where the extra restore_command
calls made the test much slower.

Reported-by: Andres Freund
Discussion: https://postgr.es/m/YlT23IvsXkGuLzFi@paquier.xyz
Backpatch-through: 11
2022-04-18 11:40:25 +09:00
Noah Misch d9f4348d47 Add a temp-install prerequisite to src/interfaces/ecpg "checktcp".
The target failed, tested $PATH binaries, or tested a stale temporary
installation.  Commit c66b438db6 missed
this.  Back-patch to v10 (all supported versions).
2022-04-16 17:43:58 -07:00
Robert Haas 6270ee4450 Rethink the delay-checkpoint-end mechanism in the back-branches.
The back-patch of commit bbace5697d had
the unfortunate effect of changing the layout of PGPROC in the
back-branches, which could break extensions. This happened because it
changed the delayChkpt from type bool to type int. So, change it back,
and add a new bool delayChkptEnd field instead. The new field should
fall within what used to be padding space within the struct, and so
hopefully won't cause any extensions to break.

Per report from Markus Wanner and discussion with Tom Lane and others.

Patch originally by me, somewhat revised by Markus Wanner per a
suggestion from Michael Paquier. A very similar patch was developed
by Kyotaro Horiguchi, but I failed to see the email in which that was
posted before writing one of my own.

Discussion: http://postgr.es/m/CA+Tgmoao-kUD9c5nG5sub3F7tbo39+cdr8jKaOVEs_1aBWcJ3Q@mail.gmail.com
Discussion: http://postgr.es/m/20220406.164521.17171257901083417.horikyota.ntt@gmail.com
2022-04-14 11:10:16 -04:00
Tom Lane 143043191b Add missing newline in one libpq error message.
Oversight in commit a59c79564.  Back-patch, as that was.
Noted by Peter Eisentraut.

Discussion: https://postgr.es/m/7f85ef6d-250b-f5ec-9867-89f0b16d019f@enterprisedb.com
2022-03-31 11:24:26 -04:00
Etsuro Fujita 166f143869 Fix typo in comment. 2022-03-30 19:00:07 +09:00
Andres Freund 7d935bdf7a waldump: fix use-after-free in search_directory().
After closedir() dirent->d_name is not valid anymore. As there alerady are a
few places relying on the limited lifetime of pg_waldump, do so here as well,
and just pg_strdup() the string.

The bug was introduced in fc49e24fa6.

Found by UBSan, run locally.

Backpatch: 11-, like fc49e24fa6 itself.
2022-03-27 18:15:17 -07:00
Tom Lane a2a1aa830a Suppress compiler warning in relptr_store().
clang 13 with -Wextra warns that "performing pointer subtraction with
a null pointer has undefined behavior" in the places where freepage.c
tries to set a relptr variable to constant NULL.  This appears to be
a compiler bug, but it's unlikely to get fixed instantly.  Fortunately,
we can work around it by introducing an inline support function, which
seems like a good change anyway because it removes the macro's existing
double-evaluation hazard.

Backpatch to v10 where this code was introduced.

Patch by me, based on an idea of Andres Freund's.

Discussion: https://postgr.es/m/48826.1648310694@sss.pgh.pa.us
2022-03-26 14:29:56 -04:00
Tom Lane 3d263b0905 Harden TAP tests that intentionally corrupt page checksums.
The previous method for doing that was to write zeroes into a
predetermined set of page locations.  However, there's a roughly
1-in-64K chance that the existing checksum will match by chance,
and yesterday several buildfarm animals started to reproducibly
see that, resulting in test failures because no checksum mismatch
was reported.

Since the checksum includes the page LSN, test success depends on
the length of the installation's WAL history, which is affected by
(at least) the initial catalog contents, the set of locales installed
on the system, and the length of the pathname of the test directory.
Sooner or later we were going to hit a chance match, and today is
that day.

Harden these tests by specifically inverting the checksum field and
leaving all else alone, thereby guaranteeing that the checksum is
incorrect.

In passing, fix places that were using seek() to set up for syswrite(),
a combination that the Perl docs very explicitly warn against.  We've
probably escaped problems because no regular buffered I/O is done on
these filehandles; but if it ever breaks, we wouldn't deserve or get
much sympathy.

Although we've only seen problems in HEAD, now that we recognize the
environmental dependencies it seems like it might be just a matter
of time until someone manages to hit this in back-branch testing.
Hence, back-patch to v11 where we started doing this kind of test.

Discussion: https://postgr.es/m/3192026.1648185780@sss.pgh.pa.us
2022-03-25 14:23:26 -04:00
Robert Haas 118f1a332b Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.
If TRUNCATE causes some buffers to be invalidated and thus the
checkpoint does not flush them, TRUNCATE must also ensure that the
corresponding files are truncated on disk. Otherwise, a replay
from the checkpoint might find that the buffers exist but have
the wrong contents, which may cause replay to fail.

Report by Teja Mupparti. Patch by Kyotaro Horiguchi, per a design
suggestion from Heikki Linnakangas, with some changes to the
comments by me. Review of this and a prior patch that approached
the issue differently by Heikki Linnakangas, Andres Freund, Álvaro
Herrera, Masahiko Sawada, and Tom Lane.

Discussion: http://postgr.es/m/BYAPR06MB6373BF50B469CA393C614257ABF00@BYAPR06MB6373.namprd06.prod.outlook.com
2022-03-24 14:49:08 -04:00
Andres Freund 2121d58091 Don't call fwrite() with len == 0 when writing out relcache init file.
Noticed via -fsanitize=undefined.

Backpatch to all branches, for the same reasons as 46ab07ffda.

Discussion: https://postgr.es/m/20220323173537.ll7klrglnp4gn2um@alap3.anarazel.de
Backpatch: 10-
2022-03-23 13:13:40 -07:00
Alvaro Herrera efe44fb2f6
pg_upgrade: Upgrade an Assert to a real 'if' test
It seems possible for the condition being tested to be true in
production, and nobody would never know (except when some data
eventually becomes corrupt?).

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m//202109040001.zky3wgv2qeqg@alvherre.pgsql
2022-03-23 19:23:51 +01:00
Alvaro Herrera 199cd7b59a
Fix "missing continuation record" after standby promotion
Invalidate abortedRecPtr and missingContrecPtr after a missing
continuation record is successfully skipped on a standby. This fixes a
PANIC caused when a recently promoted standby attempts to write an
OVERWRITE_RECORD with an LSN of the previously read aborted record.

Backpatch to 10 (all stable versions).

Author: Sami Imseih <simseih@amazon.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/44D259DE-7542-49C4-8A52-2AB01534DCA9@amazon.com
2022-03-23 18:22:10 +01:00
Andres Freund cd1951ba08 Add missing dependency of pg_dumpall to WIN32RES.
When cross-building to windows, or building with mingw on windows, the build
could fail with
  x86_64-w64-mingw32-gcc: error: win32ver.o: No such file or director
because pg_dumpall didn't depend on WIN32RES, but it's recipe references
it. The build nevertheless succeeded most of the time, due to
pg_dump/pg_restore having the required dependency, causing win32ver.o to be
built.

Reported-By: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CA+hUKGJeekpUPWW6yCVdf9=oBAcCp86RrBivo4Y4cwazAzGPng@mail.gmail.com
Backpatch: 10-, omission present on all live branches
2022-03-22 08:28:54 -07:00
Michael Paquier 635abe6cd4 Fix failures in SSL tests caused by out-of-tree keys and certificates
This issue is environment-sensitive, where the SSL tests could fail in
various way by feeding on defaults provided by sslcert, sslkey,
sslrootkey, sslrootcert, sslcrl and sslcrldir coming from a local setup,
as of ~/.postgresql/ by default.  Horiguchi-san has reported two
failures, but more advanced testing from me (aka inclusion of garbage
SSL configuration in ~/.postgresql/ for all the configuration
parameters) has showed dozens of failures that can be triggered in the
whole test suite.

History has showed that we are not good when it comes to address such
issues, fixing them locally like in dd87799, and such problems keep
appearing.  This commit strengthens the entire test suite to put an end
to this set of problems by embedding invalid default values in all the
connection strings used in the tests.  The invalid values are prefixed
in each connection string, relying on the follow-up values passed in the
connection string to enforce any invalid value previously set.  Note
that two tests related to CRLs are required to fail with certain pre-set
configurations, but we can rely on enforcing an empty value instead
after the invalid set of values.

Reported-by: Kyotaro Horiguchi
Reviewed-by: Andrew Dunstan, Daniel Gustafsson, Kyotaro Horiguchi
Discussion: https://postgr.es/m/20220316.163658.1122740600489097632.horikyota.ntt@gmail.com
backpatch-through: 10
2022-03-22 13:21:54 +09:00
Tom Lane 5de244196a Fix assorted missing logic for GroupingFunc nodes.
The planner needs to treat GroupingFunc like Aggref for many purposes,
in particular with respect to processing of the argument expressions,
which are not to be evaluated at runtime.  A few places hadn't gotten
that memo, notably including subselect.c's processing of outer-level
aggregates.  This resulted in assertion failures or wrong plans for
cases in which a GROUPING() construct references an outer aggregation
level.

Also fix missing special cases for GroupingFunc in cost_qual_eval
(resulting in wrong cost estimates for GROUPING(), although it's
not clear that that would affect plan shapes in practice) and in
ruleutils.c (resulting in excess parentheses in pretty-print mode).

Per bug #17088 from Yaoguang Chen.  Back-patch to all supported
branches.

Richard Guo, Tom Lane

Discussion: https://postgr.es/m/17088-e33882b387de7f5c@postgresql.org
2022-03-21 17:44:29 -04:00
Tom Lane b8ae17fd9f Fix risk of deadlock failure while dropping a partitioned index.
DROP INDEX needs to lock the index's table before the index itself,
else it will deadlock against ordinary queries that acquire the
relation locks in that order.  This is correctly mechanized for
plain indexes by RangeVarCallbackForDropRelation; but in the case of
a partitioned index, we neglected to lock the child tables in advance
of locking the child indexes.  We can fix that by traversing the
inheritance tree and acquiring the needed locks in RemoveRelations,
after we have acquired our locks on the parent partitioned table and
index.

While at it, do some refactoring to eliminate confusion between
the actual and expected relkind in RangeVarCallbackForDropRelation.
We can save a couple of syscache lookups too, by having that function
pass back info that RemoveRelations will need.

Back-patch to v11 where partitioned indexes were added.

Jimmy Yih, Gaurab Dey, Tom Lane

Discussion: https://postgr.es/m/BYAPR05MB645402330042E17D91A70C12BD5F9@BYAPR05MB6454.namprd05.prod.outlook.com
2022-03-21 12:22:13 -04:00
Tom Lane 84f3ecdaae Fix incorrect xmlschema output for types timetz and timestamptz.
The output of table_to_xmlschema() and allied functions includes
a regex describing valid values for these types ... but the regex
was itself invalid, as it failed to escape a literal "+" sign.

Report and fix by Renan Soares Lopes.  Back-patch to all
supported branches.

Discussion: https://postgr.es/m/7f6fabaa-3f8f-49ab-89ca-59fbfe633105@me.com
2022-03-18 16:01:42 -04:00
Tom Lane 13b54d1e0d Revert applying column aliases to the output of whole-row Vars.
In commit bf7ca1587, I had the bright idea that we could make the
result of a whole-row Var (that is, foo.*) track any column aliases
that had been applied to the FROM entry the Var refers to.  However,
that's not terribly logically consistent, because now the output of
the Var is no longer of the named composite type that the Var claims
to emit.  bf7ca1587 tried to handle that by changing the output
tuple values to be labeled with a blessed RECORD type, but that's
really pretty disastrous: we can wind up storing such tuples onto
disk, whereupon they're not readable by other sessions.

The only practical fix I can see is to give up on what bf7ca1587
tried to do, and say that the column names of tuples produced by
a whole-row Var are always those of the underlying named composite
type, query aliases or no.  While this introduces some inconsistencies,
it removes others, so it's not that awful in the abstract.  What *is*
kind of awful is to make such a behavioral change in a back-patched
bug fix.  But corrupt data is worse, so back-patched it will be.

(A workaround available to anyone who's unhappy about this is to
introduce an extra level of sub-SELECT, so that the whole-row Var is
referring to the sub-SELECT's output and not to a named table type.
Then the Var is of type RECORD to begin with and there's no issue.)

Per report from Miles Delahunty.  The faulty commit dates to 9.5,
so back-patch to all supported branches.

Discussion: https://postgr.es/m/2950001.1638729947@sss.pgh.pa.us
2022-03-17 18:18:05 -04:00
Thomas Munro ca522c60a5 Fix race between DROP TABLESPACE and checkpointing.
Commands like ALTER TABLE SET TABLESPACE may leave files for the next
checkpoint to clean up.  If such files are not removed by the time DROP
TABLESPACE is called, we request a checkpoint so that they are deleted.
However, there is presently a window before checkpoint start where new
unlink requests won't be scheduled until the following checkpoint.  This
means that the checkpoint forced by DROP TABLESPACE might not remove the
files we expect it to remove, and the following ERROR will be emitted:

	ERROR:  tablespace "mytblspc" is not empty

To fix, add a call to AbsorbSyncRequests() just before advancing the
unlink cycle counter.  This ensures that any unlink requests forwarded
prior to checkpoint start (i.e., when ckpt_started is incremented) will
be processed by the current checkpoint.  Since AbsorbSyncRequests()
performs memory allocations, it cannot be called within a critical
section, so we also need to move SyncPreCheckpoint() to before
CreateCheckPoint()'s critical section.

This is an old bug, so back-patch to all supported versions.

Author: Nathan Bossart <nathandbossart@gmail.com>
Reported-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220215235845.GA2665318%40nathanxps13
2022-03-16 17:38:55 +13:00
Thomas Munro 986d24042b Back-patch LLVM 14 API changes.
Since LLVM 14 has stopped changing and is about to be released,
back-patch the following changes from the master branch:

  e6a7600202
  807fee1a39
  a56e7b6601

Back-patch to 11, where LLVM JIT support came in.
2022-03-16 11:35:00 +13:00
Noah Misch 49e8a5d399 Introduce PG_TEST_TIMEOUT_DEFAULT for TAP suite non-elapsing timeouts.
Slow hosts may avoid load-induced, spurious failures by setting
environment variable PG_TEST_TIMEOUT_DEFAULT to some number of seconds
greater than 180.  Developers may see faster failures by setting that
environment variable to some lesser number of seconds.  In tests, write
$PostgreSQL::Test::Utils::timeout_default wherever the convention has
been to write 180.  This change raises the default for some briefer
timeouts.  Back-patch to v10 (all supported versions).

Discussion: https://postgr.es/m/20220218052842.GA3627003@rfd.leadboat.com
2022-03-04 18:53:17 -08:00
Tom Lane 566e1c04df Fix bogus casting in BlockIdGetBlockNumber().
This macro cast the result to BlockNumber after shifting, not before,
which is the wrong thing.  Per the C spec, the uint16 fields would
promote to int not unsigned int, so that (for 32-bit int) the shift
potentially shifts a nonzero bit into the sign position.  I doubt
there are any production systems where this would actually end with
the wrong answer, but it is undefined behavior per the C spec, and
clang's -fsanitize=undefined option reputedly warns about it on some
platforms.  (I can't reproduce that right now, but the code is
undeniably wrong per spec.)  It's easy to fix by casting to
BlockNumber (uint32) in the proper places.

It's been wrong for ages, so back-patch to all supported branches.

Report and patch by Zhihong Yu (cosmetic tweaking by me)

Discussion: https://postgr.es/m/CALNJ-vT9r0DSsAOw9OXVJFxLENoVS_68kJ5x0p44atoYH+H4dg@mail.gmail.com
2022-03-03 19:03:50 -05:00
Tom Lane f2087e26eb Clean up assorted failures under clang's -fsanitize=undefined checks.
Most of these are cases where we could call memcpy() or other libc
functions with a NULL pointer and a zero count, which is forbidden
by POSIX even though every production version of libc allows it.
We've fixed such things before in a piecemeal way, but apparently
never made an effort to try to get them all.  I don't claim that
this patch does so either, but it gets every failure I observe in
check-world, using clang 12.0.1 on current RHEL8.

numeric.c has a different issue that the sanitizer doesn't like:
"ln(-1.0)" will compute log10(0) and then try to assign the
resulting -Inf to an integer variable.  We don't actually use the
result in such a case, so there's no live bug.

Back-patch to all supported branches, with the idea that we might
start running a buildfarm member that tests this case.  This includes
back-patching c1132aae3 (Check the size in COPY_POINTER_FIELD),
which previously silenced some of these issues in copyfuncs.c.

Discussion: https://postgr.es/m/CALNJ-vT9r0DSsAOw9OXVJFxLENoVS_68kJ5x0p44atoYH+H4dg@mail.gmail.com
2022-03-03 18:13:24 -05:00
Tom Lane 5bb3d91ea9 Allow root-owned SSL private keys in libpq, not only the backend.
This change makes libpq apply the same private-key-file ownership
and permissions checks that we have used in the backend since commit
9a83564c5.  Namely, that the private key can be owned by either the
current user or root (with different file permissions allowed in the
two cases).  This allows system-wide management of key files, which
is just as sensible on the client side as the server, particularly
when the client is itself some application daemon.

Sync the comments about this between libpq and the backend, too.

Back-patch of a59c79564 and 50f03473e into all supported branches.

David Steele

Discussion: https://postgr.es/m/f4b7bc55-97ac-9e69-7398-335e212f7743@pgmasters.net
2022-03-02 11:57:02 -05:00
Tom Lane 31befa6be6 Disallow execution of SPI functions during plperl function compilation.
Perl can be convinced to execute user-defined code during compilation
of a plperl function (or at least a plperlu function).  That's not
such a big problem as long as the activity is confined within the
Perl interpreter, and it's not clear we could do anything about that
anyway.  However, if such code tries to use plperl's SPI functions,
we have a bigger problem.  In the first place, those functions are
likely to crash because current_call_data->prodesc isn't set up yet.
In the second place, because it isn't set up, we lack critical info
such as whether the function is supposed to be read-only.  And in
the third place, this path allows code execution during function
validation, which is strongly discouraged because of the potential
for security exploits.  Hence, reject execution of the SPI functions
until compilation is finished.

While here, add check_spi_usage_allowed() calls to various functions
that hadn't gotten the memo about checking that.  I think that perhaps
plperl_sv_to_literal may have been intentionally omitted on the grounds
that it was safe at the time; but if so, the addition of transforms
functionality changed that.  The others are more recently added and
seem to be flat-out oversights.

Per report from Mark Murawski.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/9acdf918-7fff-4f40-f750-2ffa84f083d2@intellasoft.net
2022-02-25 17:40:45 -05:00
Andres Freund 51c3416561 pg_waldump: Fix error message for WAL files smaller than XLOG_BLCKSZ.
When opening a WAL file smaller than XLOG_BLCKSZ (e.g. 0 bytes long) while
determining the wal_segment_size, pg_waldump checked errno, despite errno not
being set by the short read. Resulting in a bogus error message.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20220214.181847.775024684568733277.horikyota.ntt@gmail.com
Backpatch: 11-, the bug was introducedin fc49e24fa
2022-02-25 10:40:32 -08:00
Andres Freund 3faa21bb76 Fix temporary object cleanup failing due to toast access without snapshot.
When cleaning up temporary objects during process exit the cleanup could fail
with:
  FATAL: cannot fetch toast data without an active snapshot

The bug is caused by RemoveTempRelationsCallback() not setting up a
snapshot. If an object with toasted catalog data needs to be cleaned up,
init_toast_snapshot() could fail with the above error.

Most of the time however the the problem is masked due to cached catalog
snapshots being returned by GetOldestSnapshot(). But dropping an object can
cause catalog invalidations to be emitted. If no further catalog accesses are
necessary between the invalidation processing and the next toast datum
deletion, the bug becomes visible.

It's easy to miss this bug because it typically happens after clients
disconnect and the FATAL error just ends up in the log.

Luckily temporary table cleanup at the next use of the same temporary schema
or during DISCARD ALL does not have the same problem.

Fix the bug by pushing a snapshot in RemoveTempRelationsCallback(). Also add
isolation tests for temporary object cleanup, including objects with toasted
catalog data.

A future HEAD only commit will add more assertions.

Reported-By: Miles Delahunty
Author: Andres Freund
Discussion: https://postgr.es/m/CAOFAq3BU5Mf2TTvu8D9n_ZOoFAeQswuzk7yziAb7xuw_qyw5gw@mail.gmail.com
Backpatch: 10-
2022-02-21 08:59:34 -08:00
Andrew Dunstan f1f82301d6
Remove most msys special processing in TAP tests
Following migration of Windows buildfarm members running TAP tests to
use of ucrt64 perl for those tests, special processing for msys perl is
no longer necessary and so is removed.

Backpatch to release 10

Discussion: https://postgr.es/m/c65a8781-77ac-ea95-d185-6db291e1baeb@dunslane.net
2022-02-20 11:50:54 -05:00
Andrew Dunstan b4a6ceed52
Remove PostgreSQL::Test::Utils::perl2host completely
Commit f1ac4a74de disabled this processing, and as nothing has broken (as
expected) here we proceed to remove the routine and adjust all the call
sites.

Backpatch to release 10

Discussion: https://postgr.es/m/0ba775a2-8aa0-0d56-d780-69427cf6f33d@dunslane.net
Discussion: https://postgr.es/m/20220125023609.5ohu3nslxgoygihl@alap3.anarazel.de
2022-02-20 10:50:48 -05:00
Tom Lane efae4401c4 Improve subscriber's error message for wrong publication relkind.
Pre-v13 versions only support logical replication from plain tables,
while v13 and later also allow partitioned tables to be published.
If you tried to subscribe an older server to such a publication,
you got "table XXX not found on publisher", which is pretty
unhelpful/confusing.  Arrange to deliver a more on-point error
message.  As commit c314c147c did in v13, remove the relkind check
from the query WHERE clause altogether, so that "not there"
is distinguishable from "wrong relkind".

Per report from Radoslav Nedyalkov.  Patch v10-v12.

Discussion: https://postgr.es/m/2952568.1644876730@sss.pgh.pa.us
2022-02-15 12:21:28 -05:00
Amit Kapila 1cd5802ac6 WAL log unchanged toasted replica identity key attributes.
Currently, during UPDATE, the unchanged replica identity key attributes
are not logged separately because they are getting logged as part of the
new tuple. But if they are stored externally then the untoasted values are
not getting logged as part of the new tuple and logical replication won't
be able to replicate such UPDATEs. So we need to log such attributes as
part of the old_key_tuple during UPDATE.

Reported-by: Haiying Tang
Author: Dilip Kumar and Amit Kapila
Reviewed-by: Alvaro Herrera, Haiying Tang, Andres Freund
Backpatch-through: 10
Discussion: https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
2022-02-14 08:37:23 +05:30
Alexander Korotkov 0d554775bd Fix memory leak in IndexScan node with reordering
Fix ExecReScanIndexScan() to free the referenced tuples while emptying the
priority queue.  Backpatch to all supported versions.

Discussion: https://postgr.es/m/CAHqSB9gECMENBQmpbv5rvmT3HTaORmMK3Ukg73DsX5H7EJV7jw%40mail.gmail.com
Author: Aliaksandr Kalenik
Reviewed-by: Tom Lane, Alexander Korotkov
Backpatch-through: 10
2022-02-14 04:04:19 +03:00
Tom Lane 14ee565f39 Don't use_physical_tlist for an IOS with non-returnable columns.
createplan.c tries to save a runtime projection step by specifying
a scan plan node's output as being exactly the table's columns, or
index's columns in the case of an index-only scan, if there is not a
reason to do otherwise.  This logic did not previously pay attention
to whether an index's columns are returnable.  That worked, sort of
accidentally, until commit 9a3ddeb51 taught setrefs.c to reject plans
that try to read a non-returnable column.  I have no desire to loosen
setrefs.c's new check, so instead adjust use_physical_tlist() to not
try to optimize this way when there are non-returnable column(s).

Per report from Ryan Kelly.  Like the previous patch, back-patch
to all supported branches.

Discussion: https://postgr.es/m/CAHUie24ddN+pDNw7fkhNrjrwAX=fXXfGZZEHhRuofV_N_ftaSg@mail.gmail.com
2022-02-11 15:23:52 -05:00
Tom Lane 69cc15c316 Make pg_ctl stop/restart/promote recheck postmaster aliveness.
"pg_ctl stop/restart" checked that the postmaster PID is valid just
once, as a side-effect of sending the stop signal, and then would
wait-till-timeout for the postmaster.pid file to go away.  This
neglects the case wherein the postmaster dies uncleanly after we
signal it.  Similarly, once "pg_ctl promote" has sent the signal,
it'd wait for the corresponding on-disk state change to occur
even if the postmaster dies.

I'm not sure how we've managed not to notice this problem, but it
seems to explain slow execution of the 017_shm.pl test script on AIX
since commit 4fdbf9af5, which added a speculative "pg_ctl stop" with
the idea of making real sure that the postmaster isn't there.  In the
test steps that kill-9 and then restart the postmaster, it's possible
to get past the initial signal attempt before kill() stops working
for the doomed postmaster.  If that happens, pg_ctl waited till
PGCTLTIMEOUT before giving up ... and the buildfarm's AIX members
have that set very high.

To fix, include a "kill(pid, 0)" test (similar to what
postmaster_is_alive uses) in these wait loops, so that we'll
give up immediately if the postmaster PID disappears.

While here, I chose to refactor those loops out of where they were.
do_stop() and do_restart() can perfectly well share one copy of the
wait-for-stop loop, and it seems desirable to put a similar function
beside that for wait-for-promote.

Back-patch to all supported versions, since pg_ctl's wait logic
is substantially identical in all, and we're seeing the slow test
behavior in all branches.

Discussion: https://postgr.es/m/20220210023537.GA3222837@rfd.leadboat.com
2022-02-10 16:49:39 -05:00
Andrew Dunstan e2d104e190
Use gendef instead of pexports for building windows .def files
Modern msys systems lack pexports but have gendef instead, so use that.

Discussion: https://postgr.es/m/3ccde7a9-e4f9-e194-30e0-0936e6ad68ba@dunslane.net

Backpatch to release 9.4 to enable building with perl on older branches.
Before that pexports is not used for plperl.
2022-02-10 13:51:59 -05:00
Noah Misch 2373429975 Use Test::Builder::todo_start(), replacing $::TODO.
Some pre-2017 Test::More versions need perfect $Test::Builder::Level
maintenance to find the variable.  Buildfarm member snapper reported an
overall failure that the file intended to hide via the TODO construct.
That trouble was reachable in v11 and v10.  For later branches, this
serves as defense in depth.  Back-patch to v10 (all supported versions).

Discussion: https://postgr.es/m/20220202055556.GB2745933@rfd.leadboat.com
2022-02-09 18:17:03 -08:00
Noah Misch 3cc89d9c38 Fix back-patch of "Avoid race in RelationBuildDesc() ..."
The back-patch of commit fdd965d074 broke
CLOBBER_CACHE_ALWAYS for v9.6 through v13.  It updated the
InvalidateSystemCaches() call for CLOBBER_CACHE_RECURSIVELY, neglecting
the one for CLOBBER_CACHE_ALWAYS.  Back-patch to v13, v12, v11, and v10.

Reviewed by Tomas Vondra.  Reported by Tomas Vondra.

Discussion: https://postgr.es/m/df7b4c0b-7d92-f03f-75c4-9e08b269a716@enterprisedb.com
2022-02-09 18:16:59 -08:00
Tom Lane 3a6e3a8902 Remove ppport.h's broken re-implementation of eval_pv().
Recent versions of Devel::PPPort try to redefine eval_pv() to
dodge a bug in pre-5.31 Perl versions.  Unfortunately the redefinition
fails on compilers that don't support statements nested within
expressions.  However, we aren't actually interested in this bug fix,
since we always call eval_pv() with croak_on_error = FALSE.
So, until there's an upstream fix for this breakage, just comment
out the macro to revert to the older behavior.

Per report from Wei Sun, as well as previous buildfarm failure
on pademelon (which I'd unfortunately not looked at carefully
enough to understand the cause).  Back-patch to all supported
versions, since we're using the same ppport.h in all.

Discussion: https://postgr.es/m/tencent_2EFCC8BA0107B6EC0F97179E019A8A43C806@qq.com
Report: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pademelon&dt=2022-02-02%2001%3A22%3A58
2022-02-08 19:26:26 -05:00
Tom Lane ff2a6d2621 Stamp 11.15. 2022-02-07 16:20:23 -05:00
Peter Eisentraut d793dcf28b Translation updates
Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 250dd3558ab52b3b5f7c9b6d0cfe9ff81d726de2
2022-02-07 13:51:36 +01:00
Tom Lane 46bf1f2dc7 Test, don't just Assert, that mergejoin's inputs are in order.
There are two Asserts in nodeMergejoin.c that are reachable if
the input data is not in the expected order.  This seems way too
fragile.  Alexander Lakhin reported a case where the assertions
could be triggered with misconfigured foreign-table partitions,
and bitter experience with unstable operating system collation
definitions suggests another easy route to hitting them.  Neither
Assert is in a place where we can't afford one more test-and-branch,
so replace 'em with plain test-and-elog logic.

Per bug #17395.  While the reported symptom is relatively recent,
collation changes could happen anytime, so back-patch to all
supported branches.

Discussion: https://postgr.es/m/17395-8c326292078d1a57@postgresql.org
2022-02-05 11:59:30 -05:00
Tom Lane e368910543 Revert "plperl: Fix breakage of c89f409749 in back branches."
This reverts commits d81cac47a et al.  We shouldn't need that
hack after the preceding commits.

Discussion: https://postgr.es/m/20220131015130.shn6wr2fzuymerf6@alap3.anarazel.de
2022-01-31 15:07:39 -05:00
Tom Lane 923f9f4161 plperl: update ppport.h to Perl 5.34.0.
Also apply the changes suggested by running
perl ppport.h --compat-version=5.8.0

And remove some no-longer-required NEED_foo declarations.

Dagfinn Ilmari Mannsåker

Back-patch of commit 05798c9f7 into all supported branches.
At the time we thought this update was mostly cosmetic, but the
lack of it has caused trouble, while the patch itself hasn't.

Discussion: https://postgr.es/m/87y278s6iq.fsf@wibble.ilmari.org
Discussion: https://postgr.es/m/20220131015130.shn6wr2fzuymerf6@alap3.anarazel.de
2022-01-31 15:01:05 -05:00
Andres Freund ac374485b8 plperl: Fix breakage of c89f409749 in back branches.
ppport.h was only updated in 05798c9f7f (master). Unfortunately my commit
c89f409749 uses PERL_VERSION_LT which came in with that update. Breaking most
buildfarm animals.

I should have noticed that...

We might want to backpatch the ppport update instead, but for now lets get the
buildfarm green again.

Discussion: https://postgr.es/m/20220131015130.shn6wr2fzuymerf6@alap3.anarazel.de
Backpatch: 10-14, master doesn't need it
2022-01-30 18:01:15 -08:00
Andres Freund ad95a639ab plperl: windows: Use Perl_setlocale on 5.28+, fixing compile failure.
For older versions we need our own copy of perl's setlocale(), because it was
not exposed (why we need the setlocale in the first place is explained in
plperl_init_interp) . The copy stopped working in 5.28, as some of the used
macros are not public anymore.  But Perl_setlocale is available in 5.28, so
use that.

Author: Victor Wagner <vitus@wagner.pp.ru>
Reviewed-By: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://postgr.es/m/20200501134711.08750c5f@antares.wagner.home
Backpatch: all versions
2022-01-30 16:42:49 -08:00
Tomas Vondra 5cb88648ed Fix ordering of XIDs in ProcArrayApplyRecoveryInfo
Commit 8431e296ea reworked ProcArrayApplyRecoveryInfo to sort XIDs
before adding them to KnownAssignedXids. But the XIDs are sorted using
xidComparator, which compares the XIDs simply as uint32 values, not
logically. KnownAssignedXidsAdd() however expects XIDs in logical order,
and calls TransactionIdFollowsOrEquals() to enforce that. If there are
XIDs for which the two orderings disagree, an error is raised and the
recovery fails/restarts.

Hitting this issue is fairly easy - you just need two transactions, one
started before the 4B limit (e.g. XID 4294967290), the other sometime
after it (e.g. XID 1000). Logically (4294967290 <= 1000) but when
compared using xidComparator we try to add them in the opposite order.
Which makes KnownAssignedXidsAdd() fail with an error like this:

  ERROR: out-of-order XID insertion in KnownAssignedXids

This only happens during replica startup, while processing RUNNING_XACTS
records to build the snapshot. Once we reach STANDBY_SNAPSHOT_READY, we
skip these records. So this does not affect already running replicas,
but if you restart (or create) a replica while there are transactions
with XIDs for which the two orderings disagree, you may hit this.

Long-running transactions and frequent replica restarts increase the
likelihood of hitting this issue. Once the replica gets into this state,
it can't be started (even if the old transactions are terminated).

Fixed by sorting the XIDs logically - this is fine because we're dealing
with normal XIDs (because it's XIDs assigned to backends) and from the
same wraparound epoch (otherwise the backends could not be running at
the same time on the primary node). So there are no problems with the
triangle inequality, which is why xidComparator compares raw values.

Investigation and root cause analysis by Abhijit Menon-Sen. Patch by me.

This issue is present in all releases since 9.4, however releases up to
9.6 are EOL already so backpatch to 10 only.

Reviewed-by: Abhijit Menon-Sen
Reviewed-by: Alvaro Herrera
Backpatch-through: 10
Discussion: https://postgr.es/m/36b8a501-5d73-277c-4972-f58a4dce088a%40enterprisedb.com
2022-01-27 20:18:22 +01:00
Noah Misch e092f00d06 On sparc64+ext4, suppress test failures from known WAL read failure.
Buildfarm members kittiwake, tadarida and snapper began to fail
frequently when commits 3cd9c3b921 and
f47ed79cc8 added tests of concurrency, but
the problem was reachable before those commits.  Back-patch to v10 (all
supported versions).

Discussion: https://postgr.es/m/20220116210241.GC756210@rfd.leadboat.com
2022-01-26 18:06:23 -08:00
Tom Lane bbb1caf6b2 Revert "graceful shutdown" changes for Windows, in back branches only.
This reverts commits 6051857fc and ed52c3707, but only in the back
branches.  Further testing has shown that while those changes do fix
some things, they also break others; in particular, it looks like
walreceivers fail to detect walsender-initiated connection close
reliably if the walsender shuts down this way.  We'll keep trying to
improve matters in HEAD, but it now seems unwise to push these changes
into stable releases.

Discussion: https://postgr.es/m/CA+hUKG+OeoETZQ=Qw5Ub5h3tmwQhBmDA=nuNO3KG=zWfUypFAw@mail.gmail.com
2022-01-25 12:17:40 -05:00
Tom Lane 4ec54498c5 Fix limitations on what SQL commands can be issued to a walsender.
In logical replication mode, a WalSender is supposed to be able
to execute any regular SQL command, as well as the special
replication commands.  Poor design of the replication-command
parser caused it to fail in various cases, notably:

* semicolons embedded in a command, or multiple SQL commands
sent in a single message;

* dollar-quoted literals containing odd numbers of single
or double quote marks;

* commands starting with a comment.

The basic problem here is that we're trying to run repl_scanner.l
across the entire input string even when it's not a replication
command.  Since repl_scanner.l does not understand all of the
token types known to the core lexer, this is doomed to have
failure modes.

We certainly don't want to make repl_scanner.l as big as scan.l,
so instead rejigger stuff so that we only lex the first token of
a non-replication command.  That will usually look like an IDENT
to repl_scanner.l, though a comment would end up getting reported
as a '-' or '/' single-character token.  If the token is a replication
command keyword, we push it back and proceed normally with repl_gram.y
parsing.  Otherwise, we can drop out of exec_replication_command()
without examining the rest of the string.

(It's still theoretically possible for repl_scanner.l to fail on
the first token; but that could only happen if it's an unterminated
single- or double-quoted string, in which case you'd have gotten
largely the same error from the core lexer too.)

In this way, repl_gram.y isn't involved at all in handling general
SQL commands, so we can get rid of the SQLCmd node type.  (In
the back branches, we can't remove it because renumbering enum
NodeTag would be an ABI break; so just leave it sit there unused.)

I failed to resist the temptation to clean up some other sloppy
coding in repl_scanner.l while at it.  The only externally-visible
behavior change from that is it now accepts \r and \f as whitespace,
same as the core lexer.

Per bug #17379 from Greg Rychlewski.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/17379-6a5c6cfb3f1f5e77@postgresql.org
2022-01-24 15:33:34 -05:00
Tom Lane 449a696236 Remember to reset yy_start state when firing up repl_scanner.l.
Without this, we get odd behavior when the previous cycle of
lexing exited in a non-default exclusive state.  Every other
copy of this code is aware that it has to do BEGIN(INITIAL),
but repl_scanner.l did not get that memo.

The real-world impact of this is probably limited, since most
replication clients would abandon their connection after getting
a syntax error.  Still, it's a bug.

This mistake is old, so back-patch to all supported branches.

Discussion: https://postgr.es/m/1874781.1643035952@sss.pgh.pa.us
2022-01-24 12:09:46 -05:00
Tom Lane bd110e250e Suppress variable-set-but-not-used warning from clang 13.
In the normal configuration where GEQO_DEBUG isn't defined,
recent clang versions have started to complain that geqo_main.c
accumulates the edge_failures count but never does anything
with it.  As a minimal back-patchable fix, insert a void cast
to silence this warning.  (I'd speculated about ripping out the
GEQO_DEBUG logic altogether, but I don't think we'd wish to
back-patch that.)

Per recently-established project policy, this is a candidate
for back-patching into out-of-support branches: it suppresses
an annoying compiler warning but changes no behavior.  Hence,
back-patch all the way to 9.2.

Discussion: https://postgr.es/m/CA+hUKGLTSZQwES8VNPmWO9AO0wSeLt36OCPDAZTccT1h7Q7kTQ@mail.gmail.com
2022-01-23 11:09:38 -05:00
Tomas Vondra 5e9fe25307 Correct type of front_pathkey to PathKey
In sort_inner_and_outer we iterate a list of PathKey elements, but the
variable is declared as (List *). This mistake is benign, because we
only pass the pointer to lcons() and never dereference it.

This exists since ~2004, but it's confusing. So fix and backpatch to all
supported branches.

Backpatch-through: 10
Discussion: https://postgr.es/m/bf3a6ea1-a7d8-7211-0669-189d5c169374%40enterprisedb.com
2022-01-23 03:56:44 +01:00
Tom Lane 26c841ed1b Flush table's relcache during ALTER TABLE ADD PRIMARY KEY USING INDEX.
Previously, unless we had to add a NOT NULL constraint to the column,
this command resulted in updating only the index's relcache entry.
That's problematic when replication behavior is being driven off the
existence of a primary key: other sessions (and ours too for that
matter) failed to recalculate their opinion of whether the table can
be replicated.  Add a relcache invalidation to fix it.

This has been broken since pg_class.relhaspkey was removed in v11.
Before that, updating the table's relhaspkey value sufficed to cause
a cache flush.  Hence, backpatch to v11.

Report and patch by Hou Zhijie

Discussion: https://postgr.es/m/OS0PR01MB5716EBE01F112C62F8F9B786947B9@OS0PR01MB5716.jpnprd01.prod.outlook.com
2022-01-22 13:32:40 -05:00
Tom Lane 37f5dc8b8c Fix race condition in gettext() initialization in libpq and ecpglib.
In libpq and ecpglib, multiple threads can concurrently enter the
initialization logic for message localization.  Since we set the
its-done flag before actually doing the work, it'd be possible
for some threads to reach gettext() before anyone has called
bindtextdomain().  Barring bugs in libintl itself, this would not
result in anything worse than failure to localize some early
messages.  Nonetheless, it's a bug, and an easy one to fix.

Noted while investigating bug #17299 from Clemens Zeidler
(much thanks to Liam Bowen for followup investigation on that).
It currently appears that that actually *is* a bug in libintl itself,
but that doesn't let us off the hook for this bit.

Back-patch to all supported versions.

Discussion: https://postgr.es/m/17299-7270741958c0b1ab@postgresql.org
Discussion: https://postgr.es/m/CAE7q7Eit4Eq2=bxce=Fm8HAStECjaXUE=WBQc-sDDcgJQ7s7eg@mail.gmail.com
2022-01-21 15:36:29 -05:00
Andres Freund 2c15b29f7c fsync pg_logical/mappings in CheckPointLogicalRewriteHeap().
While individual logical rewrite files were synced to disk, the directory was
not. On some filesystems that could lead to loosing directory entries after a
crash.

Reported-By: Tom Lane <tgl@sss.pgh.pa.us>
Author: Nathan Bossart <bossartn@amazon.com>
Discussion: https://postgr.es/m/867F2E29-2782-4869-970E-B984C6D35A8F@amazon.com
Backpatch: 10-
2022-01-21 11:24:12 -08:00
Michael Paquier 0ffe2975c3 Fix one-off bug causing missing commit timestamps for subtransactions
The logic in charge of writing commit timestamps (enabled with
track_commit_timestamp) for subtransactions had a one-bug bug,
where it would be possible that commit timestamps go missing for the
last subtransaction committed.

While on it, simplify a bit the iteration logic in the loop writing the
commit timestamps, as per suggestions from Kyotaro Horiguchi and Tom
Lane, so as some variable initializations are not part of the loop
itself.

Issue introduced in 73c986a.

Analyzed-by: Alex Kingsborough
Author: Alex Kingsborough, Kyotaro Horiguchi
Discussion: https://postgr.es/m/73A66172-4050-4F2A-B7F1-13508EDA2144@amazon.com
Backpatch-through: 10
2022-01-21 14:54:59 +09:00
Tom Lane 17019c00ff Tighten TAP tests' tracking of postmaster state some more.
Commits 6c4a8903b et al. had a couple of deficiencies:

* The logic I added to Cluster::start to see if a PID file is present
could be fooled by a stale PID file left over from a previous
postmaster.  To fix, if we're not sure whether we expect to find a
running postmaster or not, validate the PID using "kill 0".

* 017_shm.pl has a loop in which it just issues repeated Cluster::start
calls; this will fail if some invocation fails but leaves self->_pid
set.  Per buildfarm results, the above fix is not enough to make this
safe: we might have "validated" a PID for a postmaster that exits
immediately after we look.  Hence, match each failed start call with
a stop call that will get us back to the self->_pid == undef state.
Add a fail_ok option to Cluster::stop to make this work.

Discussion: https://postgr.es/m/CA+hUKGKV6fOHvfiPt8=dOKzvswjAyLoFoJF1iQXMNpi7+hD1JQ@mail.gmail.com
2022-01-20 17:28:07 -05:00
Andrew Dunstan 0a79feeca7
Allow clean.bat to be run from anywhere
This was omitted from c3879a7b4c which modified the other msvc .bat
files.

Per request from Juan José Santamaría Flecha

Discussion: https://postgr.es/m/CAC+AXB0_fxYGbQoaYjCA8um7TTbOVP4L9aXnVmHwK8WzaT4gdA@mail.gmail.com

Backpatch to all live branches.
2022-01-20 10:21:12 -05:00
Tom Lane 99aa0ff68a TAP tests: check for postmaster.pid anyway when "pg_ctl start" fails.
"pg_ctl start" might start a new postmaster and then return failure
anyway, for example if PGCTLTIMEOUT is exceeded.  If there is a
postmaster there, it's still incumbent on us to shut it down at
script end, so check for the PID file even though we are about
to fail.

This has been broken all along, so back-patch to all supported branches.

Discussion: https://postgr.es/m/647439.1642622744@sss.pgh.pa.us
2022-01-19 16:29:09 -05:00
Tom Lane 92e6c1c9be Avoid calling gettext() in signal handlers.
It seems highly unlikely that gettext() can be relied on to be
async-signal-safe.  psql used to understand that, but someone got
it wrong long ago in the src/bin/scripts/ version of handle_sigint,
and then the bad idea was perpetuated when those two versions were
unified into src/fe_utils/cancel.c.

I'm unsure why there have not been field complaints about this
... maybe gettext() is signal-safe once it's translated at least
one message?  But we have no business assuming any such thing.

In cancel.c (v13 and up), I preserved our ability to localize
"Cancel request sent" messages by invoking gettext() before
the signal handler is set up.  In earlier branches I just made
src/bin/scripts/ not localize those messages, as psql did then.

(Just for extra unsafety, the src/bin/scripts/ version was
invoking fprintf() from a signal handler.  Sigh.)

Noted while fixing signal-safety issues in PQcancel() itself.
Back-patch to all supported branches.

Discussion: https://postgr.es/m/2937814.1641960929@sss.pgh.pa.us
2022-01-17 13:30:04 -05:00
Tom Lane 8b107467c6 Avoid calling strerror[_r] in PQcancel().
PQcancel() is supposed to be safe to call from a signal handler,
and indeed psql uses it that way.  All of the library functions
it uses are specified to be async-signal-safe by POSIX ...
except for strerror.  Neither plain strerror nor strerror_r
are considered safe.  When this code was written, back in the
dark ages, we probably figured "oh, strerror will just index
into a constant array of strings" ... but in any locale except C,
that's unlikely to be true.  Probably the reason we've not heard
complaints is that (a) this error-handling code is unlikely to be
reached in normal use, and (b) in many scenarios, localized error
strings would already have been loaded, after which maybe it's
safe to call strerror here.  Still, this is clearly unacceptable.

The best we can do without relying on strerror is to print the
decimal value of errno, so make it do that instead.  (This is
probably not much loss of user-friendliness, given that it is
hard to get a failure here.)

Back-patch to all supported branches.

Discussion: https://postgr.es/m/2937814.1641960929@sss.pgh.pa.us
2022-01-17 12:52:44 -05:00
Tomas Vondra 491182e529 Build inherited extended stats on partitioned tables
Commit 859b3003de disabled building of extended stats for inheritance
trees, to prevent updating the same catalog row twice. While that
resolved the issue, it also means there are no extended stats for
declaratively partitioned tables, because there are no data in the
non-leaf relations.

That also means declaratively partitioned tables were not affected by
the issue 859b3003de addressed, which means this is a regression
affecting queries that calculate estimates for the whole inheritance
tree as a whole (which includes e.g. GROUP BY queries).

But because partitioned tables are empty, we can invert the condition
and build statistics only for the case with inheritance, without losing
anything. And we can consider them when calculating estimates.

It may be necessary to run ANALYZE on partitioned tables, to collect
proper statistics. For declarative partitioning there should no prior
statistics, and it might take time before autoanalyze is triggered. For
tables partitioned by inheritance the statistics may include data from
child relations (if built 859b3003de), contradicting the current code.

Report and patch by Justin Pryzby, minor fixes and cleanup by me.
Backpatch all the way back to PostgreSQL 10, where extended statistics
were introduced (same as 859b3003de).

Author: Justin Pryzby
Reported-by: Justin Pryzby
Backpatch-through: 10
Discussion: https://postgr.es/m/20210923212624.GI831%40telsasoft.com
2022-01-15 18:32:20 +01:00
Tomas Vondra b3cac25f4d Ignore extended statistics for inheritance trees
Since commit 859b3003de we only build extended statistics for individual
relations, ignoring the child relations. This resolved the issue with
updating catalog tuple twice, but we still tried to use the statistics
when calculating estimates for the whole inheritance tree. When the
relations contain very distinct data, it may produce bogus estimates.

This is roughly the same issue 427c6b5b9 addressed ~15 years ago, and we
fix it the same way - by ignoring extended statistics when calculating
estimates for the inheritance tree as a whole. We still consider
extended statistics when calculating estimates for individual child
relations, of course.

This may result in plan changes due to different estimates, but if the
old statistics were not describing the inheritance tree particularly
well it's quite likely the new plans is actually better.

Report and patch by Justin Pryzby, minor fixes and cleanup by me.
Backpatch all the way back to PostgreSQL 10, where extended statistics
were introduced (same as 859b3003de).

Author: Justin Pryzby
Reported-by: Justin Pryzby
Backpatch-through: 10
Discussion: https://postgr.es/m/20210923212624.GI831%40telsasoft.com
2022-01-15 02:40:40 +01:00
Tom Lane 3a1bfe2565 Fix ruleutils.c's dumping of whole-row Vars in more contexts.
Commit 7745bc352 intended to ensure that whole-row Vars would be
printed with "::type" decoration in all contexts where plain
"var.*" notation would result in star-expansion, notably in
ROW() and VALUES() constructs.  However, it missed the case of
INSERT with a single-row VALUES, as reported by Timur Khanjanov.

Nosing around ruleutils.c, I found a second oversight: the
code for RowCompareExpr generates ROW() notation without benefit
of an actual RowExpr, and naturally it wasn't in sync :-(.
(The code for FieldStore also does this, but we don't expect that
to generate strictly parsable SQL anyway, so I left it alone.)

Back-patch to all supported branches.

Discussion: https://postgr.es/m/efaba6f9-4190-56be-8ff2-7a1674f9194f@intrans.baku.az
2022-01-13 17:49:26 -05:00
Andrew Dunstan 03c545b66f
Avoid warning about uninitialized value in MSVC python3 tests
Juan José Santamaría Flecha

Backpatch to all live branches
2022-01-10 10:12:43 -05:00
Andrew Dunstan c7fa0f55de
Allow MSVC .bat wrappers to be called from anywhere
Instead of using a hardcoded or default path to the perl file the .bat
file is a wrapper for, we use a path that means the file is found in
the same directory as the .bat file.

Patch by Anton Voloshin, slightly tweaked by me.

Backpatch to all live branches

Discussion: https://postgr.es/m/2b7a674b-5fb0-d264-75ef-ecc7a31e54f8@postgrespro.ru
2022-01-07 16:14:32 -05:00
Tom Lane 2ce113a4f0 Prevent altering partitioned table's rowtype, if it's used elsewhere.
We disallow altering a column datatype within a regular table,
if the table's rowtype is used as a column type elsewhere,
because we lack code to go around and rewrite the other tables.
This restriction should apply to partitioned tables as well, but it
was not checked because ATRewriteTables and ATPrepAlterColumnType
were not on the same page about who should do it for which relkinds.

Per bug #17351 from Alexander Lakhin.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/17351-6db1870f3f4f612a@postgresql.org
2022-01-06 16:46:46 -05:00
Alvaro Herrera b63851a456
Fix silly mistake in Assert 2022-01-04 13:21:23 -03:00
Alvaro Herrera 28cd57416e
Allow special SKIP LOCKED condition in Assert()
Under concurrency, it is possible for two sessions to be merrily locking
and releasing a tuple and marking it again as HEAP_XMAX_INVALID all the
while a third session attempts to lock it, miserably fails at it, and
then contemplates life, the universe and everything only to eventually
fail an assertion that said bit is not set.  Before SKIP LOCKED that was
indeed a reasonable expectation, but alas! commit df630b0dd5 falsified
it.

This bug is as old as time itself, and even older, if you think time
begins with the oldest supported branch.  Therefore, backpatch to all
supported branches.

Author: Simon Riggs <simon.riggs@enterprisedb.com>
Discussion: https://postgr.es/m/CANbhV-FeEwMnN8yuMyss7if1ZKjOKfjcgqB26n8pqu1e=q0ebg@mail.gmail.com
2022-01-04 13:01:05 -03:00
Tom Lane ec36745217 Fix index-only scan plans, take 2.
Commit 4ace45677 failed to fix the problem fully, because the
same issue of attempting to fetch a non-returnable index column
can occur when rechecking the indexqual after using a lossy index
operator.  Moreover, it broke EXPLAIN for such indexquals (which
indicates a gap in our test cases :-().

Revert the code changes of 4ace45677 in favor of adding a new field
to struct IndexOnlyScan, containing a version of the indexqual that
can be executed against the index-returned tuple without using any
non-returnable columns.  (The restrictions imposed by check_index_only
guarantee this is possible, although we may have to recompute indexed
expressions.)  Support construction of that during setrefs.c
processing by marking IndexOnlyScan.indextlist entries as resjunk
if they can't be returned, rather than removing them entirely.
(We could alternatively require setrefs.c to look up the IndexOptInfo
again, but abusing resjunk this way seems like a reasonably safe way
to avoid needing to do that.)

This solution isn't great from an API-stability standpoint: if there
are any extensions out there that build IndexOnlyScan structs directly,
they'll be broken in the next minor releases.  However, only a very
invasive extension would be likely to do such a thing.  There's no
change in the Path representation, so typical planner extensions
shouldn't have a problem.

As before, back-patch to all supported branches.

Discussion: https://postgr.es/m/3179992.1641150853@sss.pgh.pa.us
Discussion: https://postgr.es/m/17350-b5bdcf476e5badbb@postgresql.org
2022-01-03 15:42:27 -05:00