BuildDescForRelation() main job is to convert ColumnDef lists to
pg_attribute/tuple descriptor arrays, which is really mostly an
internal subroutine of DefineRelation() and some related functions,
which is more the remit of tablecmds.c and doesn't have much to do
with the basic tuple descriptor interfaces in tupdesc.c. This is also
supported by observing the header includes we can remove in tupdesc.c.
By moving it over, we can also (in the future) make
BuildDescForRelation() use more internals of tablecmds.c that are not
sensible to be exposed in tupdesc.c.
Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org
Keep track of the used size of the array. That avoids looping through
the whole array in a few places. It doesn't matter from a performance
point of view since the array is small anyway, but this feels less
surprising and is a little less code. Now that we have an explicit
NumListenSockets variable that is statically initialized to 0, we
don't need the loop to initialize the array.
Allocate the array in PostmasterContext. The array isn't needed in
child processes, so this allows reusing that memory. We could easily
make the array resizable now, but we haven't heard any complaints
about the current 64 sockets limit.
Discussion: https://www.postgresql.org/message-id/7bb7ad65-a018-2419-742f-fa5fd877d338@iki.fi
Previously, the JSON code didn't have to worry too much about freeing
JsonLexContext, because it was never too long-lived. With new features
being added for SQL/JSON this is no longer the case. Add a routine
that knows how to free this struct and apply that to a few places, to
prevent this from becoming problematic.
At the same time, we change the API of makeJsonLexContextCstringLen to
make it receive a pointer to JsonLexContext for callers that want it to
be stack-allocated; it can also be passed as NULL to get the original
behavior of a palloc'ed one.
This also causes an ABI break due to the addition of flags to
JsonLexContext, so we can't easily backpatch it. AFAICS that's not much
of a problem; apparently some leaks might exist in JSON usage of
text-search, for example via json_to_tsvector, but I haven't seen any
complaints about that.
Per Coverity complaint about datum_to_jsonb_internal().
Discussion: https://postgr.es/m/20230808174110.oq3iymllsv6amkih@alvherre.pgsql
6b94e7a6d did this for ordered append paths to allow fast startup
MergeAppends, however, nothing was done for the Append case.
Here we adjust add_paths_to_append_rel() to have it build an AppendPath
containing the cheapest startup paths from each of the child relations
when the append rel has "consider_startup" set.
Author: Andy Fan, David Rowley
Discussion: https://www.postgresql.org/message-id/CAKU4AWrXSkUV=Pt-gRxQT7EbfUeNssprGyNsB=5mJibFZ6S3ww@mail.gmail.com
Ensure we switch to the per-tuple memory context to prevent any memory
leaks of detoasted Datums in MemoizeHash_hash() and MemoizeHash_equal().
Reported-by: Orlov Aleksej
Author: Orlov Aleksej, David Rowley
Discussion: https://postgr.es/m/83281eed63c74e4f940317186372abfd%40cft.ru
Backpatch-through: 14, where Memoize was added
It is unnecessary to include this field in IndexInfo. It is only used
by DDL code, not during execution. It is really only used to pass
local information around between functions in index.c and indexcmds.c,
for which it is clearer to use local variables, like in similar cases.
Discussion: https://www.postgresql.org/message-id/flat/f84640e3-00d3-5abd-3f41-e6a19d33c40b@eisentraut.org
In hopes of putting these where any would-be implementer is sure to
find them, make a placeholder grammar production for ALTER DROP VALUE
and put them there. This is really just a docs patch, though.
Vik Fearing, with a bit more wordsmithing by me
Discussion: https://postgr.es/m/9fffd149-da0f-0c9c-6745-731fb688642a@postgresfriends.org
This further reduces the length and complexity of sendFile(),
hopefully make it easier to understand and modify. In addition
to moving some logic into a new function, I took this opportunity
to make a few slight adjustments to sendFile() itself, including
renaming the 'len' variable to 'bytes_done', since we use it to represent
the number of bytes we've already handled so far, not the total
length of the file.
Patch by me, reviewed by David Steele.
Discussion: http://postgr.es/m/CA+TgmoYt5jXH4U6cu1dm9Oe2FTn1aae6hBNhZzJJjyjbE_zYig@mail.gmail.com
If checksum verification fails for a particular page, we reread the
page and try one more time. The code that does this somewhat complex
and difficult to follow. Move some of the logic into a new function
and rearrange the code a bit to try to make it clearer. This way,
we don't need the block_retry Boolean, a couple of other variables
move from sendFile() into the new function, and some code is now less
deeply indented.
Patch by me, reviewed by David Steele.
Discussion: http://postgr.es/m/CA+TgmoYt5jXH4U6cu1dm9Oe2FTn1aae6hBNhZzJJjyjbE_zYig@mail.gmail.com
The code in charge of copying the contents of PgBackendStatus to local
memory could fail on memory allocation because of an overflow on the
amount of memory to use. The overflow can happen when combining a high
value track_activity_query_size (max at 1MB) with a large
max_connections, when both multiplied get higher than INT32_MAX as both
parameters treated as signed integers. This could for example trigger
with the following functions, all calling pgstat_read_current_status():
- pg_stat_get_backend_subxact()
- pg_stat_get_backend_idset()
- pg_stat_get_progress_info()
- pg_stat_get_activity()
- pg_stat_get_db_numbackends()
The change to use MemoryContextAllocHuge() has been introduced in
8d0ddccec6, so backpatch down to 12.
Author: Jakub Wartak
Discussion: https://postgr.es/m/CAKZiRmw8QSNVw2qNK-dznsatQqz+9DkCquxP0GHbbv1jMkGHMA@mail.gmail.com
Backpatch-through: 12
Make a few newish calls to appendStringInfo() which have no special
formatting use appendStringInfoString() instead. Also, adjust usages of
appendStringInfoString() which only append a string containing a single
character to make use of appendStringInfoChar() instead.
This makes the code marginally faster, but primarily this change is so
we use the StringInfo type as it was intended to be used.
Discussion: https://postgr.es/m/CAApHDvpXKQmL+r=VDNS98upqhr9yGBhv2Jw3GBFFk_wKHcB39A@mail.gmail.com
This commit changes the WAL reader routines so as a FATAL for the
backend or exit(FAILURE) for the frontend is triggered if an allocation
for a WAL record decode fails in walreader.c, rather than treating this
case as bogus data, which would be equivalent to the end of WAL. The
key is to avoid palloc_extended(MCXT_ALLOC_NO_OOM) in walreader.c,
relying on plain palloc() calls.
The previous behavior could make WAL replay finish too early than it
should. For example, crash recovery finishing earlier may corrupt
clusters because not all the WAL available locally was replayed to
ensure a consistent state. Out-of-memory failures would show up
randomly depending on the memory pressure on the host, but one simple
case would be to generate a large record, then replay this record after
downsizing a host, as Ethan Mertz originally reported.
This relies on bae868caf2, as the WAL reader routines now do the
memory allocation required for a record only once its header has been
fully read and validated, making xl_tot_len trustable. Making the WAL
reader react differently on out-of-memory or bogus record data would
require ABI changes, so this is the safest choice for stable branches.
Also, it is worth noting that 3f1ce97346 has been using a plain
palloc() in this code for some time now.
Thanks to Noah Misch and Thomas Munro for the discussion.
Like the other commit, backpatch down to 12, leaving out v11 that will
be EOL'd soon. The behavior of considering a failed allocation as bogus
data comes originally from 0ffe11abd3, where the record length
retrieved from its header was not entirely trustable.
Reported-by: Ethan Mertz
Discussion: https://postgr.es/m/ZRKKdI5-RRlta3aF@paquier.xyz
Backpatch-through: 12
The retry loop is needed because heap_page_prune() calls
HeapTupleSatisfiesVacuum() and then lazy_scan_prune() does the same
thing again, and they might get different answers due to concurrent
clog updates. But this patch makes heap_page_prune() return the
HeapTupleSatisfiesVacuum() results that it computed back to the
caller, which allows lazy_scan_prune() to avoid needing to recompute
those values in the first place. That's nice both because it eliminates
the need for a retry loop and also because it's cheaper.
Melanie Plageman, reviewed by David Geier, Andres Freund, and me.
Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA%40mail.gmail.com
This adjusts the expression evaluation code for CoerceViaIO and
CoerceToDomain to handle errors softly if needed.
For CoerceViaIo, this means using InputFunctionCallSafe(), which
provides the option to handle errors softly, instead of calling the
type input function directly.
For CoerceToDomain, this simply entails replacing the ereport() in
ExecEvalConstraintCheck() by errsave().
In both cases, the ErrorSaveContext to be used when evaluating the
expression is stored by ExecInitExprRec() in the expression's struct
in the expression's ExprEvalStep. The ErrorSaveContext is passed by
setting ExprState.escontext to point to it when calling
ExecInitExprRec() on the expression whose errors are to be handled
softly.
Note that no call site of ExecInitExprRec() has been changed in this
commit, so there's no functional change. This is intended for
implementing new SQL/JSON expression nodes in future commits that
will use to it suppress errors that may occur during type coercions.
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
After receiving position data for a lexeme, tsvectorrecv()
advanced its "datalen" value by (npos+1)*sizeof(WordEntry)
where the correct calculation is (npos+1)*sizeof(WordEntryPos).
This accidentally failed to render the constructed tsvector
invalid, but it did result in leaving some wasted space
approximately equal to the space consumed by the position data.
That could have several bad effects:
* Disk space is wasted if the received tsvector is stored into a
table as-is.
* A legal tsvector could get rejected with "maximum total lexeme
length exceeded" if the extra space pushes it over the MAXSTRPOS
limit.
* In edge cases, the finished tsvector could be assigned a length
larger than the allocated size of its palloc chunk, conceivably
leading to SIGSEGV when the tsvector gets copied somewhere else.
The odds of a field failure of this sort seem low, though valgrind
testing could probably have found this.
While we're here, let's express the calculation as
"sizeof(uint16) + npos * sizeof(WordEntryPos)" to avoid the type
pun implicit in the "npos + 1" formulation. It's not wrong
given that WordEntryPos had better be 2 bytes to avoid padding
problems, but it seems clearer this way.
Report and patch by Denis Erokhin. Back-patch to all supported
versions.
Discussion: https://postgr.es/m/009801d9f2d9$f29730c0$d7c59240$@datagile.ru
In recent releases, such cases fail with "cache lookup failed for
function 0" rather than complaining that the conversion function
doesn't exist as prior versions did. Seems to be a consequence of
sloppy refactoring in commit f82de5c46. Add the missing error check.
Per report from Pierre Fortin. Back-patch to v14 where the
oversight crept in.
Discussion: https://postgr.es/m/20230929163739.3bea46e5.pfortin@pfortin.com
Commit 9f8377f7a2 was a little too eager in fetching default values.
Normally this would not matter, but if the default value is not valid
for the type (e.g. a varchar that's too long) it caused an unnecessary
error.
Complaint and fix from Laurenz Albe
Backpatch to release 16.
Discussion: https://postgr.es/m/75a7b7483aeb331aa017328d606d568fc715b90d.camel@cybertec.at
These options already exist, but you need to specify a column list for
them, which can be cumbersome. We already have the possibility of all
columns for FORCE QUOTE, so this is simply extending that facility to
FORCE_NULL and FORCE_NOT_NULL.
Author: Zhang Mingli
Reviewed-By: Richard Guo, Kyatoro Horiguchi, Michael Paquier.
Discussion: https://postgr.es/m/CACJufxEnVqzOFtqhexF2+AwOKFrV8zHOY3y=p+gPK6eB14pn_w@mail.gmail.com
ANALYZE on a table with inheritance children analyzes all the child
tables in a loop. When stepping to next child table, it updated the
child rel ID value in the command progress stats, but did not reset
the 'sample_blks_total' and 'sample_blks_scanned' counters.
acquire_sample_rows() updates 'sample_blks_total' as soon as the scan
starts and 'sample_blks_scanned' after processing the first block, but
until then, pg_stat_progress_analyze would display a bogus combination
of the new child table relid with old counter values from the
previously processed child table. Fix by resetting 'sample_blks_total'
and 'sample_blks_scanned' to zero at the same time that
'current_child_table_relid' is updated.
Backpatch to v13, where pg_stat_progress_analyze view was introduced.
Reported-by: Justin Pryzby
Discussion: https://www.postgresql.org/message-id/20230122162345.GP13860%40telsasoft.com
Under some circumstances, concurrent MERGE operations could lead to
inconsistent results, that varied according the plan chosen. This was
caused by a lack of rowmarks on the source relation, which meant that
EvalPlanQual rechecking was not guaranteed to return the same source
tuples when re-running the join query.
Fix by ensuring that preprocess_rowmarks() sets up PlanRowMarks for
all non-target relations used in MERGE, in the same way that it does
for UPDATE and DELETE.
Per bug #18103. Back-patch to v15, where MERGE was introduced.
Dean Rasheed, reviewed by Richard Guo.
Discussion: https://postgr.es/m/18103-c4386baab8e355e3%40postgresql.org
Improve find_base_rel() and find_base_rel_ignore_join() so that they
raise an ERROR if they ever receive a negative relid value in
non-cassert builds. If either of these functions had ever received a
negative relid then they'd have attempted to access memory that does not
belong to simple_rel_array.
Because no evidence has been presented of actual cases where bugs have
caused this to happen, here we take a lightweight approach to checking
for negative values and simply cast both values to uint32 before
performing the comparison. This will cause any negative relids to be
seen as greater than simple_rel_array_size which will ERROR rather than
attempt to access a negative simple_rel_array element. Obviously, the
run-time error is better than a crash, so it makes sense to protect
against this, especially when it can be done without adding any
additional run-time overhead.
There is a slight change here if the functions are ever called with a
relid of 0. This will pass the bounds check, but that array entry
should be NULL (along with the corresponding simple_rte_array entry), so
won't pass the "if (rel)" condition and still fall through and raise an
ERROR.
Author: Ranier Vilela
Reviewed-by: Ashutosh Bapat, David Rowley
Discussion: https://postgr.es/m/CAEudQArQSghBu2gLojg4o_tnHj_x2HcS%3D%2BwewL3NJS8z0VnK%2Bg%40mail.gmail.com
nbtree's mark/restore processing failed to correctly handle an edge case
involving array key advancement and related search-type scan key state.
Scans with ScalarArrayScalarArrayOpExpr quals requiring mark/restore
processing (for a merge join) could incorrectly conclude that an
affected array/scan key must not have advanced during the time between
marking and restoring the scan's position.
As a result of all this, array key handling within btrestrpos could skip
a required call to _bt_preprocess_keys(). This confusion allowed later
primitive index scans to overlook tuples matching the true current array
keys. The scan's search-type scan keys would still have spurious values
corresponding to the final array element(s) -- not values matching the
first/now-current array element(s).
To fix, remember that "array key wraparound" has taken place during the
ongoing btrescan in a flag variable stored in the scan's state, and use
that information at the point where btrestrpos decides if another call
to _bt_preprocess_keys is required.
Oversight in commit 70bc5833, which taught nbtree to handle array keys
during mark/restore processing, but missed this subtlety. That commit
was itself a bug fix for an issue in commit 9e8da0f7, which taught
nbtree to handle ScalarArrayOpExpr quals natively.
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkgP3DDRJxw6DgjCxo-cu-DKrvjEv_ArkP2ctBJatDCYg@mail.gmail.com
Backpatch: 11- (all supported branches).
This code was sloppy about comparison of index columns that
are expressions. It didn't reliably reject cases where one
index has an expression where the other has a plain column,
and it could index off the start of the attmap array, leading
to a Valgrind complaint (though an actual crash seems unlikely).
I'm not sure that the expression-vs-column sloppiness leads
to any visible problem in practice, because the subsequent
comparison of the two expression lists would reject cases
where the indexes have different numbers of expressions
overall. Maybe we could falsely match indexes having the
same expressions in different column positions, but it'd
require unlucky contents of the word before the attmap array.
It's not too surprising that no problem has been reported
from the field. Nonetheless, this code is clearly wrong.
Per bug #18135 from Alexander Lakhin. Back-patch to all
supported branches.
Discussion: https://postgr.es/m/18135-532f4a755e71e4d2@postgresql.org
Previously, one of the values in the struct was returned as the return
value, and another was returned via an output parameter. In
preparation for returning more stuff, consolidate both values into a
struct returned via an output parameter.
Melanie Plageman, reviewed by Andres Freund and by me.
Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA%40mail.gmail.com
Tid Range scans were added back in bb437f995. That commit forgot to add
handling for TidRangePaths in print_path().
Only people building with OPTIMIZER_DEBUG might have noticed this, which
likely is the reason it's taken 4 years for anyone to notice.
Author: Andrey Lepikhov
Reported-by: Andrey Lepikhov
Discussion: https://postgr.es/m/379082d6-1b6a-4cd6-9ecf-7157d8c08635@postgrespro.ru
Backpatch-through: 14, where bb437f995 was introduced
This commit removes unnecessary ExecExprFreeContext() calls in
ExecEnd* routines because the actual cleanup is managed by
FreeExecutorState(). With no callers remaining for
ExecExprFreeContext(), this commit also removes the function.
This commit also drops redundant ExecClearTuple() calls, because
ExecResetTupleTable() in ExecEndPlan() already takes care of
resetting and dropping all TupleTableSlots initialized with
ExecInitScanTupleSlot() and ExecInitExtraTupleSlot().
After these modifications, the ExecEnd*() routines for ValuesScan,
NamedTuplestoreScan, and WorkTableScan became redundant. So, this
commit removes them.
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
"in_streaming" is a flag used to track if an instance of pgoutput is
streaming changes. When pgoutput is started, the flag was always reset,
switched it back and forth in the stream start/stop callbacks.
Before this commit, it was a global variable, which is confusing as it
is actually attached to a state of PGOutputData. Per my analysis, using
a global variable did not lead to an active bug like in 54ccfd6586,
but it makes the code more consistent. Note that we cannot backpatch
this change anyway as it requires the addition of a new field to
PGOutputData, exposed in pgoutput.h.
Author: Hou Zhijie
Reviewed-by: Amit Kapila, Michael Paquier, Peter Smith
Discussion: https://postgr.es/m/OS0PR01MB571690EF24F51F51EFFCBB0E94FAA@OS0PR01MB5716.jpnprd01.prod.outlook.com
This unifies some repetitive code.
Note: I didn't push the "not found" error message into the new
function, even though all existing callers would be able to make use
of it. Using the existing error handling as-is would probably require
exposing the Relation type via tupdesc.h, which doesn't seem
desirable. (Or even if we changed it to just report the OID, it would
inject the concept of a relation containing the tuple descriptor into
tupdesc.h, which might be a layering violation. Perhaps some further
improvements could be considered here separately.)
Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da%40eisentraut.org
When performing inlining LLVM unfortunately "leaks" types (the
types survive and are usable, but a new round of inlining will
recreate new structurally equivalent types). This accumulation
will over time amount to a memory leak which for some queries
can be large enough to trigger the OOM process killer.
To avoid accumulation of types, all IR related data is stored
in an LLVMContextRef which is dropped and recreated in order
to release all types. Dropping and recreating incurs overhead,
so it will be done only after 100 queries. This is a heuristic
which might be revisited, but until we can get the size of the
context from LLVM we are flying a bit blind.
This issue has been reported several times, there may be more
references to it in the archives on top of the threads linked
below.
Backpatching of this fix will be handled once it has matured
in master for a bit.
Reported-By: Justin Pryzby <pryzby@telsasoft.com>
Reported-By: Kurt Roeckx <kurt@roeckx.be>
Reported-By: Jaime Casanova <jcasanov@systemguards.com.ec>
Reported-By: Lauri Laanmets <pcspets@gmail.com>
Author: Andres Freund and Daniel Gustafsson
Discussion: https://postgr.es/m/7acc8678-df5f-4923-9cf6-e843131ae89d@www.fastmail.com
Discussion: https://postgr.es/m/20201218235607.GC30237@telsasoft.com
Discussion: https://postgr.es/m/CAPH-tTxLf44s3CvUUtQpkDr1D8Hxqc2NGDzGXS1ODsfiJ6WSqA@mail.gmail.com
Commit b059d2f456 introduced llvm_types_module and accidentally
exported it. As there is no usecase for accessing this variable
externally, this makes it static.
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/20221101055132.pjjsvlkeo4stbjkq@awork3.anarazel.de
The pgoutput module uses a global variable (publish_no_origin) to cache
the action for the origin filter, but we didn't reset the flag when
shutting down the output plugin, so subsequent retries may access the
previous publish_no_origin value.
We fix this by storing the flag in the output plugin's private data.
Additionally, the patch removes the currently unused origin string from the
structure.
For the back branch, to avoid changing the exposed structure, we eliminated the
global variable and instead directly used the origin string for change
filtering.
Author: Hou Zhijie
Reviewed-by: Amit Kapila, Michael Paquier
Backpatch-through: 16
Discussion: http://postgr.es/m/OS0PR01MB571690EF24F51F51EFFCBB0E94FAA@OS0PR01MB5716.jpnprd01.prod.outlook.com
Yet another bug in the ilk of commits a7ee7c851 and 741b88435. In
741b88435, we took care to clear the memorized location of the
downlink when we split the parent page, because splitting the parent
page can move the downlink. But we missed that even *updating* a tuple
on the parent can move it, because updating a tuple on a gist page is
implemented as a delete+insert, so the updated tuple gets moved to the
end of the page.
This commit fixes the bug in two different ways (belt and suspenders):
1. Clear the downlink when we update a tuple on the parent page, even
if it's not split. This the same approach as in commits a7ee7c851
and 741b88435.
I also noticed that gistFindCorrectParent did not clear the
'downlinkoffnum' when it stepped to the right sibling. Fix that
too, as it seems like a clear bug even though I haven't been able
to find a test case to hit that.
2. Change gistFindCorrectParent so that it treats 'downlinkoffnum'
merely as a hint. It now always first checks if the downlink is
still at that location, and if not, it scans the page like before.
That's more robust if there are still more cases where we fail to
clear 'downlinkoffnum' that we haven't yet uncovered. With this,
it's no longer necessary to meticulously clear 'downlinkoffnum',
so this makes the previous fixes unnecessary, but I didn't revert
them because it still seems nice to clear it when we know that the
downlink has moved.
Also add the test case using the same test data that Alexander
posted. I tried to reduce it to a smaller test, and I also tried to
reproduce this with different test data, but I was not able to, so
let's just include what we have.
Backpatch to v12, like the previous fixes.
Reported-by: Alexander Lakhin
Discussion: https://www.postgresql.org/message-id/18129-caca016eaf0c3702@postgresql.org
There was a mismatch between the const qualifiers for
excludeDirContents in src/backend/backup/basebackup.c and
src/bin/pg_rewind/filemap.c, which led to a quick search for similar
cases. We should make excludeDirContents match, but the rest of the
changes seem like a good idea as well.
Author: David Steele <david@pgmasters.net>
Discussion: https://www.postgresql.org/message-id/flat/669a035c-d23d-2f38-7ff0-0cb93e01d610@pgmasters.net
As implemented in 5891c7a8ed, setting "force" to true in
pgstat_report_wal() causes the routine to not wait for the pgstat
shmem lock if it cannot be acquired, in which case the WAL and I/O
statistics finish by not being flushed. The origin of the confusion
comes from pgstat_flush_wal() and pgstat_flush_io(), that use "nowait"
as sole argument. The I/O stats are new in v16.
This is the opposite behavior of what has been used in
pgstat_report_stat(), where "force" is the opposite of "nowait". In
this case, when "force" is true, the routine sets "nowait" to false,
which would cause the routine to wait for the pgstat shmem lock,
ensuring that the stats are always flushed. When "force" is false,
"nowait" is set to true, and the stats would only not be flushed if the
pgstat shmem lock can be acquired, returning immediately without
flushing the stats if the lock cannot be acquired.
This commit changes pgstat_report_wal() so as "force" has the same
behavior as in pgstat_report_stat(). There are currently three callers
of pgstat_report_wal():
- Two in the checkpointer where force=true during a shutdown and the
main checkpointer loop. Now the code behaves so as the stats are always
flushed.
- One in the main loop of the bgwriter, where force=false. Now the code
behaves so as the stats would not be flushed if the pgstat shmem lock
could not be acquired.
Before this commit, some stats on WAL and I/O could have been lost after
a shutdown, for example.
Reported-by: Ryoga Yoshida
Author: Ryoga Yoshida, Michael Paquier
Discussion: https://postgr.es/m/f87a4d7be70530606b864fd1df91718c@oss.nttdata.com
Backpatch-through: 15
bae868ca removed a check that was still needed. If you had an
xl_tot_len at the end of a page that was too small for a record header,
but not big enough to span onto the next page, we'd immediately perform
the CRC check using a bogus large length. Because of arbitrary coding
differences between the CRC implementations on different platforms,
nothing very bad happened on common modern systems. On systems using
the _sb8.c fallback we could segfault.
Restore that check, add a new assertion and supply a test for that case.
Back-patch to 12, like bae868ca.
Tested-by: Tom Lane <tgl@sss.pgh.pa.us>
Tested-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGLCkTT7zYjzOxuLGahBdQ%3DMcF%3Dz5ZvrjSOnW4EDhVjT-g%40mail.gmail.com
Thanks to commit 2a8b40e368, the logical replication worker type is
easily determined. The worker type could already be deduced via
other columns such as leader_pid and relid, but that is unnecessary
complexity for users.
Bumps catversion.
Author: Peter Smith
Reviewed-by: Michael Paquier, Maxim Orlov, Amit Kapila
Discussion: https://postgr.es/m/CAHut%2BPtmbSMfErSk0S7xxVdZJ9XVE3xVLhqBTmT91kf57BeKDQ%40mail.gmail.com
Parse analysis of a CallStmt will inject mutable information,
for instance the OID of the called procedure, so that subsequent
DDL may create a need to re-parse the CALL. We failed to detect
this for CALLs in plpgsql routines, because no dependency information
was collected when putting a CallStmt into the plan cache. That
could lead to misbehavior or strange errors such as "cache lookup
failed".
Before commit ee895a655, the issue would only manifest for CALLs
appearing in atomic contexts, because we re-planned non-atomic
CALLs every time through anyway.
It is now apparent that extract_query_dependencies() probably
needs a special case for every utility statement type for which
stmt_requires_parse_analysis() returns true. I wanted to add
something like Assert(!stmt_requires_parse_analysis(...)) when
falling out of extract_query_dependencies_walker without doing
anything, but there are API issues as well as a more fundamental
point: stmt_requires_parse_analysis is supposed to be applied to
raw parser output, so it'd be cheating to assume it will give the
correct answer for post-parse-analysis trees. I contented myself
with adding a comment.
Per bug #18131 from Christian Stork. Back-patch to all supported
branches.
Discussion: https://postgr.es/m/18131-576854e79c5cd264@postgresql.org
The initial estimate of the number of distinct ParsedWords is just
that: an estimate. Don't let it exceed what palloc is willing to
allocate. If in fact we need more entries, we'll eventually fail
trying to enlarge the array. But if we don't, this allows success on
inputs that currently draw "invalid memory alloc request size".
Per bug #18080 from Uwe Binder. Back-patch to all supported branches.
Discussion: https://postgr.es/m/18080-d5c5e58fef8c99b7@postgresql.org
In order to troubleshoot misbehaving or buggy event triggers, the
documented advice is to enter single-user mode. In an attempt to
reduce the number of situations where single-user mode is required
(or even recommended) for non-extraordinary maintenance, this GUC
allows to temporarily suspend event triggers.
This was originally extracted from a larger patchset which aimed
at supporting event triggers on login events.
Reviewed-by: Ted Yu <yuzhihong@gmail.com>
Reviewed-by: Mikhail Gribkov <youzhick@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/9140106E-F9BF-4D85-8FC8-F2D3C094A6D9@yesql.se
Discussion: https://postgr.es/m/0d46d29f-4558-3af9-9c85-7774e14a7709@postgrespro.ru
xl_tot_len comes first in a WAL record. Usually we don't trust it to be
the true length until we've validated the record header. If the record
header was split across two pages, previously we wouldn't do the
validation until after we'd already tried to allocate enough memory to
hold the record, which was bad because it might actually be garbage
bytes from a recycled WAL file, so we could try to allocate a lot of
memory. Release 15 made it worse.
Since 70b4f82a4b, we'd at least generate an end-of-WAL condition if the
garbage 4 byte value happened to be > 1GB, but we'd still try to
allocate up to 1GB of memory bogusly otherwise. That was an
improvement, but unfortunately release 15 tries to allocate another
object before that, so you could get a FATAL error and recovery could
fail.
We can fix both variants of the problem more fundamentally using
pre-existing page-level validation, if we just re-order some logic.
The new order of operations in the split-header case defers all memory
allocation based on xl_tot_len until we've read the following page. At
that point we know that its first few bytes are not recycled data, by
checking its xlp_pageaddr, and that its xlp_rem_len agrees with
xl_tot_len on the preceding page. That is strong evidence that
xl_tot_len was truly the start of a record that was logged.
This problem was most likely to occur on a standby, because
walreceiver.c recycles WAL files without zeroing out trailing regions of
each page. We could fix that too, but it wouldn't protect us from rare
crash scenarios where the trailing zeroes don't make it to disk.
With reliable xl_tot_len validation in place, the ancient policy of
considering malloc failure to indicate corruption at end-of-WAL seems
quite surprising, but changing that is left for later work.
Also included is a new TAP test to exercise various cases of end-of-WAL
detection by writing contrived data into the WAL from Perl.
Back-patch to 12. We decided not to put this change into the final
release of 11.
Author: Thomas Munro <thomas.munro@gmail.com>
Author: Michael Paquier <michael@paquier.xyz>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com> (the idea, not the code)
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Sergei Kornilov <sk@zsrv.org>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/17928-aa92416a70ff44a2%40postgresql.org
Guard against the pointer being NULL before pfreeing upon an error
returned from OpenSSL. Also handle errors from X509_NAME_print_ex
which can return -1 on memory allocation errors.
Backpatch down to v15 where the code was added.
Author: Sergey Shinderuk <s.shinderuk@postgrespro.ru>
Discussion: https://postgr.es/m/8db5374d-32e0-6abb-d402-40762511eff2@postgrespro.ru
Backpatch-through: v15
The computation of the column
information_schema.check_constraints.check_clause used
pg_get_constraintdef() plus some string manipulation to get the check
clause back out. This ended up with an extra pair of parentheses,
which is only an aesthetic problem, but also with suffixes like "NOT
VALID", which don't belong into that column. We can fix both of these
problems and simplify the code by just using pg_get_expr() instead.
Discussion: https://www.postgresql.org/message-id/799b59ef-3330-f0d2-ee23-8cdfa1740987@eisentraut.org
In older branches, COMMIT/ROLLBACK AND CHAIN failed to propagate
the current transaction's properties to the new transaction if
there was any open subtransaction (unreleased savepoint).
Instead, some previous transaction's properties would be restored.
This is because the "if (s->chain)" check in CommitTransactionCommand
examined the wrong instance of the "chain" flag and falsely
concluded that it didn't need to save transaction properties.
Our regression tests would have noticed this, except they used
identical transaction properties for multiple tests in a row,
so that the faulty behavior was not distinguishable from correct
behavior.
Commit 12d768e70 fixed the problem in v15 and later, but only rather
accidentally, because I removed the "if (s->chain)" test to avoid a
compiler warning, while not realizing that the warning was flagging a
real bug.
In v14 and before, remove the if-test and save transaction properties
unconditionally; just as in the newer branches, that's not expensive
enough to justify thinking harder.
Add the comment and extra regression test to v15 and later to
forestall any future recurrence, but there's no live bug in those
branches.
Patch by me, per bug #18118 from Liu Xiang. Back-patch to v12 where
the AND CHAIN feature was added.
Discussion: https://postgr.es/m/18118-4b72fcbb903aace6@postgresql.org
The comment introduced by commit e7cb7ee14 was a bit too terse, which
could lead to extensions doing different things within the hook function
than we intend to allow. Extend the comment to explain what they can do
within the hook function.
Back-patch to all supported branches.
In passing, I rephrased a nearby comment that I recently added to the
back branches.
Reviewed by David Rowley and Andrei Lepikhov.
Discussion: https://postgr.es/m/CAPmGK15SBPA1nr3Aqsdm%2BYyS-ay0Ayo2BRYQ8_A2To9eLqwopQ%40mail.gmail.com
When a snapshot file fails to be read in ImportSnapshot(), it would
issue an ERROR as "invalid snapshot identifier" when opening a stream
for it in read-only mode. The error handling is improved to be more
talkative in failure cases:
- If a snapshot identifier uses incorrect characters, complain with the
same error as before this commit.
- If the snapshot file cannot be found in pg_snapshots/, complain with a
"snapshot \"foo\" does not exist" instead. This maps to the case where
AllocateFile() fails on ENOENT. Based on a suggestion from Andres
Freund.
- If AllocateFile() throws something else than ENOENT as errno, report
it with more details in %m instead, as these failures are never
expected.
b29504eeb489 was the first improvement take. The older error message
exists since bb446b689b that introduced snapshot imports. Two test
cases are added to cover the cases of an identifier with an incorrect
format and of a missing snapshot.
Author: Bharath Rupireddy
Reviewed-by: Andres Freund, Daniel Gustafsson, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACWmr=3KdxDkm8h7Zn1XxBoF6hdzq8WQyMn2y1OL5RYFrg@mail.gmail.com
There are a couple of places in frontend code that could make use
of this simple binary heap implementation. This commit makes
binaryheap usable in frontend code, much like commit 26aaf97b68 did
for StringInfo. Like StringInfo, the header file is left in lib/
to reduce the likelihood of unnecessary breakage.
The frontend version of binaryheap exposes a void *-based API since
frontend code does not have access to the Datum definitions. This
seemed like a better approach than switching all existing uses to
void * or making the Datum definitions available to frontend code.
Reviewed-by: Tom Lane, Alvaro Herrera
Discussion: https://postgr.es/m/3612876.1689443232%40sss.pgh.pa.us
cursor_to_xmlschema() assumed that any Portal must have a tupDesc,
which is not so. Add a defensive check.
It's plausible that this mistake occurred because of the rather
poorly chosen name of the lookup function SPI_cursor_find(),
which in such cases is returning something that isn't very much
like a cursor. Add some documentation to try to forestall future
errors of the same ilk.
Report and patch by Boyu Yang (docs changes by me). Back-patch
to all supported branches.
Discussion: https://postgr.es/m/dd343010-c637-434c-a8cb-418f53bda3b8.yangboyu.yby@alibaba-inc.com
expandRecordVariable() failed to adjust the parse nesting structure
correctly when recursing to inspect an outer-level Var. This could
result in assertion failures or core dumps in corner cases.
Likewise, get_name_for_var_field() failed to adjust the deparse
namespace stack correctly when recursing to inspect an outer-level
Var. In this case the likely result was a "bogus varno" error
while deparsing a view.
Per bug #18077 from Jingzhou Fu. Back-patch to all supported
branches.
Richard Guo, with some adjustments by me
Discussion: https://postgr.es/m/18077-b9db97c6e0ab45d8@postgresql.org
When tracking IO timing for WAL, the duration is what we calculate
based on the start and end timestamps, it's not what the variable
contains. Rename the timestamp variable to end to better communicate
what it contains. Original patch by Krishnakumar with additional
hacking to fix another occurrence by me.
Author: Krishnakumar R <kksrcv001@gmail.com>
Discussion: https://postgr.es/m/CAPMWgZ9f9o8awrQpjo8oxnNQ=bMDVPx00NE0QcDzvHD_ZrdLPw@mail.gmail.com
This became safe after commit 4b4798e138. The smgrcreate() call will
now register the segment for syncing at the next checkpoint, so we
don't need to sync it here. If a checkpoint happens before the
creation is WAL-logged, the records will be replayed when starting
recovery from the checkpoint. If a checkpoint happens after the WAL
logging, the checkpoint will fsync() it.
In the passing, clarify a comment in smgrDoPendingSyncs().
Discussion: https://www.postgresql.org/message-id/6e5bbc08-cdfc-b2b3-9e23-1a914b9850a9%40iki.fi
Reviewed-by: Robert Haas
The majority of all filenames are quoted in user facing error and
log messages, but a few were still printed without quotes. While
these filenames do not risk causing any ambiguity as their format
is strict, quote them anyways to be consistent across all logs.
Also concatenate a message to keep it one line to make it easier
to grep for in the code.
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/080EEABE-6645-4A46-AB20-6285ADAC44FE@yesql.se
This reverts commit a0d87bcd9b, following a remark from Andres Frend
that the new error can be triggered with an incorrect SET TRANSACTION
SNAPSHOT command without being really helpful for the user as it uses
the internal file name.
Discussion: https://postgr.es/m/20230914020724.hlks7vunitvtbbz4@awork3.anarazel.de
Backpatch-through: 11
It's entirely possible for a logical slot to have a confirmed_flush LSN
higher than the last value saved on disk while not being marked as dirty.
Currently, it is not a major problem but a later patch adding support for
the upgrade of slots relies on that value being properly flushed to disk.
It can also help avoid processing the same transactions again in some
boundary cases after the clean shutdown and restart. Say, we process
some transactions for which we didn't send anything downstream (the
changes got filtered) but the confirm_flush LSN is updated due to
keepalives. As we don't flush the latest value of confirm_flush LSN, it
may lead to processing the same changes again without this patch.
The approach taken by this patch has been suggested by Ashutosh Bapat.
Author: Vignesh C, Julien Rouhaud, Kuroda Hayato
Reviewed-by: Amit Kapila, Dilip Kumar, Michael Paquier, Ashutosh Bapat, Peter Smith, Hou Zhijie
Discussion: http://postgr.es/m/CAA4eK1JzJagMmb_E8D4au=GYQkxox0AfNBm1FbP7sy7t4YWXPQ@mail.gmail.com
Discussion: http://postgr.es/m/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com
Karina figured out that I (Andres) confused BufferUsage.temp_blks_written with
BufferUsage.local_blks_written in fcdda1e4b5.
Tests in core PG can't easily test this, as BufferUsage is just used for
EXPLAIN (ANALYZE, BUFFERS) and pg_stat_statements. Thus this commit adds tests
for this to pg_stat_statements.
Reported-by: Karina Litskevich <litskevichkarina@gmail.com>
Author: Karina Litskevich <litskevichkarina@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CACiT8ibxXA6+0amGikbeFhm8B84XdQVo6D0Qfd1pQ1s8zpsnxQ@mail.gmail.com
Backpatch: 16-, where fcdda1e4b5 was merged
When a snapshot file fails to be read in ImportSnapshot(), it would
issue an ERROR as "invalid snapshot identifier" when opening a stream
for it in read-only mode. This error message is reworded to be the same
as all the other messages used in this case on failure, which is useful
when debugging this area.
Thinko introduced by bb446b689b where snapshot imports have been
added. A backpatch down to 11 is done as this can improve any work
related to snapshot imports in older branches.
Author: Bharath Rupireddy
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/CALj2ACWmr=3KdxDkm8h7Zn1XxBoF6hdzq8WQyMn2y1OL5RYFrg@mail.gmail.com
Backpatch-through: 11
These code paths should not be reached normally, but if they are an
error with "(null)" as information for the collation provider would show
up if no locale is set, while we can assume that we are referring to
libc.
This refactors the code so as the provider is always reported even if no
locale is set. The name of the function where the error happens is
added, while on it, as it can be helpful for debugging.
Issue introduced by d87d548cd0, so backpatch down to 16.
Author: Michael Paquier, Ranier Vilela
Reviewed-by: Jeff Davis, Kyotaro Horiguchi
Discussion: https://postgr.es/m/7073610042fcf97e1bea2ce08b7e0214b5e11094.camel@j-davis.com
Backpatch-through: 16
Both 50e17ad28 and 29f45e299 mistakenly tried to record a plan dependency
on a function but mistakenly inverted the OidIsValid test. This meant
that we'd record a dependency only when the function's Oid was
InvalidOid. Clearly this was meant to *not* record the dependency in
that case.
50e17ad28 made this mistake first, then in v15 29f45e299 copied the same
mistake.
Reported-by: Tom Lane
Backpatch-through: 14, where 50e17ad28 first made this mistake
Discussion: https://postgr.es/m/2277537.1694301772@sss.pgh.pa.us
If an out-of-memory error was thrown at an unfortunate time,
ensure_record_cache_typmod_slot_exists() could leak memory and leave
behind a global state that produced an infinite loop on the next call.
Fix by merging RecordCacheArray and RecordIdentifierArray into a single
array. With only one allocation or re-allocation, there is no
intermediate state.
Back-patch to all supported releases.
Reported-by: "James Pang (chaolpan)" <chaolpan@cisco.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/PH0PR11MB519113E738814BDDA702EDADD6EFA%40PH0PR11MB5191.namprd11.prod.outlook.com
generation_counter includes time spent on both JIT:ing expressions
and tuple deforming which are configured independently via options
jit_expressions and jit_tuple_deforming. As they are combined in
the same counter it's not apparent what fraction of time the tuple
deforming takes.
This adds deform_counter dedicated to tuple deforming, which allows
seeing more directly the influence jit_tuple_deforming is having on
the query. The counter is exposed in EXPLAIN and pg_stat_statements
bumpin pg_stat_statements to 1.11.
Author: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/20220612091253.eegstkufdsu4kfls@erthalion.local
The WAIT_USE_WIN32 implementation of WaitEventSetWait() previously
reported at most one event per call, because that's what the underlying
WaitForMultipleObjects() call does.
We can make the behavior match the three Unix implementations by looping
until our output buffer is full, or there are no more events available
now. This makes no difference to most callers including the regular
FEBE socket code, since they ask for at most one event anyway. A
difference in socket accept priority might be perceived by end users
after commit 7389aad6 started using WaitEventSet in the postmaster.
With this commit, the accept order now matches Unix systems, servicing
listening sockets in round-robin order.
We decided it wasn't really a bug or worth back-patching, but it seems
good to align the behavior across platforms.
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version)
Tested-by: "Wei Wang (Fujitsu)" <wangw.fnst@fujitsu.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2BA2dk29hr5zRP3HVJQ-_PncNJM6HVQ7aaYLXLRBZU-xw%40mail.gmail.com
Now that ATExecDropConstraint doesn't recurse anymore, so it's wrong to
test privileges "during recursion" there. Move the check to
dropconstraint_internal, which is the place where recursion occurs.
In passing, remove now-useless 'recursing' argument to
ATExecDropConstraint.
Discussion: https://postgr.es/m/202309051744.y4mndw5gwzhh@alvherre.pgsql
Now that we have catalogued not-null constraints, our information_schema
definition can be updated to grab those rather than fabricate synthetic
definitions.
Note that we still don't have catalog rows for not-null constraints on
domains, but we've never had not-null constraints listed in
information_schema, so that's a problem to be solved separately.
Co-authored-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/81b461c4-edab-5d8c-2f88-203108425340@enterprisedb.com
Discussion: https://postgr.es/m/202309041710.psytrxlsiqex@alvherre.pgsql
We shouldn't be doing non-trivial work in signal handlers in general,
and in this case the handler could reach unsafe code and corrupt state.
It also clobbered its own "reason" code.
Move all recovery conflict decision logic into the next
CHECK_FOR_INTERRUPTS(), and have the signal handler just set flags and
the latch, following the standard pattern. Since there are several
different "reasons", use a separate flag for each.
With this refactoring, the recovery conflict system no longer
piggy-backs on top of the regular query cancelation mechanism, but
instead raises an error directly if it decides that is necessary. It
still needs to respect QueryCancelHoldoffCount, because otherwise the
FEBE protocol might get out of sync (see commit 2b3a8b20c2).
This fixes one class of intermittent failure in the new
031_recovery_conflict.pl test added by commit 9f8a050f, though the buggy
coding is much older. Failures outside contrived testing seem to be
very rare (or perhaps incorrectly attributed) in the field, based on
lack of reports.
No back-patch for now due to complexity and release schedule. We have
the option to back-patch into 16 later, as 16 has prerequisite commit
bea3d7e.
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version)
Reviewed-by: Michael Paquier <michael@paquier.xyz> (earlier version)
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier version)
Tested-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
Discussion: https://postgr.es/m/CALj2ACVr8au2J_9D88UfRCi0JdWhyQDDxAcSVav0B0irx9nXEg%40mail.gmail.com
This commit renames RecoveryInitSyncMethod to DataDirSyncMethod and
moves it to common/file_utils.h. This is preparatory work for a
follow-up commit that will allow specifying the synchronization
method in frontend utilities such as pg_upgrade and pg_basebackup.
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/ZN2ZB4afQ2JbR9TA%40paquier.xyz
This file is now made of two columns, removing the column listing the
user-visible strings used in the system views and the documentation:
- Enum definitions for each class without the prefix "WAIT_EVENT_", so
as this information can be grepped in the code and wait_event_names.txt
at the same time.
- Description in the documentation.
The wait event names are now generated from the enum objects in
CamelCase, with the underscores removed. The data generated for wait
events is consistent with what was produced by 414f6c0fb7.
This has the advantage to remove WAIT_EVENT_DOCONLY, which was a
placeholder for the wait event types Lock and LWLock as these two only
require the generation of the documentation.
Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/ZOxVHQwEC/9X/p/z@paquier.xyz
The event names use the same case-insensitive characters, hence applying
lower() or upper() to the monitoring queries allows the detection of the
same events as before this change. It is possible to cross-check the
data with the system view pg_wait_events, for instance, with a query
like that showing no differences:
SELECT lower(type), lower(name), description
FROM pg_wait_events ORDER BY 1, 2;
This will help in the introduction of more simplifications in the format
of wait_event_names. Some of the enum values in the code had to be
renamed a bit to follow the same convention naming across the board.
Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/ZOxVHQwEC/9X/p/z@paquier.xyz
Presently, frontend code that needs to use these macros must either
include storage/fd.h, which declares several frontend-unsafe
functions, or duplicate the macros. This commit moves these macros
to common/file_utils.h, which is safe for both frontend and backend
code. Consequently, we can also remove the duplicated macros in
pg_checksums and stop including storage/fd.h in pg_rewind.
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/ZOP5qoUualu5xl2Z%40paquier.xyz
This lock was introduced before memory barrier support was added,
and it is only used to guarantee proper memory ordering when
KnownAssignedXidsAdd() appends to the array without a lock. Now
that such memory barrier support exists, we can remove the lock and
use barriers instead.
Suggested-by: Tom Lane
Author: Michail Nikolaev
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/CANtu0oh0si%3DjG5z_fLeFtmYcETssQ08kLEa8b6TQqDm_cinroA%40mail.gmail.com
If all the bits of a key in a tsvector are true (marked with ALLISTRUE),
gtsvectorout() would show the following description:
"0 true bits, 0 false bits"
This is confusing, as all the bits are true, but this would be
equivalent to the information if siglen is 0.
This commit improves the output so as "all true bits" show instead in
this case. Alexander has proposed a regression test for pageinspect,
not included here as it is rather expensive compared to its coverage
value.
Author: Alexander Lakhin
Discussion: https://postgr.es/m/17950-6c80a8d2b94ec695@postgresql.org
This could lead to an imprecise choice when splitting an index page of a
GiST index on a tsvector, deciding which entries should remain on the
old page and which entries should move to a new page.
This is wrong since tsearch2 has been moved into core with commit
140d4ebcb4, so backpatch all the way down. This error has been
spotted by valgrind.
Author: Alexander Lakhin
Discussion: https://postgr.es/m/17950-6c80a8d2b94ec695@postgresql.org
Backpatch-through: 11
Dropping a database while a connection is attempted on it was able to
lead to the presence of valid database entries in shared statistics.
The issue is that MyDatabaseId was getting set too early than it should,
as, if the connection attempted on the dropped database fails when
renamed or dropped, the shutdown callback of the shared statistics would
finish by re-inserting a correct entry related to the database already
dropped.
As analyzed by the bug reporters, this issue could lead to phantom
entries in the database list maintained by the autovacuum launcher
(in rebuild_database_list()) if the database dropped was part of the
database list when it was still valid. After the database was dropped,
it would remain the highest on the list of databases to considered by
the autovacuum worker as things to process. This would prevent
autovacuum jobs to happen on all the other databases still present.
The commit fixes this issue by delaying setting MyDatabaseId until the
database existence has been re-checked with the second scan on
pg_database after getting a shared lock on it, and by switching
pgstat_update_dbstats() so as nothing happens if MyDatabaseId is not
valid.
Issue introduced by 5891c7a8ed, so backpatch down to 15.
Reported-by: Will Mortensen, Jacob Speidel
Analyzed-by: Will Mortensen, Jacob Speidel
Author: Andres Freund
Discussion: https://postgr.es/m/17973-bca1f7d5c14f601e@postgresql.org
Backpatch-through: 15
When a partitioned table has a primary key, trying to find the
corresponding not-null constraint for that column would come up empty,
causing code that's trying to check said not-null constraint to crash.
Fix by only running the check when the not-null constraint exists.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/d57b4a69-7394-3146-5976-9a1ef27e7972@gmail.com
The comment in heapgettup_advance_block() says that it reports the
scan position before checking for end of scan, but that didn't match
the code. The code was refactored in commit 7ae0ab0ad9, which
inadvertently changed the order of the check and reporting. Change it
back.
This caused a few regression test failures with a small shared_buffers
setting like 10 MB. The 'portals' and 'cluster' tests perform seqscans
that are large enough that sync seqscans kick in. When the sync scan
position is not updated at end of scan, the next seq scan doesn't
start at the beginning of the table, and the test queries are
sensitive to that.
Reviewed-by: Melanie Plageman, David Rowley
Discussion: https://www.postgresql.org/message-id/6f991389-ae22-d844-a9d8-9aceb7c01a9a@iki.fi
Backpatch-through: 16
There was some confusion in the ObjectProperty entry for transforms.
Some fields had values that were apparently meant for a different
field. Also, some fields were not assigned, which is okay for most
fields, but not for all. In particular, for .oid_catcache_id,
.name_catcache_id, and .objtype, zero is a valid value, so we need to
use -1 if not applicable. It has apparently been like that from the
very beginning (commit cac7658205). The faulty values were not
actually reachable, so it's not a big problem in practice, but we
should make it correct.
Discussion: https://www.postgresql.org/message-id/flat/75ae5875-3abc-dafc-8aec-73247ed41cde@eisentraut.org
The code is able to compile already without warnings under
-Wshadow=compatible-local, which is itself already enabled in the tree,
and the ones fixed here showed up with the more restrictive -Wshadow.
There are more of these that we may want to look at, and the ones fixed
here made the code confusing.
Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+PuR0y4ofNOxi691VTVWmBfScHV9AaBMGSpeh8+DKp81Nw@mail.gmail.com
Unlike the other pg_stat_get_backend* functions,
pg_stat_get_backend_subxact() looks up the backend entry by using
its integer argument as a 1-based index in an internal array. The
other functions look for the entry with the matching session
backend ID. These numbers often match, but that isn't reliably
true.
This commit resolves this discrepancy by introducing
pgstat_get_local_beentry_by_backend_id() and using it in
pg_stat_get_backend_subxact(). We cannot use
pgstat_get_beentry_by_backend_id() because it returns a
PgBackendStatus, which lacks the locally computed additions
available in LocalPgBackendStatus that are required by
pg_stat_get_backend_subxact().
Author: Ian Barwick
Reviewed-by: Sami Imseih, Michael Paquier, Robert Haas
Discussion: https://postgr.es/m/CAB8KJ%3Dj-ACb3H4L9a_b3ZG3iCYDW5aEu3WsPAzkm2S7JzS1Few%40mail.gmail.com
Backpatch-through: 16
Presently, pgstat_fetch_stat_beentry() accepts a session's backend
ID as its argument, and pgstat_fetch_stat_local_beentry() accepts a
1-based index in an internal array as its argument. The former is
typically used wherever a user must provide a backend ID, and the
latter is usually used internally when looping over all entries in
the array. This difference was first introduced by d7e39d72ca.
Before that commit, both functions accepted a 1-based index to the
internal array.
This commit renames these two functions to make it clear whether
they use the backend ID or the 1-based index to look up the entry.
This is preparatory work for a follow-up change that will introduce
a function for looking up a LocalPgBackendStatus using a backend
ID.
Reviewed-by: Ian Barwick, Sami Imseih, Michael Paquier, Robert Haas
Discussion: https://postgr.es/m/CAB8KJ%3Dj-ACb3H4L9a_b3ZG3iCYDW5aEu3WsPAzkm2S7JzS1Few%40mail.gmail.com
Backpatch-through: 16
nFreeBlocks, defined as a long, stores the number of free blocks in a
logical tape. ltsGetFreeBlock() has been using an int to store the
value of nFreeBlocks, which could lead to overflows on platforms where
long and int are not the same size (in short everything except Windows
where long is 4 bytes).
The problematic intermediate variable is switched to be a long instead
of an int.
Issue introduced by c02fdc9223, so backpatch down to 13.
Author: Ranier vilela
Reviewed-by: Peter Geoghegan, David Rowley
Discussion: https://postgr.es/m/CAEudQApLDWCBR_xmwNjGBrDo+f+S4E87x3s7-+hoaKqYdtC4JQ@mail.gmail.com
Backpatch-through: 13
It makes no sense to add a NO INHERIT not-null constraint to a child
table that already has one in that column inherited from its parent.
Disallow that, and add tests for the relevant cases.
Per complaint from Kyotaro Horiguchi. I also used part of his proposed
patch.
Co-authored-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20230828.161658.1184657435220765047.horikyota.ntt@gmail.com
The logical_replication_mode GUC is intended for testing and debugging
purposes, but its current name may be misleading and encourage users to make
unnecessary changes.
To avoid confusion, renaming the GUC to a less misleading name
debug_logical_replication_streaming that casual users are less likely to mistakenly
assume needs to be modified in a regular logical replication setup.
Author: Hou Zhijie <houzj.fnst@cn.fujitsu.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/d672d774-c44b-6fec-f993-793e744f169a%40eisentraut.org
After commit b0bea38705, syslogger prints 63 warnings about failing to
close a listen socket at postmaster startup. That's because the
syslogger process forks before the ListenSockets array is initialized,
so ClosePostmasterPorts() calls "close(0)" 64 times. The first call
succeeds, because fd 0 is stdin.
This has been like this since commit 9a86f03b4e in version 13, which
moved the SysLogger_Start() call to before initializing ListenSockets.
We just didn't notice until commit b0bea38705 added the LOG message.
Reported by Michael Paquier and Jeff Janes.
Author: Michael Paquier
Discussion: https://www.postgresql.org/message-id/ZOvvuQe0rdj2slA9%40paquier.xyz
Discussion: https://www.postgresql.org/message-id/ZO0fgDwVw2SUJiZx@paquier.xyz#482670177eb4eaf4c9f03c1eed963e5f
Backpatch-through: 13
Since its introduction in 10074651e3, pg_promote() has been returning
a false status in three cases:
- SIGUSR1 not sent to the postmaster process.
- Postmaster death during standby promotion.
- Standby not promoted within the specified wait time.
An application calling this function will have a hard time understanding
what a false state returned actually means.
Per discussion, this switches the two first states to fail rather than
return a "false" status, making the second case more consistent with the
existing CHECK_FOR_INTERRUPTS in the wait loop. False is only returned
when the promotion is not completed within the specified time (60s by
default).
Author: Ashutosh Sharma
Reviewed-by: Fujii Masao, Laurenz Albe, Michael Paquier
Discussion: https://postgr.es/m/CAE9k0P=QTrwptL0t4J0fuBRDDjgsT-0PVKd-ikd96i1hyL7Bcg@mail.gmail.com
Make the primary messages more compact and make the detail messages
uniform. In initdb.c and pg_resetwal.c, use the newish
option_parse_int() to simplify some of the option parsing. For the
backend GUC wal_segment_size, add a GUC check hook to do the
verification instead of coding it in bootstrap.c. This might be
overkill, but that way the check is in the right place and it becomes
more self-documenting.
In passing, make pg_controldata use the logging API for warning
messages.
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://www.postgresql.org/message-id/flat/9939aa8a-d7be-da2c-7715-0a0b5535a1f7@eisentraut.org
Interval values now generate an error when the user has multiple
consecutive units or a unit without a value. Previously, it was
possible to specify multiple units consecutively which is contrary to
what the documentation allows, so it was possible to finish with
confusing interval values.
This is a follow-up of the work done in 165d581f14.
Author: Joseph Koshakow
Reviewed-by: Jacob Champion, Gurjeet Singh, Reid Thompson
Discussion: https://postgr.es/m/CAAvxfHd-yNO+XYnUxL=GaNZ1n+eE0V-oE0+-cC1jdjdU0KS3iw@mail.gmail.com
This commit Restrict the unit "ago" to only appear at the end of the
interval. According to the documentation, a direction can only be
defined at the end of an interval, but it was possible to define it in
the middle of the string or define it multiple times.
In spirit, this is similar to the error handling improvements done in
5b3c595355 or bcc704b524.
Author: Joseph Koshakow
Reviewed-by: Jacob Champion, Gurjeet Singh, Reid Thompson
Discussion: https://postgr.es/m/CAAvxfHd-yNO+XYnUxL=GaNZ1n+eE0V-oE0+-cC1jdjdU0KS3iw@mail.gmail.com
This commit removes some dead code related to the unit type RESERV,
whose last use has been removed from the unit lookup table used for
intervals ("deltatktbl" in datetime.c) in 666cbae16d. Before that,
RESERV was used as an equivalent of "invalid", but that's now
unreachable.
Author: Joseph Koshakow
Reviewed-by: Jacob Champion, Gurjeet Singh, Reid Thompson
Discussion: https://postgr.es/m/CAAvxfHd-yNO+XYnUxL=GaNZ1n+eE0V-oE0+-cC1jdjdU0KS3iw@mail.gmail.com
This commit switches query jumbling so as prepared statement names are
treated as constants in DeallocateStmt. A boolean field is added to
DeallocateStmt to make a distinction between ALL and named prepared
statements, as "name" was used to make this difference before, NULL
meaning DEALLOCATE ALL.
Prior to this commit, DEALLOCATE was not tracked in pg_stat_statements,
for the reason that it was not possible to treat its name parameter as a
constant. Now that query jumbling applies to all the utility nodes,
this reason does not apply anymore.
Like 638d42a3c5, this can be a huge advantage for monitoring where
prepared statement names are randomly generated, preventing bloat in
pg_stat_statements. A couple of tests are added to track the new
behavior.
Author: Dagfinn Ilmari Mannsåker, Michael Paquier
Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/ZMhT9kNtJJsHw6jK@paquier.xyz
Adding an extra LOG for connections that have not set an authn ID, like
when the "trust" authentication method is used, is useful for audit
purposes.
A couple of TAP tests for SSL and authentication need to be tweaked to
adapt to this new LOG generated, as some scenarios expected no logs but
they now get a hit.
Reported-by: Shaun Thomas
Author: Jacob Champion
Reviewed-by: Robert Haas, Michael Paquier
Discussion: https://postgr.es/m/CAFdbL1N7-GF-ZXKaB3XuGA+CkSmnjFvqb8hgjMnDfd+uhL2u-A@mail.gmail.com
We now create contype='n' pg_constraint rows for not-null constraints.
We propagate these constraints to other tables during operations such as
adding inheritance relationships, creating and attaching partitions and
creating tables LIKE other tables. We also spawn not-null constraints
for inheritance child tables when their parents have primary keys.
These related constraints mostly follow the well-known rules of
conislocal and coninhcount that we have for CHECK constraints, with some
adaptations: for example, as opposed to CHECK constraints, we don't
match not-null ones by name when descending a hierarchy to alter it,
instead matching by column name that they apply to. This means we don't
require the constraint names to be identical across a hierarchy.
For now, we omit them for system catalogs. Maybe this is worth
reconsidering. We don't support NOT VALID nor DEFERRABLE clauses
either; these can be added as separate features later (this patch is
already large and complicated enough.)
psql shows these constraints in \d+.
pg_dump requires some ad-hoc hacks, particularly when dumping a primary
key. We now create one "throwaway" not-null constraint for each column
in the PK together with the CREATE TABLE command, and once the PK is
created, all those throwaway constraints are removed. This avoids
having to check each tuple for nullness when the dump restores the
primary key creation.
pg_upgrading from an older release requires a somewhat brittle procedure
to create a constraint state that matches what would be created if the
database were being created fresh in Postgres 17. I have tested all the
scenarios I could think of, and it works correctly as far as I can tell,
but I could have neglected weird cases.
This patch has been very long in the making. The first patch was
written by Bernd Helmle in 2010 to add a new pg_constraint.contype value
('n'), which I (Álvaro) then hijacked in 2011 and 2012, until that one
was killed by the realization that we ought to use contype='c' instead:
manufactured CHECK constraints. However, later SQL standard
development, as well as nonobvious emergent properties of that design
(mostly, failure to distinguish them from "normal" CHECK constraints as
well as the performance implication of having to test the CHECK
expression) led us to reconsider this choice, so now the current
implementation uses contype='n' again. During Postgres 16 this had
already been introduced by commit e056c557ae, but there were some
problems mainly with the pg_upgrade procedure that couldn't be fixed in
reasonable time, so it was reverted.
In 2016 Vitaly Burovoy also worked on this feature[1] but found no
consensus for his proposed approach, which was claimed to be closer to
the letter of the standard, requiring an additional pg_attribute column
to track the OID of the not-null constraint for that column.
[1] https://postgr.es/m/CAKOSWNkN6HSyatuys8xZxzRCR-KL1OkHS5-b9qd9bf1Rad3PLA@mail.gmail.com
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Author: Bernd Helmle <mailings@oopsware.de>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Commit 2a8b40e36 introduces the worker type field for logical replication
workers, but forgot to reset the type when the worker exits. This can lead
to recognizing a stopped worker as a valid logical replication worker.
Fix it by resetting the worker type and additionally adding the safeguard
to not use LogicalRepWorker until ->in_use is verified.
Reported-by: Thomas Munro based on cfbot reports.
Author: Hou Zhijie, Alvaro Herrera
Reviewed-by: Amit Kapila
Discussion: http://postgr.es/m/CA+hUKGK2RQh4LifVgBmkHsCYChP-65UwGXOmnCzYVa5aAt4GWg@mail.gmail.com
Revalidation of a plancache entry (after a cache invalidation event)
requires acquiring a snapshot. Normally that is harmless, but not
if the cached statement is one that needs to run without acquiring a
snapshot. We were already aware of that for TransactionStmts,
but for some reason hadn't extrapolated to the other statements that
PlannedStmtRequiresSnapshot() knows mustn't set a snapshot. This can
lead to unexpected failures of commands such as SET TRANSACTION
ISOLATION LEVEL. We can fix it in the same way, by excluding those
command types from revalidation.
However, we can do even better than that: there is no need to
revalidate for any statement type for which parse analysis, rewrite,
and plan steps do nothing interesting, which is nearly all utility
commands. To mechanize this, invent a parser function
stmt_requires_parse_analysis() that tells whether parse analysis does
anything beyond wrapping a CMD_UTILITY Query around the raw parse
tree. If that's what it does, then rewrite and plan will just
skip the Query, so that it is not possible for the same raw parse
tree to produce a different plan tree after cache invalidation.
stmt_requires_parse_analysis() is basically equivalent to the
existing function analyze_requires_snapshot(), except that for
obscure reasons that function omits ReturnStmt and CallStmt.
It is unclear whether those were oversights or intentional.
I have not been able to demonstrate a bug from not acquiring a
snapshot while analyzing these commands, but at best it seems mighty
fragile. It seems safer to acquire a snapshot for parse analysis of
these commands too, which allows making stmt_requires_parse_analysis
and analyze_requires_snapshot equivalent.
In passing this fixes a second bug, which is that ResetPlanCache
would exclude ReturnStmts and CallStmts from revalidation.
That's surely *not* safe, since they contain parsable expressions.
Per bug #18059 from Pavel Kulakov. Back-patch to all supported
branches.
Discussion: https://postgr.es/m/18059-79c692f036b25346@postgresql.org
It's good hygiene if e.g. an extension launches a subprogram when
being loaded. We went through some effort to close them in the child
process in EXEC_BACKEND mode, but it's better to not hand them down to
the child process in the first place. We still need to close them
after fork when !EXEC_BACKEND, but it's a little simpler.
In the passing, LOG a message if closing the client connection or
listen socket fails. Shouldn't happen, but if it does, would be nice
to know.
Reviewed-by: Tristan Partin, Andres Freund, Thomas Munro
Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
The SnapBuildRestoreContents() used a const value in the error message to
indicate the size in bytes it was expecting to read from the serialized
snapshot file. Fix it by reporting the size that was actually passed.
Author: Hou Zhijie
Reviewed-by: Amit Kapila
Backpatch-through: 16
Discussion: http://postgr.es/m/OS0PR01MB5716D408364F7DF32221C08D941FA@OS0PR01MB5716.jpnprd01.prod.outlook.com
This commit introduces functions for converting numbers to their
equivalent binary and octal representations. Also, the base
conversion code for these functions and to_hex() has been moved to
a common helper function.
Co-authored-by: Eric Radman
Reviewed-by: Ian Barwick, Dag Lem, Vignesh C, Tom Lane, Peter Eisentraut, Kirk Wolak, Vik Fearing, John Naylor, Dean Rasheed
Discussion: https://postgr.es/m/Y6IyTQQ/TsD5wnsH%40vm3.eradman.com
Some of the ambuildempty functions used smgrwrite() directly, followed
by smgrimmedsync(). A few small problems with that:
Firstly, one is supposed to use smgrextend() when extending a
relation, not smgrwrite(). It doesn't make much difference in
production builds. smgrextend() updates the relation size cache, so
you miss that, but that's harmless because we never use the cached
relation size of an init fork. But if you compile with
CHECK_WRITE_VS_EXTEND, you get an assertion failure.
Secondly, the smgrwrite() calls were performed before WAL-logging, so
the page image written to disk had 0/0 as the LSN, not the LSN of the
WAL record. That's also harmless in practice, but seems sloppy.
Thirdly, it's better to use the buffer cache, because then you don't
need to smgrimmedsync() the relation to disk, which adds latency.
Bypassing the cache makes sense for bulk operations like index
creation, but not when you're just initializing an empty index.
Creation of unlogged tables is hardly performance bottleneck in any
real world applications, but nevertheless.
Backpatch to v16, but no further. These issues should be harmless in
practice, so better to not rock the boat in older branches.
Reviewed-by: Robert Haas
Discussion: https://www.postgresql.org/message-id/6e5bbc08-cdfc-b2b3-9e23-1a914b9850a9@iki.fi
This commit introduces descriptively-named macros for the
identifiers used in wire protocol messages. These new macros are
placed in a new header file so that they can be easily used by
third-party code.
Author: Dave Cramer
Reviewed-by: Alvaro Herrera, Tatsuo Ishii, Peter Smith, Robert Haas, Tom Lane, Peter Eisentraut, Michael Paquier
Discussion: https://postgr.es/m/CADK3HHKbBmK-PKf1bPNFoMC%2BoBt%2BpD9PH8h5nvmBQskEHm-Ehw%40mail.gmail.com
Commit 31966b15 invented a way for functions dealing with relation
extension to accept a Relation in online code and an SMgrRelation in
recovery code. It seems highly likely that future bufmgr.c interfaces
will face the same problem, and need to do something similar.
Generalize the names so that each interface doesn't have to re-invent
the wheel.
Back-patch to 16. Since extension AM authors might start using the
constructor macros once 16 ships, we agreed to do the rename in 16
rather than waiting for 17.
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2B6tLD2BhpRWycEoti6LVLyQq457UL4ticP5xd8LqHySA%40mail.gmail.com
Attribute missing values might be needed past the lifetime of the tuple
descriptors from which they are extracted. To avoid possibly using
pointers for by-reference values which might thus be left dangling, we
cache a datumCopy'd version of the datum in the TopMemoryContext. Since
we first search for the value this only needs to be done once per
session for any such value.
Original complaint from Tom Lane, idea for mitigation by Andrew Dunstan,
tweaked by Tom Lane.
Backpatch to version 11 where missing values were introduced.
Discussion: https://postgr.es/m/1306569.1687978174@sss.pgh.pa.us
This commit fixes the function of $subject for shared relations. This
feature has been added by e042678. Unfortunately, this new behavior got
removed by 5891c7a when moving statistics to shared memory.
Reported-by: Mitsuru Hinata
Author: Masahiro Ikeda
Reviewed-by: Kyotaro Horiguchi, Masahiko Sawada
Discussion: https://postgr.es/m/7cc69f863d9b1bc677544e3accd0e4b4@oss.nttdata.com
Backpatch-through: 15
This new view, wrapped around a SRF, shows some information known about
wait events, as of:
- Name.
- Type (Activity, I/O, Extension, etc.).
- Description.
All the information retrieved comes from wait_event_names.txt, and the
description is the same as the documentation with filters applied to
remove any XML markups. This view is useful when joined with
pg_stat_activity to get the description of a wait event reported.
Custom wait events for extensions are included in the view.
Original idea by Yves Colin.
Author: Bertrand Drouvot
Reviewed-by: Kyotaro Horiguchi, Masahiro Ikeda, Tom Lane, Michael
Paquier
Discussion: https://postgr.es/m/0e2ae164-dc89-03c3-cf7f-de86378053ac@gmail.com
The OAT hooks are added in ALTER TABLE for the following subcommands:
- { ENABLE | DISABLE | [NO] FORCE } ROW LEVEL SECURITY
- { ENABLE | DISABLE } TRIGGER
- { ENABLE | DISABLE } RULE. Note that there was hook for pg_rewrite,
but not for relation ALTER'ed in pg_class.
Tests are added to test_oat_hook for all the subcommand patterns gaining
hooks here. Based on an ask from Legs Mansion.
Discussion: https://postgr.es/m/tencent_083B3850655AC6EE04FA0A400766D3FE8309@qq.com
The source code comment already said that the presence of the field
element_types.domain_default might be a bug in the standard, since it
never made sense there. Indeed, the field is gone in newer versions
of the standard. So just remove it.
Previously, if a specialized comparator found equal datum1 keys,
the "comparetup" function would repeat the comparison on the
datum before proceeding with the unabbreviated first key
and/or additional sort keys.
Move comparing additional sort keys into "tiebreak" functions so
that specialized comparators can call these directly if needed,
avoiding duplicate work.
Reviewed by David Rowley
Discussion: https://postgr.es/m/CAFBsxsGaVfUrjTghpf%3DkDBYY%3DjWx1PN-fuusVe7Vw5s0XqGdGw%40mail.gmail.com
This was disabled in commit 6f80a8d9c due to the lack of support for
handling of pseudoconstant quals assigned to replaced joins in
createplan.c. To re-allow it, this patch adds the support by 1)
modifying the ForeignPath and CustomPath structs so that if they
represent foreign and custom scans replacing a join with a scan, they
store the list of RestrictInfo nodes to apply to the join, as in
JoinPaths, and by 2) modifying create_scan_plan() in createplan.c so
that it uses that list in that case, instead of the baserestrictinfo
list, to get pseudoconstant quals assigned to the join, as mentioned in
the commit message for that commit.
Important item for the release notes: this is non-backwards-compatible
since it modifies the ForeignPath and CustomPath structs, as mentioned
above, and changes the argument lists for FDW helper functions
create_foreignscan_path(), create_foreign_join_path(), and
create_foreign_upper_path().
Richard Guo, with some additional changes by me, reviewed by Nishant
Sharma, Suraj Kharage, and Richard Guo.
Discussion: https://postgr.es/m/CADrsxdbcN1vejBaf8a%2BQhrZY5PXL-04mCd4GDu6qm6FigDZd6Q%40mail.gmail.com
Commit b91dd9de was concerned with a theoretical problem with our
non-atomic condition variable operations. If you stop sleeping, and
then cancel the sleep in a separate step, you might be signaled in
between, and that could be lost. That doesn't matter for callers of
ConditionVariableBroadcast(), but callers of ConditionVariableSignal()
might be upset if a signal went missing like this.
Commit bc971f4025 interacted badly with that logic, because it doesn't
use ConditionVariableSleep(), which would normally put us back in the
wait list. ConditionVariableCancelSleep() would be confused and think
we'd received an extra signal, and try to forward it to another backend,
resulting in wakeup storms.
New idea: ConditionVariableCancelSleep() can just return true if we've
been signaled. Hypothetical users of ConditionVariableSignal() would
then still have a way to deal with rare lost signals if they are
concerned about that problem.
Back-patch to 16, where bc971f4025 arrived.
Reported-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/2840876b-4cfe-240f-0a7e-29ffd66711e7%40enterprisedb.com
The new relation extension logic, introduced in 00d1e02be2, could lead to
slowdowns in some scenarios. E.g., when loading narrow rows into a table using
COPY, the caller of RelationGetBufferForTuple() will only request a small
number of pages. Without concurrency, we just extended using pwritev() in that
case. However, if there is *some* concurrency, we switched between extending
by a small number of pages and a larger number of pages, depending on the
number of waiters for the relation extension logic. However, some
filesystems, XFS in particular, do not perform well when switching between
extending files using fallocate() and pwritev().
To avoid that issue, remember the number of prior relation extensions in
BulkInsertState and extend more aggressively if there were prior relation
extensions. That not just avoids the aforementioned slowdown, but also leads
to noticeable performance gains in other situations, primarily due to
extending more aggressively when there is no concurrency. I should have done
it this way from the get go.
Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAD21AoDvDmUQeJtZrau1ovnT_smN940=Kp6mszNGK3bq9yRN6g@mail.gmail.com
Backpatch: 16-, where the new relation extension code was added
Currently, the names of the custom wait event must be registered for
each backend, requiring all these to link to the shared memory area of
an extension, even if these are not loaded with
shared_preload_libraries.
This patch relaxes the constraints related to this infrastructure by
storing the wait events and their names in two dynamic hash tables in
shared memory. This has the advantage to simplify the registration of
custom wait events to a single routine call that returns an event ID
ready for consumption:
uint32 WaitEventExtensionNew(const char *wait_event_name);
The caller of this routine can then cache locally the ID returned, to be
used for pgstat_report_wait_start(), WaitLatch() or a similar routine.
The implementation uses two hash tables: one with a key based on the
event name to avoid duplicates and a second using the event ID as key
for event lookups, like on pg_stat_activity. These tables can hold a
minimum of 16 entries, and a maximum of 128 entries, which should be plenty
enough.
The code changes done in worker_spi show how things are simplified (most
of the code removed in this commit comes from there):
- worker_spi_init() is gone.
- No more shared memory hooks required (size requested and
initialization).
- The custom wait event ID is cached in the process that needs to set
it, with one single call to WaitEventExtensionNew() to retrieve it.
Per suggestion from Andres Freund.
Author: Masahiro Ikeda, with a few tweaks from me.
Discussion: https://postgr.es/m/20230801032349.aaiuvhtrcvvcwzcx@awork3.anarazel.de
We deduce a LogicalRepWorker's type from the values of several different
fields ('relid' and 'leader_pid') whenever logic needs to know it.
In fact, the logical replication worker type is already known at the time
of launching the LogicalRepWorker and it never changes for the lifetime of
that process. Instead of deducing the type, it is simpler to just store it
one time, and access it directly thereafter.
Author: Peter Smith
Reviewed-by: Amit Kapila, Bharath Rupireddy
Discussion: http://postgr.es/m/CAHut+PttPSuP0yoZ=9zLDXKqTJ=d0bhxwKaEaNcaym1XqcvDEg@mail.gmail.com
pg_logical_emit_message(false, '_', repeat('x', 1069547465)) failed with
self-contradictory message "WAL record would be 1069547520 bytes (of
maximum 1069547520 bytes)". There's no particular benefit from allowing
or denying one byte in either direction; XLogRecordMaxSize could rise a
few megabytes without trouble. Hence, this is just for cleanliness.
Back-patch to v16, where this check first appeared.
This relies on the "location" field added to TransactionStmt in 31de7e6,
now applied to the "gid" field used by 2PC commands. These commands are
now reported like:
COMMIT PREPARED $1
PREPARE TRANSACTION $1
ROLLBACK PREPARED $1
Applying constants for these commands is a huge advantage for workloads
that rely a lot on 2PC commands with different GIDs. Some tests are
added to track the new behavior.
Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/ZMhT9kNtJJsHw6jK@paquier.xyz
Store function config settings in lists to avoid the need to parse and
allocate for each function execution.
Speedup is modest but significant. Additionally, this change also
seems cleaner and supports some other performance improvements under
discussion.
Discussion: https://postgr.es/m/04c8592dbd694e4114a3ed87139a7a04e4363030.camel@j-davis.com
Reviewed-by: Nathan Bossart
Commit 19d8e2308b changed the list of set-of-columns that can be
returned by RelationGetIndexAttrBitmap, but didn't update its
"documentation". That was pretty hard to read already, so rewrite to
make it more comprehensible, adding the missing values while at it.
Backpatch to 16, like that commit.
Discussion: https://postgr.es/m/20230809091155.7c7f3gttjk3dj4ze@alvherre.pgsql
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
fa88928 has introduced wait_event_names.txt, and some of its entries had
some documentation fields with more information than necessary.
This commit brings back the description of all the wait events to be
consistent with the older stable branches. Five descriptions were
incorrect.
Author: Bertrand Drouvot
Discussion: https://postgr.es/m/e378989e-1899-643a-dec1-10f691a0a105@gmail.com
Substituting such values in extension scripts facilitated SQL injection
when @extowner@, @extschema@, or @extschema:...@ appeared inside a
quoting construct (dollar quoting, '', or ""). No bundled extension was
vulnerable. Vulnerable uses do appear in a documentation example and in
non-bundled extensions. Hence, the attack prerequisite was an
administrator having installed files of a vulnerable, trusted,
non-bundled extension. Subject to that prerequisite, this enabled an
attacker having database-level CREATE privilege to execute arbitrary
code as the bootstrap superuser. By blocking this attack in the core
server, there's no need to modify individual extensions. Back-patch to
v11 (all supported versions).
Reported by Micah Gate, Valerie Woolard, Tim Carey-Smith, and Christoph
Berg.
Security: CVE-2023-39417
The use of Memoize was already disabled in normal joins when the join
conditions had volatile functions per the code in
match_opclause_to_indexcol(). Ordinarily, the parameterization for the
inner side of a nested loop will be an Index Scan or at least eventually
lead to an index scan (perhaps nested several joins deep). However, for
lateral joins, that's not the case and seq scans can be parameterized
too, so we can't rely on match_opclause_to_indexcol().
Here we explicitly check the parameterization for volatile functions and
don't consider the generation of a Memoize path when such functions
are present.
Author: Richard Guo
Discussion: https://postgr.es/m/CAMbWs49nHFnHbpepLsv_yF3qkpCS4BdB-v8HoJVv8_=Oat0u_w@mail.gmail.com
Backpatch-through: 14, where Memoize was introduced
If MERGE executes an UPDATE action on a table with row-level security,
the code incorrectly applied the WITH CHECK clauses from the target
table's INSERT policies to new rows, instead of the clauses from the
table's UPDATE policies. In addition, it failed to check new rows
against the target table's SELECT policies, if SELECT permissions were
required (likely to always be the case).
In addition, if MERGE executes a DO NOTHING action for matched rows,
the code incorrectly applied the USING clauses from the target table's
DELETE policies to existing target tuples. These policies were applied
as checks that would throw an error, if they did not pass.
Fix this, so that a MERGE UPDATE action applies the same RLS policies
as a plain UPDATE query with a WHERE clause, and a DO NOTHING action
does not apply any RLS checks (other than adding clauses from SELECT
policies to the join).
Back-patch to v15, where MERGE was introduced.
Dean Rasheed, reviewed by Stephen Frost.
Security: CVE-2023-39418
The code in join_search_one_level was a bit convoluted. With a casual
glance, you might think that other_rels_list was being set to something
other than joinrels[1] when level == 2, however, joinrels[level - 1] is
joinrels[1] when level == 2, so nothing special needs to happen to set
other_rels_list. Let's clean that up to avoid confusing anyone.
In passing, we may as well modernize the loop in
make_rels_by_clause_joins() and instead of passing in the ListCell to
start looping from, let's just pass in the index where to start from and
make use of for_each_from(). Ever since 1cff1b95a, Lists are arrays
under the hood. lnext() and list_head() both seem a little too linked-list
like.
Author: Alex Hsieh, David Rowley, Richard Guo
Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/CANWNU8x9P9aCXGF%3DaT-A_8mLTAT0LkcZ_ySYrGbcuHzMQw2-1g%40mail.gmail.com
Here we adjust the costs for WindowAggs so that they properly take into
account how much of their subnode they must read before outputting the
first row. Without this, we always assumed that the startup cost for the
WindowAgg was not much more expensive than the startup cost of its
subnode, however, that's going to be completely wrong in many cases. The
WindowAgg may have to read *all* of its subnode to output a single row
with certain window bound options.
Here we estimate how many rows we'll need to read from the WindowAgg's
subnode and proportionally add more of the subnode's run costs onto the
WindowAgg's startup costs according to how much of it we expect to have to
read in order to produce the first WindowAgg row.
The reason this is more important than we might have initially thought is
that we may end up making use of a path from the lower planner that works
well as a cheap startup plan when the query has a LIMIT clause, however,
the WindowAgg might mean we need to read far more rows than what the LIMIT
specifies.
No backpatch on this so as not to cause plan changes in released
versions.
Bug: #17862
Reported-by: Tim Palmer
Author: David Rowley
Reviewed-by: Andy Fan
Discussion: https://postgr.es/m/17862-1ab8f74b0f7b0611@postgresql.org
Discussion: https://postgr.es/m/CAApHDvrB0S5BMv+0-wTTqWFE-BJ0noWqTnDu9QQfjZ2VSpLv_g@mail.gmail.com
Both apply and tablesync workers were using ApplyWorkerMain() as entry
point. As the name implies, ApplyWorkerMain() should be considered as
the main function for apply workers. Tablesync worker's path was hidden
and does not have enough in common to share the same main function with
apply worker.
Also, most of the code shared by both worker types is already combined
in LogicalRepApplyLoop(). There is no need to combine the rest in
ApplyWorkerMain() anymore.
This patch introduces TablesyncWorkerMain() as a new entry point for
tablesync workers. This aims to increase code readability and would help
with future improvements like the reuse of tablesync workers in the
initial synchronization.
Author: Melih Mutlu based on suggestions by Melanie Plageman
Reviewed-by: Peter Smith, Kuroda Hayato, Amit Kapila
Discussion: http://postgr.es/m/CAGPVpCTq=rUDd4JUdaRc1XUWf4BrH2gdSNf3rtOMUGj9rPpfzQ@mail.gmail.com
Between 6fcda9aba and 1b6f632a3, the pg_strtoint functions became quite
a bit slower in v16, despite efforts in 6b423ec67 to speed these up.
Since the majority of cases for these functions will only contain
base-10 digits, perhaps prefixed by a '-', it makes sense to have a
special case for this and just fall back on the more complex version
which processes hex, octal, binary and underscores if the fast path
version fails to parse the string.
While we're here, update the header comments for these functions to
mention that hex, octal and binary formats along with underscore
separators are now supported.
Author: Andres Freund, David Rowley
Reported-by: Masahiko Sawada
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAD21AoDvDmUQeJtZrau1ovnT_smN940%3DKp6mszNGK3bq9yRN6g%40mail.gmail.com
Backpatch-through: 16, where 6fcda9aba and 1b6f632a3 were added
This was failing for queries which try to get the .type() of a
jpiLikeRegex. For example:
select jsonb_path_query('["string", "string"]',
'($[0] like_regex ".{7}").type()');
Reported-by: Alexander Kozhemyakin
Bug: #18035
Discussion: https://postgr.es/m/18035-64af5cdcb5adf2a9@postgresql.org
Backpatch-through: 12, where SQL/JSON path was added.
The previous commit removed the "override" APIs. Surviving APIs facilitate
plancache.c to snapshot search_path and test whether the current value equals
a remembered snapshot.
Aleksander Alekseev. Reported by Alexander Lakhin and Noah Misch.
Discussion: https://postgr.es/m/8ffb4650-52c4-6a81-38fc-8f99be981130@gmail.com
Two backend routines are added to allow extension to allocate and define
custom wait events, all of these being allocated in the type
"Extension":
* WaitEventExtensionNew(), that allocates a wait event ID computed from
a counter in shared memory.
* WaitEventExtensionRegisterName(), to associate a custom string to the
wait event ID allocated.
Note that this includes an example of how to use this new facility in
worker_spi with tests in TAP for various scenarios, and some
documentation about how to use them.
Any code in the tree that currently uses WAIT_EVENT_EXTENSION could
switch to this new facility to define custom wait events. This is left
as work for future patches.
Author: Masahiro Ikeda
Reviewed-by: Andres Freund, Michael Paquier, Tristan Partin, Bharath
Rupireddy
Discussion: https://postgr.es/m/b9f5411acda0cf15c8fbb767702ff43e@oss.nttdata.com
These are useful to extract the class ID and the event ID associated to
a single uint32 wait_event_info. Only two code paths use them now, but
an upcoming patch will extend their use.
Discussion: https://postgr.es/m/ZMcJ7F7nkGkIs8zP@paquier.xyz
Commit e7cb7ee14, which introduced the infrastructure for FDWs and
custom scan providers to replace joins with scans, failed to add support
handling of pseudoconstant quals assigned to replaced joins in
createplan.c, leading to an incorrect plan without a gating Result node
when postgres_fdw replaced a join with such a qual.
To fix, we could add the support by 1) modifying the ForeignPath and
CustomPath structs to store the list of RestrictInfo nodes to apply to
the join, as in JoinPaths, if they represent foreign and custom scans
replacing a join with a scan, and by 2) modifying create_scan_plan() in
createplan.c to use that list in that case, instead of the
baserestrictinfo list, to get pseudoconstant quals assigned to the join;
but #1 would cause an ABI break. So fix by modifying the infrastructure
to just disallow replacing joins with such quals.
Back-patch to all supported branches.
Reported by Nishant Sharma. Patch by me, reviewed by Nishant Sharma and
Richard Guo.
Discussion: https://postgr.es/m/CADrsxdbcN1vejBaf8a%2BQhrZY5PXL-04mCd4GDu6qm6FigDZd6Q%40mail.gmail.com
Historically, hba.c limited tokens in the authentication configuration
files (pg_hba.conf and pg_ident.conf) to less than 256 bytes. We have
seen a few reports of this limit causing problems; notably, for
moderately-complex LDAP configurations. Let's get rid of the fixed
limit by using a StringInfo instead of a fixed-size buffer.
This actually takes less code than before, since we can get rid of
a nontrivial error recovery stanza. It's doubtless a hair slower,
but parsing the content of the HBA files should in no way be
performance-critical.
Although this is a pretty straightforward patch, it doesn't seem
worth the risk to back-patch given the small number of complaints
to date. In released branches, we'll just raise MAX_TOKEN to
ameliorate the problem.
Discussion: https://postgr.es/m/1588937.1690221208@sss.pgh.pa.us
9f8377f7a added code to allow COPY FROM insert a column's default value
when the input matches the DEFAULT string specified in the COPY command.
Here we fix some inefficient code which needlessly palloc0'd an array to
store if we should use the default value or input value for the given
column. This array was being palloc0'd and pfree'd once per row. It's
much more efficient to allocate this once and just reset the values once
per row.
Reported-by: Masahiko Sawada
Author: Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoDvDmUQeJtZrau1ovnT_smN940%3DKp6mszNGK3bq9yRN6g%40mail.gmail.com
Backpatch-through: 16, where 9f8377f7a was introduced.
There was already a check on the relation OID, but not its index OID and
the attributes that can be used during the syscache lookups. The two
assertions added by this commit are cheap, and actually useful for
developers to fasten the detection of incorrect data in a new entry
added in the syscache list, as these assertions are triggered during the
initial cache loading (initdb, or just backend startup), not requiring a
syscache that uses the new entry.
While on it, the relation OID check is switched to use OidIsValid().
Author: Aleksander Alekseev
Reviewed-by: Dagfinn Ilmari Mannsåker, Zhang Mingli, Michael Paquier
Discussion: https://postgr.es/m/CAJ7c6TOjUTJ0jxvWY6oJeP2-840OF8ch7qscZQsuVuotXTOS_g@mail.gmail.com
In pg_stat_statements, savepoint names now show up as constants with a
parameter symbol, using as base query string the one added as a new
entry to the PGSS hash table, leading to:
RELEASE $1
ROLLBACK TO $1
SAVEPOINT $1
Applying constants to these query parts is a huge advantage for
workloads that generate randomly savepoint points, like ORMs (Django is
at the origin of this patch). The ODBC driver is a second layer that
likes a lot savepoints, though it does not use a random naming pattern.
A "location" field is added to TransactionStmt, now set only for
savepoints. The savepoint name is ignored by the query jumbling. The
location can be extended to other query patterns, if required, like 2PC
commands. Some tests are added to pg_stat_statements for all the query
patterns supported by the parser.
ROLLBACK, ROLLBACK TO SAVEPOINT and ROLLBACK TRANSACTION TO SAVEPOINT
have the same Node representation, so all these are equivalents. The
same happens for RELEASE and RELEASE SAVEPOINT.
Author: Greg Sabino Mullane
Discussion: https://postgr.es/m/CAKAnmm+2s9PA4OaumwMJReWHk8qvJ_-g1WqxDRDAN1BSUfxyTw@mail.gmail.com
This Patch introduces three SQL standard JSON functions:
JSON()
JSON_SCALAR()
JSON_SERIALIZE()
JSON() produces json values from text, bytea, json or jsonb values,
and has facilitites for handling duplicate keys.
JSON_SCALAR() produces a json value from any scalar sql value,
including json and jsonb.
JSON_SERIALIZE() produces text or bytea from input which containis
or represents json or jsonb;
For the most part these functions don't add any significant new
capabilities, but they will be of use to users wanting standard
compliant JSON handling.
Catversion bumped as this changes ruleutils.c.
Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Author: Teodor Sigaev <teodor@sigaev.ru>
Author: Oleg Bartunov <obartunov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Andrew Dunstan <andrew@dunslane.net>
Author: Amit Langote <amitlangote09@gmail.com>
Reviewers have included (in no particular order) Andres Freund, Alexander
Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu,
Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby, Álvaro Herrera,
Peter Eisentraut
Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org
Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
This is to export datum_to_json(), datum_to_jsonb(), and
jsonb_from_cstring(), though the last one is exported as
jsonb_from_text().
A subsequent commit to add new SQL/JSON constructor functions will
need them for calling from the executor.
Discussion: https://postgr.es/m/20230720160252.ldk7jy6jqclxfxkq%40alvherre.pgsql
Commit 5764f611e used dclist_delete_from() to remove the proc from the
wait queue. However, since it doesn't clear dist_node's next/prev to
NULL, it could call RemoveFromWaitQueue() twice: when the process
detects a deadlock and then when cleaning up locks on aborting the
transaction. The waiting lock information is cleared in the first
call, so it led to a crash in the second call.
Backpatch to v16, where the change was introduced.
Bug: #18031
Reported-by: Justin Pryzby, Alexander Lakhin
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/ZKy4AdrLEfbqrxGJ%40telsasoft.com
Discussion: https://postgr.es/m/18031-ebe2d08cb405f6cc@postgresql.org
Backpatch-through: 16
This commit adds a few comments about what LWLockWaitForVar() relies on
when a backend waits for a variable update on its LWLocks for WAL
insertions up to an expected LSN.
First, LWLockWaitForVar() does not include a memory barrier, relying on
a spinlock taken at the beginning of WaitXLogInsertionsToFinish(). This
was hidden behind two layers of routines in lwlock.c. This assumption
is now documented at the top of LWLockWaitForVar(), and detailed at bit
more within LWLockConflictsWithVar().
Second, document why WaitXLogInsertionsToFinish() does not include
memory barriers, relying on a spinlock at its top, which is, per Andres'
input, fine for two different reasons, both depending on the fact that
the caller of WaitXLogInsertionsToFinish() is waiting for a LSN up to a
certain value.
This area's documentation and assumptions could be improved more in the
future, but at least that's a beginning.
Author: Bharath Rupireddy, Andres Freund
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CALj2ACVF+6jLvqKe6xhDzCCkr=rfd6upaGc3477Pji1Ke9G7Bg@mail.gmail.com
Previously, when selecting an usable index for update/delete for the
REPLICA IDENTITY FULL table, in IsIndexOnlyExpression(), we used to
check if all index fields are not expressions. However, it was not
necessary, because it is enough to check if only the leftmost index
field is not an expression (and references the remote table column)
and this check has already been done by
RemoteRelContainsLeftMostColumnOnIdx().
This commit removes IsIndexOnlyExpression() and
RemoteRelContainsLeftMostColumnOnIdx() and all checks for usable
indexes for REPLICA IDENTITY FULL tables are now performed by
IsIndexUsableForReplicaIdentityFull().
Backpatch this to remain the code consistent.
Reported-by: Peter Smith
Reviewed-by: Amit Kapila, Önder Kalacı
Discussion: https://postgr.es/m/CAHut%2BPsGRE5WSsY0jcLHJEoA17MrbP9yy8FxdjC_ZOAACxbt%2BQ%40mail.gmail.com
Backpatch-through: 16
The WAL insertion lock variable insertingAt is currently being read
and written with the help of the LWLock wait list lock to avoid any read
of torn values. This wait list lock can become a point of contention on
a highly concurrent write workloads.
This commit switches insertingAt to a 64b atomic variable that provides
torn-free reads/writes. On platforms without 64b atomic support, the
fallback implementation uses spinlocks to provide the same guarantees
for the values read. LWLockWaitForVar(), through
LWLockConflictsWithVar(), reads the new value to check if it still needs
to wait with a u64 atomic operation. LWLockUpdateVar() updates the
variable before waking up the waiters with an exchange_u64 (full memory
barrier). LWLockReleaseClearVar() now uses also an exchange_u64 to
reset the variable. Before this commit, all these steps relied on
LWLockWaitListLock() and LWLockWaitListUnlock().
This reduces contention on LWLock wait list lock and improves
performance of highly-concurrent write workloads. Here are some
numbers using pg_logical_emit_message() (HEAD at d6677b93) with various
arbitrary record lengths and clients up to 1k on a rather-large machine
(64 vCPUs, 512GB of RAM, 16 cores per sockets, 2 sockets), in terms of
TPS numbers coming from pgbench:
message_size_b | 16 | 64 | 256 | 1024
--------------------+--------+--------+--------+-------
patch_4_clients | 83830 | 82929 | 80478 | 73131
patch_16_clients | 267655 | 264973 | 250566 | 213985
patch_64_clients | 380423 | 378318 | 356907 | 294248
patch_256_clients | 360915 | 354436 | 326209 | 263664
patch_512_clients | 332654 | 321199 | 287521 | 240128
patch_1024_clients | 288263 | 276614 | 258220 | 217063
patch_2048_clients | 252280 | 243558 | 230062 | 192429
patch_4096_clients | 212566 | 213654 | 205951 | 166955
head_4_clients | 83686 | 83766 | 81233 | 73749
head_16_clients | 266503 | 265546 | 249261 | 213645
head_64_clients | 366122 | 363462 | 341078 | 261707
head_256_clients | 132600 | 132573 | 134392 | 165799
head_512_clients | 118937 | 114332 | 116860 | 150672
head_1024_clients | 133546 | 115256 | 125236 | 151390
head_2048_clients | 137877 | 117802 | 120909 | 138165
head_4096_clients | 113440 | 115611 | 120635 | 114361
Bharath has been measuring similar improvements, where the limit of the
WAL insertion lock begins to be felt when more than 256 concurrent
clients are involved in this specific workload.
An extra patch has been discussed to introduce a fast-exit path in
LWLockUpdateVar() when there are no waiters, still this does not
influence the write-heavy workload cases discussed as there are always
waiters. This will be considered separately.
Author: Bharath Rupireddy
Reviewed-by: Nathan Bossart, Andres Freund, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACVF+6jLvqKe6xhDzCCkr=rfd6upaGc3477Pji1Ke9G7Bg@mail.gmail.com
We include the message type while displaying an error context in the
apply worker. Now, while retrieving the message type string if the
message type is unknown we throw an error that will hide the original
error. So, instead, we need to simply return the string indicating an
unknown message type.
Reported-by: Ashutosh Bapat
Author: Euler Taveira, Amit Kapila
Reviewed-by: Ashutosh Bapat
Backpatch-through: 15
Discussion: https://postgr.es/m/CAExHW5suAEDW-mBZt_qu4RVxWZ1vL54-L+ci2zreYWebpzxYsA@mail.gmail.com
Due to the bug LimitAdditionalPins() could return 0, violating
LimitAdditionalPins()'s API ("One additional pin is always allowed"). This
could be hit when setting shared_buffers very low and using a fair amount of
concurrency.
This bug was introduced in 31966b151e.
Author: "Anton A. Melnikov" <aamelnikov@inbox.ru>
Reported-by: "Anton A. Melnikov" <aamelnikov@inbox.ru>
Reported-by: Victoria Shepard
Discussion: https://postgr.es/m/ae46f2fb-5586-3de0-b54b-1bb0f6410ebd@inbox.ru
Backpatch: 16-
After 3c90dcd03, try_partitionwise_join's child_joinrelids
variable is read only in an Assert, provoking a compiler
warning in non-assert builds. Rearrange code to avoid the
warning and eliminate unnecessary work in the non-assert case.
Per CI testing (via Jeff Davis and Bharath Rupireddy)
Discussion: https://postgr.es/m/ef0de9713e605451f1b60b30648c5ee900b2394c.camel@j-davis.com
Applying add_outer_joins_to_relids() to a child join doesn't actually
work, even if we've built a SpecialJoinInfo specialized to the child,
because that function will also compare the join's relids to elements
of the main join_info_list, which only deal in regular relids not
child relids. This mistake escaped detection by the existing
partitionwise join tests because they didn't test any cases where
add_outer_joins_to_relids() needs to add additional OJ relids (that
is, any cases where join reordering per identity 3 is possible).
Instead, let's apply adjust_child_relids() to the relids of the parent
join. This requires minor code reordering to collect the relevant
AppendRelInfo structures first, but that's work we'd do shortly anyway.
Report and fix by Richard Guo; cosmetic changes by me
Discussion: https://postgr.es/m/CAMbWs49NCNbyubZWgci3o=_OTY=snCfAPtMnM-32f3mm-K-Ckw@mail.gmail.com
b6e1157e7d made some changes to enforce that
JsonValueExpr.formatted_expr is always set and is the expression that
gives a JsonValueExpr its runtime value, but that's not really
apparent from the comments about and the code manipulating
formatted_expr. This commit fixes that.
Per suggestion from Álvaro Herrera.
Discussion: https://postgr.es/m/20230718155313.3wqg6encgt32adqb%40alvherre.pgsql
If both the passed-in plan pointer and plansource->gplan are
NULL, CachedPlanIsSimplyValid would think that the plan pointer
is possibly-valid and try to dereference it. For the one extant
call site in plpgsql, this situation doesn't normally happen
which is why we've not noticed. However, it appears to be possible
if the previous use of the cached plan failed, as per report from
Justin Pryzby. Add an extra check to prevent crashing.
Back-patch to v13 where this code was added.
Discussion: https://postgr.es/m/ZLlV+STFz1l/WhAQ@telsasoft.com
This adds the X509 attributes notBefore and notAfter to sslinfo
as well as pg_stat_ssl to allow verifying and identifying the
validity period of the current client certificate.
Author: Cary Huang <cary.huang@highgo.ca>
Discussion: https://postgr.es/m/182b8565486.10af1a86f158715.2387262617218380588@highgo.ca
This essentially removes the JsonbTypeCategory enum and
jsonb_categorize_type() and integrates any jsonb-specific logic that
was in jsonb_categorize_type() into json_categorize_type(), now
moved to jsonfuncs.c. The remaining JsonTypeCategory enum and
json_categorize_type() cover the needs of the callers in both json.c
and jsonb.c. json_categorize_type() has grown a new parameter named
is_jsonb for callers to engage the jsonb-specific behavior of
json_categorize_type().
One notable change in the now exported API of json_categorize_type()
is that it now always returns *outfuncoid even though a caller may
have no need currently to see one.
This is in preparation of later commits to implement additional
SQL/JSON functions.
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
Based on how postgres.h foes the Oid <-> Datum conversion, there is no
existing bugs but let's be consistent. 17 spots have been noticed as
incorrectly passing down Oids rather than Datums. Aleksander got one,
Zhang two and I the rest.
Author: Michael Paquier, Aleksander Alekseev, Zhang Mingli
Discussion: https://postgr.es/m/ZLUhqsqQN1MOaxdw@paquier.xyz
b5913f6120 added a parenthesized syntax for CLUSTER, but it
requires specifying a table name. This is unlike commands such as
VACUUM and ANALYZE, which do not require specifying a table in the
parenthesized syntax. This change resolves this inconsistency.
This is preparatory work for a follow-up commit that will move the
unparenthesized syntax to the "Compatibility" section of the
CLUSTER documentation.
Reviewed-by: Melanie Plageman, Michael Paquier
Discussion: https://postgr.es/m/CAAKRu_bc5uHieG1976kGqJKxyWtyQt9yvktjsVX%2Bi7NOigDjOA%40mail.gmail.com
This change moves the unparenthesized syntax for CLUSTER to the end
of the ClusterStmt rules in preparation for a follow-up commit that
will move this syntax to the "Compatibility" section of the CLUSTER
documentation. The documentation for the CLUSTER syntaxes has also
been consolidated.
Suggested-by: Melanie Plageman
Discussion https://postgr.es/m/CAAKRu_bc5uHieG1976kGqJKxyWtyQt9yvktjsVX%2Bi7NOigDjOA%40mail.gmail.com
This has been missed in cb0cca1, noticed before buildfarm member koel
has been able to complain while poking at a different patch. Like the
other commit, backpatch all the way down to limit the odds of merge
conflicts.
Backpatch-through: 11
A crash in the middle of a checkpoint with some two-phase state data
already flushed to disk by this checkpoint could cause a follow-up crash
recovery to recover twice the same transaction, once from what has been
found in pg_twophase/ at the beginning of recovery and a second time
when replaying its corresponding record.
This would lead to FATAL failures in the startup process during
recovery, where the same transaction would have a state recovered twice
instead of once:
LOG: recovering prepared transaction 731 from shared memory
LOG: recovering prepared transaction 731 from shared memory
FATAL: lock ExclusiveLock on object 731/0/0 is already held
This issue is fixed by skipping the addition of any 2PC state coming
from a record whose equivalent 2PC state file has already been loaded in
TwoPhaseState at the beginning of recovery by restoreTwoPhaseData(),
which is OK as long as the system has not reached a consistent state.
The timing to get a messed up recovery processing is very racy, and
would very unlikely happen. The thread that has reported the issue has
demonstrated the bug using injection points to force a PANIC in the
middle of a checkpoint.
Issue introduced in 728bd99, so backpatch all the way down.
Reported-by: "suyu.cmj" <mengjuan.cmj@alibaba-inc.com>
Author: "suyu.cmj" <mengjuan.cmj@alibaba-inc.com>
Author: Michael Paquier
Discussion: https://postgr.es/m/109e6994-b971-48cb-84f6-829646f18b4c.mengjuan.cmj@alibaba-inc.com
Backpatch-through: 11
Here we reduce the block size fields in AllocSetContext, GenerationContext
and SlabContext from Size down to uint32. Ever since c6e0fe1f2, blocks
for non-dedicated palloc chunks can no longer be larger than 1GB, so
there's no need to store the various block size fields as 64-bit values.
32 bits are enough to store 2^30.
Here we also further reduce the memory context struct sizes by getting rid
of the 'keeper' field which stores a pointer to the context's keeper
block. All the context types which have this field always allocate the
keeper block in the same allocation as the memory context itself, so the
keeper block always comes right at the end of the context struct. Add
some macros to calculate that address rather than storing it in the
context.
Overall, in AllocSetContext and GenerationContext, this saves 20 bytes on
64-bit builds which for ALLOCSET_SMALL_SIZES can sometimes mean the
difference between having to allocate a 2nd block and storing all the
required allocations on the keeper block alone. Such contexts are used
in relcache to store cache entries for indexes, of which there can be
a large number in a single backend.
Author: Melih Mutlu
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/CAGPVpCSOW3uJ1QJmsMR9_oE3X7fG_z4q0AoU4R_w+2RzvroPFg@mail.gmail.com
If the plan itself is parallel-safe, and the initPlans are too,
there's no reason anymore to prevent the plan from being marked
parallel-safe. That restriction (dating to commit ab77a5a45) was
really a special case of the fact that we couldn't transmit subplans
to parallel workers at all. We fixed that in commit 5e6d8d2bb and
follow-ons, but this case never got addressed.
We still forbid attaching initPlans to a Gather node that's
inserted pursuant to debug_parallel_query = regress. That's because,
when we hide the Gather from EXPLAIN output, we'd hide the initPlans
too, causing cosmetic regression diffs. It seems inadvisable to
kluge EXPLAIN to the extent required to make the output look the
same, so just don't do it in that case.
Along the way, this also takes care of some sloppiness about updating
path costs to match when we move initplans from one place to another
during createplan.c and setrefs.c. Since all the planning decisions
are already made by that point, this is just cosmetic; but it seems
good to keep EXPLAIN output consistent with where the initplans are.
The diff in query_planner() might be worth remarking on. I found that
one because after fixing things to allow parallel-safe initplans, one
partition_prune test case changed plans (as shown in the patch) ---
but only when debug_parallel_query was active. The reason proved to
be that we only bothered to mark Result nodes as potentially
parallel-safe when debug_parallel_query is on. This neglects the fact
that parallel-safety may be of interest for a sub-query even though
the Result itself doesn't parallelize.
Discussion: https://postgr.es/m/1129530.1681317832@sss.pgh.pa.us
We are capable of optimizing MIN() and MAX() aggregates on indexed
columns into subqueries that exploit the index, rather than the normal
thing of scanning the whole table. When we do this, we replace the
Aggref node(s) with Params referencing subquery outputs. Such Params
really ought to be included in the per-plan-node extParam/allParam
sets computed by SS_finalize_plan. However, we've never done so
up to now because of an ancient implementation choice to perform
that substitution during set_plan_references, which runs after
SS_finalize_plan, so that SS_finalize_plan never sees these Params.
This seems like clearly a bug, yet there have been no field reports
of problems that could trace to it. This may be because the types
of Plan nodes that could contain Aggrefs do not have any of the
rescan optimizations that are controlled by extParam/allParam.
Nonetheless it seems certain to bite us someday, so let's fix it
in a self-contained patch that can be back-patched if we find a
case in which there's a live bug pre-v17.
The cleanest fix would be to perform a separate tree walk to do
these substitutions before SS_finalize_plan runs. That seems
unattractive, first because a whole-tree mutation pass is expensive,
and second because we lack infrastructure for visiting expression
subtrees in a Plan tree, so that we'd need a new function knowing
as much as SS_finalize_plan knows about that. I also considered
swapping the order of SS_finalize_plan and set_plan_references,
but that fell foul of various assumptions that seem tricky to fix.
So the approach adopted here is to teach SS_finalize_plan itself
to check for such Aggrefs. I refactored things a bit in setrefs.c
to avoid having three copies of the code that does that.
Given the lack of any currently-known bug, no test case here.
Discussion: https://postgr.es/m/2391880.1689025003@sss.pgh.pa.us
Before, if you went past about 64M array elements in array_agg() and
allied functions, you got a generic "invalid memory alloc request
size" error. This patch replaces that with "array size exceeds the
maximum allowed", which seems more user-friendly since it points you
to needing to reduce the size of your array result. (This is the
same error text you'd get from construct_md_array in the event of
overrunning the maximum physical size for the finished array.)
Per question from Shaozhong Shi. Since this hasn't come up often,
I don't feel a need to back-patch.
Discussion: https://postgr.es/m/CA+i5JwYtVS9z2E71PcNKAVPbOn4R2wuj-LqbJsYr_XOz73q7dQ@mail.gmail.com
In a61b1f7482, we failed to update transformFromClauseItem() and
buildNSItemFromLists() to set ParseNamespaceItem.p_perminfo causing
it to point to garbage.
Pointed out by Tom Lane.
Reported-by: Farias de Oliveira <matheusfarias519@gmail.com>
Discussion: https://postgr.es/m/3173476.1689286373%40sss.pgh.pa.us
Backpatch-through: 16
Presently, the privilege check for SET SESSION AUTHORIZATION checks
whether the original authenticated role was a superuser at
connection start time. Even if the role loses the superuser
attribute, its existing sessions are permitted to change session
authorization to any role.
This commit modifies this privilege check to verify the original
authenticated role currently has superuser. In the event that the
authenticated role loses superuser within a session authorization
change, the authorization change will remain in effect, which means
the user can still take advantage of the target role's privileges.
However, [RE]SET SESSION AUTHORIZATION will only permit switching
to the original authenticated role.
Author: Joseph Koshakow
Discussion: https://postgr.es/m/CAAvxfHc-HHzONQ2oXdvhFF9ayRnidPwK%2BfVBhRzaBWYYLVQL-g%40mail.gmail.com
Presently, the privilege check for SET SESSION AUTHORIZATION is
performed in session_authorization's assign_hook. A relevant
comment states, "It's OK because the check does not require catalog
access and can't fail during an end-of-transaction GUC
reversion..." However, we plan to add a catalog lookup to this
privilege check in a follow-up commit.
This commit moves this privilege check to the check_hook for
session_authorization. Like check_role(), we do not throw a hard
error for insufficient privileges when the source is PGC_S_TEST.
Author: Joseph Koshakow
Discussion: https://postgr.es/m/CAAvxfHc-HHzONQ2oXdvhFF9ayRnidPwK%2BfVBhRzaBWYYLVQL-g%40mail.gmail.com
Commit 89e46da5e5 allowed using BTREE indexes that are neither
PRIMARY KEY nor REPLICA IDENTITY on the subscriber during apply of
update/delete. This patch extends that functionality to also allow HASH
indexes.
We explored supporting other index access methods as well but they don't
have a fixed strategy for equality operation which is required by the
current infrastructure in logical replication to scan the indexes.
Author: Kuroda Hayato
Reviewed-by: Peter Smith, Onder Kalaci, Amit Kapila
Discussion: https://postgr.es/m/TYAPR01MB58669D7414E59664E17A5827F522A@TYAPR01MB5866.jpnprd01.prod.outlook.com
RelationReloadIndexInfo() is a fast-path used for index reloads in the
relation cache, and it has always forgotten about updating
indisreplident, which is something that would happen after an index is
selected for a replica identity. This can lead to incorrect cache
information provided when executing a command in a transaction context
that updates indisreplident.
None of the code paths currently on HEAD that need to check upon
pg_index.indisreplident fetch its value from the relation cache, always
relying on a fresh copy on the syscache. Unfortunately, this may not be
the case of out-of-core code, that could see out-of-date value.
Author: Shruthi Gowda
Reviewed-by: Robert Haas, Dilip Kumar, Michael Paquier
Discussion: https://postgr.es/m/CAASxf_PBcxax0wW-3gErUyftZ0XrCs3Lrpuhq4-Z3Fak1DoW7Q@mail.gmail.com
Backpatch-through: 11
indisvalid is switched to true for partitioned indexes when all its
partitions have valid indexes when attaching a new partition, up to the
top-most parent if all its leaves are themselves valid when dealing with
multiple layers of partitions.
The copy of the tuple from pg_index used to switch indisvalid to true
came from the relation cache, which is incorrect. Particularly, in the
case reported by Shruthi Gowda, executing a series of commands in a
single transaction would cause the validation of partitioned indexes to
use an incorrect version of a pg_index tuple, as indexes are reloaded
after an invalidation request with RelationReloadIndexInfo(), a much
faster version than a full index cache rebuild. In this case, the
limited information updated in the cache leads to an incorrect version
of the tuple used. One of the symptoms reported was the following
error, with a replica identity update, for instance:
"ERROR: attempted to update invisible tuple"
This is incorrect since 8b08f7d, so backpatch all the way down.
Reported-by: Shruthi Gowda
Author: Michael Paquier
Reviewed-by: Shruthi Gowda, Dilip Kumar
Discussion: https://postgr.es/m/CAASxf_PBcxax0wW-3gErUyftZ0XrCs3Lrpuhq4-Z3Fak1DoW7Q@mail.gmail.com
Backpatch-through: 11
The "fsync" level already flushes drive write caches on Windows (as does
"fdatasync"), so it only confuses matters to have an apparently higher
level that isn't actually different at all.
That leaves "fsync_writethrough" only for macOS, where it actually does
something different.
Reviewed-by: Magnus Hagander <magnus@hagander.net>
Discussion: https://postgr.es/m/CA%2BhUKGJ2CG2SouPv2mca2WCTOJxYumvBARRcKPraFMB6GSEMcA%40mail.gmail.com
The contents of the line whose parsing failed was not reported in the
error message produced by generate-wait_event_types.pl, making harder
than necessary the debugging of incorrectly-shaped entries in the file.
Reported-by: Andres Freund
Discussion: https://postgr.es/m/ZK9S3jFEV1X797Ll@paquier.xyz
The double quotes used for the wait event names are not required, as
the values quoted are made of single words. The files generated by
generate-wait_event_types.pl (pgstat_wait_event.c, wait_event_types.h
and wait_event_types.sgml) are exactly the same before and after this
commit, hence the wait event names and the enum elements have the same
names as before.
Discussion: https://postgr.es/m/ZK9S3jFEV1X797Ll@paquier.xyz
Until now, when DROP DATABASE got interrupted in the wrong moment, the removal
of the pg_database row would also roll back, even though some irreversible
steps have already been taken. E.g. DropDatabaseBuffers() might have thrown
out dirty buffers, or files could have been unlinked. But we continued to
allow connections to such a corrupted database.
To fix this, mark databases invalid with an in-place update, just before
starting to perform irreversible steps. As we can't add a new column in the
back branches, we use pg_database.datconnlimit = -2 for this purpose.
An invalid database cannot be connected to anymore, but can still be
dropped.
Unfortunately we can't easily add output to psql's \l to indicate that some
database is invalid, it doesn't fit in any of the existing columns.
Add tests verifying that a interrupted DROP DATABASE is handled correctly in
the backend and in various tools.
Reported-by: Evgeny Morozov <postgresql3@realityexists.net>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20230509004637.cgvmfwrbht7xm7p6@awork3.anarazel.de
Discussion: https://postgr.es/m/20230314174521.74jl6ffqsee5mtug@awork3.anarazel.de
Backpatch: 11-, bug present in all supported versions
When vac_truncate_clog() encounters bogus datfrozenxid / datminmxid values, it
returns early. Unfortunately, until now, it did not release
WrapLimitsVacuumLock. If the backend later tries to acquire
WrapLimitsVacuumLock, the session / autovacuum worker hangs in an
uncancellable way. Similarly, other sessions will hang waiting for the
lock. However, if the backend holding the lock exited or errored out for some
reason, the lock was released.
The bug was introduced as a side effect of 566372b3d6.
It is interesting that there are no production reports of this problem. That
is likely due to a mix of bugs leading to bogus values having gotten less
common, process exit releasing locks and instances of hangs being hard to
debug for "normal" users.
Discussion: https://postgr.es/m/20230621221208.vhsqgduwfpzwxnpg@awork3.anarazel.de
Commit 89e46da5e allowed REPLICA IDENTITY FULL tables to use an index
on the subscriber during apply of update/delete. This commit clarifies
in the documentation that the leftmost field of candidate indexes must
be a column (not an expression) that references the published relation
column.
The source code comments are also updated accordingly.
Reviewed-by: Peter Smith, Amit Kapila
Discussion: https://postgr.es/m/CAD21AoDJjffEvUFKXT27Q5U8-UU9JHv4rrJ9Ke8Zkc5UPWHLvA@mail.gmail.com
Backpatch-through: 16
A CaseTestExpr is currently being put into
JsonValueExpr.formatted_expr as placeholder for the result of
evaluating JsonValueExpr.raw_expr, which in turn is evaluated
separately. Though, there's no need for this indirection if
raw_expr itself can be embedded into formatted_expr and evaluated
as part of evaluating the latter, especially as there is no
special reason to evaluate it separately. So this commit makes it
so. As a result, JsonValueExpr.raw_expr no longer needs to be
evaluated in ExecInterpExpr(), eval_const_exprs_mutator() etc. and
is now only used for displaying the original "unformatted"
expression in ruleutils.c.
While at it, this also removes the function makeCaseTestExpr(),
because the code in makeJsonConstructorExpr() looks more readable
without it IMO and isn't used by anyone else either.
Finally, a note is added in the comment above CaseTestExpr's
definition that JsonConstructorExpr is also using it.
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
The first check on the enum values was not necessary as the values set
in wait_event_names.txt for the classes LWLock and Lock were able to
satisfy the check. The second check when generating the C and header
files is now based on a match of the class names, making it simpler to
understand.
Author: Masahiro Ikeda, Michael Paquier
Discussion: https://postgr.es/m/eaf82a85c0ef1b55dc3b651d3f7b867a@oss.nttdata.com
The special handling of negative attribute numbers in
ATExecAddColumn() was introduced to support SET WITH OIDS (commit
6d1e361852). But that feature doesn't exist anymore, so we can revert
to the previous, simpler version. In passing, also remove an obsolete
comment about OID support.
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org
Previously we only allowed unique B-tree constraints on partitions
(and only if the constraint included all the partition keys). But we
could allow exclusion constraints with the same restriction. We also
require that those columns be compared for equality, not something
like &&.
Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/ec8b1d9b-502e-d1f8-e909-1bf9dffe6fa5@illuminatedcomputing.com
This commit adds two columns: indexes_total and indexes_processed, to
pg_stat_progress_vacuum system view to show the index vacuum
progress. These numbers are reported in the "vacuuming indexes" and
"cleaning up indexes" phases.
This uses the new parallel message type for progress reporting added
by be06506e7.
Bump catversion because this changes the definition of
pg_stat_progress_vacuum.
Author: Sami Imseih
Reviewed by: Masahiko Sawada, Michael Paquier, Nathan Bossart, Andres Freund
Discussion: https://www.postgresql.org/message-id/flat/5478DFCD-2333-401A-B2F0-0D186AB09228@amazon.com
This commit adds a new type of parallel message 'P' to allow a
parallel worker to poke at a leader to update the progress.
Currently it supports only incremental progress reporting but it's
possible to allow for supporting of other backend progress APIs in the
future.
There are no users of this new message type as of this commit. That
will follow in future commits.
Idea from Andres Freund.
Author: Sami Imseih
Reviewed by: Michael Paquier, Masahiko Sawada
Discussion: https://www.postgresql.org/message-id/flat/5478DFCD-2333-401A-B2F0-0D186AB09228@amazon.com
Windows has similar functions with leading underscores. Previously, we
provided the rename via a macro in win32_port.h. In fact its functions
are not always good replacements for the Unix functions, since they
can't deal with UTF-8. They are only currently used by pg_locale.c,
which is careful to redirect to other Windows routines for UTF-8. Given
that portability hazard, it seem unlikely to be a good idea to encourage
any other code to think of these functions as being available outside
pg_locale.c. Any code that thinks it wants these functions probably
wants our wchar2char() or char2wchar() routines instead, or it won't
actually work on Windows in UTF-8 databases.
Furthermore, some major libc implementations including glibc don't have
them (they only have the standard variants without _l), so external code
is very unlikely to require them to exist.
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CA%2BhUKG%2Bt_CHPzEoPnKyARJBJgE9-GxNajJo6ZuSfRK_KWFO%2B6w%40mail.gmail.com
Since PostgresMain calls sigsetjmp, any local variables that are not
marked "volatile" have a risk of unspecified behavior. In practice
this means that when control returns via longjmp, such variables might
get reset to their values as of the time of sigsetjmp, depending on
whether the compiler chose to put them in registers or on the stack.
We were careful about this for "send_ready_for_query", but not the
other local variables.
In the case of the timeout_enabled flags, resetting them to
their initial "false" states is actually good, since we do
"disable_all_timeouts()" in the longjmp cleanup code path. If that
does not happen, we risk uselessly calling "disable_timeout()" later,
which is harmless but a little bit expensive. Let's explicitly reset
these flags so that the behavior is correct and platform-independent.
(This change means that we really don't need the new "volatile"
markings after all, but let's install them anyway since any change
in this logic could re-introduce a problem.)
There is no issue for "firstchar" and "input_message" because those
are explicitly reinitialized each time through the query processing
loop. To make that clearer, move them to be declared inside the loop.
That leaves us with all the function-lifespan locals except the
sigjmp_buf itself marked as volatile, which seems like a good policy
to have going forward.
Because of the possibility of extra disable_timeout() calls, this
seems worth back-patching.
Sergey Shinderuk and Tom Lane
Discussion: https://postgr.es/m/2eda015b-7dff-47fd-d5e2-f1a9899b90a6@postgrespro.ru