Ordinarily the functions called in this loop ought to have plenty
of CFIs themselves; but we've now seen a case where no such CFI is
reached, making the loop uninterruptible. Even though that's from
a recently-introduced bug, it seems prudent to install a CFI at
the loop level in all branches.
Per discussion of bug #17558 from Andrew Kesper (an actual fix for
that bug will follow).
Discussion: https://postgr.es/m/17558-3f6599ffcf52fd4a@postgresql.org
The use of "we" when referring to the active backend might be
misunderstood, so rephrase to make it clearer who is performing
the actions discussed in the comment.
Author: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Erikjan Rijkers <er@xs4all.nl>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAEG8a3LRSMqkvjiURiJoSi4aGWORpiXUmUfQQK5PaD6WfPzu3w@mail.gmail.com
Commit fac1b470a thought we could check for set-returning functions
by testing only the top-level node in an expression tree. This is
wrong in itself, and to make matters worse it encouraged others
to make the same mistake, by exporting tlist.c's special-purpose
IS_SRF_CALL() as a widely-visible macro. I can't find any evidence
that anyone's taken the bait, but it was only a matter of time.
Use expression_returns_set() instead, and stuff the IS_SRF_CALL()
genie back in its bottle, this time with a warning label. I also
added a couple of cross-reference comments.
After a fair amount of fooling around, I've despaired of making
a robust test case that exposes the bug reliably, so no test case
here. (Note that the test case added by fac1b470a is itself
broken, in that it doesn't notice if you remove the code change.
The repro given by the bug submitter currently doesn't fail either
in v15 or HEAD, though I suspect that may indicate an unrelated bug.)
Per bug #17564 from Martijn van Oosterhout. Back-patch to v13,
as the faulty patch was.
Discussion: https://postgr.es/m/17564-c7472c2f90ef2da3@postgresql.org
Previously, a byte with the high bit set was just transmitted
as-is by charin() and charout(). This is problematic if the
database encoding is multibyte, because the result of charout()
won't be validly encoded, which breaks various stuff that
expects all text strings to be validly encoded. We've
previously decided to enforce encoding validity rather than try
to individually harden each place that might have a problem with
such strings, so it's time to do something about "char".
To fix, represent high-bit-set characters as \ooo (backslash
and three octal digits), following the ancient "escape" format
for bytea. charin() will continue to accept the old way as well,
though that is only reachable in single-byte encodings.
Add some test cases just so there is coverage for this code.
We'll otherwise leave this question undocumented as it was before,
because we don't really want to encourage end-user use of "char".
For the moment, back-patch into v15 so that this change appears
in 15beta3. If there's not great pushback we should consider
absorbing this change into the older branches.
Discussion: https://postgr.es/m/2318797.1638558730@sss.pgh.pa.us
ORDER BY / DISTINCT aggreagtes have, since implemented in Postgres, been
executed by always performing a sort in nodeAgg.c to sort the tuples in
the current group into the correct order before calling the transition
function on the sorted tuples. This was not great as often there might be
an index that could have provided pre-sorted input and allowed the
transition functions to be called as the rows come in, rather than having
to store them in a tuplestore in order to sort them once all the tuples
for the group have arrived.
Here we change the planner so it requests a path with a sort order which
supports the most amount of ORDER BY / DISTINCT aggregate functions and
add new code to the executor to allow it to support the processing of
ORDER BY / DISTINCT aggregates where the tuples are already sorted in the
correct order.
Since there can be many ORDER BY / DISTINCT aggregates in any given query
level, it's very possible that we can't find an order that suits all of
these aggregates. The sort order that the planner chooses is simply the
one that suits the most aggregate functions. We take the most strictly
sorted variation of each order and see how many aggregate functions can
use that, then we try again with the order of the remaining aggregates to
see if another order would suit more aggregate functions. For example:
SELECT agg(a ORDER BY a),agg2(a ORDER BY a,b) ...
would request the sort order to be {a, b} because {a} is a subset of the
sort order of {a,b}, but;
SELECT agg(a ORDER BY a),agg2(a ORDER BY c) ...
would just pick a plan ordered by {a} (we give precedence to aggregates
which are earlier in the targetlist).
SELECT agg(a ORDER BY a),agg2(a ORDER BY b),agg3(a ORDER BY b) ...
would choose to order by {b} since two aggregates suit that vs just one
that requires input ordered by {a}.
Author: David Rowley
Reviewed-by: Ronan Dunklau, James Coleman, Ranier Vilela, Richard Guo, Tom Lane
Discussion: https://postgr.es/m/CAApHDvpHzfo92%3DR4W0%2BxVua3BUYCKMckWAmo-2t_KiXN-wYH%3Dw%40mail.gmail.com
The select_outer_pathkeys_for_merge function made an attempt to build the
merge join pathkeys in the same order as query_pathkeys. This was done as
it may have led to no sort being required for an ORDER BY or GROUP BY
clause in the upper planner. However, this restriction seems overly
strict as it required that we match the query_pathkeys entirely or we
don't bother putting the merge join pathkeys in that order.
Here we relax this rule so that we use a prefix of the query_pathkeys
providing that prefix matches all of the join quals. This may provide the
upper planner with partially sorted input which will allow the use of
incremental sorts instead of full sorts.
Author: David Rowley
Reviewed-by: Richard Guo
Discussion: https://postgr.es/m/CAApHDvrtZu0PHVfDPFM4Yx3jNR2Wuwosv+T2zqa7LrhhBr2rRg@mail.gmail.com
Here we add code which detects when ExecFindPartition() continually finds
the same partition and add a caching layer to improve partition lookup
performance for such cases.
Both RANGE and LIST partitioned tables traditionally require a binary
search for the set of Datums that a partition needs to be found for. This
binary search is commonly visible in profiles when bulk loading into a
partitioned table. Here we aim to reduce the overhead of bulk-loading
into partitioned tables for cases where many consecutive tuples belong to
the same partition and make the performance of this operation closer to
what it is with a traditional non-partitioned table.
When we find the same partition 16 times in a row, the next search will
result in us simply just checking if the current set of values belongs to
the last found partition. For LIST partitioning we record the index into
the PartitionBoundInfo's datum array. This allows us to check if the
current Datum is the same as the Datum that was last looked up. This
means if any given LIST partition supports storing multiple different
Datum values, then the caching only works when we find the same value as
we did the last time. For RANGE partitioning we simply check if the given
Datums are in the same range as the previously found partition.
We store the details of the cached partition in PartitionDesc (i.e.
relcache) so that the cached values are maintained over multiple
statements.
No caching is done for HASH partitions. The majority of the cost in HASH
partition lookups are in the hashing function(s), which would also have to
be executed if we were to try to do caching for HASH partitioned tables.
Since most of the cost is already incurred, we just don't bother. We also
don't do any caching for LIST partitions when we continually find the
values being looked up belong to the DEFAULT partition. We've no
corresponding index in the PartitionBoundInfo's datum array for this case.
We also don't cache when we find the given values match to a LIST
partitioned table's NULL partition. This is so cheap that there's no
point in doing any caching for this. We also don't cache for a RANGE
partitioned table's DEFAULT partition.
There have been a number of different patches submitted to improve
partition lookups. Hou, Zhijie submitted a patch to detect when the value
belonging to the partition key column(s) were constant and added code to
cache the partition in that case. Amit Langote then implemented an idea
suggested by me to remember the last found partition and start to check if
the current values work for that partition. The final patch here was
written by me and was done by taking many of the ideas I liked from the
patches in the thread and redesigning other aspects.
Discussion: https://postgr.es/m/OS0PR01MB571649B27E912EA6CC4EEF03942D9%40OS0PR01MB5716.jpnprd01.prod.outlook.com
Author: Amit Langote, Hou Zhijie, David Rowley
Reviewed-by: Amit Langote, Hou Zhijie
I thought commit fd96d14d9 had plugged all the holes of this sort,
but no, function RTEs could produce oversize tuples too, either
via long coldeflists or just from multiple functions in one RTE.
(I'm pretty sure the other variants of base RTEs aren't a problem,
because they ultimately refer to either a table or a sub-SELECT,
whose widths are enforced elsewhere. But we explicitly allow join
RTEs to be overwidth, as long as you don't try to form their
tuple result.)
Per further discussion of bug #17561. As before, patch all branches.
Discussion: https://postgr.es/m/17561-80350151b9ad2ad4@postgresql.org
The code tried to access ARR_DIMS(v)[0] and ARR_LBOUND(v)[0]
whether or not those values exist. This made the range check
on the "n" argument unstable --- it might or might not fail, and
if it did it would report garbage for the allowed upper limit.
These bogus accesses would probably annoy Valgrind, and if you
were very unlucky even lead to SIGSEGV.
Report and fix by Martin Kalcher. Back-patch to v14 where this
function was added.
Discussion: https://postgr.es/m/baaeb413-b8a8-4656-5757-ef347e5ec11f@aboutsource.net
These flavors of ALTER TABLE were already shaped to report the
ObjectAddress of the partition attached or detached, but this data was
not added to what is collected for event triggers. The tests of
test_ddl_deparse are updated to show the modification in the data
reported.
Author: Hou Zhijie
Reviewed-by: Álvaro Herrera, Amit Kapila, Hayato Kuroda, Michael Paquier
Discussion: https://postgr.es/m/OS0PR01MB571626984BD099DADF53F38394899@OS0PR01MB5716.jpnprd01.prod.outlook.com
Two callers of generate_useful_gather_paths were testing the wrong
thing when deciding whether to call that function: they checked for
being at the top of the current join subproblem, rather than being at
the actual top join. This'd result in failing to construct parallel
paths for a sub-join for which they might be useful.
While set_rel_pathlist() isn't actively broken, it seems best to
make its identical-in-intention test for this be like the other two.
This has been wrong all along, but given the lack of field complaints
I'm hesitant to back-patch into stable branches; we usually prefer
to avoid non-bug-fix changes in plan choices in minor releases.
It seems not too late for v15 though.
Richard Guo, reviewed by Antonin Houska and Tom Lane
Discussion: https://postgr.es/m/CAMbWs4-mH8Zf87-w+3P2J=nJB+5OyicO28ia9q_9o=Lamf_VHg@mail.gmail.com
There wasn't an especially nice way to read all of a file while
passing missing_ok = true. Add an additional overloaded variant
to support that use-case.
While here, refactor the C code to avoid a rats-nest of PG_NARGS
checks, instead handling the argument collection in the outer
wrapper functions. It's a bit longer this way, but far more
straightforward.
(Upon looking at the code coverage report for genfile.c, I was
impelled to also add a test case for pg_stat_file() -- tgl)
Kyotaro Horiguchi
Discussion: https://postgr.es/m/20220607.160520.1984541900138970018.horikyota.ntt@gmail.com
A RowExpr with more than MaxTupleAttributeNumber columns would fail at
execution anyway, since we cannot form a tuple datum with more than that
many columns. While heap_form_tuple() has a check for too many columns,
it emerges that there are some intermediate bits of code that don't
check and can be driven to failure with sufficiently many columns.
Checking this at parse time seems like the most appropriate place to
install a defense, since we already check SELECT list length there.
While at it, make the SELECT-list-length error use the same errcode
(TOO_MANY_COLUMNS) as heap_form_tuple does, rather than the generic
PROGRAM_LIMIT_EXCEEDED.
Per bug #17561 from Egor Chindyaskin. The given test case crashes
in all supported branches (and probably a lot further back),
so patch all.
Discussion: https://postgr.es/m/17561-80350151b9ad2ad4@postgresql.org
Commit 9a974cbcba arranged to preserve
the relfilenode of user tables across pg_upgrade, but failed to notice
that pg_upgrade treats pg_largeobject as a user table and thus it needs
the same treatment. Otherwise, large objects will appear to vanish
after a pg_upgrade.
Commit d498e052b4 fixed this problem
by teaching pg_dump to UPDATE pg_class.relfilenode for pg_largeobject
and its index. However, because an UPDATE on the catalog rows doesn't
change anything on disk, this can leave stray files behind in the new
cluster. They will normally be empty, but it's a little bit untidy.
Hence, this commit arranges to do the same thing using DDL. Specifically,
it makes TRUNCATE work for the pg_largeobject catalog when in
binary-upgrade mode, and it then uses that command in binary-upgrade
dumps as a way of setting pg_class.relfilenode for pg_largeobject and
its index. That way, the old files are removed from the new cluster.
Discussion: http://postgr.es/m/CA+TgmoYYMXGUJO5GZk1-MByJGu_bB8CbOL6GJQC8=Bzt6x6vDg@mail.gmail.com
In the initial data sort, if the bucket numbers are the same then
next sort on the hash value. Because index pages are kept in
hash value order, this gains a little speed by allowing the
eventual tuple insertions to be done sequentially, avoiding repeated
data movement within PageAddItem. This seems to be good for overall
speedup of 5%-9%, depending on the incoming data.
Simon Riggs, reviewed by Amit Kapila
Discussion: https://postgr.es/m/CANbhV-FG-1ZNMBuwhUF7AxxJz3u5137dYL-o6hchK1V_dMw86g@mail.gmail.com
Commit b0a55e4329 missed a few places
where we are referring to the number used as a part of the relation
filename as an "OID". We now want to call that a "RelFileNumber".
Some of these places actually made it sound like the OID in question
is pg_class.oid rather than pg_class.relfilenode, which is especially
good to clean up.
Dilip Kumar with some editing by me.
Crash recovery on standby may encounter missing directories
when replaying database-creation WAL records. Prior to this
patch, the standby would fail to recover in such a case;
however, the directories could be legitimately missing.
Consider the following sequence of commands:
CREATE DATABASE
DROP DATABASE
DROP TABLESPACE
If, after replaying the last WAL record and removing the
tablespace directory, the standby crashes and has to replay the
create database record again, crash recovery must be able to continue.
A fix for this problem was already attempted in 49d9cfc68b, but it
was reverted because of design issues. This new version is based
on Robert Haas' proposal: any missing tablespaces are created
during recovery before reaching consistency. Tablespaces
are created as real directories, and should be deleted
by later replay. CheckRecoveryConsistency ensures
they have disappeared.
The problems detected by this new code are reported as PANIC,
except when allow_in_place_tablespaces is set to ON, in which
case they are WARNING. Apart from making tests possible, this
gives users an escape hatch in case things don't go as planned.
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Asim R Praveen <apraveen@pivotal.io>
Author: Paul Guo <paulguo@gmail.com>
Reviewed-by: Anastasia Lubennikova <lubennikovaav@gmail.com> (older versions)
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> (older versions)
Reviewed-by: Michaël Paquier <michael@paquier.xyz>
Diagnosed-by: Paul Guo <paulguo@gmail.com>
Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27DN5xWJ2P9-ROH16e4JUA@mail.gmail.com
Commit d8cd0c6c95 introduced a file
rename that could fail on Windows, probably due to other backends
having an open file handle to the old file of the same name.
Re-arrange the locking slightly to prevent that, by making sure the
open() and close() run while we hold the lock.
Thomas Munro. I added an explanatory comment.
Discussion: https://postgr.es/m/CA%2BhUKGLZtCTgp4NTWV-wGbR2Nyag71%3DEfYTKjDKnk%2BfkhuFMHw%40mail.gmail.com
GetSubscriptionRelations() and GetSubscriptionNotReadyRelations() share
mostly the same code, which scans pg_subscription_rel and fetches all
the relations of a given subscription. The only difference is that the
second routine looks for all the relations not in a ready state. This
commit refactors the code to use a single routine, shaving a bit of
code.
Author: Vignesh C
Reviewed-By: Kyotaro Horiguchi, Amit Kapila, Michael Paquier, Peter
Smith
Discussion: https://postgr.es/m/CALDaNm0eW-9g4G_EzHebnFT5zZoasWCS_EzZQ5BgnLZny9S=pg@mail.gmail.com
This commit puts the implementation of Tuple sort variants into the separate
file tuplesortvariants.c. That gives better separation of the code and
serves well as the demonstration that Tuple sort variant can be defined outside
of tuplesort.c.
Discussion: https://postgr.es/m/CAPpHfdvjix0Ahx-H3Jp1M2R%2B_74P-zKnGGygx4OWr%3DbUQ8BNdw%40mail.gmail.com
Author: Alexander Korotkov
Reviewed-by: Pavel Borisov, Maxim Orlov, Matthias van de Meent
Reviewed-by: Andres Freund, John Naylor
The new TuplesortPublic data structure contains the definition of
sort-variant-specific interface methods and the part of Tuple sort operation
state required by their implementations. This will let define Tuple sort
variants without knowledge of Tuplesortstate, that is without knowledge
of generic sort implementation guts.
Discussion: https://postgr.es/m/CAPpHfdvjix0Ahx-H3Jp1M2R%2B_74P-zKnGGygx4OWr%3DbUQ8BNdw%40mail.gmail.com
Author: Alexander Korotkov
Reviewed-by: Pavel Borisov, Maxim Orlov, Matthias van de Meent
Reviewed-by: Andres Freund, John Naylor
This commit puts some generic work away from sort-variant-specific function.
In particular, tuplesort_put*() now doesn't need to decrease available memory
and switch to sort context before calling puttuple_common(). writetup()
doesn't need to free SortTuple.tuple and increase available memory.
Discussion: https://postgr.es/m/CAPpHfdvjix0Ahx-H3Jp1M2R%2B_74P-zKnGGygx4OWr%3DbUQ8BNdw%40mail.gmail.com
Author: Alexander Korotkov
Reviewed-by: Pavel Borisov, Maxim Orlov, Matthias van de Meent
Reviewed-by: Andres Freund, John Naylor
Abbreviation code is very similar along tuplesort_put*() functions. This
commit unifies that code and puts it into puttuple_common(). tuplesort_put*()
functions differs in the abbreviation condition, so it has been added as an
argument to the puttuple_common() function.
Discussion: https://postgr.es/m/CAPpHfdvjix0Ahx-H3Jp1M2R%2B_74P-zKnGGygx4OWr%3DbUQ8BNdw%40mail.gmail.com
Author: Alexander Korotkov
Reviewed-by: Pavel Borisov, Maxim Orlov, Matthias van de Meent
Reviewed-by: Andres Freund, John Naylor
This commit is the preparation to move abbreviation logic into
puttuple_common(). The new removeabbrev function turns datum1 representation
of SortTuple's from the abbreviated key to the first column value. Therefore,
it encapsulates the differential part of abbreviation handling code in
tuplesort_put*() functions, making these functions similar.
Discussion: https://postgr.es/m/CAPpHfdvjix0Ahx-H3Jp1M2R%2B_74P-zKnGGygx4OWr%3DbUQ8BNdw%40mail.gmail.com
Author: Alexander Korotkov
Reviewed-by: Pavel Borisov, Maxim Orlov, Matthias van de Meent
Reviewed-by: Andres Freund, John Naylor
It's currently unclear how do we split functionality between
Tuplesortstate.copytup() function and tuplesort_put*() functions.
For instance, copytup_index() and copytup_datum() return error while
tuplesort_putindextuplevalues() and tuplesort_putdatum() do their work.
This commit removes Tuplesortstate.copytup() altogether, putting the
corresponding code into tuplesort_put*().
Discussion: https://postgr.es/m/CAPpHfdvjix0Ahx-H3Jp1M2R%2B_74P-zKnGGygx4OWr%3DbUQ8BNdw%40mail.gmail.com
Author: Alexander Korotkov
Reviewed-by: Pavel Borisov, Maxim Orlov, Matthias van de Meent
Reviewed-by: Andres Freund, John Naylor
XLogRecordBlockHeader, the header holding the information for the data
related to a block, tracks the length of the data appended to the WAL
record with data_length (uint16). This limitation in size was not
enforced by the public routine in charge of registering the data
assembled later to form the WAL record inserted, XLogRegisterBufData().
Incorrectly used, it could lead to the generation of records with some
of its data overflowed. This commit adds some safeguards to prevent
that for the block data, complaining immediately if attempting to add to
a record block information with a size larger than UINT16_MAX, which is
the limit implied by the internal logic.
Note that this also adjusts XLogRegisterData() and XLogRegisterBufData()
so as the length of the WAL record data given by the caller is unsigned,
matching with what gets stored in XLogRecData->len.
Extracted from a larger patch by the same author. The original patch
includes more protections when assembling a record in full that will be
looked at separately later.
Author: Matthias van de Meent
Reviewed-by: Andres Freund, Heikki Linnakangas, Michael Paquier, David
Zhang
Discussion: https://postgr.es/m/CAEze2WgGiw+LZt+vHf8tWqB_6VxeLsMeoAuod0N=ij1q17n5pw@mail.gmail.com
As before, we start by prepending one underscore (truncating the
base name if necessary). But if there is a conflict, then instead of
prepending more and more underscores, append an underscore and some
digits, in much the same way that ChooseRelationName does. While
the previous logic could be driven to fail by creating a lot of
types with long names differing only near the end, this version seems
certain enough to eventually succeed that we can remove the failure
code path that was there before.
While at it, undo 6df7a9698's decision to split this code out of
makeArrayTypeName. That wasn't actually accomplishing anything,
because no other function was using it --- and it would have been
wrong to do so. The convention that a prefix "_" means an array,
not something else, is too ancient to mess with.
Andrey Lepikhov and Dmitry Koval, reviewed by Masahiko Sawada and myself
Discussion: https://postgr.es/m/b84cd82c-cc67-198a-8b1c-60f44e1259ad@postgrespro.ru
The BoolGetDatum() call ended up in the wrong place. It should be
applied when we, err, want to convert a bool to a datum.
Thanks to Tom Lane for noticing this.
Discussion: http://postgr.es/m/2511599.1658861964@sss.pgh.pa.us
A bootstrap user who is not a superuser will still own many
important system objects, such as the pg_catalog schema, that
will likely allow that user to regain superuser status. Therefore,
allowing the superuser property to be removed from the superuser
creates a false perception of security where none exists.
Although removing superuser from the bootstrap user is also a bad idea
and should be considered unsupported in all released versions, no
back-patch, as this is a behavior change.
Discussion: http://postgr.es/m/CA+TgmoZirCwArJms_fgvLBFrC6b=HdxmG7iAhv+kt_=NBA7tEw@mail.gmail.com
We have a few commands that "can't run in a transaction block",
meaning that if they complete their processing but then we fail
to COMMIT, we'll be left with inconsistent on-disk state.
However, the existing defenses for this are only watertight for
simple query protocol. In extended protocol, we didn't commit
until receiving a Sync message. Since the client is allowed to
issue another command instead of Sync, we're in trouble if that
command fails or is an explicit ROLLBACK. In any case, sitting
in an inconsistent state while waiting for a client message
that might not come seems pretty risky.
This case wasn't reachable via libpq before we introduced pipeline
mode, but it's always been an intended aspect of extended query
protocol, and likely there are other clients that could reach it
before.
To fix, set a flag in PreventInTransactionBlock that tells
exec_execute_message to force an immediate commit. This seems
to be the approach that does least damage to existing working
cases while still preventing the undesirable outcomes.
While here, add some documentation to protocol.sgml that explicitly
says how to use pipelining. That's latent in the existing docs if
you know what to look for, but it's better to spell it out; and it
provides a place to document this new behavior.
Per bug #17434 from Yugo Nagata. It's been wrong for ages,
so back-patch to all supported branches.
Discussion: https://postgr.es/m/17434-d9f7a064ce2a88a3@postgresql.org
Presently, archive status files are durably renamed from .ready to
.done to indicate that a file has been archived. Persisting this
rename to disk accounts for a significant amount of the overhead
associated with archiving. While durably renaming the file
prevents re-archiving in most cases, archive commands and libraries
must already gracefully handle attempts to re-archive the last
archived file after a crash (e.g., a crash immediately after
archive_command exits but before the server renames the status
file).
This change reduces the amount of overhead associated with
archiving by using rename() instead of durable_rename() to rename
the archive status files. As a consequence, the server is more
likely to attempt to re-archive files after a crash, but as noted
above, archive commands and modules are already expected to handle
this. It is also possible that the server will attempt to re-
archive files that have been removed or recycled, but the archiver
already handles this, too.
Author: Nathan Bossart
Reviewed-by: Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/20220222011948.GA3850532@nathanxps13
Since a2c8499, HbaFileName (default pg_hba.conf) was getting used
instead of IdentFileName (default pg_ident.conf) as the parent file to
use as reference when parsing the contents of pg_ident.conf, with
pg_ident.conf correctly opened, when feeding this information to
pg_ident_file_mappings. This had two consequences:
- On an I/O error when reading pg_ident.conf, the user would get an
ERROR message referring to pg_hba.conf and not pg_ident.conf.
- When reading an external file with a relative path using '@' in
pg_ident.conf, the directory used to look at the file to load would be
the base directory of pg_hba.conf rather than the one of pg_ident.conf,
leading to errors in pg_ident_file_mappings inconsistent with what gets
loaded at startup when pg_ident.conf and pg_hba.conf are located in
different directories.
This error only impacted the SQL view pg_ident_file_mappings that uses a
logic new to v15 to fill the view with the parsed information, not the
code paths loading these authentication files at startup.
Author: Julien Rouhaud
Discussion: https://postgr.es/m/20220726050402.vsr6fmz7rsgpmdz3@jrouhaud
Backpatch-through: 15
This addresses a couple of bugs in the REINDEX grammar, introduced by
83011ce:
- A name was never specified for DATABASE/SYSTEM, even if the query
included one. This caused such REINDEX queries to always work with any
object name, but we should complain if the object name specified does
not match the name of the database we are connected to. A test is added
for this case in the main regression test suite, provided by Álvaro.
- REINDEX SYSTEM CONCURRENTLY [name] was getting rejected in the
parser. Concurrent rebuilds are not supported for catalogs but the
error provided at execution time is more helpful for the user, and
allowing this flavor results in a simplification of the parsing logic.
- REINDEX DATABASE CONCURRENTLY was rebuilding the index in a
non-concurrent way, as the option was not being appended correctly in
the list of DefElems in ReindexStmt (REINDEX (CONCURRENTLY) DATABASE was
working fine. A test is added in the TAP tests of reindexdb for this
case, where we already have a REINDEX DATABASE CONCURRENTLY query
running on a small-ish instance. This relies on the work done in
2cbc3c1 for SYSTEM, but here we check if the OIDs of the index relations
match or not after the concurrent rebuild. Note that in order to get
this part to work, I had to tweak the tests so as the index OID and
names are saved separately. This change not affect the reliability or
of the coverage of the existing tests.
While on it, I have implemented a tweak in the grammar to reduce the
parsing by one branch, simplifying things even more.
Author: Michael Paquier, Álvaro Herrera
Discussion: https://postgr.es/m/YttqI6O64wDxGn0K@paquier.xyz
Previously we did this after InitPostgres, at a somewhat randomly chosen
place within PostgresMain. However, since commit a0ffa885e doing this
outside a transaction can cause a crash, if we need to check permissions
while replacing a placeholder GUC. (Besides which, a preloaded library
could itself want to do database access within _PG_init.)
To avoid needing an additional transaction start/end in every session,
move the process_session_preload_libraries call to within InitPostgres's
transaction. That requires teaching the code not to call it when
InitPostgres is called from somewhere other than PostgresMain, since
we don't want session_preload_libraries to affect background workers.
The most future-proof solution here seems to be to add an additional
flag parameter to InitPostgres; fortunately, we're not yet very worried
about API stability for v15.
Doing this also exposed the fact that we're currently honoring
session_preload_libraries in walsenders, even those not connected to
any database. This seems, at minimum, a POLA violation: walsenders
are not interactive sessions. Let's stop doing that.
(All these comments also apply to local_preload_libraries, of course.)
Per report from Gurjeet Singh (thanks also to Nathan Bossart and Kyotaro
Horiguchi for review). Backpatch to v15 where a0ffa885e came in.
Discussion: https://postgr.es/m/CABwTF4VEpwTHhRQ+q5MiC5ucngN-whN-PdcKeufX7eLSoAfbZA@mail.gmail.com
It incorrectly used GetBufferDescriptor instead of
GetLocalBufferDescriptor, causing it to not find the correct buffer in
most cases, and performing an out-of-bounds memory read in the corner
case that temp_buffers > shared_buffers.
It also bumped the usage-count on the buffer, even if it was
previously pinned. That won't lead to crashes or incorrect results,
but it's different from what the shared-buffer case does, and
different from the usual code in LocalBufferAlloc. Fix that too, and
make the code ordering match LocalBufferAlloc() more closely, so that
it's easier to verify that it's doing the same thing.
Currently, ReadRecentBuffer() is only used with non-temp relations, in
WAL redo, so the broken code is currently dead code. However, it could
be used by extensions.
Backpatch-through: 14
Discussion: https://www.postgresql.org/message-id/2d74b46f-27c9-fb31-7f99-327a87184cc0%40iki.fi
Reviewed-by: Thomas Munro, Zhang Mingli, Richard Guo
This commit removes two arguments "report" and "whichChkpt"
in ReadCheckpointRecord().
"report" is obviously useless because it's always true, i.e., there are
two callers of the function and they always specify true as "report".
Commit 1d919de5eb removed the only call with "report" = false.
"whichChkpt" indicated where the specified checkpoint location
came from, pg_control or backup_label. This information was used
to report different error messages depending on where the invalid
checkpoint record came from, when it was found.
But ReadCheckpointRecord() doesn't need to do that because
its callers already do that and users can still identify where
the invalid checkpoint record came from, by reading such log messages.
Also when "whichChkpt" was 0, the word "primary checkpoint" was used
in the log message and could confuse users because the concept of
primary and secondary checkpoints was already removed before.
These are why this commit removes "whichChkpt" argument.
Author: Fujii Masao
Reviewed-by: Bharath Rupireddy, Kyotaro Horiguchi
Discussion: https://postgr.es/m/fa2e12eb-81c3-0717-0272-755f8a81c8f2@oss.nttdata.com
getrusage() is in SUSv2 and all targeted Unix systems have it.
Note that POSIX only covers ru_utime and ru_stime and we rely on many
more fields without any kind of configure probe, but that predates this
commit.
The only supported system we need replacement code for now is Windows,
and that can be done without a configure probe.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Greg Stark <stark@mit.edu>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
The dependency logic failed to register a column-level dependency
when a view or rule contains a reference to a specific column of
the result of a function-returning-composite. That meant you could
drop the column from the composite type, causing trouble for future
executions of the view. We've known about this for years, but never
summoned the energy to actually fix it, instead installing various
low-level defenses to prevent crashing on references to dropped columns.
We had to do that to plug the hole in stable branches, where there might
be pre-existing broken references; but let's fix the root cause today.
To do that, add some logic (borrowed from get_rte_attribute_is_dropped)
to find_expr_references_walker, to check whether a Var referencing an
RTE_FUNCTION RTE is referencing a column of a composite type, and if
so add the proper dependency.
However ... it seems mighty unwise to remove said low-level defenses,
since there could be other bugs now or in the future that allow
reaching them. By the same token, letting those defenses go untested
seems unwise. Hence, rather than just dropping the associated test
cases, hack them to continue working by the expedient of manually
dropping the pg_depend entries that this fix installs.
Back-patch into v15. I don't want to risk changing this behavior
in stable branches, but it seems not too late for v15. (Since
we have already forced initdb for beta3, we can be sure that all
production v15 installations will have these added dependencies.)
Discussion: https://postgr.es/m/182492.1658431155@sss.pgh.pa.us
Things like "opt_name" can well be shared by various commands rather
than there being multiple definitions of the same thing. Rename these
productions and move them to appear together in gram.y, which may
improve chances of reuse in the future.
Discussion: https://postgr.es/m/20220721174212.cmitjpuimx6ssyyj@alvherre.pgsql
Commit c6f2f016 added an explicit check for a Windows "junction point".
That turned out to be needed only because get_dirent_type() was busted
on Windows. It's been fixed by commit 9d3444dc, so remove it.
Add a TAP-test to demonstrate that in-place tablespaces are copied by
pg_basebackup. This exercises the codepath that would fail before
c6f2f016 on Windows, and shows that it still doesn't fail now that we're
using get_dirent_type() on both Windows and Unix.
Back-patch to 15, where in-place tablespaces arrived and caused this
problem (ie directories where previously only symlinks were expected).
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGLzLK4PUPx0_AwXEWXOYAejU%3D7XpxnYE55Y%2Be7hB2N3FA%40mail.gmail.com