7fcbf6a and 2ff6555 changed the function signature of XLogPageRead()
but did not update the comment.
XLogReaderRoutine contains up-to-date information about the API, so no
need to repeat all that at XLogPageRead(); just fix the mentions of the
function arguments that no longer exist.
This enhances the numeric type input function, adding support for
hexadecimal, octal, and binary integers of any size, up to the limits
of the numeric type.
Since 6fcda9aba8, such non-decimal integers have been accepted by the
parser as integer literals and passed through to numeric_in(). This
commit gives numeric_in() the ability to handle them.
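As a rough standalone sketch of the syntaxes now accepted (illustrative
only: the real numeric_in() accumulates digits into an arbitrary-precision
value rather than calling strtol()):

    #include <stdio.h>
    #include <stdlib.h>

    /*
     * Recognize the 0x/0o/0b prefixes the way the new input function
     * does; unlike this sketch, the real code has no fixed-width limit.
     */
    static long
    scan_integer(const char *s)
    {
        if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
            return strtol(s + 2, NULL, 16);
        if (s[0] == '0' && (s[1] == 'o' || s[1] == 'O'))
            return strtol(s + 2, NULL, 8);
        if (s[0] == '0' && (s[1] == 'b' || s[1] == 'B'))
            return strtol(s + 2, NULL, 2);
        return strtol(s, NULL, 10);
    }

    int
    main(void)
    {
        /* all three print 42 */
        printf("%ld %ld %ld\n", scan_integer("0x2A"),
               scan_integer("0o52"), scan_integer("0b101010"));
        return 0;
    }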
While at it, simplify the handling of NaN and infinities, reducing the
number of calls to pg_strncasecmp(), and arrange for pg_strncasecmp()
to not be called at all for regular numbers. This gives a significant
performance improvement for decimal inputs, more than offsetting the
small performance hit of checking for non-decimal input.
Discussion: https://postgr.es/m/CAEZATCV8XShnmT9HZy25C%2Bo78CVOFmUN5EM9FRAZ5xvYTggPMg%40mail.gmail.com
An upcoming patch by Melanie Plageman does some refactoring work in this
area. Run pgindent on that file now before making any changes so that
it's easier to maintain/evolve each of the individual patches doing the
refactor work. Additionally, add a few new required typedefs to the list
to make it easier to do future pgindent runs on this file during the
refactor work.
Discussion: https://postgr.es/m/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ@mail.gmail.com
Commit c4649cce39 removed the "shared" and "ntapes" arguments, but the
comment still talked about "shared". It also talked about "a shared
file handle", which was technically correct because even before commit
c4649cce39, the "shared file handle" referred to the "fileset"
argument, not "shared". But it was very confusing. Improve the
comment.
Also add a comment on what the "preallocate" argument does.
Backpatch to v15, just to make backpatching other patches easier in
the future.
Discussion: https://www.postgresql.org/message-id/af989685-91d5-aad4-8f60-1d066b5ec309@enterprisedb.com
Reviewed-by: Peter Eisentraut
This adds combine, serial and deserial functions for the array_agg() and
string_agg() aggregate functions, thus allowing these aggregates to
partake in partial aggregation. This allows both parallel aggregation to
take place when these aggregates are present and also allows additional
partition-wise aggregation plan shapes to include plans that require
additional aggregation once the partially aggregated results from the
partitions have been combined.
Author: David Rowley
Reviewed-by: Andres Freund, Tomas Vondra, Stephen Frost, Tom Lane
Discussion: https://postgr.es/m/CAKJS1f9sx_6GTcvd6TMuZnNtCh0VhBzhX6FZqw17TgVFH-ga_A@mail.gmail.com
Enforce wal_retrieve_retry_interval on a per-subscription basis,
rather than globally, and arrange to skip that delay in case of
an intentional worker exit. This probably makes little difference
in the field, where apply workers wouldn't be restarted often;
but it has a significant impact on the runtime of our logical
replication regression tests (even though those tests use
artificially-small wal_retrieve_retry_interval settings already).
Nathan Bossart, with mostly-cosmetic editorialization by me
Discussion: https://postgr.es/m/20221122004119.GA132961@nathanxps13
The motivation for this change is that when pg_dump dumps a
partitioned index that's marked REPLICA IDENTITY, it generates a
command sequence that applies REPLICA IDENTITY before the partitioned
index has been marked valid, causing restore to fail. We could
perhaps change pg_dump to not do it like that, but that would be
difficult and would not fix existing dump files with the problem.
There seems to be very little reason for the backend to disallow
this anyway --- the code ignores indisreplident when the index
isn't valid --- so instead let's fix it by allowing the case.
Commit 9511fb37a previously expressed a concern that allowing
indisreplident to be set on invalid indexes might allow us to
wind up in a situation where a table could have indisreplident
set on multiple indexes. I'm not sure I follow that concern
exactly, but in any case the only way that could happen is because
relation_mark_replica_identity is too trusting about the existing set
of markings being valid. Let's just rip out its early-exit code path
(which sure looks like premature optimization anyway; what are we
doing expending code to make redundant ALTER TABLE ... REPLICA
IDENTITY commands marginally faster and not-redundant ones marginally
slower?) and fix it to positively guarantee that no more than one
index is marked indisreplident.
The pg_dump failure can be demonstrated in all supported branches,
so back-patch all the way. I chose to back-patch 9511fb37a as well,
just to keep indisreplident handling the same in all branches.
Per bug #17756 from Sergey Belyashov.
Discussion: https://postgr.es/m/17756-dd50e8e0c8dd4a40@postgresql.org
When the length was too short, the server read outside the allocation.
That yielded the same log noise as sending the correct length with
(backendPID,cancelAuthCode) matching nothing. Change to a message about
the unexpected length. Given the attacker's lack of control over the
memory layout and the general lack of diversity in memory layouts at the
code in question, we doubt a would-be attacker could cause a segfault.
Hence, while the report arrived via security@postgresql.org, this is not
a vulnerability. Back-patch to v11 (all supported versions).
Andrey Borodin, reviewed by Tom Lane. Reported by Andrey Borodin.
These are all not necessary from a correctness POV. However, in the near
future instr_time will be simplified to an int64, at which point gcc would
otherwise start to warn about the changed places.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20230116023639.rn36vf6ajqmfciua@awork3.anarazel.de
This will ease a follow-up move that will generate this code
automatically. The C file is renamed, for consistency with the
node-related files whose code is generated by gen_node_support.pl:
- queryjumble.c -> queryjumblefuncs.c
- utils/queryjumble.h -> nodes/queryjumble.h
Per a suggestion from Peter Eisentraut.
Reviewed-by: Peter Eisentraut
Discussion: https://postgr.es/m/Y5BHOUhX3zTH/ig6@paquier.xyz
This provides a way to reserve connection slots for non-superusers.
The slots reserved via the new GUC are available only to users who
have the new predefined role pg_use_reserved_connections.
superuser_reserved_connections remains as a final reserve in case
reserved_connections has been exhausted.
Patch by Nathan Bossart. Reviewed by Tushar Ahuja and by me.
Discussion: http://postgr.es/m/20230119194601.GA4105788@nathanxps13
This is in preparation for adding a new reserved_connections GUC,
but aligning the GUC name with the variable name is also a good
idea on general principle.
Patch by Nathan Bossart. Reviewed by Tushar Ahuja and by me.
Discussion: http://postgr.es/m/20230119194601.GA4105788@nathanxps13
Commit ea92368cd1 made max_wal_senders
a separate pool of backends from max_connections, but the documentation
and error message for superuser_reserved_connections weren't updated
at the time, and as a result are somewhat misleading. Update.
This is arguably a back-patchable bug fix, but because it seems quite
minor, no back-patch.
Patch by Nathan Bossart. Reviewed by Tushar Ahuja and by me.
Discussion: http://postgr.es/m/20230119194601.GA4105788@nathanxps13
pg_hba.conf has long had support for non-literal username matches, and
this commit extends the capabilities supported for the PostgreSQL user
listed in an ident entry of pg_ident.conf, adding support for:
1. The "all" keyword, where all the requested users are allowed.
2. Membership checks using the + prefix.
3. Using a regex to match against multiple roles.
1. is a feature that has been requested by Jelte Fennema, 2. is something
that has been mentioned independently by Andrew Dunstan, and 3. is
something I came up with while discussing how to extend the first one,
whose implementation is facilitated by 8fea868.
This allows matching certain system users against many different
postgres users with a single line in pg_ident.conf. Without this, one
would need one line for each of the postgres users that a system user
can log in as, which can be cumbersome to maintain.
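For illustration, a hypothetical pg_ident.conf map using the new
patterns could look like this (map and role names are invented):

    # MAPNAME   SYSTEM-USERNAME   PG-USERNAME
    mymap       admin             all           # may connect as any role
    mymap       joe               +developers   # any member of developers
    mymap       ops               /^app\d+$     # regex over role names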
Tests are added to the TAP test of peer authentication to provide
coverage for all that.
Note that this introduces a set of backward-incompatible changes to be
able to detect the new patterns, for the following cases:
- A role named "all".
- A role prefixed with '+' characters, which is something that would not
have worked in HBA entries anyway.
- A role prefixed by a slash character, similarly to 8fea868.
Any of these can still be handled by using quotes in the Postgres
role defined in an ident entry.
A huge advantage of this change is that the code applies the same checks
for the Postgres roles in HBA and ident entries, via the common routine
check_role().
This compatibility change should be mentioned in the release notes.
Author: Jelte Fennema
Discussion: https://postgr.es/m/DBBPR83MB0507FEC2E8965012990A80D0F7FC9@DBBPR83MB0507.EURPRD83.prod.outlook.com
This adjusts a few places which were appending a string constant
containing spaces onto a StringInfo. We have appendStringInfoSpaces for
that job, so let's use that instead.
For the change to jsonb.c's add_indent() function, appendStringInfoString
was being called inside a loop to append 4 spaces on each iteration,
which meant that enlargeStringInfo would get called once per iteration.
It is much more efficient to get rid of the loop, calculate the number
of spaces as "level * 4", and append all the spaces in one go.
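A sketch of that change, assuming the usual StringInfo API (the
variable names are illustrative, not the exact jsonb.c code):

    /* Before: one append (and possible enlarge) per iteration */
    for (int i = 0; i < level; i++)
        appendStringInfoString(out, "    ");

    /* After: compute the total up front and append the spaces at once */
    appendStringInfoSpaces(out, level * 4);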
Here we additionally adjust the appendStringInfoSpaces function so it
makes use of memset rather than a while loop to apply the required spaces
to the StringInfo. One of the problems with the while loop was that it
was incrementing one variable and decrementing another variable once per
loop. That's more work than what's required to get the job done. We may
as well use memset for this rather than trying to optimize the existing
loop. Some testing has shown memset is faster even for very small sizes.
Discussion: https://postgr.es/m/CAApHDvp_rKkvwudBKgBHniNRg67bzXVjyvVKfX0G2zS967K43A@mail.gmail.com
This patch largely reverts what I did in commits c9b0c678d and
78e73e875. The maximum cover length limit that I added in 78e73e875
(to band-aid over c9b0c678d's performance issues) creates too many
user-visible behavior discrepancies, as complained of for example in
bug #17691. The real problem with hlCover() is not what I thought
at the time, but more that it seems to have been designed with only
AND tsquery semantics in mind. It doesn't work quite right for OR,
and even less so for NOT or phrase queries. However, we can improve
that situation by building a variant of TS_execute() that returns a
list of match locations. We already get an ExecPhraseData struct
representing match locations for the primitive case of a simple match,
as well as one for a phrase match; we just need to add some logic to
combine these for AND and OR operators. The result is a list of
ExecPhraseDatas, which hlCover can regard as having simple AND
semantics, so that its old algorithm works correctly.
There's still a lot not to like about ts_headline's behavior, but
I think the remaining issues have to do with the heuristics used
in mark_hl_words and mark_hl_fragments (which, likewise, were not
revisited when phrase search was added). Improving those is a task
for another day.
Patch by me; thanks to Alvaro Herrera for review.
Discussion: https://postgr.es/m/840.1669405935@sss.pgh.pa.us
When ending recovery based on recovery_target_xid matching with
recovery_target_inclusive = off, we printed an incorrect timestamp
(always 2000-01-01) in the "recovery stopping before ... transaction"
log message. This is a consequence of sloppy refactoring in
c945af80c: the code to fetch recordXtime out of the commit/abort
record used to be executed unconditionally, but it was changed
to get called only in the RECOVERY_TARGET_TIME case. We need only
flip the order of operations to restore the intended behavior.
Per report from Torsten Förtsch. Back-patch to all supported
branches.
Discussion: https://postgr.es/m/CAKkG4_kUevPqbmyOfLajx7opAQk6Cvwkvx0HRcFjSPfRPTXanA@mail.gmail.com
RelOptInfo.userid is the same for all relations in a given inheritance
tree, so the code in examine_variable() and examine_simple_variable()
that repeats the ACL checks on the root parent rel instead of a given
leaf child relation need not recompute userid each time.
Author: Amit Langote <amitlangote09@gmail.com>
Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20221210201753.GA27893@telsasoft.com
This has been wrong since 88e9823, which switched the WAL sizing
configuration from checkpoint_segments to min_wal_size and
max_wal_size. That change missed the recalculation, on reload for
example, of the internal value "CheckPointSegments", which works as a
mapping of the old GUC checkpoint_segments and controls the timing of
checkpoints depending on the volume of WAL generated.
Most users tend to leave checkpoint_completion_target at 0.9 to smooth
the I/O workload, which is likely why this has gone unnoticed for so
long; still, it can be useful to tweak and reload the value dynamically
in some cases to control the timing of checkpoints.
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACXgPPAm28mruojSBno+F_=9cTOOxHAywu_dfZPeBdybQw@mail.gmail.com
Backpatch-through: 11
In c8ad4d8166 dlist_member_check()'s arguments were made const. Unfortunately
the implementation of dlist_member_check() used dlist_foreach(), which
currently doesn't work for const lists.
As a workaround, open-code the list iteration. The other check functions
already do so.
Discussion: https://postgr.es/m/20230118182214.co7dp4oahiunwg57@awork3.anarazel.de
The rule system needs "old" and/or "new" pseudo-RTEs in rule actions
that are ON INSERT/UPDATE/DELETE. Historically it's put such entries
into the ON SELECT rules of views as well, but those are really quite
vestigial. The only thing we've used them for is to carry the
view's relid forward to AcquireExecutorLocks (so that we can
re-lock the view to verify it hasn't changed before re-using a plan)
and to carry its relid and permissions data forward to execution-time
permissions checks. What we can do instead of that is to retain
these fields of the RTE_RELATION RTE for the view even after we
convert it to an RTE_SUBQUERY RTE. This requires a tiny amount of
extra complication in the planner and AcquireExecutorLocks, but on
the other hand we can get rid of the logic that moves that data from
one place to another.
The principal immediate benefit of doing this, aside from a small
saving in the pg_rewrite data for views, is that these pseudo-RTEs
no longer trigger ruleutils.c's heuristic about qualifying variable
names when the rangetable's length is more than 1. That results
in quite a number of small simplifications in regression test outputs,
which are all to the good IMO.
Bump catversion because we need to dump a few more fields of
RTE_SUBQUERY RTEs. While those will always be zeroes anyway in
stored rules (because we'd never populate them until query rewrite)
they are useful for debugging, and it seems like we'd better make
sure to transmit such RTEs accurately in plans sent to parallel
workers. I don't think the executor actually examines these fields
after startup, but someday it might.
This is a second attempt at committing 1b4d280ea. The difference
from the first time is that now we can add some filtering rules to
AdjustUpgrade.pm to allow cross-version upgrade testing to pass
despite all the cosmetic changes in CREATE VIEW outputs.
Amit Langote (filtering rules by me)
Discussion: https://postgr.es/m/CA+HiwqEf7gPN4Hn+LoZ4tP2q_Qt7n3vw7-6fJKOf92tSEnX6Gg@mail.gmail.com
Discussion: https://postgr.es/m/891521.1673657296@sss.pgh.pa.us
Avoid explicitly grouping by columns that we know are redundant
for sorting, for example we need group by only one of x and y in
SELECT ... WHERE x = y GROUP BY x, y
This comes up more often than you might think, as shown by the
changes in the regression tests. It's nearly free to detect too,
since we are just piggybacking on the existing logic that detects
redundant pathkeys. (In some of the existing plans that change,
it's visible that a sort step preceding the grouping step already
didn't bother to sort by the redundant column, making the old plan
a bit silly-looking.)
To do this, build processed_groupClause and processed_distinctClause
lists that omit any provably-redundant sort items, and consult those,
not the originals, where relevant. This means that within the
planner, one should usually consult root->processed_groupClause or
root->processed_distinctClause if one wants to know which columns
are to be grouped on; but to check whether grouping or distinct-ing
is happening at all, check non-NIL-ness of parse->groupClause or
parse->distinctClause. This is comparable to longstanding rules
about handling the HAVING clause, so I don't think it'll be a huge
maintenance problem.
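A hypothetical planner fragment following that rule ("root" and "parse"
as usual in the planner; the surrounding logic is invented):

    if (parse->groupClause != NIL)  /* is grouping happening at all? */
    {
        ListCell   *lc;

        foreach(lc, root->processed_groupClause)   /* non-redundant items */
        {
            SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);

            /* ... decide how to group on sgc->tleSortGroupRef ... */
        }
    }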
nodeAgg.c also needs minor mods, because it's now possible to generate
AGG_PLAIN and AGG_SORTED Agg nodes with zero grouping columns.
Patch by me; thanks to Richard Guo and David Rowley for review.
Discussion: https://postgr.es/m/185315.1672179489@sss.pgh.pa.us
Add leader_pid to pg_stat_subscription. leader_pid is the process ID of
the leader apply worker if this process is a parallel apply worker. If
this field is NULL, it indicates that the process is a leader apply
worker or a synchronization worker. The new column makes it easier to
distinguish parallel apply workers from other kinds of workers and helps
to identify the leader for the parallel workers corresponding to a
particular subscription.
Additionally, update the leader_pid column in pg_stat_activity as well to
display the PID of the leader apply worker for parallel apply workers.
Author: Hou Zhijie
Reviewed-by: Peter Smith, Sawada Masahiko, Amit Kapila, Shveta Mallik
Discussion: https://postgr.es/m/CAA4eK1+wyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw@mail.gmail.com
Presently, restore_command uses a different code path than
archive_cleanup_command and recovery_end_command. These code paths
are similar and can be easily combined, as long as it is possible to
identify if a command should:
- Issue a FATAL on signal.
- Exit immediately on SIGTERM.
While at it, this removes src/common/archive.c and its associated
header. Since the introduction of c96de2c, BuildRestoreCommand() has
become a simple wrapper of replace_percent_placeholders() able to call
make_native_path(). This simplifies shell_restore.c as long as
RestoreArchivedFile() includes a call to make_native_path().
Author: Nathan Bossart
Reviewed-by: Andres Freund, Michael Paquier
Discussion: https://postgr.es/m/20221227192449.GA3672473@nathanxps13
This makes sure that the internal logic of these functions does not
attempt to change the values of the constified arguments, and it removes
one unconstify() in basic_archive.c.
Author: Nathan Bossart
Reviewed-by: Andrew Dunstan, Peter Eisentraut
Discussion: https://postgr.es/m/20230114231126.GA2580330@nathanxps13
Instead of half a dozen of mostly-duplicate conditional branches,
write one common one that can handle most catalogs. We already have
all the information we need, such as which system catalog corresponds
to which catalog table and which column is the ACL column.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/504bc485-6bd6-dd1b-fe10-e7351aeb310d@enterprisedb.com
Remove some code guarded by IS_MINUS() or IS_PLUS(), where the entire
stanza is inside an else-block where both of these are false. This
should slightly improve test coverage.
While at it, remove coding that apparently assumes that unsetting a
bit is so expensive that we have to first check if it's already set
in the first place.
Per Coverity report from Ranier Vilela
Analysis and review by Justin Pryzby
Discussion: https://www.postgresql.org/message-id/20221223010818.GP1153%40telsasoft.com
The code that decides the apply action failed to handle non-transactional
messages, and we didn't catch it in our testing as currently such messages
are simply ignored by the apply worker. This was introduced by changes in
commit 216a784829.
While testing this, I noticed that we forgot to reset stream_xid after
processing the stream stop message which could also result in the wrong
apply action after the fix for non-transactional messages.
In passing, change asserts to elogs for unexpected apply actions in some
of the routines, so as to catch such problems in the production
environment, if any.
Reported-by: Tomas Vondra
Author: Amit Kapila
Reviewed-by: Tomas Vondra, Sawada Masahiko, Hou Zhijie
Discussion: https://postgr.es/m/984ff689-adde-9977-affe-cd6029e850be@enterprisedb.com
Discussion: https://postgr.es/m/CAA4eK1+wyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw@mail.gmail.com
In 1349d2790, we gave the planner the ability to provide ORDER BY/DISTINCT
Aggrefs with presorted input so that nodeAgg would not have to perform
sorts during execution. That commit failed to properly consider the
implications of the Aggref having a volatile function in its ORDER
BY/DISTINCT clause. As it happened, this resulted in an ERROR about the
volatile function being missing from the targetlist.
Here, instead of adding the volatile function to the targetlist, we just
never consider an Aggref with a volatile function in its ORDER BY/DISTINCT
clause when choosing which Aggrefs we should sort by. We do this
because, if we chose a plan which provided these aggregates with
presorted input, and there were many such aggregates which could all
share the same sort order, it might be surprising if they all shared the
same sort at some times and didn't at others, depending on which other
set of aggregates were given presorted results. We can avoid this
inconsistency by just never providing these volatile function aggregates
with presorted input.
Reported-by: Dean Rasheed
Discussion: https://postgr.es/m/CAEZATCWETioXs5kY8vT6BVguY41_wD962VDk=u_Nvd7S1UXzuQ@mail.gmail.com
Tighten up the way that visibilitymap_set() is called: request that both
the all-visible and all-frozen bits get set whenever the all-frozen bit
is set, regardless of what we think we know about the present state of
the all-visible bit. Also make sure that the page level PD_ALL_VISIBLE
flag is set in the same code path.
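A sketch of the tightened calling convention (local variable names
invented; visibilitymap_set() is the existing VM interface):

    uint8       flags = VISIBILITYMAP_ALL_VISIBLE;

    if (all_frozen)
        flags |= VISIBILITYMAP_ALL_FROZEN;  /* never all-frozen alone */

    PageSetAllVisible(page);    /* same code path sets the page-level bit */
    visibilitymap_set(rel, blkno, buf, recptr, vmbuffer,
                      cutoff_xid, flags);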
In practice there doesn't seem to be a concrete scenario in which the
previous approach could lead to inconsistencies. It was almost possible
in scenarios involving concurrent HOT updates from transactions that
abort, but (unlike pruning) freezing can never remove XIDs > VACUUM's
OldestXmin, even those from transactions that are known to have aborted.
That was protective here.
These issues have been around since commit a892234f83, which added the
all-frozen bit to the VM fork. There is no known live bug here, so no
backpatch.
In passing, add some defensive assertions to catch the issue, and stop
reading the existing state of the VM when setting the VM in VACUUM's
final heap pass. We already know that affected pages must have had at
least one LP_DEAD item before we set it to LP_UNUSED, so there is no
point in reading the VM when it is set like this.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-WznuNGSzF8v6OsgjaC5aYsb3cZ6HW6MLm30X0d65cmSH6A@mail.gmail.com
Most callers of BufFileRead() want to check whether they read the full
specified length. Checking this at every call site is very tedious.
This patch provides additional variants BufFileReadExact() and
BufFileReadMaybeEOF() that include the length checks.
I considered changing BufFileRead() itself, but this function is also
used in extensions, and so changing the behavior like this would
create a lot of problems there. The new names are analogous to the
existing LogicalTapeReadExact().
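A sketch of the before/after calling pattern (the error message wording
here is invented):

    /* Before: every call site had to check the length itself */
    if (BufFileRead(file, ptr, len) != len)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not read from temporary file")));

    /* After: the length check lives inside the wrapper */
    BufFileReadExact(file, ptr, len);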
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/f3501945-c591-8cc3-5ef0-b72a2e0eaa9c@enterprisedb.com
The code specific to the execution of archive_cleanup_command,
recovery_end_command and restore_command is moved to a new file named
shell_restore.c. The code is split into three functions:
- shell_restore(), that attempts the execution of a shell-based
restore_command.
- shell_archive_cleanup(), for archive_cleanup_command.
- shell_recovery_end(), for recovery_end_command.
This introduces no functional changes; the failure patterns and the
logs generated in consequence are the same as before. (One case
actually generates one less DEBUG2 message "could not restore" when a
restore command succeeds but the follow-up stat() to check the size
fails, but that only matters with an elevel high enough.)
This is preparatory work for allowing recovery modules, a facility
similar to archive modules, with callbacks shaped similarly to the
functions introduced here.
Author: Nathan Bossart
Reviewed-by: Andres Freund, Michael Paquier
Discussion: https://postgr.es/m/20221227192449.GA3672473@nathanxps13
While system_user was stored as an AuthToken in IdentLine, pg_user was
stored as a plain string. This commit changes the code to store
pg_user as an AuthToken too.
This does not make any functional changes, as all the operations on
pg_user only use the string from the AuthToken. There is no regexp
compiled and no check based on its quoting, yet. This is in preparation
for more features that intend to extend its capabilities, like support
for regexps and group membership.
Author: Jelte Fennema
Discussion: https://postgr.es/m/CAGECzQRNow4MwkBjgPxywXdJU_K3a9+Pm78JB7De3yQwwkTDew@mail.gmail.com
int2vectorin limited the number of array elements it'd take to
FUNC_MAX_ARGS, which is probably fine for the traditional use-cases.
But now that pg_publication_rel.prattrs is an int2vector, it's not
fine at all: it's easy to construct cases where that can have up to
about MaxTupleAttributeNumber entries. Trying to replicate such
tables leads to logical-replication failures.
As long as we have to touch this code anyway, let's just remove
the a-priori limit altogether, and let it accept any size that'll
be allowed by repalloc. (Note that since int2vector isn't toastable,
we cannot store arrays longer than about BLCKSZ/2; but there is no
good excuse for letting int2vectorin depend on that. Perhaps we
will lift the no-toast restriction someday.)
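A sketch of the repalloc-based approach (the scanner helpers and the
size macro here are invented for illustration):

    int         maxitems = 64;
    int         n = 0;
    int2vector *result = (int2vector *) palloc0(INT2VECTOR_SIZE(maxitems));

    while (have_more_tokens())      /* hypothetical input scanner */
    {
        if (n >= maxitems)
        {
            maxitems *= 2;          /* let repalloc enforce the limit */
            result = (int2vector *)
                repalloc(result, INT2VECTOR_SIZE(maxitems));
        }
        result->values[n++] = next_int16_token();
    }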
While at it, also improve the equivalent logic in oidvectorin.
I don't know of any practical use-case for long oidvectors right
now, but doing it right actually makes the code shorter.
Per report from Erik Rijkers. Back-patch to v15 where
pg_publication_rel.prattrs was added.
Discussion: https://postgr.es/m/668ba539-33c5-8190-ca11-def2913cb94b@xs4all.nl
In commit 8bf6ec3ba I assumed that no code path could reach
ExecGetExtraUpdatedCols without having gone through
ExecInitStoredGenerated. That turns out not to be the case in
logical replication: if there's an ON UPDATE trigger on the target
table, trigger.c will call this code before anybody has set up its
generated columns. Having seen that, I don't have a lot of faith in
there not being other such paths. ExecGetExtraUpdatedCols can call
ExecInitStoredGenerated for itself, as long as we are willing to
assume that it is only called in CMD_UPDATE operations, which on
the whole seems like a safer leap of faith.
Per report from Vitaly Davydov.
Discussion: https://postgr.es/m/d259d69652b8c2ff50e14cda3c236c7f@postgrespro.ru
Commit 60684dd8 left loose ends when it came to maintaining toast
tables or partitions.
For toast tables, simply skip the privilege check if the toast table
is an indirect target of the maintenance command, because the main
table privileges have already been checked.
For partitions, allow the maintenance command if the user has the
MAINTAIN privilege on the partition or any parent.
Also make CLUSTER emit "skipping" messages when the user doesn't have
privileges, similar to VACUUM.
Author: Nathan Bossart
Reported-by: Pavel Luzanov
Reviewed-by: Pavel Luzanov, Ted Yu
Discussion: https://postgr.es/m/20230113231339.GA2422750@nathanxps13
When VACUUM/ANALYZE are run on an entire database, it warns of
skipping relations for which the user doesn't have sufficient
privileges. That only makes sense for tables, so skip such messages
for indexes, etc.
Reviewed-by: Nathan Bossart
Discussion: https://postgr.es/m/c0a85c2e83158560314b576b6241c8ed0aea1745.camel%40j-davis.com
The prior behavior was confusing and hard to document. For instance,
if you had UPDATE privileges, you could lock a table in any lock mode
except ACCESS SHARE mode.
Now, if granted a privilege to lock at a given mode, one also has
privileges to lock at a less-conflicting mode. MAINTAIN, UPDATE,
DELETE, and TRUNCATE privileges allow any lock mode. INSERT privileges
allow ROW EXCLUSIVE (or below). SELECT privileges allow ACCESS SHARE.
Reviewed-by: Nathan Bossart
Discussion: https://postgr.es/m/9550c76535404a83156252b25a11babb4792ea1e.camel%40j-davis.com
We don't allow different column lists for the same table in the different
publications of a single subscription. A publication with a column list
except for dropped and generated columns should be considered the same as
a publication with no column list (which implicitly includes all columns
as part of the columns list). However, as we were not excluding the
dropped and generated columns from the column list, combining such
publications leads to an error "cannot use different column lists for
table ...".
We decided not to backpatch this fix as there is a risk of users seeing
this as a behavior change and also we didn't see any field report of this
case.
Author: Shi yu
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/OSZPR01MB631091CCBC56F195B1B9ACB0FDFE9@OSZPR01MB6310.jpnprd01.prod.outlook.com
This hash table is used to cache the state of streaming transactions being
applied by the parallel apply workers. So, this should be created only
when we are successful in launching at least one worker. This avoids a
rare memory leak when we are never able to launch any worker.
Author: Ted Yu
Discussion: https://postgr.es/m/CALte62wg0rBR3Vj2beV=HiWo2qG9L0hzKcX=yULNER0wmf4aEw@mail.gmail.com
The WAIT_USE_EPOLL and WAIT_USE_KQUEUE implementations of
WaitEventSetWaitBlock() confused the size of their internal buffer with
the size of the caller's output buffer, and could ask the kernel for too
many events. In fact the set of events retrieved from the kernel needs
to be able to fit in both buffers, so take the smaller of the two.
The WAIT_USE_POLL and WAIT_USE_WIN32 implementations didn't have this
confusion.
This probably didn't come up before because we always used the same
number in both places, but commit 7389aad6 calculates a dynamic size at
construction time, while using MAXLISTEN for its output event buffer on
the stack. That seems like a reasonable thing to want to do, so
consider this to be a pre-existing bug worth fixing.
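A sketch of the epoll-flavored fix (field names assumed from the
WaitEventSet struct in latch.c):

    /* never ask the kernel for more events than fit in both buffers */
    rc = epoll_wait(set->epoll_fd, set->epoll_ret_events,
                    Min(nevents, set->nevents_space), cur_timeout);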
As discovered by valgrind on skink.
Back-patch to all supported releases for epoll, and to release 13 for
the kqueue part, which copied the incorrect epoll code.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/901504.1673504836%40sss.pgh.pa.us
The current jsonpath code assumes that the referenced variable always
exists: it could only throw an error at value evaluation time. At the
same time, existence checking assumed the variable was present without
evaluating it, and error suppression didn't work for missing variables.
This commit makes existence checking trigger an error for missing
variables. This makes the overall behavior consistent.
Backpatch to 12 where jsonpath was introduced.
Reported-by: David G. Johnston
Discussion: https://postgr.es/m/CAKFQuwbeytffJkVnEqDyLZ%3DrQsznoTh1OgDoOF3VmOMkxcTMjA%40mail.gmail.com
Author: Alexander Korotkov, David G. Johnston
Backpatch-through: 12
Const qualifiers ensure that we don't do something stupid in the
function implementation. Additionally they clarify the interface. As
an example:
void
slist_delete(slist_head *head, const slist_node *node)
Here one can instantly tell that node->next is not going to be set to
NULL. Finally, const qualifiers potentially allow the compiler to do
more optimizations. This being said, no benchmarking was done for
this patch.
The functions that return non-const pointers like slist_next_node(),
dclist_next_node() etc. are not affected by the patch intentionally.
Author: Aleksander Alekseev
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/CAJ7c6TM2%3D08mNKD9aJg8vEY9hd%2BG4L7%2BNvh30UiNT3kShgRgNg%40mail.gmail.com
The code that handles authentication for user maps was pretty confusing
with its choice of variable names. It involves two types of users: a
system user and a Postgres user (well, role), and these were not named
consistently throughout the code that processes the user maps loaded
from pg_ident.conf at authentication.
This commit changes the following things to improve the situation:
- Rename "pg_role" to "pg_user" and "token" to "system_user" in
IndetLine. These choices are more consistent with the pg_ident.conf
example in the docs, as well. "token" has been introduced recently in
fc579e1, and it is way worse than the choice before that, "ident_user".
- Switch the order of the fields in IdentLine to map with the order of
the items in the ident files, as of map name, system user and PG user.
- In check_ident_usermap(), rename "regexp_pgrole" to "expanded_pg_user"
when processing a regexp for the system user entry in a user map. This
variable does not store a regular expression at all: it would be either
a string or a substitution to \1 if the Postgres role is specified as
such.
Author: Jelte Fennema
Discussion: https://postgr.es/m/CAGECzQTkwELHUOAKhvdA+m3tWbUQySHHkExJV8GAZ1pwgbEgXg@mail.gmail.com
The creation of a logical decoding context in CreateDecodingContext()
updates some data of its slot for two-phase transactions if enabled by
the caller, but the code forgot to acquire a spinlock when updating
these fields, as the other code paths do. This could lead to reads of
inconsistent data.
Oversight in a8fd13c.
Author: Sawada Masahiko
Discussion: https://postgr.es/m/CAD21AoAD8_fp47191LKuecjDd3DYhoQ4TaucFco1_TEr_jQ-Zw@mail.gmail.com
Backpatch-through: 15
This reverts commit 1b4d280ea1.
It's broken the buildfarm members that run cross-version-upgrade tests,
because they're not prepared to deal with cosmetic differences between
CREATE VIEW commands emitted by older servers and HEAD. Even if we had
a solution to that, which we don't, it'd take some time to roll it out
to the affected animals. This improvement isn't valuable enough to
justify addressing that problem on an emergency basis, so revert it
for now.
Switch to a design similar to regular backends, instead of the previous
arrangement where signal handlers did non-trivial state management and
called fork(). The main changes are:
* The postmaster now has its own local latch to wait on. (For now, we
don't want other backends setting its latch directly, but that could
probably be made to work with more research on robustness.)
* The existing signal handlers are cut in two: a handle_pm_XXX() part
that just sets pending_pm_XXX flags and the latch, and a
process_pm_XXX() part that runs later when the latch is seen.
* Signal handlers are now installed with the regular pqsignal()
function rather than the special pqsignal_pm() function; historical
portability concerns about the effect of SA_RESTART on select() are no
longer relevant, and we don't need to block signals anymore.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
Rename the heapam.c freeze plan deduplication routines added by commit
9e540599 to names that follow conventions for functions in heapam.c.
Also relocate the functions so that they're next to their caller, which
runs during original execution, when FREEZE_PAGE WAL records are built.
The routines were initially placed next to (and followed the naming
conventions of) conceptually related REDO routine code, but that scheme
turned out to be kind of jarring when considered in a wider context.
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20230109214308.icz26oqvt3k2274c@awork3.anarazel.de
The rule system needs "old" and/or "new" pseudo-RTEs in rule actions
that are ON INSERT/UPDATE/DELETE. Historically it's put such entries
into the ON SELECT rules of views as well, but those are really quite
vestigial. The only thing we've used them for is to carry the
view's relid forward to AcquireExecutorLocks (so that we can
re-lock the view to verify it hasn't changed before re-using a plan)
and to carry its relid and permissions data forward to execution-time
permissions checks. What we can do instead of that is to retain
these fields of the RTE_RELATION RTE for the view even after we
convert it to an RTE_SUBQUERY RTE. This requires a tiny amount of
extra complication in the planner and AcquireExecutorLocks, but on
the other hand we can get rid of the logic that moves that data from
one place to another.
The principal immediate benefit of doing this, aside from a small
saving in the pg_rewrite data for views, is that these pseudo-RTEs
no longer trigger ruleutils.c's heuristic about qualifying variable
names when the rangetable's length is more than 1. That results
in quite a number of small simplifications in regression test outputs,
which are all to the good IMO.
Bump catversion because we need to dump a few more fields of
RTE_SUBQUERY RTEs. While those will always be zeroes anyway in
stored rules (because we'd never populate them until query rewrite)
they are useful for debugging, and it seems like we'd better make
sure to transmit such RTEs accurately in plans sent to parallel
workers. I don't think the executor actually examines these fields
after startup, but someday it might.
Amit Langote
Discussion: https://postgr.es/m/CA+HiwqEf7gPN4Hn+LoZ4tP2q_Qt7n3vw7-6fJKOf92tSEnX6Gg@mail.gmail.com
Document that TransactionIdDidAbort() won't indicate that transactions
that were in-progress during a crash have aborted. Tie this to existing
discussion of the TransactionIdDidCommit() and TransactionIdDidAbort()
protocol that code in heapam_visibility.c (and a few other places) must
observe.
Follow-up to bugfix commit eb5ad4ff.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-Wzn4bEEqgmaUQL3aJ73yM9gAeK-wE4ngi7kjRjLztb+P0w@mail.gmail.com
In both partitioning and traditional inheritance, require child
columns to be GENERATED if and only if their parent(s) are.
Formerly we allowed the case of an inherited column being
GENERATED when its parent isn't, but that results in inconsistent
behavior: the column can be directly updated through an UPDATE
on the parent table, leading to it containing a user-supplied
value that might not match the generation expression. This also
fixes an oversight that we enforced partition-key-columns-can't-
be-GENERATED against parent tables, but not against child tables
that were dynamically attached to them.
Also, remove the restriction that the child's generation expression
be equivalent to the parent's. In the wake of commit 3f7836ff6,
there doesn't seem to be any reason that we need that restriction,
since generation expressions are always computed per-table anyway.
By removing this, we can also allow a child to merge multiple
inheritance parents with inconsistent generation expressions, by
overriding them with its own expression, much as we've long allowed
for DEFAULT expressions.
Since we're rejecting a case that we used to accept, this doesn't
seem like a back-patchable change. Given the lack of field
complaints about the inconsistent behavior, it's likely that no
one is doing this anyway, but we won't change it in minor releases.
Amit Langote and Tom Lane
Discussion: https://postgr.es/m/2793383.1672944799@sss.pgh.pa.us
There are a number of places where a shell command is constructed with
percent-placeholders (like %x). It's cumbersome to have to open-code
this several times. This factors out this logic into a separate
function. This also allows us to ensure consistency for and document
some subtle behaviors, such as what to do with unrecognized
placeholders.
The unified handling is now that incorrect and unknown placeholders
are an error, where previously in most cases they were skipped or
ignored. This affects the following settings:
- archive_cleanup_command
- archive_command
- recovery_end_command
- restore_command
- ssl_passphrase_command
The following settings are part of this refactoring but already had
stricter error handling and should be unchanged in their behavior:
- basebackup_to_shell.command
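As an illustration, a caller can now expand its placeholders in a
single call (the exact per-letter argument convention is assumed here):

    #include "common/percentrepl.h"

    /*
     * Expand %f and %p in a restore_command-style string; unknown or
     * malformed placeholders now raise an error rather than being
     * skipped or ignored.
     */
    char *cmd = replace_percent_placeholders(restoreCommand,
                                             "restore_command", "fp",
                                             xlogfname,  /* value for %f */
                                             xlogpath);  /* value for %p */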
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/5238bbed-0b01-83a6-d4b2-7eb0562a054e%40enterprisedb.com
Prior to this, we only considered a full sort on the cheapest input path
and uniquifying any path which was already sorted in the required sort
order. Here we adjust create_final_distinct_paths() so that it also
adds an Incremental Sort path on any path which has presorted keys.
Additionally, this adjusts the parallel distinct code so that we now
consider sorting the cheapest partial path and incrementally sorting any
partial paths with presorted keys. Previously we didn't consider any
sorting for parallel distinct and only added a unique path atop any path
which had the required pathkeys already.
Author: David Rowley
Reviewed-by: Richard Guo
Discussion: https://postgr.es/m/CAApHDvo8Lz2H=42urBbfP65LTcEUOh288MT7DsG2_EWtW1AXHQ@mail.gmail.com
Can be set to the empty string, or to either or both of "set" or
"inherit". If set to a non-empty value, a non-superuser who creates
a role (necessarily by relying on the CREATEROLE privilege) will
grant that role back to themselves with the specified options.
This isn't a security feature, because the grant that this feature
triggers can also be performed explicitly. Instead, it's a user experience
feature. A superuser would necessarily inherit the privileges of any
created role and be able to access all such roles via SET ROLE;
with this patch, you can configure createrole_self_grant = 'set, inherit'
to provide a similar experience for a user who has CREATEROLE but not
SUPERUSER.
Discussion: https://postgr.es/m/CA+TgmobN59ct+Emmz6ig1Nua2Q-_o=r6DSD98KfU53kctq_kQw@mail.gmail.com
Previously, CREATEROLE users were permitted to make nearly arbitrary
changes to roles that they didn't create, with certain exceptions,
particularly superuser roles. Instead, allow CREATEROLE users to make such
changes to roles for which they possess ADMIN OPTION, and to
grant membership only in roles for which they possess ADMIN OPTION.
When a CREATEROLE user who is not a superuser creates a role, grant
ADMIN OPTION on the newly-created role to the creator, so that they
can administer roles they create or for which they have been given
privileges.
With these changes, CREATEROLE users still have very significant
powers that unprivileged users do not receive: they can alter, rename,
drop, comment on, change the password for, and change security labels
on roles. However, they can now do these things only for roles for
which they possess appropriate privileges, rather than all
non-superuser roles; moreover, they cannot grant a role such as
pg_execute_server_program unless they themselves possess it.
Patch by me, reviewed by Mark Dilger.
Discussion: https://postgr.es/m/CA+TgmobN59ct+Emmz6ig1Nua2Q-_o=r6DSD98KfU53kctq_kQw@mail.gmail.com
This allows left join removals and unique joins to work with partitioned
tables. The planner just lacked sufficient proofs that a given join
would not cause any row duplication. Unique indexes currently serve as
that proof, so have get_relation_info() populate the indexlist for
partitioned tables too.
Author: Arne Roland
Reviewed-by: Alvaro Herrera, Zhihong Yu, Amit Langote, David Rowley
Discussion: https://postgr.es/m/c3b2408b7a39433b8230bbcd02e9f302@index.de
Currently, for large transactions, the publisher sends the data in
multiple streams (changes divided into chunks depending upon
logical_decoding_work_mem), and then on the subscriber-side, the apply
worker writes the changes into temporary files and once it receives the
commit, it reads from those files and applies the entire transaction. To
improve the performance of such transactions, we can instead allow them to
be applied via parallel workers.
In this approach, we assign a new parallel apply worker (if available) as
soon as the xact's first stream is received and the leader apply worker
will send changes to this new worker via shared memory. The parallel apply
worker will directly apply the change instead of writing it to temporary
files. However, if the leader apply worker times out while attempting to
send a message to the parallel apply worker, it will switch to
"partial serialize" mode - in this mode, the leader serializes all
remaining changes to a file and notifies the parallel apply workers to
read and apply them at the end of the transaction. We use a non-blocking
way to send the messages from the leader apply worker to the parallel
apply worker to avoid deadlocks. We keep this parallel apply worker
assigned till the transaction commit is received and also wait for the
worker to finish at commit. This preserves commit ordering and avoids
writing to and reading
from files in most cases. We still need to spill if there is no worker
available.
This patch also extends the SUBSCRIPTION 'streaming' parameter so that the
user can control whether to apply the streaming transaction in a parallel
apply worker or spill the change to disk. The user can set the streaming
parameter to 'on/off', or 'parallel'. The parameter value 'parallel' means
the streaming will be applied via a parallel apply worker, if available.
The parameter value 'on' means the streaming transaction will be spilled
to disk. The default value is 'off' (same as current behaviour).
In addition, the patch extends the logical replication STREAM_ABORT
message so that abort_lsn and abort_time can also be sent which can be
used to update the replication origin in parallel apply worker when the
streaming transaction is aborted. Because this message extension is needed
to support parallel streaming, parallel streaming is not supported for
publications on servers < PG16.
Author: Hou Zhijie, Wang wei, Amit Kapila with design inputs from Sawada Masahiko
Reviewed-by: Sawada Masahiko, Peter Smith, Dilip Kumar, Shi yu, Kuroda Hayato, Shveta Mallik
Discussion: https://postgr.es/m/CAA4eK1+wyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw@mail.gmail.com
GIN index scans were not taking any descent CPU-based cost into account. That
made them look cheaper than other types of indexes when they shouldn't be.
We use the same heuristic as for btree indexes, but multiply it by the number
of searched entries.
Additionally, the CPU cost for the tree was based largely on a
genericcostestimate. For a GIN index, we should not charge index quals per
tuple, but per entry. On top of this, charge cpu_index_tuple_cost per actual
tuple.
This should fix the cases where a GIN index is preferred over a btree and
the ones where a memoize node is not added on top of the GIN index scan
because it seemed too cheap.
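A sketch of the new charging scheme (variable names invented; the
descent estimate follows the btree heuristic described above):

    /* btree-like descent cost, once per entry actually searched */
    descentCost = ceil(log2(numEntries)) * cpu_operator_cost;
    *indexStartupCost += descentCost * searchEntries;

    /* index quals are charged per entry, not per tuple ... */
    *indexStartupCost += qual_arg_cost * searchEntries;

    /* ... while each fetched tuple still pays cpu_index_tuple_cost */
    *indexTotalCost += numTuples * cpu_index_tuple_cost;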
We don't backpatch this, to avoid unexpected plan changes in stable versions.
Discussion: https://postgr.es/m/CABs3KGQnOkyQ42-zKQqiE7M0Ks9oWDSee%3D%2BJx3-TGq%3D68xqWYw%40mail.gmail.com
Discussion: https://postgr.es/m/3188617.44csPzL39Z%40aivenronan
Author: Ronan Dunklau
Reported-By: Hung Nguyen
Reviewed-by: Tom Lane, Alexander Korotkov
After restart, we try to stream the changes for large transactions that
were not sent before server crash and restart. However, we forget to send
the abort message for such transactions. This leads to spurious streaming
files on the subscriber which won't be cleaned till the apply worker or
the subscriber server restarts.
Reported-by: Dilip Kumar
Author: Hou Zhijie
Reviewed-by: Dilip Kumar and Amit Kapila
Backpatch-through: 14
Discussion: https://postgr.es/m/OS0PR01MB5716A773F46768A1B75BE24394FB9@OS0PR01MB5716.jpnprd01.prod.outlook.com
During the development of 728202b63, which was aimed at reducing the
number of sorts required to evaluate multiple window functions with
different WindowClause definitions, the code written sorted the
WindowClauses in reverse tleSortGroupRef order. There appears to be no
discussion in the thread which was opened to discuss the development of
this patch and no comments mentioning the fact that having the
WindowClauses in reverse tleSortGroupRef order makes it more likely that
the final WindowClause to be evaluated will provide presorted input to
the query's DISTINCT or ORDER BY clause. The reason for this is that the
tleSortGroupRef indexes are assigned for the DISTINCT and ORDER BY clauses
before they are for the WindowClauses PARTITION BY and ORDER BY clauses.
Putting the WindowClause with the lowest tleSortGroupRef last means that
it's more likely that no additional sorting is required for the query's
DISTINCT or ORDER BY clause.
All we're doing here is adding some tests and a comment to help ensure
that remains true and that we don't accidentally forget to consider this
again should we ever rewrite that code.
Author: Ankit Kumar Pandey, David Rowley
Discussion: https://postgr.es/m/CAApHDvq=g2=ny59f1bvwRVvupsgPHK-KjLPBsSL25fVuGZ4idQ@mail.gmail.com
Waken related worker processes immediately at commit of a transaction
that has performed ALTER SUBSCRIPTION (including the RENAME and
OWNER variants). This reduces the response time for such operations.
In the real world that might not be worth much, but it shaves several
seconds off the runtime for the subscription test suite.
In the case of PREPARE, we just throw away this notification state;
it doesn't seem worth the work to preserve it. The workers will
still react after the eventual COMMIT PREPARED, but not as quickly.
Nathan Bossart
Discussion: https://postgr.es/m/20221122004119.GA132961@nathanxps13
Previously this function checked to see if we were ready to switch
to two_phase mode at its start, but that's silly: we should check
at the end, after we've done the work that might make us ready.
This simple change removes one sleep cycle from the time needed to
switch to two_phase mode. In the real world that might not be
worth much, but it shaves a few seconds off the runtime for the
subscription test suite.
Nathan Bossart
Discussion: https://postgr.es/m/20221122004119.GA132961@nathanxps13
VACUUM normally ends by running vac_update_datfrozenxid(), which
requires a scan of pg_class. Therefore, if one attempts to vacuum a
database one table at a time --- as vacuumdb has done since v12 ---
we will spend O(N^2) time in vac_update_datfrozenxid(). That causes
serious performance problems in databases with tens of thousands of
tables, and indeed the effect is measurable with only a few hundred.
To add insult to injury, only one process can run
vac_update_datfrozenxid at the same time per DB, so this behavior
largely defeats vacuumdb's -j option.
Hence, invent options SKIP_DATABASE_STATS and ONLY_DATABASE_STATS
to allow applications to postpone vac_update_datfrozenxid() until the
end of a series of VACUUM requests, and teach vacuumdb to use them.
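A minimal libpq sketch of the strategy vacuumdb now uses (connection
string and table list invented; error handling omitted):

    #include <stdio.h>
    #include "libpq-fe.h"

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("dbname=mydb");
        const char *tables[] = {"t1", "t2", "t3"};

        /* per-table VACUUMs skip the pg_class scan ... */
        for (int i = 0; i < 3; i++)
        {
            char        sql[128];

            snprintf(sql, sizeof(sql),
                     "VACUUM (SKIP_DATABASE_STATS) %s", tables[i]);
            PQclear(PQexec(conn, sql));
        }

        /* ... which then happens exactly once, at the end */
        PQclear(PQexec(conn, "VACUUM (ONLY_DATABASE_STATS)"));
        PQfinish(conn);
        return 0;
    }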
Per bug #17717 from Gunnar L. Sadly, this answer doesn't seem
like something we'd consider back-patching, so the performance
problem will remain in v12-v15.
Tom Lane and Nathan Bossart
Discussion: https://postgr.es/m/17717-6c50eb1c7d23a886@postgresql.org
A schema rename should cause reporting the new qualified names of
tables to logical replication subscribers, but that wasn't happening.
Flush the RelationSyncCache to make it happen.
(If you ask me, the new test case shows that the behavior in this area
is still pretty dubious, but apparently it's operating as designed.)
Vignesh C
Discussion: https://postgr.es/m/CALDaNm32vLRv5KdrDFeVC-CU+4Wg1daA55hMqOxDGJBzvd76-w@mail.gmail.com
Commit 57faaf376 added pg_truncate(const char *path, off_t length), but
"length" was ignored under WIN32 and the file was unconditionally
truncated to 0.
There was no live bug, since the only caller passes 0.
Fix, and back-patch to 14 where the function arrived.
Author: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20230106031652.GR3109%40telsasoft.com
This is just refactoring; there should be no functional change. It
might have the effect of slightly reducing the number of calls to
GetUserId(), but the real point is to facilitate future work in
this area.
Patch by me, reviewed by Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmobFzTLkLwOquFrAcdsWBsOWDr-_H-jw+qBvfx-wSzMwDA@mail.gmail.com
Instead of having checks in AddRoleMems() and DelRoleMems(), have
the callers perform checks where it's required. In some cases it
isn't, either because the caller has already performed a check for
the same condition, or because the check couldn't possibly fail.
The "Skip permission check if nothing to do" check in each of
AddRoleMems() and DelRoleMems() is pointless. Some call sites
can't pass an empty list. Others can, but in those cases, the role
being modified is one that the current user has just created.
Therefore, they must have permission to modify it, and so no
permission check is required at all.
This patch is intended to have no user-visible consequences. It is
intended to simplify future work in this area.
Patch by me, reviewed by Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmobFzTLkLwOquFrAcdsWBsOWDr-_H-jw+qBvfx-wSzMwDA@mail.gmail.com
We were identifying the updatable generated columns of inheritance
children by transposing the calculation made for their parent.
However, there's nothing that says a traditional-inheritance child
can't have generated columns that aren't there in its parent, or that
have different dependencies than are in the parent's expression.
(At present it seems that we don't enforce that for partitioning
either, which is likely wrong to some degree or other; but the case
clearly needs to be handled with traditional inheritance.)
Hence, drop the very-klugy-anyway "extraUpdatedCols" RTE field
in favor of identifying which generated columns depend on updated
columns during executor startup. In HEAD we can remove
extraUpdatedCols altogether; in back branches, it's still there but
always empty. Another difference between the HEAD and back-branch
versions of this patch is that in HEAD we can add the new bitmap field
to ResultRelInfo, but that would cause an ABI break in back branches.
Like 4b3e37993, add a List field at the end of struct EState instead.
Back-patch to v13. The bogus calculation is also being made in v12,
but it doesn't have the same visible effect because we don't use it
to decide which generated columns to recalculate; as a consequence of
which the patch doesn't apply easily. I think that there might still
be a demonstrable bug associated with trigger firing conditions, but
that's such a weird corner-case usage that I'm content to leave it
unfixed in v12.
Amit Langote and Tom Lane
Discussion: https://postgr.es/m/CA+HiwqFshLKNvQUd1DgwJ-7tsTp=dwv7KZqXC4j2wYBV1aCDUA@mail.gmail.com
Discussion: https://postgr.es/m/2793383.1672944799@sss.pgh.pa.us
This fixes a couple of unused variable warnings that could be seen when
compiling with MEMORY_CONTEXT_CHECKING but not USE_ASSERT_CHECKING.
Defining MEMORY_CONTEXT_CHECKING without asserts is a little unusual,
however, we shouldn't be producing any warnings from such a build.
Author: Richard Guo
Discussion: https://postgr.es/m/CAMbWs4_D-vgLEh7eO47p=73u1jWO78NWf6Qfv1FndY1kG-Q-jA@mail.gmail.com
We cannot rely on TransactionIdDidAbort here, since in general it may
report transactions that were in-progress at the time of an earlier hard
crash as not aborted, effectively behaving as if they were still in
progress even after crash recovery completes. Go back to defensively
verifying that xmax didn't commit instead.
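A sketch of the defensive pattern (surrounding context invented):

    /*
     * Don't trust TransactionIdDidAbort() here: a transaction that was
     * in progress at the time of a hard crash reports neither committed
     * nor aborted.  Checking commit status covers both cases.
     */
    if (!TransactionIdDidCommit(xmax))
    {
        /* treat xmax as not committed: it aborted or crashed */
    }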
Oversight in commit 79d4bf4e.
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20230104035636.hy5djyr2as4gbc4q@awork3.anarazel.de
Commit 4f627f89 switched SLRU truncation for multixacts back to being a
task performed during VACUUM, but missed some comments that continued to
reference truncation happening as part of checkpointing. Update those
comments now.
Also update comments that became obsolete when commit c3ffa731 changed
the way that vacuum_multixact_freeze_min_age is applied by VACUUM as it
computes its MultiXactCutoff cutoff (which is used by VACUUM to decide
what to freeze). Explain the same issues by referencing how OldestMxact
is the latest valid value that relminmxid can ever be advanced to at the
end of a VACUUM (following the work in commit 0b018fab).
pg_xact lookups are relatively expensive. Move the xmin/xmax commit
status checks from the point that freeze plans are prepared to the point
that they're actually executed. Otherwise we'll repeat many commit
status checks whenever multiple successive VACUUM operations scan the
same pages and decide against freezing each time, which is a waste of
cycles.
Oversight in commit 1de58df4, which added page-level freezing.
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkZpe4K6qMfEt8H4qYJCKc2R7TPvKsBva7jc9w7iGXQSw@mail.gmail.com
Improve comments added by commit 1de58df4 which describe the
lazy_scan_prune "freeze the page" path. These newly revised comments
are based on suggestions from Jeff Davis.
In passing, remove nearby visibility_cutoff_xid comments left over from
commit 6daeeb1f.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/ebc857107fe3edd422ef8a65191ca4a8da568b9b.camel@j-davis.com
Windows can enumerate the locales that are either installed or
supported by calling EnumSystemLocalesEx(), similar to what is already
done in the READ_LOCALE_A_OUTPUT switch. We can refactor some of the
logic already used in that switch into a new function
create_collation_from_locale().
The enumerated locales have BCP 47 shape, that is with a hyphen
between language and territory, instead of POSIX's underscore. The
created collations will retain the BCP 47 shape, but we will also
create a POSIX alias, so xx-YY will have an xx_YY alias.
A new test collate.windows.win1252 is added that is like
collate.linux.utf8.
Author: Juan Jose Santamaria Flecha <juanjo.santamaria@gmail.com>
Reviewed-by: Dmitry Koval <d.koval@postgrespro.ru>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/0050ec23-34d9-2765-9015-98c04f0e18ac@postgrespro.ru
While on it, newlines are removed from the end of two elog() strings.
The others are simple grammar mistakes. One comment in pg_upgrade
referred incorrectly to sequences since a7e5457.
Author: Justin Pryzby
Discussion: https://postgr.es/m/20221230231257.GI1153@telsasoft.com
Backpatch-through: 11
When considering an empty grouping set, we fetched
phasedata->eqfunctions[-1]. Because the eqfunctions array is
palloc'd, that would always be an aset pointer in released versions,
and thus the code accidentally failed to malfunction (since it would
do nothing unless it found a null pointer). Nonetheless this seems
like trouble waiting to happen, so add a check for length == 0.
It's depressing that our valgrind testing did not catch this.
Maybe we should reconsider the choice to not mark that word NOACCESS?
Richard Guo
Discussion: https://postgr.es/m/CAMbWs4-vZuuPOZsKOYnSAaPYGKhmacxhki+vpOKk0O7rymccXQ@mail.gmail.com
The term "truncation" has been ambiguous since commit 10a8d13823 added
line pointer array truncation during heap pruning. Clear things up by
specifying that we're talking about rel truncation here, to match nearby
comments that apply to tuples with storage.
Don't allow VACUUM to WAL-log the value FrozenTransactionId as the
snapshotConflictHorizon of freezing or visibility map related WAL
records.
The only special XID value that's an allowable snapshotConflictHorizon
is InvalidTransactionId, which is interpreted as "record definitely
doesn't require a recovery conflict".
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WznuNGSzF8v6OsgjaC5aYsb3cZ6HW6MLm30X0d65cmSH6A@mail.gmail.com
When brin_minmax_multi_union merges summaries, we may end up with just a
single range after merge_overlapping_ranges. The summaries may contain
just one range each, and they may overlap (or be exactly the same).
With a single range there's no distance to calculate, but we happen to
call build_distances anyway - which is fine, as we don't calculate the
distance in this case, except that with asserts enabled this failed due
to a check that there are at least two ranges.
The assert is unnecessarily strict, so relax it a bit and bail out if
there's just a single range. The relaxed assert would be enough, but
this way we don't allocate unnecessary memory for distance.
Backpatch to 14, where minmax-multi opclasses were introduced.
Reported-by: Jaime Casanova
Backpatch-through: 14
Discussion: https://postgr.es/m/YzVA55qS0hgz8P3r@ahch-to
f193883 has been incorrectly setting up the precision used in the
timestamp computations returned by the following functions:
- LOCALTIME
- LOCALTIMESTAMP
- CURRENT_TIME
- CURRENT_TIMESTAMP
Specifying an out-of-range precision for CURRENT_TIMESTAMP and
LOCALTIMESTAMP was raising a WARNING without adjusting the precision,
leading to a subsequent error. LOCALTIME and CURRENT_TIME raised a
WARNING without an error, but the precision given to the internal
routines was still not correct, so let's be clean.
Ian has reported the problems in timestamp.c, while I have noticed the
ones in date.c. Regression tests are added for all of them with
precisions high enough to provide coverage for the warnings, something
that went missing up to this commit.
Author: Ian Lawrence Barwick, Michael Paquier
Discussion: https://postgr.es/m/CAB8KJ=jQEnn9sYG+N752spt68wMrhmT-ocHCh4oeNmHF82QMWA@mail.gmail.com
Teach VACUUM to decide on whether or not to trigger freezing at the
level of whole heap pages. Individual XIDs and MXIDs fields from tuple
headers now trigger freezing of whole pages, rather than independently
triggering freezing of each individual tuple header field.
Managing the cost of freezing over time now significantly influences
when and how VACUUM freezes. The overall amount of WAL written is the
single most important freezing related cost, in general. Freezing each
page's tuples together in batch allows VACUUM to take full advantage of
the freeze plan WAL deduplication optimization added by commit 9e540599.
Also teach VACUUM to trigger page-level freezing whenever it detects
that heap pruning generated an FPI. We'll have already written a large
amount of WAL just to do that much, so it's very likely a good idea to
get freezing out of the way for the page early. This only happens in
cases where it will directly lead to marking the page all-frozen in the
visibility map.
In most cases "freezing a page" removes all XIDs < OldestXmin, and all
MXIDs < OldestMxact. It doesn't quite work that way in certain rare
cases involving MultiXacts, though. It is convenient to define "freeze
the page" in a way that gives FreezeMultiXactId the leeway to put off
the work of processing an individual tuple's xmax whenever it happens to
be a MultiXactId that would require an expensive second pass to process
aggressively (allocating a new multi is especially worth avoiding here).
FreezeMultiXactId is eager when processing is cheap (as it usually is),
and lazy in the event of an individual multi that happens to require
expensive second pass processing. This avoids regressions related to
processing of multis that page-level freezing might otherwise cause.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Jeff Davis <pgsql@j-davis.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-WzkFok_6EAHuK39GaW4FjEFQsY=3J0AAd6FXk93u-Xq3Fg@mail.gmail.com
Some compilers complain about sub_rteperminfos not being
initialized, evidently because they don't detect that it
is only used and set if isGeneralSelect is true.
Make it follow the long-established pattern for its
sibling variable sub_rtable.
Per reports from Pavel Stehule and the buildfarm.
Discussion: https://postgr.es/m/CAFj8pRDOvGOi-n616kM0Cc7qSbg_nGoS=-haB+D785sUXADqSg@mail.gmail.com
The modified error message for regcollationin failure includes
the database encoding, which it should've occurred to me is a
portability hazard for the regression tests. Adjust the test
so the expected output doesn't include that.
In passing, fix a comment typo introduced in b8c0ffbd2.
Per buildfarm.
Given the soft-input-error feature, we can reduce these functions
to be just thin wrappers around a soft-error call of the
corresponding datatype input function. This means less code and
more certainty that the to_reg* functions match the normal input
behavior.
Notably, it also means that they will accept numeric OID input,
which they didn't before. It's not clear to me if that omission
had more than laziness behind it, but it doesn't seem like
something we need to work hard to preserve.
Discussion: https://postgr.es/m/3910031.1672095600@sss.pgh.pa.us
This is not really complete, but it catches most cases of practical
interest. The main omissions are:
* regtype, regprocedure, and regoperator parse type names by
calling the main grammar, so any grammar-detected syntax error
will still be a hard error. Also, if one includes a type
modifier in such a type specification, errors detected by the
typmodin function will be hard errors.
* Lookup errors are handled just by passing missing_ok = true
to the relevant catalog lookup function. Because we've used
quite a restrictive definition of "missing_ok", this means that
edge cases such as "the named schema exists, but you lack
USAGE permission on it" are still hard errors.
It would make sense to me to replace most/all missing_ok
parameters with an escontext parameter and then allow these
additional lookup failure cases to be trapped too. But that's
a job for some other day.
Discussion: https://postgr.es/m/3342239.1671988406@sss.pgh.pa.us
This is slightly tedious because the adjustments cascade through
a couple of levels of subroutines, but it's not very hard.
I chose to avoid changing function signatures more than absolutely
necessary, by passing the escontext pointer in existing structs
where possible.
tsquery's nuisance NOTICEs about empty queries are suppressed in
soft-error mode, since they're not errors and we surely don't want
them to be shown to the user anyway. Maybe that whole behavior
should be reconsidered.
Discussion: https://postgr.es/m/3824377.1672076822@sss.pgh.pa.us
Historically these input functions just called strtoul or strtoull
and returned the result, with no error detection whatever. Upgrade
them to reject garbage input and out-of-range values, similarly to
our other numeric input routines.
To share the code for this with type oid, adjust the existing
"oidin_subr" to be agnostic about the SQL name of the type it is
handling, and move it to numutils.c; then clone it for 64-bit types.
Because the xid types previously accepted hex and octal input by
reason of calling strtoul[l] with third argument zero, I made the
common subroutine do that too, with the consequence that type oid
now also accepts hex and octal input. In view of 6fcda9aba, that
seems like a good thing.
While at it, simplify the existing over-complicated handling of
syntax errors from strtoul: we only need one ereturn not three.
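For illustration, the core of the shared approach looks roughly like
this (a standalone sketch with hypothetical names; the real subroutine
lives in numutils.c and reports failures softly):

    #include <errno.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* hypothetical helper: parse an OID-like value, hex/octal included */
    static bool
    parse_oid_like(const char *s, uint32_t *result)
    {
        char           *endptr;
        unsigned long   cvt;

        errno = 0;
        cvt = strtoul(s, &endptr, 0);   /* base 0: decimal, 0x hex, 0 octal */
        if (errno == ERANGE || endptr == s || *endptr != '\0' ||
            cvt > 0xFFFFFFFFUL)
            return false;               /* caller raises the error */
        *result = (uint32_t) cvt;
        return true;
    }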
Discussion: https://postgr.es/m/3526121.1672000729@sss.pgh.pa.us
When VACUUM determines that an existing MultiXact should use a freeze
plan that sets xmax to InvalidTransactionId, the original Multi may or
may not be before OldestMxact. Remove an incorrect assertion that
expected it to always be from before OldestMxact.
Oversight in commit 4ce3af.
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/TYAPR01MB5866B24104FD80B5D7E65C3EF5ED9@TYAPR01MB5866.jpnprd01.prod.outlook.com
This enables streaming or serializing changes immediately in logical
decoding. This parameter is intended to be used to test logical decoding
and replication of large transactions for which otherwise we need to
generate the changes till logical_decoding_work_mem is reached.
This helps in reducing the runtime of existing tests related to logical
replication of in-progress transactions and will help in writing tests
for the upcoming feature to apply large in-progress transactions in
parallel.
Author: Shi yu
Reviewed-by: Sawada Masahiko, Shveta Mallik, Amit Kapila, Dilip Kumar, Kuroda Hayato, Kyotaro Horiguchi
Discussion: https://postgr.es/m/OSZPR01MB63104E7449DBE41932DB19F1FD1B9@OSZPR01MB6310.jpnprd01.prod.outlook.com
While at it, make these functions use a common subroutine.
This doesn't move the needle very far in terms of making these
functions non-throwing; the only case we're now able to trap is
numeric-OID-is-out-of-range. Still, it seems like a pretty
non-controversial step in that direction.
Fix incorrect code which was trying to take a Bitmapset of columns with
attnums according to a parent table and translate them into the
equivalent Bitmapset with attnums according to the given child table.
This code is new as of a61b1f748 and was failing to do the correct
translation when there was an intermediate parent table between 'rel' and
'top_parent_rel'.
Reported-by: Ranier Vilela
Author: Richard Guo, Amit Langote
Discussion: https://postgr.es/m/CAEudQArohfB_Gy%2BhcH2-bANUkxgjJiP%3DABq01_LgTNTbcNijag%40mail.gmail.com
An epoll fd belonging to the parent should be closed in the child. A
kqueue fd is automatically closed by fork(), but we should still adjust
our counter. For poll and Windows systems, nothing special is required.
On all systems we free the memory.
No caller yet, but we'll need this if we start using WaitEventSet in the
postmaster as planned.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
At the time of commit 6a2a70a02 we didn't use latch infrastructure in
the postmaster. We're planning to start doing that, so we'd better make
sure that the signalfd inherited from a postmaster is not duplicated and
then leaked in the child.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
To be able to handle incoming connections on a server socket with
the WaitEventSet API, we'll need a new kind of event to indicate that
the socket is ready to accept a connection.
On Unix, it's just the same as WL_SOCKET_READABLE, but on Windows there
is a different underlying kernel event that we need to map our
abstraction to.
No user yet, but a proposed patch would use this.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
Three error strings used with cache lookup failures were referring to
incorrect object types for ACL checks:
- Schemas
- Types
- Foreign Servers
These errors should never be triggered, but if they were, incorrect
information would be reported.
Author: Justin Pryzby
Discussion: https://postgr.es/m/20221222153041.GN1153@telsasoft.com
Backpatch-through: 11
The former name was discussed as being confusing, so use "split", as per
a suggestion from Magnus Hagander.
While on it, one of the output arguments is renamed from "segno" to
"segment_number", as per a suggestion from Kyotaro Horiguchi.
The documentation is updated to reflect all these changes.
Bump catalog version.
Author: Bharath Rupireddy, Michael Paquier
Discussion: https://postgr.es/m/CABUevEytQVaOOhGdoh0D7hGwe3fuKcRF6NthsSW7ww04EmtFgQ@mail.gmail.com
WindowFuncs such as row_number() don't care if it's called with ROWS
UNBOUNDED PRECEDING AND CURRENT ROW or with RANGE UNBOUNDED PRECEDING AND
CURRENT ROW. The latter is less efficient as the RANGE option requires
that the executor check for peer rows, so using the ROWS option instead
would cause less overhead. Because RANGE is part of the default frame
options for WindowClauses, it means WindowAgg is, by default, working much
harder than it needs to for window functions where the ROWS / RANGE option
has no effect on the window function's result.
On a test query from the discussion thread, a performance improvement of
344% was seen by using ROWS instead of RANGE.
Here we add a new support function node type to allow support functions to
be called for window functions so that the most optimal version of the
frame options can be set. The planner has been adjusted so that the frame
options are changed only if all window functions sharing the same window
clause agree on what the optimized frame options are.
Here we give the ability for row_number(), rank(), dense_rank(),
percent_rank(), cume_dist() and ntile() to alter their WindowClause's
frameOptions.
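A support function handling the new request type looks roughly like
this sketch, patterned on the in-core row_number() support (field and
macro names as I recall them; check supportnodes.h):

    Datum
    window_row_number_support(PG_FUNCTION_ARGS)
    {
        Node   *rawreq = (Node *) PG_GETARG_POINTER(0);

        if (IsA(rawreq, SupportRequestOptimizeWindowClause))
        {
            SupportRequestOptimizeWindowClause *req =
                (SupportRequestOptimizeWindowClause *) rawreq;

            /*
             * row_number() is unaffected by the frame options, so switch
             * from the default RANGE to the cheaper ROWS variant.
             */
            req->frameOptions = (FRAMEOPTION_NONDEFAULT |
                                 FRAMEOPTION_ROWS |
                                 FRAMEOPTION_START_UNBOUNDED_PRECEDING |
                                 FRAMEOPTION_END_CURRENT_ROW);
            PG_RETURN_POINTER(req);
        }

        PG_RETURN_POINTER(NULL);    /* other request types not handled */
    }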
Reviewed-by: Vik Fearing, Erwin Brandstetter, Zhihong Yu
Discussion: https://postgr.es/m/CAGHENJ7LBBszxS+SkWWFVnBmOT2oVsBhDMB1DFrgerCeYa_DyA@mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvohAKEtTXxq7Pc-ic2dKT8oZfbRKeEJP64M0B6+S88z+A@mail.gmail.com
Use C99 designated initializer syntax for the array elements, instead of
writing the enumerator name and position in a comment. Replace nkeys
and key with a local variadic macro, for a shorter notation.
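Illustrative before/after for one entry (struct layout approximate;
KEY() stands for the new local variadic macro):

    /* before: meaning carried by position plus a trailing comment */
    {AggregateRelationId, AggregateFnoidIndexId, 1,
     {Anum_pg_aggregate_aggfnoid, 0, 0, 0}},    /* AGGFNOID */

    /* after: the element's position is spelled out by the designator */
    [AGGFNOID] = {
        AggregateRelationId,
        AggregateFnoidIndexId,
        KEY(Anum_pg_aggregate_aggfnoid)
    },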
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Discussion: https://postgr.es/m/CA%2BhUKGKdpDjKL2jgC-GpoL4DGZU1YPqnOFHbDqFkfRQcPaR5DQ%40mail.gmail.com
Perform a failsafe check every time VACUUM's first heap scan scans a
further FAILSAFE_EVERY_PAGES pages, rather than using an approach based
on how many physical blocks our current blkno is from the blkno at the
time of the previous failsafe check. That way VACUUM will perform a
failsafe check every time it has scanned a uniform number of pages,
without it mattering when or how VACUUM skipped pages using the
visibility map.
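The effect is roughly this (a simplified sketch of the check in
vacuumlazy.c):

    /* evaluated for each scanned page during the first heap pass */
    if (vacrel->scanned_pages % FAILSAFE_EVERY_PAGES == 0)
        lazy_check_wraparound_failsafe(vacrel);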
Sami Imseih, with changes to FAILSAFE_EVERY_PAGES comments added by me.
Author: Sami Imseih <simseih@amazon.com>
Reviewed-By: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/401CE010-4049-4B94-9961-0B610A5D254D%40amazon.com
Use a dedicated struct for the XID/MXID cutoffs used by VACUUM, such as
FreezeLimit and OldestXmin. This state is initialized in vacuum.c, and
then passed around by code from vacuumlazy.c to heapam.c freezing
related routines. The new convention is that everybody works off of the
same cutoff state, which is passed around via pointers to const.
Also simplify some of the logic for dealing with frozen xmin in
heap_prepare_freeze_tuple: add dedicated "xmin_already_frozen" state to
clearly distinguish xmin XIDs that we're going to freeze from those that
were already frozen from before. That way the routine's xmin handling
code is symmetrical with the existing xmax handling code. This is
preparation for an upcoming commit that will add page level freezing.
Also refactor the control flow within FreezeMultiXactId(), while adding
stricter sanity checks. We now test OldestXmin directly, instead of
using FreezeLimit as an inexact proxy for OldestXmin. This is further
preparation for the page level freezing work, which will make the
function's caller cede control of page level freezing to the function
where appropriate (where heap_prepare_freeze_tuple sees a tuple that
happens to contain a MultiXactId in its xmax).
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/CAH2-WznS9TxXmz2_=SY+SyJyDFbiOftKofM9=aDo68BbXNBUMA@mail.gmail.com
perform_pullup_replace_vars() knows how to scan the whole parent
query tree when we are replacing Vars during a subquery flattening
operation. However, for the specific case of flattening a
UNION ALL leaf query, that's mostly wasted work: the only place
where relevant Vars could exist is in the AppendRelInfo that we
just made for this leaf. Teaching perform_pullup_replace_vars()
to just deal with that and exit is worthwhile because, if we have
N such subqueries to pull up, we were spending O(N^2) work uselessly
mutating the AppendRelInfos for all the other subqueries.
While we're at it, avoid calling substitute_phv_relids if there are no
PlaceHolderVars, and remove an obsolete check of parse->hasSubLinks.
Andrey Lepikhov and Tom Lane
Discussion: https://postgr.es/m/703c09a2-08f3-d2ec-b33d-dbecd62428b8@postgrespro.ru
Andrey Lepikhov demonstrated a case where we spend an unreasonable
amount of time in pull_up_subqueries(). Not only is that recursing
with no explicit check for stack overrun, but the code seems not
interruptable by control-C. Let's stick a CHECK_FOR_INTERRUPTS
there, along with sprinkling some stack depth checks.
An actual fix for the excessive time consumption seems a bit
risky to back-patch; but this isn't, so let's do so.
Discussion: https://postgr.es/m/703c09a2-08f3-d2ec-b33d-dbecd62428b8@postgrespro.ru
A bitwise operator was getting used on two bools in
ATAddCheckConstraint() to track if constraints should be merged or not
with the existing ones of a relation, though obviously this should use
a boolean OR operator. This led to the same result, but let's be
clean.
Oversight in 074c5cf.
Author: Ranier Vilela
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/CAEudQAp2R2fbbi0OHHhv_n4=Ch0t1VtjObR9YMqtGKHJ+faUFQ@mail.gmail.com
This introduces palloc_aligned() and MemoryContextAllocAligned() which
allow callers to obtain memory which is allocated to the given size and
also aligned to the specified alignment boundary. The alignment
boundaries may be any power-of-2 value. Currently, the alignment is
capped at 2^26, however, we don't expect values anything like that large.
The primary expected use case is to align allocations to perhaps CPU
cache line size or to maybe I/O page size. Certain use cases can benefit
from having aligned memory by either having better performance or more
predictable performance.
The alignment is achieved by requesting 'alignto' additional bytes from
the underlying allocator function and then aligning the address that is
returned to the requested alignment. This obviously does waste some
memory, so alignments should be kept as small as required.
It's also important to note that these alignment bytes eat into the
maximum allocation size. So something like:
palloc_aligned(MaxAllocSize, 64, 0);
will not work as we cannot request MaxAllocSize + 64 bytes.
Additionally, because we're just requesting the given size plus the
alignment requirements from the underlying MemoryContext, this is not
going to work if that context is the Slab allocator, since slab can
only provide chunks of the size that was specified when the slab
context was created. In that case Slab will generate an error to
indicate that the requested size is not supported.
The alignment that is requested in palloc_aligned() is stored along with
the allocated memory. This allows the alignment to remain intact through
repalloc() calls.
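For example (a minimal usage sketch; the 4096-byte alignment is
arbitrary):

    /* zero-filled buffer whose start address is a multiple of 4096 */
    char   *buf = palloc_aligned(BLCKSZ, 4096, MCXT_ALLOC_ZERO);

    buf = repalloc(buf, BLCKSZ * 2);    /* the alignment is preserved */
    pfree(buf);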
Author: Andres Freund, David Rowley
Reviewed-by: Maxim Orlov, Andres Freund, John Naylor
Discussion: https://postgr.es/m/CAApHDvpxLPUMV1mhxs6g7GNwCP6Cs6hfnYQL5ffJQTuFAuxt8A%40mail.gmail.com
This is the guts of float4in, callable as a routine to input floats,
which will be useful in an upcoming patch for allowing soft errors in
the seg module's input function.
A similar operation was performed some years ago for float8in in
commit 50861cd683.
Reviewed by Tom Lane
Discussion: https://postgr.es/m/cee4e426-d014-c0b7-aa22-a659f2cd9130@dunslane.net
d21ded75f changed the way slab.c works but introduced a bug that meant we
could end up with the slab's curBlocklistIndex pointing to the wrong list.
The condition which was checking for this was failing to account for two
things:
1. The curBlocklistIndex could be 0 as we've currently got no non-full
blocks to put chunks on. In this case, the dlist_is_empty() check cannot
be performed as there can be any number of completely full blocks at that
index.
2. The curBlocklistIndex may be greater than the index we just moved the
block onto. Since we need to ensure we fill up fuller blocks first, we
must reset curBlocklistIndex when changing any blocklist element that's
less than the curBlocklistIndex too.
Reported-by: Takamichi Osumi
Discussion: https://postgr.es/m/TYCPR01MB8373329C6329768D7E093D68EDEB9@TYCPR01MB8373.jpnprd01.prod.outlook.com
This shaves some code by replacing the combinations of
CreateTemplateTupleDesc()/TupleDescInitEntry(), which hardcoded a
mapping of the attributes listed in pg_proc.dat, with
get_call_result_type() to build the TupleDesc needed for the rows
generated.
get_call_result_type() is more expensive than the former style, but this
removes some duplication with the lists of OUT parameters (pg_proc.dat
and the attributes hardcoded in these code paths). This is applied to
functions that are not considered critical (aka that could be called
repeatedly for monitoring purposes).
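The converted functions now follow the usual pattern, roughly
(hypothetical example function):

    Datum
    pg_example_info(PG_FUNCTION_ARGS)
    {
        TupleDesc   tupdesc;
        Datum       values[2];
        bool        nulls[2] = {false, false};

        /* build the TupleDesc from the OUT parameters in pg_proc.dat */
        if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
            elog(ERROR, "return type must be a row type");

        values[0] = Int64GetDatum(42);
        values[1] = BoolGetDatum(true);

        PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc,
                                                          values, nulls)));
    }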
Author: Bharath Rupireddy
Reviewed-by: Robert Haas, Álvaro Herrera, Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACV23HW5HP5hFjd89FNS-z5X8r2jNXdMXcpN2BgTtKd87w@mail.gmail.com
Commit 927f453a9 disallowed the batching added by commit b663a4136
from being used for the inserts performed as part of cross-partition
updates of
partitioned tables, mainly because the previous code in
nodeModifyTable.c couldn't handle pending inserts into foreign-table
partitions that are also UPDATE target partitions. But we don't have
such a limitation anymore (cf. commit ffbb7e65a), so let's allow for
this by removing from execPartition.c the restriction added by commit
927f453a9 that batching is only allowed if the query command type is
CMD_INSERT.
In postgres_fdw, since commit 86dc90056 changed it to effectively
disable cross-partition updates in the case where a foreign-table
partition chosen to insert rows into is also an UPDATE target partition,
allow batching in the case where a foreign-table partition chosen to
do so is *not* also an UPDATE target partition. This is enabled by the
"batch_size" option added by commit b663a4136, which is disabled by
default.
This patch also adjusts the test case added by commit 927f453a9 to
confirm that the inserts performed as part of a cross-partition update
of a partitioned table indeed use batching.
Amit Langote, reviewed and/or tested by Georgios Kokolatos, Zhihong Yu,
Bharath Rupireddy, Hou Zhijie, Vignesh C, and me.
Discussion: http://postgr.es/m/CA%2BHiwqH1Lz1yJmPs%3DaD-pzd_HLLynLHvq5iYeT9mB0bBV7oJ6w%40mail.gmail.com
1349d279 added query planner support to allow more efficient execution of
aggregate functions which have an ORDER BY or a DISTINCT clause. Prior to
that commit, the planner would only request that the lower planner produce
a plan with the order required for the GROUP BY clause and it would be
left up to nodeAgg.c to perform the final sort of records within each
group so that the aggregate transition functions were called in the
correct order. Now that the planner requests the lower planner produce a
plan with the GROUP BY and the ORDER BY / DISTINCT aggregates in mind,
there is the possibility that the planner chooses a plan which could be
less efficient than what would have been produced before 1349d279.
While developing 1349d279, I had in mind that Incremental Sort would help
us in cases where an index exists only on the GROUP BY column(s).
Incremental Sort would just replace the implicit tuplesorts which are
being performed in nodeAgg.c. However, because the planner has the
flexibility to instead choose a plan which just performs a full sort on
both the GROUP BY and ORDER BY / DISTINCT aggregate columns, there is
potential for the planner to make a bad choice. The costing for
Incremental Sort is not perfect as it assumes an even distribution of rows
to sort within each sort group.
Here we add an escape hatch in the form of the enable_presorted_aggregate
GUC. This will allow users to get the pre-PG16 behavior in cases where
they have no other means to convince the query planner to produce a plan
which only sorts on the GROUP BY column(s).
Discussion: https://postgr.es/m/CAApHDvr1Sm+g9hbv4REOVuvQKeDWXcKUAhmbK5K+dfun0s9CvA@mail.gmail.com
Slab has traditionally been fairly slow when compared with the AllocSet or
Generation memory allocators. Part of this slowness came from having to
write out an entire block when we allocate a new block in order to
populate the free list indexes within the block's memory. Additional
slowness came from having to move a block onto another dlist each time we
palloc or pfree a chunk from it.
Here we optimize both of those cases and do a little bit extra to improve
the performance of the slab allocator.
Here, instead of writing out the free list indexes when allocating a new
block, we introduce the concept of "unused" chunks. When a block is first
allocated all chunks are unused. These chunks only make it onto the
free list when they are pfree'd. When allocating new chunks on an
existing block, we have the choice of consuming a chunk from the free list
or an unused chunk. When both exist, we opt to use one from the free
list, as these have been used already and their memory is more likely
to be cached by the CPU.
Here we also reduce the number of block lists from there being one for
every possible value of free chunks on a block to just having a small
fixed number of block lists. We keep the 0th block list for completely
full blocks and anything else stores blocks for some range of free chunks
with fuller blocks appearing on lower block list array elements. This
reduces how often we must move a block to another list when we allocate or
free chunks, but still allows us to prefer to put new chunks on fuller
blocks and perhaps allow blocks with fewer chunks to be free'd later
once all their remaining chunks have been pfree'd.
Additionally, we now store a list of "emptyblocks", which are blocks that
no longer contain any allocated chunks. We now keep up to 10 of these
around to avoid having to thrash malloc/free when allocation patterns
continually cause blocks to become free of any allocated chunks only to
allocate more chunks again. Only once we have 10 of these do we free
the block. This does raise the high water mark for the total memory that
a slab context can consume. It does not seem entirely unreasonable that
we might one day want to make this a property of SlabContext rather than a
compile-time constant. Let's wait and see if there is any evidence to
support that this is required before doing it.
Author: Andres Freund, David Rowley
Tested-by: Tomas Vondra, John Naylor
Discussion: https://postgr.es/m/20210717194333.mr5io3zup3kxahfm@alap3.anarazel.de
This function takes as input a WAL segment name and returns a tuple
made of the segment sequence number (dependent on the WAL segment size
of the cluster) and its timeline, via a thin SQL wrapper around the
existing XLogFromFileName().
This function has multiple usages, like being able to compute an LSN
from a file name and an offset, or finding the timeline of a segment
without having to do some math based on the first eight characters of
the segment.
Bump catalog version.
Author: Bharath Rupireddy
Reviewed-by: Nathan Bossart, Kyotaro Horiguchi, Maxim Orlov, Michael
Paquier
Discussion: https://postgr.es/m/CALj2ACWV=FCddsxcGbVOA=cvPyMr75YCFbSQT6g4KDj=gcJK4g@mail.gmail.com
SCRAM_KEY_LEN was a variable used in the internal routines of SCRAM to
size a set of fixed-sized arrays used in the SHA and HMAC computations
during the SASL exchange or when building a SCRAM password. This had a
hard dependency on SHA-256, reducing the flexibility of SCRAM when it
comes to the addition of more hash methods. A second issue was that
SHA-256 is assumed as the cryptohash method to use all the time.
This commit renames SCRAM_KEY_LEN to a more generic SCRAM_KEY_MAX_LEN,
which is used as the size of the buffers used by the internal routines
of SCRAM. This is aimed at tracking centrally the maximum size
necessary for all the hash methods supported by SCRAM. A global
variable has the advantage of keeping the code in its simplest form,
reducing the need of more alloc/free logic for all the buffers used in
the hash calculations.
A second change is that the key length (SHA digest length) and the hash
type are now tracked by the state data in the backend and the frontend,
with the common portions extended to handle these as arguments in the
internal routines of SCRAM.
around to extend the SCRAM protocol, including some to use stronger
cryptohash algorithms, so this lifts some of the existing restrictions
in the code.
The code in charge of parsing and building SCRAM secrets is extended to
rely on the key length and on the cryptohash type used for the exchange,
assuming currently that only SHA-256 is supported for the moment. Note
that the mock authentication simply enforces SHA-256.
Author: Michael Paquier
Reviewed-by: Peter Eisentraut, Jonathan Katz
Discussion: https://postgr.es/m/Y5k3Qiweo/1g9CG6@paquier.xyz
A new function pg_stat_get_backend_subxact() can be used to get
information about the number of subtransactions in the cache of
a particular backend and whether that cache has overflowed. This
can be useful for tracking down performance problems that can
result from overflowed snapshots.
Dilip Kumar, reviewed by Zhihong Yu, Nikolay Samokhvalov,
Justin Pryzby, Nathan Bossart, Ashutosh Sharma, Julien
Rouhaud. Additional design comments from Andres Freund,
Tom Lane, Bruce Momjian, and David G. Johnston.
Discussion: http://postgr.es/m/CAFiTN-ut0uwkRJDQJeDPXpVyTWD46m3gt3JDToE02hTfONEN=Q@mail.gmail.com
Such references failed with "cache lookup failed for type 0"
because we didn't resolve the type of the CYCLE column until after
analyzing the CTE's query. We can just move that processing
to before the recursive parse_sub_analyze call, though.
While here, invent a couple of local variables to make this
code less egregiously wider-than-80-columns.
Per bug #17723 from Vik Fearing. Back-patch to v14 where
the CYCLE feature was added.
Discussion: https://postgr.es/m/17723-2c4985ff111e7bba@postgresql.org
This ancient bit of code was summarily trapping any ereport longjmp
whatsoever and assuming that it must represent an invalid-XML report.
It's not really appropriate to handle OOM-like situations that way:
maybe the input is valid or maybe not, but we couldn't find out.
And it'd be a seriously bad idea to ignore, say, a query cancel
error that way. (Perhaps that can't happen because there is no
CHECK_FOR_INTERRUPTS anywhere within xml_parse, but even if that's
true today it's obviously a very fragile assumption.)
But in the wake of the previous commit, we can drop the PG_TRY
here altogether, and use the soft error mechanism to catch only
the kinds of errors that are legitimate to treat as invalid-XML.
(This is our first use of the soft error mechanism for something
not directly related to a datatype input function. It won't be
the last.)
xml_is_document can be converted in the same way. That one is
not actively broken, because it was checking specifically for
ERRCODE_INVALID_XML_DOCUMENT rather than trapping everything;
but the code is still shorter and probably faster this way.
Discussion: https://postgr.es/m/3564577.1671142683@sss.pgh.pa.us
The key idea here is that xml_parse must distinguish hard errors
from soft errors. We want to throw a hard error for libxml
initialization failures: those might be out-of-memory, or something
else, but in any case they are not the fault of the input string.
If we get to the point of parsing the input, and something goes
wrong, we can fairly consider that to mean bad input.
One thing that arguably does mean bad input, but I didn't trouble
to handle softly, is encoding conversion failure while converting from
the server encoding to UTF8. This might be something to improve
later, but it seems like a pretty low-probability scenario.
Discussion: https://postgr.es/m/3564577.1671142683@sss.pgh.pa.us
When incremental sorts were added in v13 a 1.5x pessimism factor was added
to the cost model. Seemingly this was done because the cost model only
has an estimate of the total number of input rows and the number of
presorted groups. It assumes that the input rows will be evenly
distributed throughout the presorted groups. The 1.5x pessimism factor
was added to slightly reduce the likelihood of incremental sorts being
used in the hope to avoid performance regressions where an incremental
sort plan was picked and turned out slower due to a large skew in the
number of rows in the presorted groups.
An additional quirk with the path generation code meant that we could
consider both a sort and an incremental sort on paths with presorted keys.
This meant that with the pessimism factor, it was possible that we opted
to perform a sort rather than an incremental sort when the given path had
presorted keys.
Here we remove the 1.5x pessimism factor to allow incremental sorts to
have a fairer chance at being chosen against a full sort.
Previously we would generally create a sort path on the cheapest input
path (if that wasn't sorted already) and incremental sort paths on any
path which had presorted keys. This meant that if the cheapest input path
wasn't completely sorted but happened to have presorted keys, we would
create a full sort path *and* an incremental sort path on that input path.
Here we change this logic so that if there are presorted keys, we only
create an incremental sort path, and create sort paths only when a full
sort is required.
Both the removal of the cost pessimism factor and the changes made to the
path generation make it more likely that incremental sorts will now be
chosen. That, of course, as with teaching the planner any new tricks,
means an increased likelihood that the planner will perform an incremental
sort when it's not the best method. Our standard escape hatch for these
cases is an enable_* GUC. enable_incremental_sort already exists for
this.
This came out of a report by Pavel Luzanov where he mentioned that the
master branch was choosing to perform a Seq Scan -> Sort -> Group
Aggregate for his query with an ORDER BY aggregate function. The v15 plan
for his query performed an Index Scan -> Group Aggregate; of course, the
aggregate performed the final sort internally in nodeAgg.c for the
aggregate's ORDER BY. The ideal plan would have been to use the index,
which provided partially sorted input then use an incremental sort to
provide the aggregate with the sorted input. This was not being chosen
due to the pessimism in the incremental sort cost model, so here we remove
that and rationalize the path generation so that sort and incremental sort
plans don't have to needlessly compete. We assume that it's senseless
to ever use a full sort on a given input path where an incremental sort
can be performed.
Reported-by: Pavel Luzanov
Reviewed-by: Richard Guo
Discussion: https://postgr.es/m/9f61ddbf-2989-1536-b31e-6459370a6baa%40postgrespro.ru
The building of command completion tags could often be seen showing up in
profiles when running high tps workloads.
The query completion tags were being built with snprintf, which is slow at
the best of times when compared with more manual ways of formatting
strings. Here we introduce BuildQueryCompletionString() to do this job
for us. We also now store the completion tag's strlen in the
CommandTagBehavior struct so that we can quickly memcpy this number of
bytes into the completion tag string. Appending the rows affected is done
via pg_ulltoa_n. BuildQueryCompletionString returns the length of the
built string. This saves us having to call strlen to figure out how many
bytes to pass to pq_putmessage().
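The gist of the technique, as a rough sketch (struct and parameter
names illustrative, not the real function's signature):

    static Size
    build_completion_tag(char *buf, const char *tagname, Size taglen,
                         uint64 nprocessed)
    {
        char   *ptr = buf;

        memcpy(ptr, tagname, taglen);           /* stored length, no strlen() */
        ptr += taglen;
        *ptr++ = ' ';
        ptr += pg_ulltoa_n(nprocessed, ptr);    /* returns the digit count */
        *ptr = '\0';
        return ptr - buf;                       /* length for pq_putmessage() */
    }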
Author: David Rowley, Andres Freund
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/CAHoyFK-Xwqc-iY52shj0G+8K9FJpse+FuZ36XBKy78wDVnd=Qg@mail.gmail.com
This is mostly straightforward, except that if the range type
has a canonical function, that might throw an error during range
input. (Such errors probably only occur for edge cases: in the
in-core canonical functions, it happens only if a bound has the
maximum valid value for the underlying type.) Hence, this patch
extends the soft-error regime to allow canonical functions to
return errors softly as well. Extensions implementing range
canonical functions will need modification anyway because of the
API change for range_serialize(); while at it, they might want
to do something similar to what's been done here in the in-core
canonical functions.
Discussion: https://postgr.es/m/3284599.1671075185@sss.pgh.pa.us
Because we added StaticAssertStmt() first before StaticAssertDecl(),
some uses as well as the instructions in c.h are now a bit backwards
from the "native" way static assertions are meant to be used in C.
This updates the guidance and moves some static assertions to better
places.
Specifically, since the addition of StaticAssertDecl(), we can put
static assertions at the file level. This moves a number of static
assertions out of function bodies, where they might have been stuck
out of necessity, to perhaps better places at the file level or in
header files.
Also, when the static assertion appears in a position where a
declaration is allowed, then using StaticAssertDecl() is more native
than StaticAssertStmt().
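For instance (illustrative assertions):

    /* at file scope, where declarations are allowed */
    StaticAssertDecl(sizeof(int64) == 8, "int64 must be 8 bytes");

    static void
    example(void)
    {
        /* inside a function body, the statement form is the natural fit */
        StaticAssertStmt(sizeof(int32) == 4, "int32 must be 4 bytes");
    }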
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/941a04e7-dd6f-c0e4-8cdf-a33b3338cbda%40enterprisedb.com
Convert assorted internal-ish datatypes, namely aclitemin,
int2vectorin, oidin, oidvectorin, pg_lsn_in, pg_snapshot_in,
and tidin to the new style.
(Some others you might expect to find in this group, such as
cidin and xidin, need no changes because they never throw
errors at all. That seems a little cheesy ... but it is not in
the charter of this patch series to add new error conditions.)
Amul Sul, minor mods by me
Discussion: https://postgr.es/m/CAAJ_b97KeDWUdpTKGOaFYPv0OicjOu6EW+QYWj-Ywrgj_aEy1g@mail.gmail.com
Convert box_in, circle_in, line_in, lseg_in, path_in, point_in,
and poly_in to the new style.
line_in still throws hard errors for overflows/underflows that can occur
when the input is specified as two points rather than in the canonical
"Ax + By + C = 0" style. I'm not too concerned about that: it won't be
reached in normal dump/restore cases, and it's fairly debatable that
such conversion should ever have been made part of a type input function
in the first place. But in any case, I don't want to extend the soft
error conventions into float.h without more discussion than this patch
has had.
Amul Sul, minor mods by me
Discussion: https://postgr.es/m/CAAJ_b97KeDWUdpTKGOaFYPv0OicjOu6EW+QYWj-Ywrgj_aEy1g@mail.gmail.com
Add support for hexadecimal, octal, and binary integer literals:
0x42F
0o273
0b100101
per SQL:202x draft.
This adds support in the lexer as well as in the integer type input
functions.
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
Commits f92944137 et al. made IsInTransactionBlock() set the
XACT_FLAGS_NEEDIMMEDIATECOMMIT flag before returning "false",
on the grounds that that kept its API promises equivalent to those of
PreventInTransactionBlock(). This turns out to be a bad idea though,
because it allows an ANALYZE in a pipelined series of commands to
cause an immediate commit, which is unexpected.
Furthermore, if we return "false" then we have another issue,
which is that ANALYZE will decide it's allowed to do internal
commit-and-start-transaction sequences, thus possibly unexpectedly
committing the effects of previous commands in the pipeline.
To fix the latter situation, invent another transaction state flag
XACT_FLAGS_PIPELINING, which explicitly records the fact that we
have executed some extended-protocol command and not yet seen a
commit for it. Then, require that flag to not be set before allowing
IsInTransactionBlock() to return "false".
Having done that, we can remove its setting of NEEDIMMEDIATECOMMIT
without fear of causing problems. This means that the API guarantees
of IsInTransactionBlock now diverge from PreventInTransactionBlock,
which is mildly annoying, but it seems OK given the very limited usage
of IsInTransactionBlock. (In any case, a caller preferring the old
behavior could always set NEEDIMMEDIATECOMMIT for itself.)
For consistency also require XACT_FLAGS_PIPELINING to not be set
in PreventInTransactionBlock. This too is meant to prevent commands
such as CREATE DATABASE from silently committing previous commands
in a pipeline.
Per report from Peter Eisentraut. As before, back-patch to all
supported branches (which sadly no longer includes v10).
Discussion: https://postgr.es/m/65a899dd-aebc-f667-1d0a-abb89ff3abf8@enterprisedb.com
Instead of half a dozen of mostly-duplicate ExecGrant_Foo() functions,
write one common function ExecGrant_generic() that can handle most of
them. We already have all the information we need, such as which
system catalog corresponds to which catalog table and which column is
the ACL column.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Antonin Houska <ah@cybertec.at>
Discussion: https://www.postgresql.org/message-id/flat/22c7e802-4e7d-8d87-8b71-cba95e6f4bcf%40enterprisedb.com
jsonb_get_element() was incautious enough to use VARDATA() and
VARSIZE() directly on an arbitrary text Datum. That of course
fails if the Datum is short-header, compressed, or out-of-line.
The typical result would be failing to match any element of a
jsonb object, though matching the wrong one seems possible as well.
setPathObject() was slightly brighter, in that it used VARDATA_ANY
and VARSIZE_ANY_EXHDR, but that only kept it out of trouble for
short-header Datums. push_path() had the same issue. This could
result in faulty subscripted insertions, though keys long enough to
cause a problem are likely rare in the wild.
Having seen these, I looked around for unsafe usages in the rest
of the adt/json* files. There are a couple of places where it's not
immediately obvious that the Datum can't be compressed or out-of-line,
so I added pg_detoast_datum_packed() to cope if it is. Also, remove
some other usages of VARDATA/VARSIZE on Datums we just extracted from
a text array. Those aren't actively broken, but they will become so
if we ever start allowing short-header array elements, which does not
seem like a terribly unreasonable thing to do. In any case they are
not great coding examples, and they could also do with comments
pointing out that we're assuming we don't need pg_detoast_datum_packed.
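The safe pattern looks like this (illustrative helper):

    static void
    text_datum_to_ptr_len(Datum d, char **str, int *len)
    {
        /* copes with short-header, compressed, and out-of-line Datums */
        text   *txt = (text *) PG_DETOAST_DATUM_PACKED(d);

        *str = VARDATA_ANY(txt);
        *len = VARSIZE_ANY_EXHDR(txt);
    }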
Per report from exe-dealer@yandex.ru. Patch by me, but thanks to
David Johnston for initial investigation. Back-patch to v14 where
jsonb subscripting was introduced.
Discussion: https://postgr.es/m/205321670615953@mail.yandex.ru
If sendFileWithContent were used to send a file larger than the
bbsink buffer size, this would result in corruption. The only
files that are sent via sendFileWithContent are the backup label
file, the tablespace map file, and .done files for WAL segments
included in the backup. Of these, it seems that only the
tablespace_map file can become large enough to cause a problem,
and then only if you have a lot of tablespaces. If you do have
that situation, you might end up with a corrupted
tablespace_map file, which would be bad.
My commit bef47ff85d introduced
this problem.
Report and patch by Antonin Houska.
Discussion: http://postgr.es/m/15764.1670528645@antos
During ALTER TABLE execution, when prep-time handling of subcommands of
certain types determine that execution-time handling requires recursion,
they signal this by changing the subcommand type to a special value.
This can be done in a simpler way by using a separate flag introduced by
commit ec0925c22a, so do that.
Catversion bumped. It's not clear to me that ALTER TABLE subcommands
are stored anywhere in catalogs (CREATE FUNCTION rejects it in BEGIN
ATOMIC function bodies), but we do have both write and read support for
them, so be safe.
Discussion: https://postgr.es/m/20220929090033.zxuaezcdwh2fgfjb@alvherre.pgsql
This is straightforward as far as it goes. However, it does not
attempt to trap errors occurring during the execution of domain
CHECK constraints. Since those are general user-defined
expressions, the only way to do that would involve starting up a
subtransaction for each check. Of course the entire point of
the soft-errors feature is to not need subtransactions, so that
would be self-defeating. For now, we'll rely on the assumption
that domain checks are written to avoid throwing errors.
Discussion: https://postgr.es/m/1181028.1670635727@sss.pgh.pa.us
This requires a bit of further infrastructure-extension to allow
trapping errors reported by numeric_in and pg_unicode_to_server,
but otherwise it's pretty straightforward.
In the case of jsonb_in, we are only capturing errors reported
during the initial "parse" phase. The value-construction phase
(JsonbValueToJsonb) can also throw errors if assorted implementation
limits are exceeded. We should improve that, but it seems like a
separable project.
Andrew Dunstan and Tom Lane
Discussion: https://postgr.es/m/3bac9841-fe07-713d-fa42-606c225567d6@dunslane.net
Formerly, semantic action functions for the JSON parser returned void,
so that there was no way for them to affect the parser's behavior.
That means in particular that they can't force an error exit except by
longjmp'ing. That won't do in the context of our project to make input
functions return errors softly. Hence, change them to return the same
JsonParseErrorType enum value as the parser itself uses. If an action
function returns anything besides JSON_SUCCESS, the parse is abandoned
and that error code is returned.
Action functions can thus easily return the same error conditions that
the parser already knows about. As an escape hatch for expansion, also
invent a code JSON_SEM_ACTION_FAILED that the core parser does not know
the exact meaning of. When returning this code, an action function
must use some out-of-band mechanism for reporting the error details.
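Under the new API, an action function can look roughly like this
(state struct and depth limit hypothetical):

    static JsonParseErrorType
    my_object_start(void *state)
    {
        MyParseState *ps = (MyParseState *) state;

        if (ps->depth >= MY_MAX_DEPTH)
            return JSON_SEM_ACTION_FAILED;  /* details reported out-of-band */

        ps->depth++;
        return JSON_SUCCESS;
    }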
This commit simply makes the API change and causes all the existing
action functions to return JSON_SUCCESS, so that there is no actual
change in behavior here. This is long enough and boring enough that
it seemed best to commit it separately from the changes that make
real use of the new mechanism.
In passing, remove a duplicate assignment of
transform_string_values_scalar.
Discussion: https://postgr.es/m/1436686.1670701118@sss.pgh.pa.us
We chose a specific wording of the not-implemented errors for
pseudotype I/O functions and other cases where there's little
value in implementing input and/or output. gtsvectorin never
got that memo though, nor did most of contrib. Make these all
fall in line, mostly because I'm a neatnik but also to remove
unnecessary translatable strings.
gbtreekey_in needs a bit of extra love since it supports
multiple SQL types. Sadly, gbtreekey_out doesn't have the
ability to do that, but I think it's unreachable anyway.
Noted while surveying datatype input functions to see what we
have left to fix.
Apparently anymultirange_in was written before we converted all
these pseudotype input functions to use a common macro, and it didn't
get fixed before committing. Sloppy merging probably explains its
unintuitive ordering, too, so rearrange.
Noted while surveying datatype input functions to see what we
have left to fix. I'm inclined to leave the pseudotypes as
throwing hard errors, because it's difficult to see a reason why
anyone would need something else. But in any case, if we want
to change that, we shouldn't have to change multiple copies of
the code.
9d9c02ccd added code to allow WindowAgg to take some shortcuts when a
monotonic WindowFunc reached some value that it could never come back
from due to the function's monotonic nature. That commit added a
runCondition field to WindowClause to store the condition; once it
becomes false we can start taking shortcuts in nodeWindowAgg.c.
Here we fix an issue where subquery pullups didn't properly update the
runCondition so that its Vars reference the new query level.
Here we also add a missing call to preprocess_expression() for the
WindowClause's runCondition. The WindowFuncs in the targetlist will have
had this processing done, so we must also do it for the WindowFuncs in
the runCondition so that they can be correctly found in the targetlist
during setrefs.c processing.
Bug: #17709
Reported-by: Alexey Makhmutov
Author: Richard Guo, David Rowley
Discussion: https://postgr.es/m/17709-4f557160e3e8ee9a@postgresql.org
Backpatch-through: 15, where 9d9c02ccd was introduced
Buildfarm member wrasse has been complaining about empty declarations
as an effect of 8018ffb and 83a1a1b due to extra semicolons.
While on it, also remove the trailing backslash of the macro
definitions, which caused more lines to be consumed than necessary, per
comment from Tom Lane.
Reported-by: Tom Lane, and buildfarm member wrasse
Author: Nathan Bossart, Michael Paquier
Discussion: https://postgr.es/m/1188769.1670640236@sss.pgh.pa.us
Replace the error trapping scheme introduced in 5bc450629 with our
shiny new errsave/ereturn mechanism. This doesn't have any real
functional impact (although I think that the new coding is able
to report a few more errors softly than v15 did). And I doubt
there's any measurable performance difference either. But this
gets rid of an ad-hoc, one-of-a-kind design in favor of a mechanism
that will be widely used going forward, so it should be a net win
for code readability.
Discussion: https://postgr.es/m/3bbbb0df-7382-bf87-9737-340ba096e034@postgrespro.ru
This patch converts the input functions for date, time, timetz,
timestamp, timestamptz, and interval to the new soft-error style.
There's some related stuff in formatting.c that remains to be
cleaned up, but that seems like a separable project.
Discussion: https://postgr.es/m/3bbbb0df-7382-bf87-9737-340ba096e034@postgrespro.ru
Pay down some ancient technical debt (dating to commit 022fd9966):
fix a couple of places in datetime parsing that were throwing
ereport's immediately instead of returning a DTERR code that could be
interpreted by DateTimeParseError. The reason for that was that there
was no mechanism for passing any auxiliary data (such as a zone name)
to DateTimeParseError, and these errors seemed to really need it.
Up to now it didn't matter that much just where the error got thrown,
but now we'd like to have a hard policy that datetime parse errors
get thrown from just the one place.
Hence, invent a "DateTimeErrorExtra" struct that can be used to
carry any extra values needed for specific DTERR codes. Perhaps
in the future somebody will be motivated to use this to improve
the specificity of other DateTimeParseError messages, but for now
just deal with the timezone-error cases.
This is on the way to making the datetime input functions report
parse errors softly; but it's really an independent change, so
commit separately.
Discussion: https://postgr.es/m/3bbbb0df-7382-bf87-9737-340ba096e034@postgrespro.ru
More could be done in this line, but I just grabbed some low-hanging
fruit. Principal objective was to remove the need for several ugly
unconstify() usages in formatting.c.
This patch converts the input functions for bool, int2, int4, int8,
float4, float8, numeric, and contrib/cube to the new soft-error style.
array_in and record_in are also converted. There's lots more to do,
but this is enough to provide proof-of-concept that the soft-error
API is usable, as well as reference examples for how to convert
input functions.
This patch is mostly by me, but it owes very substantial debt to
earlier work by Nikita Glukhov, Andrew Dunstan, and Amul Sul.
Thanks to Andres Freund for review.
Discussion: https://postgr.es/m/3bbbb0df-7382-bf87-9737-340ba096e034@postgrespro.ru
pg_input_is_valid() returns boolean, while pg_input_error_message()
returns the primary error message if the input is bad, or NULL
if the input is OK. The main reason for having two functions is
so that we can test both the details-wanted and the no-details-wanted
code paths.
Although these are primarily designed with testing in mind,
it could well be that they'll be useful to end users as well.
This patch is mostly by me, but it owes very substantial debt to
earlier work by Nikita Glukhov, Andrew Dunstan, and Amul Sul.
Thanks to Andres Freund for review.
Discussion: https://postgr.es/m/3bbbb0df-7382-bf87-9737-340ba096e034@postgrespro.ru
Postgres' standard mechanism for reporting errors (ereport() or elog())
is used for all sorts of error conditions. This means that throwing
an exception via ereport(ERROR) requires an expensive transaction or
subtransaction abort and cleanup, since the exception catcher dare not
make many assumptions about what has gone wrong. There are situations
where we would rather have a lighter-weight mechanism for dealing
with errors that are known to be safe to recover from without a full
transaction cleanup. This commit creates infrastructure to let us
adapt existing error-reporting code for that purpose. See the
included documentation changes for details. Follow-on commits will
provide test code and usage examples.
The near-term plan is to convert most if not all datatype input
functions to report invalid input "softly". This will enable
implementing some SQL/JSON features cleanly and without the cost
of subtransactions, and it will also allow creating COPY options
to deal with bad input without cancelling the whole COPY.
This patch is mostly by me, but it owes very substantial debt to
earlier work by Nikita Glukhov, Andrew Dunstan, and Amul Sul.
Thanks also to Andres Freund for review.
Discussion: https://postgr.es/m/3bbbb0df-7382-bf87-9737-340ba096e034@postgrespro.ru
The USER SET flag specifies that the variable should be set on behalf of an
ordinary role. That lets ordinary roles set placeholder variables whose
permission requirements are not yet known. Such a value won't be used if
the variable finally turns out to require superuser privileges.
The new flags are stored in the pg_db_role_setting.setuser array. Catversion
is bumped.
This commit is inspired by the previous work by Steve Chavez.
Discussion: https://postgr.es/m/CAPpHfdsLd6E--epnGqXENqLP6dLwuNZrPMcNYb3wJ87WR7UBOQ%40mail.gmail.com
Author: Alexander Korotkov, Steve Chavez
Reviewed-by: Pavel Borisov, Steve Chavez
Make the argument types of the File API match stdio better:
- Change the data buffer to void *, from char *.
- Change FileWrite() data buffer to const on top of that.
- Change amounts to size_t, from int.
In passing, change the FilePrefetch() amount argument from int to
off_t, to match the underlying posix_fadvise().
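The resulting prototypes look roughly as follows (a sketch; consult fd.h
for the exact declarations):
    extern int  FileRead(File file, void *buffer, size_t amount,
                         off_t offset, uint32 wait_event_info);
    extern int  FileWrite(File file, const void *buffer, size_t amount,
                          off_t offset, uint32 wait_event_info);
    extern int  FilePrefetch(File file, off_t offset, off_t amount,
                             uint32 wait_event_info);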
Discussion: https://www.postgresql.org/message-id/flat/11dda853-bb5b-59ba-a746-e168b1ce4bdb%40enterprisedb.com
In commit ffbb7e65a, I added a ModifyTableState member to ResultRelInfo
to save the owning ModifyTableState for use by nodeModifyTable.c when
performing batch inserts, but as pointed out by Tom Lane, that changed
the array stride of es_result_relations, and that would break any
previously-compiled extension code that accesses that array. Fix by
removing that member from ResultRelInfo and instead adding a List member
at the end of EState to save such ModifyTableStates.
Per report from Tom Lane. Back-patch to v14, like the previous commit;
I chose to apply the patch to HEAD as well, to make back-patching easy.
Discussion: http://postgr.es/m/4065383.1669395453%40sss.pgh.pa.us
After restart, we don't perform streaming of an in-progress transaction if
it was previously decoded and confirmed by the client. To achieve that we
were comparing the END location of the WAL record being decoded with the
WAL location we have already decoded and confirmed by the client. While
decoding the commit record, to decide whether to process and send the
complete transaction, we compare its START location with the WAL location
we have already decoded and confirmed by the client. Now, if we need to
queue some change in the transaction while decoding the commit record
(e.g. snapshot), it is possible that we decide to stream the transaction
but later commit processing decides to skip it. In such a case, we would
needlessly send the changes and later when we decide to skip it, we will
send stream abort.
We also sometimes decide to stream the changes when we actually just need
to process them locally like a change for invalidations. This will lead us
to send empty streams. To avoid this, while queuing each change for
decoding, we remember whether the transaction has any change that actually
needs to be sent downstream and use that information later to decide
whether to stream the transaction or not.
Note, we can't avoid all cases where we have to send empty streams like
the case where the plugin later decides that the change is not
publishable. However, we will no longer need to send stream_abort when we
skip sending a particular transaction.
Author: Dilip Kumar
Reviewed-by: Hou Zhijie, Ashutosh Bapat, Shi yu, Amit Kapila
Discussion: https://postgr.es/m/CAFiTN-tHK=7LzfrPs8fbT2ksrOJGQbzywcgXst2bM9-rJJAAUg@mail.gmail.com
Combine some duplicated code stanzas by creating small functions.
Most of these duplications arose at a time when I wouldn't have
trusted C compilers to auto-inline small functions intelligently,
but they're probably poor practice now. Similarly split out some
bits that aren't actually duplicative as the code stands, but would
become so after an upcoming patch to add another error-handling
code path.
Take the opportunity to add some lengthier comments about what
we're doing here, too. Re-order one function that seemed not
very well-placed.
Patch by me, per suggestions from Andres Freund.
Discussion: https://postgr.es/m/3bbbb0df-7382-bf87-9737-340ba096e034@postgrespro.ru
Previously export_dynamic was set in src/makefiles/Makefile.$port. For Solaris
this required exporting with_gnu_ld. The determination of with_gnu_ld would be
nontrivial to copy for meson PGXS compatibility. It's also nice to delete
libtool.m4.
This uses -Wl,--export-dynamic on all platforms; previously all platforms but
FreeBSD used -Wl,-E. The likelihood of a name conflict seems lower with the
longer spelling.
Discussion: https://postgr.es/m/20221005200710.luvw5evhwf6clig6@awork3.anarazel.de
The same code pattern is repeated 21 times for int64 counters (0 for
missing entry) and 5 times for doubles (0 for missing entry) on database
entries. This code is switched to use macros for the basic code
instead, shaving a few hundred lines of originally-duplicated code
patterns. The function names remain the same, but some fields of
PgStat_StatDBEntry have to be renamed to cope with the new style.
This is in the same spirit as 83a1a1b.
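The macro pattern looks roughly like this (a sketch; names approximate,
see pgstatfuncs.c for the committed version):
    #define PG_STAT_GET_DBENTRY_INT64(stat)                            \
    Datum                                                              \
    CppConcat(pg_stat_get_db_,stat)(PG_FUNCTION_ARGS)                  \
    {                                                                  \
        Oid         dbid = PG_GETARG_OID(0);                           \
        int64       result;                                            \
        PgStat_StatDBEntry *dbentry = pgstat_fetch_stat_dbentry(dbid); \
                                                                       \
        if (dbentry == NULL)                                           \
            result = 0;     /* 0 for missing entry */                  \
        else                                                           \
            result = (int64) dbentry->stat;                            \
                                                                       \
        PG_RETURN_INT64(result);                                       \
    }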
Author: Michael Paquier
Reviewed-by: Nathan Bossart, Bertrand Drouvot
Discussion: https://postgr.es/m/Y46stlxQ2LQE20Na@paquier.xyz
Currently, information about the permissions to be checked on relations
mentioned in a query is stored in their range table entries. So the
executor must scan the entire range table looking for relations that
need to have permissions checked. This can make the permission checking
part of the executor initialization needlessly expensive when many
inheritance children are present in the range table. While the
permissions need not be checked on the individual child relations, the
executor still must visit every range table entry to filter them out.
This commit moves the permission checking information out of the range
table entries into a new plan node called RTEPermissionInfo. Every
top-level (inheritance "root") RTE_RELATION entry in the range table
gets one and a list of those is maintained alongside the range table.
This new list is initialized by the parser when initializing the range
table. The rewriter can add more entries to it as rules/views are
expanded. Finally, the planner combines the lists of the individual
subqueries into one flat list that is passed to the executor for
checking.
To make it quick to find the RTEPermissionInfo entry belonging to a
given relation, RangeTblEntry gets a new Index field 'perminfoindex'
that stores the corresponding RTEPermissionInfo's index in the query's
list of the latter.
ExecutorCheckPerms_hook has gained another List * argument; the
signature is now:
typedef bool (*ExecutorCheckPerms_hook_type) (List *rangeTable,
List *rtePermInfos,
bool ereport_on_violation);
The first argument is no longer used by any in-core uses of the hook,
but we leave it in place because there may be other implementations that
do. Implementations should likely scan the rtePermInfos list to
determine which operations to allow or deny.
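A minimal sketch of such an implementation (allowed() is a hypothetical
policy check, not a real function):
    static bool
    my_check_perms(List *rangeTable, List *rtePermInfos,
                   bool ereport_on_violation)
    {
        ListCell   *l;

        foreach(l, rtePermInfos)
        {
            RTEPermissionInfo *perminfo = lfirst_node(RTEPermissionInfo, l);

            /* inspect perminfo->relid, perminfo->requiredPerms, etc. */
            if (!allowed(perminfo))     /* hypothetical policy check */
            {
                if (ereport_on_violation)
                    aclcheck_error(ACLCHECK_NO_PRIV, OBJECT_TABLE,
                                   get_rel_name(perminfo->relid));
                return false;
            }
        }
        return true;
    }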
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqGjJDmUhDSfv-U2qhKJjt9ST7Xh9JXC_irsAQ1TAUsJYg@mail.gmail.com
9d9c02ccd added window "run conditions", which allows the evaluation of
monotonic window functions to be skipped when the run condition is no
longer true. Prior to this commit, once the run condition was no longer
true and we stopped evaluating the window functions, we simply left
the ecxt_aggvalues[] and ecxt_aggnulls[] arrays alone to store whatever
value was stored there the last time the window function was evaluated.
Leaving a stale value in there isn't really a problem on 64-bit builds as
all of the window functions which we recognize as monotonic all return
int8, which is passed by value on 64-bit builds. However, on 32-bit
builds, this was a problem, as the value stored in the ecxt_aggvalues[]
element would be a by-ref value pointing to memory that would get reset
once the tuple context is destroyed. Since the
WindowAgg node will output these values in the resulting tupleslot, this
could be problematic for the top-level WindowAgg node which must look at
these values to filter out the rows that don't meet its filter condition.
Here we fix this by just zeroing the ecxt_aggvalues[] and setting the
ecxt_aggnulls[] array to true when the run condition first becomes false.
This results in the WindowAgg's output having NULLs for the WindowFunc's
columns rather than stale values or pointers to possibly freed
memory. These tuples with the NULLs can only make it as far as the
top-level WindowAgg node before they're filtered out. To ensure that
these tuples *are* always filtered out, we now insist that OpExprs making
up the run condition are strict OpExprs. Currently, all the window
functions which the planner recognizes as monotonic return INT8 and the
operator which is used for the run condition must be a member of a btree
opclass. In reality, these restrictions exclude nothing that's built-in
to Postgres and are unlikely to exclude anyone's custom operators due to
the requirement that the operator is part of a btree opclass. It would be
unusual if those were not strict.
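The fix itself is essentially this, done once when the run condition first
becomes false (a sketch of the nodeWindowAgg.c change):
    /* null out all window function results from this point on */
    for (i = 0; i < winstate->numfuncs; i++)
    {
        econtext->ecxt_aggvalues[i] = (Datum) 0;
        econtext->ecxt_aggnulls[i] = true;
    }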
Reported-by: Sergey Shinderuk, using valgrind
Reviewed-by: Richard Guo, Sergey Shinderuk
Discussion: https://postgr.es/m/29184c50-429a-ebd7-f1fb-0589c6723a35@postgrespro.ru
Backpatch-through: 15, where 9d9c02ccd was added
The same code pattern is repeated 17 times for int64 counters (0 for
missing entry) and 5 times for timestamps (NULL for missing entry) on
table entries. This code is switched to use a macro for the basic code
instead, shaving a few hundred lines of originally-duplicated code. The
function names remain the same, but some fields of PgStat_StatTabEntry
have to be renamed to cope with the new style.
Author: Bertrand Drouvot
Reviewed-by: Nathan Bossart
Discussion: https://postgr.es/m/20221204173207.GA2669116@nathanxps13
Passing a NULL snapshot (InvalidSnapshot) is going to work but only as long
as the index can't find any matching rows. This can be confusing for
extension authors, so add an explicit check for this argument. The check
is implemented with Assert() in order to avoid overhead in release builds.
Reported-by: Sven Klemm
Discussion: https://postgr.es/m/CAJ7c6TPxitD4vbKyP-mpmC1XwyHdPPqvjLzm%2BVpB88h8LGgneQ%40mail.gmail.com
Author: Aleksander Alekseev
Reviewed-by: Pavel Borisov
A couple of places weren't up to speed for this. By sheer good
luck, we didn't fail but just selected a non-memoized join plan,
at least in the test case we have. Nonetheless, it's a bug,
and I'm not quite sure that it couldn't have worse consequences
in other examples. So back-patch to v14 where Memoize came in.
Richard Guo
Discussion: https://postgr.es/m/CAMbWs48GkNom272sfp0-WeD6_0HSR19BJ4H1c9ZKSfbVnJsvRg@mail.gmail.com
The error messages reported on failures while reading or validating the
header of a WAL page currently include only the offset of the page but
not the LSN referring to the page, requiring an extra step to compute it
when looking at the surroundings with pg_waldump or similar. Adding this
information costs a bit in translation, but also eases debugging.
Author: Bharath Rupireddy
Reviewed-by: Álvaro Herrera, Kyotaro Horiguchi, Maxim Orlov, Michael
Paquier
Discussion: https://postgr.es/m/CALj2ACWV=FCddsxcGbVOA=cvPyMr75YCFbSQT6g4KDj=gcJK4g@mail.gmail.com
As pointed out by Dean Rasheed, we really should be using tmp >
-(PG_INTNN_MIN / 10) rather than tmp > (PG_INTNN_MAX / 10) for checking
for overflows in the accumulation in the pg_strtointNN functions. This
does happen to be the same number when dividing by 10, but there is a
pending patch which adds other bases and this is not the same number if we
were to divide by 2 rather than 10, for example. If the base 2 parsing
were to follow this example, then we could accidentally think a string
containing the value of PG_INT32_MIN was an overflow in pg_strtoint32.
Clearly that shouldn't overflow.
This does not fix any actual live bugs, only some bad examples of overflow
checks for future bases.
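To make the arithmetic concrete (worked numbers, not code from the patch):
    /*
     * PG_INT32_MAX / 10 == 214748364 and -(PG_INT32_MIN / 10) == 214748364,
     * so the two bounds coincide for base 10.  But PG_INT32_MAX / 2 ==
     * 1073741823, while -(PG_INT32_MIN / 2) == 1073741824.  Accumulating
     * the magnitude of PG_INT32_MIN (2147483648) in base 2 doubles
     * 1073741824 one last time; the MAX-based bound would wrongly reject
     * that as an overflow.
     */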
Reported-by: Dean Rasheed
Discussion: https://postgr.es/m/CAEZATCVEtwfhdm-K-etZYFB0=qsR0nT6qXta_W+GQx4RYph1dg@mail.gmail.com
Just because I'm a neatnik, and I'm currently working on
code in this area. It annoys me to not be able to pgindent
my patches without working around unrelated changes.
It neglected to recurse to the subpath, meaning you'd get back
a path identical to the input. This could produce wrong query
results if the omission meant that the subpath fails to enforce
some join clause it should be enforcing. We don't have a test
case for this at the moment, but the code is obviously broken
and the fix is equally obvious. Back-patch to v14 where
Memoize was introduced.
Richard Guo
Discussion: https://postgr.es/m/CAMbWs4_R=ORpz=Lkn2q3ebPC5EuWyfZF+tmfCPVLBVK5W39mHA@mail.gmail.com
These two functions failed to cover MaterialPath. That's not a
fatal problem, but we can generate better plans in some cases
if we support it.
Tom Lane and Richard Guo
Discussion: https://postgr.es/m/1854233.1669949723@sss.pgh.pa.us
We might fail to generate a partitionwise join, because
reparameterize_path_by_child() does not support all path types.
This should not be a hard failure condition: we should just fall back
to a non-partitioned join. However, generate_partitionwise_join_paths
did not consider this possibility and would emit the (misleading)
error "could not devise a query plan for the given query" if we'd
failed to make any paths for a child join. Fix it to give up on
partitionwise joining if so. (The accepted technique for giving up
appears to be to set rel->nparts = 0, which I find pretty bizarre,
but there you have it.)
I have not added a test case because there'd be little point:
any omissions of this sort that we identify would soon get fixed
by extending reparameterize_path_by_child(), so the test would stop
proving anything. However, right now there is a known test case based
on failure to cover MaterialPath, and with that I've found that this
is broken in all supported versions. Hence, patch all the way back.
Original report and patch by me; thanks to Richard Guo for
identifying a test case that works against committed versions.
Discussion: https://postgr.es/m/1854233.1669949723@sss.pgh.pa.us
Experiments have shown that modern versions of both gcc and clang are
unable to fully optimize the multiplication by 10 that we're doing in the
pg_strtointNN functions. Both compilers seem to be making use of "imul",
which is not the most efficient way to multiply by 10. This seems to be
due to the overflow checking that we're doing. Without the overflow
checks, both those compilers switch to a more efficient method of
multiplying by 10. In the absence of overflow concerns, integer multiplication
by 10 can be done by bit-shifting left 3 places to multiply by 8 and then
adding the original value twice.
To allow compilers this flexibility, here we adjust the code so that we
accumulate the number as an unsigned version of the type and remove the
use of pg_mul_sNN_overflow() and pg_sub_sNN_overflow(). The overflow
checking can be done simply by checking if the accumulated value has gone
beyond a 10th of the maximum *signed* value for the given type. If it has
then the accumulation of the next digit will cause an overflow. After
this is done, we do a final overflow check before converting the unsigned
version of the number back to its signed counterpart.
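A condensed sketch of the new pattern for int32 (ptr and neg are assumed
in scope; the committed code in numutils.c handles whitespace, signs, and
digit validation around this core):
    uint32      tmp = 0;

    while (*ptr >= '0' && *ptr <= '9')
    {
        if (unlikely(tmp > -(PG_INT32_MIN / 10)))
            goto out_of_range;              /* next digit would overflow */
        tmp = tmp * 10 + (*ptr++ - '0');    /* shift-and-add friendly */
    }
    if (neg)
    {
        /* check the negative equivalent fits, avoiding signed overflow */
        if (unlikely(tmp > (uint32) (-(PG_INT32_MIN + 1)) + 1))
            goto out_of_range;
        return -((int32) (tmp - 1)) - 1;
    }
    if (unlikely(tmp > PG_INT32_MAX))
        goto out_of_range;
    return (int32) tmp;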
Testing has shown about an 8% speedup of a COPY into a table containing 2
INT columns.
Author: David Rowley, Dean Rasheed
Discussion: https://postgr.es/m/CAApHDvrL6_+wKgPqRHr7gH_6xy3hXM6a3QCsZ5ForurjDFfenA@mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvrdYByjfj-=WbmVNFgmVZg88-dE7heukw8p55aJ+W=qxQ@mail.gmail.com
I wrote this to provide a home for a planned discussion of error
return conventions for non-error-throwing functions. But it seems
useful as documentation of existing code no matter what becomes of
that proposal, so commit separately.
When updating a relation with a rule whose action performed an INSERT
from a multi-row VALUES list, the rewriter might skip processing the
VALUES list, and therefore fail to replace any DEFAULTs in it. This
would lead to an "unrecognized node type" error.
The reason was that RewriteQuery() assumed that a query doing an
INSERT from a multi-row VALUES list would necessarily only have one
item in its fromlist, pointing to the VALUES RTE to read from. That
assumption is correct for the original query, but not for product
queries produced for rule actions. In such cases, there may be
multiple items in the fromlist, possibly including multiple VALUES
RTEs.
What is required instead is for RewriteQuery() to skip any RTEs from
the product query's originating query, which might include one or more
already-processed VALUES RTEs. What's left should then include at most
one VALUES RTE (from the rule action) to be processed.
Patch by me. Thanks to Tom Lane for reviewing.
Back-patch to all supported branches.
Discussion: https://postgr.es/m/CAEZATCV39OOW7LAR_Xq4i%2BLc1Byux%3DeK3Q%3DHD_pF1o9LBt%3DphA%40mail.gmail.com
When the relkind of a relcache entry changes because a table is converted into
a view, pgstats can get confused in 15+, leading to crashes or assertion
failures.
For HEAD, Tom fixed this in b23cd185fd, by removing support for converting a
table to a view, removing the source of the inconsistency. This commit just
adds an assertion that a relcache entry's relkind does not change, just in
case we end up with another such case in the future. As there are no remaining
cases of a changing relkind, we can't add a test verifying that it's handled
correctly.
For 15, fix the problem by not maintaining the association with the old pgstat
entry when the relkind changes during relcache invalidation processing. In
that case the pgstat entry needs to be unlinked first, to avoid
PgStat_TableStatus->relation getting out of sync. Also add a test reproducing
the issues.
No known problem exists in 11-14, so just add the test there.
Reported-by: vignesh C <vignesh21@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CALDaNm2yXz+zOtv7y5zBd5WKT8O0Ld3YxikuU3dcyCvxF7gypA@mail.gmail.com
Discussion: https://postgr.es/m/CALDaNm3oZA-8Wbps2Jd1g5_Gjrr-x3YWrJPek-mF5Asrrvz2Dg@mail.gmail.com
Backpatch: 15-
Ignore trailing spaces for non-deterministic collations when
hashing.
The previous behavior could lead to tuples falling into the wrong
partitions when hash partitioning is combined with the BPCHAR type and
a non-deterministic collation. Fortunately, it did not affect hash
indexes, because hash indexes do not use extended hash functions.
Decline to backpatch, per discussion.
Discussion: https://postgr.es/m/eb83d0ac7b299eb08f9b900dd08a5a0c5d90e517.camel@j-davis.com
Reviewed-by: Richard Guo, Tom Lane
It seems better to deal with this by explicit annotations on the
fields in question, instead of magic knowledge embedded in the
script. While that creates a risk-of-omission from failing to
annotate fields, the preceding commit should catch any such
oversights.
Discussion: https://postgr.es/m/263413.1669513145@sss.pgh.pa.us
Check that if we generate a call to copy, compare, write, or read
a specific node type, that node type does have the appropriate
support function. (This doesn't protect against trying to invoke
nonexistent code when considering generic field types such as
"Node *", but it seems like a useful check anyway.)
Check that array_size() refers to a field appearing earlier in
the struct. Aside from catching obvious errors like a misspelled
field name, this protects against a more subtle mistake: if the
size field appears later in the struct than the array field, then
compare and read functions would misbehave. There is actually
exactly that situation in PlannerInfo, but it's okay since we
do not need compare or read functionality for that (today anyway).
Discussion: https://postgr.es/m/263413.1669513145@sss.pgh.pa.us
Some options of these commands need to be able to identify the start
of the function body within the output of pg_get_functiondef().
It used to be that that always began with "AS", but since the
introduction of new-style SQL functions, it might also start with
"BEGIN" or "RETURN". Fix that on the psql side, and add some
regression tests.
Noted by me awhile ago, but I didn't do anything about it.
Thanks to David Johnston for a nag.
Discussion: https://postgr.es/m/AM9PR01MB8268D5CDABDF044EE9F42173FE8C9@AM9PR01MB8268.eurprd01.prod.exchangelabs.com
Up to now we have allowed manual creation of an ON SELECT rule on
a table to convert it into a view. That was never anything but a
horrid, error-prone hack though. pg_dump used to rely on that
behavior to deal with cases involving circular dependencies,
where a dependency loop could be broken by separating the creation
of a view from installation of its ON SELECT rule. However, we
changed pg_dump to use CREATE OR REPLACE VIEW for that in commit
d8c05aff5 (which was later back-patched as far as 9.4), so there's
not a good argument anymore for continuing to support the behavior.
The proximate reason for axing it now is that we found that the
new statistics code has failure modes associated with the relkind
change caused by this behavior. We'll patch around that in v15,
but going forward it seems like a better idea to get rid of the
need to support relkind changes.
Discussion: https://postgr.es/m/CALDaNm2yXz+zOtv7y5zBd5WKT8O0Ld3YxikuU3dcyCvxF7gypA@mail.gmail.com
ri_RootToPartitionMap is currently only initialized for tuple routing
target partitions, though a future commit will need the ability to use
it even for non-partition child tables, so make adjustments to decouple
it from the partitioning code.
Also, make it lazily initialized via ExecGetRootToChildMap(), making
that function its preferred access path. Existing third-party code
accessing it directly should no longer do so; consequently, it's been
renamed to ri_RootToChildMap, which also makes it consistent with
ri_ChildToRootMap.
ExecGetRootToChildMap() houses the logic of setting the map appropriately
depending on whether a given child relation is a partition or not.
To support this, also add a separate entry point for TupleConversionMap
creation that receives an AttrMap. No new code here, just split an
existing function in two.
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqEYUhDXSK5BTvG_xk=eaAEJCD4GS3C6uH7ybBvv+Z_Tmg@mail.gmail.com
For Updates and Deletes, we were not honoring the columns list for old
tuple values while sending tuple data via pgoutput. This results in
pgoutput emitting more columns than expected.
This is not a problem for built-in logical replication as we simply ignore
additional columns based on the relation information sent previously which
didn't have those columns. However, some other users of pgoutput plugin
may expect the columns as per the column list. Also, sending extra columns
unnecessarily consumes network bandwidth defeating the purpose of the
column list feature.
Reported-by: Gunnar Morling
Author: Hou Zhijie
Reviewed-by: Amit Kapila
Backpatch-through: 15
Discussion: https://postgr.es/m/CADGJaX9kiRZ-OH0EpWF5Fkyh1ZZYofoNRCrhapBfdk02tj5EKg@mail.gmail.com
The only use of $(LD) in Makefiles is for AIX, to generate the export file for
the backend. We only support the system linker on AIX and we already hardcode
the path to a number of other binaries. Removing LD substitution will simplify
the upcoming meson PGXS compatibility.
While at it, add a comment why -r is used.
A subsequent commit will remove the determination of LD from configure as
well.
Discussion: https://postgr.es/m/20221005200710.luvw5evhwf6clig6@awork3.anarazel.de
In commit 40c24bfef, I forgot to use get_rule_expr_paren() for the
arguments of AT TIME ZONE, resulting in possibly not printing parens
for expressions that need it. But get_rule_expr_paren() wouldn't have
gotten it right anyway, because isSimpleNode() hadn't been taught that
COERCE_SQL_SYNTAX parent nodes don't guarantee sufficient parentheses.
Improve all that. Also use this methodology for F_IS_NORMALIZED, so
that we don't print useless parens for that.
In passing, remove a comment that was obsoleted later.
Per report from Duncan Sands. Back-patch to v14 where this code
came in. (Before that, we didn't try to print AT TIME ZONE that way,
so there was no bug just ugliness.)
Discussion: https://postgr.es/m/f41566aa-a057-6628-4b7c-b48770ecb84a@deepbluecap.com
The planner will now add a given PartitionPruneInfo to
PlannedStmt.partPruneInfos instead of directly to the
Append/MergeAppend plan node. What gets set instead in the
latter is an index field which points to the list element
of PlannedStmt.partPruneInfos containing the PartitionPruneInfo
belonging to the plan node.
A later commit will make AcquireExecutorLocks() do the initial
partition pruning to determine a minimal set of partitions to be
locked when validating a plan tree and it will need to consult the
PartitionPruneInfos referenced therein to do so. It is better for the
PartitionPruneInfos to be directly accessible, by simply iterating over
PlannedStmt.partPruneInfos, than to require a walk of the plan tree to
find them.
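Schematically, the plan node side of this looks like (a sketch; see
plannodes.h for the committed field):
    typedef struct Append
    {
        Plan        plan;
        /* ... */
        /* index into PlannedStmt.partPruneInfos, or -1 if none */
        int         part_prune_index;
    } Append;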
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
Matviews have been excluded from predicate locking since their
introduction in 3bf3ab8. At that point, there was no concurrent flavor
of REFRESH yet, hence there was no point in having materialized views
track read/write conflicts with concurrent transactions running at the
serializable isolation level, because a matview could only be refreshed
under an access exclusive lock. CONCURRENTLY, on the contrary,
allows reads and writes during a refresh as it holds a share update
exclusive lock.
Some isolation tests are added to show the effect of the change, with a
combination of one table and a matview based on it, using a mix of
REFRESH CONCURRENTLY and read/write queries.
This could arguably be considered a bug, but as it is a subtle
behavior change potentially impacting applications, no backpatch is
done.
Author: Yugo Nagata
Reviewed-by: Richard Guo, Dilip Kumar, Michael Paquier
Discussion: https://postgr.es/m/20220726164434.42d4e33911b4b4fcf751c4e7@sraoss.co.jp
A future commit will move the checkAsUser field from RangeTblEntry
to a new node that, unlike RTEs, will only be created for tables
mentioned in the query but not for the inheritance child relations
added to the query by the planner. So, checkAsUser value for a
given child relation will have to be obtained by referring to that
for its ancestor mentioned in the query.
In preparation, it seems better to expand the use of RelOptInfo.userid
during planning in place of rte->checkAsUser so that there will be
fewer places to adjust for the above change.
Given that the child-to-ancestor mapping is not available during the
execution of a given "child" ForeignScan node, add a checkAsUser
field to ForeignScan to carry the child relation's RelOptInfo.userid.
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqGFCs2uq7VRKi7g+FFKbP6Ea_2_HkgZb2HPhUfaAKT3ng@mail.gmail.com
Previously, we'd compress only when the active range of array entries
reached Max(4 * PROCARRAY_MAXPROCS, 2 * pArray->numKnownAssignedXids).
If max_connections is large, the first term could result in not
compressing for a long time, resulting in much wastage of cycles in
hot-standby backends scanning the array to take snapshots. Get rid
of that term, and just bound it to 2 * pArray->numKnownAssignedXids.
That however creates the opposite risk, that we might spend too much
effort compressing. Hence, consider compressing only once every 128
commit records. (This frequency was chosen by benchmarking. While
we only tried one benchmark scenario, the results seem stable over
a fairly wide range of frequencies.)
Also, force compression when processing RecoveryInfo WAL records
(which should be infrequent); the old code could perform compression
then, but would do so only after the same array-range check as for
the transaction-commit path.
Also, opportunistically run compression if the startup process is about
to wait for WAL, though not oftener than once a second. This should
prevent cases where we waste lots of time by leaving the array
not-compressed for long intervals due to low WAL traffic.
Lastly, add a simple check to keep us from uselessly compressing
when the array storage is already compact.
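Taken together, the policy amounts to roughly this (a paraphrase, not the
committed code; the throttle-counter name is illustrative):
    int     nelements = pArray->headKnownAssignedXids -
                        pArray->tailKnownAssignedXids;

    if (nelements == pArray->numKnownAssignedXids)
        return;                     /* already compact, nothing to do */
    if (!force &&
        (nelements < 2 * pArray->numKnownAssignedXids ||
         commits_since_last_compression < 128))
        return;                     /* not worth the effort yet */
    /* ... compress ... */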
Back-patch, as the performance problem is worse in pre-v14 branches
than in HEAD.
Simon Riggs and Michail Nikolaev, with help from Tom Lane and
Andres Freund.
Discussion: https://postgr.es/m/CALdSSPgahNUD_=pB_j=1zSnDBaiOtqVfzo8Ejt5J_k7qZiU1Tw@mail.gmail.com
This is an oversight in commit 7c337b6b5: I apparently didn't think
about the possibility of a SQL function being executed multiple
times within a query. In that case, functions.c's primitive caching
mechanism allows the same utility parse tree to be presented for
execution more than once. We have to tell ProcessUtility to make
a working copy of the parse tree, or bad things happen.
Normally I'd add a regression test, but I think the reported crasher
is dependent on some rather random implementation choices that are
nowhere near functions.c, so its usefulness as a long-lived test
feels questionable. In any case, this fix is clearly correct given
the design choices of 7c337b6b5.
Per bug #17702 from Xin Wen. Thanks to Daniel Gustafsson for
analysis. Back-patch to v14 where the faulty commit came in
(before that, the responsibility for copying scribble-able
utility parse trees lay elsewhere).
Discussion: https://postgr.es/m/17702-ad24fdcdd1e9047a@postgresql.org
The JOIN_SEMI case Assert'ed that there are no PlaceHolderVars that
need to be evaluated at the semijoin's RHS, which is wrong because
there could be some in the semijoin's qual condition. However, there
could not be any references further up than that, and within the qual
there is not any way that such a PHV could have gone to null yet, so
we don't really need the PHV and there is no need to avoid making the
RHS-removal optimization. The upshot is that there's no actual bug
in production code, and we ought to just remove this misguided Assert.
While we're here, also drop the JOIN_RIGHT case, which is dead code
because reduce_outer_joins() already got rid of JOIN_RIGHT.
Per bug #17700 from Xin Wen. Uselessness of the JOIN_RIGHT case
pointed out by Richard Guo. Back-patch to v12 where this code
was added.
Discussion: https://postgr.es/m/17700-2b5c10d917c30687@postgresql.org
When it's given as true, return a 0 in the position of the missing
column rather than raising an error.
This is currently unused, but it allows us to reimplement column
permission checking in a subsequent commit. It seems worth breaking
into a separate commit because it affects unrelated code.
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFfiai=qBxPDTjaio_ZcaqUKh+FC=prESrB8ogZgFNNNQ@mail.gmail.com
Previously, an idle startup (recovery) process would wake up every 5
seconds to have a chance to poll for promote_trigger_file, even if that
GUC was not configured. That promotion triggering mechanism was
effectively superseded by pg_ctl promote and pg_promote() a long time
ago. There probably aren't many users left and it's very easy to change
to the modern mechanisms, so we agreed to remove the feature.
This is part of a campaign to reduce wakeups on idle systems.
Author: Simon Riggs <simon.riggs@enterprisedb.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Ian Lawrence Barwick <barwick@gmail.com>
Discussion: https://postgr.es/m/CANbhV-FsjnzVOQGBpQ589%3DnWuL1Ex0Ykn74Nh1hEjp2usZSR5g%40mail.gmail.com
This provides two new predefined roles: pg_vacuum_all_tables and
pg_analyze_all_tables. Roles which have been granted these roles can
perform vacuum or analyze, respectively, on any or all tables as if they
were a superuser. This removes the need to grant superuser privilege to
roles just so they can perform vacuum and/or analyze.
Nathan Bossart
Reviewed by: Bharath Rupireddy, Kyotaro Horiguchi, Stephen Frost, Robert
Haas, Mark Dilger, Tom Lane, Corey Huinker, David G. Johnston, Michael
Paquier.
Discussion: https://postgr.es/m/20220722203735.GB3996698@nathanxps13
Currently a table can only be vacuumed or analyzed by its owner or
a superuser. This can now be extended to any user by means of an
appropriate GRANT.
Nathan Bossart
Reviewed by: Bharath Rupireddy, Kyotaro Horiguchi, Stephen Frost, Robert
Haas, Mark Dilger, Tom Lane, Corey Huinker, David G. Johnston, Michael
Paquier.
Discussion: https://postgr.es/m/20220722203735.GB3996698@nathanxps13
Commit b663a4136, which allowed FDWs to INSERT rows in bulk, added to
nodeModifyTable.c code to flush pending inserts to the foreign-table
result relation(s) before completing processing of the ModifyTable node,
but the code failed to take into account the case where the INSERT query
has modifying CTEs, leading to incorrect results.
Also, that commit failed to flush pending inserts before firing BEFORE
ROW triggers so that rows are visible to such triggers.
In that commit we scanned through EState's
es_tuple_routing_result_relations or es_opened_result_relations list to
find the foreign-table result relations to which pending inserts are
flushed, but that would be inefficient in some cases. So to fix, 1) add
a List member to EState to record the insert-pending result relations,
and 2) modify nodeModifyTable.c so that it adds the foreign-table result
relation to the list in ExecInsert() if appropriate, and flushes pending
inserts properly using the list where needed.
While here, fix a copy-and-pasteo in a comment in ExecBatchInsert(),
which was added by that commit.
Back-patch to v14 where that commit appeared.
Discussion: https://postgr.es/m/CAPmGK16qutyCmyJJzgQOhfBq%3DNoGDqTB6O0QBZTihrbqre%2BoxA%40mail.gmail.com
The code already assumed in a few places that the initial recursion
nesting depth is 0, and the recent changes in hba.c (mainly 783e8c6)
rely on this assumption in more places. The maximum recursion nesting
level is assumed to be 10 for hba.c and GUCs.
Author: Julien Rouhaud
Discussion: https://postgr.es/m/20221124090724.n7amf5kpdhx6vb76@jrouhaud
Logical WAL senders now display as follows, gaining a database name:
postgres: walsender USER DATABASE HOST(PORT) STATE
Physical WAL senders show up the same as before:
postgres: walsender USER HOST(PORT) STATE
This information was missing, so it was not possible to tell from ps
whether a WAL sender was logical or physical, and, for a logical one,
which database it is connected to.
Author: Tatsuhiro Nakamori
Reviewed-by: Fujii Masao, Bharath Rupireddy
Discussion: https://postgr.es/m/36a3b137e82e0ea9fe7e4234f03b64a1@oss.nttdata.com
pg_hba.conf and pg_ident.conf gain support for three record keywords:
- "include", to include a file.
- "include_if_exists", to include a file, ignoring it if missing.
- "include_dir", to include a directory of files. These are classified
by name (C locale, mostly) and need to be prefixed by ".conf", hence
following the same rules as GUCs.
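For example (file and directory names are illustrative only):
    # in pg_hba.conf
    include           extra_auth.conf
    include_if_exists optional_auth.conf
    include_dir       hba.d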
This commit relies on the refactoring pieces done in efc9816, ad6c528,
783e8c6 and 1b73d0b, adding a small wrapper to build a list of
TokenizedAuthLines (tokenize_include_file), and the code is shaped to
offer some symmetry with what is done for GUCs with the same options.
pg_hba_file_rules and pg_ident_file_mappings gain a new field called
file_name, to track from which file a record is located, taking
advantage of the addition of rule_number in c591300 to offer an
organized view of the HBA or ident records loaded.
Bump catalog version.
Author: Julien Rouhaud
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/20220223045959.35ipdsvbxcstrhya@jrouhaud
When building hash indexes using the spool method, tuples are added to the
index page in hashkey order. Because of this, we can safely skip
performing the binary search on the existing tuples on the page to find
the location to insert the tuple based on its hashkey value. For this
case, we can just always put the tuple at the end of the item array as the
tuples will always arrive in hashkey order.
Testing has shown that this can improve hash index build speeds by 5-15%
with a unique set of integer values.
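In _hash_pgaddtup() terms, the change is roughly (a sketch; the committed
code threads a flag down from the sorted-build path):
    OffsetNumber itup_off;

    if (appendtup)      /* spool build: tuples arrive in hashkey order */
        itup_off = PageGetMaxOffsetNumber(page) + 1;
    else
        itup_off = _hash_binsearch(page,
                                   _hash_get_indextuple_hashkey(itup));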
Author: Simon Riggs
Reviewed-by: David Rowley
Tested-by: David Zhang, Tomas Vondra
Discussion: https://postgr.es/m/CANbhV-GBc5JoG0AneUGPZZW3o4OK5LjBGeKe_icpC3R1McrZWQ@mail.gmail.com
The memory context was created before attempting to open the first HBA
or ident file, which would cause it to leak. This had no consequences
for the system views for HBA and ident files, but this would cause
memory leaks in the postmaster on reload if the initial HBA and/or ident
files are missing, which is a valid behavior while the backend is
running.
Oversight in efc9816.
Author: Ted Yu
Discussion: https://postgr.es/m/CALte62xH6ivgiKKzPRJgfekPZC6FKLB3xbnf3=tZmc_gKj78dw@mail.gmail.com
This should have been added in efc9816, but it looks like I found a
way to mess up a patch split a bit. This should have no consequence in
practice, but let's be clean.
Discussion: https://postgr.es/m/Y324HvGKiWxW2yxe@paquier.xyz
The list of TokenizedAuthLines generated at parsing for the HBA and
ident files is now stored in a static context called tokenize_context,
where all the parsed tokens are stored. This context is created
when opening the first authentication file of a HBA/ident set (hba_file
or ident_file), and is cleaned up once all the work around it is done,
through a new routine called free_auth_file(). One call of
open_auth_file() should have one matching call of free_auth_file(); the
creation and deletion of the tokenization context are controlled by the
recursion depth of the tokenization.
Rather than having tokenize_auth_file() return a memory context that
includes all the records, the tokenization logic now creates and deletes
one memory context each time this function is called. This will
simplify recursive calls to this routine for the upcoming inclusion
record logic.
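The intended pairing, at the top level (depth 0), is roughly:
    char       *err_msg = NULL;
    List       *tok_lines = NIL;
    FILE       *file = open_auth_file(HbaFileName, LOG, 0, &err_msg);

    if (file != NULL)
    {
        tokenize_auth_file(HbaFileName, file, &tok_lines, LOG, 0);
        free_auth_file(file, 0); /* depth 0 tears down tokenize_context */
    }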
While at it, rename tokenize_inc_file() to tokenize_expand_file(), as the
old name would conflict with the upcoming patch that will add inclusion
records for HBA/ident files. An '@' file has its tokens added to an
existing list.
Reloading the HBA/ident configuration in a tight loop shows no leaks, in
the one type of test done (with and without -DEXEC_BACKEND).
Author: Michael Paquier
Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/Y324HvGKiWxW2yxe@paquier.xyz
Some custom table access methods may have their own tuple format and use custom
executor nodes for their custom scan types. The ability to set a custom slot
would save them from tuple format conversion. Other users of custom executor
nodes may also benefit.
Discussion: https://postgr.es/m/CAPpHfduJUU6ToecvTyRE_yjxTS80FyPpct4OHaLFk3OEheMTNA@mail.gmail.com
Author: Alexander Korotkov
Reviewed-by: Pavel Borisov
We're running out of bits for new permissions. This change doubles the
number of permissions we can accommodate from 16 to 32, so the
forthcoming new ones for vacuum/analyze don't exhaust the pool.
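Mechanically, this widens AclMode and splits ai_privs down the middle
(a sketch; see acl.h for the committed macros):
    typedef uint64 AclMode;         /* was uint32 */

    /* lower 32 bits: permissions; upper 32 bits: grant options */
    #define ACLITEM_GET_PRIVS(item)    ((item).ai_privs & 0xFFFFFFFF)
    #define ACLITEM_GET_GOPTIONS(item) (((item).ai_privs >> 32) & 0xFFFFFFFF)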
Nathan Bossart
Reviewed by: Bharath Rupireddy, Kyotaro Horiguchi, Stephen Frost, Robert
Haas, Mark Dilger, Tom Lane, Corey Huinker, David G. Johnston, Michael
Paquier.
Discussion: https://postgr.es/m/20220722203735.GB3996698@nathanxps13
Pass VACUUM parameters (VacuumParams state) to vacuum_set_xid_limits()
directly, rather than passing most individual VacuumParams fields as
separate arguments.
Also make vacuum_set_xid_limits() output parameter symbol names match
those used by its vacuumlazy.c caller.
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wz=TE7gW5DgSahDkf0UEZigFGAoHNNN6EvSrdzC=Kn+hrA@mail.gmail.com
We shouldn't ever need to rely on whether HEAP_XMAX_INVALID is set in
t_infomask when considering whether or not an xmax should be deemed
already frozen, since that status flag is just a hint. The only
acceptable representation for an "xmax_already_frozen" raw xmax field is
the transaction ID value zero (also known as InvalidTransactionId).
Adjust code that superficially appeared to rely on HEAP_XMAX_INVALID to
make the rule about xmax_already_frozen clear. Also avoid needlessly
rereading the tuple's raw xmax.
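In effect, the clarified rule reduces to (a sketch):
    TransactionId xmax = HeapTupleHeaderGetRawXmax(tuple);

    /* hint bits such as HEAP_XMAX_INVALID don't count; only raw xmax does */
    xmax_already_frozen = !TransactionIdIsValid(xmax);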
Oversight in bugfix commit d2599ecf. There is no evidence that this
ever led to incorrect behavior, so no backpatch. The worst consequence
of this bug was that VACUUM could hypothetically fail to notice and
report on certain kinds of corruption, which seems fairly benign.
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wzkh3DMCDRPfhZxj9xCq9v3WmzvmbiCpf1dNKUBPadhCbQ@mail.gmail.com