pg_xact lookups are relatively expensive. Move the xmin/xmax commit
status checks from the point that freeze plans are prepared to the point
that they're actually executed. Otherwise we'll repeat many commit
status checks whenever multiple successive VACUUM operations scan the
same pages and decide against freezing each time, which is a waste of
cycles.
Oversight in commit 1de58df4, which added page-level freezing.
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkZpe4K6qMfEt8H4qYJCKc2R7TPvKsBva7jc9w7iGXQSw@mail.gmail.com
Improve comments added by commit 1de58df4 which describe the
lazy_scan_prune "freeze the page" path. These newly revised comments
are based on suggestions from Jeff Davis.
In passing, remove nearby visibility_cutoff_xid comments left over from
commit 6daeeb1f.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/ebc857107fe3edd422ef8a65191ca4a8da568b9b.camel@j-davis.com
Windows can enumerate the locales that are either installed or
supported by calling EnumSystemLocalesEx(), similar to what is already
done in the READ_LOCALE_A_OUTPUT switch. We can refactor some of the
logic already used in that switch into a new function
create_collation_from_locale().
The enumerated locales have BCP 47 shape, that is with a hyphen
between language and territory, instead of POSIX's underscore. The
created collations will retain the BCP 47 shape, but we will also
create a POSIX alias, so xx-YY will have an xx_YY alias.
A new test collate.windows.win1252 is added that is like
collate.linux.utf8.
Author: Juan Jose Santamaria Flecha <juanjo.santamaria@gmail.com>
Reviewed-by: Dmitry Koval <d.koval@postgrespro.ru>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/0050ec23-34d9-2765-9015-98c04f0e18ac@postgrespro.ru
While on it, newlines are removed from the end of two elog() strings.
The others are simple grammar mistakes. One comment in pg_upgrade
referred incorrectly to sequences since a7e5457.
Author: Justin Pryzby
Discussion: https://postgr.es/m/20221230231257.GI1153@telsasoft.com
Backpatch-through: 11
When considering an empty grouping set, we fetched
phasedata->eqfunctions[-1]. Because the eqfunctions array is
palloc'd, that would always be an aset pointer in released versions,
and thus the code accidentally failed to malfunction (since it would
do nothing unless it found a null pointer). Nonetheless this seems
like trouble waiting to happen, so add a check for length == 0.
It's depressing that our valgrind testing did not catch this.
Maybe we should reconsider the choice to not mark that word NOACCESS?
Richard Guo
Discussion: https://postgr.es/m/CAMbWs4-vZuuPOZsKOYnSAaPYGKhmacxhki+vpOKk0O7rymccXQ@mail.gmail.com
The term "truncation" has been ambiguous since commit 10a8d13823 added
line pointer array truncation during heap pruning. Clear things up by
specifying that we're talking about rel truncation here, to match nearby
comments that apply to tuples with storage.
Don't allow VACUUM to WAL-log the value FrozenTransactionId as the
snapshotConflictHorizon of freezing or visibility map related WAL
records.
The only special XID value that's an allowable snapshotConflictHorizon
is InvalidTransactionId, which is interpreted as "record definitely
doesn't require a recovery conflict".
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WznuNGSzF8v6OsgjaC5aYsb3cZ6HW6MLm30X0d65cmSH6A@mail.gmail.com
A refcursor variable that is bound to a specific query (by declaring
it with "CURSOR FOR") now chooses a portal name in the same way as an
unbound, plain refcursor variable. Its string value starts out as
NULL, and unless that's overridden by manual assignment, it will be
replaced by a unique-within-session portal name during OPEN.
The previous behavior was to initialize such variables to contain
their own name, resulting in that also being the portal name unless
the user overwrote it before OPEN. The trouble with this is that
it causes failures due to conflicting portal names if the same
cursor variable name is used in different functions. It is pretty
non-orthogonal to have bound and unbound refcursor variables behave
differently on this point, too, so let's change it.
This change can cause compatibility problems for applications that
open a bound cursor in a plpgsql function and then use it in the
calling code without explicitly passing back the refcursor value
(portal name). If the calling code simply assumes that the portal
name matches the called function's variable name, it will now fail.
That can be fixed by explicitly assigning a string value to the
refcursor variable before OPEN, e.g.
DECLARE myc CURSOR FOR SELECT ...;
BEGIN
myc := 'myc'; -- add this
OPEN myc;
We have no documentation examples showing the troublesome usage
pattern, so we can hope it's rare in practice.
Patch by me; thanks to Pavel Stehule and Jan Wieck for review.
Discussion: https://postgr.es/m/1465101.1667345983@sss.pgh.pa.us
Cirrus is about to shut down its macOS-on-Intel support, so it's time to
move our CI testing over to ARM instances. The Homebrew package manager
changed its default installation prefix for the new architecture, so a
couple of tests need tweaks to find binaries.
Back-patch to 15, where in-tree CI began.
Author: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20221122225744.GF11463%40telsasoft.com
When brin_minmax_multi_union merges summaries, we may end up with just a
single range after merge_overlapping_ranges. The summaries may contain
just one range each, and they may overlap (or be exactly the same).
With a single range there's no distance to calculate, but we happen to
call build_distances anyway - which is fine, we don't calculate the
distance in this case, except that with asserts this failed due to a
check there are at least two ranges.
The assert is unnecessarily strict, so relax it a bit and bail out if
there's just a single range. The relaxed assert would be enough, but
this way we don't allocate unnecessary memory for distance.
Backpatch to 14, where minmax-multi opclasses were introduced.
Reported-by: Jaime Casanova
Backpatch-through: 14
Discussion: https://postgr.es/m/YzVA55qS0hgz8P3r@ahch-to
f193883 has been incorrectly setting up the precision used in the
timestamp compilations returned by the following functions:
- LOCALTIME
- LOCALTIMESTAMP
- CURRENT_TIME
- CURRENT_TIMESTAMP
Specifying an out-of-range precision for CURRENT_TIMESTAMP and
LOCALTIMESTAMP was raising a WARNING without adjusting the precision,
leading to a subsequent error. LOCALTIME and CURRENT_TIME raised a
WARNING without an error, still the precision given to the internal
routines was not correct, so let's be clean.
Ian has reported the problems in timestamp.c, while I have noticed the
ones in date.c. Regression tests are added for all of them with
precisions high enough to provide coverage for the warnings, something
that went missing up to this commit.
Author: Ian Lawrence Barwick, Michael Paquier
Discussion: https://postgr.es/m/CAB8KJ=jQEnn9sYG+N752spt68wMrhmT-ocHCh4oeNmHF82QMWA@mail.gmail.com
New versions of perl trigger warnings within perl.h with our compiler
flags. At least -Wdeclaration-after-statement, -Wshadow=compatible-local are
known to be problematic.
To avoid these warnings, conditionally use #pragma GCC system_header before
including plperl.h.
Alternatively, we could add the include paths for problematic headers with
-isystem, but that is a larger hammer and is harder to search for.
A more granular alternative would be to use #pragma GCC diagnostic
push/ignored/pop, but gcc warns about unknown warnings being ignored, so every
to-be-ignored-temporarily compiler warning would require its own pg_config.h
symbol and #ifdef.
As the warnings are voluminous, it makes sense to backpatch this change. But
don't do so yet, we first want gather buildfarm coverage - it's e.g. possible
that some compiler claiming to be gcc compatible has issues with the pragma.
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: Discussion: https://postgr.es/m/20221228182455.hfdwd22zztvkojy2@awork3.anarazel.de
Teach VACUUM to decide on whether or not to trigger freezing at the
level of whole heap pages. Individual XIDs and MXIDs fields from tuple
headers now trigger freezing of whole pages, rather than independently
triggering freezing of each individual tuple header field.
Managing the cost of freezing over time now significantly influences
when and how VACUUM freezes. The overall amount of WAL written is the
single most important freezing related cost, in general. Freezing each
page's tuples together in batch allows VACUUM to take full advantage of
the freeze plan WAL deduplication optimization added by commit 9e540599.
Also teach VACUUM to trigger page-level freezing whenever it detects
that heap pruning generated an FPI. We'll have already written a large
amount of WAL just to do that much, so it's very likely a good idea to
get freezing out of the way for the page early. This only happens in
cases where it will directly lead to marking the page all-frozen in the
visibility map.
In most cases "freezing a page" removes all XIDs < OldestXmin, and all
MXIDs < OldestMxact. It doesn't quite work that way in certain rare
cases involving MultiXacts, though. It is convenient to define "freeze
the page" in a way that gives FreezeMultiXactId the leeway to put off
the work of processing an individual tuple's xmax whenever it happens to
be a MultiXactId that would require an expensive second pass to process
aggressively (allocating a new multi is especially worth avoiding here).
FreezeMultiXactId is eager when processing is cheap (as it usually is),
and lazy in the event of an individual multi that happens to require
expensive second pass processing. This avoids regressions related to
processing of multis that page-level freezing might otherwise cause.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Jeff Davis <pgsql@j-davis.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-WzkFok_6EAHuK39GaW4FjEFQsY=3J0AAd6FXk93u-Xq3Fg@mail.gmail.com
Some compilers complain about sub_rteperminfos not being
initialized, evidently because they don't detect that it
is only used and set if isGeneralSelect is true.
Make it follow the long-established pattern for its
sibling variable sub_rtable.
Per reports from Pavel Stehule and the buildfarm.
Discussion: https://postgr.es/m/CAFj8pRDOvGOi-n616kM0Cc7qSbg_nGoS=-haB+D785sUXADqSg@mail.gmail.com
The modified error message for regcollationin failure includes
the database encoding, which it should've occurred to me is a
portability hazard for the regression tests. Adjust the test
so the expected output doesn't include that.
In passing, fix a comment typo introduced in b8c0ffbd2.
Per buildfarm.
Given the soft-input-error feature, we can reduce these functions
to be just thin wrappers around a soft-error call of the
corresponding datatype input function. This means less code and
more certainty that the to_reg* functions match the normal input
behavior.
Notably, it also means that they will accept numeric OID input,
which they didn't before. It's not clear to me if that omission
had more than laziness behind it, but it doesn't seem like
something we need to work hard to preserve.
Discussion: https://postgr.es/m/3910031.1672095600@sss.pgh.pa.us
This is not really complete, but it catches most cases of practical
interest. The main omissions are:
* regtype, regprocedure, and regoperator parse type names by
calling the main grammar, so any grammar-detected syntax error
will still be a hard error. Also, if one includes a type
modifier in such a type specification, errors detected by the
typmodin function will be hard errors.
* Lookup errors are handled just by passing missing_ok = true
to the relevant catalog lookup function. Because we've used
quite a restrictive definition of "missing_ok", this means that
edge cases such as "the named schema exists, but you lack
USAGE permission on it" are still hard errors.
It would make sense to me to replace most/all missing_ok
parameters with an escontext parameter and then allow these
additional lookup failure cases to be trapped too. But that's
a job for some other day.
Discussion: https://postgr.es/m/3342239.1671988406@sss.pgh.pa.us
This is slightly tedious because the adjustments cascade through
a couple of levels of subroutines, but it's not very hard.
I chose to avoid changing function signatures more than absolutely
necessary, by passing the escontext pointer in existing structs
where possible.
tsquery's nuisance NOTICEs about empty queries are suppressed in
soft-error mode, since they're not errors and we surely don't want
them to be shown to the user anyway. Maybe that whole behavior
should be reconsidered.
Discussion: https://postgr.es/m/3824377.1672076822@sss.pgh.pa.us
Historically these input functions just called strtoul or strtoull
and returned the result, with no error detection whatever. Upgrade
them to reject garbage input and out-of-range values, similarly to
our other numeric input routines.
To share the code for this with type oid, adjust the existing
"oidin_subr" to be agnostic about the SQL name of the type it is
handling, and move it to numutils.c; then clone it for 64-bit types.
Because the xid types previously accepted hex and octal input by
reason of calling strtoul[l] with third argument zero, I made the
common subroutine do that too, with the consequence that type oid
now also accepts hex and octal input. In view of 6fcda9aba, that
seems like a good thing.
While at it, simplify the existing over-complicated handling of
syntax errors from strtoul: we only need one ereturn not three.
Discussion: https://postgr.es/m/3526121.1672000729@sss.pgh.pa.us
When VACUUM determines that an existing MultiXact should use a freeze
plan that sets xmax to InvalidTransactionId, the original Multi may or
may not be before OldestMxact. Remove an incorrect assertion that
expected it to always be from before OldestMxact.
Oversight in commit 4ce3af.
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/TYAPR01MB5866B24104FD80B5D7E65C3EF5ED9@TYAPR01MB5866.jpnprd01.prod.outlook.com
002_pg_upgrade.pl gains support for a new environment variable called
"filter_rules", that can be used to point to a file that includes a
set of custom regular expressions that would be applied to the dumps of
the origin and target clusters when doing a cross-version test (aka when
defining olddump and oldinstall), to give the possibility to reshape
dynamically the dumps in the same way as the internals of the buildfarm
code so as the tests are able to pass in scenarios where one expects
them to even if pg_dump generates slightly-different outputs depending
on the versions involved.
This option is not used when pg_upgrade runs with the same version for
the origin and target clusters, and it is the last piece I see as
required to be able to plug-in more efficiently the TAP tests of
pg_upgrade with the buildfarm or just a CI.
Author: Anton A. Melnikov
Discussion: https://postgr.es/m/49f389ba-95ce-8a9b-09ae-f60650c0e7c7@inbox.ru
This option extracts (potentially decompressing) full-page images
included in WAL records into a given target directory. These images are
subject to the same filtering rules as the normal display of the WAL
records, hence with --relation one can for example extract only the FPIs
issued on the relation defined. By default, the records are printed or
their stats computed (--stats), using --quiet would only save the images
without any output generated.
This is a tool aimed mostly for very experienced users, useful for
fixing page-level corruption or just analyzing the past state of a page,
and there were no easy way to do that with the in-core tools up to now
when looking at WAL.
Each block is saved in a separate file, to ease their manipulation, with
the file respecting <lsn>.<ts>.<db>.<rel>.<blk>_<fork> with as format.
For instance, 00000000-010000C0.1663.1.6117.123_main refers to:
- WAL record LSN in hexa format (00000000-010000C0).
- Tablespace OID (1663).
- Database OID (1).
- Relfilenode (6117).
- Block number (123).
- Fork name of the file this block came from (_main).
Author: David Christensen
Reviewed-by: Sho Kato, Justin Pryzby, Bharath Rupireddy, Matthias van de
Meent
Discussion: https://postgr.es/m/CAOxo6XKjQb2bMSBRpePf3ZpzfNTwjQUc4Tafh21=jzjX6bX8CA@mail.gmail.com
This enables streaming or serializing changes immediately in logical
decoding. This parameter is intended to be used to test logical decoding
and replication of large transactions for which otherwise we need to
generate the changes till logical_decoding_work_mem is reached.
This helps in reducing the timing of existing tests related to logical
replication of in-progress transactions and will help in writing tests for
for the upcoming feature for parallelly applying large in-progress
transactions.
Author: Shi yu
Reviewed-by: Sawada Masahiko, Shveta Mallik, Amit Kapila, Dilip Kumar, Kuroda Hayato, Kyotaro Horiguchi
Discussion: https://postgr.es/m/OSZPR01MB63104E7449DBE41932DB19F1FD1B9@OSZPR01MB6310.jpnprd01.prod.outlook.com
f4f2f2b was doing a sequential scan of pg_class before checking if a
relation had attributes dependent on aclitem as data typewhen building
the set of ALTER TABLE queries, but it would be costly on a regression
database.
While on it, make the query style more consistent with the rest.
Reported-by: Justin Pryzby
Discussion: https://postgr.es/m/20221223032724.GQ1153@telsasoft.com
While at it, use a common subroutine already.
This doesn't move the needle very far in terms of making these
functions non-throwing; the only case we're now able to trap is
numeric-OID-is-out-of-range. Still, it seems like a pretty
non-controversial step in that direction.
ed1a88dda added support functions for the ntile(), percent_rank() and
cume_dist() window functions but neglected to actually add these support
functions to the pg_proc entry for the corresponding window function.
Also, take this opportunity to add these window functions to one of the
regression tests added in ed1a88dda to give the support functions a little
bit of exercise. If I'd done that in the first place then the omission
would have been more obvious.
Bump the catversion, again.
The test added in commit e44dae07f9 has a thinko: it wants to read
info about a few WAL records, but it obtains the LSN of the final record
to read by asking for the WAL insert position; however,
pg_get_wal_records_info only accepts to read up to the flush position
(cf. IsFutureLSN()). In normal conditions there is no difference, since
the last record written by the preceding loop is known flushed and it's
the one the test wants; but it's possible to have some other process
insert another WAL record that isn't flushed, and that causes the whole
test to explode.
Fix by having pg_get_wal_records_info() read only up to the flushed
position. Backpatch to 15, which is where pg_walinspect appeared.
Author: Karina Litskevich <litskevichkarina@gmail.com>
Discussion: https://postgr.es/m/a5559c95-52c3-5eea-cd63-9b4f1c70ff96@gmail.com
Fix incorrect code which was trying to convert a Bitmapset of columns at
the attnums according to a parent table and transform them into the
equivalent Bitmapset with same attnums according to the given child table.
This code is new as of a61b1f748 and was failing to do the correct
translation when there was an intermediate parent table between 'rel' and
'top_parent_rel'.
Reported-by: Ranier Vilela
Author: Richard Guo, Amit Langote
Discussion: https://postgr.es/m/CAEudQArohfB_Gy%2BhcH2-bANUkxgjJiP%3DABq01_LgTNTbcNijag%40mail.gmail.com
An epoll fd belonging to the parent should be closed in the child. A
kqueue fd is automatically closed by fork(), but we should still adjust
our counter. For poll and Windows systems, nothing special is required.
On all systems we free the memory.
No caller yet, but we'll need this if we start using WaitEventSet in the
postmaster as planned.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
At the time of commit 6a2a70a02 we didn't use latch infrastructure in
the postmaster. We're planning to start doing that, so we'd better make
sure that the signalfd inherited from a postmaster is not duplicated and
then leaked in the child.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
To be able to handle incoming connections on a server socket with
the WaitEventSet API, we'll need a new kind of event to indicate that
the the socket is ready to accept a connection.
On Unix, it's just the same as WL_SOCKET_READABLE, but on Windows there
is a different underlying kernel event that we need to map our
abstraction to.
No user yet, but a proposed patch would use this.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
The regression test suite includes a table called "tab_core_types" that
has one attribute based on the type "aclitem". Keeping this attribute
as-is causes hard failures when running pg_upgrade with an origin on
~15. This commit updates upgrade_adapt.sql to automatically detect the
tables with such attributes and switch them to text so as pg_upgrade
is able to go through its run.
This does not provide the same detection coverage as pg_upgrade, where
we are able to find out aclitems used in arrays, domains or even
composite types, but this is (I guess) enough for most things like an
instance that had installcheck run on before the upgrade with a dump
generated from it.
Note that the buildfarm code has taken the simplest approach of just
dropping "tab_core_types", so what we have here is more modular.
Author: Anton A. Melnikov
Discussion: https://postgr.es/m/49f389ba-95ce-8a9b-09ae-f60650c0e7c7@inbox.ru
The query used to disable WITH OIDS in all the relations making use of
it was checking for materialized views, but this is not a supported
operation. On the contrary, this needs to be done on foreign tables.
While on it, use quote_ident() in the ALTER TABLE strings built on the
relation name.
Author: Anton A. Melnikov, Michael Paquier
Discussion: https://postgr.es/m/49f389ba-95ce-8a9b-09ae-f60650c0e7c7@inbox.ru
Backpatch-through: 12
Three error strings used with cache lookup failures were referring to
incorrect object types for ACL checks:
- Schemas
- Types
- Foreign Servers
There errors should never be triggered, but if they do incorrect
information would be reported.
Author: Justin Pryzby
Discussion: https://postgr.es/m/20221222153041.GN1153@telsasoft.com
Backpatch-through: 11
The former name was discussed as being confusing, so use "split", as per
a suggestion from Magnus Hagander.
While on it, one of the output arguments is renamed from "segno" to
"segment_number", as per a suggestion from Kyotaro Horiguchi.
The documentation is updated to reflect all these changes.
Bump catalog version.
Author: Bharath Rupireddy, Michael Paquier
Discussion: https://postgr.es/m/CABUevEytQVaOOhGdoh0D7hGwe3fuKcRF6NthsSW7ww04EmtFgQ@mail.gmail.com
WindowFuncs such as row_number() don't care if it's called with ROWS
UNBOUNDED PRECEDING AND CURRENT ROW or with RANGE UNBOUNDED PRECEDING AND
CURRENT ROW. The latter is less efficient as the RANGE option requires
that the executor check for peer rows, so using the ROW option instead
would cause less overhead. Because RANGE is part of the default frame
options for WindowClauses, it means WindowAgg is, by default, working much
harder than it needs to for window functions where the ROWS / RANGE option
has no effect on the window function's result.
On a test query from the discussion thread, a performance improvement of
344% was seen by using ROWS instead of RANGE.
Here we add a new support function node type to allow support functions to
be called for window functions so that the most optimal version of the
frame options can be set. The planner has been adjusted so that the frame
options are changed only if all window functions sharing the same window
clause agree on what the optimized frame options are.
Here we give the ability for row_number(), rank(), dense_rank(),
percent_rank(), cume_dist() and ntile() to alter their WindowClause's
frameOptions.
Reviewed-by: Vik Fearing, Erwin Brandstetter, Zhihong Yu
Discussion: https://postgr.es/m/CAGHENJ7LBBszxS+SkWWFVnBmOT2oVsBhDMB1DFrgerCeYa_DyA@mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvohAKEtTXxq7Pc-ic2dKT8oZfbRKeEJP64M0B6+S88z+A@mail.gmail.com
Use C99 designated initializer syntax for the array elements, instead of
writing the enumerator name and position in a comment. Replace nkeys
and key with a local variadic macro, for a shorter notation.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Discussion: https://postgr.es/m/CA%2BhUKGKdpDjKL2jgC-GpoL4DGZU1YPqnOFHbDqFkfRQcPaR5DQ%40mail.gmail.com
Perform a failsafe check every time VACUUM's first heap scan scans a
further FAILSAFE_EVERY_PAGES pages, rather than using an approach based
on the number of physical blocks that our current blkno is from the
blkno at the time of the previous failsafe check. That way VACUUM will
perform a failsafe check every time it has scanned a uniform number of
pages, without it mattering when or how VACUUM skipped pages using the
visibility map.
Sami Imseih, with changes to FAILSAFE_EVERY_PAGES comments added by me.
Author: Sami Imseih <simseih@amazon.com>
Reviewed-By: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/401CE010-4049-4B94-9961-0B610A5D254D%40amazon.com
Use a dedicated struct for the XID/MXID cutoffs used by VACUUM, such as
FreezeLimit and OldestXmin. This state is initialized in vacuum.c, and
then passed around by code from vacuumlazy.c to heapam.c freezing
related routines. The new convention is that everybody works off of the
same cutoff state, which is passed around via pointers to const.
Also simplify some of the logic for dealing with frozen xmin in
heap_prepare_freeze_tuple: add dedicated "xmin_already_frozen" state to
clearly distinguish xmin XIDs that we're going to freeze from those that
were already frozen from before. That way the routine's xmin handling
code is symmetrical with the existing xmax handling code. This is
preparation for an upcoming commit that will add page level freezing.
Also refactor the control flow within FreezeMultiXactId(), while adding
stricter sanity checks. We now test OldestXmin directly, instead of
using FreezeLimit as an inexact proxy for OldestXmin. This is further
preparation for the page level freezing work, which will make the
function's caller cede control of page level freezing to the function
where appropriate (where heap_prepare_freeze_tuple sees a tuple that
happens to contain a MultiXactId in its xmax).
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/CAH2-WznS9TxXmz2_=SY+SyJyDFbiOftKofM9=aDo68BbXNBUMA@mail.gmail.com
perform_pullup_replace_vars() knows how to scan the whole parent
query tree when we are replacing Vars during a subquery flattening
operation. However, for the specific case of flattening a
UNION ALL leaf query, that's mostly wasted work: the only place
where relevant Vars could exist is in the AppendRelInfo that we
just made for this leaf. Teaching perform_pullup_replace_vars()
to just deal with that and exit is worthwhile because, if we have
N such subqueries to pull up, we were spending O(N^2) work uselessly
mutating the AppendRelInfos for all the other subqueries.
While we're at it, avoid calling substitute_phv_relids if there are no
PlaceHolderVars, and remove an obsolete check of parse->hasSubLinks.
Andrey Lepikhov and Tom Lane
Discussion: https://postgr.es/m/703c09a2-08f3-d2ec-b33d-dbecd62428b8@postgrespro.ru
Andrey Lepikhov demonstrated a case where we spend an unreasonable
amount of time in pull_up_subqueries(). Not only is that recursing
with no explicit check for stack overrun, but the code seems not
interruptable by control-C. Let's stick a CHECK_FOR_INTERRUPTS
there, along with sprinkling some stack depth checks.
An actual fix for the excessive time consumption seems a bit
risky to back-patch; but this isn't, so let's do so.
Discussion: https://postgr.es/m/703c09a2-08f3-d2ec-b33d-dbecd62428b8@postgrespro.ru
The previous coding of VA_ARGS_NARGS() always returned 1 on Visual
Studio, because it treats __VA_ARGS__ as a single token unless you jump
through extra hoops. Newer compilers have an option to fix that. Add a
comment about that so that we can remember to clean this up in the
future when our minimum MSVC version advances.
Author: Victor Spirin <v.spirin@postgrespro.ru>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/f450fc57-a147-19d0-e50c-33571c52cc13%40postgrespro.ru
A bitwise operator was getting used on two bools in
ATAddCheckConstraint() to track if constraints should be merged or not
with the existing ones of a relation, though obviously this should use
a boolean OR operator. This led to the same result, but let's be
clean.
Oversight in 074c5cf.
Author: Ranier Vilela
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/CAEudQAp2R2fbbi0OHHhv_n4=Ch0t1VtjObR9YMqtGKHJ+faUFQ@mail.gmail.com
This introduces palloc_aligned() and MemoryContextAllocAligned() which
allow callers to obtain memory which is allocated to the given size and
also aligned to the specified alignment boundary. The alignment
boundaries may be any power-of-2 value. Currently, the alignment is
capped at 2^26, however, we don't expect values anything like that large.
The primary expected use case is to align allocations to perhaps CPU
cache line size or to maybe I/O page size. Certain use cases can benefit
from having aligned memory by either having better performance or more
predictable performance.
The alignment is achieved by requesting 'alignto' additional bytes from
the underlying allocator function and then aligning the address that is
returned to the requested alignment. This obviously does waste some
memory, so alignments should be kept as small as what is required.
It's also important to note that these alignment bytes eat into the
maximum allocation size. So something like:
palloc_aligned(MaxAllocSize, 64, 0);
will not work as we cannot request MaxAllocSize + 64 bytes.
Additionally, because we're just requesting the requested size plus the
alignment requirements from the given MemoryContext, if that context is
the Slab allocator, then since slab can only provide chunks of the size
that's specified when the slab context is created, then this is not going
to work. Slab will generate an error to indicate that the requested size
is not supported.
The alignment that is requested in palloc_aligned() is stored along with
the allocated memory. This allows the alignment to remain intact through
repalloc() calls.
Author: Andres Freund, David Rowley
Reviewed-by: Maxim Orlov, Andres Freund, John Naylor
Discussion: https://postgr.es/m/CAApHDvpxLPUMV1mhxs6g7GNwCP6Cs6hfnYQL5ffJQTuFAuxt8A%40mail.gmail.com
This is the guts of float4in, callable as a routine to input floats,
which will be useful in an upcoming patch for allowing soft errors in
the seg module's input function.
A similar operation was performed some years ago for float8in in
commit 50861cd683.
Reviewed by Tom Lane
Discussion: https://postgr.es/m/cee4e426-d014-c0b7-aa22-a659f2cd9130@dunslane.net
d21ded75f changed the way slab.c works but introduced a bug that meant we
could end up with the slab's curBlocklistIndex pointing to the wrong list.
The condition which was checking for this was failing to account for two
things:
1. The curBlocklistIndex could be 0 as we've currently got no non-full
blocks to put chunks on. In this case, the dlist_is_empty() check cannot
be performed as there can be any number of completely full blocks at that
index.
2. The curBlocklistIndex may be greater than the index we just moved the
block onto. Since we need to ensure we fill up fuller blocks first, we
must reset curBlocklistIndex when changing any blocklist element that's
less than the curBlocklistIndex too.
Reported-by: Takamichi Osumi
Discussion: https://postgr.es/m/TYCPR01MB8373329C6329768D7E093D68EDEB9@TYCPR01MB8373.jpnprd01.prod.outlook.com
This commit changes some of the bbstreamer files and pg_dump to use the
same style as a few other places (like common/compression.c), where the
name of the compression method is not part of the string, but an
argument of it. This reduces a bit the translation work with less
string patterns.
Discussion: https://postgr.es/m/Y5/5tdK+4n3clvtU@paquier.xyz
This shaves some code by replacing the combinations of
CreateTemplateTupleDesc()/TupleDescInitEntry() hardcoding a mapping of
the attributes listed in pg_proc.dat by get_call_result_type() to build
the TupleDesc needed for the rows generated.
get_call_result_type() is more expensive than the former style, but this
removes some duplication with the lists of OUT parameters (pg_proc.dat
and the attributes hardcoded in these code paths). This is applied to
functions that are not considered as critical (aka that could be called
repeatedly for monitoring purposes).
Author: Bharath Rupireddy
Reviewed-by: Robert Haas, Álvaro Herrera, Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACV23HW5HP5hFjd89FNS-z5X8r2jNXdMXcpN2BgTtKd87w@mail.gmail.com
Commit 927f453a9 disallowed batching added by commit b663a4136 to be
used for the inserts performed as part of cross-partition updates of
partitioned tables, mainly because the previous code in
nodeModifyTable.c couldn't handle pending inserts into foreign-table
partitions that are also UPDATE target partitions. But we don't have
such a limitation anymore (cf. commit ffbb7e65a), so let's allow for
this by removing from execPartition.c the restriction added by commit
927f453a9 that batching is only allowed if the query command type is
CMD_INSERT.
In postgres_fdw, since commit 86dc90056 changed it to effectively
disable cross-partition updates in the case where a foreign-table
partition chosen to insert rows into is also an UPDATE target partition,
allow batching in the case where a foreign-table partition chosen to
do so is *not* also an UPDATE target partition. This is enabled by the
"batch_size" option added by commit b663a4136, which is disabled by
default.
This patch also adjusts the test case added by commit 927f453a9 to
confirm that the inserts performed as part of a cross-partition update
of a partitioned table indeed uses batching.
Amit Langote, reviewed and/or tested by Georgios Kokolatos, Zhihong Yu,
Bharath Rupireddy, Hou Zhijie, Vignesh C, and me.
Discussion: http://postgr.es/m/CA%2BHiwqH1Lz1yJmPs%3DaD-pzd_HLLynLHvq5iYeT9mB0bBV7oJ6w%40mail.gmail.com
1349d279 added query planner support to allow more efficient execution of
aggregate functions which have an ORDER BY or a DISTINCT clause. Prior to
that commit, the planner would only request that the lower planner produce
a plan with the order required for the GROUP BY clause and it would be
left up to nodeAgg.c to perform the final sort of records within each
group so that the aggregate transition functions were called in the
correct order. Now that the planner requests the lower planner produce a
plan with the GROUP BY and the ORDER BY / DISTINCT aggregates in mind,
there is the possibility that the planner chooses a plan which could be
less efficient than what would have been produced before 1349d279.
While developing 1349d279, I had in mind that Incremental Sort would help
us in cases where an index exists only on the GROUP BY column(s).
Incremental Sort would just replace the implicit tuplesorts which are
being performed in nodeAgg.c. However, because the planner has the
flexibility to instead choose a plan which just performs a full sort on
both the GROUP BY and ORDER BY / DISTINCT aggregate columns, there is
potential for the planner to make a bad choice. The costing for
Incremental Sort is not perfect as it assumes an even distribution of rows
to sort within each sort group.
Here we add an escape hatch in the form of the enable_presorted_aggregate
GUC. This will allow users to get the pre-PG16 behavior in cases where
they have no other means to convince the query planner to produce a plan
which only sorts on the GROUP BY column(s).
Discussion: https://postgr.es/m/CAApHDvr1Sm+g9hbv4REOVuvQKeDWXcKUAhmbK5K+dfun0s9CvA@mail.gmail.com
Slab has traditionally been fairly slow when compared with the AllocSet or
Generation memory allocators. Part of this slowness came from having to
write out an entire block when we allocate a new block in order to
populate the free list indexes within the block's memory. Additional
slowness came from having to move a block onto another dlist each time we
palloc or pfree a chunk from it.
Here we optimize both of those cases and do a little bit extra to improve
the performance of the slab allocator.
Here, instead of writing out the free list indexes when allocating a new
block, we introduce the concept of "unused" chunks. When a block is first
allocated all chunks are unused. These chunks only make it onto the
free list when they are pfree'd. When allocating new chunks on an
existing block, we have the choice of consuming a chunk from the free list
or an unused chunk. When both exist, we opt to use one from the free
list, as these have been used already and the memory of them is more
likely to be cached by the CPU.
Here we also reduce the number of block lists from there being one for
every possible value of free chunks on a block to just having a small
fixed number of block lists. We keep the 0th block list for completely
full blocks and anything else stores blocks for some range of free chunks
with fuller blocks appearing on lower block list array elements. This
reduces how often we must move a block to another list when we allocate or
free chunks, but still allows us to prefer to put new chunks on fuller
blocks and perhaps allow blocks with fewer chunks to be free'd later
once all their remaining chunks have been pfree'd.
Additionally, we now store a list of "emptyblocks", which are blocks that
no longer contain any allocated chunks. We now keep up to 10 of these
around to avoid having to thrash malloc/free when allocation patterns
continually cause blocks to become free of any allocated chunks only to
allocate more chunks again. Now only once we have 10 of these, we free
the block. This does raise the high water mark for the total memory that
a slab context can consume. It does not seem entirely unreasonable that
we might one day want to make this a property of SlabContext rather than a
compile-time constant. Let's wait and see if there is any evidence to
support that this is required before doing it.
Author: Andres Freund, David Rowley
Tested-by: Tomas Vondra, John Naylor
Discussion: https://postgr.es/m/20210717194333.mr5io3zup3kxahfm@alap3.anarazel.de
This function takes in input a WAL segment name and returns a tuple made
of the segment sequence number (dependent on the WAL segment size of the
cluster) and its timeline, as of a thin SQL wrapper around the existing
XLogFromFileName().
This function has multiple usages, like being able to compile a LSN from
a file name and an offset, or finding the timeline of a segment without
having to do to some maths based on the first eight characters of the
segment.
Bump catalog version.
Author: Bharath Rupireddy
Reviewed-by: Nathan Bossart, Kyotaro Horiguchi, Maxim Orlov, Michael
Paquier
Discussion: https://postgr.es/m/CALj2ACWV=FCddsxcGbVOA=cvPyMr75YCFbSQT6g4KDj=gcJK4g@mail.gmail.com
SCRAM_KEY_LEN was a variable used in the internal routines of SCRAM to
size a set of fixed-sized arrays used in the SHA and HMAC computations
during the SASL exchange or when building a SCRAM password. This had a
hard dependency on SHA-256, reducing the flexibility of SCRAM when it
comes to the addition of more hash methods. A second issue was that
SHA-256 is assumed as the cryptohash method to use all the time.
This commit renames SCRAM_KEY_LEN to a more generic SCRAM_KEY_MAX_LEN,
which is used as the size of the buffers used by the internal routines
of SCRAM. This is aimed at tracking centrally the maximum size
necessary for all the hash methods supported by SCRAM. A global
variable has the advantage of keeping the code in its simplest form,
reducing the need of more alloc/free logic for all the buffers used in
the hash calculations.
A second change is that the key length (SHA digest length) and hash
types are now tracked by the state data in the backend and the frontend,
the common portions being extended to handle these as arguments by the
internal routines of SCRAM. There are a few RFC proposals floating
around to extend the SCRAM protocol, including some to use stronger
cryptohash algorithms, so this lifts some of the existing restrictions
in the code.
The code in charge of parsing and building SCRAM secrets is extended to
rely on the key length and on the cryptohash type used for the exchange,
assuming currently that only SHA-256 is supported for the moment. Note
that the mock authentication simply enforces SHA-256.
Author: Michael Paquier
Reviewed-by: Peter Eisentraut, Jonathan Katz
Discussion: https://postgr.es/m/Y5k3Qiweo/1g9CG6@paquier.xyz
A new function pg_stat_get_backend_subxact() can be used to get
information about the number of subtransactions in the cache of
a particular backend and whether that cache has overflowed. This
can be useful for tracking down performance problems that can
result from overflowed snapshots.
Dilip Kumar, reviewed by Zhihong Yu, Nikolay Samokhvalov,
Justin Pryzby, Nathan Bossart, Ashutosh Sharma, Julien
Rouhaud. Additional design comments from Andres Freund,
Tom Lane, Bruce Momjian, and David G. Johnston.
Discussion: http://postgr.es/m/CAFiTN-ut0uwkRJDQJeDPXpVyTWD46m3gt3JDToE02hTfONEN=Q@mail.gmail.com
While fooling with my pet outer-join-variables patch, I discovered
that the test case I added in commit 11086f2f2 no longer demonstrates
what it's supposed to. The idea is to tempt the planner to reverse
the order of the two outer joins, which would leave noplace to
correctly evaluate the WHERE clause that's inserted between them.
Before the addition of the delay_upper_joins mechanism, it would
have taken the bait.
However, subsequent improvements broke the test in two different ways.
First, we now recognize the IS NULL coding pattern as an antijoin, and
we won't re-order antijoins; even if we did, the IS NULL test clauses
get removed so there would be no opportunity for them to misbehave.
Second, the planner now discovers that nested parameterized indexscans
are a lot cheaper than the double hash join it used back in the day,
and that approach doesn't want to re-order the joins anyway. Thus,
in HEAD the test passes even if one dikes out delay_upper_joins.
To fix, change the IS NULL tests to COALESCE clauses, which produce
the same results but the planner isn't smart enough to convert them
to antijoins. It'll still go for parameterized indexscans though,
so drop the index enabling that (don't know why I added that in the
first place), and disable nestloop joining just to be sure.
This time around, add an EXPLAIN to make the choice of plan visible.
Such references failed with "cache lookup failed for type 0"
because we didn't resolve the type of the CYCLE column until after
analyzing the CTE's query. We can just move that processing
to before the recursive parse_sub_analyze call, though.
While here, invent a couple of local variables to make this
code less egregiously wider-than-80-columns.
Per bug #17723 from Vik Fearing. Back-patch to v14 where
the CYCLE feature was added.
Discussion: https://postgr.es/m/17723-2c4985ff111e7bba@postgresql.org
The environment variable PG_TEST_PG_UPGRADE_MODE can be set to
override the default transfer mode for the pg_upgrade tests.
(Automatically running the pg_upgrade tests for all supported modes
would be too slow.)
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/50a97009-8ff9-ca4d-a0f6-6086a6775a5b%40enterprisedb.com
This option selects the default transfer mode. Having an explicit
option is handy to make scripts and tests more explicit. It also
makes it easier to talk about a "copy" mode rather than "the default
mode" or something like that, since until now the default mode didn't
have an externally visible name.
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/50a97009-8ff9-ca4d-a0f6-6086a6775a5b%40enterprisedb.com
This ancient bit of code was summarily trapping any ereport longjmp
whatsoever and assuming that it must represent an invalid-XML report.
It's not really appropriate to handle OOM-like situations that way:
maybe the input is valid or maybe not, but we couldn't find out.
And it'd be a seriously bad idea to ignore, say, a query cancel
error that way. (Perhaps that can't happen because there is no
CHECK_FOR_INTERRUPTS anywhere within xml_parse, but even if that's
true today it's obviously a very fragile assumption.)
But in the wake of the previous commit, we can drop the PG_TRY
here altogether, and use the soft error mechanism to catch only
the kinds of errors that are legitimate to treat as invalid-XML.
(This is our first use of the soft error mechanism for something
not directly related to a datatype input function. It won't be
the last.)
xml_is_document can be converted in the same way. That one is
not actively broken, because it was checking specifically for
ERRCODE_INVALID_XML_DOCUMENT rather than trapping everything;
but the code is still shorter and probably faster this way.
Discussion: https://postgr.es/m/3564577.1671142683@sss.pgh.pa.us
The key idea here is that xml_parse must distinguish hard errors
from soft errors. We want to throw a hard error for libxml
initialization failures: those might be out-of-memory, or something
else, but in any case they are not the fault of the input string.
If we get to the point of parsing the input, and something goes
wrong, we can fairly consider that to mean bad input.
One thing that arguably does mean bad input, but I didn't trouble
to handle softly, is encoding conversion failure while converting
the server encoding to UTF8. This might be something to improve
later, but it seems like a pretty low-probability scenario.
Discussion: https://postgr.es/m/3564577.1671142683@sss.pgh.pa.us
When incremental sorts were added in v13 a 1.5x pessimism factor was added
to the cost modal. Seemingly this was done because the cost modal only
has an estimate of the total number of input rows and the number of
presorted groups. It assumes that the input rows will be evenly
distributed throughout the presorted groups. The 1.5x pessimism factor
was added to slightly reduce the likelihood of incremental sorts being
used in the hope to avoid performance regressions where an incremental
sort plan was picked and turned out slower due to a large skew in the
number of rows in the presorted groups.
An additional quirk with the path generation code meant that we could
consider both a sort and an incremental sort on paths with presorted keys.
This meant that with the pessimism factor, it was possible that we opted
to perform a sort rather than an incremental sort when the given path had
presorted keys.
Here we remove the 1.5x pessimism factor to allow incremental sorts to
have a fairer chance at being chosen against a full sort.
Previously we would generally create a sort path on the cheapest input
path (if that wasn't sorted already) and incremental sort paths on any
path which had presorted keys. This meant that if the cheapest input path
wasn't completely sorted but happened to have presorted keys, we would
create a full sort path *and* an incremental sort path on that input path.
Here we change this logic so that if there are presorted keys, we only
create an incremental sort path, and create sort paths only when a full
sort is required.
Both the removal of the cost pessimism factor and the changes made to the
path generation make it more likely that incremental sorts will now be
chosen. That, of course, as with teaching the planner any new tricks,
means an increased likelihood that the planner will perform an incremental
sort when it's not the best method. Our standard escape hatch for these
cases is an enable_* GUC. enable_incremental_sort already exists for
this.
This came out of a report by Pavel Luzanov where he mentioned that the
master branch was choosing to perform a Seq Scan -> Sort -> Group
Aggregate for his query with an ORDER BY aggregate function. The v15 plan
for his query performed an Index Scan -> Group Aggregate, of course, the
aggregate performed the final sort internally in nodeAgg.c for the
aggregate's ORDER BY. The ideal plan would have been to use the index,
which provided partially sorted input then use an incremental sort to
provide the aggregate with the sorted input. This was not being chosen
due to the pessimism in the incremental sort cost modal, so here we remove
that and rationalize the path generation so that sort and incremental sort
plans don't have to needlessly compete. We assume that it's senseless
to ever use a full sort on a given input path where an incremental sort
can be performed.
Reported-by: Pavel Luzanov
Reviewed-by: Richard Guo
Discussion: https://postgr.es/m/9f61ddbf-2989-1536-b31e-6459370a6baa%40postgrespro.ru
It seems that drop-index-concurrently-1 has started to forget what it was
originally meant to be testing. d2d8a229b, which added incremental sorts
changed the expected plan to be an Index Scan plan instead of a Seq Scan
plan. This occurred as the primary key index of the table in question
provided presorted input and, because that index happened to be the
cheapest input path due to enable_seqscan being disabled, the incremental
sort changes just added a Sort on top of that. It seems based on the name
of the PREPAREd statement that the intention here is that the query
produces a seqscan plan.
The reason this test has become broken seems to be due to how the test was
originally coded. The test was trying to force a seqscan plan by
performing some casting to make it so the test_dc index couldn't be used
to perform the required filtering. Trying to coax the planner into using
a plan which has costed in a disable_cost seems like it's always going to
be flakey as small changes in costs are drowned out by the large
disable_cost combined with add_path's STD_FUZZ_FACTOR. Here we get rid of
the casts that we're using to try to trick the planner into a seqscan and
instead toggle enable_seqscan as and when required to get the desired
plan.
Additionally, rename a few things in the test and add some additional
wording to the comments to try and make it more clear in the future what
we expect this test to be doing.
Discussion: https://postgr.es/m/CAApHDvrbDhObhLV+=U_K_-t+2Av2av1aL9d+2j_3AO-XndaviA@mail.gmail.com
Backpatch-through: 13, where d2d8a229b changed the expected test output
The building of command completion tags could often be seen showing up in
profiles when running high tps workloads.
The query completion tags were being built with snprintf, which is slow at
the best of times when compared with more manual ways of formatting
strings. Here we introduce BuildQueryCompletionString() to do this job
for us. We also now store the completion tag's strlen in the
CommandTagBehavior struct so that we can quickly memcpy this number of
bytes into the completion tag string. Appending the rows affected is done
via pg_ulltoa_n. BuildQueryCompletionString returns the length of the
built string. This saves us having to call strlen to figure out how many
bytes to pass to pq_putmessage().
Author: David Rowley, Andres Freund
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/CAHoyFK-Xwqc-iY52shj0G+8K9FJpse+FuZ36XBKy78wDVnd=Qg@mail.gmail.com
This is mostly straightforward, except that if the range type
has a canonical function, that might throw an error during range
input. (Such errors probably only occur for edge cases: in the
in-core canonical functions, it happens only if a bound has the
maximum valid value for the underlying type.) Hence, this patch
extends the soft-error regime to allow canonical functions to
return errors softly as well. Extensions implementing range
canonical functions will need modification anyway because of the
API change for range_serialize(); while at it, they might want
to do something similar to what's been done here in the in-core
canonical functions.
Discussion: https://postgr.es/m/3284599.1671075185@sss.pgh.pa.us
Because we added StaticAssertStmt() first before StaticAssertDecl(),
some uses as well as the instructions in c.h are now a bit backwards
from the "native" way static assertions are meant to be used in C.
This updates the guidance and moves some static assertions to better
places.
Specifically, since the addition of StaticAssertDecl(), we can put
static assertions at the file level. This moves a number of static
assertions out of function bodies, where they might have been stuck
out of necessity, to perhaps better places at the file level or in
header files.
Also, when the static assertion appears in a position where a
declaration is allowed, then using StaticAssertDecl() is more native
than StaticAssertStmt().
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/941a04e7-dd6f-c0e4-8cdf-a33b3338cbda%40enterprisedb.com
35f059e9bd put the provariadic sanity
check into type_sanity.sql, even though it's not about types, and
moreover in the middle of some connected test group, which makes it
all very confusing. Move it to opr_sanity.sql, where it is in better
company.
Convert assorted internal-ish datatypes, namely aclitemin,
int2vectorin, oidin, oidvectorin, pg_lsn_in, pg_snapshot_in,
and tidin to the new style.
(Some others you might expect to find in this group, such as
cidin and xidin, need no changes because they never throw
errors at all. That seems a little cheesy ... but it is not in
the charter of this patch series to add new error conditions.)
Amul Sul, minor mods by me
Discussion: https://postgr.es/m/CAAJ_b97KeDWUdpTKGOaFYPv0OicjOu6EW+QYWj-Ywrgj_aEy1g@mail.gmail.com
Convert box_in, circle_in, line_in, lseg_in, path_in, point_in,
and poly_in to the new style.
line_in still throws hard errors for overflows/underflows that can occur
when the input is specified as two points rather than in the canonical
"Ax + By + C = 0" style. I'm not too concerned about that: it won't be
reached in normal dump/restore cases, and it's fairly debatable that
such conversion should ever have been made part of a type input function
in the first place. But in any case, I don't want to extend the soft
error conventions into float.h without more discussion than this patch
has had.
Amul Sul, minor mods by me
Discussion: https://postgr.es/m/CAAJ_b97KeDWUdpTKGOaFYPv0OicjOu6EW+QYWj-Ywrgj_aEy1g@mail.gmail.com
Add support for hexadecimal, octal, and binary integer literals:
0x42F
0o273
0b100101
per SQL:202x draft.
This adds support in the lexer as well as in the integer type input
functions.
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
This referred to the size of the buffers for k_ipad and k_opad in HMAC
computations. This is unused since e6bdfd9, where SCRAM has switched to
the cryptohash routines for its HMAC calculations rather than its own
maths.
Reviewed-by: Jacob Champion
Discussion: https://postgr.es/m/Y5gGMjXhyp0oK0mH@paquier.xyz
Commits f92944137 et al. made IsInTransactionBlock() set the
XACT_FLAGS_NEEDIMMEDIATECOMMIT flag before returning "false",
on the grounds that that kept its API promises equivalent to those of
PreventInTransactionBlock(). This turns out to be a bad idea though,
because it allows an ANALYZE in a pipelined series of commands to
cause an immediate commit, which is unexpected.
Furthermore, if we return "false" then we have another issue,
which is that ANALYZE will decide it's allowed to do internal
commit-and-start-transaction sequences, thus possibly unexpectedly
committing the effects of previous commands in the pipeline.
To fix the latter situation, invent another transaction state flag
XACT_FLAGS_PIPELINING, which explicitly records the fact that we
have executed some extended-protocol command and not yet seen a
commit for it. Then, require that flag to not be set before allowing
InTransactionBlock() to return "false".
Having done that, we can remove its setting of NEEDIMMEDIATECOMMIT
without fear of causing problems. This means that the API guarantees
of IsInTransactionBlock now diverge from PreventInTransactionBlock,
which is mildly annoying, but it seems OK given the very limited usage
of IsInTransactionBlock. (In any case, a caller preferring the old
behavior could always set NEEDIMMEDIATECOMMIT for itself.)
For consistency also require XACT_FLAGS_PIPELINING to not be set
in PreventInTransactionBlock. This too is meant to prevent commands
such as CREATE DATABASE from silently committing previous commands
in a pipeline.
Per report from Peter Eisentraut. As before, back-patch to all
supported branches (which sadly no longer includes v10).
Discussion: https://postgr.es/m/65a899dd-aebc-f667-1d0a-abb89ff3abf8@enterprisedb.com
Instead of half a dozen of mostly-duplicate ExecGrant_Foo() functions,
write one common function ExecGrant_generic() that can handle most of
them. We already have all the information we need, such as which
system catalog corresponds to which catalog table and which column is
the ACL column.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Antonin Houska <ah@cybertec.at>
Discussion: https://www.postgresql.org/message-id/flat/22c7e802-4e7d-8d87-8b71-cba95e6f4bcf%40enterprisedb.com
jsonb_get_element() was incautious enough to use VARDATA() and
VARSIZE() directly on an arbitrary text Datum. That of course
fails if the Datum is short-header, compressed, or out-of-line.
The typical result would be failing to match any element of a
jsonb object, though matching the wrong one seems possible as well.
setPathObject() was slightly brighter, in that it used VARDATA_ANY
and VARSIZE_ANY_EXHDR, but that only kept it out of trouble for
short-header Datums. push_path() had the same issue. This could
result in faulty subscripted insertions, though keys long enough to
cause a problem are likely rare in the wild.
Having seen these, I looked around for unsafe usages in the rest
of the adt/json* files. There are a couple of places where it's not
immediately obvious that the Datum can't be compressed or out-of-line,
so I added pg_detoast_datum_packed() to cope if it is. Also, remove
some other usages of VARDATA/VARSIZE on Datums we just extracted from
a text array. Those aren't actively broken, but they will become so
if we ever start allowing short-header array elements, which does not
seem like a terribly unreasonable thing to do. In any case they are
not great coding examples, and they could also do with comments
pointing out that we're assuming we don't need pg_detoast_datum_packed.
Per report from exe-dealer@yandex.ru. Patch by me, but thanks to
David Johnston for initial investigation. Back-patch to v14 where
jsonb subscripting was introduced.
Discussion: https://postgr.es/m/205321670615953@mail.yandex.ru
If sendFileWithContent were used to send a file larger than the
bbsink buffer size, this would result in corruption. The only
files that are sent via sendFileWithContent are the backup label
file, the tablespace map file, and .done files for WAL segments
included in the backup. Of these, it seems that only the
tablespace_map file can become large enough to cause a problem,
and then only if you have a lot of tablespaces. If you do have
that situation, you might end up with a corrupted
tablespace_map file, which would be bad.
My commit bef47ff85d introduced
this problem.
Report and patch by Antonin Houska.
Discussion: http://postgr.es/m/15764.1670528645@antos
During ALTER TABLE execution, when prep-time handling of subcommands of
certain types determine that execution-time handling requires recursion,
they signal this by changing the subcommand type to a special value.
This can be done in a simpler way by using a separate flag introduced by
commit ec0925c22a, so do that.
Catversion bumped. It's not clear to me that ALTER TABLE subcommands
are stored anywhere in catalogs (CREATE FUNCTION rejects it in BEGIN
ATOMIC function bodies), but we do have both write and read support for
them, so be safe.
Discussion: https://postgr.es/m/20220929090033.zxuaezcdwh2fgfjb@alvherre.pgsql
3d14e17 has added support for this query but psql was not able to
complete it. Spotted while working on a different patch in the same
area.
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/Y3hw7yvG0VwpC1jq@paquier.xyz
This is straightforward as far as it goes. However, it does not
attempt to trap errors occurring during the execution of domain
CHECK constraints. Since those are general user-defined
expressions, the only way to do that would involve starting up a
subtransaction for each check. Of course the entire point of
the soft-errors feature is to not need subtransactions, so that
would be self-defeating. For now, we'll rely on the assumption
that domain checks are written to avoid throwing errors.
Discussion: https://postgr.es/m/1181028.1670635727@sss.pgh.pa.us
This requires a bit of further infrastructure-extension to allow
trapping errors reported by numeric_in and pg_unicode_to_server,
but otherwise it's pretty straightforward.
In the case of jsonb_in, we are only capturing errors reported
during the initial "parse" phase. The value-construction phase
(JsonbValueToJsonb) can also throw errors if assorted implementation
limits are exceeded. We should improve that, but it seems like a
separable project.
Andrew Dunstan and Tom Lane
Discussion: https://postgr.es/m/3bac9841-fe07-713d-fa42-606c225567d6@dunslane.net
Formerly, semantic action functions for the JSON parser returned void,
so that there was no way for them to affect the parser's behavior.
That means in particular that they can't force an error exit except by
longjmp'ing. That won't do in the context of our project to make input
functions return errors softly. Hence, change them to return the same
JsonParseErrorType enum value as the parser itself uses. If an action
function returns anything besides JSON_SUCCESS, the parse is abandoned
and that error code is returned.
Action functions can thus easily return the same error conditions that
the parser already knows about. As an escape hatch for expansion, also
invent a code JSON_SEM_ACTION_FAILED that the core parser does not know
the exact meaning of. When returning this code, an action function
must use some out-of-band mechanism for reporting the error details.
This commit simply makes the API change and causes all the existing
action functions to return JSON_SUCCESS, so that there is no actual
change in behavior here. This is long enough and boring enough that
it seemed best to commit it separately from the changes that make
real use of the new mechanism.
In passing, remove a duplicate assignment of
transform_string_values_scalar.
Discussion: https://postgr.es/m/1436686.1670701118@sss.pgh.pa.us
We chose a specific wording of the not-implemented errors for
pseudotype I/O functions and other cases where there's little
value in implementing input and/or output. gtsvectorin never
got that memo though, nor did most of contrib. Make these all
fall in line, mostly because I'm a neatnik but also to remove
unnecessary translatable strings.
gbtreekey_in needs a bit of extra love since it supports
multiple SQL types. Sadly, gbtreekey_out doesn't have the
ability to do that, but I think it's unreachable anyway.
Noted while surveying datatype input functions to see what we
have left to fix.
Apparently anymultirange_in was written before we converted all
these pseudotype input functions to use a common macro, and it didn't
get fixed before committing. Sloppy merging probably explains its
unintuitive ordering, too, so rearrange.
Noted while surveying datatype input functions to see what we
have left to fix. I'm inclined to leave the pseudotypes as
throwing hard errors, because it's difficult to see a reason why
anyone would need something else. But in any case, if we want
to change that, we shouldn't have to change multiple copies of
the code.
9d9c02ccd added code to allow WindowAgg to take some shortcuts when a
monotonic WindowFunc reached some value that it could never come back
from due to the function's monotonic nature. That commit added a
runCondition field to WindowClause to store the condition which, when it
becomes false we can start taking shortcuts in nodeWindowAgg.c.
Here we fix an issue where subquery pullups didn't properly update the
runCondition to update the Vars to properly reference the new query level.
Here we also add a missing call to preprocess_expression() for the
WindowClause's runCondtion. The WindowFuncs in the targetlist will have
had this process done, so we must also do it for the WindowFuncs in the
runCondition so that they can be correctly found in the targetlist
during setrefs.c
Bug: #17709
Reported-by: Alexey Makhmutov
Author: Richard Guo, David Rowley
Discussion: https://postgr.es/m/17709-4f557160e3e8ee9a@postgresql.org
Backpatch-through: 15, where 9d9c02ccd was introduced
Buildfarm member wrasse has been complaining about empty declarations
as an effect of 8018ffb and 83a1a1b due to extra semicolons.
While on it, remove also the last backslash of the macros definitions,
causing more lines to be eaten in it than necessary, per comment from
Tom Lane.
Reported-by: Tom Lane, and buildfarm member wrasse
Author: Nathan Bossart, Michael Paquier
Discussion: https://postgr.es/m/1188769.1670640236@sss.pgh.pa.us
Replace the error trapping scheme introduced in 5bc450629 with our
shiny new errsave/ereturn mechanism. This doesn't have any real
functional impact (although I think that the new coding is able
to report a few more errors softly than v15 did). And I doubt
there's any measurable performance difference either. But this
gets rid of an ad-hoc, one-of-a-kind design in favor of a mechanism
that will be widely used going forward, so it should be a net win
for code readability.
Discussion: https://postgr.es/m/3bbbb0df-7382-bf87-9737-340ba096e034@postgrespro.ru
This patch converts the input functions for date, time, timetz,
timestamp, timestamptz, and interval to the new soft-error style.
There's some related stuff in formatting.c that remains to be
cleaned up, but that seems like a separable project.
Discussion: https://postgr.es/m/3bbbb0df-7382-bf87-9737-340ba096e034@postgrespro.ru
Pay down some ancient technical debt (dating to commit 022fd9966):
fix a couple of places in datetime parsing that were throwing
ereport's immediately instead of returning a DTERR code that could be
interpreted by DateTimeParseError. The reason for that was that there
was no mechanism for passing any auxiliary data (such as a zone name)
to DateTimeParseError, and these errors seemed to really need it.
Up to now it didn't matter that much just where the error got thrown,
but now we'd like to have a hard policy that datetime parse errors
get thrown from just the one place.
Hence, invent a "DateTimeErrorExtra" struct that can be used to
carry any extra values needed for specific DTERR codes. Perhaps
in the future somebody will be motivated to use this to improve
the specificity of other DateTimeParseError messages, but for now
just deal with the timezone-error cases.
This is on the way to making the datetime input functions report
parse errors softly; but it's really an independent change, so
commit separately.
Discussion: https://postgr.es/m/3bbbb0df-7382-bf87-9737-340ba096e034@postgrespro.ru
More could be done in this line, but I just grabbed some low-hanging
fruit. Principal objective was to remove the need for several ugly
unconstify() usages in formatting.c.
This patch converts the input functions for bool, int2, int4, int8,
float4, float8, numeric, and contrib/cube to the new soft-error style.
array_in and record_in are also converted. There's lots more to do,
but this is enough to provide proof-of-concept that the soft-error
API is usable, as well as reference examples for how to convert
input functions.
This patch is mostly by me, but it owes very substantial debt to
earlier work by Nikita Glukhov, Andrew Dunstan, and Amul Sul.
Thanks to Andres Freund for review.
Discussion: https://postgr.es/m/3bbbb0df-7382-bf87-9737-340ba096e034@postgrespro.ru
pg_input_is_valid() returns boolean, while pg_input_error_message()
returns the primary error message if the input is bad, or NULL
if the input is OK. The main reason for having two functions is
so that we can test both the details-wanted and the no-details-wanted
code paths.
Although these are primarily designed with testing in mind,
it could well be that they'll be useful to end users as well.
This patch is mostly by me, but it owes very substantial debt to
earlier work by Nikita Glukhov, Andrew Dunstan, and Amul Sul.
Thanks to Andres Freund for review.
Discussion: https://postgr.es/m/3bbbb0df-7382-bf87-9737-340ba096e034@postgrespro.ru
Postgres' standard mechanism for reporting errors (ereport() or elog())
is used for all sorts of error conditions. This means that throwing
an exception via ereport(ERROR) requires an expensive transaction or
subtransaction abort and cleanup, since the exception catcher dare not
make many assumptions about what has gone wrong. There are situations
where we would rather have a lighter-weight mechanism for dealing
with errors that are known to be safe to recover from without a full
transaction cleanup. This commit creates infrastructure to let us
adapt existing error-reporting code for that purpose. See the
included documentation changes for details. Follow-on commits will
provide test code and usage examples.
The near-term plan is to convert most if not all datatype input
functions to report invalid input "softly". This will enable
implementing some SQL/JSON features cleanly and without the cost
of subtransactions, and it will also allow creating COPY options
to deal with bad input without cancelling the whole COPY.
This patch is mostly by me, but it owes very substantial debt to
earlier work by Nikita Glukhov, Andrew Dunstan, and Amul Sul.
Thanks also to Andres Freund for review.
Discussion: https://postgr.es/m/3bbbb0df-7382-bf87-9737-340ba096e034@postgrespro.ru
The USER SET flag specifies that the variable should be set on behalf of an
ordinary role. That lets ordinary roles set placeholder variables, which
permission requirements are not known yet. Such a value wouldn't be used if
the variable finally appear to require superuser privileges.
The new flags are stored in the pg_db_role_setting.setuser array. Catversion
is bumped.
This commit is inspired by the previous work by Steve Chavez.
Discussion: https://postgr.es/m/CAPpHfdsLd6E--epnGqXENqLP6dLwuNZrPMcNYb3wJ87WR7UBOQ%40mail.gmail.com
Author: Alexander Korotkov, Steve Chavez
Reviewed-by: Pavel Borisov, Steve Chavez
Commit 7103ebb7aa added support for MERGE, which included support for
inheritance hierarchies, but didn't document the fact that ONLY could
be specified before the source and/or target tables to exclude tables
inheriting from the tables specified.
Update merge.sgml to mention this, and while at it, add some
regression tests to cover it.
Dean Rasheed, reviewed by Nathan Bossart.
Backpatch to 15, where MERGE was added.
Discussion: https://postgr.es/m/CAEZATCU0XM-bJCvpJuVRU3UYNRqEBS6g4-zH%3Dj9Ye0caX8F6uQ%40mail.gmail.com
Make the argument types of the File API match stdio better:
- Change the data buffer to void *, from char *.
- Change FileWrite() data buffer to const on top of that.
- Change amounts to size_t, from int.
In passing, change the FilePrefetch() amount argument from int to
off_t, to match the underlying posix_fadvise().
Discussion: https://www.postgresql.org/message-id/flat/11dda853-bb5b-59ba-a746-e168b1ce4bdb%40enterprisedb.com
In commit ffbb7e65a, I added a ModifyTableState member to ResultRelInfo
to save the owning ModifyTableState for use by nodeModifyTable.c when
performing batch inserts, but as pointed out by Tom Lane, that changed
the array stride of es_result_relations, and that would break any
previously-compiled extension code that accesses that array. Fix by
removing that member from ResultRelInfo and instead adding a List member
at the end of EState to save such ModifyTableStates.
Per report from Tom Lane. Back-patch to v14, like the previous commit;
I chose to apply the patch to HEAD as well, to make back-patching easy.
Discussion: http://postgr.es/m/4065383.1669395453%40sss.pgh.pa.us
After restart, we don't perform streaming of an in-progress transaction if
it was previously decoded and confirmed by the client. To achieve that we
were comparing the END location of the WAL record being decoded with the
WAL location we have already decoded and confirmed by the client. While
decoding the commit record, to decide whether to process and send the
complete transaction, we compare its START location with the WAL location
we have already decoded and confirmed by the client. Now, if we need to
queue some change in the transaction while decoding the commit record
(e.g. snapshot), it is possible that we decide to stream the transaction
but later commit processing decides to skip it. In such a case, we would
needlessly send the changes and later when we decide to skip it, we will
send stream abort.
We also sometimes decide to stream the changes when we actually just need
to process them locally like a change for invalidations. This will lead us
to send empty streams. To avoid this, while queuing each change for
decoding, we remember whether the transaction has any change that actually
needs to be sent downstream and use that information later to decide
whether to stream the transaction or not.
Note, we can't avoid all cases where we have to send empty streams like
the case where the plugin later decides that the change is not
publishable. However, we will no longer need to send stream_abort when we
skip sending a particular transaction.
Author: Dilip Kumar
Reviewed-by: Hou Zhijie, Ashutosh Bapat, Shi yu, Amit Kapila
Discussion: https://postgr.es/m/CAFiTN-tHK=7LzfrPs8fbT2ksrOJGQbzywcgXst2bM9-rJJAAUg@mail.gmail.com
To run all tests that support running against existing server:
$ meson test --setup running
To run just the main pg_regress tests against existing server:
$ meson test --setup running regress-running/regress
To ensure the 'running' setup continues to work, test it as part of the
freebsd CI task.
Discussion: https://postgr.es/m/CAH2-Wz=XDQcmLoo7RR_i6FKQdDmcyb9q5gStnfuuQXrOGhB2sQ@mail.gmail.com
Combine some duplicated code stanzas by creating small functions.
Most of these duplications arose at a time when I wouldn't have
trusted C compilers to auto-inline small functions intelligently,
but they're probably poor practice now. Similarly split out some
bits that aren't actually duplicative as the code stands, but would
become so after an upcoming patch to add another error-handling
code path.
Take the opportunity to add some lengthier comments about what
we're doing here, too. Re-order one function that seemed not
very well-placed.
Patch by me, per suggestions from Andres Freund.
Discussion: https://postgr.es/m/3bbbb0df-7382-bf87-9737-340ba096e034@postgrespro.ru
Generate a Makefile.global that's complete enough for PGXS to work for some
extensions. It is likely that this compatibility layer will not suffice for
every extension and not all platforms - we can expand it over time.
This allows extensions to use a single buildsystem across all the supported
postgres versions. Once all supported PG versions support meson, we can remove
the compatibility layer.
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Discussion: https://postgr.es/m/20221005200710.luvw5evhwf6clig6@awork3.anarazel.de
Previously export_dynamic was set in src/makefiles/Makefile.$port. For solaris
this required exporting with_gnu_ld. The determination of with_gnu_ld would be
nontrivial to copy for meson PGXS compatibility. It's also nice to delete
libtool.m4.
This uses -Wl,--export-dynamic on all platforms, previously all platforms but
FreeBSD used -Wl,-E. The likelihood of a name conflict seems lower with the
longer spelling.
Discussion: https://postgr.es/m/20221005200710.luvw5evhwf6clig6@awork3.anarazel.de
The same code pattern is repeated 21 times for int64 counters (0 for
missing entry) and 5 times for doubles (0 for missing entry) on database
entries. This code is switched to use macros for the basic code
instead, shaving a few hundred lines of originally-duplicated code
patterns. The function names remain the same, but some fields of
PgStat_StatDBEntry have to be renamed to cope with the new style.
This is in the same spirit as 83a1a1b.
Author: Michael Paquier
Reviewed-by: Nathan Bossart, Bertrand Drouvot
Discussion: https://postgr.es/m/Y46stlxQ2LQE20Na@paquier.xyz
There likely are further issues, but as evidenced by the CI task proposed by
Justin in the referenced thread, this suffices to build and run basic tests in
cygwin (some fixes for the test infrastructure are needed, but that's
independent of the meson aspect).
Author: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20221021034040.GT16921@telsasoft.com
Currently, information about the permissions to be checked on relations
mentioned in a query is stored in their range table entries. So the
executor must scan the entire range table looking for relations that
need to have permissions checked. This can make the permission checking
part of the executor initialization needlessly expensive when many
inheritance children are present in the range range. While the
permissions need not be checked on the individual child relations, the
executor still must visit every range table entry to filter them out.
This commit moves the permission checking information out of the range
table entries into a new plan node called RTEPermissionInfo. Every
top-level (inheritance "root") RTE_RELATION entry in the range table
gets one and a list of those is maintained alongside the range table.
This new list is initialized by the parser when initializing the range
table. The rewriter can add more entries to it as rules/views are
expanded. Finally, the planner combines the lists of the individual
subqueries into one flat list that is passed to the executor for
checking.
To make it quick to find the RTEPermissionInfo entry belonging to a
given relation, RangeTblEntry gets a new Index field 'perminfoindex'
that stores the corresponding RTEPermissionInfo's index in the query's
list of the latter.
ExecutorCheckPerms_hook has gained another List * argument; the
signature is now:
typedef bool (*ExecutorCheckPerms_hook_type) (List *rangeTable,
List *rtePermInfos,
bool ereport_on_violation);
The first argument is no longer used by any in-core uses of the hook,
but we leave it in place because there may be other implementations that
do. Implementations should likely scan the rtePermInfos list to
determine which operations to allow or deny.
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqGjJDmUhDSfv-U2qhKJjt9ST7Xh9JXC_irsAQ1TAUsJYg@mail.gmail.com
9d9c02ccd added window "run conditions", which allows the evaluation of
monotonic window functions to be skipped when the run condition is no
longer true. Prior to this commit, once the run condition was no longer
true and we stopped evaluating the window functions, we simply just left
the ecxt_aggvalues[] and ecxt_aggnulls[] arrays alone to store whatever
value was stored there the last time the window function was evaluated.
Leaving a stale value in there isn't really a problem on 64-bit builds as
all of the window functions which we recognize as monotonic all return
int8, which is passed by value on 64-bit builds. However, on 32-bit
builds, this was a problem as the value stored in the ecxt_values[]
element would be a by-ref value and it would be pointing to some memory
which would get reset once the tuple context is destroyed. Since the
WindowAgg node will output these values in the resulting tupleslot, this
could be problematic for the top-level WindowAgg node which must look at
these values to filter out the rows that don't meet its filter condition.
Here we fix this by just zeroing the ecxt_aggvalues[] and setting the
ecxt_aggnulls[] array to true when the run condition first becomes false.
This results in the WindowAgg's output having NULLs for the WindowFunc's
columns rather than the stale or pointer pointing to possibly freed
memory. These tuples with the NULLs can only make it as far as the
top-level WindowAgg node before they're filtered out. To ensure that
these tuples *are* always filtered out, we now insist that OpExprs making
up the run condition are strict OpExprs. Currently, all the window
functions which the planner recognizes as monotonic return INT8 and the
operator which is used for the run condition must be a member of a btree
opclass. In reality, these restrictions exclude nothing that's built-in
to Postgres and are unlikely to exclude anyone's custom operators due to
the requirement that the operator is part of a btree opclass. It would be
unusual if those were not strict.
Reported-by: Sergey Shinderuk, using valgrind
Reviewed-by: Richard Guo, Sergey Shinderuk
Discussion: https://postgr.es/m/29184c50-429a-ebd7-f1fb-0589c6723a35@postgrespro.ru
Backpatch-through: 15, where 9d9c02ccd was added
The same code pattern is repeated 17 times for int64 counters (0 for
missing entry) and 5 times for timestamps (NULL for missing entry) on
table entries. This code is switched to use a macro for the basic code
instead, shaving a few hundred lines of originally-duplicated code. The
function names remain the same, but some fields of PgStat_StatTabEntry
have to be renamed to cope with the new style.
Author: Bertrand Drouvot
Reviewed-by: Nathan Bossart
Discussion: https:/postgr.es/m/20221204173207.GA2669116@nathanxps13
Passing a NULL snapshot (InvalidSnapshot) is going to work but only as long
as the index can't find any matching rows. This can be confusing for
the extension authors, so add an explicit check for this argument. The check
is implemented with Assert() in order to avoid overhead in release builds.
Reported-by: Sven Klemm
Discussion: https://postgr.es/m/CAJ7c6TPxitD4vbKyP-mpmC1XwyHdPPqvjLzm%2BVpB88h8LGgneQ%40mail.gmail.com
Author: Aleksander Alekseev
Reviewed-by: Pavel Borisov
By default, the contents generated by the custom and directory dump
formats are compressed. However, with the existing test facility, the
restore program will succeed regardless of whether the dumped output was
compressed or not without checking if anything has been compressed.
This commit implements a portable way to check the contents of the
custom and directory dump formats:
- glob_patterns, that can be defined for each test as an array of
glob()-compilable strings, tracking the contents that should or should
not be compressed. While this is useful to make sure that the table
data is compressed, this also checks that blobs.toc and toc.dat are
never compressed.
- command_like, to execute a command on a dump and check its generated
output. This is used here in correlation with pg_restore -l to check if
the dumps have been compressed or not, depending on if the build
supports gzip, or not.
This hole in the tests has come up when working on 5e73a60, where
compression has to be applied by default, if available, for both dump
formats.
The idea of glob_patterns comes from me, and Georgios has come up with
the design for command_like.
Author: Georgios Kokolatos, Michael Paquier
Discussion: https://postgr.es/m/DQn4czCWR1rcbGPLL7p3LfEr5-kGmlySm-H05VgroINdikvhtS5r9EdI6b8D8sjnbKdJ09k-cxs2AqijBeHAWk9Q8gvEAxPRHuLRhwONcGc=@pm.me
Keeping the SQL commands that initdb runs in string arrays before
feeding them to PG_CMD_PUTS() seems unnecessarily verbose and
inflexible. In some cases, the array only has one member. In other
cases, one might want to use PG_CMD_PRINTF() instead, to parametrize a
command, but that would require breaking up the loop or using
workarounds like replace_token(). Unwind all that; it's much simpler
that way.
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://www.postgresql.org/message-id/flat/2c50823b-f453-bb97-e38b-34751c51dcdf%40enterprisedb.com
A couple of places weren't up to speed for this. By sheer good
luck, we didn't fail but just selected a non-memoized join plan,
at least in the test case we have. Nonetheless, it's a bug,
and I'm not quite sure that it couldn't have worse consequences
in other examples. So back-patch to v14 where Memoize came in.
Richard Guo
Discussion: https://postgr.es/m/CAMbWs48GkNom272sfp0-WeD6_0HSR19BJ4H1c9ZKSfbVnJsvRg@mail.gmail.com
For historical reasons, pg_dump refers to large objects as "BLOBs".
This term is not used anywhere else in PostgreSQL, and it also means
something different in the SQL standard and other SQL systems.
This patch renames internal functions, code comments, documentation,
etc. to use the "large object" or "LO" terminology instead. There is
no functionality change, so the archive format still uses the name
"BLOB" for the archive entry. Additional long command-line options
are added with the new naming.
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/flat/868a381f-4650-9460-1726-1ffd39a270b4%40enterprisedb.com
The error messages reported during any failures while reading or
validating the header of a WAL currently includes only the offset of the
page but not the compiled LSN referring to the page, requiring an extra
step to compile it if looking at the surroundings with pg_waldump or
similar. Adding this information costs a bit in translation, but also
eases debugging.
Author: Bharath Rupireddy
Reviewed-by: Álvaro Herrera, Kyotaro Horiguchi, Maxim Orlov, Michael
Paquier
Discussion: https://postgr.es/m/CALj2ACWV=FCddsxcGbVOA=cvPyMr75YCFbSQT6g4KDj=gcJK4g@mail.gmail.com
As pointed out by Dean Rasheed, we really should be using tmp >
-(PG_INTNN_MIN / 10) rather than tmp > (PG_INTNN_MAX / 10) for checking
for overflows in the accumulation in the pg_strtointNN functions. This
does happen to be the same number when dividing by 10, but there is a
pending patch which adds other bases and this is not the same number if we
were to divide by 2 rather than 10, for example. If the base 2 parsing
was to follow this example then we could accidentally think a string
containing the value of PG_INT32_MIN was an overflow in pg_strtoint32.
Clearly that shouldn't overflow.
This does not fix any actual live bugs, only some bad examples of overflow
checks for future bases.
Reported-by: Dean Rasheed
Discussion: https://postgr.es/m/CAEZATCVEtwfhdm-K-etZYFB0=qsR0nT6qXta_W+GQx4RYph1dg@mail.gmail.com
Just because I'm a neatnik, and I'm currently working on
code in this area. It annoys me to not be able to pgindent
my patches without working around unrelated changes.
It neglected to recurse to the subpath, meaning you'd get back
a path identical to the input. This could produce wrong query
results if the omission meant that the subpath fails to enforce
some join clause it should be enforcing. We don't have a test
case for this at the moment, but the code is obviously broken
and the fix is equally obvious. Back-patch to v14 where
Memoize was introduced.
Richard Guo
Discussion: https://postgr.es/m/CAMbWs4_R=ORpz=Lkn2q3ebPC5EuWyfZF+tmfCPVLBVK5W39mHA@mail.gmail.com
These two functions failed to cover MaterialPath. That's not a
fatal problem, but we can generate better plans in some cases
if we support it.
Tom Lane and Richard Guo
Discussion: https://postgr.es/m/1854233.1669949723@sss.pgh.pa.us
We might fail to generate a partitionwise join, because
reparameterize_path_by_child() does not support all path types.
This should not be a hard failure condition: we should just fall back
to a non-partitioned join. However, generate_partitionwise_join_paths
did not consider this possibility and would emit the (misleading)
error "could not devise a query plan for the given query" if we'd
failed to make any paths for a child join. Fix it to give up on
partitionwise joining if so. (The accepted technique for giving up
appears to be to set rel->nparts = 0, which I find pretty bizarre,
but there you have it.)
I have not added a test case because there'd be little point:
any omissions of this sort that we identify would soon get fixed
by extending reparameterize_path_by_child(), so the test would stop
proving anything. However, right now there is a known test case based
on failure to cover MaterialPath, and with that I've found that this
is broken in all supported versions. Hence, patch all the way back.
Original report and patch by me; thanks to Richard Guo for
identifying a test case that works against committed versions.
Discussion: https://postgr.es/m/1854233.1669949723@sss.pgh.pa.us
Experiments have shown that modern versions of both gcc and clang are
unable to fully optimize the multiplication by 10 that we're doing in the
pg_strtointNN functions. Both compilers seem to be making use of "imul",
which is not the most efficient way to multiply by 10. This seems to be
due to the overflow checking that we're doing. Without the overflow
checks, both those compilers switch to a more efficient method of
multiplying by 10. In absence of overflow concern, integer multiplication
by 10 can be done by bit-shifting left 3 places to multiply by 8 and then
adding the original value twice.
To allow compilers this flexibility, here we adjust the code so that we
accumulate the number as an unsigned version of the type and remove the
use of pg_mul_sNN_overflow() and pg_sub_sNN_overflow(). The overflow
checking can be done simply by checking if the accumulated value has gone
beyond a 10th of the maximum *signed* value for the given type. If it has
then the accumulation of the next digit will cause an overflow. After
this is done, we do a final overflow check before converting the unsigned
version of the number back to its signed counterpart.
Testing has shown about an 8% speedup of a COPY into a table containing 2
INT columns.
Author: David Rowley, Dean Rasheed
Discussion: https://postgr.es/m/CAApHDvrL6_+wKgPqRHr7gH_6xy3hXM6a3QCsZ5ForurjDFfenA@mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvrdYByjfj-=WbmVNFgmVZg88-dE7heukw8p55aJ+W=qxQ@mail.gmail.com
I wrote this to provide a home for a planned discussion of error
return conventions for non-error-throwing functions. But it seems
useful as documentation of existing code no matter what becomes of
that proposal, so commit separately.