Under interval_ops, some equal values are distinguishable. One such
pair is '24:00:00' and '1 day'. With that being so, btequalimage()
breaches the documented contract for the "equalimage" btree support
function. This can cause incorrect results from index-only scans.
Users should REINDEX any btree indexes having interval-type columns.
After updating, pg_amcheck will report an error for almost all such
indexes. This fix makes interval_ops simply omit the support function,
like numeric_ops does. Back-pack to v13, where btequalimage() first
appeared. In back branches, for the benefit of old catalog content,
btequalimage() code will return false for type "interval". Going
forward, back-branch initdb will include the catalog change.
Reviewed by Peter Geoghegan.
Discussion: https://postgr.es/m/20231011013317.22.nmisch@google.com
Memoize would always use the hash equality operator for the cache key
types to determine if the current set of parameters were the same as some
previously cached set. Certain types such as floating points where -0.0
and +0.0 differ in their binary representation but are classed as equal by
the hash equality operator may cause problems as unless the join uses the
same operator it's possible that whichever join operator is being used
would be able to distinguish the two values. In which case we may
accidentally return in the incorrect rows out of the cache.
To fix this here we add a binary mode to Memoize to allow it to the
current set of parameters to previously cached values by comparing
bit-by-bit rather than logically using the hash equality operator. This
binary mode is always used for LATERAL joins and it's used for normal
joins when any of the join operators are not hashable.
Reported-by: Tom Lane
Author: David Rowley
Discussion: https://postgr.es/m/3004308.1632952496@sss.pgh.pa.us
Backpatch-through: 14, where Memoize was added
Invent the concept of a B-Tree equalimage ("equality implies image
equality") support function, registered as support function 4. This
indicates whether it is safe (or not safe) to apply optimizations that
assume that any two datums considered equal by an operator class's order
method must be interchangeable without any loss of semantic information.
This is static information about an operator class and a collation.
Register an equalimage routine for almost all of the existing B-Tree
opclasses. We only need two trivial routines for all of the opclasses
that are included with the core distribution. There is one routine for
opclasses that index non-collatable types (which returns 'true'
unconditionally), plus another routine for collatable types (which
returns 'true' when the collation is a deterministic collation).
This patch is infrastructure for an upcoming patch that adds B-Tree
deduplication.
Author: Peter Geoghegan, Anastasia Lubennikova
Discussion: https://postgr.es/m/CAH2-Wzn3Ee49Gmxb7V1VJ3-AC8fWn-Fr8pfWQebHe8rYRxt5OQ@mail.gmail.com
Bring datum_image_eq() in line with datumIsEqual() by adding support for
comparing cstring datums.
An upcoming patch that adds deduplication to the nbtree AM will use
datum_image_eq(). datum_image_eq() will need to work with all datatypes
that can be used as the storage type of a B-Tree index column, including
cstring. (cstring is used as the storage type for columns of type
"name" as a space-saving optimization.)
Discussion: https://postgr.es/m/CAH2-Wzn3Ee49Gmxb7V1VJ3-AC8fWn-Fr8pfWQebHe8rYRxt5OQ@mail.gmail.com
Rearrange the logic in record_image_cmp() and datum_image_eq() to
error out on unexpected typlens (either not supported there or
completely invalid due to corruption). Barring corruption, this is
not possible today but it seems more future-proof and robust to fix
this.
Reported-by: Peter Geoghegan <pg@bowt.ie>
detoast.c/h contain functions required to detoast a datum, partially
or completely, plus a few other utility functions for examining the
size of toasted datums.
toast_internals.c/h contain functions that are used internally to the
TOAST subsystem but which (mostly) do not need to be accessed from
outside.
heaptoast.c/h contains code that is intrinsically specific to the
heap AM, either because it operates on HeapTuples or is based on the
layout of a heap page.
detoast.c and toast_internals.c are placed in
src/backend/access/common rather than src/backend/access/heap. At
present, both files still have dependencies on the heap, but that will
be improved in a future commit.
Patch by me, reviewed and tested by Prabhat Sabu, Thomas Munro,
Andres Freund, and Álvaro Herrera.
Discussion: http://postgr.es/m/CA+TgmoZv-=2iWM4jcw5ZhJeL18HF96+W1yJeYrnGMYdkFFnEpQ@mail.gmail.com
In RI_FKey_pk_upd_check_required(), we check among other things
whether the old and new key are equal, so that we don't need to run
cascade actions when nothing has actually changed. This was using the
equality operator. But the effect of this is that if a value in the
primary key is changed to one that "looks" different but compares as
equal, the update is not propagated. (Examples are float -0 and 0 and
case-insensitive text.) This appears to violate the SQL standard, and
it also behaves inconsistently if in a multicolumn key another key is
also updated that would cause the row to compare as not equal.
To fix, if we are looking at the PK table in ri_KeysEqual(), then do a
bytewise comparison similar to record_image_eq() instead of using the
equality operators. This only makes a difference for ON UPDATE
CASCADE, but for consistency we treat all changes to the PK the same. For
the FK table, we continue to use the equality operators.
Discussion: https://www.postgresql.org/message-id/flat/3326fc2e-bc02-d4c5-e3e5-e54da466e89a@2ndquadrant.com
The API (EOH_flatten_into) that flattens the expanded value representation
expects the target address to be maxaligned. All it's usage adhere to that
principle except when serializing datums for parallel query. Fix that
usage.
Diagnosed-by: Tom Lane
Author: Tom Lane and Amit Kapila
Backpatch-through: 9.6
Discussion: https://postgr.es/m/11629.1536550032@sss.pgh.pa.us
Commit 1efc7e538 did a poor job of emulating existing logic for touching
Datums that might be expanded-object pointers. It didn't check for typlen
being -1 first, which meant it could crash on fixed-length pass-by-ref
values, and probably on cstring values as well. It also didn't use
DatumGetPointer before VARATT_IS_EXTERNAL_EXPANDED, which while currently
harmless is not according to documentation nor prevailing style.
I also think the lack of any explanation as to why datumSerialize makes
these particular nonobvious choices is pretty awful, so fix that.
Per report from Jarred Ward. Back-patch to 9.6 where this code came in.
Discussion: https://postgr.es/m/6F61E6D2-2F5E-4794-9479-A429BE1CEA4B@simple.com
The code here previously tried to call the partitioning operator, but
really the right thing to do (and the safe thing to do) is use
datumIsEqual().
Amit Langote, but I expanded the comment and fixed a compiler warning.
Commit d1b7c1ffe7 introduced a mechanism
for serializing a ParamListInfo structure to be passed to a parallel
worker. However, this mechanism failed to handle external expanded
values, as pointed out by Noah Misch. Repair.
Moreover, plpgsql_param_fetch requires adjustment because the
serialization mechanism needs it to skip evaluating unused parameters
just as we would do when it is called from copyParamList, but params
== estate->paramLI in that case. To fix, make the bms_is_member test
in that function unconditional.
Finally, have setup_param_list set a new ParamListInfo field,
paramMask, to the parameters actually used in the expression, so that
we don't try to fetch those that are not needed when serializing a
parameter list. This isn't necessary for correctness, but it makes
the performance of the parallel executor code comparable to what we
do for cases involving cursors.
Design suggestions and extensive review by Noah Misch. Patch by me.
This code provides infrastructure for a parallel leader to start up
parallel workers to execute subtrees of the plan tree being executed
in the master. User-supplied parameters from ParamListInfo are passed
down, but PARAM_EXEC parameters are not. Various other constructs,
such as initplans, subplans, and CTEs, are also not currently shared.
Nevertheless, there's enough here to support a basic implementation of
parallel query, and we can lift some of the current restrictions as
needed.
Amit Kapila and Robert Haas
This patch introduces the ability for complex datatypes to have an
in-memory representation that is different from their on-disk format.
On-disk formats are typically optimized for minimal size, and in any case
they can't contain pointers, so they are often not well-suited for
computation. Now a datatype can invent an "expanded" in-memory format
that is better suited for its operations, and then pass that around among
the C functions that operate on the datatype. There are also provisions
(rudimentary as yet) to allow an expanded object to be modified in-place
under suitable conditions, so that operations like assignment to an element
of an array need not involve copying the entire array.
The initial application for this feature is arrays, but it is not hard
to foresee using it for other container types like JSON, XML and hstore.
I have hopes that it will be useful to PostGIS as well.
In this initial implementation, a few heuristics have been hard-wired
into plpgsql to improve performance for arrays that are stored in
plpgsql variables. We would like to generalize those hacks so that
other datatypes can obtain similar improvements, but figuring out some
appropriate APIs is left as a task for future work. (The heuristics
themselves are probably not optimal yet, either, as they sometimes
force expansion of arrays that would be better left alone.)
Preliminary performance testing shows impressive speed gains for plpgsql
functions that do element-by-element access or update of large arrays.
There are other cases that get a little slower, as a result of added array
format conversions; but we can hope to improve anything that's annoyingly
bad. In any case most applications should see a net win.
Tom Lane, reviewed by Andres Freund
This commit breaks any code that assumes that the mere act of forming a tuple
(without writing it to disk) does not "toast" any fields. While all available
regression tests pass, I'm not totally sure that we've fixed every nook and
cranny, especially in contrib.
Greg Stark with some help from Tom Lane
Get rid of VARATT_SIZE and VARATT_DATA, which were simply redundant with
VARSIZE and VARDATA, and as a consequence almost no code was using the
longer names. Rename the length fields of struct varlena and various
derived structures to catch anyplace that was accessing them directly;
and clean up various places so caught. In itself this patch doesn't
change any behavior at all, but it is necessary infrastructure if we hope
to play any games with the representation of varlena headers.
Greg Stark and Tom Lane
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
value '-2' is used to indicate a variable-width type whose width is
computed as strlen(datum)+1. Everything that looks at typlen is updated
except for array support, which Joe Conway is working on; at the moment
it wouldn't work to try to create an array of cstring.
per previous discussion on pghackers. Most of the duplicate code in
different AMs' ambuild routines has been moved out to a common routine
in index.c; this means that all index types now do the right things about
inserting recently-dead tuples, etc. (I also removed support for EXTEND
INDEX in the ambuild routines, since that's about to go away anyway, and
it cluttered the code a lot.) The retail indextuple deletion routines have
been replaced by a "bulk delete" routine in which the indexscan is inside
the access method. I haven't pushed this change as far as it should go yet,
but it should allow considerable simplification of the internal bookkeeping
for deletions. Also, add flag columns to pg_am to eliminate various
hardcoded tests on AM OIDs, and remove unused pg_am columns.
Fix rtree and gist index types to not attempt to store NULLs; before this,
gist usually crashed, while rtree managed not to crash but computed wacko
bounding boxes for NULL entries (which might have had something to do with
the performance problems we've heard about occasionally).
Add AtEOXact routines to hash, rtree, and gist, all of which have static
state that needs to be reset after an error. We discovered this need long
ago for btree, but missed the other guys.
Oh, one more thing: concurrent VACUUM is now the default.
memory contexts. Currently, only leaks in expressions executed as
quals or projections are handled. Clean up some old dead cruft in
executor while at it --- unused fields in state nodes, that sort of thing.