/*-------------------------------------------------------------------------
*
* readfuncs.c
* Reader functions for Postgres tree nodes.
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
*
* IDENTIFICATION
* src/backend/nodes/readfuncs.c
*
* NOTES
* Parse location fields are written out by outfuncs.c, but only for
* debugging use. When reading a location field, we normally discard
* the stored value and set the location field to -1 (ie, "unknown").
* This is because nodes coming from a stored rule should not be thought
* to have a known location in the current query's text.
*
* However, if restore_location_fields is true, we do restore location
* fields from the string. This is currently intended only for use by the
* WRITE_READ_PARSE_PLAN_TREES test code, which doesn't want to cause
* any change in the node contents.
*
*-------------------------------------------------------------------------
*/
#include "postgres.h"
#include <math.h>
#include "miscadmin.h"
#include "nodes/bitmapset.h"
#include "nodes/readfuncs.h"
/*
* Macros to simplify reading of different kinds of fields. Use these
* wherever possible to reduce the chance for silly typos. Note that these
* hard-wire conventions about the names of the local variables in a Read
* routine.
*/
/* Macros for declaring appropriate local variables */
/* A few guys need only local_node */
#define READ_LOCALS_NO_FIELDS(nodeTypeName) \
nodeTypeName *local_node = makeNode(nodeTypeName)
/* And a few guys need only the pg_strtok support fields */
#define READ_TEMP_LOCALS() \
const char *token; \
int length
/* ... but most need both */
#define READ_LOCALS(nodeTypeName) \
READ_LOCALS_NO_FIELDS(nodeTypeName); \
READ_TEMP_LOCALS()
/* Read an integer field (anything written as ":fldname %d") */
#define READ_INT_FIELD(fldname) \
token = pg_strtok(&length); /* skip :fldname */ \
token = pg_strtok(&length); /* get field value */ \
local_node->fldname = atoi(token)
/* Read an unsigned integer field (anything written as ":fldname %u") */
#define READ_UINT_FIELD(fldname) \
token = pg_strtok(&length); /* skip :fldname */ \
token = pg_strtok(&length); /* get field value */ \
local_node->fldname = atoui(token)
/* Read an unsigned integer field (anything written using UINT64_FORMAT) */
#define READ_UINT64_FIELD(fldname) \
token = pg_strtok(&length); /* skip :fldname */ \
token = pg_strtok(&length); /* get field value */ \
local_node->fldname = strtou64(token, NULL, 10)
/* Read a long integer field (anything written as ":fldname %ld") */
#define READ_LONG_FIELD(fldname) \
token = pg_strtok(&length); /* skip :fldname */ \
token = pg_strtok(&length); /* get field value */ \
local_node->fldname = atol(token)
/* Read an OID field (don't hard-wire assumption that OID is same as uint) */
#define READ_OID_FIELD(fldname) \
token = pg_strtok(&length); /* skip :fldname */ \
token = pg_strtok(&length); /* get field value */ \
local_node->fldname = atooid(token)
/* Read a char field (ie, one ascii character) */
#define READ_CHAR_FIELD(fldname) \
token = pg_strtok(&length); /* skip :fldname */ \
token = pg_strtok(&length); /* get field value */ \
/* avoid overhead of calling debackslash() for one char */ \
local_node->fldname = (length == 0) ? '\0' : (token[0] == '\\' ? token[1] : token[0])
/* Read an enumerated-type field that was written as an integer code */
#define READ_ENUM_FIELD(fldname, enumtype) \
token = pg_strtok(&length); /* skip :fldname */ \
token = pg_strtok(&length); /* get field value */ \
local_node->fldname = (enumtype) atoi(token)
/* Read a float field */
#define READ_FLOAT_FIELD(fldname) \
token = pg_strtok(&length); /* skip :fldname */ \
token = pg_strtok(&length); /* get field value */ \
local_node->fldname = atof(token)
/* Read a boolean field */
#define READ_BOOL_FIELD(fldname) \
token = pg_strtok(&length); /* skip :fldname */ \
token = pg_strtok(&length); /* get field value */ \
local_node->fldname = strtobool(token)
/* Read a character-string field */
#define READ_STRING_FIELD(fldname) \
token = pg_strtok(&length); /* skip :fldname */ \
token = pg_strtok(&length); /* get field value */ \
local_node->fldname = nullable_string(token, length)
/* Read a parse location field (and possibly throw away the value) */
#ifdef WRITE_READ_PARSE_PLAN_TREES
#define READ_LOCATION_FIELD(fldname) \
token = pg_strtok(&length); /* skip :fldname */ \
token = pg_strtok(&length); /* get field value */ \
local_node->fldname = restore_location_fields ? atoi(token) : -1
#else
#define READ_LOCATION_FIELD(fldname) \
token = pg_strtok(&length); /* skip :fldname */ \
token = pg_strtok(&length); /* get field value */ \
(void) token; /* in case not used elsewhere */ \
local_node->fldname = -1 /* set field to "unknown" */
#endif
/* Read a Node field */
#define READ_NODE_FIELD(fldname) \
token = pg_strtok(&length); /* skip :fldname */ \
(void) token; /* in case not used elsewhere */ \
local_node->fldname = nodeRead(NULL, 0)
/* Read a bitmapset field */
#define READ_BITMAPSET_FIELD(fldname) \
token = pg_strtok(&length); /* skip :fldname */ \
(void) token; /* in case not used elsewhere */ \
local_node->fldname = _readBitmapset()
/* Read an attribute number array */
#define READ_ATTRNUMBER_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readAttrNumberCols(len)
/* Read an oid array */
#define READ_OID_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readOidCols(len)
/* Read an int array */
#define READ_INT_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readBoolCols(len)
/* Routine exit */
#define READ_DONE() \
return local_node
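/*
 * Illustrative sketch only (not part of PostgreSQL): a hand-written reader
 * for a hypothetical node type "MyNode" would combine the macros above as
 * shown below.  The generated readers in readfuncs.funcs.c and the manual
 * ones later in this file (e.g., _readConst) follow the same pattern.
 *
 *	static MyNode *
 *	_readMyNode(void)
 *	{
 *		READ_LOCALS(MyNode);
 *
 *		READ_INT_FIELD(some_int);
 *		READ_STRING_FIELD(some_name);
 *		READ_NODE_FIELD(some_subnode);
 *
 *		READ_DONE();
 *	}
 */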
/*
* NOTE: use atoi() to read values written with %d, or atoui() to read
* values written with %u in outfuncs.c. An exception is OID values,
* for which use atooid(). (As of 7.1, outfuncs.c writes OIDs as %u,
* but this will probably change in the future.)
*/
#define atoui(x) ((unsigned int) strtoul((x), NULL, 10))
#define strtobool(x) ((*(x) == 't') ? true : false)
static char *
nullable_string(const char *token, int length)
{
/* outToken emits <> for NULL, and pg_strtok makes that an empty string */
if (length == 0)
return NULL;
/* outToken emits "" for empty string */
if (length == 2 && token[0] == '"' && token[1] == '"')
return pstrdup("");
/* otherwise, we must remove protective backslashes added by outToken */
return debackslash(token, length);
}
/*
* _readBitmapset
*
* Note: this code is used in contexts where we know that a Bitmapset
* is expected. There is equivalent code in nodeRead() that can read a
* Bitmapset when we come across one in other contexts.
*/
static Bitmapset *
_readBitmapset(void)
{
Bitmapset *result = NULL;
READ_TEMP_LOCALS();
token = pg_strtok(&length);
if (token == NULL)
elog(ERROR, "incomplete Bitmapset structure");
if (length != 1 || token[0] != '(')
elog(ERROR, "unrecognized token: \"%.*s\"", length, token);
token = pg_strtok(&length);
if (token == NULL)
elog(ERROR, "incomplete Bitmapset structure");
if (length != 1 || token[0] != 'b')
elog(ERROR, "unrecognized token: \"%.*s\"", length, token);
for (;;)
{
int val;
char *endptr;
token = pg_strtok(&length);
if (token == NULL)
elog(ERROR, "unterminated Bitmapset structure");
if (length == 1 && token[0] == ')')
break;
val = (int) strtol(token, &endptr, 10);
if (endptr != token + length)
elog(ERROR, "unrecognized integer: \"%.*s\"", length, token);
result = bms_add_member(result, val);
}
return result;
}
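/*
 * For illustration (derived from the parsing logic above): the serialized
 * form "(b 3 7 10)" yields a Bitmapset containing members 3, 7, and 10,
 * while "(b)" yields an empty set (a NULL pointer).
 */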
/*
* We export this function for use by extensions that define extensible nodes.
* That's somewhat historical, though, because calling nodeRead() will work.
*/
Bitmapset *
readBitmapset(void)
{
return _readBitmapset();
}
#include "readfuncs.funcs.c"
/*
* Support functions for nodes with custom_read_write attribute or
* special_read_write attribute
*/
static Const *
_readConst(void)
{
READ_LOCALS(Const);
READ_OID_FIELD(consttype);
READ_INT_FIELD(consttypmod);
READ_OID_FIELD(constcollid);
READ_INT_FIELD(constlen);
READ_BOOL_FIELD(constbyval);
READ_BOOL_FIELD(constisnull);
READ_LOCATION_FIELD(location);
token = pg_strtok(&length); /* skip :constvalue */
if (local_node->constisnull)
token = pg_strtok(&length); /* skip "<>" */
else
local_node->constvalue = readDatum(local_node->constbyval);
READ_DONE();
}
static BoolExpr *
_readBoolExpr(void)
{
READ_LOCALS(BoolExpr);
/* do-it-yourself enum representation */
token = pg_strtok(&length); /* skip :boolop */
token = pg_strtok(&length); /* get field value */
if (length == 3 && strncmp(token, "and", 3) == 0)
local_node->boolop = AND_EXPR;
else if (length == 2 && strncmp(token, "or", 2) == 0)
local_node->boolop = OR_EXPR;
else if (length == 3 && strncmp(token, "not", 3) == 0)
local_node->boolop = NOT_EXPR;
else
elog(ERROR, "unrecognized boolop \"%.*s\"", length, token);
READ_NODE_FIELD(args);
READ_LOCATION_FIELD(location);
READ_DONE();
}
static A_Const *
_readA_Const(void)
{
READ_LOCALS(A_Const);
/* We expect either NULL or :val here */
token = pg_strtok(&length);
if (length == 4 && strncmp(token, "NULL", 4) == 0)
local_node->isnull = true;
else
{
union ValUnion *tmp = nodeRead(NULL, 0);
/* To forestall valgrind complaints, copy only the valid data */
switch (nodeTag(tmp))
{
case T_Integer:
memcpy(&local_node->val, tmp, sizeof(Integer));
break;
case T_Float:
memcpy(&local_node->val, tmp, sizeof(Float));
break;
case T_Boolean:
memcpy(&local_node->val, tmp, sizeof(Boolean));
break;
case T_String:
memcpy(&local_node->val, tmp, sizeof(String));
break;
case T_BitString:
memcpy(&local_node->val, tmp, sizeof(BitString));
break;
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(tmp));
break;
}
}
READ_LOCATION_FIELD(location);
READ_DONE();
}
/*
* _readConstraint
*/
static Constraint *
_readConstraint(void)
{
READ_LOCALS(Constraint);
READ_STRING_FIELD(conname);
READ_BOOL_FIELD(deferrable);
READ_BOOL_FIELD(initdeferred);
READ_LOCATION_FIELD(location);
token = pg_strtok(&length); /* skip :contype */
token = pg_strtok(&length); /* get field value */
if (length == 4 && strncmp(token, "NULL", 4) == 0)
local_node->contype = CONSTR_NULL;
else if (length == 8 && strncmp(token, "NOT_NULL", 8) == 0)
local_node->contype = CONSTR_NOTNULL;
else if (length == 7 && strncmp(token, "DEFAULT", 7) == 0)
local_node->contype = CONSTR_DEFAULT;
else if (length == 8 && strncmp(token, "IDENTITY", 8) == 0)
local_node->contype = CONSTR_IDENTITY;
else if (length == 9 && strncmp(token, "GENERATED", 9) == 0)
local_node->contype = CONSTR_GENERATED;
else if (length == 5 && strncmp(token, "CHECK", 5) == 0)
local_node->contype = CONSTR_CHECK;
else if (length == 11 && strncmp(token, "PRIMARY_KEY", 11) == 0)
local_node->contype = CONSTR_PRIMARY;
else if (length == 6 && strncmp(token, "UNIQUE", 6) == 0)
local_node->contype = CONSTR_UNIQUE;
else if (length == 9 && strncmp(token, "EXCLUSION", 9) == 0)
local_node->contype = CONSTR_EXCLUSION;
else if (length == 11 && strncmp(token, "FOREIGN_KEY", 11) == 0)
local_node->contype = CONSTR_FOREIGN;
else if (length == 15 && strncmp(token, "ATTR_DEFERRABLE", 15) == 0)
local_node->contype = CONSTR_ATTR_DEFERRABLE;
else if (length == 19 && strncmp(token, "ATTR_NOT_DEFERRABLE", 19) == 0)
local_node->contype = CONSTR_ATTR_NOT_DEFERRABLE;
else if (length == 13 && strncmp(token, "ATTR_DEFERRED", 13) == 0)
local_node->contype = CONSTR_ATTR_DEFERRED;
else if (length == 14 && strncmp(token, "ATTR_IMMEDIATE", 14) == 0)
local_node->contype = CONSTR_ATTR_IMMEDIATE;
switch (local_node->contype)
{
case CONSTR_NULL:
/* no extra fields */
break;
case CONSTR_NOTNULL:
READ_NODE_FIELD(keys);
READ_INT_FIELD(inhcount);
READ_BOOL_FIELD(is_no_inherit);
READ_BOOL_FIELD(skip_validation);
READ_BOOL_FIELD(initially_valid);
break;
case CONSTR_DEFAULT:
READ_NODE_FIELD(raw_expr);
READ_STRING_FIELD(cooked_expr);
break;
case CONSTR_IDENTITY:
READ_NODE_FIELD(options);
READ_CHAR_FIELD(generated_when);
break;
case CONSTR_GENERATED:
READ_NODE_FIELD(raw_expr);
READ_STRING_FIELD(cooked_expr);
READ_CHAR_FIELD(generated_when);
break;
case CONSTR_CHECK:
READ_BOOL_FIELD(is_no_inherit);
READ_NODE_FIELD(raw_expr);
READ_STRING_FIELD(cooked_expr);
READ_BOOL_FIELD(skip_validation);
READ_BOOL_FIELD(initially_valid);
break;
case CONSTR_PRIMARY:
READ_NODE_FIELD(keys);
READ_BOOL_FIELD(without_overlaps);
READ_NODE_FIELD(including);
READ_NODE_FIELD(options);
READ_STRING_FIELD(indexname);
READ_STRING_FIELD(indexspace);
READ_BOOL_FIELD(reset_default_tblspc);
/* access_method and where_clause not currently used */
break;
case CONSTR_UNIQUE:
READ_BOOL_FIELD(nulls_not_distinct);
READ_NODE_FIELD(keys);
READ_BOOL_FIELD(without_overlaps);
READ_NODE_FIELD(including);
READ_NODE_FIELD(options);
READ_STRING_FIELD(indexname);
READ_STRING_FIELD(indexspace);
READ_BOOL_FIELD(reset_default_tblspc);
/* access_method and where_clause not currently used */
break;
case CONSTR_EXCLUSION:
READ_NODE_FIELD(exclusions);
READ_NODE_FIELD(including);
READ_NODE_FIELD(options);
READ_STRING_FIELD(indexname);
READ_STRING_FIELD(indexspace);
READ_BOOL_FIELD(reset_default_tblspc);
READ_STRING_FIELD(access_method);
READ_NODE_FIELD(where_clause);
break;
case CONSTR_FOREIGN:
READ_NODE_FIELD(pktable);
READ_NODE_FIELD(fk_attrs);
READ_NODE_FIELD(pk_attrs);
READ_CHAR_FIELD(fk_matchtype);
READ_CHAR_FIELD(fk_upd_action);
READ_CHAR_FIELD(fk_del_action);
READ_NODE_FIELD(fk_del_set_cols);
READ_NODE_FIELD(old_conpfeqop);
READ_OID_FIELD(old_pktable_oid);
READ_BOOL_FIELD(skip_validation);
READ_BOOL_FIELD(initially_valid);
break;
case CONSTR_ATTR_DEFERRABLE:
case CONSTR_ATTR_NOT_DEFERRABLE:
case CONSTR_ATTR_DEFERRED:
case CONSTR_ATTR_IMMEDIATE:
/* no extra fields */
break;
default:
elog(ERROR, "unrecognized ConstrType: %d", (int) local_node->contype);
break;
}
READ_DONE();
}
static RangeTblEntry *
_readRangeTblEntry(void)
{
READ_LOCALS(RangeTblEntry);
/* put alias + eref first to make dump more legible */
READ_NODE_FIELD(alias);
READ_NODE_FIELD(eref);
READ_ENUM_FIELD(rtekind, RTEKind);
switch (local_node->rtekind)
{
case RTE_RELATION:
READ_OID_FIELD(relid);
READ_CHAR_FIELD(relkind);
READ_INT_FIELD(rellockmode);
READ_NODE_FIELD(tablesample);
READ_UINT_FIELD(perminfoindex);
break;
case RTE_SUBQUERY:
READ_NODE_FIELD(subquery);
READ_BOOL_FIELD(security_barrier);
/* we re-use these RELATION fields, too: */
READ_OID_FIELD(relid);
READ_CHAR_FIELD(relkind);
READ_INT_FIELD(rellockmode);
READ_UINT_FIELD(perminfoindex);
break;
case RTE_JOIN:
READ_ENUM_FIELD(jointype, JoinType);
READ_INT_FIELD(joinmergedcols);
READ_NODE_FIELD(joinaliasvars);
READ_NODE_FIELD(joinleftcols);
READ_NODE_FIELD(joinrightcols);
READ_NODE_FIELD(join_using_alias);
break;
case RTE_FUNCTION:
READ_NODE_FIELD(functions);
READ_BOOL_FIELD(funcordinality);
break;
case RTE_TABLEFUNC:
READ_NODE_FIELD(tablefunc);
/* The RTE must have a copy of the column type info, if any */
if (local_node->tablefunc)
{
TableFunc *tf = local_node->tablefunc;
local_node->coltypes = tf->coltypes;
local_node->coltypmods = tf->coltypmods;
local_node->colcollations = tf->colcollations;
}
break;
case RTE_VALUES:
READ_NODE_FIELD(values_lists);
READ_NODE_FIELD(coltypes);
READ_NODE_FIELD(coltypmods);
READ_NODE_FIELD(colcollations);
break;
case RTE_CTE:
READ_STRING_FIELD(ctename);
READ_UINT_FIELD(ctelevelsup);
READ_BOOL_FIELD(self_reference);
READ_NODE_FIELD(coltypes);
READ_NODE_FIELD(coltypmods);
READ_NODE_FIELD(colcollations);
break;
case RTE_NAMEDTUPLESTORE:
READ_STRING_FIELD(enrname);
READ_FLOAT_FIELD(enrtuples);
READ_NODE_FIELD(coltypes);
READ_NODE_FIELD(coltypmods);
READ_NODE_FIELD(colcollations);
/* we re-use these RELATION fields, too: */
READ_OID_FIELD(relid);
break;
case RTE_RESULT:
/* no extra fields */
break;
default:
elog(ERROR, "unrecognized RTE kind: %d",
(int) local_node->rtekind);
break;
}
READ_BOOL_FIELD(lateral);
READ_BOOL_FIELD(inh);
READ_BOOL_FIELD(inFromCl);
READ_NODE_FIELD(securityQuals);
READ_DONE();
}
static A_Expr *
_readA_Expr(void)
{
READ_LOCALS(A_Expr);
token = pg_strtok(&length);
if (length == 3 && strncmp(token, "ANY", 3) == 0)
{
local_node->kind = AEXPR_OP_ANY;
READ_NODE_FIELD(name);
}
else if (length == 3 && strncmp(token, "ALL", 3) == 0)
{
local_node->kind = AEXPR_OP_ALL;
READ_NODE_FIELD(name);
}
else if (length == 8 && strncmp(token, "DISTINCT", 8) == 0)
{
local_node->kind = AEXPR_DISTINCT;
READ_NODE_FIELD(name);
}
else if (length == 12 && strncmp(token, "NOT_DISTINCT", 12) == 0)
{
local_node->kind = AEXPR_NOT_DISTINCT;
READ_NODE_FIELD(name);
}
else if (length == 6 && strncmp(token, "NULLIF", 6) == 0)
{
local_node->kind = AEXPR_NULLIF;
READ_NODE_FIELD(name);
}
else if (length == 2 && strncmp(token, "IN", 2) == 0)
{
local_node->kind = AEXPR_IN;
READ_NODE_FIELD(name);
}
else if (length == 4 && strncmp(token, "LIKE", 4) == 0)
{
local_node->kind = AEXPR_LIKE;
READ_NODE_FIELD(name);
}
else if (length == 5 && strncmp(token, "ILIKE", 5) == 0)
{
local_node->kind = AEXPR_ILIKE;
READ_NODE_FIELD(name);
}
else if (length == 7 && strncmp(token, "SIMILAR", 7) == 0)
{
local_node->kind = AEXPR_SIMILAR;
READ_NODE_FIELD(name);
}
else if (length == 7 && strncmp(token, "BETWEEN", 7) == 0)
{
local_node->kind = AEXPR_BETWEEN;
READ_NODE_FIELD(name);
}
else if (length == 11 && strncmp(token, "NOT_BETWEEN", 11) == 0)
{
local_node->kind = AEXPR_NOT_BETWEEN;
READ_NODE_FIELD(name);
}
else if (length == 11 && strncmp(token, "BETWEEN_SYM", 11) == 0)
{
local_node->kind = AEXPR_BETWEEN_SYM;
READ_NODE_FIELD(name);
}
else if (length == 15 && strncmp(token, "NOT_BETWEEN_SYM", 15) == 0)
{
local_node->kind = AEXPR_NOT_BETWEEN_SYM;
READ_NODE_FIELD(name);
}
else if (length == 5 && strncmp(token, ":name", 5) == 0)
{
local_node->kind = AEXPR_OP;
local_node->name = nodeRead(NULL, 0);
}
else
elog(ERROR, "unrecognized A_Expr kind: \"%.*s\"", length, token);
READ_NODE_FIELD(lexpr);
READ_NODE_FIELD(rexpr);
READ_LOCATION_FIELD(location);
READ_DONE();
}
static ExtensibleNode *
_readExtensibleNode(void)
{
const ExtensibleNodeMethods *methods;
ExtensibleNode *local_node;
const char *extnodename;
READ_TEMP_LOCALS();
token = pg_strtok(&length); /* skip :extnodename */
token = pg_strtok(&length); /* get extnodename */
extnodename = nullable_string(token, length);
if (!extnodename)
elog(ERROR, "extnodename has to be supplied");
methods = GetExtensibleNodeMethods(extnodename, false);
local_node = (ExtensibleNode *) newNode(methods->node_size,
T_ExtensibleNode);
local_node->extnodename = extnodename;
/* deserialize the private fields */
methods->nodeRead(local_node);
READ_DONE();
}
/*
* parseNodeString
*
* Given a character string representing a node tree, parseNodeString creates
* the internal node structure.
*
* The string to be read must already have been loaded into pg_strtok().
*/
Node *
parseNodeString(void)
{
READ_TEMP_LOCALS();
/* Guard against stack overflow due to overly complex expressions */
check_stack_depth();
token = pg_strtok(&length);
#define MATCH(tokname, namelen) \
(length == namelen && memcmp(token, tokname, namelen) == 0)
#include "readfuncs.switch.c"
elog(ERROR, "badly formatted node string \"%.32s\"...", token);
return NULL; /* keep compiler quiet */
}
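/*
 * Usage sketch (illustrative): callers normally do not invoke
 * parseNodeString() directly.  They call stringToNode() in read.c, which
 * loads the string into pg_strtok() and then calls nodeRead(); nodeRead()
 * in turn dispatches here whenever it encounters a "{" that starts a node.
 * For example, with "str" holding a stored node tree (say, read from
 * pg_rewrite.ev_action):
 *
 *	Node	   *node = stringToNode(str);
 */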
/*
* readDatum
*
* Given a string representation of a constant, recreate the appropriate
* Datum. The string representation embeds length info, but not byValue,
* so we must be told that.
*/
Datum
readDatum(bool typbyval)
{
Size length,
i;
int tokenLength;
const char *token;
Datum res;
char *s;
/*
* read the actual length of the value
*/
token = pg_strtok(&tokenLength);
length = atoui(token);
token = pg_strtok(&tokenLength); /* read the '[' */
if (token == NULL || token[0] != '[')
elog(ERROR, "expected \"[\" to start datum, but got \"%s\"; length = %zu",
token ? token : "[NULL]", length);
if (typbyval)
{
if (length > (Size) sizeof(Datum))
elog(ERROR, "byval datum but length = %zu", length);
res = (Datum) 0;
s = (char *) (&res);
for (i = 0; i < (Size) sizeof(Datum); i++)
{
token = pg_strtok(&tokenLength);
s[i] = (char) atoi(token);
}
}
else if (length <= 0)
res = (Datum) NULL;
else
{
s = (char *) palloc(length);
for (i = 0; i < length; i++)
{
token = pg_strtok(&tokenLength);
s[i] = (char) atoi(token);
}
res = PointerGetDatum(s);
}
token = pg_strtok(&tokenLength); /* read the ']' */
if (token == NULL || token[0] != ']')
elog(ERROR, "expected \"]\" to end datum, but got \"%s\"; length = %zu",
token ? token : "[NULL]", length);
return res;
}
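/*
 * For illustration: outfuncs.c's outDatum() writes a datum as its length
 * followed by the individual byte values in brackets.  On a typical
 * little-endian 64-bit build, a pass-by-value int4 datum holding 1 would
 * appear as "4 [ 1 0 0 0 0 0 0 0 ]" (sizeof(Datum) bytes are always emitted
 * for by-value types), and readDatum(true) reconstructs the original Datum
 * from that text.  The exact byte sequence is platform-dependent.
 */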
/*
* common implementation for scalar-array-reading functions
*
* The data format is either "<>" for a NULL pointer (in which case numCols
* is ignored) or "(item item item)" where the number of items must equal
* numCols. The convfunc must be okay with stopping at whitespace or a
* right parenthesis, since pg_strtok won't null-terminate the token.
*/
#define READ_SCALAR_ARRAY(fnname, datatype, convfunc) \
datatype * \
fnname(int numCols) \
{ \
datatype *vals; \
READ_TEMP_LOCALS(); \
token = pg_strtok(&length); \
if (token == NULL) \
elog(ERROR, "incomplete scalar array"); \
if (length == 0) \
return NULL; /* it was "<>", so return NULL pointer */ \
if (length != 1 || token[0] != '(') \
elog(ERROR, "unrecognized token: \"%.*s\"", length, token); \
vals = (datatype *) palloc(numCols * sizeof(datatype)); \
for (int i = 0; i < numCols; i++) \
{ \
token = pg_strtok(&length); \
if (token == NULL || token[0] == ')') \
elog(ERROR, "incomplete scalar array"); \
vals[i] = convfunc(token); \
} \
token = pg_strtok(&length); \
if (token == NULL || length != 1 || token[0] != ')') \
elog(ERROR, "incomplete scalar array"); \
return vals; \
}
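/*
 * For illustration: after the expansions below, a call such as
 * readIntCols(3) consumes the token sequence "(10 20 30)" and returns a
 * palloc'd int[3] containing {10, 20, 30}, while the token "<>" makes it
 * return NULL.  (The numbers here are made up for the example.)
 */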
/*
* Note: these functions are exported in nodes.h for possible use by
* extensions, so don't mess too much with their names or API.
*/
READ_SCALAR_ARRAY(readAttrNumberCols, int16, atoi)
READ_SCALAR_ARRAY(readOidCols, Oid, atooid)
/* outfuncs.c has writeIndexCols, but we don't yet need that here */
/* READ_SCALAR_ARRAY(readIndexCols, Index, atoui) */
READ_SCALAR_ARRAY(readIntCols, int, atoi)
READ_SCALAR_ARRAY(readBoolCols, bool, strtobool)