/*-------------------------------------------------------------------------
 *
 * trigger.c
 *	  PostgreSQL TRIGGERs support code.
 *
 * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * IDENTIFICATION
 *	  src/backend/commands/trigger.c
 *
 *-------------------------------------------------------------------------
 */
#include "postgres.h"

#include "access/genam.h"
#include "access/htup_details.h"
#include "access/relation.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
#include "catalog/catalog.h"
#include "catalog/dependency.h"
#include "catalog/index.h"
#include "catalog/indexing.h"
#include "catalog/objectaccess.h"
#include "catalog/partition.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "commands/dbcommands.h"
#include "commands/defrem.h"
#include "commands/trigger.h"
#include "executor/executor.h"
#include "executor/execPartition.h"
#include "miscadmin.h"
#include "nodes/bitmapset.h"
#include "nodes/makefuncs.h"
#include "optimizer/optimizer.h"
#include "parser/parse_clause.h"
#include "parser/parse_collate.h"
#include "parser/parse_func.h"
#include "parser/parse_relation.h"
#include "parser/parsetree.h"
#include "partitioning/partdesc.h"
#include "pgstat.h"
#include "rewrite/rewriteManip.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/bytea.h"
#include "utils/fmgroids.h"
#include "utils/guc_hooks.h"
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/plancache.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
#include "utils/tuplestore.h"

/* GUC variables */
int			SessionReplicationRole = SESSION_REPLICATION_ROLE_ORIGIN;

/* How many levels deep into trigger execution are we? */
static int	MyTriggerDepth = 0;

/* Local function prototypes */
static void renametrig_internal(Relation tgrel, Relation targetrel,
								HeapTuple trigtup, const char *newname,
								const char *expected_name);
static void renametrig_partition(Relation tgrel, Oid partitionId,
								 Oid parentTriggerOid, const char *newname,
								 const char *expected_name);
static void SetTriggerFlags(TriggerDesc *trigdesc, Trigger *trigger);
static bool GetTupleForTrigger(EState *estate,
							   EPQState *epqstate,
							   ResultRelInfo *relinfo,
							   ItemPointer tid,
							   LockTupleMode lockmode,
							   TupleTableSlot *oldslot,
							   TupleTableSlot **epqslot,
							   TM_FailureData *tmfdp);
static bool TriggerEnabled(EState *estate, ResultRelInfo *relinfo,
						   Trigger *trigger, TriggerEvent event,
						   Bitmapset *modifiedCols,
						   TupleTableSlot *oldslot, TupleTableSlot *newslot);
static HeapTuple ExecCallTriggerFunc(TriggerData *trigdata,
									 int tgindx,
									 FmgrInfo *finfo,
									 Instrumentation *instr,
									 MemoryContext per_tuple_context);
static void AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
								  ResultRelInfo *src_partinfo,
								  ResultRelInfo *dst_partinfo,
								  int event, bool row_trigger,
								  TupleTableSlot *oldslot, TupleTableSlot *newslot,
								  List *recheckIndexes, Bitmapset *modifiedCols,
								  TransitionCaptureState *transition_capture,
								  bool is_crosspart_update);
static void AfterTriggerEnlargeQueryState(void);
static bool before_stmt_triggers_fired(Oid relid, CmdType cmdType);

/*
 * Create a trigger.  Returns the address of the created trigger.
 *
 * queryString is the source text of the CREATE TRIGGER command.
 * This must be supplied if a whenClause is specified, else it can be NULL.
 *
 * relOid, if nonzero, is the relation on which the trigger should be
 * created.  If zero, the name provided in the statement will be looked up.
 *
 * refRelOid, if nonzero, is the relation to which the constraint trigger
 * refers.  If zero, the constraint relation name provided in the statement
 * will be looked up as needed.
 *
 * constraintOid, if nonzero, says that this trigger is being created
 * internally to implement that constraint.  A suitable pg_depend entry will
 * be made to link the trigger to that constraint.  constraintOid is zero when
 * executing a user-entered CREATE TRIGGER command.  (For CREATE CONSTRAINT
 * TRIGGER, we build a pg_constraint entry internally.)
 *
 * indexOid, if nonzero, is the OID of an index associated with the constraint.
 * We do nothing with this except store it into pg_trigger.tgconstrindid;
 * but when creating a trigger for a deferrable unique constraint on a
 * partitioned table, its children are looked up.  Note we don't cope with
 * invalid indexes in that case.
 *
 * funcoid, if nonzero, is the OID of the function to invoke.  When this is
 * given, stmt->funcname is ignored.
 *
 * parentTriggerOid, if nonzero, is a trigger that begets this one; so that
 * if that trigger is dropped, this one should be too.  There are two cases
 * when a nonzero value is passed for this: 1) when this function recurses to
 * create the trigger on partitions, 2) when creating child foreign key
 * triggers; see CreateFKCheckTrigger() and createForeignKeyActionTriggers().
 *
 * If whenClause is passed, it is an already-transformed expression for
 * WHEN.  In this case, we ignore any that may come in stmt->whenClause.
 *
 * If isInternal is true then this is an internally-generated trigger.
 * This argument sets the tgisinternal field of the pg_trigger entry, and
 * if true causes us to modify the given trigger name to ensure uniqueness.
 *
 * When isInternal is not true we require ACL_TRIGGER permissions on the
 * relation, as well as ACL_EXECUTE on the trigger function.  For internal
 * triggers the caller must apply any required permission checks.
 *
 * When called on partitioned tables, this function recurses to create the
 * trigger on all the partitions, except if isInternal is true, in which
 * case caller is expected to execute recursion on its own.  in_partition
 * indicates such a recursive call; outside callers should pass "false"
 * (but see CloneRowTriggersToPartition).
 */
ObjectAddress
CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
			  Oid relOid, Oid refRelOid, Oid constraintOid, Oid indexOid,
			  Oid funcoid, Oid parentTriggerOid, Node *whenClause,
			  bool isInternal, bool in_partition)
{
	return
		CreateTriggerFiringOn(stmt, queryString, relOid, refRelOid,
							  constraintOid, indexOid, funcoid,
							  parentTriggerOid, whenClause, isInternal,
							  in_partition, TRIGGER_FIRES_ON_ORIGIN);
}

/*
 * Like the above; additionally the firing condition
 * (always/origin/replica/disabled) can be specified.
 */
ObjectAddress
CreateTriggerFiringOn(CreateTrigStmt *stmt, const char *queryString,
					  Oid relOid, Oid refRelOid, Oid constraintOid,
					  Oid indexOid, Oid funcoid, Oid parentTriggerOid,
					  Node *whenClause, bool isInternal, bool in_partition,
					  char trigger_fires_when)
{
	int16		tgtype;
	int			ncolumns;
	int16	   *columns;
	int2vector *tgattr;
	List	   *whenRtable;
	char	   *qual;
	Datum		values[Natts_pg_trigger];
	bool		nulls[Natts_pg_trigger];
	Relation	rel;
	AclResult	aclresult;
	Relation	tgrel;
	Relation	pgrel;
	HeapTuple	tuple = NULL;
	Oid			funcrettype;
	Oid			trigoid = InvalidOid;
	char		internaltrigname[NAMEDATALEN];
	char	   *trigname;
	Oid			constrrelid = InvalidOid;
	ObjectAddress myself,
				referenced;
	char	   *oldtablename = NULL;
	char	   *newtablename = NULL;
	bool		partition_recurse;
	bool		trigger_exists = false;
	Oid			existing_constraint_oid = InvalidOid;
	bool		existing_isInternal = false;
	bool		existing_isClone = false;
|
1997-09-07 07:04:48 +02:00
|
|
|
|
Avoid repeated name lookups during table and index DDL.
If the name lookups come to different conclusions due to concurrent
activity, we might perform some parts of the DDL on a different table
than other parts. At least in the case of CREATE INDEX, this can be
used to cause the permissions checks to be performed against a
different table than the index creation, allowing for a privilege
escalation attack.
This changes the calling convention for DefineIndex, CreateTrigger,
transformIndexStmt, transformAlterTableStmt, CheckIndexCompatible
(in 9.2 and newer), and AlterTable (in 9.1 and older). In addition,
CheckRelationOwnership is removed in 9.2 and newer and the calling
convention is changed in older branches. A field has also been added
to the Constraint node (FkConstraint in 8.4). Third-party code calling
these functions or using the Constraint node will require updating.
Report by Andres Freund. Patch by Robert Haas and Andres Freund,
reviewed by Tom Lane.
Security: CVE-2014-0062
2014-02-17 15:33:31 +01:00
|
|
|
if (OidIsValid(relOid))
|
2019-01-21 19:32:19 +01:00
|
|
|
rel = table_open(relOid, ShareRowExclusiveLock);
|
Avoid repeated name lookups during table and index DDL.
If the name lookups come to different conclusions due to concurrent
activity, we might perform some parts of the DDL on a different table
than other parts. At least in the case of CREATE INDEX, this can be
used to cause the permissions checks to be performed against a
different table than the index creation, allowing for a privilege
escalation attack.
This changes the calling convention for DefineIndex, CreateTrigger,
transformIndexStmt, transformAlterTableStmt, CheckIndexCompatible
(in 9.2 and newer), and AlterTable (in 9.1 and older). In addition,
CheckRelationOwnership is removed in 9.2 and newer and the calling
convention is changed in older branches. A field has also been added
to the Constraint node (FkConstraint in 8.4). Third-party code calling
these functions or using the Constraint node will require updating.
Report by Andres Freund. Patch by Robert Haas and Andres Freund,
reviewed by Tom Lane.
Security: CVE-2014-0062
2014-02-17 15:33:31 +01:00
|
|
|
else
|
2019-01-21 19:32:19 +01:00
|
|
|
rel = table_openrv(stmt->relation, ShareRowExclusiveLock);
|
2002-03-22 00:27:25 +01:00
|
|
|
|
2010-10-10 19:43:33 +02:00
|
|
|
	/*
	 * Triggers must be on tables or views, and there are additional
	 * relation-type-specific restrictions.
	 */
	if (rel->rd_rel->relkind == RELKIND_RELATION)
	{
		/* Tables can't have INSTEAD OF triggers */
		if (stmt->timing != TRIGGER_TYPE_BEFORE &&
			stmt->timing != TRIGGER_TYPE_AFTER)
			ereport(ERROR,
					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
					 errmsg("\"%s\" is a table",
							RelationGetRelationName(rel)),
					 errdetail("Tables cannot have INSTEAD OF triggers.")));
	}
	else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
	{
		/* Partitioned tables can't have INSTEAD OF triggers */
		if (stmt->timing != TRIGGER_TYPE_BEFORE &&
			stmt->timing != TRIGGER_TYPE_AFTER)
			ereport(ERROR,
					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
					 errmsg("\"%s\" is a table",
							RelationGetRelationName(rel)),
					 errdetail("Tables cannot have INSTEAD OF triggers.")));

		/*
		 * FOR EACH ROW triggers have further restrictions
		 */
		if (stmt->row)
		{
			/*
			 * Disallow use of transition tables.
			 *
			 * Note that we have another restriction about transition tables
			 * in partitions; search for 'has_superclass' below for an
			 * explanation.  The check here is just to protect from the fact
			 * that if we allowed it here, the creation would succeed for a
			 * partitioned table with no partitions, but would be blocked by
			 * the other restriction when the first partition was created,
			 * which is very unfriendly behavior.
			 */
			if (stmt->transitionRels != NIL)
				ereport(ERROR,
						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
						 errmsg("\"%s\" is a partitioned table",
								RelationGetRelationName(rel)),
						 errdetail("ROW triggers with transition tables are not supported on partitioned tables.")));
		}
	}
	else if (rel->rd_rel->relkind == RELKIND_VIEW)
	{
		/*
		 * Views can have INSTEAD OF triggers (which we check below are
		 * row-level), or statement-level BEFORE/AFTER triggers.
		 */
		if (stmt->timing != TRIGGER_TYPE_INSTEAD && stmt->row)
			ereport(ERROR,
					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
					 errmsg("\"%s\" is a view",
							RelationGetRelationName(rel)),
					 errdetail("Views cannot have row-level BEFORE or AFTER triggers.")));

		/* Disallow TRUNCATE triggers on VIEWs */
		if (TRIGGER_FOR_TRUNCATE(stmt->events))
			ereport(ERROR,
					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
					 errmsg("\"%s\" is a view",
							RelationGetRelationName(rel)),
					 errdetail("Views cannot have TRUNCATE triggers.")));
	}
	else if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
	{
		if (stmt->timing != TRIGGER_TYPE_BEFORE &&
			stmt->timing != TRIGGER_TYPE_AFTER)
			ereport(ERROR,
					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
					 errmsg("\"%s\" is a foreign table",
							RelationGetRelationName(rel)),
					 errdetail("Foreign tables cannot have INSTEAD OF triggers.")));

		/*
		 * We disallow constraint triggers to protect the assumption that
		 * triggers on FKs can't be deferred.  See notes with AfterTriggers
		 * data structures, below.
		 */
		if (stmt->isconstraint)
			ereport(ERROR,
					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
					 errmsg("\"%s\" is a foreign table",
							RelationGetRelationName(rel)),
					 errdetail("Foreign tables cannot have constraint triggers.")));
	}
	else
		ereport(ERROR,
				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
				 errmsg("relation \"%s\" cannot have triggers",
						RelationGetRelationName(rel)),
				 errdetail_relkind_not_supported(rel->rd_rel->relkind)));

	if (!allowSystemTableMods && IsSystemRelation(rel))
		ereport(ERROR,
				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
				 errmsg("permission denied: \"%s\" is a system catalog",
						RelationGetRelationName(rel))));

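	/*
	 * For example, "CREATE TRIGGER ... INSTEAD OF INSERT ON plain_table ..."
	 * is rejected above with "Tables cannot have INSTEAD OF triggers.", and
	 * naming a relation of any other kind (one not handled by the if-chain)
	 * falls through to the "relation \"%s\" cannot have triggers" error.
	 */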
	if (stmt->isconstraint)
	{
		/*
		 * We must take a lock on the target relation to protect against
		 * concurrent drop.  It's not clear that AccessShareLock is strong
		 * enough, but we certainly need at least that much... otherwise, we
		 * might end up creating a pg_constraint entry referencing a
		 * nonexistent table.
		 */
		if (OidIsValid(refRelOid))
		{
			LockRelationOid(refRelOid, AccessShareLock);
			constrrelid = refRelOid;
		}
		else if (stmt->constrrel != NULL)
			constrrelid = RangeVarGetRelid(stmt->constrrel, AccessShareLock,
										   false);
	}

	/* permission checks */
	if (!isInternal)
	{
		aclresult = pg_class_aclcheck(RelationGetRelid(rel), GetUserId(),
									  ACL_TRIGGER);
		if (aclresult != ACLCHECK_OK)
			aclcheck_error(aclresult, get_relkind_objtype(rel->rd_rel->relkind),
						   RelationGetRelationName(rel));

		if (OidIsValid(constrrelid))
		{
			aclresult = pg_class_aclcheck(constrrelid, GetUserId(),
										  ACL_TRIGGER);
			if (aclresult != ACLCHECK_OK)
				aclcheck_error(aclresult, get_relkind_objtype(get_rel_relkind(constrrelid)),
							   get_rel_name(constrrelid));
		}
	}

	/*
	 * When called on a partitioned table to create a FOR EACH ROW trigger
	 * that's not internal, we create one trigger for each partition, too.
	 *
	 * For that, we'd better hold lock on all of them ahead of time.
	 */
	partition_recurse = !isInternal && stmt->row &&
		rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE;
	if (partition_recurse)
		list_free(find_all_inheritors(RelationGetRelid(rel),
									  ShareRowExclusiveLock, NULL));

	/* Compute tgtype */
	TRIGGER_CLEAR_TYPE(tgtype);
	if (stmt->row)
		TRIGGER_SETT_ROW(tgtype);
	tgtype |= stmt->timing;
	tgtype |= stmt->events;

	/* Disallow ROW-level TRUNCATE triggers */
	if (TRIGGER_FOR_ROW(tgtype) && TRIGGER_FOR_TRUNCATE(tgtype))
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("TRUNCATE FOR EACH ROW triggers are not supported")));

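	/*
	 * For example, "CREATE TRIGGER ... BEFORE INSERT OR UPDATE ON t FOR EACH
	 * ROW ..." produces a tgtype with the ROW, BEFORE, INSERT and UPDATE
	 * bits set (see the TRIGGER_TYPE_* bit definitions in
	 * catalog/pg_trigger.h).
	 */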
	/* INSTEAD triggers must be row-level, and can't have WHEN or columns */
	if (TRIGGER_FOR_INSTEAD(tgtype))
	{
		if (!TRIGGER_FOR_ROW(tgtype))
			ereport(ERROR,
					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
					 errmsg("INSTEAD OF triggers must be FOR EACH ROW")));
		if (stmt->whenClause)
			ereport(ERROR,
					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
					 errmsg("INSTEAD OF triggers cannot have WHEN conditions")));
		if (stmt->columns != NIL)
			ereport(ERROR,
					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
					 errmsg("INSTEAD OF triggers cannot have column lists")));
	}

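	/*
	 * For example, "CREATE TRIGGER ... INSTEAD OF INSERT ON v FOR EACH
	 * STATEMENT ..." is rejected by the checks above: an INSTEAD OF trigger
	 * must be FOR EACH ROW, with no WHEN condition and no column list.
	 */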
	/*
	 * We don't yet support naming ROW transition variables, but the parser
	 * recognizes the syntax so we can give a nicer message here.
	 *
	 * Per standard, REFERENCING TABLE names are only allowed on AFTER
	 * triggers.  Per standard, REFERENCING ROW names are not allowed with FOR
	 * EACH STATEMENT.  Per standard, each OLD/NEW, ROW/TABLE permutation is
	 * only allowed once.  Per standard, OLD may not be specified when
	 * creating a trigger only for INSERT, and NEW may not be specified when
	 * creating a trigger only for DELETE.
	 *
	 * Notice that the standard allows an AFTER ... FOR EACH ROW trigger to
	 * reference both ROW and TABLE transition data.
	 */
	if (stmt->transitionRels != NIL)
	{
		List	   *varList = stmt->transitionRels;
		ListCell   *lc;

		foreach(lc, varList)
		{
			TriggerTransition *tt = lfirst_node(TriggerTransition, lc);

			if (!(tt->isTable))
				ereport(ERROR,
						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
						 errmsg("ROW variable naming in the REFERENCING clause is not supported"),
						 errhint("Use OLD TABLE or NEW TABLE for naming transition tables.")));

			/*
			 * Because of the above test, we omit further ROW-related testing
			 * below.  If we later allow naming OLD and NEW ROW variables,
			 * adjustments will be needed below.
			 */

			if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
				ereport(ERROR,
						(errcode(ERRCODE_WRONG_OBJECT_TYPE),
						 errmsg("\"%s\" is a foreign table",
								RelationGetRelationName(rel)),
						 errdetail("Triggers on foreign tables cannot have transition tables.")));

			if (rel->rd_rel->relkind == RELKIND_VIEW)
				ereport(ERROR,
						(errcode(ERRCODE_WRONG_OBJECT_TYPE),
						 errmsg("\"%s\" is a view",
								RelationGetRelationName(rel)),
						 errdetail("Triggers on views cannot have transition tables.")));

			/*
			 * We currently don't allow row-level triggers with transition
			 * tables on partition or inheritance children.  Such triggers
			 * would somehow need to see tuples converted to the format of the
			 * table they're attached to, and it's not clear which subset of
			 * tuples each child should see.  See also the prohibitions in
			 * ATExecAttachPartition() and ATExecAddInherit().
			 */
			if (TRIGGER_FOR_ROW(tgtype) && has_superclass(rel->rd_id))
			{
				/* Use appropriate error message. */
				if (rel->rd_rel->relispartition)
					ereport(ERROR,
							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
							 errmsg("ROW triggers with transition tables are not supported on partitions")));
				else
					ereport(ERROR,
							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
							 errmsg("ROW triggers with transition tables are not supported on inheritance children")));
			}

			if (stmt->timing != TRIGGER_TYPE_AFTER)
				ereport(ERROR,
						(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
						 errmsg("transition table name can only be specified for an AFTER trigger")));

			if (TRIGGER_FOR_TRUNCATE(tgtype))
				ereport(ERROR,
						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
						 errmsg("TRUNCATE triggers with transition tables are not supported")));

			/*
			 * We currently don't allow multi-event triggers ("INSERT OR
			 * UPDATE") with transition tables, because it's not clear how to
			 * handle INSERT ... ON CONFLICT statements which can fire both
			 * INSERT and UPDATE triggers.  We show the inserted tuples to
			 * INSERT triggers and the updated tuples to UPDATE triggers, but
			 * it's not yet clear what INSERT OR UPDATE trigger should see.
			 * This restriction could be lifted if we can decide on the right
			 * semantics in a later release.
			 */
			if (((TRIGGER_FOR_INSERT(tgtype) ? 1 : 0) +
				 (TRIGGER_FOR_UPDATE(tgtype) ? 1 : 0) +
				 (TRIGGER_FOR_DELETE(tgtype) ? 1 : 0)) != 1)
				ereport(ERROR,
						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
						 errmsg("transition tables cannot be specified for triggers with more than one event")));

			/*
			 * We currently don't allow column-specific triggers with
			 * transition tables.  Per spec, that seems to require
			 * accumulating separate transition tables for each combination of
			 * columns, which is a lot of work for a rather marginal feature.
			 */
			if (stmt->columns != NIL)
				ereport(ERROR,
						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
						 errmsg("transition tables cannot be specified for triggers with column lists")));

			/*
			 * We disallow constraint triggers with transition tables, to
			 * protect the assumption that such triggers can't be deferred.
			 * See notes with AfterTriggers data structures, below.
			 *
			 * Currently this is enforced by the grammar, so just Assert here.
			 */
			Assert(!stmt->isconstraint);

			if (tt->isNew)
			{
				if (!(TRIGGER_FOR_INSERT(tgtype) ||
					  TRIGGER_FOR_UPDATE(tgtype)))
					ereport(ERROR,
							(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
							 errmsg("NEW TABLE can only be specified for an INSERT or UPDATE trigger")));

				if (newtablename != NULL)
					ereport(ERROR,
							(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
							 errmsg("NEW TABLE cannot be specified multiple times")));

				newtablename = tt->name;
			}
			else
			{
				if (!(TRIGGER_FOR_DELETE(tgtype) ||
					  TRIGGER_FOR_UPDATE(tgtype)))
					ereport(ERROR,
							(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
							 errmsg("OLD TABLE can only be specified for a DELETE or UPDATE trigger")));

				if (oldtablename != NULL)
					ereport(ERROR,
							(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
							 errmsg("OLD TABLE cannot be specified multiple times")));

				oldtablename = tt->name;
			}
		}

		if (newtablename != NULL && oldtablename != NULL &&
			strcmp(newtablename, oldtablename) == 0)
			ereport(ERROR,
					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
					 errmsg("OLD TABLE name and NEW TABLE name cannot be the same")));
	}

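	/*
	 * For example, "CREATE TRIGGER ... AFTER UPDATE ON t REFERENCING
	 * OLD TABLE AS ot NEW TABLE AS nt FOR EACH STATEMENT ..." passes all of
	 * the checks above, leaving oldtablename = "ot" and newtablename = "nt".
	 */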
    /*
     * Parse the WHEN clause, if any and we weren't passed an already
     * transformed one.
     *
     * Note that as a side effect, we fill whenRtable when parsing.  If we got
     * an already parsed clause, this does not occur, which is what we want --
     * no point in adding redundant dependencies below.
     */
    if (!whenClause && stmt->whenClause)
    {
        ParseState *pstate;
        ParseNamespaceItem *nsitem;
        List       *varList;
        ListCell   *lc;

        /* Set up a pstate to parse with */
        pstate = make_parsestate(NULL);
        pstate->p_sourcetext = queryString;

        /*
         * Set up nsitems for OLD and NEW references.
         *
         * 'OLD' must always have varno equal to 1 and 'NEW' equal to 2.
         */
        nsitem = addRangeTableEntryForRelation(pstate, rel,
                                               AccessShareLock,
                                               makeAlias("old", NIL),
                                               false, false);
        addNSItemToQuery(pstate, nsitem, false, true, true);
        nsitem = addRangeTableEntryForRelation(pstate, rel,
                                               AccessShareLock,
                                               makeAlias("new", NIL),
                                               false, false);
        addNSItemToQuery(pstate, nsitem, false, true, true);

        /* Transform expression.  Copy to be sure we don't modify original */
        whenClause = transformWhereClause(pstate,
                                          copyObject(stmt->whenClause),
                                          EXPR_KIND_TRIGGER_WHEN,
                                          "WHEN");
        /* we have to fix its collations too */
        assign_expr_collations(pstate, whenClause);

        /*
         * Check for disallowed references to OLD/NEW.
         *
         * NB: pull_var_clause is okay here only because we don't allow
         * subselects in WHEN clauses; it would fail to examine the contents
         * of subselects.
         */
        varList = pull_var_clause(whenClause, 0);
        foreach(lc, varList)
        {
            Var        *var = (Var *) lfirst(lc);

            switch (var->varno)
            {
                case PRS2_OLD_VARNO:
                    if (!TRIGGER_FOR_ROW(tgtype))
                        ereport(ERROR,
                                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
                                 errmsg("statement trigger's WHEN condition cannot reference column values"),
                                 parser_errposition(pstate, var->location)));
                    if (TRIGGER_FOR_INSERT(tgtype))
                        ereport(ERROR,
                                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
                                 errmsg("INSERT trigger's WHEN condition cannot reference OLD values"),
                                 parser_errposition(pstate, var->location)));
                    /* system columns are okay here */
                    break;
                case PRS2_NEW_VARNO:
                    if (!TRIGGER_FOR_ROW(tgtype))
                        ereport(ERROR,
                                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
                                 errmsg("statement trigger's WHEN condition cannot reference column values"),
                                 parser_errposition(pstate, var->location)));
                    if (TRIGGER_FOR_DELETE(tgtype))
                        ereport(ERROR,
                                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
                                 errmsg("DELETE trigger's WHEN condition cannot reference NEW values"),
                                 parser_errposition(pstate, var->location)));
                    if (var->varattno < 0 && TRIGGER_FOR_BEFORE(tgtype))
                        ereport(ERROR,
                                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                                 errmsg("BEFORE trigger's WHEN condition cannot reference NEW system columns"),
                                 parser_errposition(pstate, var->location)));
                    if (TRIGGER_FOR_BEFORE(tgtype) &&
                        var->varattno == 0 &&
                        RelationGetDescr(rel)->constr &&
                        RelationGetDescr(rel)->constr->has_generated_stored)
                        ereport(ERROR,
                                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
                                 errmsg("BEFORE trigger's WHEN condition cannot reference NEW generated columns"),
                                 errdetail("A whole-row reference is used and the table contains generated columns."),
                                 parser_errposition(pstate, var->location)));
                    if (TRIGGER_FOR_BEFORE(tgtype) &&
                        var->varattno > 0 &&
                        TupleDescAttr(RelationGetDescr(rel), var->varattno - 1)->attgenerated)
                        ereport(ERROR,
                                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
                                 errmsg("BEFORE trigger's WHEN condition cannot reference NEW generated columns"),
                                 errdetail("Column \"%s\" is a generated column.",
                                           NameStr(TupleDescAttr(RelationGetDescr(rel), var->varattno - 1)->attname)),
                                 parser_errposition(pstate, var->location)));
                    break;
                default:
                    /* can't happen without add_missing_from, so just elog */
                    elog(ERROR, "trigger WHEN condition cannot contain references to other relations");
                    break;
            }
        }

        /* we'll need the rtable for recordDependencyOnExpr */
        whenRtable = pstate->p_rtable;

        qual = nodeToString(whenClause);

        free_parsestate(pstate);
    }
    else if (!whenClause)
    {
        whenClause = NULL;
        whenRtable = NIL;
        qual = NULL;
    }
    else
    {
        qual = nodeToString(whenClause);
        whenRtable = NIL;
    }
    /*
     * Find and validate the trigger function.
     */
    if (!OidIsValid(funcoid))
        funcoid = LookupFuncName(stmt->funcname, 0, NULL, false);
    if (!isInternal)
    {
        aclresult = pg_proc_aclcheck(funcoid, GetUserId(), ACL_EXECUTE);
        if (aclresult != ACLCHECK_OK)
            aclcheck_error(aclresult, OBJECT_FUNCTION,
                           NameListToString(stmt->funcname));
    }
    funcrettype = get_func_rettype(funcoid);
    if (funcrettype != TRIGGEROID)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
                 errmsg("function %s must return type %s",
                        NameListToString(stmt->funcname), "trigger")));
    /*
     * Scan pg_trigger to see if there is already a trigger of the same name.
     * Skip this for internally generated triggers, since we'll modify the
     * name to be unique below.
     *
     * NOTE that this is cool only because we have ShareRowExclusiveLock on
     * the relation, so the trigger set won't be changing underneath us.
     */
    tgrel = table_open(TriggerRelationId, RowExclusiveLock);
    if (!isInternal)
    {
        ScanKeyData skeys[2];
        SysScanDesc tgscan;

        ScanKeyInit(&skeys[0],
                    Anum_pg_trigger_tgrelid,
                    BTEqualStrategyNumber, F_OIDEQ,
                    ObjectIdGetDatum(RelationGetRelid(rel)));

        ScanKeyInit(&skeys[1],
                    Anum_pg_trigger_tgname,
                    BTEqualStrategyNumber, F_NAMEEQ,
                    CStringGetDatum(stmt->trigname));

        tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
                                    NULL, 2, skeys);

        /* There should be at most one matching tuple */
        if (HeapTupleIsValid(tuple = systable_getnext(tgscan)))
        {
            Form_pg_trigger oldtrigger = (Form_pg_trigger) GETSTRUCT(tuple);

            trigoid = oldtrigger->oid;
            existing_constraint_oid = oldtrigger->tgconstraint;
            existing_isInternal = oldtrigger->tgisinternal;
            existing_isClone = OidIsValid(oldtrigger->tgparentid);
            trigger_exists = true;
            /* copy the tuple to use in CatalogTupleUpdate() */
            tuple = heap_copytuple(tuple);
        }
        systable_endscan(tgscan);
    }

    if (!trigger_exists)
    {
        /* Generate the OID for the new trigger. */
        trigoid = GetNewOidWithIndex(tgrel, TriggerOidIndexId,
                                     Anum_pg_trigger_oid);
    }
    else
    {
        /*
         * If OR REPLACE was specified, we'll replace the old trigger;
         * otherwise complain about the duplicate name.
         */
        if (!stmt->replace)
            ereport(ERROR,
                    (errcode(ERRCODE_DUPLICATE_OBJECT),
                     errmsg("trigger \"%s\" for relation \"%s\" already exists",
                            stmt->trigname, RelationGetRelationName(rel))));

        /*
         * An internal trigger or a child trigger (isClone) cannot be replaced
         * by a user-defined trigger.  However, skip this test when
         * in_partition, because then we're recursing from a partitioned table
         * and the check was made at the parent level.
         */
        if ((existing_isInternal || existing_isClone) &&
            !isInternal && !in_partition)
            ereport(ERROR,
                    (errcode(ERRCODE_DUPLICATE_OBJECT),
                     errmsg("trigger \"%s\" for relation \"%s\" is an internal or a child trigger",
                            stmt->trigname, RelationGetRelationName(rel))));

        /*
         * It is not allowed to replace with a constraint trigger; gram.y
         * should have enforced this already.
         */
        Assert(!stmt->isconstraint);

        /*
         * It is not allowed to replace an existing constraint trigger,
         * either.  (The reason for these restrictions is partly that it seems
         * difficult to deal with pending trigger events in such cases, and
         * partly that the command might imply changing the constraint's
         * properties as well, which doesn't seem nice.)
         */
        if (OidIsValid(existing_constraint_oid))
            ereport(ERROR,
                    (errcode(ERRCODE_DUPLICATE_OBJECT),
                     errmsg("trigger \"%s\" for relation \"%s\" is a constraint trigger",
                            stmt->trigname, RelationGetRelationName(rel))));
    }
    /*
     * If it's a user-entered CREATE CONSTRAINT TRIGGER command, make a
     * corresponding pg_constraint entry.
     */
    if (stmt->isconstraint && !OidIsValid(constraintOid))
    {
        /* Internal callers should have made their own constraints */
        Assert(!isInternal);
        constraintOid = CreateConstraintEntry(stmt->trigname,
                                              RelationGetNamespace(rel),
                                              CONSTRAINT_TRIGGER,
                                              stmt->deferrable,
                                              stmt->initdeferred,
                                              true,
                                              InvalidOid,   /* no parent */
                                              RelationGetRelid(rel),
                                              NULL, /* no conkey */
                                              0,
                                              0,
                                              InvalidOid,   /* no domain */
                                              InvalidOid,   /* no index */
                                              InvalidOid,   /* no foreign key */
                                              NULL,
                                              NULL,
                                              NULL,
                                              NULL,
                                              0,
                                              ' ',
                                              ' ',
                                              NULL,
                                              0,
                                              ' ',
                                              NULL, /* no exclusion */
                                              NULL, /* no check constraint */
                                              NULL,
                                              true, /* islocal */
                                              0,    /* inhcount */
                                              true, /* noinherit */
                                              isInternal);  /* is_internal */
    }
    /*
     * If trigger is internally generated, modify the provided trigger name to
     * ensure uniqueness by appending the trigger OID.  (Callers will usually
     * supply a simple constant trigger name in these cases.)
     */
    if (isInternal)
    {
        snprintf(internaltrigname, sizeof(internaltrigname),
                 "%s_%u", stmt->trigname, trigoid);
        trigname = internaltrigname;
    }
    else
    {
        /* user-defined trigger; use the specified trigger name as-is */
        trigname = stmt->trigname;
    }
    /*
     * Build the new pg_trigger tuple.
     *
     * When we're creating a trigger in a partition, we mark it as internal,
     * even though we don't do the isInternal magic in this function.  This
     * makes the triggers in partitions identical to the ones in the
     * partitioned tables, except that they are marked internal.
     */
    memset(nulls, false, sizeof(nulls));

    values[Anum_pg_trigger_oid - 1] = ObjectIdGetDatum(trigoid);
    values[Anum_pg_trigger_tgrelid - 1] = ObjectIdGetDatum(RelationGetRelid(rel));
    values[Anum_pg_trigger_tgparentid - 1] = ObjectIdGetDatum(parentTriggerOid);
    values[Anum_pg_trigger_tgname - 1] = DirectFunctionCall1(namein,
                                                             CStringGetDatum(trigname));
    values[Anum_pg_trigger_tgfoid - 1] = ObjectIdGetDatum(funcoid);
    values[Anum_pg_trigger_tgtype - 1] = Int16GetDatum(tgtype);
    values[Anum_pg_trigger_tgenabled - 1] = trigger_fires_when;
    values[Anum_pg_trigger_tgisinternal - 1] = BoolGetDatum(isInternal);
    values[Anum_pg_trigger_tgconstrrelid - 1] = ObjectIdGetDatum(constrrelid);
    values[Anum_pg_trigger_tgconstrindid - 1] = ObjectIdGetDatum(indexOid);
    values[Anum_pg_trigger_tgconstraint - 1] = ObjectIdGetDatum(constraintOid);
    values[Anum_pg_trigger_tgdeferrable - 1] = BoolGetDatum(stmt->deferrable);
    values[Anum_pg_trigger_tginitdeferred - 1] = BoolGetDatum(stmt->initdeferred);
    if (stmt->args)
    {
        ListCell   *le;
        char       *args;
        int16       nargs = list_length(stmt->args);
        int         len = 0;

        foreach(le, stmt->args)
        {
            char       *ar = strVal(lfirst(le));

            len += strlen(ar) + 4;
            for (; *ar; ar++)
            {
                if (*ar == '\\')
                    len++;
            }
        }
        args = (char *) palloc(len + 1);
        args[0] = '\0';
        foreach(le, stmt->args)
        {
            char       *s = strVal(lfirst(le));
            char       *d = args + strlen(args);

            while (*s)
            {
                if (*s == '\\')
                    *d++ = '\\';
                *d++ = *s++;
            }
            strcpy(d, "\\000");
        }
        values[Anum_pg_trigger_tgnargs - 1] = Int16GetDatum(nargs);
        values[Anum_pg_trigger_tgargs - 1] = DirectFunctionCall1(byteain,
                                                                 CStringGetDatum(args));
    }
    else
    {
        values[Anum_pg_trigger_tgnargs - 1] = Int16GetDatum(0);
        values[Anum_pg_trigger_tgargs - 1] = DirectFunctionCall1(byteain,
                                                                 CStringGetDatum(""));
    }
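The tgargs encoding above is two-pass: the first loop sizes the buffer (each argument's length plus four bytes for a literal "\000" separator, plus one byte per backslash to be doubled), and the second loop writes the escaped bytes. A standalone sketch of the same encoding (hedged: plain `malloc` stands in for `palloc`, and `encode_tgargs` is a hypothetical helper name):

```c
#include <stdlib.h>
#include <string.h>

/*
 * Encode trigger arguments the way tgargs is built above: backslashes are
 * doubled and each argument is terminated by a literal "\000" sequence of
 * four characters.  Caller frees the result.
 */
static char *
encode_tgargs(const char **args, int nargs)
{
    int     len = 0;
    char   *buf;
    char   *d;

    /* First pass: compute the exact encoded length */
    for (int i = 0; i < nargs; i++)
    {
        len += (int) strlen(args[i]) + 4;
        for (const char *p = args[i]; *p; p++)
            if (*p == '\\')
                len++;
    }

    buf = malloc(len + 1);
    d = buf;

    /* Second pass: write escaped bytes plus the "\000" terminators */
    for (int i = 0; i < nargs; i++)
    {
        for (const char *s = args[i]; *s; s++)
        {
            if (*s == '\\')
                *d++ = '\\';
            *d++ = *s;
        }
        memcpy(d, "\\000", 4);
        d += 4;
    }
    *d = '\0';
    return buf;
}
```

Writing into a pre-sized buffer avoids the repeated `strlen(args)` rescans the backend version does, while producing the same byte sequence.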
    /* build column number array if it's a column-specific trigger */
    ncolumns = list_length(stmt->columns);
    if (ncolumns == 0)
        columns = NULL;
    else
    {
        ListCell   *cell;
        int         i = 0;

        columns = (int16 *) palloc(ncolumns * sizeof(int16));
        foreach(cell, stmt->columns)
        {
            char       *name = strVal(lfirst(cell));
            int16       attnum;
            int         j;

            /* Lookup column name.  System columns are not allowed */
            attnum = attnameAttNum(rel, name, false);
            if (attnum == InvalidAttrNumber)
                ereport(ERROR,
                        (errcode(ERRCODE_UNDEFINED_COLUMN),
                         errmsg("column \"%s\" of relation \"%s\" does not exist",
                                name, RelationGetRelationName(rel))));

            /* Check for duplicates */
            for (j = i - 1; j >= 0; j--)
            {
                if (columns[j] == attnum)
                    ereport(ERROR,
                            (errcode(ERRCODE_DUPLICATE_COLUMN),
                             errmsg("column \"%s\" specified more than once",
                                    name)));
            }

            columns[i++] = attnum;
        }
    }
    tgattr = buildint2vector(columns, ncolumns);
    values[Anum_pg_trigger_tgattr - 1] = PointerGetDatum(tgattr);
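The duplicate check above is a simple backward scan over the attnums collected so far, quadratic in the column list but cheap since UPDATE OF lists are short. A minimal sketch (hedged: `first_duplicate_index` is a hypothetical helper, and `short` stands in for `int16`):

```c
/*
 * Return the index of the first entry that repeats an earlier attnum,
 * scanning backwards from each position as the loop above does,
 * or -1 if all entries are distinct.
 */
static int
first_duplicate_index(const short *columns, int ncolumns)
{
    for (int i = 1; i < ncolumns; i++)
        for (int j = i - 1; j >= 0; j--)
            if (columns[j] == columns[i])
                return i;
    return -1;
}
```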
    /* set tgqual if trigger has WHEN clause */
    if (qual)
        values[Anum_pg_trigger_tgqual - 1] = CStringGetTextDatum(qual);
    else
        nulls[Anum_pg_trigger_tgqual - 1] = true;

    if (oldtablename)
        values[Anum_pg_trigger_tgoldtable - 1] = DirectFunctionCall1(namein,
                                                                     CStringGetDatum(oldtablename));
    else
        nulls[Anum_pg_trigger_tgoldtable - 1] = true;
    if (newtablename)
        values[Anum_pg_trigger_tgnewtable - 1] = DirectFunctionCall1(namein,
                                                                     CStringGetDatum(newtablename));
    else
        nulls[Anum_pg_trigger_tgnewtable - 1] = true;
    /*
     * Insert or replace tuple in pg_trigger.
     */
    if (!trigger_exists)
    {
        tuple = heap_form_tuple(tgrel->rd_att, values, nulls);
        CatalogTupleInsert(tgrel, tuple);
    }
    else
    {
        HeapTuple   newtup;

        newtup = heap_form_tuple(tgrel->rd_att, values, nulls);
        CatalogTupleUpdate(tgrel, &tuple->t_self, newtup);
        heap_freetuple(newtup);
    }

    heap_freetuple(tuple);      /* free either original or new tuple */
    table_close(tgrel, RowExclusiveLock);
    pfree(DatumGetPointer(values[Anum_pg_trigger_tgname - 1]));
    pfree(DatumGetPointer(values[Anum_pg_trigger_tgargs - 1]));
    pfree(DatumGetPointer(values[Anum_pg_trigger_tgattr - 1]));
    if (oldtablename)
        pfree(DatumGetPointer(values[Anum_pg_trigger_tgoldtable - 1]));
    if (newtablename)
        pfree(DatumGetPointer(values[Anum_pg_trigger_tgnewtable - 1]));
    /*
     * Update relation's pg_class entry if necessary; and if not, send an SI
     * message to make other backends (and this one) rebuild relcache entries.
     */
    pgrel = table_open(RelationRelationId, RowExclusiveLock);
    tuple = SearchSysCacheCopy1(RELOID,
                                ObjectIdGetDatum(RelationGetRelid(rel)));
    if (!HeapTupleIsValid(tuple))
        elog(ERROR, "cache lookup failed for relation %u",
             RelationGetRelid(rel));
    if (!((Form_pg_class) GETSTRUCT(tuple))->relhastriggers)
    {
        ((Form_pg_class) GETSTRUCT(tuple))->relhastriggers = true;

        CatalogTupleUpdate(pgrel, &tuple->t_self, tuple);

        CommandCounterIncrement();
    }
    else
        CacheInvalidateRelcacheByTuple(tuple);

    heap_freetuple(tuple);
    table_close(pgrel, RowExclusiveLock);
    /*
     * If we're replacing a trigger, flush all the old dependencies before
     * recording new ones.
     */
    if (trigger_exists)
        deleteDependencyRecordsFor(TriggerRelationId, trigoid, true);
2002-07-12 20:43:19 +02:00
|
|
|
/*
|
|
|
|
* Record dependencies for trigger. Always place a normal dependency on
|
2007-02-14 02:58:58 +01:00
|
|
|
* the function.
|
2002-07-12 20:43:19 +02:00
|
|
|
*/
|
2007-02-14 02:58:58 +01:00
|
|
|
myself.classId = TriggerRelationId;
|
|
|
|
myself.objectId = trigoid;
|
|
|
|
myself.objectSubId = 0;
|
|
|
|
|
2005-04-14 03:38:22 +02:00
|
|
|
referenced.classId = ProcedureRelationId;
|
2002-07-12 20:43:19 +02:00
|
|
|
referenced.objectId = funcoid;
|
|
|
|
referenced.objectSubId = 0;
|
|
|
|
recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
|
|
|
|
|

	if (isInternal && OidIsValid(constraintOid))
	{
		/*
		 * Internally-generated trigger for a constraint, so make it an
		 * internal dependency of the constraint.  We can skip depending on
		 * the relation(s), as there'll be an indirect dependency via the
		 * constraint.
		 */
		referenced.classId = ConstraintRelationId;
		referenced.objectId = constraintOid;
		referenced.objectSubId = 0;
		recordDependencyOn(&myself, &referenced, DEPENDENCY_INTERNAL);
	}
	else
	{
		/*
		 * User CREATE TRIGGER, so place dependencies.  We make trigger be
		 * auto-dropped if its relation is dropped or if the FK relation is
		 * dropped.  (Auto drop is compatible with our pre-7.3 behavior.)
		 */
		referenced.classId = RelationRelationId;
		referenced.objectId = RelationGetRelid(rel);
		referenced.objectSubId = 0;
		recordDependencyOn(&myself, &referenced, DEPENDENCY_AUTO);

		if (OidIsValid(constrrelid))
		{
			referenced.classId = RelationRelationId;
			referenced.objectId = constrrelid;
			referenced.objectSubId = 0;
			recordDependencyOn(&myself, &referenced, DEPENDENCY_AUTO);
		}
		/* Not possible to have an index dependency in this case */
		Assert(!OidIsValid(indexOid));

		/*
		 * If it's a user-specified constraint trigger, make the constraint
		 * internally dependent on the trigger instead of vice versa.
		 */
		if (OidIsValid(constraintOid))
		{
			referenced.classId = ConstraintRelationId;
			referenced.objectId = constraintOid;
			referenced.objectSubId = 0;
			recordDependencyOn(&referenced, &myself, DEPENDENCY_INTERNAL);
		}

		/*
		 * If it's a partition trigger, create the partition dependencies.
		 */
		if (OidIsValid(parentTriggerOid))
		{
			ObjectAddressSet(referenced, TriggerRelationId, parentTriggerOid);
			recordDependencyOn(&myself, &referenced, DEPENDENCY_PARTITION_PRI);
			ObjectAddressSet(referenced, RelationRelationId, RelationGetRelid(rel));
			recordDependencyOn(&myself, &referenced, DEPENDENCY_PARTITION_SEC);
		}
	}

	/* If column-specific trigger, add normal dependencies on columns */
	if (columns != NULL)
	{
		int			i;

		referenced.classId = RelationRelationId;
		referenced.objectId = RelationGetRelid(rel);
		for (i = 0; i < ncolumns; i++)
		{
			referenced.objectSubId = columns[i];
			recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
		}
	}

	/*
	 * If it has a WHEN clause, add dependencies on objects mentioned in the
	 * expression (eg, functions, as well as any columns used).
	 */
	if (whenRtable != NIL)
		recordDependencyOnExpr(&myself, whenClause, whenRtable,
							   DEPENDENCY_NORMAL);

	/* Post creation hook for new trigger */
	InvokeObjectPostCreateHookArg(TriggerRelationId, trigoid, 0,
								  isInternal);

	/*
	 * Lastly, create the trigger on child relations, if needed.
	 */
	if (partition_recurse)
	{
		PartitionDesc partdesc = RelationGetPartitionDesc(rel, true);
		int			i;
		MemoryContext oldcxt,
					perChildCxt;

		perChildCxt = AllocSetContextCreate(CurrentMemoryContext,
											"part trig clone",
											ALLOCSET_SMALL_SIZES);

		/*
		 * We don't currently expect to be called with a valid indexOid.  If
		 * that ever changes then we'll need to write code here to find the
		 * corresponding child index.
		 */
		Assert(!OidIsValid(indexOid));

		oldcxt = MemoryContextSwitchTo(perChildCxt);

		/* Iterate to create the trigger on each existing partition */
		for (i = 0; i < partdesc->nparts; i++)
		{
			CreateTrigStmt *childStmt;
			Relation	childTbl;
			Node	   *qual;

			childTbl = table_open(partdesc->oids[i], ShareRowExclusiveLock);

			/*
			 * Initialize our fabricated parse node by copying the original
			 * one, then resetting fields that we pass separately.
			 */
			childStmt = (CreateTrigStmt *) copyObject(stmt);
			childStmt->funcname = NIL;
			childStmt->whenClause = NULL;

			/* If there is a WHEN clause, create a modified copy of it */
			qual = copyObject(whenClause);
			qual = (Node *)
				map_partition_varattnos((List *) qual, PRS2_OLD_VARNO,
										childTbl, rel);
			qual = (Node *)
				map_partition_varattnos((List *) qual, PRS2_NEW_VARNO,
										childTbl, rel);

			CreateTriggerFiringOn(childStmt, queryString,
								  partdesc->oids[i], refRelOid,
								  InvalidOid, InvalidOid,
								  funcoid, trigoid, qual,
								  isInternal, true, trigger_fires_when);

			table_close(childTbl, NoLock);

			MemoryContextReset(perChildCxt);
		}

		MemoryContextSwitchTo(oldcxt);
		MemoryContextDelete(perChildCxt);
	}

	/* Keep lock on target rel until end of xact */
	table_close(rel, NoLock);

	return myself;
}

/*
 * TriggerSetParentTrigger
 *		Set a partition's trigger as child of its parent trigger,
 *		or remove the linkage if parentTrigId is InvalidOid.
 *
 * This updates the trigger's pg_trigger row to show it as inherited, and
 * adds PARTITION dependencies to prevent the trigger from being deleted
 * on its own.  Alternatively, reverse that.
 */
void
TriggerSetParentTrigger(Relation trigRel,
						Oid childTrigId,
						Oid parentTrigId,
						Oid childTableId)
{
	SysScanDesc tgscan;
	ScanKeyData skey[1];
	Form_pg_trigger trigForm;
	HeapTuple	tuple,
				newtup;
	ObjectAddress depender;
	ObjectAddress referenced;

	/*
	 * Find the trigger to modify.
	 */
	ScanKeyInit(&skey[0],
				Anum_pg_trigger_oid,
				BTEqualStrategyNumber, F_OIDEQ,
				ObjectIdGetDatum(childTrigId));

	tgscan = systable_beginscan(trigRel, TriggerOidIndexId, true,
								NULL, 1, skey);

	tuple = systable_getnext(tgscan);
	if (!HeapTupleIsValid(tuple))
		elog(ERROR, "could not find tuple for trigger %u", childTrigId);
	newtup = heap_copytuple(tuple);
	trigForm = (Form_pg_trigger) GETSTRUCT(newtup);
	if (OidIsValid(parentTrigId))
	{
		/* don't allow setting parent for a trigger that already has one */
		if (OidIsValid(trigForm->tgparentid))
			elog(ERROR, "trigger %u already has a parent trigger",
				 childTrigId);

		trigForm->tgparentid = parentTrigId;

		CatalogTupleUpdate(trigRel, &tuple->t_self, newtup);

		ObjectAddressSet(depender, TriggerRelationId, childTrigId);

		ObjectAddressSet(referenced, TriggerRelationId, parentTrigId);
		recordDependencyOn(&depender, &referenced, DEPENDENCY_PARTITION_PRI);

		ObjectAddressSet(referenced, RelationRelationId, childTableId);
		recordDependencyOn(&depender, &referenced, DEPENDENCY_PARTITION_SEC);
	}
	else
	{
		trigForm->tgparentid = InvalidOid;

		CatalogTupleUpdate(trigRel, &tuple->t_self, newtup);

		deleteDependencyRecordsForClass(TriggerRelationId, childTrigId,
										TriggerRelationId,
										DEPENDENCY_PARTITION_PRI);
		deleteDependencyRecordsForClass(TriggerRelationId, childTrigId,
										RelationRelationId,
										DEPENDENCY_PARTITION_SEC);
	}

	heap_freetuple(newtup);
	systable_endscan(tgscan);
}

/*
 * Guts of trigger deletion.
 */
void
RemoveTriggerById(Oid trigOid)
{
	Relation	tgrel;
	SysScanDesc tgscan;
	ScanKeyData skey[1];
	HeapTuple	tup;
	Oid			relid;
	Relation	rel;

	tgrel = table_open(TriggerRelationId, RowExclusiveLock);

	/*
	 * Find the trigger to delete.
	 */
	ScanKeyInit(&skey[0],
				Anum_pg_trigger_oid,
				BTEqualStrategyNumber, F_OIDEQ,
				ObjectIdGetDatum(trigOid));

	tgscan = systable_beginscan(tgrel, TriggerOidIndexId, true,
								NULL, 1, skey);

	tup = systable_getnext(tgscan);
	if (!HeapTupleIsValid(tup))
		elog(ERROR, "could not find tuple for trigger %u", trigOid);

	/*
	 * Open and exclusive-lock the relation the trigger belongs to.
	 */
	relid = ((Form_pg_trigger) GETSTRUCT(tup))->tgrelid;

	rel = table_open(relid, AccessExclusiveLock);

	if (rel->rd_rel->relkind != RELKIND_RELATION &&
		rel->rd_rel->relkind != RELKIND_VIEW &&
		rel->rd_rel->relkind != RELKIND_FOREIGN_TABLE &&
		rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
		ereport(ERROR,
				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
				 errmsg("relation \"%s\" cannot have triggers",
						RelationGetRelationName(rel)),
				 errdetail_relkind_not_supported(rel->rd_rel->relkind)));

	if (!allowSystemTableMods && IsSystemRelation(rel))
		ereport(ERROR,
				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
				 errmsg("permission denied: \"%s\" is a system catalog",
						RelationGetRelationName(rel))));

	/*
	 * Delete the pg_trigger tuple.
	 */
	CatalogTupleDelete(tgrel, &tup->t_self);

	systable_endscan(tgscan);
	table_close(tgrel, RowExclusiveLock);

	/*
	 * We do not bother to try to determine whether any other triggers remain,
	 * which would be needed in order to decide whether it's safe to clear the
	 * relation's relhastriggers.  (In any case, there might be a concurrent
	 * process adding new triggers.)  Instead, just force a relcache inval to
	 * make other backends (and this one too!) rebuild their relcache entries.
	 * There's no great harm in leaving relhastriggers true even if there are
	 * no triggers left.
|
|
|
*/
|
2008-11-09 22:24:33 +01:00
|
|
|
CacheInvalidateRelcache(rel);
|
2000-04-12 19:17:23 +02:00
|
|
|
|
2002-07-12 20:43:19 +02:00
|
|
|
/* Keep lock on trigger's rel until end of xact */
|
2019-01-21 19:32:19 +01:00
|
|
|
table_close(rel, NoLock);
|
1997-08-31 13:40:13 +02:00
|
|
|
}
|
1997-09-01 09:59:06 +02:00
|
|
|
|
2010-08-05 17:25:36 +02:00
|
|
|
/*
|
|
|
|
* get_trigger_oid - Look up a trigger by name to find its OID.
|
|
|
|
*
|
|
|
|
* If missing_ok is false, throw an error if trigger not found. If
|
|
|
|
* true, just return InvalidOid.
|
|
|
|
*/
|
|
|
|
Oid
|
|
|
|
get_trigger_oid(Oid relid, const char *trigname, bool missing_ok)
|
|
|
|
{
|
|
|
|
Relation tgrel;
|
|
|
|
ScanKeyData skey[2];
|
|
|
|
SysScanDesc tgscan;
|
|
|
|
HeapTuple tup;
|
|
|
|
Oid oid;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Find the trigger, verify permissions, set up object address
|
|
|
|
*/
|
2019-01-21 19:32:19 +01:00
|
|
|
tgrel = table_open(TriggerRelationId, AccessShareLock);
|
2010-08-05 17:25:36 +02:00
|
|
|
|
|
|
|
ScanKeyInit(&skey[0],
|
|
|
|
Anum_pg_trigger_tgrelid,
|
|
|
|
BTEqualStrategyNumber, F_OIDEQ,
|
|
|
|
ObjectIdGetDatum(relid));
|
|
|
|
ScanKeyInit(&skey[1],
|
|
|
|
Anum_pg_trigger_tgname,
|
|
|
|
BTEqualStrategyNumber, F_NAMEEQ,
|
|
|
|
CStringGetDatum(trigname));
|
|
|
|
|
|
|
|
tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
|
Use an MVCC snapshot, rather than SnapshotNow, for catalog scans.
SnapshotNow scans have the undesirable property that, in the face of
concurrent updates, the scan can fail to see either the old or the new
versions of the row. In many cases, we work around this by requiring
DDL operations to hold AccessExclusiveLock on the object being
modified; in some cases, the existing locking is inadequate and random
failures occur as a result. This commit doesn't change anything
related to locking, but will hopefully pave the way to allowing lock
strength reductions in the future.
The major issue that has held us back from making this change in the past
is that taking an MVCC snapshot is significantly more expensive than
using a static special snapshot such as SnapshotNow. However, testing
of various worst-case scenarios reveals that this problem is not
severe except under fairly extreme workloads. To mitigate those
problems, we avoid retaking the MVCC snapshot for each new scan;
instead, we take a new snapshot only when invalidation messages have
been processed. The catcache machinery already requires that
invalidation messages be sent before releasing the related heavyweight
lock; else other backends might rely on locally-cached data rather
than scanning the catalog at all. Thus, making snapshot reuse
dependent on the same guarantees shouldn't break anything that wasn't
already subtly broken.
Patch by me. Review by Michael Paquier and Andres Freund.
2013-07-02 15:47:01 +02:00
|
|
|
NULL, 2, skey);
|
2010-08-05 17:25:36 +02:00
|
|
|
|
|
|
|
tup = systable_getnext(tgscan);
|
|
|
|
|
|
|
|
if (!HeapTupleIsValid(tup))
|
|
|
|
{
|
|
|
|
if (!missing_ok)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_UNDEFINED_OBJECT),
|
|
|
|
errmsg("trigger \"%s\" for table \"%s\" does not exist",
|
|
|
|
trigname, get_rel_name(relid))));
|
|
|
|
oid = InvalidOid;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
- restoring a pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used); only oids assigned later will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide the oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to merge this
now. It's painful to maintain externally, too complicated to commit
after the code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
|
|
|
oid = ((Form_pg_trigger) GETSTRUCT(tup))->oid;
|
2010-08-05 17:25:36 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
systable_endscan(tgscan);
|
2019-01-21 19:32:19 +01:00
|
|
|
table_close(tgrel, AccessShareLock);
|
2010-08-05 17:25:36 +02:00
|
|
|
return oid;
|
|
|
|
}
|
|
|
|
|
Improve behavior of concurrent rename statements.
Previously, renaming a table, sequence, view, index, foreign table,
column, or trigger checked permissions before locking the object, which
meant that if permissions were revoked during the lock wait, we would
still allow the operation. Similarly, if the original object is dropped
and a new one with the same name is created, the operation will be allowed
if we had permissions on the old object; the permissions on the new
object don't matter. All this is now fixed.
Along the way, attempting to rename a trigger on a foreign table now gives
the same error message as trying to create one there in the first place
(i.e. that it's not a table or view) rather than simply stating that no
trigger by that name exists.
Patch by me; review by Noah Misch.
2011-12-16 00:51:46 +01:00
|
|
|
/*
|
|
|
|
* Perform permissions and integrity checks before acquiring a relation lock.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
RangeVarCallbackForRenameTrigger(const RangeVar *rv, Oid relid, Oid oldrelid,
|
|
|
|
void *arg)
|
|
|
|
{
|
|
|
|
HeapTuple tuple;
|
|
|
|
Form_pg_class form;
|
|
|
|
|
|
|
|
tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
|
|
|
|
if (!HeapTupleIsValid(tuple))
|
|
|
|
return; /* concurrently dropped */
|
|
|
|
form = (Form_pg_class) GETSTRUCT(tuple);
|
|
|
|
|
|
|
|
/* only tables and views can have triggers */
|
2014-03-23 07:16:34 +01:00
|
|
|
if (form->relkind != RELKIND_RELATION && form->relkind != RELKIND_VIEW &&
|
Implement table partitioning.
Table partitioning is like table inheritance and reuses much of the
existing infrastructure, but there are some important differences.
The parent is called a partitioned table and is always empty; it may
not have indexes or non-inherited constraints, since those make no
sense for a relation with no data of its own. The children are called
partitions and contain all of the actual data. Each partition has an
implicit partitioning constraint. Multiple inheritance is not
allowed, and partitioning and inheritance can't be mixed. Partitions
can't have extra columns and may not allow nulls unless the parent
does. Tuples inserted into the parent are automatically routed to the
correct partition, so tuple-routing ON INSERT triggers are not needed.
2016-12-07 19:17:43 +01:00
|
|
|
form->relkind != RELKIND_FOREIGN_TABLE &&
|
|
|
|
form->relkind != RELKIND_PARTITIONED_TABLE)
|
2011-12-16 00:51:46 +01:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
|
2021-07-08 09:38:52 +02:00
|
|
|
errmsg("relation \"%s\" cannot have triggers",
|
|
|
|
rv->relname),
|
|
|
|
errdetail_relkind_not_supported(form->relkind)));
|
2011-12-16 00:51:46 +01:00
|
|
|
|
|
|
|
/* you must own the table to rename one of its triggers */
|
2022-11-13 08:11:17 +01:00
|
|
|
if (!object_ownercheck(RelationRelationId, relid, GetUserId()))
|
2017-12-02 15:26:34 +01:00
|
|
|
aclcheck_error(ACLCHECK_NOT_OWNER, get_relkind_objtype(get_rel_relkind(relid)), rv->relname);
|
Refine our definition of what constitutes a system relation.
Although user-defined relations can't be directly created in
pg_catalog, it's possible for them to end up there, because you can
create them in some other schema and then use ALTER TABLE .. SET SCHEMA
to move them there. Previously, such relations couldn't afterwards
be manipulated, because IsSystemRelation()/IsSystemClass() rejected
all attempts to modify objects in the pg_catalog schema, regardless
of their origin. With this patch, they now reject only those
objects in pg_catalog which were created at initdb-time, allowing
most operations on user-created tables in pg_catalog to proceed
normally.
This patch also adds new functions IsCatalogRelation() and
IsCatalogClass(), which is similar to IsSystemRelation() and
IsSystemClass() but with a slightly narrower definition: only TOAST
tables of system catalogs are included, rather than *all* TOAST tables.
This is currently used only for making decisions about when
invalidation messages need to be sent, but upcoming logical decoding
patches will find other uses for this information.
Andres Freund, with some modifications by me.
2013-11-29 02:57:20 +01:00
|
|
|
if (!allowSystemTableMods && IsSystemClass(relid, form))
|
2011-12-16 00:51:46 +01:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
|
|
|
|
errmsg("permission denied: \"%s\" is a system catalog",
|
|
|
|
rv->relname)));
|
|
|
|
|
|
|
|
ReleaseSysCache(tuple);
|
|
|
|
}
|
|
|
|
|
2002-04-26 21:29:47 +02:00
|
|
|
/*
|
|
|
|
* renametrig - changes the name of a trigger on a relation
|
|
|
|
*
|
|
|
|
* trigger name is changed in trigger catalog.
|
|
|
|
* No record of the previous name is kept.
|
|
|
|
*
|
|
|
|
* get proper relrelation from relation catalog (if not arg)
|
|
|
|
* scan trigger catalog
|
|
|
|
* for name conflict (within rel)
|
|
|
|
* for original trigger (if not arg)
|
|
|
|
* modify tgname in trigger tuple
|
2002-04-30 03:24:57 +02:00
|
|
|
* update row in catalog
|
2002-04-26 21:29:47 +02:00
|
|
|
*/
|
Change many routines to return ObjectAddress rather than OID
The changed routines are mostly those that can be directly called by
ProcessUtilitySlow; the intention is to make the affected object
information more precise, in support for future event trigger changes.
Originally it was envisioned that the OID of the affected object would
be enough, and in most cases that is correct, but upon actually
implementing the event trigger changes it turned out that ObjectAddress
is more widely useful.
Additionally, some command execution routines grew an output argument
that's an object address which provides further info about the executed
command. To wit:
* for ALTER DOMAIN / ADD CONSTRAINT, it corresponds to the address of
the new constraint
* for ALTER OBJECT / SET SCHEMA, it corresponds to the address of the
schema that originally contained the object.
* for ALTER EXTENSION {ADD, DROP} OBJECT, it corresponds to the address
of the object added to or dropped from the extension.
There's no user-visible change in this commit, and no functional change
either.
Discussion: 20150218213255.GC6717@tamriel.snowman.net
Reviewed-By: Stephen Frost, Andres Freund
2015-03-03 18:10:50 +01:00
|
|
|
ObjectAddress
|
2011-12-16 00:51:46 +01:00
|
|
|
renametrig(RenameStmt *stmt)
|
2002-04-26 21:29:47 +02:00
|
|
|
{
|
2012-12-24 00:25:03 +01:00
|
|
|
Oid tgoid;
|
2002-04-26 21:29:47 +02:00
|
|
|
Relation targetrel;
|
|
|
|
Relation tgrel;
|
|
|
|
HeapTuple tuple;
|
|
|
|
SysScanDesc tgscan;
|
2002-04-30 03:24:57 +02:00
|
|
|
ScanKeyData key[2];
|
2011-12-16 00:51:46 +01:00
|
|
|
Oid relid;
|
2015-03-03 18:10:50 +01:00
|
|
|
ObjectAddress address;
|
2002-04-26 21:29:47 +02:00
|
|
|
|
|
|
|
/*
|
2011-12-16 00:51:46 +01:00
|
|
|
* Look up name, check permissions, and acquire lock (which we will NOT
|
|
|
|
* release until end of transaction).
|
2002-04-26 21:29:47 +02:00
|
|
|
*/
|
2011-12-16 00:51:46 +01:00
|
|
|
relid = RangeVarGetRelidExtended(stmt->relation, AccessExclusiveLock,
|
2018-03-31 01:33:42 +02:00
|
|
|
0,
|
2011-12-16 00:51:46 +01:00
|
|
|
RangeVarCallbackForRenameTrigger,
|
|
|
|
NULL);
|
|
|
|
|
|
|
|
/* Have lock already, so just need to build relcache entry. */
|
|
|
|
targetrel = relation_open(relid, NoLock);
|
2002-04-26 21:29:47 +02:00
|
|
|
|
|
|
|
/*
|
2021-07-23 00:33:47 +02:00
|
|
|
* On partitioned tables, this operation recurses to partitions. Lock all
|
|
|
|
* tables upfront.
|
2002-04-26 21:29:47 +02:00
|
|
|
*/
|
2021-07-23 00:33:47 +02:00
|
|
|
if (targetrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
|
|
|
|
(void) find_all_inheritors(relid, AccessExclusiveLock, NULL);
|
2002-04-26 21:29:47 +02:00
|
|
|
|
2021-07-23 00:33:47 +02:00
|
|
|
tgrel = table_open(TriggerRelationId, RowExclusiveLock);
|
2002-04-26 21:29:47 +02:00
|
|
|
|
|
|
|
/*
|
2021-07-23 00:33:47 +02:00
|
|
|
* Search for the trigger to modify.
|
2002-04-26 21:29:47 +02:00
|
|
|
*/
|
2003-11-12 22:15:59 +01:00
|
|
|
ScanKeyInit(&key[0],
|
|
|
|
Anum_pg_trigger_tgrelid,
|
|
|
|
BTEqualStrategyNumber, F_OIDEQ,
|
|
|
|
ObjectIdGetDatum(relid));
|
|
|
|
ScanKeyInit(&key[1],
|
|
|
|
Anum_pg_trigger_tgname,
|
|
|
|
BTEqualStrategyNumber, F_NAMEEQ,
|
2011-12-16 00:51:46 +01:00
|
|
|
PointerGetDatum(stmt->subname));
|
2005-04-14 22:03:27 +02:00
|
|
|
tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
|
2013-07-02 15:47:01 +02:00
|
|
|
NULL, 2, key);
|
2002-04-30 03:24:57 +02:00
|
|
|
if (HeapTupleIsValid(tuple = systable_getnext(tgscan)))
|
2002-04-26 21:29:47 +02:00
|
|
|
{
|
2019-01-21 18:12:31 +01:00
|
|
|
Form_pg_trigger trigform;
|
2013-05-29 22:58:43 +02:00
|
|
|
|
2019-01-21 18:12:31 +01:00
|
|
|
trigform = (Form_pg_trigger) GETSTRUCT(tuple);
|
|
|
|
tgoid = trigform->oid;
|
2002-04-26 21:29:47 +02:00
|
|
|
|
2021-07-23 00:33:47 +02:00
|
|
|
/*
|
|
|
|
* If the trigger descends from a trigger on a parent partitioned
|
|
|
|
* table, reject the rename. We don't allow a trigger in a partition
|
|
|
|
* to differ in name from that of its parent: that would lead to an
|
|
|
|
* inconsistency that pg_dump would not reproduce.
|
|
|
|
*/
|
|
|
|
if (OidIsValid(trigform->tgparentid))
|
|
|
|
ereport(ERROR,
|
|
|
|
errmsg("cannot rename trigger \"%s\" on table \"%s\"",
|
|
|
|
stmt->subname, RelationGetRelationName(targetrel)),
|
2022-09-25 00:38:35 +02:00
|
|
|
errhint("Rename the trigger on the partitioned table \"%s\" instead.",
|
2021-07-23 00:33:47 +02:00
|
|
|
get_rel_name(get_partition_parent(relid, false))));
|
2002-04-26 21:29:47 +02:00
|
|
|
|
|
|
|
|
2021-07-23 00:33:47 +02:00
|
|
|
/* Rename the trigger on this relation ... */
|
|
|
|
renametrig_internal(tgrel, targetrel, tuple, stmt->newname,
|
|
|
|
stmt->subname);
|
2013-03-18 03:55:14 +01:00
|
|
|
|
2021-07-23 00:33:47 +02:00
|
|
|
/* ... and if it is partitioned, recurse to its partitions */
|
|
|
|
if (targetrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
|
|
|
|
{
|
|
|
|
PartitionDesc partdesc = RelationGetPartitionDesc(targetrel, true);
|
|
|
|
|
|
|
|
for (int i = 0; i < partdesc->nparts; i++)
|
|
|
|
{
|
|
|
|
Oid partitionId = partdesc->oids[i];
|
|
|
|
|
|
|
|
renametrig_partition(tgrel, partitionId, trigform->oid,
|
|
|
|
stmt->newname, stmt->subname);
|
|
|
|
}
|
|
|
|
}
|
2002-04-30 03:24:57 +02:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
2003-07-20 23:56:35 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_UNDEFINED_OBJECT),
|
2003-09-25 08:58:07 +02:00
|
|
|
errmsg("trigger \"%s\" for table \"%s\" does not exist",
|
2011-12-16 00:51:46 +01:00
|
|
|
stmt->subname, RelationGetRelationName(targetrel))));
|
2002-04-30 03:24:57 +02:00
|
|
|
}
|
|
|
|
|
Change many routines to return ObjectAddress rather than OID
The changed routines are mostly those that can be directly called by
ProcessUtilitySlow; the intention is to make the affected object
information more precise, in support for future event trigger changes.
Originally it was envisioned that the OID of the affected object would
be enough, and in most cases that is correct, but upon actually
implementing the event trigger changes it turned out that ObjectAddress
is more widely useful.
Additionally, some command execution routines grew an output argument
that's an object address which provides further info about the executed
command. To wit:
* for ALTER DOMAIN / ADD CONSTRAINT, it corresponds to the address of
the new constraint
* for ALTER OBJECT / SET SCHEMA, it corresponds to the address of the
schema that originally contained the object.
* for ALTER EXTENSION {ADD, DROP} OBJECT, it corresponds to the address
of the object added to or dropped from the extension.
There's no user-visible change in this commit, and no functional change
either.
Discussion: 20150218213255.GC6717@tamriel.snowman.net
Reviewed-By: Stephen Frost, Andres Freund
2015-03-03 18:10:50 +01:00
|
|
|
ObjectAddressSet(address, TriggerRelationId, tgoid);
|
|
|
|
|
2002-04-30 03:24:57 +02:00
|
|
|
systable_endscan(tgscan);
|
|
|
|
|
2019-01-21 19:32:19 +01:00
|
|
|
table_close(tgrel, RowExclusiveLock);
|
2002-04-26 21:29:47 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Close rel, but keep exclusive lock!
|
|
|
|
*/
|
Improve behavior of concurrent rename statements.
Previously, renaming a table, sequence, view, index, foreign table,
column, or trigger checked permissions before locking the object, which
meant that if permissions were revoked during the lock wait, we would
still allow the operation. Similarly, if the original object is dropped
and a new one with the same name is created, the operation will be allowed
if we had permissions on the old object; the permissions on the new
object don't matter. All this is now fixed.
Along the way, attempting to rename a trigger on a foreign table now gives
the same error message as trying to create one there in the first place
(i.e. that it's not a table or view) rather than simply stating that no
trigger by that name exists.
Patch by me; review by Noah Misch.
2011-12-16 00:51:46 +01:00
|
|
|
relation_close(targetrel, NoLock);
|
2012-12-24 00:25:03 +01:00
|
|
|
|
Change many routines to return ObjectAddress rather than OID
The changed routines are mostly those that can be directly called by
ProcessUtilitySlow; the intention is to make the affected object
information more precise, in support for future event trigger changes.
Originally it was envisioned that the OID of the affected object would
be enough, and in most cases that is correct, but upon actually
implementing the event trigger changes it turned out that ObjectAddress
is more widely useful.
Additionally, some command execution routines grew an output argument
that's an object address which provides further info about the executed
command. To wit:
* for ALTER DOMAIN / ADD CONSTRAINT, it corresponds to the address of
the new constraint
* for ALTER OBJECT / SET SCHEMA, it corresponds to the address of the
schema that originally contained the object.
* for ALTER EXTENSION {ADD, DROP} OBJECT, it corresponds to the address
of the object added to or dropped from the extension.
There's no user-visible change in this commit, and no functional change
either.
Discussion: 20150218213255.GC6717@tamriel.snowman.net
Reviewed-By: Stephen Frost, Andres Freund
2015-03-03 18:10:50 +01:00
|
|
|
return address;
|
2002-04-26 21:29:47 +02:00
|
|
|
}

/*
 * Subroutine for renametrig -- perform the actual work of renaming one
 * trigger on one table.
 *
 * If the trigger has a name different from the expected one, raise a
 * NOTICE about it.
 */
static void
renametrig_internal(Relation tgrel, Relation targetrel, HeapTuple trigtup,
					const char *newname, const char *expected_name)
{
	HeapTuple	tuple;
	Form_pg_trigger tgform;
	ScanKeyData key[2];
	SysScanDesc tgscan;

	/* If the trigger already has the new name, nothing to do. */
	tgform = (Form_pg_trigger) GETSTRUCT(trigtup);
	if (strcmp(NameStr(tgform->tgname), newname) == 0)
		return;

	/*
	 * Before actually trying the rename, search for triggers with the same
	 * name.  The update would fail with an ugly message in that case, and it
	 * is better to throw a nicer error.
	 */
	ScanKeyInit(&key[0],
				Anum_pg_trigger_tgrelid,
				BTEqualStrategyNumber, F_OIDEQ,
				ObjectIdGetDatum(RelationGetRelid(targetrel)));
	ScanKeyInit(&key[1],
				Anum_pg_trigger_tgname,
				BTEqualStrategyNumber, F_NAMEEQ,
				PointerGetDatum(newname));
	tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
								NULL, 2, key);
	if (HeapTupleIsValid(tuple = systable_getnext(tgscan)))
		ereport(ERROR,
				(errcode(ERRCODE_DUPLICATE_OBJECT),
				 errmsg("trigger \"%s\" for relation \"%s\" already exists",
						newname, RelationGetRelationName(targetrel))));
	systable_endscan(tgscan);

	/*
	 * The target name is free; update the existing pg_trigger tuple with it.
	 */
	tuple = heap_copytuple(trigtup);	/* need a modifiable copy */
	tgform = (Form_pg_trigger) GETSTRUCT(tuple);

	/*
	 * If the trigger has a name different from what we expected, let the user
	 * know. (We can proceed anyway, since we must have reached here following
	 * a tgparentid link.)
	 */
	if (strcmp(NameStr(tgform->tgname), expected_name) != 0)
		ereport(NOTICE,
				errmsg("renamed trigger \"%s\" on relation \"%s\"",
					   NameStr(tgform->tgname),
					   RelationGetRelationName(targetrel)));

	namestrcpy(&tgform->tgname, newname);

	CatalogTupleUpdate(tgrel, &tuple->t_self, tuple);

	InvokeObjectPostAlterHook(TriggerRelationId, tgform->oid, 0);

	/*
	 * Invalidate relation's relcache entry so that other backends (and this
	 * one too!) are sent SI message to make them rebuild relcache entries.
	 * (Ideally this should happen automatically...)
	 */
	CacheInvalidateRelcache(targetrel);
}

/*
 * Subroutine for renametrig -- Helper for recursing to partitions when
 * renaming triggers on a partitioned table.
 */
static void
renametrig_partition(Relation tgrel, Oid partitionId, Oid parentTriggerOid,
					 const char *newname, const char *expected_name)
{
	SysScanDesc tgscan;
	ScanKeyData key;
	HeapTuple	tuple;

	/*
	 * Given a relation and the OID of a trigger on parent relation, find the
	 * corresponding trigger in the child and rename that trigger to the given
	 * name.
	 */
	ScanKeyInit(&key,
				Anum_pg_trigger_tgrelid,
				BTEqualStrategyNumber, F_OIDEQ,
				ObjectIdGetDatum(partitionId));
	tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
								NULL, 1, &key);
	while (HeapTupleIsValid(tuple = systable_getnext(tgscan)))
	{
		Form_pg_trigger tgform = (Form_pg_trigger) GETSTRUCT(tuple);
		Relation	partitionRel;

		if (tgform->tgparentid != parentTriggerOid)
			continue;			/* not our trigger */

		partitionRel = table_open(partitionId, NoLock);

		/* Rename the trigger on this partition */
		renametrig_internal(tgrel, partitionRel, tuple, newname, expected_name);

		/* And if this relation is partitioned, recurse to its partitions */
		if (partitionRel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
		{
			PartitionDesc partdesc = RelationGetPartitionDesc(partitionRel,
															  true);

			for (int i = 0; i < partdesc->nparts; i++)
			{
				Oid			partoid = partdesc->oids[i];

				renametrig_partition(tgrel, partoid, tgform->oid, newname,
									 NameStr(tgform->tgname));
			}
		}
		table_close(partitionRel, NoLock);

		/* There should be at most one matching tuple */
		break;
	}
	systable_endscan(tgscan);
}

/*
 * EnableDisableTrigger()
 *
 * Called by ALTER TABLE ENABLE/DISABLE [ REPLICA | ALWAYS ] TRIGGER
 * to change 'tgenabled' field for the specified trigger(s)
 *
 * rel: relation to process (caller must hold suitable lock on it)
 * tgname: trigger to process, or NULL to scan all triggers
 * fires_when: new value for tgenabled field. In addition to generic
 *			   enablement/disablement, this also defines when the trigger
 *			   should be fired in session replication roles.
 * skip_system: if true, skip "system" triggers (constraint triggers)
 * recurse: if true, recurse to partitions
 *
 * Caller should have checked permissions for the table; here we also
 * enforce that superuser privilege is required to alter the state of
 * system triggers
 */
void
EnableDisableTrigger(Relation rel, const char *tgname,
					 char fires_when, bool skip_system, bool recurse,
					 LOCKMODE lockmode)
{
	Relation	tgrel;
	int			nkeys;
	ScanKeyData keys[2];
	SysScanDesc tgscan;
	HeapTuple	tuple;
	bool		found;
	bool		changed;

	/* Scan the relevant entries in pg_trigger */
	tgrel = table_open(TriggerRelationId, RowExclusiveLock);

	ScanKeyInit(&keys[0],
				Anum_pg_trigger_tgrelid,
				BTEqualStrategyNumber, F_OIDEQ,
				ObjectIdGetDatum(RelationGetRelid(rel)));
	if (tgname)
	{
		ScanKeyInit(&keys[1],
					Anum_pg_trigger_tgname,
					BTEqualStrategyNumber, F_NAMEEQ,
					CStringGetDatum(tgname));
		nkeys = 2;
	}
	else
		nkeys = 1;

	tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
								NULL, nkeys, keys);

	found = changed = false;

	while (HeapTupleIsValid(tuple = systable_getnext(tgscan)))
	{
		Form_pg_trigger oldtrig = (Form_pg_trigger) GETSTRUCT(tuple);

		if (oldtrig->tgisinternal)
		{
			/* system trigger ... ok to process? */
			if (skip_system)
				continue;
			if (!superuser())
				ereport(ERROR,
						(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
						 errmsg("permission denied: \"%s\" is a system trigger",
								NameStr(oldtrig->tgname))));
		}

		found = true;

		if (oldtrig->tgenabled != fires_when)
		{
			/* need to change this one ... make a copy to scribble on */
			HeapTuple	newtup = heap_copytuple(tuple);
			Form_pg_trigger newtrig = (Form_pg_trigger) GETSTRUCT(newtup);

			newtrig->tgenabled = fires_when;

			CatalogTupleUpdate(tgrel, &newtup->t_self, newtup);

			heap_freetuple(newtup);

			changed = true;
		}

		/*
		 * When altering FOR EACH ROW triggers on a partitioned table, do the
		 * same on the partitions as well, unless ONLY is specified.
		 *
		 * Note that we recurse even if we didn't change the trigger above,
		 * because the partitions' copy of the trigger may have a different
		 * value of tgenabled than the parent's trigger and thus might need to
		 * be changed.
		 */
		if (recurse &&
			rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
			(TRIGGER_FOR_ROW(oldtrig->tgtype)))
		{
			PartitionDesc partdesc = RelationGetPartitionDesc(rel, true);
			int			i;

			for (i = 0; i < partdesc->nparts; i++)
			{
				Relation	part;

				part = relation_open(partdesc->oids[i], lockmode);
				EnableDisableTrigger(part, NameStr(oldtrig->tgname),
									 fires_when, skip_system, recurse,
									 lockmode);
				table_close(part, NoLock);	/* keep lock till commit */
			}
		}

		InvokeObjectPostAlterHook(TriggerRelationId,
								  oldtrig->oid, 0);
	}

	systable_endscan(tgscan);

	table_close(tgrel, RowExclusiveLock);

	if (tgname && !found)
		ereport(ERROR,
				(errcode(ERRCODE_UNDEFINED_OBJECT),
				 errmsg("trigger \"%s\" for table \"%s\" does not exist",
						tgname, RelationGetRelationName(rel))));

	/*
	 * If we changed anything, broadcast a SI inval message to force each
	 * backend (including our own!) to rebuild relation's relcache entry.
	 * Otherwise they will fail to apply the change promptly.
	 */
	if (changed)
		CacheInvalidateRelcache(rel);
}

/*
 * Build trigger data to attach to the given relcache entry.
 *
 * Note that trigger data attached to a relcache entry must be stored in
 * CacheMemoryContext to ensure it survives as long as the relcache entry.
 * But we should be running in a less long-lived working context.  To avoid
 * leaking cache memory if this routine fails partway through, we build a
 * temporary TriggerDesc in working memory and then copy the completed
 * structure into cache memory.
 */
void
RelationBuildTriggers(Relation relation)
{
	TriggerDesc *trigdesc;
	int			numtrigs;
	int			maxtrigs;
	Trigger    *triggers;
	Relation	tgrel;
	ScanKeyData skey;
	SysScanDesc tgscan;
	HeapTuple	htup;
	MemoryContext oldContext;
	int			i;

	/*
	 * Allocate a working array to hold the triggers (the array is extended if
	 * necessary)
	 */
	maxtrigs = 16;
	triggers = (Trigger *) palloc(maxtrigs * sizeof(Trigger));
	numtrigs = 0;

	/*
	 * Note: since we scan the triggers using TriggerRelidNameIndexId, we will
	 * be reading the triggers in name order, except possibly during
	 * emergency-recovery operations (ie, IgnoreSystemIndexes).  This in turn
	 * ensures that triggers will be fired in name order.
	 */
	ScanKeyInit(&skey,
				Anum_pg_trigger_tgrelid,
				BTEqualStrategyNumber, F_OIDEQ,
				ObjectIdGetDatum(RelationGetRelid(relation)));

	tgrel = table_open(TriggerRelationId, AccessShareLock);
	tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
								NULL, 1, &skey);

	while (HeapTupleIsValid(htup = systable_getnext(tgscan)))
	{
		Form_pg_trigger pg_trigger = (Form_pg_trigger) GETSTRUCT(htup);
		Trigger    *build;
		Datum		datum;
		bool		isnull;

		if (numtrigs >= maxtrigs)
		{
			maxtrigs *= 2;
			triggers = (Trigger *) repalloc(triggers, maxtrigs * sizeof(Trigger));
		}
		build = &(triggers[numtrigs]);

		build->tgoid = pg_trigger->oid;
		build->tgname = DatumGetCString(DirectFunctionCall1(nameout,
															NameGetDatum(&pg_trigger->tgname)));
		build->tgfoid = pg_trigger->tgfoid;
		build->tgtype = pg_trigger->tgtype;
		build->tgenabled = pg_trigger->tgenabled;
		build->tgisinternal = pg_trigger->tgisinternal;
		build->tgisclone = OidIsValid(pg_trigger->tgparentid);
		build->tgconstrrelid = pg_trigger->tgconstrrelid;
		build->tgconstrindid = pg_trigger->tgconstrindid;
		build->tgconstraint = pg_trigger->tgconstraint;
		build->tgdeferrable = pg_trigger->tgdeferrable;
		build->tginitdeferred = pg_trigger->tginitdeferred;
		build->tgnargs = pg_trigger->tgnargs;
		/* tgattr is first var-width field, so OK to access directly */
		build->tgnattr = pg_trigger->tgattr.dim1;
		if (build->tgnattr > 0)
		{
			build->tgattr = (int16 *) palloc(build->tgnattr * sizeof(int16));
			memcpy(build->tgattr, &(pg_trigger->tgattr.values),
				   build->tgnattr * sizeof(int16));
		}
		else
			build->tgattr = NULL;
		if (build->tgnargs > 0)
		{
			bytea	   *val;
			char	   *p;

			val = DatumGetByteaPP(fastgetattr(htup,
											  Anum_pg_trigger_tgargs,
											  tgrel->rd_att, &isnull));
			if (isnull)
				elog(ERROR, "tgargs is null in trigger for relation \"%s\"",
					 RelationGetRelationName(relation));
			p = (char *) VARDATA_ANY(val);
			build->tgargs = (char **) palloc(build->tgnargs * sizeof(char *));
			for (i = 0; i < build->tgnargs; i++)
			{
				build->tgargs[i] = pstrdup(p);
				p += strlen(p) + 1;
			}
		}
		else
			build->tgargs = NULL;

		datum = fastgetattr(htup, Anum_pg_trigger_tgoldtable,
							tgrel->rd_att, &isnull);
		if (!isnull)
			build->tgoldtable =
				DatumGetCString(DirectFunctionCall1(nameout, datum));
		else
			build->tgoldtable = NULL;

		datum = fastgetattr(htup, Anum_pg_trigger_tgnewtable,
							tgrel->rd_att, &isnull);
		if (!isnull)
			build->tgnewtable =
				DatumGetCString(DirectFunctionCall1(nameout, datum));
		else
			build->tgnewtable = NULL;

		datum = fastgetattr(htup, Anum_pg_trigger_tgqual,
							tgrel->rd_att, &isnull);
		if (!isnull)
			build->tgqual = TextDatumGetCString(datum);
		else
			build->tgqual = NULL;

		numtrigs++;
	}

	systable_endscan(tgscan);
	table_close(tgrel, AccessShareLock);

	/* There might not be any triggers */
	if (numtrigs == 0)
	{
		pfree(triggers);
		return;
	}

	/* Build trigdesc */
	trigdesc = (TriggerDesc *) palloc0(sizeof(TriggerDesc));
	trigdesc->triggers = triggers;
	trigdesc->numtriggers = numtrigs;
	for (i = 0; i < numtrigs; i++)
		SetTriggerFlags(trigdesc, &(triggers[i]));

	/* Copy completed trigdesc into cache storage */
	oldContext = MemoryContextSwitchTo(CacheMemoryContext);
	relation->trigdesc = CopyTriggerDesc(trigdesc);
	MemoryContextSwitchTo(oldContext);

	/* Release working memory */
	FreeTriggerDesc(trigdesc);
}

/*
 * Update the TriggerDesc's hint flags to include the specified trigger
 */
static void
SetTriggerFlags(TriggerDesc *trigdesc, Trigger *trigger)
{
	int16		tgtype = trigger->tgtype;

	trigdesc->trig_insert_before_row |=
		TRIGGER_TYPE_MATCHES(tgtype, TRIGGER_TYPE_ROW,
							 TRIGGER_TYPE_BEFORE, TRIGGER_TYPE_INSERT);
	trigdesc->trig_insert_after_row |=
		TRIGGER_TYPE_MATCHES(tgtype, TRIGGER_TYPE_ROW,
							 TRIGGER_TYPE_AFTER, TRIGGER_TYPE_INSERT);
	trigdesc->trig_insert_instead_row |=
		TRIGGER_TYPE_MATCHES(tgtype, TRIGGER_TYPE_ROW,
							 TRIGGER_TYPE_INSTEAD, TRIGGER_TYPE_INSERT);
	trigdesc->trig_insert_before_statement |=
		TRIGGER_TYPE_MATCHES(tgtype, TRIGGER_TYPE_STATEMENT,
							 TRIGGER_TYPE_BEFORE, TRIGGER_TYPE_INSERT);
	trigdesc->trig_insert_after_statement |=
		TRIGGER_TYPE_MATCHES(tgtype, TRIGGER_TYPE_STATEMENT,
							 TRIGGER_TYPE_AFTER, TRIGGER_TYPE_INSERT);
	trigdesc->trig_update_before_row |=
		TRIGGER_TYPE_MATCHES(tgtype, TRIGGER_TYPE_ROW,
							 TRIGGER_TYPE_BEFORE, TRIGGER_TYPE_UPDATE);
	trigdesc->trig_update_after_row |=
		TRIGGER_TYPE_MATCHES(tgtype, TRIGGER_TYPE_ROW,
							 TRIGGER_TYPE_AFTER, TRIGGER_TYPE_UPDATE);
	trigdesc->trig_update_instead_row |=
		TRIGGER_TYPE_MATCHES(tgtype, TRIGGER_TYPE_ROW,
							 TRIGGER_TYPE_INSTEAD, TRIGGER_TYPE_UPDATE);
	trigdesc->trig_update_before_statement |=
		TRIGGER_TYPE_MATCHES(tgtype, TRIGGER_TYPE_STATEMENT,
							 TRIGGER_TYPE_BEFORE, TRIGGER_TYPE_UPDATE);
	trigdesc->trig_update_after_statement |=
		TRIGGER_TYPE_MATCHES(tgtype, TRIGGER_TYPE_STATEMENT,
							 TRIGGER_TYPE_AFTER, TRIGGER_TYPE_UPDATE);
	trigdesc->trig_delete_before_row |=
		TRIGGER_TYPE_MATCHES(tgtype, TRIGGER_TYPE_ROW,
							 TRIGGER_TYPE_BEFORE, TRIGGER_TYPE_DELETE);
	trigdesc->trig_delete_after_row |=
		TRIGGER_TYPE_MATCHES(tgtype, TRIGGER_TYPE_ROW,
							 TRIGGER_TYPE_AFTER, TRIGGER_TYPE_DELETE);
	trigdesc->trig_delete_instead_row |=
		TRIGGER_TYPE_MATCHES(tgtype, TRIGGER_TYPE_ROW,
							 TRIGGER_TYPE_INSTEAD, TRIGGER_TYPE_DELETE);
	trigdesc->trig_delete_before_statement |=
		TRIGGER_TYPE_MATCHES(tgtype, TRIGGER_TYPE_STATEMENT,
							 TRIGGER_TYPE_BEFORE, TRIGGER_TYPE_DELETE);
	trigdesc->trig_delete_after_statement |=
		TRIGGER_TYPE_MATCHES(tgtype, TRIGGER_TYPE_STATEMENT,
							 TRIGGER_TYPE_AFTER, TRIGGER_TYPE_DELETE);
	/* there are no row-level truncate triggers */
	trigdesc->trig_truncate_before_statement |=
		TRIGGER_TYPE_MATCHES(tgtype, TRIGGER_TYPE_STATEMENT,
							 TRIGGER_TYPE_BEFORE, TRIGGER_TYPE_TRUNCATE);
	trigdesc->trig_truncate_after_statement |=
		TRIGGER_TYPE_MATCHES(tgtype, TRIGGER_TYPE_STATEMENT,
							 TRIGGER_TYPE_AFTER, TRIGGER_TYPE_TRUNCATE);

	trigdesc->trig_insert_new_table |=
		(TRIGGER_FOR_INSERT(tgtype) &&
		 TRIGGER_USES_TRANSITION_TABLE(trigger->tgnewtable));
	trigdesc->trig_update_old_table |=
		(TRIGGER_FOR_UPDATE(tgtype) &&
		 TRIGGER_USES_TRANSITION_TABLE(trigger->tgoldtable));
	trigdesc->trig_update_new_table |=
		(TRIGGER_FOR_UPDATE(tgtype) &&
		 TRIGGER_USES_TRANSITION_TABLE(trigger->tgnewtable));
	trigdesc->trig_delete_old_table |=
		(TRIGGER_FOR_DELETE(tgtype) &&
		 TRIGGER_USES_TRANSITION_TABLE(trigger->tgoldtable));
}

/*
 * Copy a TriggerDesc data structure.
 *
 * The copy is allocated in the current memory context.
 */
TriggerDesc *
CopyTriggerDesc(TriggerDesc *trigdesc)
{
	TriggerDesc *newdesc;
	Trigger    *trigger;
	int			i;

	if (trigdesc == NULL || trigdesc->numtriggers <= 0)
		return NULL;

	newdesc = (TriggerDesc *) palloc(sizeof(TriggerDesc));
	memcpy(newdesc, trigdesc, sizeof(TriggerDesc));

	trigger = (Trigger *) palloc(trigdesc->numtriggers * sizeof(Trigger));
	memcpy(trigger, trigdesc->triggers,
		   trigdesc->numtriggers * sizeof(Trigger));
	newdesc->triggers = trigger;

	for (i = 0; i < trigdesc->numtriggers; i++)
	{
		trigger->tgname = pstrdup(trigger->tgname);
		if (trigger->tgnattr > 0)
		{
			int16	   *newattr;

			newattr = (int16 *) palloc(trigger->tgnattr * sizeof(int16));
			memcpy(newattr, trigger->tgattr,
				   trigger->tgnattr * sizeof(int16));
			trigger->tgattr = newattr;
		}
		if (trigger->tgnargs > 0)
		{
			char	  **newargs;
			int16		j;

			newargs = (char **) palloc(trigger->tgnargs * sizeof(char *));
			for (j = 0; j < trigger->tgnargs; j++)
				newargs[j] = pstrdup(trigger->tgargs[j]);
			trigger->tgargs = newargs;
		}
		if (trigger->tgqual)
			trigger->tgqual = pstrdup(trigger->tgqual);
		if (trigger->tgoldtable)
			trigger->tgoldtable = pstrdup(trigger->tgoldtable);
		if (trigger->tgnewtable)
			trigger->tgnewtable = pstrdup(trigger->tgnewtable);
		trigger++;
	}

	return newdesc;
}

/*
 * Free a TriggerDesc data structure.
 */
void
FreeTriggerDesc(TriggerDesc *trigdesc)
{
	Trigger    *trigger;
	int			i;

	if (trigdesc == NULL)
		return;

	trigger = trigdesc->triggers;
	for (i = 0; i < trigdesc->numtriggers; i++)
	{
		pfree(trigger->tgname);
		if (trigger->tgnattr > 0)
			pfree(trigger->tgattr);
		if (trigger->tgnargs > 0)
		{
			while (--(trigger->tgnargs) >= 0)
				pfree(trigger->tgargs[trigger->tgnargs]);
			pfree(trigger->tgargs);
		}
		if (trigger->tgqual)
			pfree(trigger->tgqual);
		if (trigger->tgoldtable)
			pfree(trigger->tgoldtable);
		if (trigger->tgnewtable)
			pfree(trigger->tgnewtable);
		trigger++;
	}
	pfree(trigdesc->triggers);
	pfree(trigdesc);
}

/*
 * Compare two TriggerDesc structures for logical equality.
 */
#ifdef NOT_USED
bool
equalTriggerDescs(TriggerDesc *trigdesc1, TriggerDesc *trigdesc2)
{
	int			i,
				j;

	/*
	 * We need not examine the hint flags, just the trigger array itself; if
	 * we have the same triggers with the same types, the flags should match.
	 *
	 * As of 7.3 we assume trigger set ordering is significant in the
	 * comparison; so we just compare corresponding slots of the two sets.
	 *
	 * Note: comparing the stringToNode forms of the WHEN clauses means that
	 * parse column locations will affect the result.  This is okay as long as
	 * this function is only used for detecting exact equality, as for example
	 * in checking for staleness of a cache entry.
	 */
	if (trigdesc1 != NULL)
	{
		if (trigdesc2 == NULL)
			return false;
		if (trigdesc1->numtriggers != trigdesc2->numtriggers)
			return false;
		for (i = 0; i < trigdesc1->numtriggers; i++)
		{
			Trigger    *trig1 = trigdesc1->triggers + i;
			Trigger    *trig2 = trigdesc2->triggers + i;

			if (trig1->tgoid != trig2->tgoid)
				return false;
			if (strcmp(trig1->tgname, trig2->tgname) != 0)
				return false;
			if (trig1->tgfoid != trig2->tgfoid)
				return false;
			if (trig1->tgtype != trig2->tgtype)
				return false;
			if (trig1->tgenabled != trig2->tgenabled)
				return false;
			if (trig1->tgisinternal != trig2->tgisinternal)
				return false;
			if (trig1->tgisclone != trig2->tgisclone)
				return false;
			if (trig1->tgconstrrelid != trig2->tgconstrrelid)
				return false;
			if (trig1->tgconstrindid != trig2->tgconstrindid)
				return false;
			if (trig1->tgconstraint != trig2->tgconstraint)
				return false;
			if (trig1->tgdeferrable != trig2->tgdeferrable)
				return false;
			if (trig1->tginitdeferred != trig2->tginitdeferred)
				return false;
			if (trig1->tgnargs != trig2->tgnargs)
				return false;
			if (trig1->tgnattr != trig2->tgnattr)
				return false;
			if (trig1->tgnattr > 0 &&
				memcmp(trig1->tgattr, trig2->tgattr,
					   trig1->tgnattr * sizeof(int16)) != 0)
				return false;
			for (j = 0; j < trig1->tgnargs; j++)
				if (strcmp(trig1->tgargs[j], trig2->tgargs[j]) != 0)
					return false;
			if (trig1->tgqual == NULL && trig2->tgqual == NULL)
				 /* ok */ ;
			else if (trig1->tgqual == NULL || trig2->tgqual == NULL)
				return false;
			else if (strcmp(trig1->tgqual, trig2->tgqual) != 0)
				return false;
			if (trig1->tgoldtable == NULL && trig2->tgoldtable == NULL)
				 /* ok */ ;
			else if (trig1->tgoldtable == NULL || trig2->tgoldtable == NULL)
				return false;
			else if (strcmp(trig1->tgoldtable, trig2->tgoldtable) != 0)
				return false;
			if (trig1->tgnewtable == NULL && trig2->tgnewtable == NULL)
				 /* ok */ ;
			else if (trig1->tgnewtable == NULL || trig2->tgnewtable == NULL)
				return false;
			else if (strcmp(trig1->tgnewtable, trig2->tgnewtable) != 0)
				return false;
		}
	}
	else if (trigdesc2 != NULL)
		return false;
	return true;
}
#endif							/* NOT_USED */

/*
 * Check if there is a row-level trigger with transition tables that prevents
 * a table from becoming an inheritance child or partition.  Return the name
 * of the first such incompatible trigger, or NULL if there is none.
 */
const char *
FindTriggerIncompatibleWithInheritance(TriggerDesc *trigdesc)
{
	if (trigdesc != NULL)
	{
		int			i;

		for (i = 0; i < trigdesc->numtriggers; ++i)
		{
			Trigger    *trigger = &trigdesc->triggers[i];

			if (trigger->tgoldtable != NULL || trigger->tgnewtable != NULL)
				return trigger->tgname;
		}
	}

	return NULL;
}

/*
 * Call a trigger function.
 *
 *		trigdata: trigger descriptor.
 *		tgindx: trigger's index in finfo and instr arrays.
 *		finfo: array of cached trigger function call information.
 *		instr: optional array of EXPLAIN ANALYZE instrumentation state.
 *		per_tuple_context: memory context to execute the function in.
 *
 * Returns the tuple (or NULL) as returned by the function.
 */
static HeapTuple
ExecCallTriggerFunc(TriggerData *trigdata,
					int tgindx,
					FmgrInfo *finfo,
					Instrumentation *instr,
					MemoryContext per_tuple_context)
{
	LOCAL_FCINFO(fcinfo, 0);
	PgStat_FunctionCallUsage fcusage;
	Datum		result;
	MemoryContext oldContext;

	/*
	 * Protect against code paths that may fail to initialize transition table
	 * info.
	 */
	Assert(((TRIGGER_FIRED_BY_INSERT(trigdata->tg_event) ||
			 TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event) ||
			 TRIGGER_FIRED_BY_DELETE(trigdata->tg_event)) &&
			TRIGGER_FIRED_AFTER(trigdata->tg_event) &&
			!(trigdata->tg_event & AFTER_TRIGGER_DEFERRABLE) &&
			!(trigdata->tg_event & AFTER_TRIGGER_INITDEFERRED)) ||
		   (trigdata->tg_oldtable == NULL && trigdata->tg_newtable == NULL));

	finfo += tgindx;

	/*
	 * We cache fmgr lookup info, to avoid making the lookup again on each
	 * call.
	 */
	if (finfo->fn_oid == InvalidOid)
		fmgr_info(trigdata->tg_trigger->tgfoid, finfo);

	Assert(finfo->fn_oid == trigdata->tg_trigger->tgfoid);

	/*
	 * If doing EXPLAIN ANALYZE, start charging time to this trigger.
	 */
	if (instr)
		InstrStartNode(instr + tgindx);

	/*
	 * Do the function evaluation in the per-tuple memory context, so that
	 * leaked memory will be reclaimed once per tuple. Note in particular that
	 * any new tuple created by the trigger function will live till the end of
	 * the tuple cycle.
	 */
	oldContext = MemoryContextSwitchTo(per_tuple_context);

	/*
	 * Call the function, passing no arguments but setting a context.
	 */
	InitFunctionCallInfoData(*fcinfo, finfo, 0,
							 InvalidOid, (Node *) trigdata, NULL);

	pgstat_init_function_usage(fcinfo, &fcusage);

	MyTriggerDepth++;
	PG_TRY();
	{
		result = FunctionCallInvoke(fcinfo);
	}
	PG_FINALLY();
	{
		MyTriggerDepth--;
	}
	PG_END_TRY();

	pgstat_end_function_usage(&fcusage, true);

	MemoryContextSwitchTo(oldContext);

	/*
	 * Trigger protocol allows function to return a null pointer, but NOT to
	 * set the isnull result flag.
	 */
	if (fcinfo->isnull)
		ereport(ERROR,
				(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
				 errmsg("trigger function %u returned null value",
						fcinfo->flinfo->fn_oid)));

	/*
	 * If doing EXPLAIN ANALYZE, stop charging time to this trigger, and count
	 * one "tuple returned" (really the number of firings).
	 */
	if (instr)
		InstrStopNode(instr + tgindx, 1);

	return (HeapTuple) DatumGetPointer(result);
}
|
|
|
|
|
2002-11-23 04:59:09 +01:00
|
|
|
void
|
|
|
|
ExecBSInsertTriggers(EState *estate, ResultRelInfo *relinfo)
|
|
|
|
{
|
|
|
|
TriggerDesc *trigdesc;
|
|
|
|
int i;
|
2020-02-24 10:12:10 +01:00
|
|
|
TriggerData LocTriggerData = {0};
|
2002-11-23 04:59:09 +01:00
|
|
|
|
|
|
|
trigdesc = relinfo->ri_TrigDesc;
|
|
|
|
|
|
|
|
if (trigdesc == NULL)
|
|
|
|
return;
|
2010-10-10 19:43:33 +02:00
|
|
|
if (!trigdesc->trig_insert_before_statement)
|
2002-11-23 04:59:09 +01:00
|
|
|
return;
|
|
|
|
|
2017-09-17 18:16:38 +02:00
|
|
|
/* no-op if we already fired BS triggers in this context */
|
|
|
|
if (before_stmt_triggers_fired(RelationGetRelid(relinfo->ri_RelationDesc),
|
|
|
|
CMD_INSERT))
|
|
|
|
return;
|
|
|
|
|
2002-11-23 04:59:09 +01:00
|
|
|
LocTriggerData.type = T_TriggerData;
|
|
|
|
LocTriggerData.tg_event = TRIGGER_EVENT_INSERT |
|
|
|
|
TRIGGER_EVENT_BEFORE;
|
|
|
|
LocTriggerData.tg_relation = relinfo->ri_RelationDesc;
|
2010-10-10 19:43:33 +02:00
|
|
|
for (i = 0; i < trigdesc->numtriggers; i++)
|
2002-11-23 04:59:09 +01:00
|
|
|
{
|
2010-10-10 19:43:33 +02:00
|
|
|
Trigger *trigger = &trigdesc->triggers[i];
|
2002-11-23 04:59:09 +01:00
|
|
|
HeapTuple newtuple;
|
|
|
|
|
2010-10-10 19:43:33 +02:00
|
|
|
if (!TRIGGER_TYPE_MATCHES(trigger->tgtype,
|
|
|
|
TRIGGER_TYPE_STATEMENT,
|
|
|
|
TRIGGER_TYPE_BEFORE,
|
|
|
|
TRIGGER_TYPE_INSERT))
|
|
|
|
continue;
|
2009-11-20 21:38:12 +01:00
|
|
|
if (!TriggerEnabled(estate, relinfo, trigger, LocTriggerData.tg_event,
|
|
|
|
NULL, NULL, NULL))
|
2009-10-15 00:14:25 +02:00
|
|
|
continue;
|
|
|
|
|
2002-11-23 04:59:09 +01:00
|
|
|
LocTriggerData.tg_trigger = trigger;
|
|
|
|
newtuple = ExecCallTriggerFunc(&LocTriggerData,
|
2010-10-10 19:43:33 +02:00
|
|
|
i,
|
2005-03-25 22:58:00 +01:00
|
|
|
relinfo->ri_TrigFunctions,
|
|
|
|
relinfo->ri_TrigInstrument,
|
2002-11-23 04:59:09 +01:00
|
|
|
GetPerTupleMemoryContext(estate));
|
|
|
|
|
|
|
|
if (newtuple)
|
2003-07-20 23:56:35 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
|
|
|
|
errmsg("BEFORE STATEMENT trigger cannot return a value")));
|
2002-11-23 04:59:09 +01:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
void
|
2017-06-28 19:59:01 +02:00
|
|
|
ExecASInsertTriggers(EState *estate, ResultRelInfo *relinfo,
|
|
|
|
TransitionCaptureState *transition_capture)
|
2002-11-23 04:59:09 +01:00
|
|
|
{
|
|
|
|
TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
|
|
|
|
|
2010-10-10 19:43:33 +02:00
|
|
|
if (trigdesc && trigdesc->trig_insert_after_statement)
|
Enforce foreign key correctly during cross-partition updates
When an update on a partitioned table referenced in foreign key
constraints causes a row to move from one partition to another,
the fact that the move is implemented as a delete followed by an insert
on the target partition causes the foreign key triggers to have
surprising behavior. For example, a given foreign key's delete trigger
which implements the ON DELETE CASCADE clause of that key will delete
any referencing rows when triggered for that internal DELETE, although
it should not, because the referenced row is simply being moved from one
partition of the referenced root partitioned table into another, not
being deleted from it.
This commit teaches trigger.c to skip queuing such delete trigger events
on the leaf partitions in favor of an UPDATE event fired on the root
target relation. Doing so is sensible because both the old and the new
tuple "logically" belong to the root relation.
The after trigger event queuing interface now allows passing the source
and the target partitions of a particular cross-partition update when
registering the update event for the root partitioned table. Along with
the two ctids of the old and the new tuple, the after trigger event now
also stores the OIDs of those partitions. The tuples fetched from the
source and the target partitions are converted into the root table
format, if necessary, before they are passed to the trigger function.
The implementation currently has a limitation that only the foreign keys
pointing into the query's target relation are considered, not those of
its sub-partitioned partitions. That seems like a reasonable
limitation, because it sounds rare to have distinct foreign keys
pointing to sub-partitioned partitions instead of to the root table.
This misbehavior stems from commit f56f8f8da6af (which added support for
foreign keys to reference partitioned tables) not paying sufficient
attention to commit 2f178441044b (which had introduced cross-partition
updates a year earlier). Even though the former commit goes back to
Postgres 12, we're not backpatching this fix at this time for fear of
destabilizing things too much, and because there are a few ABI breaks in
it that we'd have to work around in older branches. It also depends on
commit f4566345cf40, which had its own share of backpatchability issues
as well.
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Eduard Català <eduard.catala@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFvkBCmfwkQX_yBqv2Wz8ugUGiBDxum8=WvVbfU1TXaNg@mail.gmail.com
Discussion: https://postgr.es/m/CAL54xNZsLwEM1XCk5yW9EqaRzsZYHuWsHQkA2L5MOSKXAwviCQ@mail.gmail.com
2022-03-20 18:43:40 +01:00
|
|
|
AfterTriggerSaveEvent(estate, relinfo, NULL, NULL,
|
|
|
|
TRIGGER_EVENT_INSERT,
|
|
|
|
false, NULL, NULL, NIL, NULL, transition_capture,
|
|
|
|
false);
|
2002-11-23 04:59:09 +01:00
|
|
|
}
|
|
|
|
|
2019-02-27 05:30:28 +01:00
|
|
|
bool
|
2001-06-01 04:41:36 +02:00
|
|
|
ExecBRInsertTriggers(EState *estate, ResultRelInfo *relinfo,
|
2011-02-22 03:18:04 +01:00
|
|
|
TupleTableSlot *slot)
|
1997-09-01 09:59:06 +02:00
|
|
|
{
|
2001-06-01 04:41:36 +02:00
|
|
|
TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
|
2019-11-13 21:26:54 +01:00
|
|
|
HeapTuple newtuple = NULL;
|
Rejigger materializing and fetching a HeapTuple from a slot.
Previously materializing a slot always returned a HeapTuple. As
current work aims to reduce the reliance on HeapTuples (so other
storage systems can work efficiently), that needs to change. Thus
split the tasks of materializing a slot (i.e. making it independent
from the underlying storage / other memory contexts) from fetching a
HeapTuple from the slot. For brevity, allow to fetch a HeapTuple from
a slot and materializing the slot at the same time, controlled by a
parameter.
For now some callers of ExecFetchSlotHeapTuple, with materialize =
true, expect that changes to the heap tuple will be reflected in the
underlying slot. Those places will be adapted in due course, so while
not pretty, that's OK for now.
Also rename ExecFetchSlotTuple to ExecFetchSlotHeapTuple and
ExecFetchSlotTupleDatum to ExecFetchSlotHeapTupleDatum, as it's likely
that future storage methods will need similar methods. There already
is ExecFetchSlotMinimalTuple, so the new names make the naming scheme
more coherent.
Author: Ashutosh Bapat and Andres Freund, with changes by Amit Khandekar
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
2018-11-15 23:26:14 +01:00
|
|
|
bool should_free;
|
2020-02-24 10:12:10 +01:00
|
|
|
TriggerData LocTriggerData = {0};
|
1997-09-04 15:19:01 +02:00
|
|
|
int i;
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2000-05-29 03:59:17 +02:00
|
|
|
LocTriggerData.type = T_TriggerData;
|
2002-11-23 04:59:09 +01:00
|
|
|
LocTriggerData.tg_event = TRIGGER_EVENT_INSERT |
|
|
|
|
TRIGGER_EVENT_ROW |
|
|
|
|
TRIGGER_EVENT_BEFORE;
|
2001-06-01 04:41:36 +02:00
|
|
|
LocTriggerData.tg_relation = relinfo->ri_RelationDesc;
|
2010-10-10 19:43:33 +02:00
|
|
|
for (i = 0; i < trigdesc->numtriggers; i++)
|
1997-09-04 15:19:01 +02:00
|
|
|
{
|
2010-10-10 19:43:33 +02:00
|
|
|
Trigger *trigger = &trigdesc->triggers[i];
|
2019-02-27 05:30:28 +01:00
|
|
|
HeapTuple oldtuple;
|
2001-06-01 04:41:36 +02:00
|
|
|
|
2010-10-10 19:43:33 +02:00
|
|
|
if (!TRIGGER_TYPE_MATCHES(trigger->tgtype,
|
|
|
|
TRIGGER_TYPE_ROW,
|
|
|
|
TRIGGER_TYPE_BEFORE,
|
|
|
|
TRIGGER_TYPE_INSERT))
|
|
|
|
continue;
|
2009-11-20 21:38:12 +01:00
|
|
|
if (!TriggerEnabled(estate, relinfo, trigger, LocTriggerData.tg_event,
|
2019-02-27 05:30:28 +01:00
|
|
|
NULL, NULL, slot))
|
2009-10-15 00:14:25 +02:00
|
|
|
continue;
|
|
|
|
|
2019-02-27 05:30:28 +01:00
|
|
|
if (!newtuple)
|
|
|
|
newtuple = ExecFetchSlotHeapTuple(slot, true, &should_free);
|
|
|
|
|
|
|
|
LocTriggerData.tg_trigslot = slot;
|
2000-05-29 03:59:17 +02:00
|
|
|
LocTriggerData.tg_trigtuple = oldtuple = newtuple;
|
2001-06-01 04:41:36 +02:00
|
|
|
LocTriggerData.tg_trigger = trigger;
|
|
|
|
newtuple = ExecCallTriggerFunc(&LocTriggerData,
|
2010-10-10 19:43:33 +02:00
|
|
|
i,
|
2005-03-25 22:58:00 +01:00
|
|
|
relinfo->ri_TrigFunctions,
|
|
|
|
relinfo->ri_TrigInstrument,
|
2001-01-22 01:50:07 +01:00
|
|
|
GetPerTupleMemoryContext(estate));
|
1997-09-04 15:19:01 +02:00
|
|
|
if (newtuple == NULL)
|
2018-11-15 23:26:14 +01:00
|
|
|
{
|
|
|
|
if (should_free)
|
2019-02-27 05:30:28 +01:00
|
|
|
heap_freetuple(oldtuple);
|
|
|
|
return false; /* "do nothing" */
|
2018-11-15 23:26:14 +01:00
|
|
|
}
|
2019-02-27 05:30:28 +01:00
|
|
|
else if (newtuple != oldtuple)
|
|
|
|
{
|
2019-04-19 20:33:37 +02:00
|
|
|
ExecForceStoreHeapTuple(newtuple, slot, false);
|
2011-02-22 03:18:04 +01:00
|
|
|
|
2020-03-18 22:58:05 +01:00
|
|
|
/*
|
|
|
|
* After a tuple in a partition goes through a trigger, the user
|
|
|
|
* could have changed the partition key enough that the tuple no
|
|
|
|
* longer fits the partition. Verify that.
|
|
|
|
*/
|
|
|
|
if (trigger->tgisclone &&
|
|
|
|
!ExecPartitionCheck(relinfo, slot, estate, false))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
|
|
|
errmsg("moving row to another partition during a BEFORE FOR EACH ROW trigger is not supported"),
|
|
|
|
errdetail("Before executing trigger \"%s\", the row was to be in partition \"%s.%s\".",
|
|
|
|
trigger->tgname,
|
|
|
|
get_namespace_name(RelationGetNamespace(relinfo->ri_RelationDesc)),
|
|
|
|
RelationGetRelationName(relinfo->ri_RelationDesc))));
|
|
|
|
|
2019-02-27 05:30:28 +01:00
|
|
|
if (should_free)
|
|
|
|
heap_freetuple(oldtuple);
|
2011-02-22 03:18:04 +01:00
|
|
|
|
2019-02-27 05:30:28 +01:00
|
|
|
/* signal tuple should be re-fetched if used */
|
|
|
|
newtuple = NULL;
|
|
|
|
}
|
2011-02-22 03:18:04 +01:00
|
|
|
}
|
2018-11-15 23:26:14 +01:00
|
|
|
|
2019-02-27 05:30:28 +01:00
|
|
|
return true;
|
1997-09-01 09:59:06 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
void
|
2001-06-01 04:41:36 +02:00
|
|
|
ExecARInsertTriggers(EState *estate, ResultRelInfo *relinfo,
|
2019-02-27 05:30:28 +01:00
|
|
|
TupleTableSlot *slot, List *recheckIndexes,
|
2017-06-28 19:55:03 +02:00
|
|
|
TransitionCaptureState *transition_capture)
|
1997-09-01 09:59:06 +02:00
|
|
|
{
|
2001-06-01 04:41:36 +02:00
|
|
|
TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
|
|
|
|
|
2017-06-28 19:55:03 +02:00
|
|
|
if ((trigdesc && trigdesc->trig_insert_after_row) ||
|
|
|
|
(transition_capture && transition_capture->tcs_insert_new_table))
|
2022-03-20 18:43:40 +01:00
|
|
|
AfterTriggerSaveEvent(estate, relinfo, NULL, NULL,
|
|
|
|
TRIGGER_EVENT_INSERT,
|
2019-02-27 05:30:28 +01:00
|
|
|
true, NULL, slot,
|
2017-06-28 19:55:03 +02:00
|
|
|
recheckIndexes, NULL,
|
2022-03-20 18:43:40 +01:00
|
|
|
transition_capture,
|
|
|
|
false);
|
2002-11-23 04:59:09 +01:00
|
|
|
}
|
|
|
|
|
2019-02-27 05:30:28 +01:00
|
|
|
bool
|
2010-10-10 19:43:33 +02:00
|
|
|
ExecIRInsertTriggers(EState *estate, ResultRelInfo *relinfo,
|
2011-02-22 03:18:04 +01:00
|
|
|
TupleTableSlot *slot)
|
2010-10-10 19:43:33 +02:00
|
|
|
{
|
|
|
|
TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
|
2019-02-27 05:30:28 +01:00
|
|
|
HeapTuple newtuple = NULL;
|
2018-11-15 23:26:14 +01:00
|
|
|
bool should_free;
|
2020-02-24 10:12:10 +01:00
|
|
|
TriggerData LocTriggerData = {0};
|
2010-10-10 19:43:33 +02:00
|
|
|
int i;
|
|
|
|
|
|
|
|
LocTriggerData.type = T_TriggerData;
|
|
|
|
LocTriggerData.tg_event = TRIGGER_EVENT_INSERT |
|
|
|
|
TRIGGER_EVENT_ROW |
|
|
|
|
TRIGGER_EVENT_INSTEAD;
|
|
|
|
LocTriggerData.tg_relation = relinfo->ri_RelationDesc;
|
|
|
|
for (i = 0; i < trigdesc->numtriggers; i++)
|
|
|
|
{
|
|
|
|
Trigger *trigger = &trigdesc->triggers[i];
|
2019-02-27 05:30:28 +01:00
|
|
|
HeapTuple oldtuple;
|
2010-10-10 19:43:33 +02:00
|
|
|
|
|
|
|
if (!TRIGGER_TYPE_MATCHES(trigger->tgtype,
|
|
|
|
TRIGGER_TYPE_ROW,
|
|
|
|
TRIGGER_TYPE_INSTEAD,
|
|
|
|
TRIGGER_TYPE_INSERT))
|
|
|
|
continue;
|
|
|
|
if (!TriggerEnabled(estate, relinfo, trigger, LocTriggerData.tg_event,
|
2019-02-27 05:30:28 +01:00
|
|
|
NULL, NULL, slot))
|
2010-10-10 19:43:33 +02:00
|
|
|
continue;
|
|
|
|
|
2019-02-27 05:30:28 +01:00
|
|
|
if (!newtuple)
|
|
|
|
newtuple = ExecFetchSlotHeapTuple(slot, true, &should_free);
|
|
|
|
|
|
|
|
LocTriggerData.tg_trigslot = slot;
|
2010-10-10 19:43:33 +02:00
|
|
|
LocTriggerData.tg_trigtuple = oldtuple = newtuple;
|
|
|
|
LocTriggerData.tg_trigger = trigger;
|
|
|
|
newtuple = ExecCallTriggerFunc(&LocTriggerData,
|
|
|
|
i,
|
|
|
|
relinfo->ri_TrigFunctions,
|
|
|
|
relinfo->ri_TrigInstrument,
|
|
|
|
GetPerTupleMemoryContext(estate));
|
|
|
|
if (newtuple == NULL)
|
2018-11-15 23:26:14 +01:00
|
|
|
{
|
|
|
|
if (should_free)
|
2019-02-27 05:30:28 +01:00
|
|
|
heap_freetuple(oldtuple);
|
|
|
|
return false; /* "do nothing" */
|
2018-11-15 23:26:14 +01:00
|
|
|
}
|
2019-02-27 05:30:28 +01:00
|
|
|
else if (newtuple != oldtuple)
|
|
|
|
{
|
2019-04-19 20:33:37 +02:00
|
|
|
ExecForceStoreHeapTuple(newtuple, slot, false);
|
2011-02-22 03:18:04 +01:00
|
|
|
|
2019-02-27 05:30:28 +01:00
|
|
|
if (should_free)
|
|
|
|
heap_freetuple(oldtuple);
|
2011-02-22 03:18:04 +01:00
|
|
|
|
2019-02-27 05:30:28 +01:00
|
|
|
/* signal tuple should be re-fetched if used */
|
|
|
|
newtuple = NULL;
|
|
|
|
}
|
2010-10-10 19:43:33 +02:00
|
|
|
}
|
2018-11-15 23:26:14 +01:00
|
|
|
|
2019-02-27 05:30:28 +01:00
|
|
|
return true;
|
2010-10-10 19:43:33 +02:00
|
|
|
}
|
|
|
|
|
2002-11-23 04:59:09 +01:00
|
|
|
void
|
|
|
|
ExecBSDeleteTriggers(EState *estate, ResultRelInfo *relinfo)
|
|
|
|
{
|
|
|
|
TriggerDesc *trigdesc;
|
|
|
|
int i;
|
2020-02-24 10:12:10 +01:00
|
|
|
TriggerData LocTriggerData = {0};
|
2002-11-23 04:59:09 +01:00
|
|
|
|
|
|
|
trigdesc = relinfo->ri_TrigDesc;
|
|
|
|
|
|
|
|
if (trigdesc == NULL)
|
|
|
|
return;
|
2010-10-10 19:43:33 +02:00
|
|
|
if (!trigdesc->trig_delete_before_statement)
|
2002-11-23 04:59:09 +01:00
|
|
|
return;
|
|
|
|
|
2017-09-17 18:16:38 +02:00
|
|
|
/* no-op if we already fired BS triggers in this context */
|
|
|
|
if (before_stmt_triggers_fired(RelationGetRelid(relinfo->ri_RelationDesc),
|
|
|
|
CMD_DELETE))
|
|
|
|
return;
|
|
|
|
|
2002-11-23 04:59:09 +01:00
|
|
|
LocTriggerData.type = T_TriggerData;
|
|
|
|
LocTriggerData.tg_event = TRIGGER_EVENT_DELETE |
|
|
|
|
TRIGGER_EVENT_BEFORE;
|
|
|
|
LocTriggerData.tg_relation = relinfo->ri_RelationDesc;
|
2010-10-10 19:43:33 +02:00
|
|
|
for (i = 0; i < trigdesc->numtriggers; i++)
|
2002-11-23 04:59:09 +01:00
|
|
|
{
|
2010-10-10 19:43:33 +02:00
|
|
|
Trigger *trigger = &trigdesc->triggers[i];
|
2002-11-23 04:59:09 +01:00
|
|
|
HeapTuple newtuple;
|
|
|
|
|
2010-10-10 19:43:33 +02:00
|
|
|
if (!TRIGGER_TYPE_MATCHES(trigger->tgtype,
|
|
|
|
TRIGGER_TYPE_STATEMENT,
|
|
|
|
TRIGGER_TYPE_BEFORE,
|
|
|
|
TRIGGER_TYPE_DELETE))
|
|
|
|
continue;
|
2009-11-20 21:38:12 +01:00
|
|
|
if (!TriggerEnabled(estate, relinfo, trigger, LocTriggerData.tg_event,
|
|
|
|
NULL, NULL, NULL))
|
2009-10-15 00:14:25 +02:00
|
|
|
continue;
|
|
|
|
|
2002-11-23 04:59:09 +01:00
|
|
|
LocTriggerData.tg_trigger = trigger;
|
|
|
|
newtuple = ExecCallTriggerFunc(&LocTriggerData,
|
2010-10-10 19:43:33 +02:00
|
|
|
i,
|
2005-03-25 22:58:00 +01:00
|
|
|
relinfo->ri_TrigFunctions,
|
|
|
|
relinfo->ri_TrigInstrument,
|
2002-11-23 04:59:09 +01:00
|
|
|
GetPerTupleMemoryContext(estate));
|
|
|
|
|
|
|
|
if (newtuple)
|
2003-07-20 23:56:35 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
|
|
|
|
errmsg("BEFORE STATEMENT trigger cannot return a value")));
|
2002-11-23 04:59:09 +01:00
|
|
|
}
|
|
|
|
}

void
ExecASDeleteTriggers(EState *estate, ResultRelInfo *relinfo,
					 TransitionCaptureState *transition_capture)
{
	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;

	if (trigdesc && trigdesc->trig_delete_after_statement)
		AfterTriggerSaveEvent(estate, relinfo, NULL, NULL,
							  TRIGGER_EVENT_DELETE,
							  false, NULL, NULL, NIL, NULL, transition_capture,
							  false);
}

/*
 * Execute BEFORE ROW DELETE triggers.
 *
 * True indicates the caller can proceed with the delete.  False indicates
 * the caller must suppress the delete; additionally, if requested, we pass
 * back the concurrently updated tuple, if any.
 */
bool
ExecBRDeleteTriggers(EState *estate, EPQState *epqstate,
					 ResultRelInfo *relinfo,
					 ItemPointer tupleid,
					 HeapTuple fdw_trigtuple,
					 TupleTableSlot **epqslot)
{
	TupleTableSlot *slot = ExecGetTriggerOldSlot(estate, relinfo);
	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
	bool		result = true;
	TriggerData LocTriggerData = {0};
	HeapTuple	trigtuple;
	bool		should_free = false;
	int			i;

	Assert(HeapTupleIsValid(fdw_trigtuple) ^ ItemPointerIsValid(tupleid));
	if (fdw_trigtuple == NULL)
	{
		TupleTableSlot *epqslot_candidate = NULL;

		if (!GetTupleForTrigger(estate, epqstate, relinfo, tupleid,
								LockTupleExclusive, slot, &epqslot_candidate,
								NULL))
			return false;

		/*
		 * If the tuple was concurrently updated and the caller of this
		 * function requested the updated tuple, skip the trigger execution.
		 */
		if (epqslot_candidate != NULL && epqslot != NULL)
		{
			*epqslot = epqslot_candidate;
			return false;
		}

		trigtuple = ExecFetchSlotHeapTuple(slot, true, &should_free);
	}
	else
	{
		trigtuple = fdw_trigtuple;
		ExecForceStoreHeapTuple(trigtuple, slot, false);
	}

	LocTriggerData.type = T_TriggerData;
	LocTriggerData.tg_event = TRIGGER_EVENT_DELETE |
		TRIGGER_EVENT_ROW |
		TRIGGER_EVENT_BEFORE;
	LocTriggerData.tg_relation = relinfo->ri_RelationDesc;
	for (i = 0; i < trigdesc->numtriggers; i++)
	{
		HeapTuple	newtuple;
		Trigger    *trigger = &trigdesc->triggers[i];

		if (!TRIGGER_TYPE_MATCHES(trigger->tgtype,
								  TRIGGER_TYPE_ROW,
								  TRIGGER_TYPE_BEFORE,
								  TRIGGER_TYPE_DELETE))
			continue;
		if (!TriggerEnabled(estate, relinfo, trigger, LocTriggerData.tg_event,
							NULL, slot, NULL))
			continue;

		LocTriggerData.tg_trigslot = slot;
		LocTriggerData.tg_trigtuple = trigtuple;
		LocTriggerData.tg_trigger = trigger;
		newtuple = ExecCallTriggerFunc(&LocTriggerData,
									   i,
									   relinfo->ri_TrigFunctions,
									   relinfo->ri_TrigInstrument,
									   GetPerTupleMemoryContext(estate));
		if (newtuple == NULL)
		{
			result = false;		/* tell caller to suppress delete */
			break;
		}
		if (newtuple != trigtuple)
			heap_freetuple(newtuple);
	}
	if (should_free)
		heap_freetuple(trigtuple);

	return result;
}
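/*
 * Illustrative sketch (added for clarity; not part of the original source,
 * names are hypothetical): a BEFORE ROW DELETE trigger suppresses the
 * delete by returning NULL, which reaches ExecBRDeleteTriggers() above as a
 * NULL result from ExecCallTriggerFunc(), setting result = false.
 *
 *	CREATE FUNCTION block_delete() RETURNS trigger AS $$
 *	BEGIN
 *		RETURN NULL;		-- suppress the DELETE for this row
 *	END;
 *	$$ LANGUAGE plpgsql;
 *
 *	CREATE TRIGGER keep_rows BEFORE DELETE ON some_table
 *		FOR EACH ROW EXECUTE FUNCTION block_delete();
 */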

/*
 * Note: is_crosspart_update must be true if the DELETE is being performed
 * as part of a cross-partition update.
 */
void
ExecARDeleteTriggers(EState *estate,
					 ResultRelInfo *relinfo,
					 ItemPointer tupleid,
					 HeapTuple fdw_trigtuple,
					 TransitionCaptureState *transition_capture,
					 bool is_crosspart_update)
{
	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;

	if ((trigdesc && trigdesc->trig_delete_after_row) ||
		(transition_capture && transition_capture->tcs_delete_old_table))
	{
		TupleTableSlot *slot = ExecGetTriggerOldSlot(estate, relinfo);

		Assert(HeapTupleIsValid(fdw_trigtuple) ^ ItemPointerIsValid(tupleid));
		if (fdw_trigtuple == NULL)
			GetTupleForTrigger(estate,
							   NULL,
							   relinfo,
							   tupleid,
							   LockTupleExclusive,
							   slot,
							   NULL,
							   NULL);
		else
			ExecForceStoreHeapTuple(fdw_trigtuple, slot, false);

		AfterTriggerSaveEvent(estate, relinfo, NULL, NULL,
							  TRIGGER_EVENT_DELETE,
							  true, slot, NULL, NIL, NULL,
							  transition_capture,
							  is_crosspart_update);
	}
}

bool
ExecIRDeleteTriggers(EState *estate, ResultRelInfo *relinfo,
					 HeapTuple trigtuple)
{
	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
	TupleTableSlot *slot = ExecGetTriggerOldSlot(estate, relinfo);
	TriggerData LocTriggerData = {0};
	int			i;

	LocTriggerData.type = T_TriggerData;
	LocTriggerData.tg_event = TRIGGER_EVENT_DELETE |
		TRIGGER_EVENT_ROW |
		TRIGGER_EVENT_INSTEAD;
	LocTriggerData.tg_relation = relinfo->ri_RelationDesc;

	ExecForceStoreHeapTuple(trigtuple, slot, false);

	for (i = 0; i < trigdesc->numtriggers; i++)
	{
		HeapTuple	rettuple;
		Trigger    *trigger = &trigdesc->triggers[i];

		if (!TRIGGER_TYPE_MATCHES(trigger->tgtype,
								  TRIGGER_TYPE_ROW,
								  TRIGGER_TYPE_INSTEAD,
								  TRIGGER_TYPE_DELETE))
			continue;
		if (!TriggerEnabled(estate, relinfo, trigger, LocTriggerData.tg_event,
							NULL, slot, NULL))
			continue;

		LocTriggerData.tg_trigslot = slot;
		LocTriggerData.tg_trigtuple = trigtuple;
		LocTriggerData.tg_trigger = trigger;
		rettuple = ExecCallTriggerFunc(&LocTriggerData,
									   i,
									   relinfo->ri_TrigFunctions,
									   relinfo->ri_TrigInstrument,
									   GetPerTupleMemoryContext(estate));
		if (rettuple == NULL)
			return false;		/* Delete was suppressed */
		if (rettuple != trigtuple)
			heap_freetuple(rettuple);
	}
	return true;
}
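/*
 * Illustrative sketch (added for clarity; not part of the original source,
 * names are hypothetical): INSTEAD OF DELETE triggers fire for views.
 * ExecIRDeleteTriggers() above returns false when the trigger function
 * returns NULL, telling the executor the delete was suppressed.
 *
 *	CREATE TRIGGER v_del INSTEAD OF DELETE ON some_view
 *		FOR EACH ROW EXECUTE FUNCTION delete_from_base();
 */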

void
ExecBSUpdateTriggers(EState *estate, ResultRelInfo *relinfo)
{
	TriggerDesc *trigdesc;
	int			i;
	TriggerData LocTriggerData = {0};
	Bitmapset  *updatedCols;

	trigdesc = relinfo->ri_TrigDesc;

	if (trigdesc == NULL)
		return;
	if (!trigdesc->trig_update_before_statement)
		return;

	/* no-op if we already fired BS triggers in this context */
	if (before_stmt_triggers_fired(RelationGetRelid(relinfo->ri_RelationDesc),
								   CMD_UPDATE))
		return;

	/* statement-level triggers operate on the parent table */
	Assert(relinfo->ri_RootResultRelInfo == NULL);

	updatedCols = ExecGetAllUpdatedCols(relinfo, estate);

	LocTriggerData.type = T_TriggerData;
	LocTriggerData.tg_event = TRIGGER_EVENT_UPDATE |
		TRIGGER_EVENT_BEFORE;
	LocTriggerData.tg_relation = relinfo->ri_RelationDesc;
	LocTriggerData.tg_updatedcols = updatedCols;
	for (i = 0; i < trigdesc->numtriggers; i++)
	{
		Trigger    *trigger = &trigdesc->triggers[i];
		HeapTuple	newtuple;

		if (!TRIGGER_TYPE_MATCHES(trigger->tgtype,
								  TRIGGER_TYPE_STATEMENT,
								  TRIGGER_TYPE_BEFORE,
								  TRIGGER_TYPE_UPDATE))
			continue;
		if (!TriggerEnabled(estate, relinfo, trigger, LocTriggerData.tg_event,
							updatedCols, NULL, NULL))
			continue;

		LocTriggerData.tg_trigger = trigger;
		newtuple = ExecCallTriggerFunc(&LocTriggerData,
									   i,
									   relinfo->ri_TrigFunctions,
									   relinfo->ri_TrigInstrument,
									   GetPerTupleMemoryContext(estate));

		if (newtuple)
			ereport(ERROR,
					(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
					 errmsg("BEFORE STATEMENT trigger cannot return a value")));
	}
}
|
|
|
|
|
|
|
|
void
ExecASUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
					 TransitionCaptureState *transition_capture)
{
	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;

	/* statement-level triggers operate on the parent table */
	Assert(relinfo->ri_RootResultRelInfo == NULL);

	if (trigdesc && trigdesc->trig_update_after_statement)
		AfterTriggerSaveEvent(estate, relinfo, NULL, NULL,
							  TRIGGER_EVENT_UPDATE,
							  false, NULL, NULL, NIL,
							  ExecGetAllUpdatedCols(relinfo, estate),
							  transition_capture,
							  false);
}

bool
|
Re-implement EvalPlanQual processing to improve its performance and eliminate
a lot of strange behaviors that occurred in join cases. We now identify the
"current" row for every joined relation in UPDATE, DELETE, and SELECT FOR
UPDATE/SHARE queries. If an EvalPlanQual recheck is necessary, we jam the
appropriate row into each scan node in the rechecking plan, forcing it to emit
only that one row. The former behavior could rescan the whole of each joined
relation for each recheck, which was terrible for performance, and what's much
worse could result in duplicated output tuples.
Also, the original implementation of EvalPlanQual could not re-use the recheck
execution tree --- it had to go through a full executor init and shutdown for
every row to be tested. To avoid this overhead, I've associated a special
runtime Param with each LockRows or ModifyTable plan node, and arranged to
make every scan node below such a node depend on that Param. Thus, by
signaling a change in that Param, the EPQ machinery can just rescan the
already-built test plan.
This patch also adds a prohibition on set-returning functions in the
targetlist of SELECT FOR UPDATE/SHARE. This is needed to avoid the
duplicate-output-tuple problem. It seems fairly reasonable since the
other restrictions on SELECT FOR UPDATE are meant to ensure that there
is a unique correspondence between source tuples and result tuples,
which an output SRF destroys as much as anything else does.
2009-10-26 03:26:45 +01:00
|
|
|
ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
|
2009-10-10 03:43:50 +02:00
|
|
|
ResultRelInfo *relinfo,
|
2014-03-23 07:16:34 +01:00
|
|
|
ItemPointer tupleid,
|
|
|
|
HeapTuple fdw_trigtuple,
|
2022-03-28 16:45:58 +02:00
|
|
|
TupleTableSlot *newslot,
|
|
|
|
TM_FailureData *tmfd)
|
1997-09-01 09:59:06 +02:00
|
|
|
{
|
2001-06-01 04:41:36 +02:00
|
|
|
TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
|
2019-02-27 05:30:28 +01:00
|
|
|
TupleTableSlot *oldslot = ExecGetTriggerOldSlot(estate, relinfo);
|
|
|
|
HeapTuple newtuple = NULL;
|
1998-12-15 13:47:01 +01:00
|
|
|
HeapTuple trigtuple;
|
2019-02-27 05:30:28 +01:00
|
|
|
bool should_free_trig = false;
|
|
|
|
bool should_free_new = false;
|
2020-02-24 10:12:10 +01:00
|
|
|
TriggerData LocTriggerData = {0};
|
1998-12-15 13:47:01 +01:00
|
|
|
int i;
|
2015-05-08 00:20:46 +02:00
|
|
|
Bitmapset *updatedCols;
|
Improve concurrency of foreign key locking
This patch introduces two additional lock modes for tuples: "SELECT FOR
KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each
other, in contrast with already existing "SELECT FOR SHARE" and "SELECT
FOR UPDATE". UPDATE commands that do not modify the values stored in
the columns that are part of the key of the tuple now grab a SELECT FOR
NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently
with tuple locks of the FOR KEY SHARE variety.
Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this
means the concurrency improvement applies to them, which is the whole
point of this patch.
The added tuple lock semantics require some rejiggering of the multixact
module, so that the locking level that each transaction is holding can
be stored alongside its Xid. Also, multixacts now need to persist
across server restarts and crashes, because they can now represent not
only tuple locks, but also tuple updates. This means we need more
careful tracking of lifetime of pg_multixact SLRU files; since they now
persist longer, we require more infrastructure to figure out when they
can be removed. pg_upgrade also needs to be careful to copy
pg_multixact files over from the old server to the new, or at least part
of multixact.c state, depending on the versions of the old and new
servers.
Tuple time qualification rules (HeapTupleSatisfies routines) need to be
careful not to consider tuples with the "is multi" infomask bit set as
being only locked; they might need to look up MultiXact values (i.e.
possibly do pg_multixact I/O) to find out the Xid that updated a tuple,
whereas they previously were assured to only use information readily
available from the tuple header. This is considered acceptable, because
the extra I/O would involve cases that would previously cause some
commands to block waiting for concurrent transactions to finish.
Another important change is the fact that locking tuples that have
previously been updated causes the future versions to be marked as
locked, too; this is essential for correctness of foreign key checks.
This causes additional WAL-logging, also (there was previously a single
WAL record for a locked tuple; now there are as many as updated copies
of the tuple there exist.)
With all this in place, contention related to tuples being checked by
foreign key rules should be much reduced.
As a bonus, the old behavior that a subtransaction grabbing a stronger
tuple lock than the parent (sub)transaction held on a given tuple and
later aborting caused the weaker lock to be lost, has been fixed.
Many new spec files were added for isolation tester framework, to ensure
overall behavior is sane. There's probably room for several more tests.
There were several reviewers of this patch; in particular, Noah Misch
and Andres Freund spent considerable time in it. Original idea for the
patch came from Simon Riggs, after a problem report by Joel Jacobson.
Most code is from me, with contributions from Marti Raudsepp, Alexander
Shulgin, Noah Misch and Andres Freund.
This patch was discussed in several pgsql-hackers threads; the most
important start at the following message-ids:
AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
1290721684-sup-3951@alvh.no-ip.org
1294953201-sup-2099@alvh.no-ip.org
1320343602-sup-2290@alvh.no-ip.org
1339690386-sup-8927@alvh.no-ip.org
4FE5FF020200002500048A3D@gw.wicourts.gov
4FEAB90A0200002500048B7D@gw.wicourts.gov
2013-01-23 16:04:59 +01:00
|
|
|
LockTupleMode lockmode;
|
|
|
|
|
Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE.
The newly added ON CONFLICT clause allows to specify an alternative to
raising a unique or exclusion constraint violation error when inserting.
ON CONFLICT refers to constraints that can either be specified using a
inference clause (by specifying the columns of a unique constraint) or
by naming a unique or exclusion constraint. DO NOTHING avoids the
constraint violation, without touching the pre-existing row. DO UPDATE
SET ... [WHERE ...] updates the pre-existing tuple, and has access to
both the tuple proposed for insertion and the existing tuple; the
optional WHERE clause can be used to prevent an update from being
executed. The UPDATE SET and WHERE clauses have access to the tuple
proposed for insertion using the "magic" EXCLUDED alias, and to the
pre-existing tuple using the table name or its alias.
This feature is often referred to as upsert.
This is implemented using a new infrastructure called "speculative
insertion". It is an optimistic variant of regular insertion that first
does a pre-check for existing tuples and then attempts an insert. If a
violating tuple was inserted concurrently, the speculatively inserted
tuple is deleted and a new attempt is made. If the pre-check finds a
matching tuple the alternative DO NOTHING or DO UPDATE action is taken.
If the insertion succeeds without detecting a conflict, the tuple is
deemed inserted.
To handle the possible ambiguity between the excluded alias and a table
named excluded, and for convenience with long relation names, INSERT
INTO now can alias its target table.
Bumps catversion as stored rules change.
Author: Peter Geoghegan, with significant contributions from Heikki
Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs,
Dean Rasheed, Stephen Frost and many others.
2015-05-08 05:31:36 +02:00
|
|
|
/* Determine lock mode to use */
|
|
|
|
lockmode = ExecUpdateLockMode(estate, relinfo);
|
1998-12-15 13:47:01 +01:00
|
|
|
|
2014-03-23 07:16:34 +01:00
|
|
|
Assert(HeapTupleIsValid(fdw_trigtuple) ^ ItemPointerIsValid(tupleid));
|
|
|
|
if (fdw_trigtuple == NULL)
|
|
|
|
{
|
2019-10-04 20:59:34 +02:00
|
|
|
TupleTableSlot *epqslot_candidate = NULL;
|
2019-02-27 05:30:28 +01:00
|
|
|
|
2014-03-23 07:16:34 +01:00
|
|
|
/* get a copy of the on-disk tuple we are planning to update */
|
2019-02-27 05:30:28 +01:00
|
|
|
if (!GetTupleForTrigger(estate, epqstate, relinfo, tupleid,
|
2022-03-28 16:45:58 +02:00
|
|
|
lockmode, oldslot, &epqslot_candidate,
|
|
|
|
tmfd))
|
2019-02-27 05:30:28 +01:00
|
|
|
return false; /* cancel the update action */
|
|
|
|
|
|
|
|
/*
|
|
|
|
* In READ COMMITTED isolation level it's possible that target tuple
|
|
|
|
* was changed due to concurrent update. In that case we have a raw
|
Rework planning and execution of UPDATE and DELETE.
This patch makes two closely related sets of changes:
1. For UPDATE, the subplan of the ModifyTable node now only delivers
the new values of the changed columns (i.e., the expressions computed
in the query's SET clause) plus row identity information such as CTID.
ModifyTable must re-fetch the original tuple to merge in the old
values of any unchanged columns. The core advantage of this is that
the changed columns are uniform across all tables of an inherited or
partitioned target relation, whereas the other columns might not be.
A secondary advantage, when the UPDATE involves joins, is that less
data needs to pass through the plan tree. The disadvantage of course
is an extra fetch of each tuple to be updated. However, that seems to
be very nearly free in context; even worst-case tests don't show it to
add more than a couple percent to the total query cost. At some point
it might be interesting to combine the re-fetch with the tuple access
that ModifyTable must do anyway to mark the old tuple dead; but that
would require a good deal of refactoring and it seems it wouldn't buy
all that much, so this patch doesn't attempt it.
2. For inherited UPDATE/DELETE, instead of generating a separate
subplan for each target relation, we now generate a single subplan
that is just exactly like a SELECT's plan, then stick ModifyTable
on top of that. To let ModifyTable know which target relation a
given incoming row refers to, a tableoid junk column is added to
the row identity information. This gets rid of the horrid hack
that was inheritance_planner(), eliminating O(N^2) planning cost
and memory consumption in cases where there were many unprunable
target relations.
Point 2 of course requires point 1, so that there is a uniform
definition of the non-junk columns to be returned by the subplan.
We can't insist on uniform definition of the row identity junk
columns however, if we want to keep the ability to have both
plain and foreign tables in a partitioning hierarchy. Since
it wouldn't scale very far to have every child table have its
own row identity column, this patch includes provisions to merge
similar row identity columns into one column of the subplan result.
In particular, we can merge the whole-row Vars typically used as
row identity by FDWs into one column by pretending they are type
RECORD. (It's still okay for the actual composite Datums to be
labeled with the table's rowtype OID, though.)
There is more that can be done to file down residual inefficiencies
in this patch, but it seems to be committable now.
FDW authors should note several API changes:
* The argument list for AddForeignUpdateTargets() has changed, and so
has the method it must use for adding junk columns to the query. Call
add_row_identity_var() instead of manipulating the parse tree directly.
You might want to reconsider exactly what you're adding, too.
* PlanDirectModify() must now work a little harder to find the
ForeignScan plan node; if the foreign table is part of a partitioning
hierarchy then the ForeignScan might not be the direct child of
ModifyTable. See postgres_fdw for sample code.
* To check whether a relation is a target relation, it's no
longer sufficient to compare its relid to root->parse->resultRelation.
Instead, check it against all_result_relids or leaf_result_relids,
as appropriate.
Amit Langote and Tom Lane
Discussion: https://postgr.es/m/CA+HiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ@mail.gmail.com
2021-03-31 17:52:34 +02:00
|
|
|
* subplan output tuple in epqslot_candidate, and need to form a new
|
|
|
|
* insertable tuple using ExecGetUpdateNewTuple to replace the one we
|
|
|
|
* received in newslot. Neither we nor our callers have any further
|
|
|
|
* interest in the passed-in tuple, so it's okay to overwrite newslot
|
|
|
|
* with the newer data.
|
2019-02-27 05:30:28 +01:00
|
|
|
*
|
Rework planning and execution of UPDATE and DELETE.
This patch makes two closely related sets of changes:
1. For UPDATE, the subplan of the ModifyTable node now only delivers
the new values of the changed columns (i.e., the expressions computed
in the query's SET clause) plus row identity information such as CTID.
ModifyTable must re-fetch the original tuple to merge in the old
values of any unchanged columns. The core advantage of this is that
the changed columns are uniform across all tables of an inherited or
partitioned target relation, whereas the other columns might not be.
A secondary advantage, when the UPDATE involves joins, is that less
data needs to pass through the plan tree. The disadvantage of course
is an extra fetch of each tuple to be updated. However, that seems to
be very nearly free in context; even worst-case tests don't show it to
add more than a couple percent to the total query cost. At some point
it might be interesting to combine the re-fetch with the tuple access
that ModifyTable must do anyway to mark the old tuple dead; but that
would require a good deal of refactoring and it seems it wouldn't buy
all that much, so this patch doesn't attempt it.
2. For inherited UPDATE/DELETE, instead of generating a separate
subplan for each target relation, we now generate a single subplan
that is just exactly like a SELECT's plan, then stick ModifyTable
on top of that. To let ModifyTable know which target relation a
given incoming row refers to, a tableoid junk column is added to
the row identity information. This gets rid of the horrid hack
that was inheritance_planner(), eliminating O(N^2) planning cost
and memory consumption in cases where there were many unprunable
target relations.
Point 2 of course requires point 1, so that there is a uniform
definition of the non-junk columns to be returned by the subplan.
We can't insist on uniform definition of the row identity junk
columns however, if we want to keep the ability to have both
plain and foreign tables in a partitioning hierarchy. Since
it wouldn't scale very far to have every child table have its
own row identity column, this patch includes provisions to merge
similar row identity columns into one column of the subplan result.
In particular, we can merge the whole-row Vars typically used as
row identity by FDWs into one column by pretending they are type
RECORD. (It's still okay for the actual composite Datums to be
labeled with the table's rowtype OID, though.)
There is more that can be done to file down residual inefficiencies
in this patch, but it seems to be committable now.
FDW authors should note several API changes:
* The argument list for AddForeignUpdateTargets() has changed, and so
has the method it must use for adding junk columns to the query. Call
add_row_identity_var() instead of manipulating the parse tree directly.
You might want to reconsider exactly what you're adding, too.
* PlanDirectModify() must now work a little harder to find the
ForeignScan plan node; if the foreign table is part of a partitioning
hierarchy then the ForeignScan might not be the direct child of
ModifyTable. See postgres_fdw for sample code.
* To check whether a relation is a target relation, it's no
longer sufficient to compare its relid to root->parse->resultRelation.
Instead, check it against all_result_relids or leaf_result_relids,
as appropriate.
Amit Langote and Tom Lane
Discussion: https://postgr.es/m/CA+HiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ@mail.gmail.com
2021-03-31 17:52:34 +02:00
|
|
|
* (Typically, newslot was also generated by ExecGetUpdateNewTuple, so
|
|
|
|
* that epqslot_clean will be that same slot and the copy step below
|
|
|
|
* is not needed.)
|
2019-02-27 05:30:28 +01:00
|
|
|
*/
|
2019-10-04 20:59:34 +02:00
|
|
|
if (epqslot_candidate != NULL)
|
2019-02-27 05:30:28 +01:00
|
|
|
{
|
2019-10-04 20:59:34 +02:00
|
|
|
TupleTableSlot *epqslot_clean;
|
2019-02-27 05:30:28 +01:00
|
|
|
|
Rework planning and execution of UPDATE and DELETE.
This patch makes two closely related sets of changes:
1. For UPDATE, the subplan of the ModifyTable node now only delivers
the new values of the changed columns (i.e., the expressions computed
in the query's SET clause) plus row identity information such as CTID.
ModifyTable must re-fetch the original tuple to merge in the old
values of any unchanged columns. The core advantage of this is that
the changed columns are uniform across all tables of an inherited or
partitioned target relation, whereas the other columns might not be.
A secondary advantage, when the UPDATE involves joins, is that less
data needs to pass through the plan tree. The disadvantage of course
is an extra fetch of each tuple to be updated. However, that seems to
be very nearly free in context; even worst-case tests don't show it to
add more than a couple percent to the total query cost. At some point
it might be interesting to combine the re-fetch with the tuple access
that ModifyTable must do anyway to mark the old tuple dead; but that
would require a good deal of refactoring and it seems it wouldn't buy
all that much, so this patch doesn't attempt it.
2. For inherited UPDATE/DELETE, instead of generating a separate
subplan for each target relation, we now generate a single subplan
that is just exactly like a SELECT's plan, then stick ModifyTable
on top of that. To let ModifyTable know which target relation a
given incoming row refers to, a tableoid junk column is added to
the row identity information. This gets rid of the horrid hack
that was inheritance_planner(), eliminating O(N^2) planning cost
and memory consumption in cases where there were many unprunable
target relations.
Point 2 of course requires point 1, so that there is a uniform
definition of the non-junk columns to be returned by the subplan.
We can't insist on uniform definition of the row identity junk
columns however, if we want to keep the ability to have both
plain and foreign tables in a partitioning hierarchy. Since
it wouldn't scale very far to have every child table have its
own row identity column, this patch includes provisions to merge
similar row identity columns into one column of the subplan result.
In particular, we can merge the whole-row Vars typically used as
row identity by FDWs into one column by pretending they are type
RECORD. (It's still okay for the actual composite Datums to be
labeled with the table's rowtype OID, though.)
There is more that can be done to file down residual inefficiencies
in this patch, but it seems to be committable now.
FDW authors should note several API changes:
* The argument list for AddForeignUpdateTargets() has changed, and so
has the method it must use for adding junk columns to the query. Call
add_row_identity_var() instead of manipulating the parse tree directly.
You might want to reconsider exactly what you're adding, too.
* PlanDirectModify() must now work a little harder to find the
ForeignScan plan node; if the foreign table is part of a partitioning
hierarchy then the ForeignScan might not be the direct child of
ModifyTable. See postgres_fdw for sample code.
* To check whether a relation is a target relation, it's no
longer sufficient to compare its relid to root->parse->resultRelation.
Instead, check it against all_result_relids or leaf_result_relids,
as appropriate.
Amit Langote and Tom Lane
Discussion: https://postgr.es/m/CA+HiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ@mail.gmail.com
2021-03-31 17:52:34 +02:00
|
|
|
epqslot_clean = ExecGetUpdateNewTuple(relinfo, epqslot_candidate,
|
|
|
|
oldslot);
|
2019-10-04 20:59:34 +02:00
|
|
|
|
|
|
|
if (newslot != epqslot_clean)
|
|
|
|
ExecCopySlot(newslot, epqslot_clean);
|
2019-02-27 05:30:28 +01:00
|
|
|
}
|
|
|
|
|
In INSERT/UPDATE, use the table's real tuple descriptor as target.
Previously, ExecInitModifyTable relied on ExecInitJunkFilter,
and thence ExecCleanTypeFromTL, to build the target descriptor from
the query tlist. While we just checked (in ExecCheckPlanOutput)
that the tlist produces compatible output, this is not a great
substitute for the relation's actual tuple descriptor that's
available from the relcache. For one thing, dropped columns will
not be correctly marked attisdropped; it's a bit surprising that
we've gotten away with that this long. But the real reason for
being concerned with this is that using the table's descriptor means
that the slot will have correct attrmissing data, allowing us to
revert the klugy fix of commit ba9f18abd. (This commit undoes
that one's changes in trigger.c, but keeps the new test case.)
Thus we can solve the bogus-trigger-tuple problem with fewer cycles
rather than more.
No back-patch, since this doesn't fix any additional bug, and it
seems somewhat more likely to have unforeseen side effects than
ba9f18abd's narrow fix.
Discussion: https://postgr.es/m/16644-5da7ef98a7ac4545@postgresql.org
2020-10-26 16:36:53 +01:00
|
|
|
trigtuple = ExecFetchSlotHeapTuple(oldslot, true, &should_free_trig);
|
2014-03-23 07:16:34 +01:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
2019-04-19 20:33:37 +02:00
|
|
|
ExecForceStoreHeapTuple(fdw_trigtuple, oldslot, false);
|
2014-03-23 07:16:34 +01:00
|
|
|
trigtuple = fdw_trigtuple;
|
2011-02-22 03:18:04 +01:00
|
|
|
}
|
1999-01-29 12:56:01 +01:00
|
|
|
|
2000-05-29 03:59:17 +02:00
|
|
|
LocTriggerData.type = T_TriggerData;
|
2003-01-08 23:28:32 +01:00
|
|
|
LocTriggerData.tg_event = TRIGGER_EVENT_UPDATE |
|
|
|
|
TRIGGER_EVENT_ROW |
|
|
|
|
TRIGGER_EVENT_BEFORE;
|
2001-06-01 04:41:36 +02:00
|
|
|
LocTriggerData.tg_relation = relinfo->ri_RelationDesc;
|
Fix permission checks on constraint violation errors on partitions.
If a cross-partition UPDATE violates a constraint on the target partition,
and the columns in the new partition are in different physical order than
in the parent, the error message can reveal columns that the user does not
have SELECT permission on. A similar bug was fixed earlier in commit
804b6b6db4.
The cause of the bug is that the callers of the
ExecBuildSlotValueDescription() function got confused when constructing
the list of modified columns. If the tuple was routed from a parent, we
converted the tuple to the parent's format, but the list of modified
columns was grabbed directly from the child's RTE entry.
ExecUpdateLockMode() had a similar issue. That led to confusion about which
columns are key columns, causing the wrong tuple lock to be taken on tables
referenced by foreign keys when a row is updated with INSERT ON CONFLICT
UPDATE. A new isolation test is added for that corner case.
With this patch, the ri_RangeTableIndex field is no longer set for
partitions that don't have an entry in the range table. Previously, it was
set to the RTE entry of the parent relation, but that was confusing.
NOTE: This modifies the ResultRelInfo struct, replacing the
ri_PartitionRoot field with ri_RootResultRelInfo. That's a bit risky to
backpatch, because it breaks any extensions accessing the field. The
change that ri_RangeTableIndex is not set for partitions could potentially
break extensions, too. The ResultRelInfos are visible to FDWs at least,
and this patch required small changes to postgres_fdw. Nevertheless, this
seems like the least bad option. I don't think these fields are widely used
in extensions, and I don't think there are FDWs out there that use the FDW
"direct update" API other than postgres_fdw. If there are, you will get a
compilation error, so hopefully it is caught quickly.
Backpatch to 11, where support for both cross-partition UPDATEs and unique
indexes on partitioned tables was added.
Reviewed-by: Amit Langote
Security: CVE-2021-3393
2021-02-08 10:01:51 +01:00
|
|
|
updatedCols = ExecGetAllUpdatedCols(relinfo, estate);
|
2020-03-09 09:22:22 +01:00
|
|
|
LocTriggerData.tg_updatedcols = updatedCols;
|
2010-10-10 19:43:33 +02:00
|
|
|
for (i = 0; i < trigdesc->numtriggers; i++)
|
1997-09-11 09:24:37 +02:00
|
|
|
{
|
2010-10-10 19:43:33 +02:00
|
|
|
Trigger *trigger = &trigdesc->triggers[i];
|
2019-02-27 05:30:28 +01:00
|
|
|
HeapTuple oldtuple;
|
2001-06-01 04:41:36 +02:00
|
|
|
|
2010-10-10 19:43:33 +02:00
|
|
|
if (!TRIGGER_TYPE_MATCHES(trigger->tgtype,
|
|
|
|
TRIGGER_TYPE_ROW,
|
|
|
|
TRIGGER_TYPE_BEFORE,
|
|
|
|
TRIGGER_TYPE_UPDATE))
|
|
|
|
continue;
|
2009-11-20 21:38:12 +01:00
|
|
|
if (!TriggerEnabled(estate, relinfo, trigger, LocTriggerData.tg_event,
|
2019-02-27 05:30:28 +01:00
|
|
|
updatedCols, oldslot, newslot))
|
2009-10-15 00:14:25 +02:00
|
|
|
continue;
|
|
|
|
|
2019-02-27 05:30:28 +01:00
|
|
|
if (!newtuple)
|
|
|
|
newtuple = ExecFetchSlotHeapTuple(newslot, true, &should_free_new);
|
|
|
|
|
|
|
|
LocTriggerData.tg_trigslot = oldslot;
|
2000-05-29 03:59:17 +02:00
|
|
|
LocTriggerData.tg_trigtuple = trigtuple;
|
|
|
|
LocTriggerData.tg_newtuple = oldtuple = newtuple;
|
2019-02-27 05:30:28 +01:00
|
|
|
LocTriggerData.tg_newslot = newslot;
|
2001-06-01 04:41:36 +02:00
|
|
|
LocTriggerData.tg_trigger = trigger;
|
|
|
|
newtuple = ExecCallTriggerFunc(&LocTriggerData,
|
2010-10-10 19:43:33 +02:00
|
|
|
i,
|
2005-03-25 22:58:00 +01:00
|
|
|
relinfo->ri_TrigFunctions,
|
|
|
|
relinfo->ri_TrigInstrument,
|
2001-01-22 01:50:07 +01:00
|
|
|
GetPerTupleMemoryContext(estate));
|
2019-02-27 05:30:28 +01:00
|
|
|
|
1997-09-11 09:24:37 +02:00
|
|
|
if (newtuple == NULL)
|
2011-02-22 03:18:04 +01:00
|
|
|
{
|
2019-02-27 05:30:28 +01:00
|
|
|
if (should_free_trig)
|
2014-03-23 07:16:34 +01:00
|
|
|
heap_freetuple(trigtuple);
|
2019-02-27 05:30:28 +01:00
|
|
|
if (should_free_new)
|
|
|
|
heap_freetuple(oldtuple);
|
|
|
|
return false; /* "do nothing" */
|
2011-02-22 03:18:04 +01:00
|
|
|
}
|
2019-02-27 05:30:28 +01:00
|
|
|
else if (newtuple != oldtuple)
|
|
|
|
{
|
2019-04-19 20:33:37 +02:00
|
|
|
ExecForceStoreHeapTuple(newtuple, newslot, false);
|
2011-02-22 03:18:04 +01:00
|
|
|
|
2019-04-19 02:53:54 +02:00
|
|
|
/*
|
|
|
|
* If the tuple returned by the trigger (i.e., being stored) is the old
|
|
|
|
* row version, and the heap tuple passed to the trigger was
|
|
|
|
* allocated locally, materialize the slot. Otherwise we might
|
|
|
|
* free it while still referenced by the slot.
|
|
|
|
*/
|
|
|
|
if (should_free_trig && newtuple == trigtuple)
|
|
|
|
ExecMaterializeSlot(newslot);
|
|
|
|
|
2019-02-27 05:30:28 +01:00
|
|
|
if (should_free_new)
|
|
|
|
heap_freetuple(oldtuple);
|
2011-02-22 03:18:04 +01:00
|
|
|
|
2019-02-27 05:30:28 +01:00
|
|
|
/* signal tuple should be re-fetched if used */
|
|
|
|
newtuple = NULL;
|
|
|
|
}
|
2011-02-22 03:18:04 +01:00
|
|
|
}
|
2019-02-27 05:30:28 +01:00
|
|
|
if (should_free_trig)
|
|
|
|
heap_freetuple(trigtuple);
|
|
|
|
|
|
|
|
return true;
|
1997-09-01 09:59:06 +02:00
|
|
|
}
|
|
|
|
|
Enforce foreign key correctly during cross-partition updates
When an update on a partitioned table referenced in foreign key
constraints causes a row to move from one partition to another,
the fact that the move is implemented as a delete followed by an insert
on the target partition causes the foreign key triggers to have
surprising behavior. For example, a given foreign key's delete trigger
which implements the ON DELETE CASCADE clause of that key will delete
any referencing rows when triggered for that internal DELETE, although
it should not, because the referenced row is simply being moved from one
partition of the referenced root partitioned table into another, not
being deleted from it.
This commit teaches trigger.c to skip queuing such delete trigger events
on the leaf partitions in favor of an UPDATE event fired on the root
target relation. Doing so is sensible because both the old and the new
tuple "logically" belong to the root relation.
The after trigger event queuing interface now allows passing the source
and the target partitions of a particular cross-partition update when
registering the update event for the root partitioned table. Along with
the two ctids of the old and the new tuple, the after trigger event now
also stores the OIDs of those partitions. The tuples fetched from the
source and the target partitions are converted into the root table
format, if necessary, before they are passed to the trigger function.
The implementation currently has a limitation that only the foreign keys
pointing into the query's target relation are considered, not those of
its sub-partitioned partitions. That seems like a reasonable
limitation, because it should be rare to have distinct foreign keys
pointing to sub-partitioned partitions instead of to the root table.
This misbehavior stems from commit f56f8f8da6af (which added support for
foreign keys to reference partitioned tables) not paying sufficient
attention to commit 2f178441044b (which had introduced cross-partition
updates a year earlier). Even though the former commit goes back to
Postgres 12, we're not backpatching this fix at this time for fear of
destabilizing things too much, and because there are a few ABI breaks in
it that we'd have to work around in older branches. It also depends on
commit f4566345cf40, which had its own share of backpatchability issues
as well.
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Eduard Català <eduard.catala@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFvkBCmfwkQX_yBqv2Wz8ugUGiBDxum8=WvVbfU1TXaNg@mail.gmail.com
Discussion: https://postgr.es/m/CAL54xNZsLwEM1XCk5yW9EqaRzsZYHuWsHQkA2L5MOSKXAwviCQ@mail.gmail.com
2022-03-20 18:43:40 +01:00
|
|
|
/*
|
|
|
|
* Note: 'src_partinfo' and 'dst_partinfo', when non-NULL, refer to the source
|
|
|
|
* and destination partitions, respectively, of a cross-partition update of
|
|
|
|
* the root partitioned table mentioned in the query, given by 'relinfo'.
|
|
|
|
* 'tupleid' in that case refers to the ctid of the "old" tuple in the source
|
|
|
|
* partition, and 'newslot' contains the "new" tuple in the destination
|
|
|
|
* partition. This interface makes it possible to support the requirements of
|
|
|
|
* ExecCrossPartitionUpdateForeignKey(); is_crosspart_update must be true in
|
|
|
|
* that case.
|
|
|
|
*/
|
1997-09-01 09:59:06 +02:00
|
|
|
void
|
2001-06-01 04:41:36 +02:00
|
|
|
ExecARUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
|
2022-03-20 18:43:40 +01:00
|
|
|
ResultRelInfo *src_partinfo,
|
|
|
|
ResultRelInfo *dst_partinfo,
|
2014-03-23 07:16:34 +01:00
|
|
|
ItemPointer tupleid,
|
|
|
|
HeapTuple fdw_trigtuple,
|
2019-02-27 05:30:28 +01:00
|
|
|
TupleTableSlot *newslot,
|
2017-06-28 19:55:03 +02:00
|
|
|
List *recheckIndexes,
|
2022-03-20 18:43:40 +01:00
|
|
|
TransitionCaptureState *transition_capture,
|
|
|
|
bool is_crosspart_update)
|
1997-09-01 09:59:06 +02:00
|
|
|
{
|
2001-06-01 04:41:36 +02:00
|
|
|
TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2017-06-28 19:55:03 +02:00
|
|
|
if ((trigdesc && trigdesc->trig_update_after_row) ||
|
|
|
|
(transition_capture &&
|
|
|
|
(transition_capture->tcs_update_old_table ||
|
|
|
|
transition_capture->tcs_update_new_table)))
|
2001-01-27 06:16:58 +01:00
|
|
|
{
|
Allow UPDATE to move rows between partitions.
When an UPDATE causes a row to no longer match the partition
constraint, try to move it to a different partition where it does
match the partition constraint. In essence, the UPDATE is split into
a DELETE from the old partition and an INSERT into the new one. This
can lead to surprising behavior in concurrency scenarios because
EvalPlanQual rechecks won't work as they normally did; the known
problems are documented. (There is a pending patch to improve the
situation further, but it needs more review.)
Amit Khandekar, reviewed and tested by Amit Langote, David Rowley,
Rajkumar Raghuwanshi, Dilip Kumar, Amul Sul, Thomas Munro, Álvaro
Herrera, Amit Kapila, and me. A few final revisions by me.
Discussion: http://postgr.es/m/CAJ3gD9do9o2ccQ7j7+tSgiE1REY65XRiMb=yJO3u3QhyP8EEPQ@mail.gmail.com
2018-01-19 21:33:06 +01:00
|
|
|
/*
|
|
|
|
* Note: if the UPDATE is converted into a DELETE+INSERT as part of
|
|
|
|
* update-partition-key operation, then this function is also called
|
|
|
|
* separately for DELETE and INSERT to capture transition table rows.
|
|
|
|
* In such a case, either the old tuple or the new tuple can be NULL.
|
|
|
|
*/
|
2022-03-20 18:43:40 +01:00
|
|
|
TupleTableSlot *oldslot;
|
|
|
|
ResultRelInfo *tupsrc;
|
|
|
|
|
|
|
|
Assert((src_partinfo != NULL && dst_partinfo != NULL) ||
|
|
|
|
!is_crosspart_update);
|
|
|
|
|
|
|
|
tupsrc = src_partinfo ? src_partinfo : relinfo;
|
|
|
|
oldslot = ExecGetTriggerOldSlot(estate, tupsrc);
|
2021-03-31 02:01:27 +02:00
|
|
|
|
2018-01-19 21:33:06 +01:00
|
|
|
if (fdw_trigtuple == NULL && ItemPointerIsValid(tupleid))
|
2019-02-27 05:30:28 +01:00
|
|
|
GetTupleForTrigger(estate,
|
|
|
|
NULL,
|
2022-03-20 18:43:40 +01:00
|
|
|
tupsrc,
|
2019-02-27 05:30:28 +01:00
|
|
|
tupleid,
|
|
|
|
LockTupleExclusive,
|
|
|
|
oldslot,
|
2022-03-28 16:45:58 +02:00
|
|
|
NULL,
|
2019-02-27 05:30:28 +01:00
|
|
|
NULL);
|
|
|
|
else if (fdw_trigtuple != NULL)
|
2019-04-19 20:33:37 +02:00
|
|
|
ExecForceStoreHeapTuple(fdw_trigtuple, oldslot, false);
|
2021-03-31 02:01:27 +02:00
|
|
|
else
|
|
|
|
ExecClearTuple(oldslot);
|
2001-01-27 06:16:58 +01:00
|
|
|
|
2022-03-20 18:43:40 +01:00
|
|
|
AfterTriggerSaveEvent(estate, relinfo,
|
|
|
|
src_partinfo, dst_partinfo,
|
|
|
|
TRIGGER_EVENT_UPDATE,
|
|
|
|
true,
|
|
|
|
oldslot, newslot, recheckIndexes,
|
2021-02-08 10:01:51 +01:00
|
|
|
ExecGetAllUpdatedCols(relinfo, estate),
|
2022-03-20 18:43:40 +01:00
							  transition_capture,
							  is_crosspart_update);
	}
}

bool
ExecIRUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
					 HeapTuple trigtuple, TupleTableSlot *newslot)
{
	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
	TupleTableSlot *oldslot = ExecGetTriggerOldSlot(estate, relinfo);
	HeapTuple	newtuple = NULL;
	bool		should_free;
	TriggerData LocTriggerData = {0};
	int			i;

	LocTriggerData.type = T_TriggerData;
	LocTriggerData.tg_event = TRIGGER_EVENT_UPDATE |
		TRIGGER_EVENT_ROW |
		TRIGGER_EVENT_INSTEAD;
	LocTriggerData.tg_relation = relinfo->ri_RelationDesc;

	ExecForceStoreHeapTuple(trigtuple, oldslot, false);

	for (i = 0; i < trigdesc->numtriggers; i++)
	{
		Trigger    *trigger = &trigdesc->triggers[i];
		HeapTuple	oldtuple;

		if (!TRIGGER_TYPE_MATCHES(trigger->tgtype,
								  TRIGGER_TYPE_ROW,
								  TRIGGER_TYPE_INSTEAD,
								  TRIGGER_TYPE_UPDATE))
			continue;
		if (!TriggerEnabled(estate, relinfo, trigger, LocTriggerData.tg_event,
							NULL, oldslot, newslot))
			continue;

		if (!newtuple)
			newtuple = ExecFetchSlotHeapTuple(newslot, true, &should_free);

		LocTriggerData.tg_trigslot = oldslot;
		LocTriggerData.tg_trigtuple = trigtuple;
		LocTriggerData.tg_newslot = newslot;
		LocTriggerData.tg_newtuple = oldtuple = newtuple;

		LocTriggerData.tg_trigger = trigger;
		newtuple = ExecCallTriggerFunc(&LocTriggerData,
									   i,
									   relinfo->ri_TrigFunctions,
									   relinfo->ri_TrigInstrument,
									   GetPerTupleMemoryContext(estate));
		if (newtuple == NULL)
		{
			return false;		/* "do nothing" */
		}
		else if (newtuple != oldtuple)
		{
			ExecForceStoreHeapTuple(newtuple, newslot, false);

			if (should_free)
				heap_freetuple(oldtuple);

			/* signal tuple should be re-fetched if used */
			newtuple = NULL;
		}
	}

	return true;
}

void
ExecBSTruncateTriggers(EState *estate, ResultRelInfo *relinfo)
{
	TriggerDesc *trigdesc;
	int			i;
	TriggerData LocTriggerData = {0};

	trigdesc = relinfo->ri_TrigDesc;

	if (trigdesc == NULL)
		return;
	if (!trigdesc->trig_truncate_before_statement)
		return;

	LocTriggerData.type = T_TriggerData;
	LocTriggerData.tg_event = TRIGGER_EVENT_TRUNCATE |
		TRIGGER_EVENT_BEFORE;
	LocTriggerData.tg_relation = relinfo->ri_RelationDesc;

	for (i = 0; i < trigdesc->numtriggers; i++)
	{
		Trigger    *trigger = &trigdesc->triggers[i];
		HeapTuple	newtuple;

		if (!TRIGGER_TYPE_MATCHES(trigger->tgtype,
								  TRIGGER_TYPE_STATEMENT,
								  TRIGGER_TYPE_BEFORE,
								  TRIGGER_TYPE_TRUNCATE))
			continue;
		if (!TriggerEnabled(estate, relinfo, trigger, LocTriggerData.tg_event,
							NULL, NULL, NULL))
			continue;

		LocTriggerData.tg_trigger = trigger;
		newtuple = ExecCallTriggerFunc(&LocTriggerData,
									   i,
									   relinfo->ri_TrigFunctions,
									   relinfo->ri_TrigInstrument,
									   GetPerTupleMemoryContext(estate));

		if (newtuple)
			ereport(ERROR,
					(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
					 errmsg("BEFORE STATEMENT trigger cannot return a value")));
	}
}

void
ExecASTruncateTriggers(EState *estate, ResultRelInfo *relinfo)
{
	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;

	if (trigdesc && trigdesc->trig_truncate_after_statement)
		AfterTriggerSaveEvent(estate, relinfo,
							  NULL, NULL,
							  TRIGGER_EVENT_TRUNCATE,
							  false, NULL, NULL, NIL, NULL, NULL,
							  false);
}


/*
 * Fetch tuple into "oldslot", dealing with locking and EPQ if necessary
 */
static bool
GetTupleForTrigger(EState *estate,
Re-implement EvalPlanQual processing to improve its performance and eliminate
a lot of strange behaviors that occurred in join cases. We now identify the
"current" row for every joined relation in UPDATE, DELETE, and SELECT FOR
UPDATE/SHARE queries. If an EvalPlanQual recheck is necessary, we jam the
appropriate row into each scan node in the rechecking plan, forcing it to emit
only that one row. The former behavior could rescan the whole of each joined
relation for each recheck, which was terrible for performance, and what's much
worse could result in duplicated output tuples.
Also, the original implementation of EvalPlanQual could not re-use the recheck
execution tree --- it had to go through a full executor init and shutdown for
every row to be tested. To avoid this overhead, I've associated a special
runtime Param with each LockRows or ModifyTable plan node, and arranged to
make every scan node below such a node depend on that Param. Thus, by
signaling a change in that Param, the EPQ machinery can just rescan the
already-built test plan.
This patch also adds a prohibition on set-returning functions in the
targetlist of SELECT FOR UPDATE/SHARE. This is needed to avoid the
duplicate-output-tuple problem. It seems fairly reasonable since the
other restrictions on SELECT FOR UPDATE are meant to ensure that there
is a unique correspondence between source tuples and result tuples,
which an output SRF destroys as much as anything else does.
2009-10-26 03:26:45 +01:00
				   EPQState *epqstate,
				   ResultRelInfo *relinfo,
				   ItemPointer tid,
Improve concurrency of foreign key locking
This patch introduces two additional lock modes for tuples: "SELECT FOR
KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each
other, in contrast with already existing "SELECT FOR SHARE" and "SELECT
FOR UPDATE". UPDATE commands that do not modify the values stored in
the columns that are part of the key of the tuple now grab a SELECT FOR
NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently
with tuple locks of the FOR KEY SHARE variety.
Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this
means the concurrency improvement applies to them, which is the whole
point of this patch.
The added tuple lock semantics require some rejiggering of the multixact
module, so that the locking level that each transaction is holding can
be stored alongside its Xid. Also, multixacts now need to persist
across server restarts and crashes, because they can now represent not
only tuple locks, but also tuple updates. This means we need more
careful tracking of lifetime of pg_multixact SLRU files; since they now
persist longer, we require more infrastructure to figure out when they
can be removed. pg_upgrade also needs to be careful to copy
pg_multixact files over from the old server to the new, or at least part
of multixact.c state, depending on the versions of the old and new
servers.
Tuple time qualification rules (HeapTupleSatisfies routines) need to be
careful not to consider tuples with the "is multi" infomask bit set as
being only locked; they might need to look up MultiXact values (i.e.
possibly do pg_multixact I/O) to find out the Xid that updated a tuple,
whereas they previously were assured to only use information readily
available from the tuple header. This is considered acceptable, because
the extra I/O would involve cases that would previously cause some
commands to block waiting for concurrent transactions to finish.
Another important change is the fact that locking tuples that have
previously been updated causes the future versions to be marked as
locked, too; this is essential for correctness of foreign key checks.
This also causes additional WAL-logging (there was previously a single
WAL record for a locked tuple; now there are as many records as there
exist updated copies of the tuple).
With all this in place, contention related to tuples being checked by
foreign key rules should be much reduced.
As a bonus, the old behavior that a subtransaction grabbing a stronger
tuple lock than the parent (sub)transaction held on a given tuple and
later aborting caused the weaker lock to be lost, has been fixed.
Many new spec files were added for isolation tester framework, to ensure
overall behavior is sane. There's probably room for several more tests.
There were several reviewers of this patch; in particular, Noah Misch
and Andres Freund spent considerable time in it. Original idea for the
patch came from Simon Riggs, after a problem report by Joel Jacobson.
Most code is from me, with contributions from Marti Raudsepp, Alexander
Shulgin, Noah Misch and Andres Freund.
This patch was discussed in several pgsql-hackers threads; the most
important start at the following message-ids:
AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
1290721684-sup-3951@alvh.no-ip.org
1294953201-sup-2099@alvh.no-ip.org
1320343602-sup-2290@alvh.no-ip.org
1339690386-sup-8927@alvh.no-ip.org
4FE5FF020200002500048A3D@gw.wicourts.gov
4FEAB90A0200002500048B7D@gw.wicourts.gov
2013-01-23 16:04:59 +01:00
				   LockTupleMode lockmode,
				   TupleTableSlot *oldslot,
				   TupleTableSlot **epqslot,
				   TM_FailureData *tmfdp)
{
	Relation	relation = relinfo->ri_RelationDesc;

	if (epqslot != NULL)
	{
tableam: Add tuple_{insert, delete, update, lock} and use.
This adds new, required, table AM callbacks for insert/delete/update
and lock_tuple. To be able to reasonably use those, the EvalPlanQual
mechanism had to be adapted, moving more logic into the AM.
Previously both delete/update/lock call-sites and the EPQ mechanism had
to have awareness of the specific tuple format to be able to fetch the
latest version of a tuple. Obviously that needs to be abstracted
away. To do so, move the logic that finds the latest row version into
the AM. lock_tuple has a new flag argument,
TUPLE_LOCK_FLAG_FIND_LAST_VERSION, that forces it to lock the last
version, rather than the current one. It'd have been possible to do
so via a separate callback as well, but finding the last version
usually also necessitates locking the newest version, making it
sensible to combine the two. This replaces the previous use of
EvalPlanQualFetch(). Additionally HeapTupleUpdated, which previously
signaled either a concurrent update or delete, is now split into two,
to avoid callers needing AM-specific knowledge to differentiate.
The move of finding the latest row version into tuple_lock means that
encountering a row concurrently moved into another partition will now
raise an error about "tuple to be locked" rather than "tuple to be
updated/deleted" - which is accurate, as that always happens when
locking rows. While possibly slightly less helpful for users, it seems
like an acceptable trade-off.
As part of this commit HTSU_Result has been renamed to TM_Result, and
its members have been expanded to differentiate between updating and
deleting. HeapUpdateFailureData has been renamed to TM_FailureData.
The interface to speculative insertion is changed so nodeModifyTable.c
does not have to set the speculative token itself anymore. Instead
there's a version of tuple_insert, tuple_insert_speculative, that
performs the speculative insertion (without requiring a flag to signal
that fact), and the speculative insertion is either made permanent
with table_complete_speculative(succeeded = true) or aborted with
table_complete_speculative(succeeded = false).
Note that multi_insert is not yet routed through tableam, nor is
COPY. Changing multi_insert requires changes to copy.c that are large
enough to better be done separately.
Similarly, although simpler, CREATE TABLE AS and CREATE MATERIALIZED
VIEW are also only going to be adjusted in a later commit.
Author: Andres Freund and Haribabu Kommi
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20190313003903.nwvrxi7rw3ywhdel@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-24 03:55:57 +01:00
		TM_Result	test;
		TM_FailureData tmfd;
		int			lockflags = 0;

		*epqslot = NULL;

		/* caller must pass an epqstate if EvalPlanQual is possible */
		Assert(epqstate != NULL);

		/*
		 * lock tuple for update
		 */
		if (!IsolationUsesXactSnapshot())
			lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
		test = table_tuple_lock(relation, tid, estate->es_snapshot, oldslot,
								estate->es_output_cid,
								lockmode, LockWaitBlock,
								lockflags,
								&tmfd);

		/* Let the caller know about the status of this operation */
		if (tmfdp)
			*tmfdp = tmfd;

		switch (test)
		{
			case TM_SelfModified:

				/*
				 * The target tuple was already updated or deleted by the
				 * current command, or by a later command in the current
				 * transaction.  We ignore the tuple in the former case, and
				 * throw error in the latter case, for the same reasons
				 * enumerated in ExecUpdate and ExecDelete in
				 * nodeModifyTable.c.
				 */
				if (tmfd.cmax != estate->es_output_cid)
					ereport(ERROR,
							(errcode(ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION),
							 errmsg("tuple to be updated was already modified by an operation triggered by the current command"),
							 errhint("Consider using an AFTER trigger instead of a BEFORE trigger to propagate changes to other rows.")));

				/* treat it as deleted; do not process */
				return false;

|
|
|
			case TM_Ok:
				if (tmfd.traversed)
				{
					*epqslot = EvalPlanQual(epqstate,
											relation,
											relinfo->ri_RangeTableIndex,
											oldslot);

					/*
					 * If PlanQual failed for updated tuple - we must not
					 * process this tuple!
					 */
					if (TupIsNull(*epqslot))
					{
						*epqslot = NULL;
						return false;
					}
				}
				break;

			case TM_Updated:
				if (IsolationUsesXactSnapshot())
					ereport(ERROR,
							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
							 errmsg("could not serialize access due to concurrent update")));
				elog(ERROR, "unexpected table_tuple_lock status: %u", test);
				break;

			case TM_Deleted:
				if (IsolationUsesXactSnapshot())
					ereport(ERROR,
							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
							 errmsg("could not serialize access due to concurrent delete")));

				/* tuple was deleted */
				return false;

			case TM_Invisible:
				elog(ERROR, "attempted to lock invisible tuple");
				break;

			default:
				elog(ERROR, "unrecognized table_tuple_lock status: %u", test);
				return false;	/* keep compiler quiet */
		}
	}
	else
	{
		/*
		 * We expect the tuple to be present, thus very simple error handling
		 * suffices.
		 */
		if (!table_tuple_fetch_row_version(relation, tid, SnapshotAny,
										   oldslot))
			elog(ERROR, "failed to fetch tuple for trigger");
	}

	return true;
}

/*
 * Is trigger enabled to fire?
 */
static bool
TriggerEnabled(EState *estate, ResultRelInfo *relinfo,
			   Trigger *trigger, TriggerEvent event,
			   Bitmapset *modifiedCols,
			   TupleTableSlot *oldslot, TupleTableSlot *newslot)
{
	/* Check replication-role-dependent enable state */
	if (SessionReplicationRole == SESSION_REPLICATION_ROLE_REPLICA)
	{
		if (trigger->tgenabled == TRIGGER_FIRES_ON_ORIGIN ||
			trigger->tgenabled == TRIGGER_DISABLED)
			return false;
	}
	else						/* ORIGIN or LOCAL role */
	{
		if (trigger->tgenabled == TRIGGER_FIRES_ON_REPLICA ||
			trigger->tgenabled == TRIGGER_DISABLED)
			return false;
	}

	/*
	 * Check for column-specific trigger (only possible for UPDATE, and in
	 * fact we *must* ignore tgattr for other event types)
	 */
	if (trigger->tgnattr > 0 && TRIGGER_FIRED_BY_UPDATE(event))
	{
		int			i;
		bool		modified;

		modified = false;
		for (i = 0; i < trigger->tgnattr; i++)
		{
			if (bms_is_member(trigger->tgattr[i] - FirstLowInvalidHeapAttributeNumber,
							  modifiedCols))
			{
				modified = true;
				break;
			}
		}
		if (!modified)
			return false;
	}

	/* Check for WHEN clause */
	if (trigger->tgqual)
	{
		ExprState **predicate;
		ExprContext *econtext;
		MemoryContext oldContext;
		int			i;

		Assert(estate != NULL);

		/*
		 * trigger is an element of relinfo->ri_TrigDesc->triggers[]; find the
		 * matching element of relinfo->ri_TrigWhenExprs[]
		 */
		i = trigger - relinfo->ri_TrigDesc->triggers;
		predicate = &relinfo->ri_TrigWhenExprs[i];

		/*
		 * If first time through for this WHEN expression, build expression
		 * nodetrees for it.  Keep them in the per-query memory context so
		 * they'll survive throughout the query.
		 */
		if (*predicate == NULL)
		{
			Node	   *tgqual;

			oldContext = MemoryContextSwitchTo(estate->es_query_cxt);
			tgqual = stringToNode(trigger->tgqual);
			/* Change references to OLD and NEW to INNER_VAR and OUTER_VAR */
			ChangeVarNodes(tgqual, PRS2_OLD_VARNO, INNER_VAR, 0);
			ChangeVarNodes(tgqual, PRS2_NEW_VARNO, OUTER_VAR, 0);
			/* ExecPrepareQual wants implicit-AND form */
			tgqual = (Node *) make_ands_implicit((Expr *) tgqual);
			*predicate = ExecPrepareQual((List *) tgqual, estate);
			MemoryContextSwitchTo(oldContext);
		}

		/*
		 * We will use the EState's per-tuple context for evaluating WHEN
		 * expressions (creating it if it's not already there).
		 */
		econtext = GetPerTupleExprContext(estate);

		/*
		 * Finally evaluate the expression, making the old and/or new tuples
		 * available as INNER_VAR/OUTER_VAR respectively.
		 */
		econtext->ecxt_innertuple = oldslot;
		econtext->ecxt_outertuple = newslot;
		if (!ExecQual(*predicate, econtext))
			return false;
	}

	return true;
}


/* ----------
 * After-trigger stuff
 *
 * The AfterTriggersData struct holds data about pending AFTER trigger events
 * during the current transaction tree.  (BEFORE triggers are fired
 * immediately so we don't need any persistent state about them.)  The struct
 * and most of its subsidiary data are kept in TopTransactionContext; however
 * some data that can be discarded sooner appears in the CurTransactionContext
 * of the relevant subtransaction.  Also, the individual event records are
 * kept in a separate sub-context of TopTransactionContext.  This is done
 * mainly so that it's easy to tell from a memory context dump how much space
 * is being eaten by trigger events.
 *
 * Because the list of pending events can grow large, we go to some
 * considerable effort to minimize per-event memory consumption.  The event
 * records are grouped into chunks and common data for similar events in the
 * same chunk is only stored once.
 *
 * XXX We need to be able to save the per-event data in a file if it grows too
 * large.
 * ----------
 */

/* Per-trigger SET CONSTRAINT status */
typedef struct SetConstraintTriggerData
{
	Oid			sct_tgoid;
	bool		sct_tgisdeferred;
} SetConstraintTriggerData;

typedef struct SetConstraintTriggerData *SetConstraintTrigger;

/*
 * SET CONSTRAINT intra-transaction status.
 *
 * We make this a single palloc'd object so it can be copied and freed easily.
 *
 * all_isset and all_isdeferred are used to keep track
 * of SET CONSTRAINTS ALL {DEFERRED, IMMEDIATE}.
 *
 * trigstates[] stores per-trigger tgisdeferred settings.
 */
typedef struct SetConstraintStateData
{
	bool		all_isset;
	bool		all_isdeferred;
	int			numstates;		/* number of trigstates[] entries in use */
	int			numalloc;		/* allocated size of trigstates[] */
	SetConstraintTriggerData trigstates[FLEXIBLE_ARRAY_MEMBER];
} SetConstraintStateData;

typedef SetConstraintStateData *SetConstraintState;

/*
 * Per-trigger-event data
 *
 * The actual per-event data, AfterTriggerEventData, includes DONE/IN_PROGRESS
 * status bits, up to two tuple CTIDs, and optionally two OIDs of partitions.
 * Each event record also has an associated AfterTriggerSharedData that is
 * shared across all instances of similar events within a "chunk".
 *
 * For row-level triggers, we arrange not to waste storage on unneeded ctid
 * fields.  Updates of regular tables use two; inserts and deletes of regular
 * tables use one; foreign tables always use zero and save the tuple(s) to a
 * tuplestore.  AFTER_TRIGGER_FDW_FETCH directs AfterTriggerExecute() to
 * retrieve a fresh tuple or pair of tuples from that tuplestore, while
 * AFTER_TRIGGER_FDW_REUSE directs it to use the most-recently-retrieved
 * tuple(s).  This permits storing tuples once regardless of the number of
 * row-level triggers on a foreign table.
 *
 * When updates on partitioned tables cause rows to move between partitions,
 * the OIDs of both partitions are stored too, so that the tuples can be
 * fetched; such entries are marked AFTER_TRIGGER_CP_UPDATE (for "cross-
 * partition update").
 *
 * Note that we need triggers on foreign tables to be fired in exactly the
 * order they were queued, so that the tuples come out of the tuplestore in
 * the right order.  To ensure that, we forbid deferrable (constraint)
 * triggers on foreign tables.  This also ensures that such triggers do not
 * get deferred into outer trigger query levels, meaning that it's okay to
 * destroy the tuplestore at the end of the query level.
 *
 * Statement-level triggers always bear AFTER_TRIGGER_1CTID, though they
 * require no ctid field.  We lack the flag bit space to neatly represent that
 * distinct case, and it seems unlikely to be worth much trouble.
 *
 * Note: ats_firing_id is initially zero and is set to something else when
 * AFTER_TRIGGER_IN_PROGRESS is set.  It indicates which trigger firing
 * cycle the trigger will be fired in (or was fired in, if DONE is set).
 * Although this is mutable state, we can keep it in AfterTriggerSharedData
 * because all instances of the same type of event in a given event list will
 * be fired at the same time, if they were queued between the same firing
 * cycles.  So we need only ensure that ats_firing_id is zero when attaching
 * a new event to an existing AfterTriggerSharedData record.
 */
typedef uint32 TriggerFlags;

#define AFTER_TRIGGER_OFFSET			0x07FFFFFF	/* must be low-order bits */
#define AFTER_TRIGGER_DONE				0x80000000
#define AFTER_TRIGGER_IN_PROGRESS		0x40000000
/* bits describing the size and tuple sources of this event */
#define AFTER_TRIGGER_FDW_REUSE			0x00000000
#define AFTER_TRIGGER_FDW_FETCH			0x20000000
#define AFTER_TRIGGER_1CTID				0x10000000
#define AFTER_TRIGGER_2CTID				0x30000000
#define AFTER_TRIGGER_CP_UPDATE			0x08000000
#define AFTER_TRIGGER_TUP_BITS			0x38000000
|
2008-10-25 01:42:35 +02:00
|
|
|
typedef struct AfterTriggerSharedData *AfterTriggerShared;
|
|
|
|
|
|
|
|
typedef struct AfterTriggerSharedData
|
|
|
|
{
|
|
|
|
TriggerEvent ats_event; /* event type indicator, see trigger.h */
|
|
|
|
Oid ats_tgoid; /* the trigger's ID */
|
|
|
|
Oid ats_relid; /* the relation it's on */
|
|
|
|
CommandId ats_firing_id; /* ID for firing cycle */
|
Fix SQL-spec incompatibilities in new transition table feature.
The standard says that all changes of the same kind (insert, update, or
delete) caused in one table by a single SQL statement should be reported
in a single transition table; and by that, they mean to include foreign key
enforcement actions cascading from the statement's direct effects. It's
also reasonable to conclude that if the standard had wCTEs, they would say
that effects of wCTEs applying to the same table as each other or the outer
statement should be merged into one transition table. We weren't doing it
like that.
Hence, arrange to merge tuples from multiple update actions into a single
transition table as much as we can. There is a problem, which is that if
the firing of FK enforcement triggers and after-row triggers with
transition tables is interspersed, we might need to report more tuples
after some triggers have already seen the transition table. It seems like
a bad idea for the transition table to be mutable between trigger calls.
There's no good way around this without a major redesign of the FK logic,
so for now, resolve it by opening a new transition table each time this
happens.
Also, ensure that AFTER STATEMENT triggers fire just once per statement,
or once per transition table when we're forced to make more than one.
Previous versions of Postgres have allowed each FK enforcement query
to cause an additional firing of the AFTER STATEMENT triggers for the
referencing table, but that's certainly not per spec. (We're still
doing multiple firings of BEFORE STATEMENT triggers, though; is that
something worth changing?)
Also, forbid using transition tables with column-specific UPDATE triggers.
The spec requires such transition tables to show only the tuples for which
the UPDATE trigger would have fired, which means maintaining multiple
transition tables or else somehow filtering the contents at readout.
Maybe someday we'll bother to support that option, but it looks like a
lot of trouble for a marginal feature.
The transition tables are now managed by the AfterTriggers data structures,
rather than being directly the responsibility of ModifyTable nodes. This
removes a subtransaction-lifespan memory leak introduced by my previous
band-aid patch 3c4359521.
In passing, refactor the AfterTriggers data structures to reduce the
management overhead for them, by using arrays of structs rather than
several parallel arrays for per-query-level and per-subtransaction state.
I failed to resist the temptation to do some copy-editing on the SGML
docs about triggers, above and beyond merely documenting the effects
of this patch.
Back-patch to v10, because we don't want the semantics of transition
tables to change post-release.
Patch by me, with help and review from Thomas Munro.
Discussion: https://postgr.es/m/20170909064853.25630.12825@wrigleys.postgresql.org
2017-09-16 19:20:32 +02:00
|
|
|
struct AfterTriggersTableData *ats_table; /* transition table access */
|
2020-03-09 09:22:22 +01:00
|
|
|
Bitmapset *ats_modifiedcols; /* modified columns */
|
2008-10-25 01:42:35 +02:00
|
|
|
} AfterTriggerSharedData;
|
|
|
|
|
2004-09-10 20:40:09 +02:00
|
|
|
typedef struct AfterTriggerEventData *AfterTriggerEvent;
|
2003-06-25 01:25:44 +02:00
|
|
|
|
2004-09-10 20:40:09 +02:00
|
|
|
typedef struct AfterTriggerEventData
|
|
|
|
{
|
2008-10-25 01:42:35 +02:00
|
|
|
TriggerFlags ate_flags; /* status bits and offset to shared data */
|
|
|
|
ItemPointerData ate_ctid1; /* inserted, deleted, or old updated tuple */
|
|
|
|
ItemPointerData ate_ctid2; /* new updated tuple */
|
2022-03-20 18:43:40 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* During a cross-partition update of a partitioned table, we also store
|
|
|
|
* the OIDs of the source and destination partitions, which are needed to
|
|
|
|
* fetch the old tuple (ctid1) and the new tuple (ctid2), respectively.
|
|
|
|
*/
|
|
|
|
Oid ate_src_part;
|
|
|
|
Oid ate_dst_part;
|
2004-09-10 20:40:09 +02:00
|
|
|
} AfterTriggerEventData;
|
|
|
|
|
2022-03-20 18:43:40 +01:00
|
|
|
/* AfterTriggerEventData, minus ate_src_part, ate_dst_part */
|
|
|
|
typedef struct AfterTriggerEventDataNoOids
|
|
|
|
{
|
|
|
|
TriggerFlags ate_flags;
|
|
|
|
ItemPointerData ate_ctid1;
|
|
|
|
ItemPointerData ate_ctid2;
|
|
|
|
} AfterTriggerEventDataNoOids;
|
|
|
|
|
|
|
|
/* AfterTriggerEventData, minus ate_*_part and ate_ctid2 */
|
2008-10-25 01:42:35 +02:00
|
|
|
typedef struct AfterTriggerEventDataOneCtid
|
|
|
|
{
|
|
|
|
TriggerFlags ate_flags; /* status bits and offset to shared data */
|
|
|
|
ItemPointerData ate_ctid1; /* inserted, deleted, or old updated tuple */
|
|
|
|
} AfterTriggerEventDataOneCtid;
|
|
|
|
|
2022-03-20 18:43:40 +01:00
|
|
|
/* AfterTriggerEventData, minus ate_*_part, ate_ctid1 and ate_ctid2 */
|
2014-03-23 07:16:34 +01:00
|
|
|
typedef struct AfterTriggerEventDataZeroCtids
|
|
|
|
{
|
|
|
|
TriggerFlags ate_flags; /* status bits and offset to shared data */
|
|
|
|
} AfterTriggerEventDataZeroCtids;
|
|
|
|
|
2008-10-25 01:42:35 +02:00
|
|
|
#define SizeofTriggerEvent(evt) \
|
2022-03-20 18:43:40 +01:00
|
|
|
(((evt)->ate_flags & AFTER_TRIGGER_TUP_BITS) == AFTER_TRIGGER_CP_UPDATE ? \
|
2014-03-23 07:16:34 +01:00
|
|
|
sizeof(AfterTriggerEventData) : \
|
2022-03-20 18:43:40 +01:00
|
|
|
(((evt)->ate_flags & AFTER_TRIGGER_TUP_BITS) == AFTER_TRIGGER_2CTID ? \
|
|
|
|
sizeof(AfterTriggerEventDataNoOids) : \
|
|
|
|
(((evt)->ate_flags & AFTER_TRIGGER_TUP_BITS) == AFTER_TRIGGER_1CTID ? \
|
|
|
|
sizeof(AfterTriggerEventDataOneCtid) : \
|
|
|
|
sizeof(AfterTriggerEventDataZeroCtids))))
|
2008-10-25 01:42:35 +02:00
|
|
|
|
|
|
|
#define GetTriggerSharedData(evt) \
|
|
|
|
((AfterTriggerShared) ((char *) (evt) + ((evt)->ate_flags & AFTER_TRIGGER_OFFSET)))
|
|
|
|
|
|
|
|
/*
|
|
|
|
* To avoid palloc overhead, we keep trigger events in arrays in successively-
|
|
|
|
* larger chunks (a slightly more sophisticated version of an expansible
|
|
|
|
* array). The space between CHUNK_DATA_START and freeptr is occupied by
|
|
|
|
* AfterTriggerEventData records; the space between endfree and endptr is
|
|
|
|
* occupied by AfterTriggerSharedData records.
|
|
|
|
*/
|
|
|
|
typedef struct AfterTriggerEventChunk
|
|
|
|
{
|
|
|
|
struct AfterTriggerEventChunk *next; /* list link */
|
|
|
|
char *freeptr; /* start of free space in chunk */
|
|
|
|
char *endfree; /* end of free space in chunk */
|
|
|
|
char *endptr; /* end of chunk */
|
|
|
|
/* event data follows here */
|
|
|
|
} AfterTriggerEventChunk;
|
|
|
|
|
|
|
|
#define CHUNK_DATA_START(cptr) ((char *) (cptr) + MAXALIGN(sizeof(AfterTriggerEventChunk)))
|
|
|
|
|
2004-09-10 20:40:09 +02:00
|
|
|
/* A list of events */
|
|
|
|
typedef struct AfterTriggerEventList
|
|
|
|
{
|
2008-10-25 01:42:35 +02:00
|
|
|
AfterTriggerEventChunk *head;
|
|
|
|
AfterTriggerEventChunk *tail;
|
|
|
|
char *tailfree; /* freeptr of tail chunk */
|
2004-09-10 20:40:09 +02:00
|
|
|
} AfterTriggerEventList;
|
1999-09-29 18:06:40 +02:00
|
|
|
|
2008-10-25 01:42:35 +02:00
|
|
|
/* Macros to help in iterating over a list of events */
|
|
|
|
#define for_each_chunk(cptr, evtlist) \
|
|
|
|
for (cptr = (evtlist).head; cptr != NULL; cptr = cptr->next)
|
|
|
|
#define for_each_event(eptr, cptr) \
|
|
|
|
for (eptr = (AfterTriggerEvent) CHUNK_DATA_START(cptr); \
|
|
|
|
(char *) eptr < (cptr)->freeptr; \
|
|
|
|
eptr = (AfterTriggerEvent) (((char *) eptr) + SizeofTriggerEvent(eptr)))
|
|
|
|
/* Use this if no special per-chunk processing is needed */
|
|
|
|
#define for_each_event_chunk(eptr, cptr, evtlist) \
|
|
|
|
for_each_chunk(cptr, evtlist) for_each_event(eptr, cptr)
|
|
|
|
|
2017-09-16 19:20:32 +02:00
|
|
|
/* Macros for iterating from a start point that might not be list start */
|
|
|
|
#define for_each_chunk_from(cptr) \
|
|
|
|
for (; cptr != NULL; cptr = cptr->next)
|
|
|
|
#define for_each_event_from(eptr, cptr) \
|
|
|
|
for (; \
|
|
|
|
(char *) eptr < (cptr)->freeptr; \
|
|
|
|
eptr = (AfterTriggerEvent) (((char *) eptr) + SizeofTriggerEvent(eptr)))
|
|
|
|
|
2004-07-01 02:52:04 +02:00
|
|
|
|
2004-09-10 20:40:09 +02:00
|
|
|
/*
|
|
|
|
* All per-transaction data for the AFTER TRIGGERS module.
|
|
|
|
*
|
|
|
|
* AfterTriggersData has the following fields:
|
|
|
|
*
|
|
|
|
* firing_counter is incremented for each call of afterTriggerInvokeEvents.
|
|
|
|
* We mark firable events with the current firing cycle's ID so that we can
|
|
|
|
* tell which ones to work on. This ensures sane behavior if a trigger
|
|
|
|
* function chooses to do SET CONSTRAINTS: the inner SET CONSTRAINTS will
|
|
|
|
* only fire those events that weren't already scheduled for firing.
|
|
|
|
*
|
|
|
|
* state keeps track of the transaction-local effects of SET CONSTRAINTS.
|
|
|
|
* This is saved and restored across failed subtransactions.
|
|
|
|
*
|
|
|
|
* events is the current list of deferred events. This is global across
|
|
|
|
* all subtransactions of the current transaction. In a subtransaction
|
|
|
|
* abort, we know that the events added by the subtransaction are at the
|
2006-11-23 02:14:59 +01:00
|
|
|
* end of the list, so it is relatively easy to discard them. The event
|
2008-10-25 01:42:35 +02:00
|
|
|
* list chunks themselves are stored in event_cxt.
|
2004-09-10 20:40:09 +02:00
|
|
|
*
|
|
|
|
* query_depth is the current depth of nested AfterTriggerBeginQuery calls
|
|
|
|
* (-1 when the stack is empty).
|
|
|
|
*
|
2017-09-16 19:20:32 +02:00
|
|
|
* query_stack[query_depth] is the per-query-level data, including these fields:
|
|
|
|
*
|
|
|
|
* events is a list of AFTER trigger events queued by the current query.
|
|
|
|
* None of these are valid until the matching AfterTriggerEndQuery call
|
|
|
|
* occurs. At that point we fire immediate-mode triggers, and append any
|
|
|
|
* deferred events to the main events list.
|
2004-09-10 20:40:09 +02:00
|
|
|
*
|
2017-09-16 19:20:32 +02:00
|
|
|
* fdw_tuplestore is a tuplestore containing the foreign-table tuples
|
|
|
|
* needed by events queued by the current query. (Note: we use just one
|
|
|
|
* tuplestore even though more than one foreign table might be involved.
|
|
|
|
* This is okay because tuplestores don't really care what's in the tuples
|
|
|
|
* they store; but it's possible that someday it'd break.)
|
2014-03-23 07:16:34 +01:00
|
|
|
*
|
2017-09-16 19:20:32 +02:00
|
|
|
* tables is a List of AfterTriggersTableData structs for target tables
|
|
|
|
* of the current query (see below).
|
2004-09-10 20:40:09 +02:00
|
|
|
*
|
2017-09-16 19:20:32 +02:00
|
|
|
* maxquerydepth is just the allocated length of query_stack.
|
|
|
|
*
|
|
|
|
* trans_stack holds per-subtransaction data, including these fields:
|
|
|
|
*
|
|
|
|
* state is NULL or a pointer to a saved copy of the SET CONSTRAINTS
|
|
|
|
* state data. Each subtransaction level that modifies that state first
|
2004-09-10 20:40:09 +02:00
|
|
|
* saves a copy, which we use to restore the state if we abort.
|
|
|
|
*
|
2017-09-16 19:20:32 +02:00
|
|
|
* events is a copy of the events head/tail pointers,
|
2004-09-10 20:40:09 +02:00
|
|
|
* which we use to restore those values during subtransaction abort.
|
|
|
|
*
|
2017-09-16 19:20:32 +02:00
|
|
|
* query_depth is the subtransaction-start-time value of query_depth,
|
2004-09-10 20:40:09 +02:00
|
|
|
* which we similarly use to clean up at subtransaction abort.
|
|
|
|
*
|
2017-09-16 19:20:32 +02:00
|
|
|
* firing_counter is the subtransaction-start-time value of firing_counter.
|
|
|
|
* We use this to recognize which deferred triggers were fired (or marked
|
|
|
|
* for firing) within an aborted subtransaction.
|
2004-09-10 20:40:09 +02:00
|
|
|
*
|
|
|
|
* We use GetCurrentTransactionNestLevel() to determine the correct array
|
2017-09-16 19:20:32 +02:00
|
|
|
* index in trans_stack. maxtransdepth is the number of allocated entries in
|
|
|
|
* trans_stack. (By not keeping our own stack pointer, we can avoid trouble
|
2004-09-10 20:40:09 +02:00
|
|
|
* in cases where errors during subxact abort cause multiple invocations
|
|
|
|
* of AfterTriggerEndSubXact() at the same nesting depth.)
|
2017-09-16 19:20:32 +02:00
|
|
|
*
|
|
|
|
* We create an AfterTriggersTableData struct for each target table of the
|
|
|
|
* current query, and each operation mode (INSERT/UPDATE/DELETE), that has
|
2017-09-17 18:16:38 +02:00
|
|
|
* either transition tables or statement-level triggers. This is used to
|
2017-09-16 19:20:32 +02:00
|
|
|
* hold the relevant transition tables, as well as info tracking whether
|
2017-09-17 18:16:38 +02:00
|
|
|
* we already queued the statement triggers. (We use that info to prevent
|
|
|
|
* firing the same statement triggers more than once per statement, or really
|
|
|
|
* once per transition table set.) These structs, along with the transition
|
2017-09-16 19:20:32 +02:00
|
|
|
* table tuplestores, live in the (sub)transaction's CurTransactionContext.
|
|
|
|
* That's sufficient lifespan because we don't allow transition tables to be
|
|
|
|
* used by deferrable triggers, so they only need to survive until
|
|
|
|
* AfterTriggerEndQuery.
|
2004-09-10 20:40:09 +02:00
|
|
|
*/
|
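The nest-level-indexed trans_stack technique described above (index by GetCurrentTransactionNestLevel() rather than keeping a private stack pointer, so repeated aborts at the same depth are harmless) can be sketched outside PostgreSQL as follows. This is an illustrative standalone sketch only, not PostgreSQL code; all names here (ensure_trans_stack, subxact_begin, subxact_abort) are hypothetical, and the doubling growth policy is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct
{
	int			saved_query_depth;	/* value to restore on abort */
} SubXactState;

static SubXactState *trans_stack = NULL;
static int	maxtransdepth = 0;		/* allocated entries in trans_stack */

/* Grow trans_stack (doubling) so that nest_level is a valid index. */
static void
ensure_trans_stack(int nest_level)
{
	if (nest_level >= maxtransdepth)
	{
		int			newmax = (maxtransdepth > 0) ? maxtransdepth * 2 : 8;

		while (newmax <= nest_level)
			newmax *= 2;
		trans_stack = realloc(trans_stack, newmax * sizeof(SubXactState));
		assert(trans_stack != NULL);
		maxtransdepth = newmax;
	}
}

/* Save state at subtransaction start, in the slot for this nest level. */
static void
subxact_begin(int nest_level, int query_depth)
{
	ensure_trans_stack(nest_level);
	trans_stack[nest_level].saved_query_depth = query_depth;
}

/*
 * On abort, read back from the slot for this nest level.  Because the
 * slot is addressed by level, not by a pointer we pop, calling this
 * twice at the same depth returns the same saved value both times.
 */
static int
subxact_abort(int nest_level)
{
	assert(nest_level < maxtransdepth);
	return trans_stack[nest_level].saved_query_depth;
}
```

The point of the design, as the comment above notes, is that an error during subxact abort can cause the abort callback to run more than once at the same nesting depth; level-indexed slots make that re-entry benign, where a popped stack pointer would not.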
Fix SQL-spec incompatibilities in new transition table feature.
The standard says that all changes of the same kind (insert, update, or
delete) caused in one table by a single SQL statement should be reported
in a single transition table; and by that, they mean to include foreign key
enforcement actions cascading from the statement's direct effects. It's
also reasonable to conclude that if the standard had wCTEs, they would say
that effects of wCTEs applying to the same table as each other or the outer
statement should be merged into one transition table. We weren't doing it
like that.
Hence, arrange to merge tuples from multiple update actions into a single
transition table as much as we can. There is a problem, which is that if
the firing of FK enforcement triggers and after-row triggers with
transition tables is interspersed, we might need to report more tuples
after some triggers have already seen the transition table. It seems like
a bad idea for the transition table to be mutable between trigger calls.
There's no good way around this without a major redesign of the FK logic,
so for now, resolve it by opening a new transition table each time this
happens.
Also, ensure that AFTER STATEMENT triggers fire just once per statement,
or once per transition table when we're forced to make more than one.
Previous versions of Postgres have allowed each FK enforcement query
to cause an additional firing of the AFTER STATEMENT triggers for the
referencing table, but that's certainly not per spec. (We're still
doing multiple firings of BEFORE STATEMENT triggers, though; is that
something worth changing?)
Also, forbid using transition tables with column-specific UPDATE triggers.
The spec requires such transition tables to show only the tuples for which
the UPDATE trigger would have fired, which means maintaining multiple
transition tables or else somehow filtering the contents at readout.
Maybe someday we'll bother to support that option, but it looks like a
lot of trouble for a marginal feature.
The transition tables are now managed by the AfterTriggers data structures,
rather than being directly the responsibility of ModifyTable nodes. This
removes a subtransaction-lifespan memory leak introduced by my previous
band-aid patch 3c4359521.
In passing, refactor the AfterTriggers data structures to reduce the
management overhead for them, by using arrays of structs rather than
several parallel arrays for per-query-level and per-subtransaction state.
I failed to resist the temptation to do some copy-editing on the SGML
docs about triggers, above and beyond merely documenting the effects
of this patch.
Back-patch to v10, because we don't want the semantics of transition
tables to change post-release.
Patch by me, with help and review from Thomas Munro.
Discussion: https://postgr.es/m/20170909064853.25630.12825@wrigleys.postgresql.org
2017-09-16 19:20:32 +02:00
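The "open a new transition table each time this happens" resolution above can be sketched outside of PostgreSQL as a tiny lookup-or-create routine: an entry for a given (relid, cmdType) pair is reused until it has been marked closed (because triggers already saw its contents), after which a fresh entry is created. Everything below — names, the linked-list layout — is invented for illustration and is not the actual trigger.c implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef unsigned int Oid;

typedef struct TableData
{
	Oid			relid;			/* target table's OID */
	int			cmdType;		/* insert/update/delete */
	bool		closed;			/* no longer OK to add tuples */
	struct TableData *next;
} TableData;

static TableData *table_list = NULL;

/* Find an open entry for (relid, cmdType), or create a new one. */
static TableData *
get_table_data(Oid relid, int cmdType)
{
	TableData  *t;

	for (t = table_list; t != NULL; t = t->next)
	{
		if (t->relid == relid && t->cmdType == cmdType && !t->closed)
			return t;
	}
	t = calloc(1, sizeof(TableData));
	assert(t != NULL);
	t->relid = relid;
	t->cmdType = cmdType;
	t->next = table_list;
	table_list = t;
	return t;
}
```

Once an entry is closed, subsequent calls allocate a new one, which mirrors how a mutable-between-trigger-calls table is avoided.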

typedef struct AfterTriggersQueryData AfterTriggersQueryData;
typedef struct AfterTriggersTransData AfterTriggersTransData;
typedef struct AfterTriggersTableData AfterTriggersTableData;

typedef struct AfterTriggersData
{
	CommandId	firing_counter; /* next firing ID to assign */
	SetConstraintState state;	/* the active S C state */
	AfterTriggerEventList events;	/* deferred-event list */
	MemoryContext event_cxt;	/* memory context for events, if any */

	/* per-query-level data: */
	AfterTriggersQueryData *query_stack;	/* array of structs shown below */
	int			query_depth;	/* current index in above array */
	int			maxquerydepth;	/* allocated len of above array */

	/* per-subtransaction-level data: */
	AfterTriggersTransData *trans_stack;	/* array of structs shown below */
	int			maxtransdepth;	/* allocated len of above array */
} AfterTriggersData;
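The `query_depth`/`maxquerydepth` pair above is classic grow-on-demand array bookkeeping. As a standalone sketch (one plausible growth policy — doubling — not necessarily the exact one trigger.c uses, and with an `int` standing in for the per-query struct):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct
{
	int		   *query_stack;	/* stand-in for AfterTriggersQueryData[] */
	int			query_depth;	/* current index, -1 when empty */
	int			maxquerydepth;	/* allocated length */
} Stack;

/* Enter a new query level, enlarging the array when it is full. */
static void
push_query(Stack *s)
{
	if (s->query_depth + 1 >= s->maxquerydepth)
	{
		int			newmax = (s->maxquerydepth == 0) ? 8 : s->maxquerydepth * 2;

		s->query_stack = realloc(s->query_stack, newmax * sizeof(int));
		assert(s->query_stack != NULL);
		s->maxquerydepth = newmax;
	}
	s->query_depth++;
	s->query_stack[s->query_depth] = 0; /* fresh per-query state */
}
```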
struct AfterTriggersQueryData
{
	AfterTriggerEventList events;	/* events pending from this query */
	Tuplestorestate *fdw_tuplestore;	/* foreign tuples for said events */
	List	   *tables;			/* list of AfterTriggersTableData, see below */
};

struct AfterTriggersTransData
{
	/* these fields are just for resetting at subtrans abort: */
	SetConstraintState state;	/* saved S C state, or NULL if not yet saved */
	AfterTriggerEventList events;	/* saved list pointer */
	int			query_depth;	/* saved query_depth */
	CommandId	firing_counter; /* saved firing_counter */
};

struct AfterTriggersTableData
{
	/* relid + cmdType form the lookup key for these structs: */
	Oid			relid;			/* target table's OID */
	CmdType		cmdType;		/* event type, CMD_INSERT/UPDATE/DELETE */
	bool		closed;			/* true when no longer OK to add tuples */
	bool		before_trig_done;	/* did we already queue BS triggers? */
	bool		after_trig_done;	/* did we already queue AS triggers? */
	AfterTriggerEventList after_trig_events;	/* if so, saved list pointer */

	/*
	 * We maintain separate transition tables for UPDATE/INSERT/DELETE since
	 * MERGE can run all three actions in a single statement. Note that UPDATE
	 * needs both old and new transition tables whereas INSERT needs only new,
	 * and DELETE needs only old.
	 */

	/* "old" transition table for UPDATE, if any */
	Tuplestorestate *old_upd_tuplestore;
	/* "new" transition table for UPDATE, if any */
	Tuplestorestate *new_upd_tuplestore;
	/* "old" transition table for DELETE, if any */
	Tuplestorestate *old_del_tuplestore;
	/* "new" transition table for INSERT, if any */
	Tuplestorestate *new_ins_tuplestore;

	TupleTableSlot *storeslot;	/* for converting to tuplestore's format */
};

static AfterTriggersData afterTriggers;

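The four tuplestore fields above imply that each (action, old/new image) pair selects exactly one store, with DELETE lacking a "new" image and INSERT lacking an "old" one. A minimal sketch of that mapping — the enum values and function name are invented for this example, not trigger.c's API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { CMD_INSERT, CMD_UPDATE, CMD_DELETE };

/* Return the field name of the tuplestore that would capture the tuple. */
static const char *
pick_transition_store(int cmd, bool is_old)
{
	switch (cmd)
	{
		case CMD_UPDATE:
			return is_old ? "old_upd_tuplestore" : "new_upd_tuplestore";
		case CMD_DELETE:
			return is_old ? "old_del_tuplestore" : NULL;	/* no "new" image */
		case CMD_INSERT:
			return is_old ? NULL : "new_ins_tuplestore";	/* no "old" image */
	}
	return NULL;
}
```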
Enforce foreign key correctly during cross-partition updates
When an update on a partitioned table referenced in foreign key
constraints causes a row to move from one partition to another,
the fact that the move is implemented as a delete followed by an insert
on the target partition causes the foreign key triggers to have
surprising behavior. For example, a given foreign key's delete trigger
which implements the ON DELETE CASCADE clause of that key will delete
any referencing rows when triggered for that internal DELETE, although
it should not, because the referenced row is simply being moved from one
partition of the referenced root partitioned table into another, not
being deleted from it.
This commit teaches trigger.c to skip queuing such delete trigger events
on the leaf partitions in favor of an UPDATE event fired on the root
target relation. Doing so is sensible because both the old and the new
tuple "logically" belong to the root relation.
The after trigger event queuing interface now allows passing the source
and the target partitions of a particular cross-partition update when
registering the update event for the root partitioned table. Along with
the two ctids of the old and the new tuple, the after trigger event now
also stores the OIDs of those partitions. The tuples fetched from the
source and the target partitions are converted into the root table
format, if necessary, before they are passed to the trigger function.
The implementation currently has a limitation that only the foreign keys
pointing into the query's target relation are considered, not those of
its sub-partitioned partitions. That seems like a reasonable
limitation, because it sounds rare to have distinct foreign keys
pointing to sub-partitioned partitions instead of to the root table.
This misbehavior stems from commit f56f8f8da6af (which added support for
foreign keys to reference partitioned tables) not paying sufficient
attention to commit 2f178441044b (which had introduced cross-partition
updates a year earlier). Even though the former commit goes back to
Postgres 12, we're not backpatching this fix at this time for fear of
destabilizing things too much, and because there are a few ABI breaks in
it that we'd have to work around in older branches. It also depends on
commit f4566345cf40, which had its own share of backpatchability issues
as well.
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Eduard Català <eduard.catala@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFvkBCmfwkQX_yBqv2Wz8ugUGiBDxum8=WvVbfU1TXaNg@mail.gmail.com
Discussion: https://postgr.es/m/CAL54xNZsLwEM1XCk5yW9EqaRzsZYHuWsHQkA2L5MOSKXAwviCQ@mail.gmail.com
2022-03-20 18:43:40 +01:00
static void AfterTriggerExecute(EState *estate,
								AfterTriggerEvent event,
								ResultRelInfo *relInfo,
								ResultRelInfo *src_relInfo,
								ResultRelInfo *dst_relInfo,
								TriggerDesc *trigdesc,
								FmgrInfo *finfo,
								Instrumentation *instr,
								MemoryContext per_tuple_context,
								TupleTableSlot *trig_tuple_slot1,
								TupleTableSlot *trig_tuple_slot2);
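The cross-partition rule described in the commit message above — the internal DELETE and INSERT halves of a row move do not queue their own FK trigger events; a single UPDATE event on the root relation stands in for both — can be condensed into a toy decision function. The names here are invented for this sketch:

```c
#include <assert.h>
#include <stdbool.h>

enum { EV_INSERT, EV_UPDATE, EV_DELETE };

/*
 * Decide which event type to queue: ordinary leaf events pass through
 * unchanged, but the halves of a cross-partition move collapse into an
 * UPDATE fired on the root relation.
 */
static int
queued_event_type(int leaf_event, bool part_of_crosspart_update)
{
	if (part_of_crosspart_update &&
		(leaf_event == EV_DELETE || leaf_event == EV_INSERT))
		return EV_UPDATE;		/* fire as UPDATE on the root relation */
	return leaf_event;
}
```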
static AfterTriggersTableData *GetAfterTriggersTableData(Oid relid,
														 CmdType cmdType);
static TupleTableSlot *GetAfterTriggersStoreSlot(AfterTriggersTableData *table,
												 TupleDesc tupdesc);
static Tuplestorestate *GetAfterTriggersTransitionTable(int event,
														TupleTableSlot *oldslot,
														TupleTableSlot *newslot,
														TransitionCaptureState *transition_capture);
static void TransitionTableAddTuple(EState *estate,
									TransitionCaptureState *transition_capture,
									ResultRelInfo *relinfo,
									TupleTableSlot *slot,
									TupleTableSlot *original_insert_tuple,
									Tuplestorestate *tuplestore);
static void AfterTriggerFreeQuery(AfterTriggersQueryData *qs);
static SetConstraintState SetConstraintStateCreate(int numalloc);
static SetConstraintState SetConstraintStateCopy(SetConstraintState origstate);
static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
													Oid tgoid, bool tgisdeferred);
static void cancel_prior_stmt_triggers(Oid relid, CmdType cmdType, int tgevent);
|
2004-07-01 02:52:04 +02:00
|
|
|
|
|
|
|
|
2014-03-23 07:16:34 +01:00
|
|
|
/*
|
Fix SQL-spec incompatibilities in new transition table feature.
The standard says that all changes of the same kind (insert, update, or
delete) caused in one table by a single SQL statement should be reported
in a single transition table; and by that, they mean to include foreign key
enforcement actions cascading from the statement's direct effects. It's
also reasonable to conclude that if the standard had wCTEs, they would say
that effects of wCTEs applying to the same table as each other or the outer
statement should be merged into one transition table. We weren't doing it
like that.
Hence, arrange to merge tuples from multiple update actions into a single
transition table as much as we can. There is a problem, which is that if
the firing of FK enforcement triggers and after-row triggers with
transition tables is interspersed, we might need to report more tuples
after some triggers have already seen the transition table. It seems like
a bad idea for the transition table to be mutable between trigger calls.
There's no good way around this without a major redesign of the FK logic,
so for now, resolve it by opening a new transition table each time this
happens.
Also, ensure that AFTER STATEMENT triggers fire just once per statement,
or once per transition table when we're forced to make more than one.
Previous versions of Postgres have allowed each FK enforcement query
to cause an additional firing of the AFTER STATEMENT triggers for the
referencing table, but that's certainly not per spec. (We're still
doing multiple firings of BEFORE STATEMENT triggers, though; is that
something worth changing?)
Also, forbid using transition tables with column-specific UPDATE triggers.
The spec requires such transition tables to show only the tuples for which
the UPDATE trigger would have fired, which means maintaining multiple
transition tables or else somehow filtering the contents at readout.
Maybe someday we'll bother to support that option, but it looks like a
lot of trouble for a marginal feature.
The transition tables are now managed by the AfterTriggers data structures,
rather than being directly the responsibility of ModifyTable nodes. This
removes a subtransaction-lifespan memory leak introduced by my previous
band-aid patch 3c4359521.
In passing, refactor the AfterTriggers data structures to reduce the
management overhead for them, by using arrays of structs rather than
several parallel arrays for per-query-level and per-subtransaction state.
I failed to resist the temptation to do some copy-editing on the SGML
docs about triggers, above and beyond merely documenting the effects
of this patch.
Back-patch to v10, because we don't want the semantics of transition
tables to change post-release.
Patch by me, with help and review from Thomas Munro.
Discussion: https://postgr.es/m/20170909064853.25630.12825@wrigleys.postgresql.org
2017-09-16 19:20:32 +02:00
|
|
|
* Get the FDW tuplestore for the current trigger query level, creating it
|
|
|
|
* if necessary.
|
2014-03-23 07:16:34 +01:00
|
|
|
*/
|
|
|
|
static Tuplestorestate *
|
2017-09-16 19:20:32 +02:00
|
|
|
GetCurrentFDWTuplestore(void)
|
2014-03-23 07:16:34 +01:00
|
|
|
{
|
|
|
|
Tuplestorestate *ret;
|
|
|
|
|
2017-09-16 19:20:32 +02:00
|
|
|
ret = afterTriggers.query_stack[afterTriggers.query_depth].fdw_tuplestore;
|
2014-03-23 07:16:34 +01:00
|
|
|
if (ret == NULL)
|
|
|
|
{
|
|
|
|
MemoryContext oldcxt;
|
|
|
|
ResourceOwner saveResourceOwner;
|
|
|
|
|
|
|
|
/*
|
2017-09-16 19:20:32 +02:00
|
|
|
* Make the tuplestore valid until end of subtransaction. We really
|
2014-03-23 07:16:34 +01:00
|
|
|
* only need it until AfterTriggerEndQuery().
|
|
|
|
*/
|
2017-09-16 19:20:32 +02:00
|
|
|
oldcxt = MemoryContextSwitchTo(CurTransactionContext);
|
2014-03-23 07:16:34 +01:00
|
|
|
saveResourceOwner = CurrentResourceOwner;
|
2017-10-11 23:43:50 +02:00
|
|
|
CurrentResourceOwner = CurTransactionResourceOwner;
|
|
|
|
|
|
|
|
ret = tuplestore_begin_heap(false, false, work_mem);
|
|
|
|
|
2014-03-23 07:16:34 +01:00
|
|
|
CurrentResourceOwner = saveResourceOwner;
|
|
|
|
MemoryContextSwitchTo(oldcxt);
|
|
|
|
|
2017-09-16 19:20:32 +02:00
|
|
|
afterTriggers.query_stack[afterTriggers.query_depth].fdw_tuplestore = ret;
|
2014-03-23 07:16:34 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
1999-09-29 18:06:40 +02:00
|
|
|
/* ----------
|
2004-09-10 20:40:09 +02:00
|
|
|
* afterTriggerCheckState()
|
1999-09-29 18:06:40 +02:00
|
|
|
*
|
2008-10-25 01:42:35 +02:00
|
|
|
* Returns true if the trigger event is actually in state DEFERRED.
|
1999-09-29 18:06:40 +02:00
|
|
|
* ----------
|
|
|
|
*/
|
|
|
|
static bool
|
2008-10-25 01:42:35 +02:00
|
|
|
afterTriggerCheckState(AfterTriggerShared evtshared)
|
1999-09-29 18:06:40 +02:00
|
|
|
{
|
2008-10-25 01:42:35 +02:00
|
|
|
Oid tgoid = evtshared->ats_tgoid;
|
2014-10-23 18:33:02 +02:00
|
|
|
SetConstraintState state = afterTriggers.state;
|
2004-07-01 02:52:04 +02:00
|
|
|
int i;
|
1999-09-29 18:06:40 +02:00
|
|
|
|
|
|
|
/*
|
2004-07-01 02:52:04 +02:00
|
|
|
* For not-deferrable triggers (i.e. normal AFTER ROW triggers and
|
|
|
|
* constraints declared NOT DEFERRABLE), the state is always false.
|
1999-09-29 18:06:40 +02:00
|
|
|
*/
|
2008-10-25 01:42:35 +02:00
|
|
|
if ((evtshared->ats_event & AFTER_TRIGGER_DEFERRABLE) == 0)
|
1999-09-29 18:06:40 +02:00
|
|
|
return false;
|
|
|
|
|
|
|
|
/*
|
2014-10-23 18:33:02 +02:00
|
|
|
* If constraint state exists, SET CONSTRAINTS might have been executed
|
|
|
|
* either for this trigger or for all triggers.
|
1999-09-29 18:06:40 +02:00
|
|
|
*/
|
2014-10-23 18:33:02 +02:00
|
|
|
if (state != NULL)
|
1999-09-29 18:06:40 +02:00
|
|
|
{
|
2014-10-23 18:33:02 +02:00
|
|
|
/* Check for SET CONSTRAINTS for this specific trigger. */
|
|
|
|
for (i = 0; i < state->numstates; i++)
|
|
|
|
{
|
|
|
|
if (state->trigstates[i].sct_tgoid == tgoid)
|
|
|
|
return state->trigstates[i].sct_tgisdeferred;
|
|
|
|
}
|
1999-09-29 18:06:40 +02:00
|
|
|
|
2014-10-23 18:33:02 +02:00
|
|
|
/* Check for SET CONSTRAINTS ALL. */
|
|
|
|
if (state->all_isset)
|
|
|
|
return state->all_isdeferred;
|
|
|
|
}
|
1999-09-29 18:06:40 +02:00
|
|
|
|
|
|
|
/*
|
2004-09-09 01:47:58 +02:00
|
|
|
* Otherwise return the default state for the trigger.
|
1999-09-29 18:06:40 +02:00
|
|
|
*/
|
2008-10-25 01:42:35 +02:00
|
|
|
return ((evtshared->ats_event & AFTER_TRIGGER_INITDEFERRED) != 0);
|
1999-09-29 18:06:40 +02:00
|
|
|
}
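The lookup order in afterTriggerCheckState() above — an explicit per-trigger SET CONSTRAINTS entry wins, then SET CONSTRAINTS ALL, then the trigger's INITIALLY DEFERRED default, with NOT DEFERRABLE triggers short-circuiting to false — can be sketched as a standalone function. This is an illustrative reconstruction, not PostgreSQL code; the type and field names here are invented stand-ins for SetConstraintState and AfterTriggerShared.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for the catalog-backed structures. */
typedef struct { unsigned tgoid; bool deferred; } TrigState;
typedef struct {
	TrigState  *states;			/* per-trigger SET CONSTRAINTS entries */
	int			nstates;
	bool		all_isset;		/* was SET CONSTRAINTS ALL executed? */
	bool		all_isdeferred;
} ConstraintState;

/*
 * Mirror of the decision logic: not-deferrable triggers are never
 * deferred; otherwise a specific SET CONSTRAINTS entry takes priority
 * over SET CONSTRAINTS ALL, which takes priority over the trigger's
 * declared default.
 */
static bool
is_deferred(const ConstraintState *s, unsigned tgoid,
			bool deferrable, bool initdeferred)
{
	if (!deferrable)
		return false;
	if (s != NULL)
	{
		for (int i = 0; i < s->nstates; i++)
			if (s->states[i].tgoid == tgoid)
				return s->states[i].deferred;
		if (s->all_isset)
			return s->all_isdeferred;
	}
	return initdeferred;
}
```

Note that SET CONSTRAINTS ALL is consulted only after the per-trigger scan fails, matching the function's two-stage check.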
|
|
|
|
|
|
|
|
|
|
|
|
/* ----------
|
2004-09-10 20:40:09 +02:00
|
|
|
* afterTriggerAddEvent()
|
1999-09-29 18:06:40 +02:00
|
|
|
*
|
2008-10-25 01:42:35 +02:00
|
|
|
* Add a new trigger event to the specified queue.
|
|
|
|
* The passed-in event data is copied.
|
1999-09-29 18:06:40 +02:00
|
|
|
* ----------
|
|
|
|
*/
|
2000-01-10 18:14:46 +01:00
|
|
|
static void
|
2008-10-25 01:42:35 +02:00
|
|
|
afterTriggerAddEvent(AfterTriggerEventList *events,
|
|
|
|
AfterTriggerEvent event, AfterTriggerShared evtshared)
|
1999-09-29 18:06:40 +02:00
|
|
|
{
|
2008-10-25 01:42:35 +02:00
|
|
|
Size eventsize = SizeofTriggerEvent(event);
|
|
|
|
Size needed = eventsize + sizeof(AfterTriggerSharedData);
|
|
|
|
AfterTriggerEventChunk *chunk;
|
|
|
|
AfterTriggerShared newshared;
|
|
|
|
AfterTriggerEvent newevent;
|
2004-09-10 20:40:09 +02:00
|
|
|
|
2008-10-25 01:42:35 +02:00
|
|
|
/*
|
|
|
|
* If empty list or not enough room in the tail chunk, make a new chunk.
|
|
|
|
* We assume here that a new shared record will always be needed.
|
|
|
|
*/
|
|
|
|
chunk = events->tail;
|
|
|
|
if (chunk == NULL ||
|
|
|
|
chunk->endfree - chunk->freeptr < needed)
|
|
|
|
{
|
|
|
|
Size chunksize;
|
2004-07-01 02:52:04 +02:00
|
|
|
|
2008-10-25 01:42:35 +02:00
|
|
|
/* Create event context if we didn't already */
|
2014-10-23 18:33:02 +02:00
|
|
|
if (afterTriggers.event_cxt == NULL)
|
|
|
|
afterTriggers.event_cxt =
|
2008-10-25 01:42:35 +02:00
|
|
|
AllocSetContextCreate(TopTransactionContext,
|
|
|
|
"AfterTriggerEvents",
|
Add macros to make AllocSetContextCreate() calls simpler and safer.
I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls
had typos in the context-sizing parameters. While none of these led to
especially significant problems, they did create minor inefficiencies,
and it's now clear that expecting people to copy-and-paste those calls
accurately is not a great idea. Let's reduce the risk of future errors
by introducing single macros that encapsulate the common use-cases.
Three such macros are enough to cover all but two special-purpose contexts;
those two calls can be left as-is, I think.
While this patch doesn't in itself improve matters for third-party
extensions, it doesn't break anything for them either, and they can
gradually adopt the simplified notation over time.
In passing, change TopMemoryContext to use the default allocation
parameters. Formerly it could only be extended 8K at a time. That was
probably reasonable when this code was written; but nowadays we create
many more contexts than we did then, so that it's not unusual to have a
couple hundred K in TopMemoryContext, even without considering various
dubious code that sticks other things there. There seems no good reason
not to let it use growing blocks like most other contexts.
Back-patch to 9.6, mostly because that's still close enough to HEAD that
it's easy to do so, and keeping the branches in sync can be expected to
avoid some future back-patching pain. The bugs fixed by these changes
don't seem to be significant enough to justify fixing them further back.
Discussion: <21072.1472321324@sss.pgh.pa.us>
2016-08-27 23:50:38 +02:00
|
|
|
ALLOCSET_DEFAULT_SIZES);
|
2004-09-10 20:40:09 +02:00
|
|
|
|
2008-10-25 01:42:35 +02:00
|
|
|
/*
|
|
|
|
* Chunk size starts at 1KB and is allowed to increase up to 1MB.
|
|
|
|
* These numbers are fairly arbitrary, though there is a hard limit at
|
|
|
|
* AFTER_TRIGGER_OFFSET; else we couldn't link event records to their
|
|
|
|
* shared records using the available space in ate_flags. Another
|
|
|
|
* constraint is that if the chunk size gets too huge, the search loop
|
|
|
|
* below would get slow given a (not too common) usage pattern with
|
|
|
|
* many distinct event types in a chunk. Therefore, we double the
|
|
|
|
* preceding chunk size only if there weren't too many shared records
|
|
|
|
* in the preceding chunk; otherwise we halve it. This gives us some
|
|
|
|
* ability to adapt to the actual usage pattern of the current query
|
|
|
|
* while still having large chunk sizes in typical usage. All chunk
|
|
|
|
* sizes used should be MAXALIGN multiples, to ensure that the shared
|
|
|
|
* records will be aligned safely.
|
|
|
|
*/
|
|
|
|
#define MIN_CHUNK_SIZE 1024
|
|
|
|
#define MAX_CHUNK_SIZE (1024*1024)
|
|
|
|
|
|
|
|
#if MAX_CHUNK_SIZE > (AFTER_TRIGGER_OFFSET+1)
|
|
|
|
#error MAX_CHUNK_SIZE must not exceed AFTER_TRIGGER_OFFSET
|
|
|
|
#endif
|
|
|
|
|
|
|
|
if (chunk == NULL)
|
|
|
|
chunksize = MIN_CHUNK_SIZE;
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/* preceding chunk size... */
|
|
|
|
chunksize = chunk->endptr - (char *) chunk;
|
|
|
|
/* check number of shared records in preceding chunk */
|
|
|
|
if ((chunk->endptr - chunk->endfree) <=
|
|
|
|
(100 * sizeof(AfterTriggerSharedData)))
|
|
|
|
chunksize *= 2; /* okay, double it */
|
|
|
|
else
|
|
|
|
chunksize /= 2; /* too many shared records */
|
|
|
|
chunksize = Min(chunksize, MAX_CHUNK_SIZE);
|
|
|
|
}
|
2014-10-23 18:33:02 +02:00
|
|
|
chunk = MemoryContextAlloc(afterTriggers.event_cxt, chunksize);
|
2008-10-25 01:42:35 +02:00
|
|
|
chunk->next = NULL;
|
|
|
|
chunk->freeptr = CHUNK_DATA_START(chunk);
|
|
|
|
chunk->endptr = chunk->endfree = (char *) chunk + chunksize;
|
|
|
|
Assert(chunk->endfree - chunk->freeptr >= needed);
|
|
|
|
|
|
|
|
if (events->head == NULL)
|
|
|
|
events->head = chunk;
|
|
|
|
else
|
|
|
|
events->tail->next = chunk;
|
|
|
|
events->tail = chunk;
|
2010-08-19 17:46:18 +02:00
|
|
|
/* events->tailfree is now out of sync, but we'll fix it below */
|
2008-10-25 01:42:35 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Try to locate a matching shared-data record already in the chunk. If
|
|
|
|
* none, make a new one.
|
|
|
|
*/
|
|
|
|
for (newshared = ((AfterTriggerShared) chunk->endptr) - 1;
|
|
|
|
(char *) newshared >= chunk->endfree;
|
|
|
|
newshared--)
|
|
|
|
{
|
|
|
|
if (newshared->ats_tgoid == evtshared->ats_tgoid &&
|
|
|
|
newshared->ats_relid == evtshared->ats_relid &&
|
|
|
|
newshared->ats_event == evtshared->ats_event &&
|
2017-09-16 19:20:32 +02:00
|
|
|
newshared->ats_table == evtshared->ats_table &&
|
2008-10-25 01:42:35 +02:00
|
|
|
newshared->ats_firing_id == 0)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
if ((char *) newshared < chunk->endfree)
|
2001-03-13 00:02:00 +01:00
|
|
|
{
|
2008-10-25 01:42:35 +02:00
|
|
|
*newshared = *evtshared;
|
|
|
|
newshared->ats_firing_id = 0; /* just to be sure */
|
|
|
|
chunk->endfree = (char *) newshared;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Insert the data */
|
|
|
|
newevent = (AfterTriggerEvent) chunk->freeptr;
|
|
|
|
memcpy(newevent, event, eventsize);
|
|
|
|
/* ... and link the new event to its shared record */
|
|
|
|
newevent->ate_flags &= ~AFTER_TRIGGER_OFFSET;
|
|
|
|
newevent->ate_flags |= (char *) newshared - (char *) newevent;
|
|
|
|
|
|
|
|
chunk->freeptr += eventsize;
|
|
|
|
events->tailfree = chunk->freeptr;
|
|
|
|
}
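The chunk-sizing comment in afterTriggerAddEvent() describes an adaptive heuristic: double the preceding chunk's size when it contained few shared records, halve it when it contained many, and clamp to a 1MB ceiling. A minimal standalone sketch of that policy follows; the function and constant names are hypothetical, and the shared-record size is a stand-in for sizeof(AfterTriggerSharedData).

```c
#include <stddef.h>

#define SKETCH_MIN_CHUNK	1024
#define SKETCH_MAX_CHUNK	(1024 * 1024)
#define SKETCH_SHARED_SIZE	48	/* stand-in for sizeof(AfterTriggerSharedData) */

/*
 * Compute the size of the next event chunk, given the previous chunk's
 * size and how many bytes of it were consumed by shared records.
 * prev_size == 0 means there was no previous chunk.
 */
static size_t
next_chunk_size(size_t prev_size, size_t shared_bytes_used)
{
	size_t		next;

	if (prev_size == 0)
		return SKETCH_MIN_CHUNK;	/* empty list: start small */
	if (shared_bytes_used <= 100 * SKETCH_SHARED_SIZE)
		next = prev_size * 2;		/* few shared records: grow */
	else
		next = prev_size / 2;		/* many distinct event types: shrink */
	if (next > SKETCH_MAX_CHUNK)
		next = SKETCH_MAX_CHUNK;	/* hard ceiling, as in the real code */
	return next;
}
```

As in the real code, there is no floor applied after the first chunk; only the ceiling keeps the backward search for matching shared records from scanning too far.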
|
|
|
|
|
|
|
|
/* ----------
|
|
|
|
* afterTriggerFreeEventList()
|
|
|
|
*
|
|
|
|
* Free all the event storage in the given list.
|
|
|
|
* ----------
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
afterTriggerFreeEventList(AfterTriggerEventList *events)
|
|
|
|
{
|
|
|
|
AfterTriggerEventChunk *chunk;
|
|
|
|
|
Fix possible dangling pointer dereference in trigger.c.
AfterTriggerEndQuery correctly notes that the query_stack could get
repalloc'd during a trigger firing, but it nonetheless passes the address
of a query_stack entry to afterTriggerInvokeEvents, so that if such a
repalloc occurs, afterTriggerInvokeEvents is already working with an
obsolete dangling pointer while it scans the rest of the events. Oops.
The only code at risk is its "delete_ok" cleanup code, so we can
prevent unsafe behavior by passing delete_ok = false instead of true.
However, that could have a significant performance penalty, because the
point of passing delete_ok = true is to not have to re-scan possibly
a large number of dead trigger events on the next time through the loop.
There's more than one way to skin that cat, though. What we can do is
delete all the "chunks" in the event list except the last one, since
we know all events in them must be dead. Deleting the chunks is work
we'd have had to do later in AfterTriggerEndQuery anyway, and it ends
up saving rescanning of just about the same events we'd have gotten
rid of with delete_ok = true.
In v10 and HEAD, we also have to be careful to mop up any per-table
after_trig_events pointers that would become dangling. This is slightly
annoying, but I don't think that normal use-cases will traverse this code
path often enough for it to be a performance problem.
It's pretty hard to hit this in practice because of the unlikelihood
of the query_stack getting resized at just the wrong time. Nonetheless,
it's definitely a live bug of ancient standing, so back-patch to all
supported branches.
Discussion: https://postgr.es/m/2891.1505419542@sss.pgh.pa.us
2017-09-17 20:50:01 +02:00
|
|
|
while ((chunk = events->head) != NULL)
|
2008-10-25 01:42:35 +02:00
|
|
|
{
|
2017-09-17 20:50:01 +02:00
|
|
|
events->head = chunk->next;
|
2008-10-25 01:42:35 +02:00
|
|
|
pfree(chunk);
|
|
|
|
}
|
|
|
|
events->tail = NULL;
|
|
|
|
events->tailfree = NULL;
|
|
|
|
}

/* ----------
 * afterTriggerRestoreEventList()
 *
 *	Restore an event list to its prior length, removing all the events
 *	added since it had the value old_events.
 * ----------
 */
static void
afterTriggerRestoreEventList(AfterTriggerEventList *events,
							 const AfterTriggerEventList *old_events)
{
	AfterTriggerEventChunk *chunk;
	AfterTriggerEventChunk *next_chunk;

	if (old_events->tail == NULL)
	{
		/* restoring to a completely empty state, so free everything */
		afterTriggerFreeEventList(events);
	}
	else
	{
		*events = *old_events;
		/* free any chunks after the last one we want to keep */
		for (chunk = events->tail->next; chunk != NULL; chunk = next_chunk)
		{
			next_chunk = chunk->next;
			pfree(chunk);
		}
		/* and clean up the tail chunk to be the right length */
		events->tail->next = NULL;
		events->tail->freeptr = events->tailfree;

		/*
		 * We don't make any effort to remove now-unused shared data records.
		 * They might still be useful, anyway.
		 */
	}
}

/* ----------
 * afterTriggerDeleteHeadEventChunk()
 *
 *	Remove the first chunk of events from the query level's event list.
 *	Keep any event list pointers elsewhere in the query level's data
 *	structures in sync.
 * ----------
 */
static void
afterTriggerDeleteHeadEventChunk(AfterTriggersQueryData *qs)
{
	AfterTriggerEventChunk *target = qs->events.head;
	ListCell   *lc;

	Assert(target && target->next);

	/*
	 * First, update any pointers in the per-table data, so that they won't be
	 * dangling.  Resetting obsoleted pointers to NULL will make
	 * cancel_prior_stmt_triggers start from the list head, which is fine.
	 */
	foreach(lc, qs->tables)
	{
		AfterTriggersTableData *table = (AfterTriggersTableData *) lfirst(lc);

		if (table->after_trig_done &&
			table->after_trig_events.tail == target)
		{
			table->after_trig_events.head = NULL;
			table->after_trig_events.tail = NULL;
			table->after_trig_events.tailfree = NULL;
		}
	}

	/* Now we can flush the head chunk */
	qs->events.head = target->next;
	pfree(target);
}

/* ----------
 * AfterTriggerExecute()
 *
 *	Fetch the required tuples back from the heap and fire one
 *	single trigger function.
 *
 *	Frequently, this will be fired many times in a row for triggers of
 *	a single relation.  Therefore, we cache the open relation and provide
 *	fmgr lookup cache space at the caller level.  (For triggers fired at
 *	the end of a query, we can even piggyback on the executor's state.)
 *
 *	When fired for a cross-partition update of a partitioned table, the old
 *	tuple is fetched using 'src_relInfo' (the source leaf partition) and
 *	the new tuple using 'dst_relInfo' (the destination leaf partition), though
 *	both are converted into the root partitioned table's format before passing
 *	to the trigger function.
 *
 *	event: event currently being fired.
 *	relInfo: result relation for event.
 *	src_relInfo: source partition of a cross-partition update
 *	dst_relInfo: its destination partition
 *	trigdesc: working copy of rel's trigger info.
 *	finfo: array of fmgr lookup cache entries (one per trigger in trigdesc).
 *	instr: array of EXPLAIN ANALYZE instrumentation nodes (one per trigger),
 *		or NULL if no instrumentation is wanted.
 *	per_tuple_context: memory context to call trigger function in.
 *	trig_tuple_slot1: scratch slot for tg_trigtuple (foreign tables only)
 *	trig_tuple_slot2: scratch slot for tg_newtuple (foreign tables only)
 * ----------
 */
static void
AfterTriggerExecute(EState *estate,
					AfterTriggerEvent event,
					ResultRelInfo *relInfo,
					ResultRelInfo *src_relInfo,
					ResultRelInfo *dst_relInfo,
					TriggerDesc *trigdesc,
					FmgrInfo *finfo, Instrumentation *instr,
					MemoryContext per_tuple_context,
					TupleTableSlot *trig_tuple_slot1,
					TupleTableSlot *trig_tuple_slot2)
{
	Relation	rel = relInfo->ri_RelationDesc;
	Relation	src_rel = src_relInfo->ri_RelationDesc;
	Relation	dst_rel = dst_relInfo->ri_RelationDesc;
	AfterTriggerShared evtshared = GetTriggerSharedData(event);
	Oid			tgoid = evtshared->ats_tgoid;
	TriggerData LocTriggerData = {0};
	HeapTuple	rettuple;
	int			tgindx;
	bool		should_free_trig = false;
	bool		should_free_new = false;

	/*
	 * Locate trigger in trigdesc.
	 */
	for (tgindx = 0; tgindx < trigdesc->numtriggers; tgindx++)
	{
		if (trigdesc->triggers[tgindx].tgoid == tgoid)
		{
			LocTriggerData.tg_trigger = &(trigdesc->triggers[tgindx]);
			break;
		}
	}
	if (LocTriggerData.tg_trigger == NULL)
		elog(ERROR, "could not find trigger %u", tgoid);

	/*
	 * If doing EXPLAIN ANALYZE, start charging time to this trigger.  We want
	 * to include time spent re-fetching tuples in the trigger cost.
	 */
	if (instr)
		InstrStartNode(instr + tgindx);

	/*
	 * Fetch the required tuple(s).
	 */
	switch (event->ate_flags & AFTER_TRIGGER_TUP_BITS)
	{
		case AFTER_TRIGGER_FDW_FETCH:
			{
				Tuplestorestate *fdw_tuplestore = GetCurrentFDWTuplestore();

				if (!tuplestore_gettupleslot(fdw_tuplestore, true, false,
											 trig_tuple_slot1))
					elog(ERROR, "failed to fetch tuple1 for AFTER trigger");

				if ((evtshared->ats_event & TRIGGER_EVENT_OPMASK) ==
					TRIGGER_EVENT_UPDATE &&
					!tuplestore_gettupleslot(fdw_tuplestore, true, false,
											 trig_tuple_slot2))
					elog(ERROR, "failed to fetch tuple2 for AFTER trigger");
			}
			/* fall through */
		case AFTER_TRIGGER_FDW_REUSE:

			/*
			 * Store tuple in the slot so that tg_trigtuple does not reference
			 * tuplestore memory.  (It is formally possible for the trigger
			 * function to queue trigger events that add to the same
			 * tuplestore, which can push other tuples out of memory.)  The
			 * distinction is academic, because we start with a minimal tuple
			 * that is stored as a heap tuple, constructed in different memory
			 * context, in the slot anyway.
			 */
			LocTriggerData.tg_trigslot = trig_tuple_slot1;
			LocTriggerData.tg_trigtuple =
				ExecFetchSlotHeapTuple(trig_tuple_slot1, true, &should_free_trig);

			if ((evtshared->ats_event & TRIGGER_EVENT_OPMASK) ==
				TRIGGER_EVENT_UPDATE)
			{
				LocTriggerData.tg_newslot = trig_tuple_slot2;
				LocTriggerData.tg_newtuple =
					ExecFetchSlotHeapTuple(trig_tuple_slot2, true, &should_free_new);
			}
			else
			{
				LocTriggerData.tg_newtuple = NULL;
			}
			break;
|
|
|
|
|
|
|
|
default:
|
|
|
|
if (ItemPointerIsValid(&(event->ate_ctid1)))
|
|
|
|
{
|
Enforce foreign key correctly during cross-partition updates
When an update on a partitioned table referenced in foreign key
constraints causes a row to move from one partition to another,
the fact that the move is implemented as a delete followed by an insert
on the target partition causes the foreign key triggers to have
surprising behavior. For example, a given foreign key's delete trigger
which implements the ON DELETE CASCADE clause of that key will delete
any referencing rows when triggered for that internal DELETE, although
it should not, because the referenced row is simply being moved from one
partition of the referenced root partitioned table into another, not
being deleted from it.
This commit teaches trigger.c to skip queuing such delete trigger events
on the leaf partitions in favor of an UPDATE event fired on the root
target relation. Doing so is sensible because both the old and the new
tuple "logically" belong to the root relation.
The after trigger event queuing interface now allows passing the source
and the target partitions of a particular cross-partition update when
registering the update event for the root partitioned table. Along with
the two ctids of the old and the new tuple, the after trigger event now
also stores the OIDs of those partitions. The tuples fetched from the
source and the target partitions are converted into the root table
format, if necessary, before they are passed to the trigger function.
The implementation currently has a limitation that only the foreign keys
pointing into the query's target relation are considered, not those of
its sub-partitioned partitions. That seems like a reasonable
limitation, because it sounds rare to have distinct foreign keys
pointing to sub-partitioned partitions instead of to the root table.
This misbehavior stems from commit f56f8f8da6af (which added support for
foreign keys to reference partitioned tables) not paying sufficient
attention to commit 2f178441044b (which had introduced cross-partition
updates a year earlier). Even though the former commit goes back to
Postgres 12, we're not backpatching this fix at this time for fear of
destabilizing things too much, and because there are a few ABI breaks in
it that we'd have to work around in older branches. It also depends on
commit f4566345cf40, which had its own share of backpatchability issues
as well.
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Eduard Català <eduard.catala@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFvkBCmfwkQX_yBqv2Wz8ugUGiBDxum8=WvVbfU1TXaNg@mail.gmail.com
Discussion: https://postgr.es/m/CAL54xNZsLwEM1XCk5yW9EqaRzsZYHuWsHQkA2L5MOSKXAwviCQ@mail.gmail.com
2022-03-20 18:43:40 +01:00
		TupleTableSlot *src_slot = ExecGetTriggerOldSlot(estate,
														 src_relInfo);
		if (!table_tuple_fetch_row_version(src_rel,
										   &(event->ate_ctid1),
										   SnapshotAny,
										   src_slot))
			elog(ERROR, "failed to fetch tuple1 for AFTER trigger");

		/*
		 * Store the tuple fetched from the source partition into the
		 * target (root partitioned) table slot, converting if needed.
		 */
		if (src_relInfo != relInfo)
		{
			TupleConversionMap *map = ExecGetChildToRootMap(src_relInfo);

			LocTriggerData.tg_trigslot = ExecGetTriggerOldSlot(estate, relInfo);
			if (map)
			{
				execute_attr_map_slot(map->attrMap,
									  src_slot,
									  LocTriggerData.tg_trigslot);
			}
			else
				ExecCopySlot(LocTriggerData.tg_trigslot, src_slot);
		}
		else
			LocTriggerData.tg_trigslot = src_slot;
		LocTriggerData.tg_trigtuple =
			ExecFetchSlotHeapTuple(LocTriggerData.tg_trigslot, false, &should_free_trig);
	}
	else
	{
		LocTriggerData.tg_trigtuple = NULL;
	}
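The child-to-root conversion that the `execute_attr_map_slot` call above performs can be sketched as a simple positional remap. This is a hypothetical miniature, not the real TupleConversionMap: a partition may store its columns in a different physical order than the root table, so each root column i is filled from the child column the map points at.

```c
#include <assert.h>

/*
 * Hypothetical sketch of an attribute-map conversion: root column i takes
 * its value from child column attr_map[i].  (Real maps also handle dropped
 * columns and null-fill; that is omitted here.)
 */
static void
convert_child_to_root(const int *child_vals, const int *attr_map,
					  int nattrs, int *root_vals)
{
	for (int i = 0; i < nattrs; i++)
		root_vals[i] = child_vals[attr_map[i]];
}
```

When the partition and root layouts already match, no map is built and the slot contents are simply copied, which is the `ExecCopySlot` branch above.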

	/* don't touch ctid2 if not there */
	if (((event->ate_flags & AFTER_TRIGGER_TUP_BITS) == AFTER_TRIGGER_2CTID ||
		 (event->ate_flags & AFTER_TRIGGER_CP_UPDATE)) &&
		ItemPointerIsValid(&(event->ate_ctid2)))
	{
		TupleTableSlot *dst_slot = ExecGetTriggerNewSlot(estate,
														 dst_relInfo);
		if (!table_tuple_fetch_row_version(dst_rel,
										   &(event->ate_ctid2),
										   SnapshotAny,
										   dst_slot))
			elog(ERROR, "failed to fetch tuple2 for AFTER trigger");

		/*
		 * Store the tuple fetched from the destination partition into
		 * the target (root partitioned) table slot, converting if
		 * needed.
		 */
		if (dst_relInfo != relInfo)
		{
			TupleConversionMap *map = ExecGetChildToRootMap(dst_relInfo);

			LocTriggerData.tg_newslot = ExecGetTriggerNewSlot(estate, relInfo);
			if (map)
			{
				execute_attr_map_slot(map->attrMap,
									  dst_slot,
									  LocTriggerData.tg_newslot);
			}
			else
				ExecCopySlot(LocTriggerData.tg_newslot, dst_slot);
		}
		else
			LocTriggerData.tg_newslot = dst_slot;
		LocTriggerData.tg_newtuple =
			ExecFetchSlotHeapTuple(LocTriggerData.tg_newslot, false, &should_free_new);
	}
	else
	{
		LocTriggerData.tg_newtuple = NULL;
	}
	}

	/*
Fix SQL-spec incompatibilities in new transition table feature.
The standard says that all changes of the same kind (insert, update, or
delete) caused in one table by a single SQL statement should be reported
in a single transition table; and by that, they mean to include foreign key
enforcement actions cascading from the statement's direct effects. It's
also reasonable to conclude that if the standard had wCTEs, they would say
that effects of wCTEs applying to the same table as each other or the outer
statement should be merged into one transition table. We weren't doing it
like that.
Hence, arrange to merge tuples from multiple update actions into a single
transition table as much as we can. There is a problem, which is that if
the firing of FK enforcement triggers and after-row triggers with
transition tables is interspersed, we might need to report more tuples
after some triggers have already seen the transition table. It seems like
a bad idea for the transition table to be mutable between trigger calls.
There's no good way around this without a major redesign of the FK logic,
so for now, resolve it by opening a new transition table each time this
happens.
Also, ensure that AFTER STATEMENT triggers fire just once per statement,
or once per transition table when we're forced to make more than one.
Previous versions of Postgres have allowed each FK enforcement query
to cause an additional firing of the AFTER STATEMENT triggers for the
referencing table, but that's certainly not per spec. (We're still
doing multiple firings of BEFORE STATEMENT triggers, though; is that
something worth changing?)
Also, forbid using transition tables with column-specific UPDATE triggers.
The spec requires such transition tables to show only the tuples for which
the UPDATE trigger would have fired, which means maintaining multiple
transition tables or else somehow filtering the contents at readout.
Maybe someday we'll bother to support that option, but it looks like a
lot of trouble for a marginal feature.
The transition tables are now managed by the AfterTriggers data structures,
rather than being directly the responsibility of ModifyTable nodes. This
removes a subtransaction-lifespan memory leak introduced by my previous
band-aid patch 3c4359521.
In passing, refactor the AfterTriggers data structures to reduce the
management overhead for them, by using arrays of structs rather than
several parallel arrays for per-query-level and per-subtransaction state.
I failed to resist the temptation to do some copy-editing on the SGML
docs about triggers, above and beyond merely documenting the effects
of this patch.
Back-patch to v10, because we don't want the semantics of transition
tables to change post-release.
Patch by me, with help and review from Thomas Munro.
Discussion: https://postgr.es/m/20170909064853.25630.12825@wrigleys.postgresql.org
2017-09-16 19:20:32 +02:00
	 * Set up the tuplestore information to let the trigger have access to
	 * transition tables. When we first make a transition table available to
	 * a trigger, mark it "closed" so that it cannot change anymore. If any
	 * additional events of the same type get queued in the current trigger
	 * query level, they'll go into new transition tables.
	 */
	LocTriggerData.tg_oldtable = LocTriggerData.tg_newtable = NULL;
	if (evtshared->ats_table)
	{
		if (LocTriggerData.tg_trigger->tgoldtable)
		{
			if (TRIGGER_FIRED_BY_UPDATE(evtshared->ats_event))
				LocTriggerData.tg_oldtable = evtshared->ats_table->old_upd_tuplestore;
			else
				LocTriggerData.tg_oldtable = evtshared->ats_table->old_del_tuplestore;
2017-09-16 19:20:32 +02:00
|
|
|
evtshared->ats_table->closed = true;
|
|
|
|
}
|
2017-06-28 20:00:55 +02:00
|
|
|
|
2017-09-16 19:20:32 +02:00
|
|
|
if (LocTriggerData.tg_trigger->tgnewtable)
|
|
|
|
{
|
2022-03-28 16:45:58 +02:00
|
|
|
if (TRIGGER_FIRED_BY_INSERT(evtshared->ats_event))
|
|
|
|
LocTriggerData.tg_newtable = evtshared->ats_table->new_ins_tuplestore;
|
|
|
|
else
|
|
|
|
LocTriggerData.tg_newtable = evtshared->ats_table->new_upd_tuplestore;
|
2017-09-16 19:20:32 +02:00
|
|
|
evtshared->ats_table->closed = true;
|
2017-06-28 20:00:55 +02:00
|
|
|
}
|
2017-06-28 19:59:01 +02:00
|
|
|
}
|
2016-11-04 16:49:50 +01:00
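The branches above pick which captured tuplestore a trigger sees as its OLD TABLE and NEW TABLE: updates read `old_upd_tuplestore`/`new_upd_tuplestore`, deletes read `old_del_tuplestore`, and inserts read `new_ins_tuplestore`. A simplified hypothetical sketch of that dispatch (it omits the `tgoldtable`/`tgnewtable` gating on whether the trigger actually declared a transition relation):

```python
# Hypothetical sketch (simplified, not the real C data structures): choose
# the OLD TABLE and NEW TABLE tuplestores from the event that queued the
# trigger, mirroring the TRIGGER_FIRED_BY_UPDATE / _BY_INSERT tests above.

INSERT, UPDATE, DELETE = "INSERT", "UPDATE", "DELETE"

def pick_transition_tables(event, stores):
    """stores maps the four capture kinds to their tuplestores."""
    old_table = new_table = None
    # OLD TABLE: updates read the old-row store, deletes the deleted-row store.
    if event in (UPDATE, DELETE):
        old_table = stores["old_upd"] if event == UPDATE else stores["old_del"]
    # NEW TABLE: inserts read the inserted-row store, updates the new-row store.
    if event in (INSERT, UPDATE):
        new_table = stores["new_ins"] if event == INSERT else stores["new_upd"]
    return old_table, new_table

stores = {"old_upd": ["u-old"], "old_del": ["d-old"],
          "new_ins": ["i-new"], "new_upd": ["u-new"]}
result_upd = pick_transition_tables(UPDATE, stores)
result_del = pick_transition_tables(DELETE, stores)
result_ins = pick_transition_tables(INSERT, stores)
```

Keeping the four kinds in separate stores is what lets one statement expose, say, both a deleted-rows OLD TABLE and an inserted-rows NEW TABLE without mixing them.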
|
|
|
|
1999-09-29 18:06:40 +02:00
|
|
|
/*
|
2006-08-03 18:04:41 +02:00
|
|
|
* Set up the remaining trigger information
|
1999-09-29 18:06:40 +02:00
|
|
|
*/
|
2000-05-29 03:59:17 +02:00
|
|
|
LocTriggerData.type = T_TriggerData;
|
2004-09-10 20:40:09 +02:00
|
|
|
LocTriggerData.tg_event =
|
2008-10-25 01:42:35 +02:00
|
|
|
evtshared->ats_event & (TRIGGER_EVENT_OPMASK | TRIGGER_EVENT_ROW);
|
2000-05-29 03:59:17 +02:00
|
|
|
LocTriggerData.tg_relation = rel;
|
2020-03-09 09:22:22 +01:00
|
|
|
if (TRIGGER_FOR_UPDATE(LocTriggerData.tg_trigger->tgtype))
|
|
|
|
LocTriggerData.tg_updatedcols = evtshared->ats_modifiedcols;
|
1999-09-29 18:06:40 +02:00
|
|
|
|
2004-09-10 20:40:09 +02:00
|
|
|
MemoryContextReset(per_tuple_context);
|
|
|
|
|
1999-09-29 18:06:40 +02:00
|
|
|
/*
|
2005-03-25 22:58:00 +01:00
|
|
|
* Call the trigger and throw away any possibly returned updated tuple.
|
|
|
|
* (Don't let ExecCallTriggerFunc measure EXPLAIN time.)
|
1999-09-29 18:06:40 +02:00
|
|
|
*/
|
2001-06-01 04:41:36 +02:00
|
|
|
rettuple = ExecCallTriggerFunc(&LocTriggerData,
|
2005-03-25 22:58:00 +01:00
|
|
|
tgindx,
|
|
|
|
finfo,
|
|
|
|
NULL,
|
2001-01-22 01:50:07 +01:00
|
|
|
per_tuple_context);
|
2014-03-23 07:16:34 +01:00
|
|
|
if (rettuple != NULL &&
|
|
|
|
rettuple != LocTriggerData.tg_trigtuple &&
|
|
|
|
rettuple != LocTriggerData.tg_newtuple)
|
1999-12-16 23:20:03 +01:00
|
|
|
heap_freetuple(rettuple);
|
1999-09-29 18:06:40 +02:00
|
|
|
|
|
|
|
/*
|
2019-02-27 05:30:28 +01:00
|
|
|
* Release resources
|
1999-09-29 18:06:40 +02:00
|
|
|
*/
|
2019-02-27 05:30:28 +01:00
|
|
|
if (should_free_trig)
|
|
|
|
heap_freetuple(LocTriggerData.tg_trigtuple);
|
|
|
|
if (should_free_new)
|
|
|
|
heap_freetuple(LocTriggerData.tg_newtuple);
|
|
|
|
|
2019-12-10 10:00:30 +01:00
|
|
|
/* don't clear slots' contents if foreign table */
|
|
|
|
if (trig_tuple_slot1 == NULL)
|
|
|
|
{
|
|
|
|
if (LocTriggerData.tg_trigslot)
|
|
|
|
ExecClearTuple(LocTriggerData.tg_trigslot);
|
|
|
|
if (LocTriggerData.tg_newslot)
|
|
|
|
ExecClearTuple(LocTriggerData.tg_newslot);
|
|
|
|
}
|
2005-03-25 22:58:00 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If doing EXPLAIN ANALYZE, stop charging time to this trigger, and count
|
|
|
|
* one "tuple returned" (really the number of firings).
|
|
|
|
*/
|
|
|
|
if (instr)
|
2006-05-30 16:01:58 +02:00
|
|
|
InstrStopNode(instr + tgindx, 1);
|
1999-09-29 18:06:40 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
|
2004-09-10 20:40:09 +02:00
|
|
|
/*
|
|
|
|
* afterTriggerMarkEvents()
|
|
|
|
*
|
|
|
|
* Scan the given event list for not yet invoked events. Mark the ones
|
|
|
|
* that can be invoked now with the current firing ID.
|
|
|
|
*
|
|
|
|
* If move_list isn't NULL, events that are not to be invoked now are
|
2008-10-25 01:42:35 +02:00
|
|
|
* transferred to move_list.
|
2004-09-10 20:40:09 +02:00
|
|
|
*
|
2017-08-16 06:22:32 +02:00
|
|
|
* When immediate_only is true, do not invoke currently-deferred triggers.
|
|
|
|
* (This will be false only at main transaction exit.)
|
2004-09-10 20:40:09 +02:00
|
|
|
*
|
2017-08-16 06:22:32 +02:00
|
|
|
* Returns true if any invokable events were found.
|
2004-09-10 20:40:09 +02:00
|
|
|
*/
|
|
|
|
static bool
|
|
|
|
afterTriggerMarkEvents(AfterTriggerEventList *events,
|
|
|
|
AfterTriggerEventList *move_list,
|
|
|
|
bool immediate_only)
|
|
|
|
{
|
|
|
|
bool found = false;
|
In security-restricted operations, block enqueue of at-commit user code.
Specifically, this blocks DECLARE ... WITH HOLD and firing of deferred
triggers within index expressions and materialized view queries. An
attacker having permission to create non-temp objects in at least one
schema could execute arbitrary SQL functions under the identity of the
bootstrap superuser. One can work around the vulnerability by disabling
autovacuum and not manually running ANALYZE, CLUSTER, REINDEX, CREATE
INDEX, VACUUM FULL, or REFRESH MATERIALIZED VIEW. (Don't restore from
pg_dump, since it runs some of those commands.) Plain VACUUM (without
FULL) is safe, and all commands are fine when a trusted user owns the
target object. Performance may degrade quickly under this workaround,
however. Back-patch to 9.5 (all supported versions).
Reviewed by Robert Haas. Reported by Etienne Stalmans.
Security: CVE-2020-25695
2020-11-09 16:32:09 +01:00
|
|
|
bool deferred_found = false;
|
2008-10-25 01:42:35 +02:00
|
|
|
AfterTriggerEvent event;
|
|
|
|
AfterTriggerEventChunk *chunk;
|
2004-09-10 20:40:09 +02:00
|
|
|
|
2008-10-25 01:42:35 +02:00
|
|
|
for_each_event_chunk(event, chunk, *events)
|
2004-09-10 20:40:09 +02:00
|
|
|
{
|
2008-10-25 01:42:35 +02:00
|
|
|
AfterTriggerShared evtshared = GetTriggerSharedData(event);
|
2004-09-10 20:40:09 +02:00
|
|
|
bool defer_it = false;
|
|
|
|
|
2008-10-25 01:42:35 +02:00
|
|
|
if (!(event->ate_flags &
|
2004-09-10 20:40:09 +02:00
|
|
|
(AFTER_TRIGGER_DONE | AFTER_TRIGGER_IN_PROGRESS)))
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* This trigger hasn't been called or scheduled yet. Check if we
|
|
|
|
* should call it now.
|
|
|
|
*/
|
2008-10-25 01:42:35 +02:00
|
|
|
if (immediate_only && afterTriggerCheckState(evtshared))
|
2004-09-10 20:40:09 +02:00
|
|
|
{
|
|
|
|
defer_it = true;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Mark it as to be fired in this firing cycle.
|
|
|
|
*/
|
2014-10-23 18:33:02 +02:00
|
|
|
evtshared->ats_firing_id = afterTriggers.firing_counter;
|
2008-10-25 01:42:35 +02:00
|
|
|
event->ate_flags |= AFTER_TRIGGER_IN_PROGRESS;
|
2004-09-10 20:40:09 +02:00
|
|
|
found = true;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If it's deferred, move it to move_list, if requested.
|
|
|
|
*/
|
|
|
|
if (defer_it && move_list != NULL)
|
|
|
|
{
|
2020-11-09 16:32:09 +01:00
|
|
|
deferred_found = true;
|
2008-10-25 01:42:35 +02:00
|
|
|
/* add it to move_list */
|
|
|
|
afterTriggerAddEvent(move_list, event, evtshared);
|
|
|
|
/* mark original copy "done" so we don't do it again */
|
|
|
|
event->ate_flags |= AFTER_TRIGGER_DONE;
|
2004-09-10 20:40:09 +02:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2020-11-09 16:32:09 +01:00
|
|
|
/*
|
|
|
|
* We could allow deferred triggers if, before the end of the
|
|
|
|
* security-restricted operation, we were to verify that a SET CONSTRAINTS
|
|
|
|
* ... IMMEDIATE has fired all such triggers. For now, don't bother.
|
|
|
|
*/
|
|
|
|
if (deferred_found && InSecurityRestrictedOperation())
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
|
|
|
|
errmsg("cannot fire deferred trigger within security-restricted operation")));
|
|
|
|
|
2004-09-10 20:40:09 +02:00
|
|
|
return found;
|
|
|
|
}
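The control flow of `afterTriggerMarkEvents()` can be sketched in plain Python (a hypothetical reduction — the dict-based events, `is_deferred` callback, and `in_restricted_op` flag stand in for the real `AfterTriggerEvent` list, `afterTriggerCheckState()`, and `InSecurityRestrictedOperation()`):

```python
# Hypothetical sketch: mark not-yet-invoked events with the current firing
# ID, transfer still-deferred ones to move_list, and refuse to leave deferred
# triggers pending inside a security-restricted operation (CVE-2020-25695).

DONE, IN_PROGRESS = 0x1, 0x2

def mark_events(events, move_list, immediate_only, firing_counter,
                is_deferred, in_restricted_op=False):
    found = deferred_found = False
    for event in events:
        defer_it = False
        if not event["flags"] & (DONE | IN_PROGRESS):
            # Not yet called or scheduled: decide whether to fire it now.
            if immediate_only and is_deferred(event):
                defer_it = True
            else:
                event["firing_id"] = firing_counter
                event["flags"] |= IN_PROGRESS
                found = True
        if defer_it and move_list is not None:
            deferred_found = True
            move_list.append(event)     # transfer to move_list
            event["flags"] |= DONE      # mark original so we skip it later
    if deferred_found and in_restricted_op:
        raise PermissionError(
            "cannot fire deferred trigger within security-restricted operation")
    return found

events = [{"flags": 0, "deferred": False}, {"flags": 0, "deferred": True}]
moved = []
found = mark_events(events, moved, immediate_only=True, firing_counter=7,
                    is_deferred=lambda e: e["deferred"])
```

Note the same two-state outcome as the C code: an event is either stamped with the firing ID and marked in-progress, or moved aside and marked done in its original list so it is not scanned twice.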
|
|
|
|
|
2008-10-25 01:42:35 +02:00
|
|
|
/*
|
2004-09-10 20:40:09 +02:00
|
|
|
* afterTriggerInvokeEvents()
|
1999-09-29 18:06:40 +02:00
|
|
|
*
|
2004-09-10 20:40:09 +02:00
|
|
|
* Scan the given event list for events that are marked as to be fired
|
|
|
|
* in the current firing cycle, and fire them.
|
|
|
|
*
|
2007-08-15 23:39:50 +02:00
|
|
|
* If estate isn't NULL, we use its result relation info to avoid repeated
|
|
|
|
* openings and closings of trigger target relations.  If it is NULL, we
|
|
|
|
* make one locally to cache the info in case there are multiple trigger
|
|
|
|
* events per rel.
|
2005-03-25 22:58:00 +01:00
|
|
|
*
|
2017-08-16 06:22:32 +02:00
|
|
|
* When delete_ok is true, it's safe to delete fully-processed events.
|
2008-10-25 01:42:35 +02:00
|
|
|
* (We are not very tense about that: we simply reset a chunk to be empty
|
|
|
|
* if all its events got fired. The objective here is just to avoid useless
|
|
|
|
* rescanning of events when a trigger queues new events during transaction
|
|
|
|
* end, so it's not necessary to worry much about the case where only
|
|
|
|
* some events are fired.)
|
|
|
|
*
|
2017-08-16 06:22:32 +02:00
|
|
|
* Returns true if no unfired events remain in the list (this allows us
|
2008-10-25 01:42:35 +02:00
|
|
|
* to avoid repeating afterTriggerMarkEvents).
|
1999-09-29 18:06:40 +02:00
|
|
|
*/
|
2008-10-25 01:42:35 +02:00
|
|
|
static bool
|
2004-09-10 20:40:09 +02:00
|
|
|
afterTriggerInvokeEvents(AfterTriggerEventList *events,
|
|
|
|
CommandId firing_id,
|
2005-03-25 22:58:00 +01:00
|
|
|
EState *estate,
|
2004-09-10 20:40:09 +02:00
|
|
|
bool delete_ok)
|
1999-09-29 18:06:40 +02:00
|
|
|
{
|
2008-10-25 01:42:35 +02:00
|
|
|
bool all_fired = true;
|
|
|
|
AfterTriggerEventChunk *chunk;
|
2001-01-22 01:50:07 +01:00
|
|
|
MemoryContext per_tuple_context;
|
2007-08-15 23:39:50 +02:00
|
|
|
bool local_estate = false;
|
2019-02-27 18:14:34 +01:00
|
|
|
ResultRelInfo *rInfo = NULL;
|
2001-06-01 04:41:36 +02:00
|
|
|
Relation rel = NULL;
|
2002-10-14 18:51:30 +02:00
|
|
|
TriggerDesc *trigdesc = NULL;
|
2001-06-01 04:41:36 +02:00
|
|
|
FmgrInfo *finfo = NULL;
|
2005-03-25 22:58:00 +01:00
|
|
|
Instrumentation *instr = NULL;
|
2014-03-23 07:16:34 +01:00
|
|
|
TupleTableSlot *slot1 = NULL,
|
|
|
|
*slot2 = NULL;
|
1999-09-29 18:06:40 +02:00
|
|
|
|
2007-08-15 23:39:50 +02:00
|
|
|
/* Make a local EState if need be */
|
|
|
|
if (estate == NULL)
|
|
|
|
{
|
|
|
|
estate = CreateExecutorState();
|
|
|
|
local_estate = true;
|
|
|
|
}
|
|
|
|
|
2001-01-22 01:50:07 +01:00
|
|
|
/* Make a per-tuple memory context for trigger function calls */
|
|
|
|
per_tuple_context =
|
|
|
|
AllocSetContextCreate(CurrentMemoryContext,
|
2004-09-10 20:40:09 +02:00
|
|
|
"AfterTriggerTupleContext",
|
Add macros to make AllocSetContextCreate() calls simpler and safer.
I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls
had typos in the context-sizing parameters. While none of these led to
especially significant problems, they did create minor inefficiencies,
and it's now clear that expecting people to copy-and-paste those calls
accurately is not a great idea. Let's reduce the risk of future errors
by introducing single macros that encapsulate the common use-cases.
Three such macros are enough to cover all but two special-purpose contexts;
those two calls can be left as-is, I think.
While this patch doesn't in itself improve matters for third-party
extensions, it doesn't break anything for them either, and they can
gradually adopt the simplified notation over time.
In passing, change TopMemoryContext to use the default allocation
parameters. Formerly it could only be extended 8K at a time. That was
probably reasonable when this code was written; but nowadays we create
many more contexts than we did then, so that it's not unusual to have a
couple hundred K in TopMemoryContext, even without considering various
dubious code that sticks other things there. There seems no good reason
not to let it use growing blocks like most other contexts.
Back-patch to 9.6, mostly because that's still close enough to HEAD that
it's easy to do so, and keeping the branches in sync can be expected to
avoid some future back-patching pain. The bugs fixed by these changes
don't seem to be significant enough to justify fixing them further back.
Discussion: <21072.1472321324@sss.pgh.pa.us>
2016-08-27 23:50:38 +02:00
|
|
|
ALLOCSET_DEFAULT_SIZES);
|
2001-01-22 01:50:07 +01:00
|
|
|
|
2008-10-25 01:42:35 +02:00
|
|
|
for_each_chunk(chunk, *events)
|
1999-09-29 18:06:40 +02:00
|
|
|
{
|
2008-10-25 01:42:35 +02:00
|
|
|
AfterTriggerEvent event;
|
|
|
|
bool all_fired_in_chunk = true;
|
2001-03-22 07:16:21 +01:00
|
|
|
|
2008-10-25 01:42:35 +02:00
|
|
|
for_each_event(event, chunk)
|
1999-09-29 18:06:40 +02:00
|
|
|
{
|
2008-10-25 01:42:35 +02:00
|
|
|
AfterTriggerShared evtshared = GetTriggerSharedData(event);
|
|
|
|
|
1999-09-29 18:06:40 +02:00
|
|
|
/*
|
2008-10-25 01:42:35 +02:00
|
|
|
* Is it one for me to fire?
|
1999-09-29 18:06:40 +02:00
|
|
|
*/
|
2008-10-25 01:42:35 +02:00
|
|
|
if ((event->ate_flags & AFTER_TRIGGER_IN_PROGRESS) &&
|
|
|
|
evtshared->ats_firing_id == firing_id)
|
2001-06-01 04:41:36 +02:00
|
|
|
{
|
Enforce foreign key correctly during cross-partition updates
When an update on a partitioned table referenced in foreign key
constraints causes a row to move from one partition to another,
the fact that the move is implemented as a delete followed by an insert
on the target partition causes the foreign key triggers to have
surprising behavior. For example, a given foreign key's delete trigger
which implements the ON DELETE CASCADE clause of that key will delete
any referencing rows when triggered for that internal DELETE, although
it should not, because the referenced row is simply being moved from one
partition of the referenced root partitioned table into another, not
being deleted from it.
This commit teaches trigger.c to skip queuing such delete trigger events
on the leaf partitions in favor of an UPDATE event fired on the root
target relation. Doing so is sensible because both the old and the new
tuple "logically" belong to the root relation.
The after trigger event queuing interface now allows passing the source
and the target partitions of a particular cross-partition update when
registering the update event for the root partitioned table. Along with
the two ctids of the old and the new tuple, the after trigger event now
also stores the OIDs of those partitions. The tuples fetched from the
source and the target partitions are converted into the root table
format, if necessary, before they are passed to the trigger function.
The implementation currently has a limitation that only the foreign keys
pointing into the query's target relation are considered, not those of
its sub-partitioned partitions. That seems like a reasonable
limitation, because it sounds rare to have distinct foreign keys
pointing to sub-partitioned partitions instead of to the root table.
This misbehavior stems from commit f56f8f8da6af (which added support for
foreign keys to reference partitioned tables) not paying sufficient
attention to commit 2f178441044b (which had introduced cross-partition
updates a year earlier). Even though the former commit goes back to
Postgres 12, we're not backpatching this fix at this time for fear of
destabilizing things too much, and because there are a few ABI breaks in
it that we'd have to work around in older branches. It also depends on
commit f4566345cf40, which had its own share of backpatchability issues
as well.
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Eduard Català <eduard.catala@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFvkBCmfwkQX_yBqv2Wz8ugUGiBDxum8=WvVbfU1TXaNg@mail.gmail.com
Discussion: https://postgr.es/m/CAL54xNZsLwEM1XCk5yW9EqaRzsZYHuWsHQkA2L5MOSKXAwviCQ@mail.gmail.com
2022-03-20 18:43:40 +01:00
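The queuing policy this commit describes — suppress the leaf-level DELETE and INSERT trigger events for a cross-partition row move and queue a single UPDATE event on the root partitioned table instead — can be sketched as follows (a hypothetical reduction; the function and argument names are invented, not the real queuing interface):

```python
# Hypothetical sketch: for a cross-partition move (implemented internally as
# DELETE on the source leaf + INSERT on the target leaf), queue one UPDATE
# event on the root table and skip both leaf events, so FK triggers see the
# row as updated rather than deleted.

def queue_after_events(op, relation, is_cross_partition_move, root, queue):
    if is_cross_partition_move and op in ("DELETE", "INSERT"):
        # Skip leaf-level events; the row "logically" stays in the root.
        if op == "DELETE":              # queue the root UPDATE exactly once
            queue.append(("UPDATE", root))
        return
    queue.append((op, relation))

queue = []
# Ordinary delete on a leaf partition: queued as-is.
queue_after_events("DELETE", "part_1", False, "root_tbl", queue)
# Row moving from part_1 to part_2: one UPDATE on the root, no leaf events.
queue_after_events("DELETE", "part_1", True, "root_tbl", queue)
queue_after_events("INSERT", "part_2", True, "root_tbl", queue)
```

Anchoring the UPDATE on the DELETE half of the move is one way to guarantee the root event is queued exactly once per moved row; the real implementation additionally records the source and target partition OIDs with the event so the tuples can be converted to the root rowtype before the trigger runs.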
|
|
|
ResultRelInfo *src_rInfo,
|
|
|
|
*dst_rInfo;
|
|
|
|
|
2008-10-25 01:42:35 +02:00
|
|
|
/*
|
|
|
|
* So let's fire it... but first, find the correct relation if
|
|
|
|
* this is not the same relation as before.
|
|
|
|
*/
|
|
|
|
if (rel == NULL || RelationGetRelid(rel) != evtshared->ats_relid)
|
|
|
|
{
|
2022-03-20 18:43:40 +01:00
|
|
|
rInfo = ExecGetTriggerResultRel(estate, evtshared->ats_relid,
|
|
|
|
NULL);
|
2008-10-25 01:42:35 +02:00
|
|
|
rel = rInfo->ri_RelationDesc;
|
2021-05-23 03:24:48 +02:00
|
|
|
/* Catch calls with insufficient relcache refcounting */
|
|
|
|
Assert(!RelationHasReferenceCountZero(rel));
|
2008-10-25 01:42:35 +02:00
|
|
|
trigdesc = rInfo->ri_TrigDesc;
|
|
|
|
finfo = rInfo->ri_TrigFunctions;
|
|
|
|
instr = rInfo->ri_TrigInstrument;
|
2019-12-10 10:00:30 +01:00
|
|
|
if (slot1 != NULL)
|
|
|
|
{
|
|
|
|
ExecDropSingleTupleTableSlot(slot1);
|
|
|
|
ExecDropSingleTupleTableSlot(slot2);
|
|
|
|
slot1 = slot2 = NULL;
|
|
|
|
}
|
2014-03-23 07:16:34 +01:00
|
|
|
if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
|
|
|
|
{
|
Introduce notion of different types of slots (without implementing them).
Upcoming work intends to allow pluggable ways to introduce new ways of
storing table data. Accessing those table access methods from the
executor requires TupleTableSlots to be carry tuples in the native
format of such storage methods; otherwise there'll be a significant
conversion overhead.
Different access methods will require different data to store tuples
efficiently (just like virtual, minimal, heap already require fields
in TupleTableSlot). To allow that without requiring additional pointer
indirections, we want to have different structs (embedding
TupleTableSlot) for different types of slots. Thus different types of
slots are needed, which requires adapting creators of slots.
The slot that most efficiently can represent a type of tuple in an
executor node will often depend on the type of slot a child node
uses. Therefore we need to track the type of slot is returned by
nodes, so parent slots can create slots based on that.
Relatedly, JIT compilation of tuple deforming needs to know which type
of slot a certain expression refers to, so it can create an
appropriate deforming function for the type of tuple in the slot.
But not all nodes will only return one type of slot, e.g. an append
node will potentially return different types of slots for each of its
subplans.
Therefore add function that allows to query the type of a node's
result slot, and whether it'll always be the same type (whether it's
fixed). This can be queried using ExecGetResultSlotOps().
The scan, result, inner, outer type of slots are automatically
inferred from ExecInitScanTupleSlot(), ExecInitResultSlot(),
left/right subtrees respectively. If that's not correct for a node,
that can be overwritten using new fields in PlanState.
This commit does not introduce the actually abstracted implementation
of different kind of TupleTableSlots, that will be left for a followup
commit. The different types of slots introduced will, for now, still
use the same backing implementation.
While this already partially invalidates the big comment in
tuptable.h, it seems to make more sense to update it later, when the
different TupleTableSlot implementations actually exist.
Author: Ashutosh Bapat and Andres Freund, with changes by Amit Khandekar
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
2018-11-16 07:00:30 +01:00
|
|
|
slot1 = MakeSingleTupleTableSlot(rel->rd_att,
|
|
|
|
&TTSOpsMinimalTuple);
|
|
|
|
slot2 = MakeSingleTupleTableSlot(rel->rd_att,
|
|
|
|
&TTSOpsMinimalTuple);
|
2014-03-23 07:16:34 +01:00
|
|
|
}
|
2008-10-25 01:42:35 +02:00
|
|
|
if (trigdesc == NULL) /* should not happen */
|
|
|
|
elog(ERROR, "relation %u has no triggers",
|
|
|
|
evtshared->ats_relid);
|
|
|
|
}
|
2001-11-16 17:31:16 +01:00
|
|
|
|
Enforce foreign key correctly during cross-partition updates
When an update on a partitioned table referenced in foreign key
constraints causes a row to move from one partition to another,
the fact that the move is implemented as a delete followed by an insert
on the target partition causes the foreign key triggers to have
surprising behavior. For example, a given foreign key's delete trigger
which implements the ON DELETE CASCADE clause of that key will delete
any referencing rows when triggered for that internal DELETE, although
it should not, because the referenced row is simply being moved from one
partition of the referenced root partitioned table into another, not
being deleted from it.
This commit teaches trigger.c to skip queuing such delete trigger events
on the leaf partitions in favor of an UPDATE event fired on the root
target relation. Doing so is sensible because both the old and the new
tuple "logically" belong to the root relation.
The after trigger event queuing interface now allows passing the source
and the target partitions of a particular cross-partition update when
registering the update event for the root partitioned table. Along with
the two ctids of the old and the new tuple, the after trigger event now
also stores the OIDs of those partitions. The tuples fetched from the
source and the target partitions are converted into the root table
format, if necessary, before they are passed to the trigger function.
The implementation currently has a limitation that only the foreign keys
pointing into the query's target relation are considered, not those of
its sub-partitioned partitions. That seems like a reasonable
limitation, because it sounds rare to have distinct foreign keys
pointing to sub-partitioned partitions instead of to the root table.
This misbehavior stems from commit f56f8f8da6af (which added support for
foreign keys to reference partitioned tables) not paying sufficient
attention to commit 2f178441044b (which had introduced cross-partition
updates a year earlier). Even though the former commit goes back to
Postgres 12, we're not backpatching this fix at this time for fear of
destabilizing things too much, and because there are a few ABI breaks in
it that we'd have to work around in older branches. It also depends on
commit f4566345cf40, which had its own share of backpatchability issues
as well.
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Eduard Català <eduard.catala@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFvkBCmfwkQX_yBqv2Wz8ugUGiBDxum8=WvVbfU1TXaNg@mail.gmail.com
Discussion: https://postgr.es/m/CAL54xNZsLwEM1XCk5yW9EqaRzsZYHuWsHQkA2L5MOSKXAwviCQ@mail.gmail.com
2022-03-20 18:43:40 +01:00
|
|
|
/*
|
|
|
|
* Look up source and destination partition result rels of a
|
|
|
|
* cross-partition update event.
|
|
|
|
*/
|
|
|
|
if ((event->ate_flags & AFTER_TRIGGER_TUP_BITS) ==
|
|
|
|
AFTER_TRIGGER_CP_UPDATE)
|
|
|
|
{
|
|
|
|
Assert(OidIsValid(event->ate_src_part) &&
|
|
|
|
OidIsValid(event->ate_dst_part));
|
|
|
|
src_rInfo = ExecGetTriggerResultRel(estate,
|
|
|
|
event->ate_src_part,
|
|
|
|
rInfo);
|
|
|
|
dst_rInfo = ExecGetTriggerResultRel(estate,
|
|
|
|
event->ate_dst_part,
|
|
|
|
rInfo);
|
|
|
|
}
|
|
|
|
else
|
|
|
|
src_rInfo = dst_rInfo = rInfo;
|
|
|
|
|
2008-10-25 01:42:35 +02:00
|
|
|
/*
|
|
|
|
* Fire it. Note that the AFTER_TRIGGER_IN_PROGRESS flag is
|
|
|
|
* still set, so recursive examinations of the event list
|
|
|
|
* won't try to re-fire it.
|
|
|
|
*/
|
2022-03-20 18:43:40 +01:00
|
|
|
AfterTriggerExecute(estate, event, rInfo,
|
|
|
|
src_rInfo, dst_rInfo,
|
|
|
|
trigdesc, finfo, instr,
|
Fix SQL-spec incompatibilities in new transition table feature.
The standard says that all changes of the same kind (insert, update, or
delete) caused in one table by a single SQL statement should be reported
in a single transition table; and by that, they mean to include foreign key
enforcement actions cascading from the statement's direct effects. It's
also reasonable to conclude that if the standard had wCTEs, they would say
that effects of wCTEs applying to the same table as each other or the outer
statement should be merged into one transition table. We weren't doing it
like that.
Hence, arrange to merge tuples from multiple update actions into a single
transition table as much as we can. There is a problem, which is that if
the firing of FK enforcement triggers and after-row triggers with
transition tables is interspersed, we might need to report more tuples
after some triggers have already seen the transition table. It seems like
a bad idea for the transition table to be mutable between trigger calls.
There's no good way around this without a major redesign of the FK logic,
so for now, resolve it by opening a new transition table each time this
happens.
Also, ensure that AFTER STATEMENT triggers fire just once per statement,
or once per transition table when we're forced to make more than one.
Previous versions of Postgres have allowed each FK enforcement query
to cause an additional firing of the AFTER STATEMENT triggers for the
referencing table, but that's certainly not per spec. (We're still
doing multiple firings of BEFORE STATEMENT triggers, though; is that
something worth changing?)
Also, forbid using transition tables with column-specific UPDATE triggers.
The spec requires such transition tables to show only the tuples for which
the UPDATE trigger would have fired, which means maintaining multiple
transition tables or else somehow filtering the contents at readout.
Maybe someday we'll bother to support that option, but it looks like a
lot of trouble for a marginal feature.
The transition tables are now managed by the AfterTriggers data structures,
rather than being directly the responsibility of ModifyTable nodes. This
removes a subtransaction-lifespan memory leak introduced by my previous
band-aid patch 3c4359521.
In passing, refactor the AfterTriggers data structures to reduce the
management overhead for them, by using arrays of structs rather than
several parallel arrays for per-query-level and per-subtransaction state.
I failed to resist the temptation to do some copy-editing on the SGML
docs about triggers, above and beyond merely documenting the effects
of this patch.
Back-patch to v10, because we don't want the semantics of transition
tables to change post-release.
Patch by me, with help and review from Thomas Munro.
Discussion: https://postgr.es/m/20170909064853.25630.12825@wrigleys.postgresql.org
2017-09-16 19:20:32 +02:00
|
|
|
per_tuple_context, slot1, slot2);
|
2001-11-16 17:31:16 +01:00
|
|
|
|
2008-10-25 01:42:35 +02:00
|
|
|
/*
|
|
|
|
* Mark the event as done.
|
|
|
|
*/
|
|
|
|
event->ate_flags &= ~AFTER_TRIGGER_IN_PROGRESS;
|
|
|
|
event->ate_flags |= AFTER_TRIGGER_DONE;
|
|
|
|
}
|
|
|
|
else if (!(event->ate_flags & AFTER_TRIGGER_DONE))
|
|
|
|
{
|
|
|
|
/* something remains to be done */
|
|
|
|
all_fired = all_fired_in_chunk = false;
|
|
|
|
}
|
1999-09-29 18:06:40 +02:00
|
|
|
}
|
|
|
|
|
2008-10-25 01:42:35 +02:00
|
|
|
/* Clear the chunk if delete_ok and nothing left of interest */
|
|
|
|
if (delete_ok && all_fired_in_chunk)
|
2001-11-16 17:31:16 +01:00
|
|
|
{
|
2008-10-25 01:42:35 +02:00
|
|
|
chunk->freeptr = CHUNK_DATA_START(chunk);
|
|
|
|
chunk->endfree = chunk->endptr;
|
2010-08-19 17:46:18 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If it's last chunk, must sync event list's tailfree too. Note
|
|
|
|
* that delete_ok must NOT be passed as true if there could be
|
Fix possible dangling pointer dereference in trigger.c.
AfterTriggerEndQuery correctly notes that the query_stack could get
repalloc'd during a trigger firing, but it nonetheless passes the address
of a query_stack entry to afterTriggerInvokeEvents, so that if such a
repalloc occurs, afterTriggerInvokeEvents is already working with an
obsolete dangling pointer while it scans the rest of the events. Oops.
The only code at risk is its "delete_ok" cleanup code, so we can
prevent unsafe behavior by passing delete_ok = false instead of true.
However, that could have a significant performance penalty, because the
point of passing delete_ok = true is to not have to re-scan possibly
a large number of dead trigger events on the next time through the loop.
There's more than one way to skin that cat, though. What we can do is
delete all the "chunks" in the event list except the last one, since
we know all events in them must be dead. Deleting the chunks is work
we'd have had to do later in AfterTriggerEndQuery anyway, and it ends
up saving rescanning of just about the same events we'd have gotten
rid of with delete_ok = true.
In v10 and HEAD, we also have to be careful to mop up any per-table
after_trig_events pointers that would become dangling. This is slightly
annoying, but I don't think that normal use-cases will traverse this code
path often enough for it to be a performance problem.
It's pretty hard to hit this in practice because of the unlikelihood
of the query_stack getting resized at just the wrong time. Nonetheless,
it's definitely a live bug of ancient standing, so back-patch to all
supported branches.
Discussion: https://postgr.es/m/2891.1505419542@sss.pgh.pa.us
2017-09-17 20:50:01 +02:00
|
|
|
* additional AfterTriggerEventList values pointing at this event
|
2010-08-19 17:46:18 +02:00
|
|
|
* list, since we'd fail to fix their copies of tailfree.
|
|
|
|
*/
|
|
|
|
if (chunk == events->tail)
|
|
|
|
events->tailfree = chunk->freeptr;
|
2001-11-16 17:31:16 +01:00
|
|
|
}
|
1999-09-29 18:06:40 +02:00
|
|
|
}
|
2014-03-23 07:16:34 +01:00
|
|
|
if (slot1 != NULL)
|
|
|
|
{
|
|
|
|
ExecDropSingleTupleTableSlot(slot1);
|
|
|
|
ExecDropSingleTupleTableSlot(slot2);
|
|
|
|
}
|
2001-01-22 01:50:07 +01:00
|
|
|
|
2001-11-16 17:31:16 +01:00
|
|
|
/* Release working resources */
|
2007-08-15 23:39:50 +02:00
|
|
|
MemoryContextDelete(per_tuple_context);
|
|
|
|
|
|
|
|
if (local_estate)
|
2005-03-25 22:58:00 +01:00
|
|
|
{
|
Create ResultRelInfos later in InitPlan, index them by RT index.
Instead of allocating all the ResultRelInfos upfront in one big array,
allocate them in ExecInitModifyTable(). es_result_relations is now an
array of ResultRelInfo pointers, rather than an array of structs, and it
is indexed by the RT index.
This simplifies things: we get rid of the separate concept of a "result
rel index", and don't need to set it in setrefs.c anymore. This also
allows follow-up optimizations (not included in this commit yet) to skip
initializing ResultRelInfos for target relations that were not needed at
runtime, and removal of the es_result_relation_info pointer.
The EState arrays of regular result rels and root result rels are merged
into one array. Similarly, the resultRelations and rootResultRelations
lists in PlannedStmt are merged into one. It's not actually clear to me
why they were kept separate in the first place, but now that the
es_result_relations array is indexed by RT index, it certainly seems
pointless.
The PlannedStmt->resultRelations list is now only needed for
ExecRelationIsTargetRelation(). One visible effect of this change is that
ExecRelationIsTargetRelation() will now return 'true' also for the
partition root, if a partitioned table is updated. That seems like a good
thing, although the function isn't used in core code, and I don't see any
reason for an FDW to call it on a partition root.
Author: Amit Langote
Discussion: https://www.postgresql.org/message-id/CA%2BHiwqGEmiib8FLiHMhKB%2BCH5dRgHSLc5N5wnvc4kym%2BZYpQEQ%40mail.gmail.com
2020-10-13 11:57:02 +02:00
|
|
|
ExecCloseResultRelations(estate);
|
2019-02-27 05:30:28 +01:00
|
|
|
ExecResetTupleTable(estate->es_tupleTable, false);
|
2007-08-15 23:39:50 +02:00
|
|
|
FreeExecutorState(estate);
|
2005-03-25 22:58:00 +01:00
|
|
|
}
|
2008-10-25 01:42:35 +02:00
|
|
|
|
|
|
|
return all_fired;
|
1999-09-29 18:06:40 +02:00
|
|
|
}
|
|
|
|
|
2004-09-10 20:40:09 +02:00
|
|
|
|
2017-09-16 19:20:32 +02:00
|
|
|
/*
|
|
|
|
* GetAfterTriggersTableData
|
|
|
|
*
|
|
|
|
* Find or create an AfterTriggersTableData struct for the specified
|
|
|
|
* trigger event (relation + operation type). Ignore existing structs
|
|
|
|
* marked "closed"; we don't want to put any additional tuples into them,
|
|
|
|
* nor change their stmt-triggers-fired state.
|
|
|
|
*
|
|
|
|
* Note: the AfterTriggersTableData list is allocated in the current
|
|
|
|
* (sub)transaction's CurTransactionContext. This is OK because
|
|
|
|
* we don't need it to live past AfterTriggerEndQuery.
|
|
|
|
*/
|
|
|
|
static AfterTriggersTableData *
|
|
|
|
GetAfterTriggersTableData(Oid relid, CmdType cmdType)
|
|
|
|
{
|
|
|
|
AfterTriggersTableData *table;
|
|
|
|
AfterTriggersQueryData *qs;
|
|
|
|
MemoryContext oldcxt;
|
|
|
|
ListCell *lc;
|
|
|
|
|
|
|
|
/* Caller should have ensured query_depth is OK. */
|
|
|
|
Assert(afterTriggers.query_depth >= 0 &&
|
|
|
|
afterTriggers.query_depth < afterTriggers.maxquerydepth);
|
|
|
|
qs = &afterTriggers.query_stack[afterTriggers.query_depth];
|
|
|
|
|
|
|
|
foreach(lc, qs->tables)
|
|
|
|
{
|
|
|
|
table = (AfterTriggersTableData *) lfirst(lc);
|
|
|
|
if (table->relid == relid && table->cmdType == cmdType &&
|
|
|
|
!table->closed)
|
|
|
|
return table;
|
|
|
|
}
|
|
|
|
|
|
|
|
oldcxt = MemoryContextSwitchTo(CurTransactionContext);
|
|
|
|
|
|
|
|
table = (AfterTriggersTableData *) palloc0(sizeof(AfterTriggersTableData));
|
|
|
|
table->relid = relid;
|
|
|
|
table->cmdType = cmdType;
|
|
|
|
qs->tables = lappend(qs->tables, table);
|
|
|
|
|
|
|
|
MemoryContextSwitchTo(oldcxt);
|
|
|
|
|
|
|
|
return table;
|
|
|
|
}
|
|
|
|
|
2021-02-27 22:09:15 +01:00
|
|
|
/*
|
|
|
|
* Returns a TupleTableSlot suitable for holding the tuples to be put
|
|
|
|
* into AfterTriggersTableData's transition table tuplestores.
|
|
|
|
*/
|
|
|
|
static TupleTableSlot *
|
|
|
|
GetAfterTriggersStoreSlot(AfterTriggersTableData *table,
|
|
|
|
TupleDesc tupdesc)
|
|
|
|
{
|
|
|
|
/* Create it if not already done. */
|
|
|
|
if (!table->storeslot)
|
|
|
|
{
|
|
|
|
MemoryContext oldcxt;
|
|
|
|
|
|
|
|
/*
|
Fix tupdesc lifespan bug with AfterTriggersTableData.storeslot.
Commit 25936fd46 adjusted things so that the "storeslot" we use
for remapping trigger tuples would have adequate lifespan, but it
neglected to consider the lifespan of the tuple descriptor that
the slot depends on. It turns out that in at least some cases, the
tupdesc we are passing is a refcounted tupdesc, and the refcount for
the slot's reference can get assigned to a resource owner having
different lifespan than the slot does. That leads to an error like
"tupdesc reference 0x7fdef236a1b8 is not owned by resource owner
SubTransaction". Worse, because of a second oversight in the same
commit, we'd try to free the same tupdesc refcount again while
cleaning up after that error, leading to recursive errors and an
"ERRORDATA_STACK_SIZE exceeded" PANIC.
To fix the initial problem, let's just make a non-refcounted copy
of the tupdesc we're supposed to use. That seems likely to guard
against additional problems, since there's no strong reason for
this code to assume that what it's given is a refcounted tupdesc;
in which case there's an independent hazard of the tupdesc having
shorter lifespan than the slot does. (I didn't bother trying to
free said copy, since it should go away anyway when the (sub)
transaction context is cleaned up.)
The other issue can be fixed by making the code added to
AfterTriggerFreeQuery work like the rest of that function, ie be
sure that it doesn't try to free the same slot twice in the event
of recursive error cleanup.
While here, also clean up minor stylistic issues in the test case
added by 25936fd46: don't use "create or replace function", as any
name collision within the tests is likely to have ill effects
that that construct won't mask; and don't use function names as generic as
trigger_function1, especially if you're not going to drop them
at the end of the test stanza.
Per bug #17607 from Thomas Mc Kay. Back-patch to v12, as the
previous fix was.
Discussion: https://postgr.es/m/17607-bd8ccc81226f7f80@postgresql.org
2022-09-25 23:10:58 +02:00
|
|
|
* We need this slot only until AfterTriggerEndQuery, but making it
|
|
|
|
* last till end-of-subxact is good enough. It'll be freed by
|
|
|
|
* AfterTriggerFreeQuery(). However, the passed-in tupdesc might have
|
|
|
|
* a different lifespan, so we'd better make a copy of that.
|
2021-02-27 22:09:15 +01:00
|
|
|
*/
|
|
|
|
oldcxt = MemoryContextSwitchTo(CurTransactionContext);
|
2022-09-25 23:10:58 +02:00
|
|
|
tupdesc = CreateTupleDescCopy(tupdesc);
|
2021-02-27 22:09:15 +01:00
|
|
|
table->storeslot = MakeSingleTupleTableSlot(tupdesc, &TTSOpsVirtual);
|
|
|
|
MemoryContextSwitchTo(oldcxt);
|
|
|
|
}
|
|
|
|
|
|
|
|
return table->storeslot;
|
|
|
|
}
|
2017-09-16 19:20:32 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* MakeTransitionCaptureState
|
|
|
|
*
|
|
|
|
* Make a TransitionCaptureState object for the given TriggerDesc, target
|
|
|
|
* relation, and operation type. The TCS object holds all the state needed
|
|
|
|
* to decide whether to capture tuples in transition tables.
|
|
|
|
*
|
|
|
|
* If there are no triggers in 'trigdesc' that request relevant transition
|
|
|
|
* tables, then return NULL.
|
|
|
|
*
|
2020-10-19 13:11:54 +02:00
|
|
|
* The resulting object can be passed to the ExecAR* functions. When
|
|
|
|
* dealing with child tables, the caller can set tcs_original_insert_tuple
|
|
|
|
* to avoid having to reconstruct the original tuple in the root table's
|
|
|
|
* format.
|
2017-09-16 19:20:32 +02:00
 *
 * Note that we copy the flags from a parent table into this struct (rather
 * than subsequently using the relation's TriggerDesc directly) so that we can
 * use it to control collection of transition tuples from child tables.
 *
 * Per SQL spec, all operations of the same kind (INSERT/UPDATE/DELETE)
 * on the same table during one query should share one transition table.
 * Therefore, the Tuplestores are owned by an AfterTriggersTableData struct
 * looked up using the table OID + CmdType, and are merely referenced by
 * the TransitionCaptureState objects we hand out to callers.
 */
TransitionCaptureState *
MakeTransitionCaptureState(TriggerDesc *trigdesc, Oid relid, CmdType cmdType)
{
	TransitionCaptureState *state;
	bool		need_old_upd,
				need_new_upd,
				need_old_del,
				need_new_ins;
	AfterTriggersTableData *table;
	MemoryContext oldcxt;
	ResourceOwner saveResourceOwner;

	if (trigdesc == NULL)
		return NULL;

	/* Detect which table(s) we need. */
	switch (cmdType)
	{
		case CMD_INSERT:
			need_old_upd = need_old_del = need_new_upd = false;
			need_new_ins = trigdesc->trig_insert_new_table;
			break;
		case CMD_UPDATE:
			need_old_upd = trigdesc->trig_update_old_table;
			need_new_upd = trigdesc->trig_update_new_table;
			need_old_del = need_new_ins = false;
			break;
		case CMD_DELETE:
			need_old_del = trigdesc->trig_delete_old_table;
			need_old_upd = need_new_upd = need_new_ins = false;
			break;
		case CMD_MERGE:
			need_old_upd = trigdesc->trig_update_old_table;
			need_new_upd = trigdesc->trig_update_new_table;
			need_old_del = trigdesc->trig_delete_old_table;
			need_new_ins = trigdesc->trig_insert_new_table;
			break;
		default:
			elog(ERROR, "unexpected CmdType: %d", (int) cmdType);
			/* keep compiler quiet */
			need_old_upd = need_new_upd = need_old_del = need_new_ins = false;
			break;
	}

	if (!need_old_upd && !need_new_upd && !need_new_ins && !need_old_del)
		return NULL;

	/* Check state, like AfterTriggerSaveEvent. */
	if (afterTriggers.query_depth < 0)
		elog(ERROR, "MakeTransitionCaptureState() called outside of query");

	/* Be sure we have enough space to record events at this query depth. */
	if (afterTriggers.query_depth >= afterTriggers.maxquerydepth)
		AfterTriggerEnlargeQueryState();

	/*
	 * Find or create an AfterTriggersTableData struct to hold the
	 * tuplestore(s).  If there's a matching struct but it's marked closed,
	 * ignore it; we need a newer one.
	 *
	 * Note: the AfterTriggersTableData list, as well as the tuplestores, are
	 * allocated in the current (sub)transaction's CurTransactionContext, and
	 * the tuplestores are managed by the (sub)transaction's resource owner.
	 * This is sufficient lifespan because we do not allow triggers using
	 * transition tables to be deferrable; they will be fired during
	 * AfterTriggerEndQuery, after which it's okay to delete the data.
	 */
	table = GetAfterTriggersTableData(relid, cmdType);

	/* Now create required tuplestore(s), if we don't have them already. */
	oldcxt = MemoryContextSwitchTo(CurTransactionContext);
	saveResourceOwner = CurrentResourceOwner;
	CurrentResourceOwner = CurTransactionResourceOwner;

	if (need_old_upd && table->old_upd_tuplestore == NULL)
		table->old_upd_tuplestore = tuplestore_begin_heap(false, false, work_mem);
	if (need_new_upd && table->new_upd_tuplestore == NULL)
		table->new_upd_tuplestore = tuplestore_begin_heap(false, false, work_mem);
	if (need_old_del && table->old_del_tuplestore == NULL)
		table->old_del_tuplestore = tuplestore_begin_heap(false, false, work_mem);
	if (need_new_ins && table->new_ins_tuplestore == NULL)
		table->new_ins_tuplestore = tuplestore_begin_heap(false, false, work_mem);
	CurrentResourceOwner = saveResourceOwner;
	MemoryContextSwitchTo(oldcxt);

	/* Now build the TransitionCaptureState struct, in caller's context */
	state = (TransitionCaptureState *) palloc0(sizeof(TransitionCaptureState));
	state->tcs_delete_old_table = trigdesc->trig_delete_old_table;
	state->tcs_update_old_table = trigdesc->trig_update_old_table;
	state->tcs_update_new_table = trigdesc->trig_update_new_table;
	state->tcs_insert_new_table = trigdesc->trig_insert_new_table;
	state->tcs_private = table;

	return state;
}


/* ----------
 * AfterTriggerBeginXact()
 *
 *	Called at transaction start (either BEGIN or implicit for single
 *	statement outside of transaction block).
 * ----------
 */
void
AfterTriggerBeginXact(void)
{
	/*
	 * Initialize after-trigger state structure to empty
	 */
	afterTriggers.firing_counter = (CommandId) 1;	/* mustn't be 0 */
	afterTriggers.query_depth = -1;

	/*
	 * Verify that there is no leftover state remaining.  If these assertions
	 * trip, it means that AfterTriggerEndXact wasn't called or didn't clean
	 * up properly.
	 */
	Assert(afterTriggers.state == NULL);
	Assert(afterTriggers.query_stack == NULL);
	Assert(afterTriggers.maxquerydepth == 0);
	Assert(afterTriggers.event_cxt == NULL);
	Assert(afterTriggers.events.head == NULL);
	Assert(afterTriggers.trans_stack == NULL);
	Assert(afterTriggers.maxtransdepth == 0);
}


/* ----------
 * AfterTriggerBeginQuery()
 *
 *	Called just before we start processing a single query within a
 *	transaction (or subtransaction).  Most of the real work gets deferred
 *	until somebody actually tries to queue a trigger event.
 * ----------
 */
void
AfterTriggerBeginQuery(void)
{
	/* Increase the query stack depth */
	afterTriggers.query_depth++;
}


/* ----------
 * AfterTriggerEndQuery()
 *
 *	Called after one query has been completely processed.  At this time
 *	we invoke all AFTER IMMEDIATE trigger events queued by the query, and
 *	transfer deferred trigger events to the global deferred-trigger list.
 *
 *	Note that this must be called BEFORE closing down the executor
 *	with ExecutorEnd, because we make use of the EState's info about
 *	target relations.  Normally it is called from ExecutorFinish.
 * ----------
 */
void
AfterTriggerEndQuery(EState *estate)
{
Fix possible dangling pointer dereference in trigger.c.
AfterTriggerEndQuery correctly notes that the query_stack could get
repalloc'd during a trigger firing, but it nonetheless passes the address
of a query_stack entry to afterTriggerInvokeEvents, so that if such a
repalloc occurs, afterTriggerInvokeEvents is already working with an
obsolete dangling pointer while it scans the rest of the events. Oops.
The only code at risk is its "delete_ok" cleanup code, so we can
prevent unsafe behavior by passing delete_ok = false instead of true.
However, that could have a significant performance penalty, because the
point of passing delete_ok = true is to not have to re-scan possibly
a large number of dead trigger events on the next time through the loop.
There's more than one way to skin that cat, though. What we can do is
delete all the "chunks" in the event list except the last one, since
we know all events in them must be dead. Deleting the chunks is work
we'd have had to do later in AfterTriggerEndQuery anyway, and it ends
up saving rescanning of just about the same events we'd have gotten
rid of with delete_ok = true.
In v10 and HEAD, we also have to be careful to mop up any per-table
after_trig_events pointers that would become dangling. This is slightly
annoying, but I don't think that normal use-cases will traverse this code
path often enough for it to be a performance problem.
It's pretty hard to hit this in practice because of the unlikelihood
of the query_stack getting resized at just the wrong time. Nonetheless,
it's definitely a live bug of ancient standing, so back-patch to all
supported branches.
Discussion: https://postgr.es/m/2891.1505419542@sss.pgh.pa.us
2017-09-17 20:50:01 +02:00
|
|
|
AfterTriggersQueryData *qs;
|
|
|
|
|
2004-09-10 20:40:09 +02:00
|
|
|
/* Must be inside a query, too */
|
2014-10-23 18:33:02 +02:00
|
|
|
Assert(afterTriggers.query_depth >= 0);

	/*
	 * If we never even got as far as initializing the event stack, there
	 * certainly won't be any events, so exit quickly.
	 */
	if (afterTriggers.query_depth >= afterTriggers.maxquerydepth)
	{
		afterTriggers.query_depth--;
		return;
	}

	/*
	 * Process all immediate-mode triggers queued by the query, and move the
	 * deferred ones to the main list of deferred events.
	 *
	 * Notice that we decide which ones will be fired, and put the deferred
	 * ones on the main list, before anything is actually fired.  This ensures
	 * reasonably sane behavior if a trigger function does SET CONSTRAINTS ...
	 * IMMEDIATE: all events we have decided to defer will be available for it
	 * to fire.
	 *
	 * We loop in case a trigger queues more events at the same query level.
	 * Ordinary trigger functions, including all PL/pgSQL trigger functions,
	 * will instead fire any triggers in a dedicated query level.  Foreign key
	 * enforcement triggers do add to the current query level, thanks to their
	 * passing fire_triggers = false to SPI_execute_snapshot().  Other
	 * C-language triggers might do likewise.
	 *
	 * If we find no firable events, we don't have to increment
	 * firing_counter.
	 */
	qs = &afterTriggers.query_stack[afterTriggers.query_depth];

	for (;;)
	{
		if (afterTriggerMarkEvents(&qs->events, &afterTriggers.events, true))
		{
			CommandId	firing_id = afterTriggers.firing_counter++;
			AfterTriggerEventChunk *oldtail = qs->events.tail;

			if (afterTriggerInvokeEvents(&qs->events, firing_id, estate, false))
				break;			/* all fired */

			/*
			 * Firing a trigger could result in query_stack being repalloc'd,
			 * so we must recalculate qs after each afterTriggerInvokeEvents
			 * call.  Furthermore, it's unsafe to pass delete_ok = true here,
			 * because that could cause afterTriggerInvokeEvents to try to
			 * access qs->events after the stack has been repalloc'd.
			 */
			qs = &afterTriggers.query_stack[afterTriggers.query_depth];

			/*
			 * We'll need to scan the events list again.  To reduce the cost
			 * of doing so, get rid of completely-fired chunks.  We know that
			 * all events were marked IN_PROGRESS or DONE at the conclusion of
			 * afterTriggerMarkEvents, so any still-interesting events must
			 * have been added after that, and so must be in the chunk that
			 * was then the tail chunk, or in later chunks.  So, zap all
			 * chunks before oldtail.  This is approximately the same set of
			 * events we would have gotten rid of by passing delete_ok = true.
			 */
			Assert(oldtail != NULL);
			while (qs->events.head != oldtail)
				afterTriggerDeleteHeadEventChunk(qs);
		}
		else
			break;
	}

	/* Release query-level-local storage, including tuplestores if any */
	AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);

	afterTriggers.query_depth--;
}


/*
 * AfterTriggerFreeQuery
 *	Release subsidiary storage for a trigger query level.
 *	This includes closing down tuplestores.
 *	Note: it's important for this to be safe if interrupted by an error
 *	and then called again for the same query level.
 */
static void
AfterTriggerFreeQuery(AfterTriggersQueryData *qs)
{
	Tuplestorestate *ts;
	List	   *tables;
	ListCell   *lc;

	/* Drop the trigger events */
	afterTriggerFreeEventList(&qs->events);

	/* Drop FDW tuplestore if any */
	ts = qs->fdw_tuplestore;
	qs->fdw_tuplestore = NULL;
	if (ts)
		tuplestore_end(ts);

	/* Release per-table subsidiary storage */
	tables = qs->tables;
	foreach(lc, tables)
	{
		AfterTriggersTableData *table = (AfterTriggersTableData *) lfirst(lc);

		ts = table->old_upd_tuplestore;
		table->old_upd_tuplestore = NULL;
		if (ts)
			tuplestore_end(ts);
		ts = table->new_upd_tuplestore;
		table->new_upd_tuplestore = NULL;
		if (ts)
			tuplestore_end(ts);
		ts = table->old_del_tuplestore;
		table->old_del_tuplestore = NULL;
		if (ts)
			tuplestore_end(ts);
		ts = table->new_ins_tuplestore;
		table->new_ins_tuplestore = NULL;
		if (ts)
			tuplestore_end(ts);
		if (table->storeslot)
		{
			TupleTableSlot *slot = table->storeslot;

			table->storeslot = NULL;
			ExecDropSingleTupleTableSlot(slot);
		}
|
2014-03-23 07:16:34 +01:00
|
|
|
}
	/*
	 * Now free the AfterTriggersTableData structs and list cells.  Reset list
	 * pointer first; if list_free_deep somehow gets an error, better to leak
	 * that storage than have an infinite loop.
	 */
	qs->tables = NIL;
	list_free_deep(tables);
}

Fix SQL-spec incompatibilities in new transition table feature.
The standard says that all changes of the same kind (insert, update, or
delete) caused in one table by a single SQL statement should be reported
in a single transition table; and by that, they mean to include foreign key
enforcement actions cascading from the statement's direct effects.  It's
also reasonable to conclude that if the standard had wCTEs, they would say
that effects of wCTEs applying to the same table as each other or the outer
statement should be merged into one transition table.  We weren't doing it
like that.
Hence, arrange to merge tuples from multiple update actions into a single
transition table as much as we can.  There is a problem, which is that if
the firing of FK enforcement triggers and after-row triggers with
transition tables is interspersed, we might need to report more tuples
after some triggers have already seen the transition table.  It seems like
a bad idea for the transition table to be mutable between trigger calls.
There's no good way around this without a major redesign of the FK logic,
so for now, resolve it by opening a new transition table each time this
happens.
Also, ensure that AFTER STATEMENT triggers fire just once per statement,
or once per transition table when we're forced to make more than one.
Previous versions of Postgres have allowed each FK enforcement query
to cause an additional firing of the AFTER STATEMENT triggers for the
referencing table, but that's certainly not per spec.  (We're still
doing multiple firings of BEFORE STATEMENT triggers, though; is that
something worth changing?)
Also, forbid using transition tables with column-specific UPDATE triggers.
The spec requires such transition tables to show only the tuples for which
the UPDATE trigger would have fired, which means maintaining multiple
transition tables or else somehow filtering the contents at readout.
Maybe someday we'll bother to support that option, but it looks like a
lot of trouble for a marginal feature.
The transition tables are now managed by the AfterTriggers data structures,
rather than being directly the responsibility of ModifyTable nodes.  This
removes a subtransaction-lifespan memory leak introduced by my previous
band-aid patch 3c4359521.
In passing, refactor the AfterTriggers data structures to reduce the
management overhead for them, by using arrays of structs rather than
several parallel arrays for per-query-level and per-subtransaction state.
I failed to resist the temptation to do some copy-editing on the SGML
docs about triggers, above and beyond merely documenting the effects
of this patch.
Back-patch to v10, because we don't want the semantics of transition
tables to change post-release.
Patch by me, with help and review from Thomas Munro.
Discussion: https://postgr.es/m/20170909064853.25630.12825@wrigleys.postgresql.org
2017-09-16 19:20:32 +02:00

/* ----------
 * AfterTriggerFireDeferred()
 *
 *	Called just before the current transaction is committed.  At this
 *	time we invoke all pending DEFERRED triggers.
 *
 *	It is possible for other modules to queue additional deferred triggers
 *	during pre-commit processing; therefore xact.c may have to call this
 *	multiple times.
 * ----------
 */
void
AfterTriggerFireDeferred(void)
{
	AfterTriggerEventList *events;
	bool		snap_pushed = false;

	/* Must not be inside a query */
	Assert(afterTriggers.query_depth == -1);

	/*
	 * If there are any triggers to fire, make sure we have set a snapshot for
	 * them to use.  (Since PortalRunUtility doesn't set a snap for COMMIT, we
	 * can't assume ActiveSnapshot is valid on entry.)
	 */
	events = &afterTriggers.events;
	if (events->head != NULL)
	{
		PushActiveSnapshot(GetTransactionSnapshot());
		snap_pushed = true;
	}

	/*
	 * Run all the remaining triggers.  Loop until they are all gone, in case
	 * some trigger queues more for us to do.
	 */
	while (afterTriggerMarkEvents(events, NULL, false))
	{
		CommandId	firing_id = afterTriggers.firing_counter++;

		if (afterTriggerInvokeEvents(events, firing_id, NULL, true))
			break;				/* all fired */
	}

	/*
	 * We don't bother freeing the event list, since it will go away anyway
	 * (and more efficiently than via pfree) in AfterTriggerEndXact.
	 */

	if (snap_pushed)
		PopActiveSnapshot();
}


/* ----------
 * AfterTriggerEndXact()
 *
 *	The current transaction is finishing.
 *
 *	Any unfired triggers are canceled so we simply throw
 *	away anything we know.
 *
 *	Note: it is possible for this to be called repeatedly in case of
 *	error during transaction abort; therefore, do not complain if
 *	already closed down.
 * ----------
 */
void
AfterTriggerEndXact(bool isCommit)
{
	/*
	 * Forget the pending-events list.
	 *
	 * Since all the info is in TopTransactionContext or children thereof, we
	 * don't really need to do anything to reclaim memory.  However, the
	 * pending-events list could be large, and so it's useful to discard it as
	 * soon as possible --- especially if we are aborting because we ran out
	 * of memory for the list!
	 */
	if (afterTriggers.event_cxt)
	{
		MemoryContextDelete(afterTriggers.event_cxt);
		afterTriggers.event_cxt = NULL;
		afterTriggers.events.head = NULL;
		afterTriggers.events.tail = NULL;
		afterTriggers.events.tailfree = NULL;
	}

	/*
	 * Forget any subtransaction state as well.  Since this can't be very
	 * large, we let the eventual reset of TopTransactionContext free the
	 * memory instead of doing it here.
	 */
	afterTriggers.trans_stack = NULL;
	afterTriggers.maxtransdepth = 0;

	/*
	 * Forget the query stack and constraint-related state information.  As
	 * with the subtransaction state information, we don't bother freeing the
	 * memory here.
	 */
	afterTriggers.query_stack = NULL;
	afterTriggers.maxquerydepth = 0;
	afterTriggers.state = NULL;

	/* No more afterTriggers manipulation until next transaction starts. */
	afterTriggers.query_depth = -1;
}


/*
 * AfterTriggerBeginSubXact()
 *
 *	Start a subtransaction.
 */
void
AfterTriggerBeginSubXact(void)
{
	int			my_level = GetCurrentTransactionNestLevel();

	/*
	 * Allocate more space in the trans_stack if needed.  (Note: because the
	 * minimum nest level of a subtransaction is 2, we waste the first couple
	 * entries of the array; not worth the notational effort to avoid it.)
	 */
	while (my_level >= afterTriggers.maxtransdepth)
	{
		if (afterTriggers.maxtransdepth == 0)
		{
			/* Arbitrarily initialize for max of 8 subtransaction levels */
			afterTriggers.trans_stack = (AfterTriggersTransData *)
				MemoryContextAlloc(TopTransactionContext,
								   8 * sizeof(AfterTriggersTransData));
			afterTriggers.maxtransdepth = 8;
		}
		else
		{
			/* repalloc will keep the stack in the same context */
			int			new_alloc = afterTriggers.maxtransdepth * 2;

			afterTriggers.trans_stack = (AfterTriggersTransData *)
				repalloc(afterTriggers.trans_stack,
						 new_alloc * sizeof(AfterTriggersTransData));
			afterTriggers.maxtransdepth = new_alloc;
		}
	}

	/*
	 * Push the current information into the stack.  The SET CONSTRAINTS state
	 * is not saved until/unless changed.  Likewise, we don't make a
	 * per-subtransaction event context until needed.
	 */
	afterTriggers.trans_stack[my_level].state = NULL;
	afterTriggers.trans_stack[my_level].events = afterTriggers.events;
	afterTriggers.trans_stack[my_level].query_depth = afterTriggers.query_depth;
	afterTriggers.trans_stack[my_level].firing_counter = afterTriggers.firing_counter;
}
|
|
|
|
|
|
|
|
/*
|
2004-09-10 20:40:09 +02:00
|
|
|
* AfterTriggerEndSubXact()
|
2004-07-01 02:52:04 +02:00
|
|
|
*
|
|
|
|
* The current subtransaction is ending.
|
|
|
|
*/
|
|
|
|
void
|
2004-09-10 20:40:09 +02:00
|
|
|
AfterTriggerEndSubXact(bool isCommit)
|
2004-07-01 02:52:04 +02:00
|
|
|
{
|
2004-09-07 01:33:48 +02:00
|
|
|
int my_level = GetCurrentTransactionNestLevel();
|
2004-09-10 20:40:09 +02:00
|
|
|
SetConstraintState state;
|
|
|
|
AfterTriggerEvent event;
|
2008-10-25 01:42:35 +02:00
|
|
|
AfterTriggerEventChunk *chunk;
|
2004-09-10 20:40:09 +02:00
|
|
|
CommandId subxact_firing_id;
|
2004-07-01 02:52:04 +02:00
|
|
|
|
|
|
|
/*
|
2004-09-07 01:33:48 +02:00
|
|
|
* Pop the prior state if needed.
|
2004-07-01 02:52:04 +02:00
|
|
|
*/
|
|
|
|
if (isCommit)
|
|
|
|
{
|
2014-10-23 18:33:02 +02:00
|
|
|
Assert(my_level < afterTriggers.maxtransdepth);
|
2004-07-01 02:52:04 +02:00
|
|
|
/* If we saved a prior state, we don't need it anymore */
|
		state = afterTriggers.trans_stack[my_level].state;
		if (state != NULL)
			pfree(state);
		/* this avoids double pfree if error later: */
		afterTriggers.trans_stack[my_level].state = NULL;
		Assert(afterTriggers.query_depth ==
			   afterTriggers.trans_stack[my_level].query_depth);
	}
	else
	{
		/*
		 * Aborting.  It is possible subxact start failed before calling
		 * AfterTriggerBeginSubXact, in which case we mustn't risk touching
		 * trans_stack levels that aren't there.
		 */
		if (my_level >= afterTriggers.maxtransdepth)
			return;

		/*
		 * Release query-level storage for queries being aborted, and restore
		 * query_depth to its pre-subxact value.  This assumes that a
		 * subtransaction will not add events to query levels started in an
		 * earlier transaction state.
		 */
		while (afterTriggers.query_depth > afterTriggers.trans_stack[my_level].query_depth)
		{
			if (afterTriggers.query_depth < afterTriggers.maxquerydepth)
				AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);
			afterTriggers.query_depth--;
		}
		Assert(afterTriggers.query_depth ==
			   afterTriggers.trans_stack[my_level].query_depth);

		/*
		 * Restore the global deferred-event list to its former length,
		 * discarding any events queued by the subxact.
		 */
		afterTriggerRestoreEventList(&afterTriggers.events,
									 &afterTriggers.trans_stack[my_level].events);

		/*
		 * Restore the trigger state.  If the saved state is NULL, then this
		 * subxact didn't save it, so it doesn't need restoring.
		 */
		state = afterTriggers.trans_stack[my_level].state;
		if (state != NULL)
		{
			pfree(afterTriggers.state);
			afterTriggers.state = state;
		}
		/* this avoids double pfree if error later: */
		afterTriggers.trans_stack[my_level].state = NULL;

		/*
		 * Scan for any remaining deferred events that were marked DONE or IN
		 * PROGRESS by this subxact or a child, and un-mark them.  We can
		 * recognize such events because they have a firing ID greater than or
		 * equal to the firing_counter value we saved at subtransaction start.
		 * (This essentially assumes that the current subxact includes all
		 * subxacts started after it.)
		 */
		subxact_firing_id = afterTriggers.trans_stack[my_level].firing_counter;
		for_each_event_chunk(event, chunk, afterTriggers.events)
		{
			AfterTriggerShared evtshared = GetTriggerSharedData(event);

			if (event->ate_flags &
				(AFTER_TRIGGER_DONE | AFTER_TRIGGER_IN_PROGRESS))
			{
				if (evtshared->ats_firing_id >= subxact_firing_id)
					event->ate_flags &=
						~(AFTER_TRIGGER_DONE | AFTER_TRIGGER_IN_PROGRESS);
			}
		}
	}
}
/*
 * Get the transition table for the given event, depending on whether we are
 * processing the old or the new tuple.
 */
static Tuplestorestate *
GetAfterTriggersTransitionTable(int event,
								TupleTableSlot *oldslot,
								TupleTableSlot *newslot,
								TransitionCaptureState *transition_capture)
{
	Tuplestorestate *tuplestore = NULL;
	bool		delete_old_table = transition_capture->tcs_delete_old_table;
	bool		update_old_table = transition_capture->tcs_update_old_table;
	bool		update_new_table = transition_capture->tcs_update_new_table;
	bool		insert_new_table = transition_capture->tcs_insert_new_table;

	/*
	 * For INSERT events NEW should be non-NULL, for DELETE events OLD should
	 * be non-NULL, whereas for UPDATE events normally both OLD and NEW are
	 * non-NULL.  But for UPDATE events fired for capturing transition tuples
	 * during UPDATE partition-key row movement, OLD is NULL when the event is
	 * for a row being inserted, whereas NEW is NULL when the event is for a
	 * row being deleted.
	 */
	Assert(!(event == TRIGGER_EVENT_DELETE && delete_old_table &&
			 TupIsNull(oldslot)));
	Assert(!(event == TRIGGER_EVENT_INSERT && insert_new_table &&
			 TupIsNull(newslot)));

	if (!TupIsNull(oldslot))
	{
		Assert(TupIsNull(newslot));
		if (event == TRIGGER_EVENT_DELETE && delete_old_table)
			tuplestore = transition_capture->tcs_private->old_del_tuplestore;
		else if (event == TRIGGER_EVENT_UPDATE && update_old_table)
			tuplestore = transition_capture->tcs_private->old_upd_tuplestore;
	}
	else if (!TupIsNull(newslot))
	{
		Assert(TupIsNull(oldslot));
		if (event == TRIGGER_EVENT_INSERT && insert_new_table)
			tuplestore = transition_capture->tcs_private->new_ins_tuplestore;
		else if (event == TRIGGER_EVENT_UPDATE && update_new_table)
			tuplestore = transition_capture->tcs_private->new_upd_tuplestore;
	}

	return tuplestore;
}

/*
 * Add the given heap tuple to the given tuplestore, applying the conversion
 * map if necessary.
 *
 * If original_insert_tuple is given, we can add that tuple without conversion.
 */
static void
TransitionTableAddTuple(EState *estate,
						TransitionCaptureState *transition_capture,
						ResultRelInfo *relinfo,
						TupleTableSlot *slot,
						TupleTableSlot *original_insert_tuple,
						Tuplestorestate *tuplestore)
{
	TupleConversionMap *map;

	/*
	 * Nothing needs to be done if we don't have a tuplestore.
	 */
	if (tuplestore == NULL)
		return;

	if (original_insert_tuple)
		tuplestore_puttupleslot(tuplestore, original_insert_tuple);
	else if ((map = ExecGetChildToRootMap(relinfo)) != NULL)
	{
		AfterTriggersTableData *table = transition_capture->tcs_private;
		TupleTableSlot *storeslot;

		storeslot = GetAfterTriggersStoreSlot(table, map->outdesc);
		execute_attr_map_slot(map->attrMap, slot, storeslot);
		tuplestore_puttupleslot(tuplestore, storeslot);
	}
	else
		tuplestore_puttupleslot(tuplestore, slot);
}
/* ----------
 * AfterTriggerEnlargeQueryState()
 *
 *	Prepare the necessary state so that we can record AFTER trigger events
 *	queued by a query.  It is allowed to have nested queries within a
 *	(sub)transaction, so we need to have separate state for each query
 *	nesting level.
 * ----------
 */
static void
AfterTriggerEnlargeQueryState(void)
{
	int			init_depth = afterTriggers.maxquerydepth;

	Assert(afterTriggers.query_depth >= afterTriggers.maxquerydepth);

	if (afterTriggers.maxquerydepth == 0)
	{
		int			new_alloc = Max(afterTriggers.query_depth + 1, 8);

		afterTriggers.query_stack = (AfterTriggersQueryData *)
			MemoryContextAlloc(TopTransactionContext,
							   new_alloc * sizeof(AfterTriggersQueryData));
		afterTriggers.maxquerydepth = new_alloc;
	}
	else
	{
		/* repalloc will keep the stack in the same context */
		int			old_alloc = afterTriggers.maxquerydepth;
		int			new_alloc = Max(afterTriggers.query_depth + 1,
									old_alloc * 2);

		afterTriggers.query_stack = (AfterTriggersQueryData *)
			repalloc(afterTriggers.query_stack,
					 new_alloc * sizeof(AfterTriggersQueryData));
		afterTriggers.maxquerydepth = new_alloc;
	}

	/* Initialize new array entries to empty */
	while (init_depth < afterTriggers.maxquerydepth)
	{
		AfterTriggersQueryData *qs = &afterTriggers.query_stack[init_depth];

		qs->events.head = NULL;
		qs->events.tail = NULL;
		qs->events.tailfree = NULL;
		qs->fdw_tuplestore = NULL;
		qs->tables = NIL;

		++init_depth;
	}
}
2004-07-01 02:52:04 +02:00
|
|
|
/*
|
2004-09-10 20:40:09 +02:00
|
|
|
* Create an empty SetConstraintState with room for numalloc trigstates
|
2004-07-01 02:52:04 +02:00
|
|
|
*/
|
2004-09-10 20:40:09 +02:00
|
|
|
static SetConstraintState
|
|
|
|
SetConstraintStateCreate(int numalloc)
|
2004-07-01 02:52:04 +02:00
|
|
|
{
|
2004-09-10 20:40:09 +02:00
|
|
|
SetConstraintState state;
|
2004-07-01 02:52:04 +02:00
|
|
|
|
|
|
|
/* Behave sanely with numalloc == 0 */
|
|
|
|
if (numalloc <= 0)
|
|
|
|
numalloc = 1;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We assume that zeroing will correctly initialize the state values.
|
|
|
|
*/
|
2004-09-10 20:40:09 +02:00
|
|
|
state = (SetConstraintState)
|
2004-07-01 02:52:04 +02:00
|
|
|
MemoryContextAllocZero(TopTransactionContext,
|
2015-02-20 23:32:01 +01:00
|
|
|
offsetof(SetConstraintStateData, trigstates) +
|
|
|
|
numalloc * sizeof(SetConstraintTriggerData));
|
2004-07-01 02:52:04 +02:00
|
|
|
|
|
|
|
state->numalloc = numalloc;
|
|
|
|
|
|
|
|
return state;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2004-09-10 20:40:09 +02:00
|
|
|
* Copy a SetConstraintState
|
2004-07-01 02:52:04 +02:00
|
|
|
*/
|
2004-09-10 20:40:09 +02:00
|
|
|
static SetConstraintState
|
|
|
|
SetConstraintStateCopy(SetConstraintState origstate)
|
2004-07-01 02:52:04 +02:00
|
|
|
{
|
2004-09-10 20:40:09 +02:00
|
|
|
SetConstraintState state;
|
2004-07-01 02:52:04 +02:00
|
|
|
|
2004-09-10 20:40:09 +02:00
|
|
|
state = SetConstraintStateCreate(origstate->numstates);
|
2004-07-01 02:52:04 +02:00
|
|
|
|
|
|
|
state->all_isset = origstate->all_isset;
|
|
|
|
state->all_isdeferred = origstate->all_isdeferred;
|
|
|
|
state->numstates = origstate->numstates;
|
|
|
|
memcpy(state->trigstates, origstate->trigstates,
|
2004-09-10 20:40:09 +02:00
|
|
|
origstate->numstates * sizeof(SetConstraintTriggerData));
|
2004-07-01 02:52:04 +02:00
|
|
|
|
|
|
|
return state;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2004-09-10 20:40:09 +02:00
|
|
|
* Add a per-trigger item to a SetConstraintState. Returns possibly-changed
|
2004-07-01 02:52:04 +02:00
|
|
|
* pointer to the state object (it will change if we have to repalloc).
|
|
|
|
*/
|
2004-09-10 20:40:09 +02:00
|
|
|
static SetConstraintState
|
|
|
|
SetConstraintStateAddItem(SetConstraintState state,
|
|
|
|
Oid tgoid, bool tgisdeferred)
|
2004-07-01 02:52:04 +02:00
|
|
|
{
|
|
|
|
if (state->numstates >= state->numalloc)
|
|
|
|
{
|
|
|
|
int newalloc = state->numalloc * 2;
|
|
|
|
|
|
|
|
newalloc = Max(newalloc, 8); /* in case original has size 0 */
|
2004-09-10 20:40:09 +02:00
|
|
|
state = (SetConstraintState)
|
2004-07-01 02:52:04 +02:00
|
|
|
repalloc(state,
|
2015-02-20 23:32:01 +01:00
|
|
|
offsetof(SetConstraintStateData, trigstates) +
|
|
|
|
newalloc * sizeof(SetConstraintTriggerData));
|
2004-07-01 02:52:04 +02:00
|
|
|
state->numalloc = newalloc;
|
|
|
|
Assert(state->numstates < state->numalloc);
|
|
|
|
}
|
|
|
|
|
2004-09-10 20:40:09 +02:00
|
|
|
state->trigstates[state->numstates].sct_tgoid = tgoid;
|
|
|
|
state->trigstates[state->numstates].sct_tgisdeferred = tgisdeferred;
|
2004-07-01 02:52:04 +02:00
|
|
|
state->numstates++;
|
|
|
|
|
|
|
|
return state;
|
|
|
|
}
|
1999-09-29 18:06:40 +02:00
|
|
|
|
|
|
|
/* ----------
|
2004-09-10 20:40:09 +02:00
|
|
|
* AfterTriggerSetState()
|
1999-09-29 18:06:40 +02:00
|
|
|
*
|
2004-09-10 20:40:09 +02:00
|
|
|
* Execute the SET CONSTRAINTS ... utility command.
|
1999-09-29 18:06:40 +02:00
|
|
|
* ----------
|
|
|
|
*/
|
|
|
|
void
|
2004-09-10 20:40:09 +02:00
|
|
|
AfterTriggerSetState(ConstraintsSetStmt *stmt)
|
1999-09-29 18:06:40 +02:00
|
|
|
{
|
2004-09-07 01:33:48 +02:00
|
|
|
int my_level = GetCurrentTransactionNestLevel();
|
|
|
|
|
2014-10-23 18:33:02 +02:00
|
|
|
/* If we haven't already done so, initialize our state. */
|
|
|
|
if (afterTriggers.state == NULL)
|
|
|
|
afterTriggers.state = SetConstraintStateCreate(8);
|
1999-09-29 18:06:40 +02:00
|
|
|
|
2004-07-01 02:52:04 +02:00
|
|
|
/*
|
|
|
|
* If in a subtransaction, and we didn't save the current state already,
|
|
|
|
* save it so it can be restored if the subtransaction aborts.
|
|
|
|
*/
|
2004-09-07 01:33:48 +02:00
|
|
|
if (my_level > 1 &&
|
2017-09-16 19:20:32 +02:00
|
|
|
afterTriggers.trans_stack[my_level].state == NULL)
|
2004-07-01 02:52:04 +02:00
|
|
|
{
|
2017-09-16 19:20:32 +02:00
|
|
|
afterTriggers.trans_stack[my_level].state =
|
2014-10-23 18:33:02 +02:00
|
|
|
SetConstraintStateCopy(afterTriggers.state);
|
2004-07-01 02:52:04 +02:00
|
|
|
}
|
|
|
|
|
1999-09-29 18:06:40 +02:00
|
|
|
/*
|
|
|
|
* Handle SET CONSTRAINTS ALL ...
|
|
|
|
*/
|
|
|
|
if (stmt->constraints == NIL)
|
|
|
|
{
|
2002-08-17 14:15:49 +02:00
|
|
|
/*
|
2004-09-09 01:47:58 +02:00
|
|
|
* Forget any previous SET CONSTRAINTS commands in this transaction.
|
2002-08-17 14:15:49 +02:00
|
|
|
*/
|
2014-10-23 18:33:02 +02:00
|
|
|
afterTriggers.state->numstates = 0;
|
1999-09-29 18:06:40 +02:00
|
|
|
|
|
|
|
/*
|
2002-08-17 14:15:49 +02:00
|
|
|
* Set the per-transaction ALL state to known.
|
1999-09-29 18:06:40 +02:00
|
|
|
*/
|
2014-10-23 18:33:02 +02:00
|
|
|
afterTriggers.state->all_isset = true;
|
|
|
|
afterTriggers.state->all_isdeferred = stmt->deferred;
|
2002-08-17 14:15:49 +02:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
2010-01-17 23:56:23 +01:00
|
|
|
Relation conrel;
|
2002-08-17 14:15:49 +02:00
|
|
|
Relation tgrel;
|
2010-01-17 23:56:23 +01:00
|
|
|
List *conoidlist = NIL;
|
|
|
|
List *tgoidlist = NIL;
|
|
|
|
ListCell *lc;
|
2002-08-17 14:15:49 +02:00
|
|
|
|
2010-01-17 23:56:23 +01:00
|
|
|
/*
|
2002-08-17 14:15:49 +02:00
|
|
|
* Handle SET CONSTRAINTS constraint-name [, ...]
|
2010-01-17 23:56:23 +01:00
|
|
|
*
|
|
|
|
* First, identify all the named constraints and make a list of their
|
|
|
|
* OIDs. Since, unlike the SQL spec, we allow multiple constraints of
|
|
|
|
* the same name within a schema, the specifications are not
|
|
|
|
* necessarily unique. Our strategy is to target all matching
|
|
|
|
* constraints within the first search-path schema that has any
|
|
|
|
* matches, but disregard matches in schemas beyond the first match.
|
|
|
|
* (This is a bit odd but it's the historical behavior.)
|
2018-03-23 14:48:22 +01:00
|
|
|
*
|
|
|
|
* A constraint in a partitioned table may have corresponding
|
|
|
|
* constraints in the partitions. Grab those too.
|
1999-09-29 18:06:40 +02:00
|
|
|
*/
|
2019-01-21 19:32:19 +01:00
|
|
|
conrel = table_open(ConstraintRelationId, AccessShareLock);
|
1999-09-29 18:06:40 +02:00
|
|
|
|
2010-01-17 23:56:23 +01:00
|
|
|
foreach(lc, stmt->constraints)
|
2002-02-19 21:11:20 +01:00
|
|
|
{
|
2010-01-17 23:56:23 +01:00
|
|
|
RangeVar *constraint = lfirst(lc);
|
2004-07-01 02:52:04 +02:00
|
|
|
bool found;
|
2010-01-17 23:56:23 +01:00
|
|
|
List *namespacelist;
|
|
|
|
ListCell *nslc;
|
1999-09-29 18:06:40 +02:00
|
|
|
|
2006-04-27 02:33:46 +02:00
|
|
|
if (constraint->catalogname)
|
|
|
|
{
|
|
|
|
if (strcmp(constraint->catalogname, get_database_name(MyDatabaseId)) != 0)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
|
|
|
errmsg("cross-database references are not implemented: \"%s.%s.%s\"",
|
|
|
|
constraint->catalogname, constraint->schemaname,
|
|
|
|
constraint->relname)));
|
|
|
|
}
|
2002-02-19 21:11:20 +01:00
|
|
|
|
2006-04-27 02:33:46 +02:00
|
|
|
/*
|
|
|
|
* If we're given the schema name with the constraint, look only
|
|
|
|
* in that schema. If given a bare constraint name, use the
|
|
|
|
* search path to find the first matching constraint.
|
2002-08-17 14:15:49 +02:00
|
|
|
*/
|
2006-04-27 02:33:46 +02:00
|
|
|
if (constraint->schemaname)
|
|
|
|
{
|
2013-01-26 19:24:50 +01:00
|
|
|
Oid namespaceId = LookupExplicitNamespace(constraint->schemaname,
|
|
|
|
false);
|
2006-10-04 02:30:14 +02:00
|
|
|
|
2010-01-17 23:56:23 +01:00
|
|
|
namespacelist = list_make1_oid(namespaceId);
|
2006-04-27 02:33:46 +02:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
2010-01-17 23:56:23 +01:00
|
|
|
namespacelist = fetch_search_path(true);
|
2006-04-27 02:33:46 +02:00
|
|
|
}
|
1999-09-29 18:06:40 +02:00
|
|
|
|
|
|
|
found = false;
|
2010-01-17 23:56:23 +01:00
|
|
|
foreach(nslc, namespacelist)
|
1999-09-29 18:06:40 +02:00
|
|
|
{
|
2010-01-17 23:56:23 +01:00
|
|
|
Oid namespaceId = lfirst_oid(nslc);
|
|
|
|
SysScanDesc conscan;
|
|
|
|
ScanKeyData skey[2];
|
|
|
|
HeapTuple tup;
|
1999-09-29 18:06:40 +02:00
|
|
|
|
2010-01-17 23:56:23 +01:00
|
|
|
ScanKeyInit(&skey[0],
|
|
|
|
Anum_pg_constraint_conname,
|
2006-04-27 02:33:46 +02:00
|
|
|
BTEqualStrategyNumber, F_NAMEEQ,
|
2010-01-17 23:56:23 +01:00
|
|
|
CStringGetDatum(constraint->relname));
|
|
|
|
ScanKeyInit(&skey[1],
|
|
|
|
Anum_pg_constraint_connamespace,
|
|
|
|
BTEqualStrategyNumber, F_OIDEQ,
|
|
|
|
ObjectIdGetDatum(namespaceId));
|
2006-04-27 02:33:46 +02:00
|
|
|
|
2010-01-17 23:56:23 +01:00
|
|
|
conscan = systable_beginscan(conrel, ConstraintNameNspIndexId,
|
Use an MVCC snapshot, rather than SnapshotNow, for catalog scans.
SnapshotNow scans have the undesirable property that, in the face of
concurrent updates, the scan can fail to see either the old or the new
versions of the row. In many cases, we work around this by requiring
DDL operations to hold AccessExclusiveLock on the object being
modified; in some cases, the existing locking is inadequate and random
failures occur as a result. This commit doesn't change anything
related to locking, but will hopefully pave the way to allowing lock
strength reductions in the future.
The major issue that has held us back from making this change in the past
is that taking an MVCC snapshot is significantly more expensive than
using a static special snapshot such as SnapshotNow. However, testing
of various worst-case scenarios reveals that this problem is not
severe except under fairly extreme workloads. To mitigate those
problems, we avoid retaking the MVCC snapshot for each new scan;
instead, we take a new snapshot only when invalidation messages have
been processed. The catcache machinery already requires that
invalidation messages be sent before releasing the related heavyweight
lock; else other backends might rely on locally-cached data rather
than scanning the catalog at all. Thus, making snapshot reuse
dependent on the same guarantees shouldn't break anything that wasn't
already subtly broken.
Patch by me. Review by Michael Paquier and Andres Freund.
2013-07-02 15:47:01 +02:00
|
|
|
true, NULL, 2, skey);
|
2006-04-27 02:33:46 +02:00
|
|
|
|
2010-01-17 23:56:23 +01:00
|
|
|
while (HeapTupleIsValid(tup = systable_getnext(conscan)))
|
2004-09-09 01:47:58 +02:00
|
|
|
{
|
2010-01-17 23:56:23 +01:00
|
|
|
Form_pg_constraint con = (Form_pg_constraint) GETSTRUCT(tup);
|
|
|
|
|
|
|
|
if (con->condeferrable)
|
Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
restoring a pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot of code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used); only oids assigned later will be above
FirstBootstrapObjectId. As the oid column is now a normal column, the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to merge this
now. It's painful to maintain externally, too complicated to commit
after the code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
|
|
|
conoidlist = lappend_oid(conoidlist, con->oid);
|
2010-01-17 23:56:23 +01:00
|
|
|
else if (stmt->deferred)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
|
|
|
|
errmsg("constraint \"%s\" is not deferrable",
|
|
|
|
constraint->relname)));
|
2006-04-27 02:33:46 +02:00
|
|
|
found = true;
|
2004-09-09 01:47:58 +02:00
|
|
|
}
|
2006-04-27 02:33:46 +02:00
|
|
|
|
2010-01-17 23:56:23 +01:00
|
|
|
systable_endscan(conscan);
|
2006-04-27 02:33:46 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Once we've found a matching constraint we do not search
|
|
|
|
* later parts of the search path.
|
|
|
|
*/
|
|
|
|
if (found)
|
|
|
|
break;
|
1999-09-29 18:06:40 +02:00
|
|
|
}
|
|
|
|
|
2010-01-17 23:56:23 +01:00
|
|
|
list_free(namespacelist);
|
2002-08-17 14:15:49 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Not found?
|
|
|
|
*/
|
|
|
|
if (!found)
|
2003-07-20 23:56:35 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_UNDEFINED_OBJECT),
|
2004-09-09 01:47:58 +02:00
|
|
|
errmsg("constraint \"%s\" does not exist",
|
2006-04-27 02:33:46 +02:00
|
|
|
constraint->relname)));
|
2002-08-17 14:15:49 +02:00
|
|
|
}
|
2010-01-17 23:56:23 +01:00
|
|
|
|
2018-03-23 14:48:22 +01:00
|
|
|
/*
|
|
|
|
* Scan for any possible descendants of the constraints. We append
|
|
|
|
* whatever we find to the same list that we're scanning; this has the
|
|
|
|
* effect that we create new scans for those, too, so if there are
|
|
|
|
* further descendants, we'll also catch them.
|
|
|
|
*/
|
|
|
|
foreach(lc, conoidlist)
|
|
|
|
{
|
|
|
|
Oid parent = lfirst_oid(lc);
|
|
|
|
ScanKeyData key;
|
|
|
|
SysScanDesc scan;
|
|
|
|
HeapTuple tuple;
|
|
|
|
|
|
|
|
ScanKeyInit(&key,
|
|
|
|
Anum_pg_constraint_conparentid,
|
|
|
|
BTEqualStrategyNumber, F_OIDEQ,
|
|
|
|
ObjectIdGetDatum(parent));
|
|
|
|
|
|
|
|
scan = systable_beginscan(conrel, ConstraintParentIndexId, true, NULL, 1, &key);
|
|
|
|
|
|
|
|
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
|
2018-11-21 00:36:57 +01:00
|
|
|
{
|
|
|
|
Form_pg_constraint con = (Form_pg_constraint) GETSTRUCT(tuple);
|
|
|
|
|
|
|
|
conoidlist = lappend_oid(conoidlist, con->oid);
|
|
|
|
}
|
2018-03-23 14:48:22 +01:00
|
|
|
|
|
|
|
systable_endscan(scan);
|
|
|
|
}
|
|
|
|
|
2019-01-21 19:32:19 +01:00
|
|
|
table_close(conrel, AccessShareLock);
|
2010-01-17 23:56:23 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Now, locate the trigger(s) implementing each of these constraints,
|
|
|
|
* and make a list of their OIDs.
|
|
|
|
*/
|
2019-01-21 19:32:19 +01:00
|
|
|
tgrel = table_open(TriggerRelationId, AccessShareLock);
|
2010-01-17 23:56:23 +01:00
|
|
|
|
|
|
|
foreach(lc, conoidlist)
|
|
|
|
{
|
|
|
|
Oid conoid = lfirst_oid(lc);
|
|
|
|
ScanKeyData skey;
|
|
|
|
SysScanDesc tgscan;
|
|
|
|
HeapTuple htup;
|
|
|
|
|
|
|
|
ScanKeyInit(&skey,
|
|
|
|
Anum_pg_trigger_tgconstraint,
|
|
|
|
BTEqualStrategyNumber, F_OIDEQ,
|
|
|
|
ObjectIdGetDatum(conoid));
|
|
|
|
|
|
|
|
tgscan = systable_beginscan(tgrel, TriggerConstraintIndexId, true,
|
2013-07-02 15:47:01 +02:00
|
|
|
NULL, 1, &skey);
|
2010-01-17 23:56:23 +01:00
|
|
|
|
|
|
|
while (HeapTupleIsValid(htup = systable_getnext(tgscan)))
|
|
|
|
{
|
|
|
|
Form_pg_trigger pg_trigger = (Form_pg_trigger) GETSTRUCT(htup);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Silently skip triggers that are marked as non-deferrable in
|
|
|
|
* pg_trigger. This is not an error condition, since a
|
|
|
|
* deferrable RI constraint may have some non-deferrable
|
|
|
|
* actions.
|
|
|
|
*/
|
|
|
|
if (pg_trigger->tgdeferrable)
|
2018-11-21 00:36:57 +01:00
|
|
|
tgoidlist = lappend_oid(tgoidlist, pg_trigger->oid);
|
2010-01-17 23:56:23 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
systable_endscan(tgscan);
|
|
|
|
}
|
|
|
|
|
2019-01-21 19:32:19 +01:00
|
|
|
table_close(tgrel, AccessShareLock);
|
1999-09-29 18:06:40 +02:00
|
|
|
|
|
|
|
        /*
         * Now we can set the trigger states of individual triggers for this
         * xact.
         */
        foreach(lc, tgoidlist)
        {
            Oid         tgoid = lfirst_oid(lc);
            SetConstraintState state = afterTriggers.state;
            bool        found = false;
            int         i;

            for (i = 0; i < state->numstates; i++)
            {
                if (state->trigstates[i].sct_tgoid == tgoid)
                {
                    state->trigstates[i].sct_tgisdeferred = stmt->deferred;
                    found = true;
                    break;
                }
            }
            if (!found)
            {
                afterTriggers.state =
                    SetConstraintStateAddItem(state, tgoid, stmt->deferred);
            }
        }
    }

    /*
     * SQL99 requires that when a constraint is set to IMMEDIATE, any deferred
     * checks against that constraint must be made when the SET CONSTRAINTS
     * command is executed -- i.e. the effects of the SET CONSTRAINTS command
     * apply retroactively.  We've updated the constraints state, so scan the
     * list of previously deferred events to fire any that have now become
     * immediate.
     *
     * Obviously, if this was SET ... DEFERRED then it can't have converted
     * any unfired events to immediate, so we need do nothing in that case.
     */
    if (!stmt->deferred)
    {
        AfterTriggerEventList *events = &afterTriggers.events;
        bool        snapshot_set = false;

        while (afterTriggerMarkEvents(events, NULL, true))
        {
            CommandId   firing_id = afterTriggers.firing_counter++;

            /*
             * Make sure a snapshot has been established in case trigger
             * functions need one.  Note that we avoid setting a snapshot if
             * we don't find at least one trigger that has to be fired now.
             * This is so that BEGIN; SET CONSTRAINTS ...; SET TRANSACTION
             * ISOLATION LEVEL SERIALIZABLE; ... works properly.  (If we are
             * at the start of a transaction it's not possible for any trigger
             * events to be queued yet.)
             */
            if (!snapshot_set)
            {
                PushActiveSnapshot(GetTransactionSnapshot());
                snapshot_set = true;
            }

            /*
             * We can delete fired events if we are at top transaction level,
             * but we'd better not if inside a subtransaction, since the
             * subtransaction could later get rolled back.
             */
            if (afterTriggerInvokeEvents(events, firing_id, NULL,
                                         !IsSubTransaction()))
                break;          /* all fired */
        }

        if (snapshot_set)
            PopActiveSnapshot();
    }
}
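
/*
 * Illustrative sketch (not part of the original file): the retroactive
 * firing handled above is what makes a session like the following behave
 * as SQL99 requires.  The constraint and table names here are hypothetical.
 *
 *      BEGIN;
 *      SET CONSTRAINTS my_deferrable_fk DEFERRED;
 *      DELETE FROM pk_table WHERE id = 1;   -- RI check is queued, not run
 *      SET CONSTRAINTS my_deferrable_fk IMMEDIATE;  -- queued check fires now
 *      COMMIT;
 */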

/* ----------
 * AfterTriggerPendingOnRel()
 *      Test to see if there are any pending after-trigger events for rel.
 *
 * This is used by TRUNCATE, CLUSTER, ALTER TABLE, etc to detect whether
 * it is unsafe to perform major surgery on a relation.  Note that only
 * local pending events are examined.  We assume that having exclusive lock
 * on a rel guarantees there are no unserviced events in other backends ---
 * but having a lock does not prevent there being such events in our own.
 *
 * In some scenarios it'd be reasonable to remove pending events (more
 * specifically, mark them DONE by the current subxact) but without a lot
 * of knowledge of the trigger semantics we can't do this in general.
 * ----------
 */
bool
AfterTriggerPendingOnRel(Oid relid)
{
    AfterTriggerEvent event;
    AfterTriggerEventChunk *chunk;
    int         depth;

    /* Scan queued events */
    for_each_event_chunk(event, chunk, afterTriggers.events)
    {
        AfterTriggerShared evtshared = GetTriggerSharedData(event);

        /*
         * We can ignore completed events.  (Even if a DONE flag is rolled
         * back by subxact abort, it's OK because the effects of the TRUNCATE
         * or whatever must get rolled back too.)
         */
        if (event->ate_flags & AFTER_TRIGGER_DONE)
            continue;

        if (evtshared->ats_relid == relid)
            return true;
    }

    /*
     * Also scan events queued by incomplete queries.  This could only matter
     * if TRUNCATE/etc is executed by a function or trigger within an updating
     * query on the same relation, which is pretty perverse, but let's check.
     */
    for (depth = 0; depth <= afterTriggers.query_depth && depth < afterTriggers.maxquerydepth; depth++)
    {
        for_each_event_chunk(event, chunk, afterTriggers.query_stack[depth].events)
        {
            AfterTriggerShared evtshared = GetTriggerSharedData(event);

            if (event->ate_flags & AFTER_TRIGGER_DONE)
                continue;

            if (evtshared->ats_relid == relid)
                return true;
        }
    }

    return false;
}
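
/*
 * Illustrative sketch (not part of the original file): a typical caller
 * checks this before performing major surgery on a relation and errors out
 * if events are pending.  The snippet itself is hypothetical, though it
 * follows the pattern used by callers such as TRUNCATE.
 *
 *      if (AfterTriggerPendingOnRel(RelationGetRelid(rel)))
 *          ereport(ERROR,
 *                  (errcode(ERRCODE_OBJECT_IN_USE),
 *                   errmsg("cannot truncate \"%s\" because it has pending trigger events",
 *                          RelationGetRelationName(rel))));
 */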

/* ----------
 * AfterTriggerSaveEvent()
 *      Called by ExecA[RS]...Triggers() to queue up the triggers that should
 *      be fired for an event.
 *
 * NOTE: this is called whenever there are any triggers associated with
 * the event (even if they are disabled).  This function decides which
 * triggers actually need to be queued.  It is also called after each row,
 * even if there are no triggers for that event, if there are any AFTER
 * STATEMENT triggers for the statement which use transition tables, so that
 * the transition tuplestores can be built.  Furthermore, if the transition
 * capture is happening for UPDATEd rows being moved to another partition due
 * to the partition-key being changed, then this function is called once when
 * the row is deleted (to capture OLD row), and once when the row is inserted
 * into another partition (to capture NEW row).  This is done separately because
 * DELETE and INSERT happen on different tables.
 *
 * Transition tuplestores are built now, rather than when events are pulled
 * off of the queue because AFTER ROW triggers are allowed to select from the
 * transition tables for the statement.
 *
 * This contains special support to queue the update events for the case where
 * a partitioned table undergoing a cross-partition update may have foreign
 * keys pointing into it.  Normally, a partitioned table's row triggers are
 * not fired because the leaf partition(s) which are modified as a result of
 * the operation on the partitioned table contain the same triggers which are
 * fired instead.  But that general scheme can cause problematic behavior with
 * foreign key triggers during cross-partition updates, which are implemented
 * as DELETE on the source partition followed by INSERT into the destination
 * partition.  Specifically, firing DELETE triggers would lead to the wrong
 * foreign key action to be enforced considering that the original command is
 * UPDATE; in this case, this function is called with relinfo as the
 * partitioned table, and src_partinfo and dst_partinfo referring to the
 * source and target leaf partitions, respectively.
 *
 * is_crosspart_update is true either when a DELETE event is fired on the
 * source partition (which is to be ignored) or an UPDATE event is fired on
 * the root partitioned table.
 * ----------
 */
static void
AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
                      ResultRelInfo *src_partinfo,
                      ResultRelInfo *dst_partinfo,
                      int event, bool row_trigger,
                      TupleTableSlot *oldslot, TupleTableSlot *newslot,
                      List *recheckIndexes, Bitmapset *modifiedCols,
                      TransitionCaptureState *transition_capture,
                      bool is_crosspart_update)
{
|
2001-06-01 04:41:36 +02:00
|
|
|
Relation rel = relinfo->ri_RelationDesc;
|
|
|
|
TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
|
2008-10-25 01:42:35 +02:00
|
|
|
AfterTriggerEventData new_event;
|
|
|
|
AfterTriggerSharedData new_shared;
|
Fix SQL-spec incompatibilities in new transition table feature.
The standard says that all changes of the same kind (insert, update, or
delete) caused in one table by a single SQL statement should be reported
in a single transition table; and by that, they mean to include foreign key
enforcement actions cascading from the statement's direct effects. It's
also reasonable to conclude that if the standard had wCTEs, they would say
that effects of wCTEs applying to the same table as each other or the outer
statement should be merged into one transition table. We weren't doing it
like that.
Hence, arrange to merge tuples from multiple update actions into a single
transition table as much as we can. There is a problem, which is that if
the firing of FK enforcement triggers and after-row triggers with
transition tables is interspersed, we might need to report more tuples
after some triggers have already seen the transition table. It seems like
a bad idea for the transition table to be mutable between trigger calls.
There's no good way around this without a major redesign of the FK logic,
so for now, resolve it by opening a new transition table each time this
happens.
Also, ensure that AFTER STATEMENT triggers fire just once per statement,
or once per transition table when we're forced to make more than one.
Previous versions of Postgres have allowed each FK enforcement query
to cause an additional firing of the AFTER STATEMENT triggers for the
referencing table, but that's certainly not per spec. (We're still
doing multiple firings of BEFORE STATEMENT triggers, though; is that
something worth changing?)
Also, forbid using transition tables with column-specific UPDATE triggers.
The spec requires such transition tables to show only the tuples for which
the UPDATE trigger would have fired, which means maintaining multiple
transition tables or else somehow filtering the contents at readout.
Maybe someday we'll bother to support that option, but it looks like a
lot of trouble for a marginal feature.
The transition tables are now managed by the AfterTriggers data structures,
rather than being directly the responsibility of ModifyTable nodes. This
removes a subtransaction-lifespan memory leak introduced by my previous
band-aid patch 3c4359521.
In passing, refactor the AfterTriggers data structures to reduce the
management overhead for them, by using arrays of structs rather than
several parallel arrays for per-query-level and per-subtransaction state.
I failed to resist the temptation to do some copy-editing on the SGML
docs about triggers, above and beyond merely documenting the effects
of this patch.
Back-patch to v10, because we don't want the semantics of transition
tables to change post-release.
Patch by me, with help and review from Thomas Munro.
Discussion: https://postgr.es/m/20170909064853.25630.12825@wrigleys.postgresql.org
2017-09-16 19:20:32 +02:00
|
|
|
char relkind = rel->rd_rel->relkind;
|
2010-10-10 19:43:33 +02:00
|
|
|
int tgtype_event;
|
|
|
|
int tgtype_level;
|
1999-09-29 18:06:40 +02:00
|
|
|
int i;
|
2014-03-23 07:16:34 +01:00
|
|
|
Tuplestorestate *fdw_tuplestore = NULL;
|
1999-09-29 18:06:40 +02:00
|
|
|
|
2009-10-27 21:14:27 +01:00
|
|
|
/*
|
2014-10-23 18:33:02 +02:00
|
|
|
* Check state. We use a normal test not Assert because it is possible to
|
2009-10-27 21:14:27 +01:00
|
|
|
* reach here in the wrong state given misconfigured RI triggers, in
|
|
|
|
* particular deferring a cascade action trigger.
|
|
|
|
*/
|
2014-10-23 18:33:02 +02:00
|
|
|
if (afterTriggers.query_depth < 0)
|
2009-10-27 21:14:27 +01:00
|
|
|
elog(ERROR, "AfterTriggerSaveEvent() called outside of query");
|
1999-09-29 18:06:40 +02:00
|
|
|
|
2014-10-23 18:33:02 +02:00
|
|
|
/* Be sure we have enough space to record events at this query depth. */
|
|
|
|
if (afterTriggers.query_depth >= afterTriggers.maxquerydepth)
|
|
|
|
AfterTriggerEnlargeQueryState();
|
|
|
|
|
2016-11-04 16:49:50 +01:00
|
|
|
/*
|
2017-06-28 19:59:01 +02:00
|
|
|
* If the directly named relation has any triggers with transition tables,
|
|
|
|
* then we need to capture transition tuples.
|
2016-11-04 16:49:50 +01:00
|
|
|
*/
|
2017-06-28 19:59:01 +02:00
|
|
|
if (row_trigger && transition_capture != NULL)
|
2016-11-04 16:49:50 +01:00
|
|
|
{
|
2022-03-28 16:45:58 +02:00
|
|
|
TupleTableSlot *original_insert_tuple = transition_capture->tcs_original_insert_tuple;
|
2017-06-28 19:55:03 +02:00
|
|
|
|
Allow UPDATE to move rows between partitions.
When an UPDATE causes a row to no longer match the partition
constraint, try to move it to a different partition where it does
match the partition constraint. In essence, the UPDATE is split into
a DELETE from the old partition and an INSERT into the new one. This
can lead to surprising behavior in concurrency scenarios because
EvalPlanQual rechecks won't work as they normally did; the known
problems are documented. (There is a pending patch to improve the
situation further, but it needs more review.)
Amit Khandekar, reviewed and tested by Amit Langote, David Rowley,
Rajkumar Raghuwanshi, Dilip Kumar, Amul Sul, Thomas Munro, Álvaro
Herrera, Amit Kapila, and me. A few final revisions by me.
		/*
		 * Capture the old tuple in the appropriate transition table based on
		 * the event.
		 */
		if (!TupIsNull(oldslot))
		{
			Tuplestorestate *old_tuplestore;

			old_tuplestore = GetAfterTriggersTransitionTable(event,
															 oldslot,
															 NULL,
															 transition_capture);
			TransitionTableAddTuple(estate, transition_capture, relinfo,
									oldslot, NULL, old_tuplestore);
		}
		/*
		 * Capture the new tuple in the appropriate transition table based on
		 * the event.
		 */
		if (!TupIsNull(newslot))
		{
			Tuplestorestate *new_tuplestore;

			new_tuplestore = GetAfterTriggersTransitionTable(event,
															 NULL,
															 newslot,
															 transition_capture);
			TransitionTableAddTuple(estate, transition_capture, relinfo,
									newslot, original_insert_tuple, new_tuplestore);
		}
		/*
		 * If transition tables are the only reason we're here, return. As
		 * mentioned above, we can also be here during update tuple routing in
		 * the presence of transition tables, in which case this function is
		 * called separately for OLD and NEW, so we expect exactly one of them
		 * to be NULL.
		 */
		if (trigdesc == NULL ||
			(event == TRIGGER_EVENT_DELETE && !trigdesc->trig_delete_after_row) ||
			(event == TRIGGER_EVENT_INSERT && !trigdesc->trig_insert_after_row) ||
			(event == TRIGGER_EVENT_UPDATE && !trigdesc->trig_update_after_row) ||
			(event == TRIGGER_EVENT_UPDATE && (TupIsNull(oldslot) ^ TupIsNull(newslot))))
			return;
	}
	/*
	 * We normally don't see partitioned tables here for row-level triggers
	 * except in the special case of a cross-partition update.  In that case,
	 * nodeModifyTable.c:ExecCrossPartitionUpdateForeignKey() calls here to
	 * queue an update event on the root target partitioned table, also
	 * passing the source and destination partitions and their tuples.
	 */
	Assert(!row_trigger ||
		   rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE ||
		   (is_crosspart_update &&
			TRIGGER_FIRED_BY_UPDATE(event) &&
			src_partinfo != NULL && dst_partinfo != NULL));
	/*
	 * Validate the event code and collect the associated tuple CTIDs.
	 *
	 * The event code will be used both as a bitmask and an array offset, so
	 * validation is important to make sure we don't walk off the edge of our
	 * arrays.
	 *
	 * Also, if we're considering statement-level triggers, check whether we
	 * already queued a set of them for this event, and cancel the prior set
	 * if so.  This preserves the behavior that statement-level triggers fire
	 * just once per statement and fire after row-level triggers.
	 */
	switch (event)
	{
		case TRIGGER_EVENT_INSERT:
			tgtype_event = TRIGGER_TYPE_INSERT;
			if (row_trigger)
			{
				Assert(oldslot == NULL);
				Assert(newslot != NULL);
				ItemPointerCopy(&(newslot->tts_tid), &(new_event.ate_ctid1));
				ItemPointerSetInvalid(&(new_event.ate_ctid2));
			}
			else
			{
				Assert(oldslot == NULL);
				Assert(newslot == NULL);
				ItemPointerSetInvalid(&(new_event.ate_ctid1));
				ItemPointerSetInvalid(&(new_event.ate_ctid2));
				cancel_prior_stmt_triggers(RelationGetRelid(rel),
										   CMD_INSERT, event);
			}
			break;
		case TRIGGER_EVENT_DELETE:
			tgtype_event = TRIGGER_TYPE_DELETE;
			if (row_trigger)
			{
				Assert(oldslot != NULL);
				Assert(newslot == NULL);
				ItemPointerCopy(&(oldslot->tts_tid), &(new_event.ate_ctid1));
				ItemPointerSetInvalid(&(new_event.ate_ctid2));
			}
			else
			{
				Assert(oldslot == NULL);
				Assert(newslot == NULL);
				ItemPointerSetInvalid(&(new_event.ate_ctid1));
				ItemPointerSetInvalid(&(new_event.ate_ctid2));
				cancel_prior_stmt_triggers(RelationGetRelid(rel),
										   CMD_DELETE, event);
			}
			break;
		case TRIGGER_EVENT_UPDATE:
			tgtype_event = TRIGGER_TYPE_UPDATE;
			if (row_trigger)
			{
				Assert(oldslot != NULL);
				Assert(newslot != NULL);
				ItemPointerCopy(&(oldslot->tts_tid), &(new_event.ate_ctid1));
				ItemPointerCopy(&(newslot->tts_tid), &(new_event.ate_ctid2));

				/*
				 * Also remember the OIDs of partitions to fetch these tuples
				 * out of later in AfterTriggerExecute().
				 */
				if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
				{
					Assert(src_partinfo != NULL && dst_partinfo != NULL);
					new_event.ate_src_part =
						RelationGetRelid(src_partinfo->ri_RelationDesc);
					new_event.ate_dst_part =
						RelationGetRelid(dst_partinfo->ri_RelationDesc);
				}
			}
			else
			{
				Assert(oldslot == NULL);
				Assert(newslot == NULL);
				ItemPointerSetInvalid(&(new_event.ate_ctid1));
				ItemPointerSetInvalid(&(new_event.ate_ctid2));
				cancel_prior_stmt_triggers(RelationGetRelid(rel),
										   CMD_UPDATE, event);
			}
			break;
		case TRIGGER_EVENT_TRUNCATE:
			tgtype_event = TRIGGER_TYPE_TRUNCATE;
			Assert(oldslot == NULL);
			Assert(newslot == NULL);
			ItemPointerSetInvalid(&(new_event.ate_ctid1));
			ItemPointerSetInvalid(&(new_event.ate_ctid2));
			break;
		default:
			elog(ERROR, "invalid after-trigger event code: %d", event);
			tgtype_event = 0;	/* keep compiler quiet */
			break;
	}
	/* Determine flags */
	if (!(relkind == RELKIND_FOREIGN_TABLE && row_trigger))
	{
		if (row_trigger && event == TRIGGER_EVENT_UPDATE)
		{
			if (relkind == RELKIND_PARTITIONED_TABLE)
				new_event.ate_flags = AFTER_TRIGGER_CP_UPDATE;
			else
				new_event.ate_flags = AFTER_TRIGGER_2CTID;
		}
		else
			new_event.ate_flags = AFTER_TRIGGER_1CTID;
	}
|
|
|
|
|
2014-03-23 07:16:34 +01:00
|
|
|
/* else, we'll initialize ate_flags for each trigger */
|
|
|
|
|
2010-10-10 19:43:33 +02:00
|
|
|
tgtype_level = (row_trigger ? TRIGGER_TYPE_ROW : TRIGGER_TYPE_STATEMENT);
|
2002-11-23 04:59:09 +01:00
|
|
|
|
	/*
	 * Must convert/copy the source and destination partition tuples into the
	 * root partitioned table's format/slot, because the processing in the
	 * loop below expects both oldslot and newslot tuples to be in that form.
	 */
	if (row_trigger && rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
	{
		TupleTableSlot *rootslot;
		TupleConversionMap *map;

		rootslot = ExecGetTriggerOldSlot(estate, relinfo);
		map = ExecGetChildToRootMap(src_partinfo);
		if (map)
			oldslot = execute_attr_map_slot(map->attrMap,
											oldslot,
											rootslot);
		else
			oldslot = ExecCopySlot(rootslot, oldslot);

		rootslot = ExecGetTriggerNewSlot(estate, relinfo);
		map = ExecGetChildToRootMap(dst_partinfo);
		if (map)
			newslot = execute_attr_map_slot(map->attrMap,
											newslot,
											rootslot);
		else
			newslot = ExecCopySlot(rootslot, newslot);
	}

	for (i = 0; i < trigdesc->numtriggers; i++)
	{
		Trigger    *trigger = &trigdesc->triggers[i];

		if (!TRIGGER_TYPE_MATCHES(trigger->tgtype,
								  tgtype_level,
								  TRIGGER_TYPE_AFTER,
								  tgtype_event))
			continue;
		if (!TriggerEnabled(estate, relinfo, trigger, event,
							modifiedCols, oldslot, newslot))
			continue;

		if (relkind == RELKIND_FOREIGN_TABLE && row_trigger)
		{
			if (fdw_tuplestore == NULL)
			{
				fdw_tuplestore = GetCurrentFDWTuplestore();
				new_event.ate_flags = AFTER_TRIGGER_FDW_FETCH;
			}
			else
				/* subsequent event for the same tuple */
				new_event.ate_flags = AFTER_TRIGGER_FDW_REUSE;
		}

		/*
		 * If the trigger is a foreign key enforcement trigger, there are
		 * certain cases where we can skip queueing the event because we can
		 * tell by inspection that the FK constraint will still pass. There
		 * are also some cases during cross-partition updates of a partitioned
		 * table where queuing the event can be skipped.
		 */
		if (TRIGGER_FIRED_BY_UPDATE(event) || TRIGGER_FIRED_BY_DELETE(event))
		{
			switch (RI_FKey_trigger_type(trigger->tgfoid))
			{
				case RI_TRIGGER_PK:

					/*
					 * For cross-partitioned updates of partitioned PK table,
					 * skip the event fired by the component delete on the
					 * source leaf partition unless the constraint originates
					 * in the partition itself (!tgisclone), because the
					 * update event that will be fired on the root
					 * (partitioned) target table will be used to perform the
					 * necessary foreign key enforcement action.
					 */
					if (is_crosspart_update &&
						TRIGGER_FIRED_BY_DELETE(event) &&
						trigger->tgisclone)
						continue;

					/* Update or delete on trigger's PK table */
					if (!RI_FKey_pk_upd_check_required(trigger, rel,
													   oldslot, newslot))
					{
						/* skip queuing this event */
						continue;
					}
					break;

				case RI_TRIGGER_FK:

					/*
					 * Update on trigger's FK table. We can skip the update
					 * event fired on a partitioned table during a
					 * cross-partition update of that table, because the
					 * insert event that is fired on the destination leaf
					 * partition would suffice to perform the necessary
					 * foreign key check.
					 * Moreover, RI_FKey_fk_upd_check_required() expects to be
					 * passed a tuple that contains system attributes, most of
					 * which are not present in the virtual slot belonging to
					 * a partitioned table.
					 */
					if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ||
						!RI_FKey_fk_upd_check_required(trigger, rel,
													   oldslot, newslot))
					{
						/* skip queuing this event */
						continue;
					}
					break;

				case RI_TRIGGER_NONE:

					/*
					 * Not an FK trigger. No need to queue the update event
					 * fired during a cross-partitioned update of a
					 * partitioned table, because the same row trigger must be
					 * present in the leaf partition(s) that are affected as
					 * part of this update and the events fired on them are
					 * queued instead.
					 */
					if (row_trigger &&
						rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
						continue;
					break;
			}
		}

		/*
		 * If the trigger is a deferred unique constraint check trigger, only
		 * queue it if the unique constraint was potentially violated, which
		 * we know from index insertion time.
		 */
		if (trigger->tgfoid == F_UNIQUE_KEY_RECHECK)
		{
			if (!list_member_oid(recheckIndexes, trigger->tgconstrindid))
				continue;		/* Uniqueness definitely not violated */
		}

		/*
		 * Fill in event structure and add it to the current query's queue.
		 * Note we set ats_table to NULL whenever this trigger doesn't use
		 * transition tables, to improve sharability of the shared event data.
		 */
		new_shared.ats_event =
			(event & TRIGGER_EVENT_OPMASK) |
			(row_trigger ? TRIGGER_EVENT_ROW : 0) |
			(trigger->tgdeferrable ? AFTER_TRIGGER_DEFERRABLE : 0) |
			(trigger->tginitdeferred ? AFTER_TRIGGER_INITDEFERRED : 0);
		new_shared.ats_tgoid = trigger->tgoid;
		new_shared.ats_relid = RelationGetRelid(rel);
		new_shared.ats_firing_id = 0;
		if ((trigger->tgoldtable || trigger->tgnewtable) &&
			transition_capture != NULL)
			new_shared.ats_table = transition_capture->tcs_private;
		else
			new_shared.ats_table = NULL;
		new_shared.ats_modifiedcols = modifiedCols;
rather than being directly the responsibility of ModifyTable nodes. This
removes a subtransaction-lifespan memory leak introduced by my previous
band-aid patch 3c4359521.
In passing, refactor the AfterTriggers data structures to reduce the
management overhead for them, by using arrays of structs rather than
several parallel arrays for per-query-level and per-subtransaction state.
I failed to resist the temptation to do some copy-editing on the SGML
docs about triggers, above and beyond merely documenting the effects
of this patch.
Back-patch to v10, because we don't want the semantics of transition
tables to change post-release.
Patch by me, with help and review from Thomas Munro.
Discussion: https://postgr.es/m/20170909064853.25630.12825@wrigleys.postgresql.org
2017-09-16 19:20:32 +02:00
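The commit message above mentions replacing "several parallel arrays for per-query-level and per-subtransaction state" with arrays of structs. A minimal sketch of that refactor pattern is below; the type and field names here are illustrative stand-ins, not the actual AfterTriggers data structures:

```c
#include <assert.h>
#include <stdlib.h>

/* Before: per-query-level state kept in several parallel arrays, each
 * indexed by query depth and grown separately (error-prone to keep in
 * sync, and each array needs its own allocation). */
typedef struct ParallelState
{
    int        *events_counts;  /* one entry per query level */
    int        *fdw_counts;     /* must stay in sync with events_counts */
} ParallelState;

/* After: one array of structs, so all per-level fields grow together. */
typedef struct QueryLevelData
{
    int         events_count;
    int         fdw_count;
} QueryLevelData;

typedef struct StructState
{
    QueryLevelData *query_stack;
    int         maxquerydepth;
} StructState;

/* Grow the stack to accommodate the given depth; a single realloc now
 * covers every per-level field at once. */
static void
enlarge_query_state(StructState *state, int depth)
{
    if (depth >= state->maxquerydepth)
    {
        int         newmax = (depth + 1) * 2;

        state->query_stack = realloc(state->query_stack,
                                     newmax * sizeof(QueryLevelData));
        for (int i = state->maxquerydepth; i < newmax; i++)
        {
            state->query_stack[i].events_count = 0;
            state->query_stack[i].fdw_count = 0;
        }
        state->maxquerydepth = newmax;
    }
}
```

The point of the change is that adding a new per-level field means touching one struct definition rather than adding (and resizing, and resetting) another parallel array.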
        afterTriggerAddEvent(&afterTriggers.query_stack[afterTriggers.query_depth].events,
                             &new_event, &new_shared);
    }

    /*
     * Finally, spool any foreign tuple(s).  The tuplestore squashes them to
     * minimal tuples, so this loses any system columns.  The executor lost
     * those columns before us, for an unrelated reason, so this is fine.
     */
    if (fdw_tuplestore)
    {
        if (oldslot != NULL)
            tuplestore_puttupleslot(fdw_tuplestore, oldslot);
        if (newslot != NULL)
            tuplestore_puttupleslot(fdw_tuplestore, newslot);
    }
}
/*
 * Detect whether we already queued BEFORE STATEMENT triggers for the given
 * relation + operation, and set the flag so the next call will report "true".
 */
static bool
before_stmt_triggers_fired(Oid relid, CmdType cmdType)
{
    bool        result;
    AfterTriggersTableData *table;

    /* Check state, like AfterTriggerSaveEvent. */
    if (afterTriggers.query_depth < 0)
        elog(ERROR, "before_stmt_triggers_fired() called outside of query");

    /* Be sure we have enough space to record events at this query depth. */
    if (afterTriggers.query_depth >= afterTriggers.maxquerydepth)
        AfterTriggerEnlargeQueryState();

    /*
     * We keep this state in the AfterTriggersTableData that also holds
     * transition tables for the relation + operation.  In this way, if we are
     * forced to make a new set of transition tables because more tuples get
     * entered after we've already fired triggers, we will allow a new set of
     * statement triggers to get queued.
     */
    table = GetAfterTriggersTableData(relid, cmdType);
    result = table->before_trig_done;
    table->before_trig_done = true;
    return result;
}
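The once-per-statement behavior of before_stmt_triggers_fired boils down to a test-and-set on a per-(relation, operation) flag. A self-contained model of that contract, with illustrative names rather than the real AfterTriggersTableData:

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of the test-and-set contract: the first call for a given
 * table reports false (BEFORE STATEMENT triggers not yet queued) and
 * latches the flag; every later call reports true, so the caller fires
 * the statement triggers at most once. */
typedef struct TableTrigState
{
    bool        before_trig_done;
} TableTrigState;

static bool
before_stmt_fired(TableTrigState *table)
{
    bool        result = table->before_trig_done;

    table->before_trig_done = true;
    return result;
}
```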
/*
 * If we previously queued a set of AFTER STATEMENT triggers for the given
 * relation + operation, and they've not been fired yet, cancel them.  The
 * caller will queue a fresh set that's after any row-level triggers that may
 * have been queued by the current sub-statement, preserving (as much as
 * possible) the property that AFTER ROW triggers fire before AFTER STATEMENT
 * triggers, and that the latter only fire once.  This deals with the
 * situation where several FK enforcement triggers sequentially queue triggers
 * for the same table into the same trigger query level.  We can't fully
 * prevent odd behavior though: if there are AFTER ROW triggers taking
 * transition tables, we don't want to change the transition tables once the
 * first such trigger has seen them.  In such a case, any additional events
 * will result in creating new transition tables and allowing new firings of
 * statement triggers.
 *
 * This also saves the current event list location so that a later invocation
 * of this function can cheaply find the triggers we're about to queue and
 * cancel them.
 */
static void
cancel_prior_stmt_triggers(Oid relid, CmdType cmdType, int tgevent)
{
    AfterTriggersTableData *table;
    AfterTriggersQueryData *qs = &afterTriggers.query_stack[afterTriggers.query_depth];

    /*
     * We keep this state in the AfterTriggersTableData that also holds
     * transition tables for the relation + operation.  In this way, if we are
     * forced to make a new set of transition tables because more tuples get
     * entered after we've already fired triggers, we will allow a new set of
     * statement triggers to get queued without canceling the old ones.
     */
    table = GetAfterTriggersTableData(relid, cmdType);

    if (table->after_trig_done)
    {
        /*
         * We want to start scanning from the tail location that existed just
         * before we inserted any statement triggers.  But the events list
         * might've been entirely empty then, in which case scan from the
         * current head.
         */
        AfterTriggerEvent event;
        AfterTriggerEventChunk *chunk;

        if (table->after_trig_events.tail)
        {
            chunk = table->after_trig_events.tail;
            event = (AfterTriggerEvent) table->after_trig_events.tailfree;
        }
        else
        {
            chunk = qs->events.head;
            event = NULL;
        }

        for_each_chunk_from(chunk)
        {
            if (event == NULL)
                event = (AfterTriggerEvent) CHUNK_DATA_START(chunk);
            for_each_event_from(event, chunk)
            {
                AfterTriggerShared evtshared = GetTriggerSharedData(event);

                /*
                 * Exit loop when we reach events that aren't AS triggers for
                 * the target relation.
                 */
                if (evtshared->ats_relid != relid)
                    goto done;
                if ((evtshared->ats_event & TRIGGER_EVENT_OPMASK) != tgevent)
                    goto done;
                if (!TRIGGER_FIRED_FOR_STATEMENT(evtshared->ats_event))
                    goto done;
                if (!TRIGGER_FIRED_AFTER(evtshared->ats_event))
                    goto done;
                /* OK, mark it DONE */
                event->ate_flags &= ~AFTER_TRIGGER_IN_PROGRESS;
                event->ate_flags |= AFTER_TRIGGER_DONE;
            }
            /* signal we must reinitialize event ptr for next chunk */
            event = NULL;
        }
    }
done:

    /* In any case, save current insertion point for next time */
    table->after_trig_done = true;
    table->after_trig_events = qs->events;
}
Split up guc.c for better build speed and ease of maintenance.
guc.c has grown to be one of our largest .c files, making it
a bottleneck for compilation. It's also acquired a bunch of
knowledge that'd be better kept elsewhere, because of our not
very good habit of putting variable-specific check hooks here.
Hence, split it up along these lines:
* guc.c itself retains just the core GUC housekeeping mechanisms.
* New file guc_funcs.c contains the SET/SHOW interfaces and some
SQL-accessible functions for GUC manipulation.
* New file guc_tables.c contains the data arrays that define the
built-in GUC variables, along with some already-exported constant
tables.
* GUC check/assign/show hook functions are moved to the variable's
home module, whenever that's clearly identifiable. A few hard-
to-classify hooks ended up in commands/variable.c, which was
already a home for miscellaneous GUC hook functions.
To avoid cluttering a lot more header files with #include "guc.h",
I also invented a new header file utils/guc_hooks.h and put all
the GUC hook functions' declarations there, regardless of their
originating module. That allowed removal of #include "guc.h"
from some existing headers. The fallout from that (hopefully
all caught here) demonstrates clearly why such inclusions are
best minimized: there are a lot of files that, for example,
were getting array.h at two or more levels of remove, despite
not having any connection at all to GUCs in themselves.
There is some very minor code beautification here, such as
renaming a couple of inconsistently-named hook functions
and improving some comments. But mostly this just moves
code from point A to point B and deals with the ensuing
needs for #include adjustments and exporting a few functions
that previously weren't exported.
Patch by me, per a suggestion from Andres Freund; thanks also
to Michael Paquier for the idea to invent guc_funcs.c.
Discussion: https://postgr.es/m/587607.1662836699@sss.pgh.pa.us
2022-09-13 17:05:07 +02:00
/*
 * GUC assign_hook for session_replication_role
 */
void
assign_session_replication_role(int newval, void *extra)
{
    /*
     * Must flush the plan cache when changing replication role; but don't
     * flush unnecessarily.
     */
    if (SessionReplicationRole != newval)
        ResetPlanCache();
}
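The guard in this assign hook follows a common invalidate-only-on-change pattern: flush dependent caches only when the new value actually differs. A standalone sketch of that pattern, using illustrative names (the real hook relies on SessionReplicationRole and ResetPlanCache, and the GUC machinery stores the new value after the hook returns):

```c
#include <assert.h>

static int  current_role = 0;   /* stands in for SessionReplicationRole */
static int  cache_resets = 0;   /* counts ResetPlanCache-style calls */

static void
reset_cache(void)
{
    cache_resets++;
}

/* Assign hook: invalidate dependent caches only when the value really
 * changes, so repeated SETs to the same value stay cheap. */
static void
assign_role(int newval)
{
    if (current_role != newval)
        reset_cache();
    current_role = newval;      /* in PostgreSQL, the GUC machinery does this */
}
```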
/*
 * SQL function pg_trigger_depth()
 */
Datum
pg_trigger_depth(PG_FUNCTION_ARGS)
{
    PG_RETURN_INT32(MyTriggerDepth);
}