Commit Graph

466 Commits

Author SHA1 Message Date
Alvaro Herrera 71f4c8c6f7
ALTER TABLE ... DETACH PARTITION ... CONCURRENTLY
Allow a partition be detached from its partitioned table without
blocking concurrent queries, by running in two transactions and only
requiring ShareUpdateExclusive in the partitioned table.

Because it runs in two transactions, it cannot be used in a transaction
block.  This is the main reason to use dedicated syntax: so that users
can choose to use the original mode if they need it.  But also, it
doesn't work when a default partition exists (because an exclusive lock
would still need to be obtained on it, in order to change its partition
constraint.)

In case the second transaction is cancelled or a crash occurs, there's
ALTER TABLE .. DETACH PARTITION .. FINALIZE, which executes the final
steps.

The main trick to make this work is the addition of column
pg_inherits.inhdetachpending, initially false; can only be set true in
the first part of this command.  Once that is committed, concurrent
transactions that use a PartitionDirectory will include or ignore
partitions so marked: in optimizer they are ignored if the row is marked
committed for the snapshot; in executor they are always included.  As a
result, and because of the way PartitionDirectory caches partition
descriptors, queries that were planned before the detach will see the
rows in the detached partition and queries that are planned after the
detach, won't.

A CHECK constraint is created that duplicates the partition constraint.
This is probably not strictly necessary, and some users will prefer to
remove it afterwards, but if the partition is re-attached to a
partitioned table, the constraint needn't be rechecked.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20200803234854.GA24158@alvherre.pgsql
2021-03-25 18:00:28 -03:00
Alvaro Herrera 25936fd46c
Fix use-after-free bug with AfterTriggersTableData.storeslot
AfterTriggerSaveEvent() wrongly allocates the slot in execution-span
memory context, whereas the correct thing is to allocate it in
a transaction-span context, because that's where the enclosing
AfterTriggersTableData instance belongs into.

Backpatch to 12 (the test back to 11, where it works well with no code
changes, and it's good to have to confirm that the case was previously
well supported); this bug seems introduced by commit ff11e7f4b9.

Reported-by: Bertrand Drouvot <bdrouvot@amazon.com>
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/39a71864-b120-5a5c-8cc5-c632b6f16761@amazon.com
2021-02-27 18:09:15 -03:00
Heikki Linnakangas 6214e2b228 Fix permission checks on constraint violation errors on partitions.
If a cross-partition UPDATE violates a constraint on the target partition,
and the columns in the new partition are in different physical order than
in the parent, the error message can reveal columns that the user does not
have SELECT permission on. A similar bug was fixed earlier in commit
804b6b6db4.

The cause of the bug is that the callers of the
ExecBuildSlotValueDescription() function got confused when constructing
the list of modified columns. If the tuple was routed from a parent, we
converted the tuple to the parent's format, but the list of modified
columns was grabbed directly from the child's RTE entry.

ExecUpdateLockMode() had a similar issue. That lead to confusion on which
columns are key columns, leading to wrong tuple lock being taken on tables
referenced by foreign keys, when a row is updated with INSERT ON CONFLICT
UPDATE. A new isolation test is added for that corner case.

With this patch, the ri_RangeTableIndex field is no longer set for
partitions that don't have an entry in the range table. Previously, it was
set to the RTE entry of the parent relation, but that was confusing.

NOTE: This modifies the ResultRelInfo struct, replacing the
ri_PartitionRoot field with ri_RootResultRelInfo. That's a bit risky to
backpatch, because it breaks any extensions accessing the field. The
change that ri_RangeTableIndex is not set for partitions could potentially
break extensions, too. The ResultRelInfos are visible to FDWs at least,
and this patch required small changes to postgres_fdw. Nevertheless, this
seem like the least bad option. I don't think these fields widely used in
extensions; I don't think there are FDWs out there that uses the FDW
"direct update" API, other than postgres_fdw. If there is, you will get a
compilation error, so hopefully it is caught quickly.

Backpatch to 11, where support for both cross-partition UPDATEs, and unique
indexes on partitioned tables, were added.

Reviewed-by: Amit Langote
Security: CVE-2021-3393
2021-02-08 11:01:51 +02:00
Alvaro Herrera 6f5c8a8ec2
Remove bogus restriction from BEFORE UPDATE triggers
In trying to protect the user from inconsistent behavior, commit
487e9861d0 "Enable BEFORE row-level triggers for partitioned tables"
tried to prevent BEFORE UPDATE FOR EACH ROW triggers from moving the row
from one partition to another.  However, it turns out that the
restriction is wrong in two ways: first, it fails spuriously, preventing
valid situations from working, as in bug #16794; and second, they don't
protect from any misbehavior, because tuple routing would cope anyway.

Fix by removing that restriction.

We keep the same restriction on BEFORE INSERT FOR EACH ROW triggers,
though.  It is valid and useful there.  In the future we could remove it
by having tuple reroute work for inserts as it does for updates.

Backpatch to 13.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Phillip Menke <pg@pmenke.de>
Discussion: https://postgr.es/m/16794-350a655580fbb9ae@postgresql.org
2021-01-28 16:56:07 -03:00
Peter Geoghegan 9dc718bdf2 Pass down "logically unchanged index" hint.
Add an executor aminsert() hint mechanism that informs index AMs that
the incoming index tuple (the tuple that accompanies the hint) is not
being inserted by execution of an SQL statement that logically modifies
any of the index's key columns.

The hint is received by indexes when an UPDATE takes place that does not
apply an optimization like heapam's HOT (though only for indexes where
all key columns are logically unchanged).  Any index tuple that receives
the hint on insert is expected to be a duplicate of at least one
existing older version that is needed for the same logical row.  Related
versions will typically be stored on the same index page, at least
within index AMs that apply the hint.

Recognizing the difference between MVCC version churn duplicates and
true logical row duplicates at the index AM level can help with cleanup
of garbage index tuples.  Cleanup can intelligently target tuples that
are likely to be garbage, without wasting too many cycles on less
promising tuples/pages (index pages with little or no version churn).

This is infrastructure for an upcoming commit that will teach nbtree to
perform bottom-up index deletion.  No index AM actually applies the hint
just yet.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=CEKFa74EScx_hFVshCOn6AA5T-ajFASTdzipdkLTNQQ@mail.gmail.com
2021-01-13 08:11:00 -08:00
Bruce Momjian ca3b37487b Update copyright for 2021
Backpatch-through: 9.5
2021-01-02 13:06:25 -05:00
Tom Lane 92bf7e2d02 Provide the OR REPLACE option for CREATE TRIGGER.
This is mostly straightforward.  However, we disallow replacing
constraint triggers or changing the is-constraint property; perhaps
that can be added later, but the complexity versus benefit tradeoff
doesn't look very good.

Also, no special thought is taken here for whether replacing an
existing trigger should result in changes to queued-but-not-fired
trigger actions.  We just document that if you're surprised by the
results, too bad, don't do that.  (Note that any such pending trigger
activity would have to be within the current session.)

Takamichi Osumi, reviewed at various times by Surafel Temesgen,
Peter Smith, and myself

Discussion: https://postgr.es/m/0DDF369B45A1B44B8A687ED43F06557C010BC362@G01JPEXMBYT03
2020-11-14 17:05:34 -05:00
Noah Misch 0c3185e963 In security-restricted operations, block enqueue of at-commit user code.
Specifically, this blocks DECLARE ... WITH HOLD and firing of deferred
triggers within index expressions and materialized view queries.  An
attacker having permission to create non-temp objects in at least one
schema could execute arbitrary SQL functions under the identity of the
bootstrap superuser.  One can work around the vulnerability by disabling
autovacuum and not manually running ANALYZE, CLUSTER, REINDEX, CREATE
INDEX, VACUUM FULL, or REFRESH MATERIALIZED VIEW.  (Don't restore from
pg_dump, since it runs some of those commands.)  Plain VACUUM (without
FULL) is safe, and all commands are fine when a trusted user owns the
target object.  Performance may degrade quickly under this workaround,
however.  Back-patch to 9.5 (all supported versions).

Reviewed by Robert Haas.  Reported by Etienne Stalmans.

Security: CVE-2020-25695
2020-11-09 07:32:09 -08:00
Tom Lane 20d3fe9009 In INSERT/UPDATE, use the table's real tuple descriptor as target.
Previously, ExecInitModifyTable relied on ExecInitJunkFilter,
and thence ExecCleanTypeFromTL, to build the target descriptor from
the query tlist.  While we just checked (in ExecCheckPlanOutput)
that the tlist produces compatible output, this is not a great
substitute for the relation's actual tuple descriptor that's
available from the relcache.  For one thing, dropped columns will
not be correctly marked attisdropped; it's a bit surprising that
we've gotten away with that this long.  But the real reason for
being concerned with this is that using the table's descriptor means
that the slot will have correct attrmissing data, allowing us to
revert the klugy fix of commit ba9f18abd.  (This commit undoes
that one's changes in trigger.c, but keeps the new test case.)
Thus we can solve the bogus-trigger-tuple problem with fewer cycles
rather than more.

No back-patch, since this doesn't fix any additional bug, and it
seems somewhat more likely to have unforeseen side effects than
ba9f18abd's narrow fix.

Discussion: https://postgr.es/m/16644-5da7ef98a7ac4545@postgresql.org
2020-10-26 11:36:53 -04:00
Tom Lane ba9f18abd3 Fix corner case for a BEFORE ROW UPDATE trigger returning OLD.
If the old row has any "missing" attributes that are supposed to
be retrieved from an associated tuple descriptor, the wrong things
happened because the trigger result is shoved directly into an
executor slot that lacks the missing-attribute data.  Notably,
CHECK-constraint verification would incorrectly see those columns
as NULL, and so would RETURNING-list evaluation.

Band-aid around this by forcibly expanding the tuple before passing
it to the trigger function.  (IMO it was a fundamental misdesign to
put the missing-attribute data into tuple constraints, which so
much of the system considers to be optional.  But we're probably
stuck with that now, and will have to continue to apply band-aids
as we find other places with similar issues.)

Back-patch to v12.  v11 would also have the issue, except that
commit 920311ab1 already applied a similar band-aid.  That forced
expansion in more cases than seem really necessary, though, so
this isn't a directly equivalent fix.

Amit Langote, with some cosmetic changes by me

Discussion: https://postgr.es/m/16644-5da7ef98a7ac4545@postgresql.org
2020-10-25 13:57:46 -04:00
Alvaro Herrera bbb927b4db
Fix ALTER TABLE .. ENABLE/DISABLE TRIGGER recursion
More precisely, correctly handle the ONLY flag indicating not to
recurse.  This was implemented in 86f575948c by recursing in
trigger.c, but that's the wrong place; use ATSimpleRecursion instead,
which behaves properly.  However, because legacy inheritance has never
recursed in that situation, make sure to do that only for new-style
partitioning.

I noticed this problem while testing a fix for another bug in the
vicinity.

This has been wrong all along, so backpatch to 11.

Discussion: https://postgr.es/m/20201016235925.GA29829@alvherre.pgsql
2020-10-20 19:22:09 -03:00
Heikki Linnakangas 6973533650 Revise child-to-root tuple conversion map management.
Store the tuple conversion map to convert a tuple from a child table's
format to the root format in a new ri_ChildToRootMap field in
ResultRelInfo. It is initialized if transition tuple capture for FOR
STATEMENT triggers or INSERT tuple routing on a partitioned table is
needed. Previously, ModifyTable kept the maps in the per-subplan
ModifyTableState->mt_per_subplan_tupconv_maps array, or when tuple
routing was used, in
ResultRelInfo->ri_Partitioninfo->pi_PartitionToRootMap. The new field
replaces both of those.

Now that the child-to-root tuple conversion map is always available in
ResultRelInfo (when needed), remove the TransitionCaptureState.tcs_map
field. The callers of Exec*Trigger() functions no longer need to set or
save it, which is much less confusing and bug-prone. Also, as a future
optimization, this will allow us to delay creating the map for a given
result relation until the relation is actually processed during
execution.

Author: Amit Langote
Discussion: https://www.postgresql.org/message-id/CA%2BHiwqHtCWLdK-LO%3DNEsvOdHx%2B7yv4mE_zYK0i3BH7dXb-wxog%40mail.gmail.com
2020-10-19 14:42:55 +03:00
Heikki Linnakangas 1375422c78 Create ResultRelInfos later in InitPlan, index them by RT index.
Instead of allocating all the ResultRelInfos upfront in one big array,
allocate them in ExecInitModifyTable(). es_result_relations is now an
array of ResultRelInfo pointers, rather than an array of structs, and it
is indexed by the RT index.

This simplifies things: we get rid of the separate concept of a "result
rel index", and don't need to set it in setrefs.c anymore. This also
allows follow-up optimizations (not included in this commit yet) to skip
initializing ResultRelInfos for target relations that were not needed at
runtime, and removal of the es_result_relation_info pointer.

The EState arrays of regular result rels and root result rels are merged
into one array. Similarly, the resultRelations and rootResultRelations
lists in PlannedStmt are merged into one. It's not actually clear to me
why they were kept separate in the first place, but now that the
es_result_relations array is indexed by RT index, it certainly seems
pointless.

The PlannedStmt->resultRelations list is now only needed for
ExecRelationIsTargetRelation(). One visible effect of this change is that
ExecRelationIsTargetRelation() will now return 'true' also for the
partition root, if a partitioned table is updated. That seems like a good
thing, although the function isn't used in core code, and I don't see any
reason for an FDW to call it on a partition root.

Author: Amit Langote
Discussion: https://www.postgresql.org/message-id/CA%2BHiwqGEmiib8FLiHMhKB%2BCH5dRgHSLc5N5wnvc4kym%2BZYpQEQ%40mail.gmail.com
2020-10-13 12:57:02 +03:00
Tom Lane 5cbfce562f Initial pgindent and pgperltidy run for v13.
Includes some manual cleanup of places that pgindent messed up,
most of which weren't per project style anyway.

Notably, it seems some people didn't absorb the style rules of
commit c9d297751, because there were a bunch of new occurrences
of function calls with a newline just after the left paren, all
with faulty expectations about how the rest of the call would get
indented.
2020-05-14 13:06:50 -04:00
Alvaro Herrera 17cc133f01
Dial back -Wimplicit-fallthrough to level 3
The additional pain from level 4 is excessive for the gain.

Also revert all the source annotation changes to their original
wordings, to avoid back-patching pain.

Discussion: https://postgr.es/m/31166.1589378554@sss.pgh.pa.us
2020-05-13 15:31:14 -04:00
Alvaro Herrera 3e9744465d
Add -Wimplicit-fallthrough to CFLAGS and CXXFLAGS
Use it at level 4, a bit more restrictive than the default level, and
tweak our commanding comments to FALLTHROUGH.

(However, leave zic.c alone, since it's external code; to avoid the
warnings that would appear there, change CFLAGS for that file in the
Makefile.)

Author: Julien Rouhaud <rjuju123@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20200412081825.qyo5vwwco3fv4gdo@nol
Discussion: https://postgr.es/m/flat/E1fDenm-0000C8-IJ@gemulon.postgresql.org
2020-05-12 16:07:30 -04:00
Alvaro Herrera 487e9861d0
Enable BEFORE row-level triggers for partitioned tables
... with the limitation that the tuple must remain in the same
partition.

Reviewed-by: Ashutosh Bapat
Discussion: https://postgr.es/m/20200227165158.GA2071@alvherre.pgsql
2020-03-18 18:58:05 -03:00
Peter Eisentraut 71d60e2aa0 Add tg_updatedcols to TriggerData
This allows a trigger function to determine for an UPDATE trigger
which columns were actually updated.  This allows some optimizations
in generic trigger functions such as lo_manage and
tsvector_update_trigger.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/flat/11c5f156-67a9-0fb5-8200-2a8018eb2e0c@2ndquadrant.com
2020-03-09 09:34:55 +01:00
Peter Eisentraut 8f152b6c50 Code simplification
Initialize TriggerData to 0 for the whole struct together, instead of
each field separately.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/flat/11c5f156-67a9-0fb5-8200-2a8018eb2e0c@2ndquadrant.com
2020-03-09 09:34:55 +01:00
Tom Lane bb03010b9f Remove the "opaque" pseudo-type and associated compatibility hacks.
A long time ago, it was necessary to declare datatype I/O functions,
triggers, and language handler support functions in a very type-unsafe
way involving a single pseudo-type "opaque".  We got rid of those
conventions in 7.3, but there was still support in various places to
automatically convert such functions to the modern declaration style,
to be able to transparently re-load dumps from pre-7.3 servers.
It seems unnecessary to continue to support that anymore, so take out
the hacks; whereupon the "opaque" pseudo-type itself is no longer
needed and can be dropped.

This is part of a group of patches removing various server-side kluges
for transparently upgrading pre-8.0 dump files.  Since we've had few
complaints about dropping pg_dump's support for dumping from pre-8.0
servers (commit 64f3524e2), it seems okay to now remove these kluges.

Discussion: https://postgr.es/m/4110.1583255415@sss.pgh.pa.us
2020-03-05 15:48:56 -05:00
Tom Lane e58a599752 Remove ancient support for upgrading pre-7.3 foreign key constraints.
Before 7.3, foreign key constraints had no explicit catalog
representation, so that what pg_dump produced for them was (usually)
a set of three CREATE CONSTRAINT TRIGGER commands.  Commit a2899ebdc
and some follow-on fixes added an ugly hack in CreateTrigger() to
recognize that pattern and reconstruct the foreign key definition.
However, we've never had any test coverage for that code, so that it's
legitimate to wonder if it still works; and having to maintain it in
the face of upcoming trigger-related patches seems rather pointless.
Let's decree that its time has passed, and drop it.

This is part of a group of patches removing various server-side kluges
for transparently upgrading pre-8.0 dump files.  Since we've had few
complaints about dropping pg_dump's support for dumping from pre-8.0
servers (commit 64f3524e2), it seems okay to now remove these kluges.

Daniel Gustafsson

Discussion: https://postgr.es/m/805874E2-999C-4CDA-856F-1AFBD9DFE933@yesql.se
2020-03-05 15:25:45 -05:00
Alvaro Herrera b9b408c487
Record parents of triggers
This let us get rid of a recently introduced ugly hack (commit
1fa846f1c9).

Author: Álvaro Herrera
Reviewed-by: Amit Langote, Tom Lane
Discussion: https://postgr.es/m/20200217215641.GA29784@alvherre.pgsql
2020-02-27 13:23:33 -03:00
Tom Lane 5815696bc6 Make parser rely more heavily on the ParseNamespaceItem data structure.
When I added the ParseNamespaceItem data structure (in commit 5ebaaa494),
it wasn't very tightly integrated into the parser's APIs.  In the wake of
adding p_rtindex to that struct (commit b541e9acc), there is a good reason
to make more use of it: by passing around ParseNamespaceItem pointers
instead of bare RTE pointers, we can get rid of various messy methods for
passing back or deducing the rangetable index of an RTE during parsing.
Hence, refactor the addRangeTableEntryXXX functions to build and return
a ParseNamespaceItem struct, not just the RTE proper; and replace
addRTEtoQuery with addNSItemToQuery, which is passed a ParseNamespaceItem
rather than building one internally.

Also, add per-column data (a ParseNamespaceColumn array) to each
ParseNamespaceItem.  These arrays are built during addRangeTableEntryXXX,
where we have column type data at hand so that it's nearly free to fill
the data structure.  Later, when we need to build Vars referencing RTEs,
we can use the ParseNamespaceColumn info to avoid the rather expensive
operations done in get_rte_attribute_type() or expandRTE().
get_rte_attribute_type() is indeed dead code now, so I've removed it.
This makes for a useful improvement in parse analysis speed, around 20%
in one moderately-complex test query.

The ParseNamespaceColumn structs also include Var identity information
(varno/varattno).  That info isn't actually being used in this patch,
except that p_varno == 0 is a handy test for a dropped column.
A follow-on patch will make more use of it.

Discussion: https://postgr.es/m/2461.1577764221@sss.pgh.pa.us
2020-01-02 11:29:01 -05:00
Bruce Momjian 7559d8ebfa Update copyrights for 2020
Backpatch-through: update all files in master, backpatch legal files through 9.4
2020-01-01 12:21:45 -05:00
Michael Paquier 7854e07f25 Revert "Rename files and headers related to index AM"
This follows multiple complains from Peter Geoghegan, Andres Freund and
Alvaro Herrera that this issue ought to be dug more before actually
happening, if it happens.

Discussion: https://postgr.es/m/20191226144606.GA5659@alvherre.pgsql
2019-12-27 08:09:00 +09:00
Tom Lane bb4114a4e2 Allow whole-row Vars to be used in partitioning expressions.
In the wake of commit 5b9312378, there's no particular reason
for this restriction (previously, it was problematic because of
the implied rowtype reference).  A simple constraint on a whole-row
Var probably isn't that useful, but conceivably somebody would want
to pass one to a function that extracts a partitioning key.  Besides
which, we're expending much more code to enforce the restriction than
we save by having it, since the latter quantity is now zero.
So drop the restriction.

Amit Langote

Discussion: https://postgr.es/m/CA+HiwqFUzjfj9HEsJtYWcr1SgQ_=iCAvQ=O2Sx6aQxoDu4OiHw@mail.gmail.com
2019-12-25 15:44:15 -05:00
Michael Paquier 8ce3aa9b59 Rename files and headers related to index AM
The following renaming is done so as source files related to index
access methods are more consistent with table access methods (the
original names used for index AMs ware too generic, and could be
confused as including features related to table AMs):
- amapi.h -> indexam.h.
- amapi.c -> indexamapi.c.  Here we have an equivalent with
backend/access/table/tableamapi.c.
- amvalidate.c -> indexamvalidate.c.
- amvalidate.h -> indexamvalidate.h.
- genam.c -> indexgenam.c.
- genam.h -> indexgenam.h.

This has been discussed during the development of v12 when table AM was
worked on, but the renaming never happened.

Author: Michael Paquier
Reviewed-by: Fabien Coelho, Julien Rouhaud
Discussion: https://postgr.es/m/20191223053434.GF34339@paquier.xyz
2019-12-25 10:23:39 +09:00
Robert Haas 7cdcc747a9 Update neglected comment.
Commit d986d4e87f renamed a variable
but neglected to update the corresponding comment.

Amit Langote
2019-12-19 09:24:44 -05:00
Etsuro Fujita 5a20b0219e Fix handling of multiple AFTER ROW triggers on a foreign table.
AfterTriggerExecute() retrieves a fresh tuple or pair of tuples from a
tuplestore and then stores the tuple(s) in the passed-in slot(s) if
AFTER_TRIGGER_FDW_FETCH, while it uses the most-recently-retrieved
tuple(s) stored in the slot(s) if AFTER_TRIGGER_FDW_REUSE.  This was
done correctly before 12, but commit ff11e7f4b broke it by mistakenly
clearing the tuple(s) stored in the slot(s) in that function, leading to
an assertion failure as reported in bug #16139 from Alexander Lakhin.

Also, fix some other issues with the aforementioned commit in passing:

* For tg_newslot, which is a slot added to the TriggerData struct by the
  commit to store new updated tuples, it didn't ensure the slot was NULL
  if there was no such tuple.
* The commit failed to update the documentation about the trigger
  interface.

Author: Etsuro Fujita
Backpatch-through: 12
Discussion: https://postgr.es/m/16139-94f9ccf0db6119ec%40postgresql.org
2019-12-10 18:00:30 +09:00
Tom Lane 0cafdd03a8 Fix silly initializations (cosmetic only).
Initializing a pointer to "false" isn't per project style,
and reportedly some compilers warn about it (though I've
not seen any such warnings in the buildfarm).

Seems to have come in with commit ff11e7f4b, so back-patch
to v12 where that was added.

Didier Gautheron

Discussion: https://postgr.es/m/CAJRYxu+XQuM0qnSqt1Ujztu6fBPzMMAT3VEn6W32rgKG6A2Fsw@mail.gmail.com
2019-11-13 15:26:54 -05:00
Alvaro Herrera dcb7d3cafa Have LookupFuncName accept NULL argtypes for 0 args
Prior to this change, it requires to be passed a valid pointer just to
be able to pass it to a zero-byte memcmp, per 0a52d378b0.  Given the
strange resulting code in callsites, it seems better to test for the
case specifically and remove the requirement.

Reported-by: Ranier Vilela
Discussion: https://postgr.es/m/MN2PR18MB2927F24692485D754794F01BE3740@MN2PR18MB2927.namprd18.prod.outlook.com
Discussion: https://postgr.es/m/MN2PR18MB2927F6873DF2774A505AC298E3740@MN2PR18MB2927.namprd18.prod.outlook.com
2019-11-12 17:06:58 -03:00
Alvaro Herrera b4bcc6bfdf Fix SET CONSTRAINTS .. DEFERRED on partitioned tables
SET CONSTRAINTS ... DEFERRED failed on partitioned tables, because of a
sanity check that ensures that the affected constraints have triggers.
On partitioned tables, the triggers are in the leaf partitions, not in
the partitioned relations themselves, so the sanity check fails.
Removing the sanity check solves the problem, because the code needed to
support the case is already there.

Backpatch to 11.

Note: deferred unique constraints are not affected by this bug, because
they do have triggers in the parent partitioned table.  I did not add a
test for this scenario.

Discussion: https://postgr.es/m/20191105212915.GA11324@alvherre.pgsql
2019-11-07 13:59:24 -03:00
Peter Eisentraut 604bd36711 PG_FINALLY
This gives an alternative way of catching exceptions, for the common
case where the cleanup code is the same in the error and non-error
cases.  So instead of

    PG_TRY();
    {
        ... code that might throw ereport(ERROR) ...
    }
    PG_CATCH();
    {
        cleanup();
	PG_RE_THROW();
    }
    PG_END_TRY();
    cleanup();

one can write

    PG_TRY();
    {
        ... code that might throw ereport(ERROR) ...
    }
    PG_FINALLY();
    {
        cleanup();
    }
    PG_END_TRY();

Discussion: https://www.postgresql.org/message-id/flat/95a822c3-728b-af0e-d7e5-71890507ae0c%402ndquadrant.com
2019-11-01 11:18:03 +01:00
Andres Freund d986d4e87f Fix crash caused by EPQ happening with a before update trigger present.
When ExecBRUpdateTriggers()'s GetTupleForTrigger() follows an EPQ
chain the former needs to run the result tuple through the junkfilter
again, and update the slot containing the new version of the tuple to
contain that new version. The input tuple may already be in the
junkfilter's output slot, which used to be OK - we don't need the
previous version anymore. Unfortunately ff11e7f4b9 started to use
ExecCopySlot() to update newslot, and ExecCopySlot() doesn't support
copying a slot into itself, leading to a slot in a corrupt
state, which then can cause crashes or other symptoms.

Fix this by skipping the ExecCopySlot() when copying into itself.

While we could have easily made ExecCopySlot() handle that case, it
seems better to add an assert forbidding doing so instead. As the goal
of copying might be to make the contents of one slot independent from
another, it seems failure prone to handle doing so silently.

A follow-up commit will add tests for the obviously under-covered
combination of EPQ and triggers. Done as a separate commit as it might
make sense to backpatch them further than this bug.

Also remove confusion with confusing variable names for slots in
ExecBRDeleteTriggers() and ExecBRUpdateTriggers().

Bug: #16036
Reported-By: Антон Власов
Author: Andres Freund
Discussion: https://postgr.es/m/16036-28184c90d952fb7f@postgresql.org
Backpatch: 12-, where ff11e7f4b9 was merged
2019-10-04 13:50:49 -07:00
Andres Freund 27cc7cd2bc Reorder EPQ work, to fix rowmark related bugs and improve efficiency.
In ad0bda5d24 I changed the EvalPlanQual machinery to store
substitution tuples in slot, instead of using plain HeapTuples. The
main motivation for that was that using HeapTuples will be inefficient
for future tableams.  But it turns out that that conversion was buggy
for non-locking rowmarks - the wrong tuple descriptor was used to
create the slot.

As a secondary issue 5db6df0c0 changed ExecLockRows() to begin EPQ
earlier, to allow to fetch the locked rows directly into the EPQ
slots, instead of having to copy tuples around. Unfortunately, as Tom
complained, that forces some expensive initialization to happen
earlier.

As a third issue, the test coverage for EPQ was clearly insufficient.

Fixing the first issue is unfortunately not trivial: Non-locked row
marks were fetched at the start of EPQ, and we don't have the type
information for the rowmarks available at that point. While we could
change that, it's not easy. It might be worthwhile to change that at
some point, but to fix this bug, it seems better to delay fetching
non-locking rowmarks when they're actually needed, rather than
eagerly. They're referenced at most once, and in cases where EPQ
fails, might never be referenced. Fetching them when needed also
increases locality a bit.

To be able to fetch rowmarks during execution, rather than
initialization, we need to be able to access the active EPQState, as
that contains necessary data. To do so move EPQ related data from
EState to EPQState, and, only for EStates creates as part of EPQ,
reference the associated EPQState from EState.

To fix the second issue, change EPQ initialization to allow use of
EvalPlanQualSlot() to be used before EvalPlanQualBegin() (but
obviously still requiring EvalPlanQualInit() to have been done).

As these changes made struct EState harder to understand, e.g. by
adding multiple EStates, significantly reorder the members, and add a
lot more comments.

Also add a few more EPQ tests, including one that fails for the first
issue above. More is needed.

Reported-By: yi huang
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion:
    https://postgr.es/m/CAHU7rYZo_C4ULsAx_LAj8az9zqgrD8WDd4hTegDTMM1LMqrBsg@mail.gmail.com
    https://postgr.es/m/24530.1562686693@sss.pgh.pa.us
Backpatch: 12-, where the EPQ changes were introduced
2019-09-09 05:14:11 -07:00
Michael Paquier 23bccc823d Fix inconsistencies and typos in the tree
This is numbered take 7, and addresses a set of issues with code
comments, variable names and unreferenced variables.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/dff75442-2468-f74f-568c-6006e141062f@gmail.com
2019-07-22 10:01:50 +09:00
Alvaro Herrera 2c84ea6cf9 Propagate trigger arguments to partitions
We were creating the cloned triggers with an empty list of arguments,
losing the ones that had been specified by the user when creating the
trigger in the partitioned table.  Repair.

This was forgotten in commit 86f575948c.

Author: Patrick McHardy
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/20190709130027.amr2cavjvo7rdvac@access1.trash.net
Discussion: https://postgr.es/m/15752-123bc90287986de4@postgresql.org
2019-07-09 17:16:36 -04:00
Andres Freund 73b8c3bd28 tableam: Rename wrapper functions to match callback names.
Some of the wrapper functions didn't match the callback names. Many of
them due to staying "consistent" with historic naming of the wrapped
functionality. We decided that for most cases it's more important to
be for tableam to be consistent going forward, than with the past.

The one exception is beginscan/endscan/...  because it'd have looked
odd to have systable_beginscan/endscan/... with a different naming
scheme, and changing the systable_* APIs would have caused way too
much churn (including breaking a lot of external users).

Author: Ashwin Agrawal, with some small additions by Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CALfoeiugyrXZfX7n0ORCa4L-m834dzmaE8eFdbNR6PMpetU4Ww@mail.gmail.com
2019-05-23 16:32:36 -07:00
Tom Lane 8255c7a5ee Phase 2 pgindent run for v12.
Switch to 2.1 version of pg_bsd_indent.  This formats
multiline function declarations "correctly", that is with
additional lines of parameter declarations indented to match
where the first line's left parenthesis is.

Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
2019-05-22 13:04:48 -04:00
Tom Lane be76af171c Initial pgindent run for v12.
This is still using the 2.0 version of pg_bsd_indent.
I thought it would be good to commit this separately,
so as to document the differences between 2.0 and 2.1 behavior.

Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
2019-05-22 12:55:34 -04:00
Andres Freund 88e6ad3054 Fix two memory leaks around force-storing tuples in slots.
As reported by Tom, when ExecStoreMinimalTuple() had to perform a
conversion to store the minimal tuple in the slot, it forgot to
respect the shouldFree flag, and leaked the tuple into the current
memory context if true.  Fix that by freeing the tuple in that case.

Looking at the relevant code made me (Andres) realize that not having
the shouldFree parameter to ExecForceStoreHeapTuple() was a bad
idea. Some callers had to locally implement the necessary logic, and
in one case it was missing, creating a potential per-group leak in
non-hashed aggregation.

The choice to not free the tuple in ExecComputeStoredGenerated() is
not pretty, but not introduced by this commit - I'll start a separate
discussion about it.

Reported-By: Tom Lane
Discussion: https://postgr.es/m/366.1555382816@sss.pgh.pa.us
2019-04-19 11:39:56 -07:00
Andres Freund 75e03eabea Fix potential use-after-free for BEFORE UPDATE row triggers on non-core AMs.
When such a trigger returns the old row version, it naturally get
stored in the slot for the trigger result. When a table AMs doesn't
store HeapTuples internally, ExecBRUpdateTriggers() frees the old row
version passed to triggers - but before this fix it might still be
referenced by the slot holding the new tuple.

Noticed when running the out-of-core zheap AM against the in-core
version of tableam.

Author: Andres Freund
2019-04-18 17:53:54 -07:00
Peter Eisentraut fc22b6623b Generated columns
This is an SQL-standard feature that allows creating columns that are
computed from expressions rather than assigned, similar to a view or
materialized view but on a column basis.

This implements one kind of generated column: stored (computed on
write).  Another kind, virtual (computed on read), is planned for the
future, and some room is left for it.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/b151f851-4019-bdb1-699e-ebab07d2f40a@2ndquadrant.com
2019-03-30 08:15:57 +01:00
Andres Freund 9a8ee1dc65 tableam: Add and use table_fetch_row_version().
This is essentially the tableam version of heapam_fetch(),
i.e. fetching a tuple identified by a tid, performing visibility
checks.

Note that this different from table_index_fetch_tuple(), which is for
index lookups. It therefore has to handle a tid pointing to an earlier
version of a tuple if the AM uses an optimization like heap's HOT. Add
comments to that end.

This commit removes the stats_relation argument from heap_fetch, as
it's been unused for a long time.

Author: Andres Freund
Reviewed-By: Haribabu Kommi
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-25 00:17:59 -07:00
Andres Freund 5db6df0c01 tableam: Add tuple_{insert, delete, update, lock} and use.
This adds new, required, table AM callbacks for insert/delete/update
and lock_tuple. To be able to reasonably use those, the EvalPlanQual
mechanism had to be adapted, moving more logic into the AM.

Previously both delete/update/lock call-sites and the EPQ mechanism had
to have awareness of the specific tuple format to be able to fetch the
latest version of a tuple. Obviously that needs to be abstracted
away. To do so, move the logic that find the latest row version into
the AM. lock_tuple has a new flag argument,
TUPLE_LOCK_FLAG_FIND_LAST_VERSION, that forces it to lock the last
version, rather than the current one.  It'd have been possible to do
so via a separate callback as well, but finding the last version
usually also necessitates locking the newest version, making it
sensible to combine the two. This replaces the previous use of
EvalPlanQualFetch().  Additionally HeapTupleUpdated, which previously
signaled either a concurrent update or delete, is now split into two,
to avoid callers needing AM specific knowledge to differentiate.

The move of finding the latest row version into tuple_lock means that
encountering a row concurrently moved into another partition will now
raise an error about "tuple to be locked" rather than "tuple to be
updated/deleted" - which is accurate, as that always happens when
locking rows. While possible slightly less helpful for users, it seems
like an acceptable trade-off.

As part of this commit HTSU_Result has been renamed to TM_Result, and
its members been expanded to differentiated between updating and
deleting. HeapUpdateFailureData has been renamed to TM_FailureData.

The interface to speculative insertion is changed so nodeModifyTable.c
does not have to set the speculative token itself anymore. Instead
there's a version of tuple_insert, tuple_insert_speculative, that
performs the speculative insertion (without requiring a flag to signal
that fact), and the speculative insertion is either made permanent
with table_complete_speculative(succeeded = true) or aborted with
succeeded = false).

Note that multi_insert is not yet routed through tableam, nor is
COPY. Changing multi_insert requires changes to copy.c that are large
enough to better be done separately.

Similarly, although simpler, CREATE TABLE AS and CREATE MATERIALIZED
VIEW are also only going to be adjusted in a later commit.

Author: Andres Freund and Haribabu Kommi
Discussion:
    https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
    https://postgr.es/m/20190313003903.nwvrxi7rw3ywhdel@alap3.anarazel.de
    https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-23 19:55:57 -07:00
Andres Freund 5963b29e03 Initialize variable to silence compiler warning.
After ff11e7f4b9 Tom's compiler warns about accessing a potentially
uninitialized rInfo. That's not actually possible, but it's
understandable the compiler would get this wrong. NULL initialize too.

Reported-By: Tom Lane
Discussion: https://postgr.es/m/11199.1551285318@sss.pgh.pa.us
2019-02-27 09:14:34 -08:00
Andres Freund ff11e7f4b9 Use slots in trigger infrastructure, except for the actual invocation.
In preparation for abstracting table storage, convert trigger.c to
track tuples in slots. Which also happens to make code calling
triggers simpler.

As the calling interface for triggers themselves is not changed in
this patch, HeapTuples still are extracted from the slot at that
time. But that's handled solely inside trigger.c, not visible to
callers. It's quite likely that we'll want to revise the external
trigger interface, but that's a separate large project.

As part of this work the slots used for old/new/return tuples are
moved from EState into ResultRelInfo, as different updated tables
might need different slots. The slots are now also now created
on-demand, which is good both from an efficiency POV, but also makes
the modifying code simpler.

Author: Andres Freund, Amit Khandekar and Ashutosh Bapat
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-02-26 20:31:38 -08:00
Robert Haas 1bb5e78218 Move code for managing PartitionDescs into a new file, partdesc.c
This is similar in spirit to the existing partbounds.c file in the
same directory, except that there's a lot less code in the new file
created by this commit.  Pending work in this area proposes to add a
bunch more code related to PartitionDescs, though, and this will give
us a good place to put it.

Discussion: http://postgr.es/m/CA+TgmoZUwPf_uanjF==gTGBMJrn8uCq52XYvAEorNkLrUdoawg@mail.gmail.com
2019-02-21 11:45:02 -05:00
Tom Lane 1d92a0c9f7 Redesign the partition dependency mechanism.
The original setup for dependencies of partitioned objects had
serious problems:

1. It did not verify that a drop cascading to a partition-child object
also cascaded to at least one of the object's partition parents.  Now,
normally a child object would share all its dependencies with one or
another parent (e.g. a child index's opclass dependencies would be shared
with the parent index), so that this oversight is usually harmless.
But if some dependency failed to fit this pattern, the child could be
dropped while all its parents remain, creating a logically broken
situation.  (It's easy to construct artificial cases that break it,
such as attaching an unrelated extension dependency to the child object
and then dropping the extension.  I'm not sure if any less-artificial
cases exist.)

2. Management of partition dependencies during ATTACH/DETACH PARTITION
was complicated and buggy; for example, after detaching a partition
table it was possible to create cases where a formerly-child index
should be dropped and was not, because the correct set of dependencies
had not been reconstructed.

Less seriously, because multiple partition relationships were
represented identically in pg_depend, there was an order-of-traversal
dependency on which partition parent was cited in error messages.
We also had some pre-existing order-of-traversal hazards for error
messages related to internal and extension dependencies.  This is
cosmetic to users but causes testing problems.

To fix #1, add a check at the end of the partition tree traversal
to ensure that at least one partition parent got deleted.  To fix #2,
establish a new policy that partition dependencies are in addition to,
not instead of, a child object's usual dependencies; in this way
ATTACH/DETACH PARTITION need not cope with adding or removing the
usual dependencies.

To fix the cosmetic problem, distinguish between primary and secondary
partition dependency entries in pg_depend, by giving them different
deptypes.  (They behave identically except for having different
priorities for being cited in error messages.)  This means that the
former 'I' dependency type is replaced with new 'P' and 'S' types.

This also fixes a longstanding bug that after handling an internal
dependency by recursing to the owning object, findDependentObjects
did not verify that the current target was now scheduled for deletion,
and did not apply the current recursion level's objflags to it.
Perhaps that should be back-patched; but in the back branches it
would only matter if some concurrent transaction had removed the
internal-linkage pg_depend entry before the recursive call found it,
or the recursive call somehow failed to find it, both of which seem
unlikely.

Catversion bump because the contents of pg_depend change for
partitioning relationships.

Patch HEAD only.  It's annoying that we're not fixing #2 in v11,
but there seems no practical way to do so given that the problem
is exactly a poor choice of what entries to put in pg_depend.
We can't really fix that while staying compatible with what's
in pg_depend in existing v11 installations.

Discussion: https://postgr.es/m/CAH2-Wzkypv1R+teZrr71U23J578NnTBt2X8+Y=Odr4pOdW1rXg@mail.gmail.com
2019-02-11 14:41:17 -05:00
Andres Freund 171e0418b0 Fix heap_getattr() handling of fast defaults.
Previously heap_getattr() returned NULL for attributes with a fast
default value (c.f. 16828d5c02), as it had no handling whatsoever
for that case.

A previous fix, 7636e5c60f, attempted to fix issues caused by this
oversight, but just expanding OLD tuples for triggers doesn't actually
solve the underlying issue.

One known consequence of this bug is that the check for HOT updates
can return the wrong result, when a previously fast-default'ed column
is set to NULL. Which in turn means that an index over a column with
fast default'ed columns might be corrupt if the underlying column(s)
allow NULLs.

Fix by handling fast default columns in heap_getattr(), remove now
superfluous expansion in GetTupleForTrigger().

Author: Andres Freund
Discussion: https://postgr.es/m/20190201162404.onngi77f26baem4g@alap3.anarazel.de
Backpatch: 11, where fast defaults were introduced
2019-02-06 01:09:32 -08:00