postgresql

Commit Graph

Author	SHA1	Message	Date
Andres Freund	88e6ad3054	Fix two memory leaks around force-storing tuples in slots. As reported by Tom, when ExecStoreMinimalTuple() had to perform a conversion to store the minimal tuple in the slot, it forgot to respect the shouldFree flag, and leaked the tuple into the current memory context if true. Fix that by freeing the tuple in that case. Looking at the relevant code made me (Andres) realize that not having the shouldFree parameter to ExecForceStoreHeapTuple() was a bad idea. Some callers had to locally implement the necessary logic, and in one case it was missing, creating a potential per-group leak in non-hashed aggregation. The choice to not free the tuple in ExecComputeStoredGenerated() is not pretty, but not introduced by this commit - I'll start a separate discussion about it. Reported-By: Tom Lane Discussion: https://postgr.es/m/366.1555382816@sss.pgh.pa.us	2019-04-19 11:39:56 -07:00
Andres Freund	75e03eabea	Fix potential use-after-free for BEFORE UPDATE row triggers on non-core AMs. When such a trigger returns the old row version, it naturally gets stored in the slot for the trigger result. When a table AM doesn't store HeapTuples internally, ExecBRUpdateTriggers() frees the old row version passed to triggers - but before this fix it might still be referenced by the slot holding the new tuple. Noticed when running the out-of-core zheap AM against the in-core version of tableam. Author: Andres Freund	2019-04-18 17:53:54 -07:00
Peter Eisentraut	fc22b6623b	Generated columns This is an SQL-standard feature that allows creating columns that are computed from expressions rather than assigned, similar to a view or materialized view but on a column basis. This implements one kind of generated column: stored (computed on write). Another kind, virtual (computed on read), is planned for the future, and some room is left for it. Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/b151f851-4019-bdb1-699e-ebab07d2f40a@2ndquadrant.com	2019-03-30 08:15:57 +01:00
Andres Freund	9a8ee1dc65	tableam: Add and use table_fetch_row_version(). This is essentially the tableam version of heapam_fetch(), i.e. fetching a tuple identified by a tid, performing visibility checks. Note that this is different from table_index_fetch_tuple(), which is for index lookups. It therefore has to handle a tid pointing to an earlier version of a tuple if the AM uses an optimization like heap's HOT. Add comments to that end. This commit removes the stats_relation argument from heap_fetch, as it's been unused for a long time. Author: Andres Freund Reviewed-By: Haribabu Kommi Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de	2019-03-25 00:17:59 -07:00
Andres Freund	5db6df0c01	tableam: Add tuple_{insert, delete, update, lock} and use. This adds new, required, table AM callbacks for insert/delete/update and lock_tuple. To be able to reasonably use those, the EvalPlanQual mechanism had to be adapted, moving more logic into the AM. Previously both delete/update/lock call-sites and the EPQ mechanism had to have awareness of the specific tuple format to be able to fetch the latest version of a tuple. Obviously that needs to be abstracted away. To do so, move the logic that finds the latest row version into the AM. lock_tuple has a new flag argument, TUPLE_LOCK_FLAG_FIND_LAST_VERSION, that forces it to lock the last version, rather than the current one. It'd have been possible to do so via a separate callback as well, but finding the last version usually also necessitates locking the newest version, making it sensible to combine the two. This replaces the previous use of EvalPlanQualFetch(). Additionally HeapTupleUpdated, which previously signaled either a concurrent update or delete, is now split into two, to avoid callers needing AM specific knowledge to differentiate. The move of finding the latest row version into tuple_lock means that encountering a row concurrently moved into another partition will now raise an error about "tuple to be locked" rather than "tuple to be updated/deleted" - which is accurate, as that always happens when locking rows. While possibly slightly less helpful for users, it seems like an acceptable trade-off. As part of this commit HTSU_Result has been renamed to TM_Result, and its members have been expanded to differentiate between updating and deleting. HeapUpdateFailureData has been renamed to TM_FailureData. The interface to speculative insertion is changed so nodeModifyTable.c does not have to set the speculative token itself anymore. Instead there's a version of tuple_insert, tuple_insert_speculative, that performs the speculative insertion (without requiring a flag to signal that fact), and the speculative insertion is either made permanent with table_complete_speculative(succeeded = true) or aborted with succeeded = false. Note that multi_insert is not yet routed through tableam, nor is COPY. Changing multi_insert requires changes to copy.c that are large enough to better be done separately. Similarly, although simpler, CREATE TABLE AS and CREATE MATERIALIZED VIEW are also only going to be adjusted in a later commit. Author: Andres Freund and Haribabu Kommi Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de https://postgr.es/m/20190313003903.nwvrxi7rw3ywhdel@alap3.anarazel.de https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql	2019-03-23 19:55:57 -07:00
Andres Freund	5963b29e03	Initialize variable to silence compiler warning. After `ff11e7f4b9` Tom's compiler warns about accessing a potentially uninitialized rInfo. That's not actually possible, but it's understandable the compiler would get this wrong. NULL initialize too. Reported-By: Tom Lane Discussion: https://postgr.es/m/11199.1551285318@sss.pgh.pa.us	2019-02-27 09:14:34 -08:00
Andres Freund	ff11e7f4b9	Use slots in trigger infrastructure, except for the actual invocation. In preparation for abstracting table storage, convert trigger.c to track tuples in slots. Which also happens to make code calling triggers simpler. As the calling interface for triggers themselves is not changed in this patch, HeapTuples still are extracted from the slot at that time. But that's handled solely inside trigger.c, not visible to callers. It's quite likely that we'll want to revise the external trigger interface, but that's a separate large project. As part of this work the slots used for old/new/return tuples are moved from EState into ResultRelInfo, as different updated tables might need different slots. The slots are now also created on-demand, which is good from an efficiency POV and also makes the modifying code simpler. Author: Andres Freund, Amit Khandekar and Ashutosh Bapat Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de	2019-02-26 20:31:38 -08:00
Robert Haas	1bb5e78218	Move code for managing PartitionDescs into a new file, partdesc.c This is similar in spirit to the existing partbounds.c file in the same directory, except that there's a lot less code in the new file created by this commit. Pending work in this area proposes to add a bunch more code related to PartitionDescs, though, and this will give us a good place to put it. Discussion: http://postgr.es/m/CA+TgmoZUwPf_uanjF==gTGBMJrn8uCq52XYvAEorNkLrUdoawg@mail.gmail.com	2019-02-21 11:45:02 -05:00
Tom Lane	1d92a0c9f7	Redesign the partition dependency mechanism. The original setup for dependencies of partitioned objects had serious problems: 1. It did not verify that a drop cascading to a partition-child object also cascaded to at least one of the object's partition parents. Now, normally a child object would share all its dependencies with one or another parent (e.g. a child index's opclass dependencies would be shared with the parent index), so that this oversight is usually harmless. But if some dependency failed to fit this pattern, the child could be dropped while all its parents remain, creating a logically broken situation. (It's easy to construct artificial cases that break it, such as attaching an unrelated extension dependency to the child object and then dropping the extension. I'm not sure if any less-artificial cases exist.) 2. Management of partition dependencies during ATTACH/DETACH PARTITION was complicated and buggy; for example, after detaching a partition table it was possible to create cases where a formerly-child index should be dropped and was not, because the correct set of dependencies had not been reconstructed. Less seriously, because multiple partition relationships were represented identically in pg_depend, there was an order-of-traversal dependency on which partition parent was cited in error messages. We also had some pre-existing order-of-traversal hazards for error messages related to internal and extension dependencies. This is cosmetic to users but causes testing problems. To fix #1, add a check at the end of the partition tree traversal to ensure that at least one partition parent got deleted. To fix #2, establish a new policy that partition dependencies are in addition to, not instead of, a child object's usual dependencies; in this way ATTACH/DETACH PARTITION need not cope with adding or removing the usual dependencies. To fix the cosmetic problem, distinguish between primary and secondary partition dependency entries in pg_depend, by giving them different deptypes. (They behave identically except for having different priorities for being cited in error messages.) This means that the former 'I' dependency type is replaced with new 'P' and 'S' types. This also fixes a longstanding bug that after handling an internal dependency by recursing to the owning object, findDependentObjects did not verify that the current target was now scheduled for deletion, and did not apply the current recursion level's objflags to it. Perhaps that should be back-patched; but in the back branches it would only matter if some concurrent transaction had removed the internal-linkage pg_depend entry before the recursive call found it, or the recursive call somehow failed to find it, both of which seem unlikely. Catversion bump because the contents of pg_depend change for partitioning relationships. Patch HEAD only. It's annoying that we're not fixing #2 in v11, but there seems no practical way to do so given that the problem is exactly a poor choice of what entries to put in pg_depend. We can't really fix that while staying compatible with what's in pg_depend in existing v11 installations. Discussion: https://postgr.es/m/CAH2-Wzkypv1R+teZrr71U23J578NnTBt2X8+Y=Odr4pOdW1rXg@mail.gmail.com	2019-02-11 14:41:17 -05:00
Andres Freund	171e0418b0	Fix heap_getattr() handling of fast defaults. Previously heap_getattr() returned NULL for attributes with a fast default value (c.f. `16828d5c02`), as it had no handling whatsoever for that case. A previous fix, `7636e5c60f`, attempted to fix issues caused by this oversight, but just expanding OLD tuples for triggers doesn't actually solve the underlying issue. One known consequence of this bug is that the check for HOT updates can return the wrong result, when a previously fast-default'ed column is set to NULL. Which in turn means that an index over a column with fast default'ed columns might be corrupt if the underlying column(s) allow NULLs. Fix by handling fast default columns in heap_getattr(), remove now superfluous expansion in GetTupleForTrigger(). Author: Andres Freund Discussion: https://postgr.es/m/20190201162404.onngi77f26baem4g@alap3.anarazel.de Backpatch: 11, where fast defaults were introduced	2019-02-06 01:09:32 -08:00
Tom Lane	f09346a9c6	Refactor planner's header files. Create a new header optimizer/optimizer.h, which exposes just the planner functions that can be used "at arm's length", without need to access Paths or the other planner-internal data structures defined in nodes/relation.h. This is intended to provide the whole planner API seen by most of the rest of the system; although FDWs still need to use additional stuff, and more thought is also needed about just what selfuncs.c should rely on. The main point of doing this now is to limit the amount of new #include baggage that will be needed by "planner support functions", which I expect to introduce later, and which will be in relevant datatype modules rather than anywhere near the planner. This commit just moves relevant declarations into optimizer.h from other header files (a couple of which go away because everything got moved), and adjusts #include lists to match. There's further cleanup that could be done if we want to decide that some stuff being exposed by optimizer.h doesn't belong in the planner at all, but I'll leave that for another day. Discussion: https://postgr.es/m/11460.1548706639@sss.pgh.pa.us	2019-01-29 15:48:51 -05:00
Andres Freund	a9c35cf85c	Change function call information to be variable length. Before this change FunctionCallInfoData, the struct arguments etc for V1 function calls are stored in, always had space for FUNC_MAX_ARGS/100 arguments, storing datums and their nullness in two arrays. For nearly every function call 100 arguments is far more than needed, therefore wasting memory. Arg and argnull being two separate arrays also guarantees that to access a single argument, two cachelines have to be touched. Change the layout so there's a single variable-length array with pairs of value / isnull. That drastically reduces memory consumption for most function calls (on x86-64 a two argument function now uses 64bytes, previously 936 bytes), and makes it very likely that argument value and its nullness are on the same cacheline. Arguments are stored in a new NullableDatum struct, which, due to padding, needs more memory per argument than before. But as usually far fewer arguments are stored, and individual arguments are cheaper to access, that's still a clear win. It's likely that there's other places where conversion to NullableDatum arrays would make sense, e.g. TupleTableSlots, but that's for another commit. Because the function call information is now variable-length allocations have to take the number of arguments into account. For heap allocations that can be done with SizeForFunctionCallInfoData(), for on-stack allocations there's a new LOCAL_FCINFO(name, nargs) macro that helps to allocate an appropriately sized and aligned variable. Some places with stack allocation function call information don't know the number of arguments at compile time, and currently variably sized stack allocations aren't allowed in postgres. Therefore allow for FUNC_MAX_ARGS space in these cases. They're not that common, so for now that seems acceptable. Because of the need to allocate FunctionCallInfo of the appropriate size, older extensions may need to update their code. To avoid subtle breakages, the FunctionCallInfoData struct has been renamed to FunctionCallInfoBaseData. Most code only references FunctionCallInfo, so that shouldn't cause much collateral damage. This change is also a prerequisite for more efficient expression JIT compilation (by allocating the function call information on the stack, allowing LLVM to optimize it away); previously the size of the call information caused problems inside LLVM's optimizer. Author: Andres Freund Reviewed-By: Tom Lane Discussion: https://postgr.es/m/20180605172952.x34m5uz6ju6enaem@alap3.anarazel.de	2019-01-26 14:17:52 -08:00
Andres Freund	e7cc78ad43	Remove superfluous tqual.h includes. Most of these had been obsoleted by `568d4138c` / the SnapshotNow removal. This is in preparation for moving most of tqual.[ch] into either snapmgr.h or heapam.h, which in turn is in preparation for pluggable table AMs. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de	2019-01-21 12:15:02 -08:00
Andres Freund	e0c4ec0728	Replace uses of heap_open et al with the corresponding table_* function. Author: Andres Freund Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de	2019-01-21 10:51:37 -08:00
Andres Freund	8cc157b234	Fix ALTER TRIGGER ... RENAME, broken in WITH OIDS removal. I (Andres) broke this in `578b229718`. Author: Rushabh Lathia Discussion: https://postgr.es/m/CAGPqQf04PywZX3sVQaF6H=oLiW9GJncRW+=e78vTy4MokEWcZw@mail.gmail.com	2019-01-21 09:13:43 -08:00
Bruce Momjian	97c39498e5	Update copyright for 2019 Backpatch-through: certain files through 9.4	2019-01-02 12:44:25 -05:00
Amit Kapila	3abb11e55b	Remove extra semicolons. Reported-by: David Rowley Author: David Rowley Reviewed-by: Amit Kapila Backpatch-through: 10 Discussion: https://postgr.es/m/CAKJS1f8EneeYyzzvdjahVZ6gbAHFkHbSFB5m_C0Y6TUJs9Dgdg@mail.gmail.com	2018-12-17 14:32:25 +05:30
Andres Freund	578b229718	Remove WITH OIDS support, change oid catalog column visibility. Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring a pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot of code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to merge this now. It's painful to maintain externally, too complicated to commit after the code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de	2018-11-20 16:00:17 -08:00
Andres Freund	1a0586de36	Introduce notion of different types of slots (without implementing them). Upcoming work intends to allow pluggable ways to introduce new ways of storing table data. Accessing those table access methods from the executor requires TupleTableSlots to carry tuples in the native format of such storage methods; otherwise there'll be a significant conversion overhead. Different access methods will require different data to store tuples efficiently (just like virtual, minimal, heap already require fields in TupleTableSlot). To allow that without requiring additional pointer indirections, we want to have different structs (embedding TupleTableSlot) for different types of slots. Thus different types of slots are needed, which requires adapting creators of slots. The slot that most efficiently can represent a type of tuple in an executor node will often depend on the type of slot a child node uses. Therefore we need to track the type of slot that is returned by nodes, so parent slots can create slots based on that. Relatedly, JIT compilation of tuple deforming needs to know which type of slot a certain expression refers to, so it can create an appropriate deforming function for the type of tuple in the slot. But not all nodes will only return one type of slot, e.g. an append node will potentially return different types of slots for each of its subplans. Therefore add a function that allows querying the type of a node's result slot, and whether it'll always be the same type (whether it's fixed). This can be queried using ExecGetResultSlotOps(). The scan, result, inner, outer type of slots are automatically inferred from ExecInitScanTupleSlot(), ExecInitResultSlot(), left/right subtrees respectively. If that's not correct for a node, that can be overwritten using new fields in PlanState. This commit does not introduce the actually abstracted implementation of different kind of TupleTableSlots, that will be left for a followup commit. The different types of slots introduced will, for now, still use the same backing implementation. While this already partially invalidates the big comment in tuptable.h, it seems to make more sense to update it later, when the different TupleTableSlot implementations actually exist. Author: Ashutosh Bapat and Andres Freund, with changes by Amit Khandekar Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de	2018-11-15 22:00:30 -08:00
Andres Freund	763f2edd92	Rejigger materializing and fetching a HeapTuple from a slot. Previously materializing a slot always returned a HeapTuple. As current work aims to reduce the reliance on HeapTuples (so other storage systems can work efficiently), that needs to change. Thus split the tasks of materializing a slot (i.e. making it independent from the underlying storage / other memory contexts) from fetching a HeapTuple from the slot. For brevity, allow fetching a HeapTuple from a slot and materializing the slot at the same time, controlled by a parameter. For now some callers of ExecFetchSlotHeapTuple, with materialize = true, expect that changes to the heap tuple will be reflected in the underlying slot. Those places will be adapted in due course, so while not pretty, that's OK for now. Also rename ExecFetchSlotTuple to ExecFetchSlotHeapTuple and ExecFetchSlotTupleDatum to ExecFetchSlotHeapTupleDatum, as it's likely that future storage methods will need similar methods. There already is ExecFetchSlotMinimalTuple, so the new names make the naming scheme more coherent. Author: Ashutosh Bapat and Andres Freund, with changes by Amit Khandekar Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de	2018-11-15 14:31:12 -08:00
Peter Eisentraut	69ee2ff930	Apply RI trigger skipping tests also for DELETE The tests added in `cfa0f4255b` to skip firing an RI trigger if any old key value is NULL can also be applied for DELETE. This should give a performance gain in those cases, and it also saves a lot of duplicate code in the actual RI triggers. (That code was already dead code for the UPDATE cases.) Reviewed-by: Daniel Gustafsson <daniel@yesql.se>	2018-11-10 16:14:51 +01:00
Peter Eisentraut	96b00c433c	Remove obsolete pg_constraint.consrc column This has been deprecated and effectively unused for a long time. Reviewed-by: Daniel Gustafsson <daniel@yesql.se>	2018-11-01 20:36:05 +01:00
Tom Lane	d73f4c74dd	In the executor, use an array of pointers to access the rangetable. Instead of doing a lot of list_nth() accesses to es_range_table, create a flattened pointer array during executor startup and index into that to get at individual RangeTblEntrys. This eliminates one source of O(N^2) behavior with lots of partitions. (I'm not exactly convinced that it's the most important source, but it's an easy one to fix.) Amit Langote and David Rowley Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-04 15:48:17 -04:00
Andres Freund	cc2905e963	Use slots more widely in tuple mapping code and make naming more consistent. It's inefficient to use a single slot for mapping between tuple descriptors for multiple tuples, as previously done when using ConvertPartitionTupleSlot(), as that means the slot's tuple descriptors change for every tuple. Previously we also, via ConvertPartitionTupleSlot(), built new tuples after the mapping even in cases where we, immediately afterwards, access individual columns again. Refactor the code so one slot, on demand, is used for each partition. That avoids having to change the descriptor (and allows to use the more efficient "fixed" tuple slots). Then use slot->slot mapping, to avoid unnecessarily forming a tuple. As the naming between the tuple and slot mapping functions wasn't consistent, rename them to execute_attr_map_{tuple,slot}. It's likely that we'll also rename convert_tuples_by_* to denote that these functions "only" build a map, but that's left for later. Author: Amit Khandekar and Amit Langote, editorialized by me Reviewed-By: Amit Langote, Amit Khandekar, Andres Freund Discussion: https://postgr.es/m/CAJ3gD9fR0wRNeAE8VqffNTyONS_UfFPRpqxhnD9Q42vZB+Jvpg@mail.gmail.com https://postgr.es/m/e4f9d743-cd4b-efb0-7574-da21d86a7f36%40lab.ntt.co.jp Backpatch: -	2018-10-02 11:14:26 -07:00
Tom Lane	fdba460a26	Create an RTE field to record the query's lock mode for each relation. Add RangeTblEntry.rellockmode, which records the appropriate lock mode for each RTE_RELATION rangetable entry (either AccessShareLock, RowShareLock, or RowExclusiveLock depending on the RTE's role in the query). This patch creates the field and makes all creators of RTE nodes fill it in reasonably, but for the moment nothing much is done with it. The plan is to replace assorted post-parser logic that re-determines the right lockmode to use with simple uses of rte->rellockmode. For now, just add Asserts in each of those places that the rellockmode matches what they are computing today. (In some cases the match isn't perfect, so the Asserts are weaker than you might expect; but this seems OK, as per discussion.) This passes check-world for me, but it seems worth pushing in this state to see if the buildfarm finds any problems in cases I failed to test. catversion bump due to change of stored rules. Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-09-30 13:55:51 -04:00
Andres Freund	29c94e03c7	Split ExecStoreTuple into ExecStoreHeapTuple and ExecStoreBufferHeapTuple. Upcoming changes introduce further types of tuple table slots, in preparation of making table storage pluggable. New storage methods will have different representation of tuples, therefore the slot accessor should refer explicitly to heap tuples. Instead of just renaming the functions, split it into one function that accepts heap tuples not residing in buffers, and one accepting ones in buffers. Previously one function was used for both, but that was a bit awkward already, and splitting will allow us to represent slot types for tuples in buffers and normal memory separately. This is split out from the patch introducing abstract slots, as this largely consists of mechanical changes. Author: Ashutosh Bapat Reviewed-By: Andres Freund Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de	2018-09-25 16:27:48 -07:00
Andrew Dunstan	7636e5c60f	Fast default trigger and expand_tuple fixes Ensure that triggers get properly filled in tuples for the OLD value. Also fix the logic of detecting missing null values. The previous logic failed to detect a missing null column before the first missing column with a default. Fixing this has simplified the logic a bit. Regression tests are added to test changes. This should ensure better coverage of expand_tuple(). Original bug reports, and some code and test scripts from Tomas Vondra Backpatch to release 11.	2018-09-24 16:11:24 -04:00
Amit Kapila	40ca70ebcc	Allow using the updated tuple while moving it to a different partition. An update that causes the tuple to be moved to a different partition was missing out on re-constructing the to-be-updated tuple, based on the latest tuple in the update chain. Instead, it's simply deleting the latest tuple and inserting a new tuple in the new partition based on the old tuple. Commit `2f17844104` didn't consider this case, so some of the updates were getting lost. In passing, change the argument order for output parameter in ExecDelete and add some commentary about it. Reported-by: Pavan Deolasee Author: Amit Khandekar, with minor changes by me Reviewed-by: Dilip Kumar, Amit Kapila and Alvaro Herrera Backpatch-through: 11 Discussion: https://postgr.es/m/CAJ3gD9fRbEzDqdeDq1jxqZUb47kJn+tQ7=Bcgjc8quqKsDViKQ@mail.gmail.com	2018-07-12 12:51:39 +05:30
Tom Lane	41c912cad1	Clean up warnings from -Wimplicit-fallthrough. Recent gcc can warn about switch-case fall throughs that are not explicitly labeled as intentional. This seems like a good thing, so clean up the warnings exposed thereby by labeling all such cases with comments that gcc will recognize. In files that already had one or more suitable comments, I generally matched the existing style of those. Otherwise I went with /* FALLTHROUGH */, which is one of the spellings approved at the more-restrictive-than-default level -Wimplicit-fallthrough=4. (At the default level you can also spell it /* FALL ?THRU */, and it's not picky about case. What you can't do is include additional text in the same comment, so some existing comments containing versions of this aren't good enough.) Testing with gcc 8.0.1 (Fedora 28's current version), I found that I also had to put explicit "break"s after elog(ERROR) or ereport(ERROR); apparently, for this purpose gcc doesn't recognize that those don't return. That seems like possibly a gcc bug, but it's fine because in most places we did that anyway; so this amounts to a visit from the style police. Discussion: https://postgr.es/m/15083.1525207729@sss.pgh.pa.us	2018-05-01 19:35:08 -04:00
Tom Lane	bdf46af748	Post-feature-freeze pgindent run. Discussion: https://postgr.es/m/15719.1523984266@sss.pgh.pa.us	2018-04-26 14:47:16 -04:00
Alvaro Herrera	da6f3e45dd	Reorganize partitioning code There's been a massive addition of partitioning code in PostgreSQL 11, with little oversight on its placement, resulting in a catalog/partition.c with poorly defined boundaries and responsibilities. This commit tries to set a couple of distinct modules to separate things a little bit. There are no code changes here, only code movement. There are three new files: src/backend/utils/cache/partcache.c src/include/partitioning/partdefs.h src/include/utils/partcache.h The previous arrangement of #including catalog/partition.h almost everywhere is no more. Authors: Amit Langote and Álvaro Herrera Discussion: https://postgr.es/m/98e8d509-790a-128c-be7f-e48a5b2d8d97@lab.ntt.co.jp https://postgr.es/m/11aa0c50-316b-18bb-722d-c23814f39059@lab.ntt.co.jp https://postgr.es/m/143ed9a4-6038-76d4-9a55-502035815e68@lab.ntt.co.jp https://postgr.es/m/20180413193503.nynq7bnmgh6vs5vm@alvherre.pgsql	2018-04-14 21:12:14 -03:00
Simon Riggs	08ea7a2291	Revert MERGE patch This reverts commits `d204ef6377`, `83454e3c2b` and a few more commits thereafter (complete list at the end) related to MERGE feature. While the feature was fully functional, with sufficient test coverage and necessary documentation, it was felt that some parts of the executor and parse-analyzer can use a different design and it wasn't possible to do that in the available time. So it was decided to revert the patch for PG11 and retry again in the future. Thanks again to all reviewers and bug reporters. List of commits reverted, in reverse chronological order: `f1464c5380` Improve parse representation for MERGE `ddb4158579` MERGE syntax diagram correction `530e69e59b` Allow cpluspluscheck to pass by renaming variable `01b88b4df5` MERGE minor errata `3af7b2b0d4` MERGE fix variable warning in non-assert builds `a5d86181ec` MERGE INSERT allows only one VALUES clause `4b2d44031f` MERGE post-commit review `4923550c20` Tab completion for MERGE `aa3faa3c7a` WITH support in MERGE `83454e3c2b` New files for MERGE `d204ef6377` MERGE SQL Command following SQL:2016 Author: Pavan Deolasee Reviewed-by: Michael Paquier	2018-04-12 11:22:56 +01:00
Tom Lane	cefa387153	Merge catalog/pg_foo_fn.h headers back into pg_foo.h headers. Traditionally, include/catalog/pg_foo.h contains extern declarations for functions in backend/catalog/pg_foo.c, in addition to its function as the authoritative definition of the pg_foo catalog's rowtype. In some cases, we'd been forced to split out those extern declarations into separate pg_foo_fn.h headers so that the catalog definitions could be #include'd by frontend code. That problem is gone as of commit `9c0a0de4c`, so let's undo the splits to make things less confusing. Discussion: https://postgr.es/m/23690.1523031777@sss.pgh.pa.us	2018-04-08 14:35:29 -04:00
Andres Freund	f16241bef7	Raise error when affecting tuple moved into different partition. When an update moves a row between partitions (supported since `2f17844104`), our normal logic for following update chains in READ COMMITTED mode doesn't work anymore. Cross partition updates are modeled as a delete from the old and insert into the new partition. No ctid chain exists across partitions, and there's no convenient space to introduce that link. Not throwing an error in a partitioned context when one would have been thrown without partitioning is obviously problematic. This commit introduces infrastructure to detect when a tuple has been moved, not just plainly deleted. That allows throwing an error when encountering a deletion that's actually a move, while attempting to follow a ctid chain. The row deleted as part of a cross partition update is marked by pointing its t_ctid to an invalid block, instead of self as a normal update would. That was deemed to be the least invasive and most future proof way to represent the knowledge, given how few infomask bits are there to be recycled (there are also some locking issues with using infomask bits). External code following ctid chains should be updated to check for moved tuples. The most likely consequence of not doing so is a missed error. Author: Amul Sul, editorialized by me Reviewed-By: Amit Kapila, Pavan Deolasee, Andres Freund, Robert Haas Discussion: http://postgr.es/m/CAAJ_b95PkwojoYfz0bzXU8OokcTVGzN6vYGCNVUukeUDrnF3dw@mail.gmail.com	2018-04-07 13:24:27 -07:00
Teodor Sigaev	8224de4f42	Indexes with INCLUDE columns and their support in B-tree This patch introduces INCLUDE clause to index definition. This clause specifies a list of columns which will be included as a non-key part in the index. The INCLUDE columns exist solely to allow more queries to benefit from index-only scans. Also, such columns don't need to have appropriate operator classes. Expressions are not supported as INCLUDE columns since they cannot be used in index-only scans. Index access methods supporting INCLUDE are indicated by amcaninclude flag in IndexAmRoutine. For now, only B-tree indexes support INCLUDE clause. In B-tree indexes INCLUDE columns are truncated from pivot index tuples (tuples located in non-leaf pages and high keys). Therefore, B-tree indexes now might have variable number of attributes. This patch also provides generic facility to support that: pivot tuples contain number of their attributes in t_tid.ip_posid. Free 13th bit of t_info is used for indicating that. This facility will simplify further support of index suffix truncation. The changes above are backward-compatible; pg_upgrade doesn't need special handling of B-tree indexes for that. Bump catalog version. Author: Anastasia Lubennikova with contributions by Alexander Korotkov and me Reviewed by: Peter Geoghegan, Tomas Vondra, Antonin Houska, Jeff Janes, David Rowley, Alexander Korotkov Discussion: https://www.postgresql.org/message-id/flat/56168952.4010101@postgrespro.ru	2018-04-07 23:00:39 +03:00
Simon Riggs	4b2d44031f	MERGE post-commit review Review comments from Andres Freund * Consolidate code into AfterTriggerGetTransitionTable() * Rename nodeMerge.c to execMerge.c * Rename nodeMerge.h to execMerge.h * Move MERGE handling in ExecInitModifyTable() into a execMerge.c ExecInitMerge() * Move mt_merge_subcommands flags into execMerge.h * Rename opt_and_condition to opt_merge_when_and_condition * Wordsmith various comments Author: Pavan Deolasee Reviewer: Simon Riggs	2018-04-05 09:54:07 +01:00
Simon Riggs	d204ef6377	MERGE SQL Command following SQL:2016 MERGE performs actions that modify rows in the target table using a source table or query. MERGE provides a single SQL statement that can conditionally INSERT/UPDATE/DELETE rows, a task that would otherwise require multiple PL statements. For example: MERGE INTO target AS t USING source AS s ON t.tid = s.sid WHEN MATCHED AND t.balance > s.delta THEN UPDATE SET balance = t.balance - s.delta WHEN MATCHED THEN DELETE WHEN NOT MATCHED AND s.delta > 0 THEN INSERT VALUES (s.sid, s.delta) WHEN NOT MATCHED THEN DO NOTHING; MERGE works with regular and partitioned tables, including column and row security enforcement, as well as support for row, statement and transition triggers. MERGE is optimized for OLTP and is parameterizable, though also useful for large scale ETL/ELT. MERGE is not intended to be used in preference to existing single SQL commands for INSERT, UPDATE or DELETE since there is some overhead. MERGE can be used statically from PL/pgSQL. MERGE does not yet support inheritance, write rules, RETURNING clauses, updatable views or foreign tables. MERGE follows SQL Standard per the most recent SQL:2016. Includes full tests and documentation, including full isolation tests to demonstrate the concurrent behavior. This version written from scratch in 2017 by Simon Riggs, using docs and tests originally written in 2009. Later work from Pavan Deolasee has been both complex and deep, leaving the lead author credit now in his hands. Extensive discussion of concurrency from Peter Geoghegan, with thanks for the time and effort contributed. Various issues reported via sqlsmith by Andreas Seltenreich Authors: Pavan Deolasee, Simon Riggs Reviewer: Peter Geoghegan, Amit Langote, Tomas Vondra, Simon Riggs Discussion: https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com	2018-04-03 09:28:16 +01:00
Simon Riggs	7cf8a5c302	Revert "Modified files for MERGE" This reverts commit `354f13855e`.	2018-04-02 21:34:15 +01:00
Simon Riggs	354f13855e	Modified files for MERGE	2018-04-02 21:12:47 +01:00
Andres Freund	d87510a524	Combine options for RangeVarGetRelidExtended() into a flags argument. A followup patch will add a SKIP_LOCKED option. To avoid introducing evermore arguments, breaking existing callers each time, introduce a flags argument. This'll no doubt break a few external users... Also change the MISSING_OK behaviour so a DEBUG1 debug message is emitted when a relation is not found. Author: Nathan Bossart Reviewed-By: Michael Paquier and Andres Freund Discussion: https://postgr.es/m/20180306005349.b65whmvj7z6hbe2y@alap3.anarazel.de	2018-03-30 17:05:16 -07:00
Alvaro Herrera	86f575948c	Allow FOR EACH ROW triggers on partitioned tables Previously, FOR EACH ROW triggers were not allowed in partitioned tables. Now we allow AFTER triggers on them, and on trigger creation we cascade to create an identical trigger in each partition. We also clone the triggers to each partition that is created or attached later. This means that deferred unique keys are allowed on partitioned tables, too. Author: Álvaro Herrera Reviewed-by: Peter Eisentraut, Simon Riggs, Amit Langote, Robert Haas, Thomas Munro Discussion: https://postgr.es/m/20171229225319.ajltgss2ojkfd3kp@alvherre.pgsql	2018-03-23 10:48:22 -03:00
Tom Lane	25b692568f	Prevent dangling-pointer access when update trigger returns old tuple. A before-update row trigger may choose to return the "new" or "old" tuple unmodified. ExecBRUpdateTriggers failed to consider the second possibility, and would proceed to free the "old" tuple even if it was the one returned, leading to subsequent access to already-deallocated memory. In debug builds this reliably leads to an "invalid memory alloc request size" failure; in production builds it might accidentally work, but data corruption is also possible. This is a very old bug. There are probably a couple of reasons it hasn't been noticed up to now. It would be more usual to return NULL if one wanted to suppress the update action; returning "old" is significantly less efficient since the update will occur anyway. Also, none of the standard PLs would ever cause this because they all returned freshly-manufactured tuples even if they were just copying "old". But commit `4b93f5799` changed that for plpgsql, making it possible to see the bug with a plpgsql trigger. Still, this is certainly legal behavior for a trigger function, so it's ExecBRUpdateTriggers's fault not plpgsql's. It seems worth creating a test case that exercises returning "old" directly with a C-language trigger; testing this through plpgsql seems unreliable because its behavior might change again. Report and fix by Rushabh Lathia; regression test case by me. Back-patch to all supported branches. Discussion: https://postgr.es/m/CAGPqQf1P4pjiNPrMof=P_16E-DFjt457j+nH2ex3=nBTew7tXw@mail.gmail.com	2018-02-27 13:28:02 -05:00
Andres Freund	ad7dbee368	Allow tupleslots to have a fixed tupledesc, use in executor nodes. The reason for doing so is that it will allow expression evaluation to optimize based on the underlying tupledesc. In particular it will allow to JIT tuple deforming together with the expression itself. For that expression initialization needs to be moved after the relevant slots are initialized - mostly unproblematic, except in the case of nodeWorktablescan.c. After doing so there's no need for ExecAssignResultType() and ExecAssignResultTypeFromTL() anymore, as all former callers have been converted to create a slot with a fixed descriptor. When creating a slot with a fixed descriptor, tts_values/isnull can be allocated together with the main slot, reducing allocation overhead and increasing cache density a bit. Author: Andres Freund Discussion: https://postgr.es/m/20171206093717.vqdxe5icqttpxs3p@alap3.anarazel.de	2018-02-16 21:17:38 -08:00
Robert Haas	2f17844104	Allow UPDATE to move rows between partitions. When an UPDATE causes a row to no longer match the partition constraint, try to move it to a different partition where it does match the partition constraint. In essence, the UPDATE is split into a DELETE from the old partition and an INSERT into the new one. This can lead to surprising behavior in concurrency scenarios because EvalPlanQual rechecks won't work as they normally did; the known problems are documented. (There is a pending patch to improve the situation further, but it needs more review.) Amit Khandekar, reviewed and tested by Amit Langote, David Rowley, Rajkumar Raghuwanshi, Dilip Kumar, Amul Sul, Thomas Munro, Álvaro Herrera, Amit Kapila, and me. A few final revisions by me. Discussion: http://postgr.es/m/CAJ3gD9do9o2ccQ7j7+tSgiE1REY65XRiMb=yJO3u3QhyP8EEPQ@mail.gmail.com	2018-01-19 15:33:06 -05:00
Peter Eisentraut	8b9e9644dc	Replace AclObjectKind with ObjectType AclObjectKind was basically just another enumeration for object types, and we already have a preferred one for that. It's only used in aclcheck_error. By using ObjectType instead, we can also give some more precise error messages, for example "index" instead of "relation". Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2018-01-19 14:01:15 -05:00
Bruce Momjian	9d4649ca49	Update copyright for 2018 Backpatch-through: certain files through 9.3	2018-01-02 23:30:12 -05:00
Peter Eisentraut	2eb4a831e5	Change TRUE/FALSE to true/false The lower case spellings are C and C++ standard and are used in most parts of the PostgreSQL sources. The upper case spellings are only used in some files/modules. So standardize on the standard spellings. The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so those are left as is when using those APIs. In code comments, we use the lower-case spelling for the C concepts and keep the upper-case spelling for the SQL concepts. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2017-11-08 11:37:28 -05:00
Tom Lane	5fa6b0d102	Remove unnecessary PG_TRY overhead for CurrentResourceOwner changes. resowner/README contained advice to use a PG_TRY block to restore the old CurrentResourceOwner value anywhere that that variable is transiently changed. That advice was only inconsistently followed, however, and on reflection it seems like unnecessary overhead. We don't bother with such a convention for transient CurrentMemoryContext changes, on the grounds that any (sub)transaction abort will start out by resetting CurrentMemoryContext to what it wants. But the same is true of CurrentResourceOwner, so there seems no need to treat it differently. Hence, remove PG_TRY blocks that exist only to restore CurrentResourceOwner before re-throwing the error. There are a couple of places that restore it along with some other actions, and I left those alone; the restore is probably unnecessary but no noticeable gain will result from removing it. Discussion: https://postgr.es/m/5236.1507583529@sss.pgh.pa.us	2017-10-11 17:44:09 -04:00
Tom Lane	27c6619e9c	Fix possible dangling pointer dereference in trigger.c. AfterTriggerEndQuery correctly notes that the query_stack could get repalloc'd during a trigger firing, but it nonetheless passes the address of a query_stack entry to afterTriggerInvokeEvents, so that if such a repalloc occurs, afterTriggerInvokeEvents is already working with an obsolete dangling pointer while it scans the rest of the events. Oops. The only code at risk is its "delete_ok" cleanup code, so we can prevent unsafe behavior by passing delete_ok = false instead of true. However, that could have a significant performance penalty, because the point of passing delete_ok = true is to not have to re-scan possibly a large number of dead trigger events on the next time through the loop. There's more than one way to skin that cat, though. What we can do is delete all the "chunks" in the event list except the last one, since we know all events in them must be dead. Deleting the chunks is work we'd have had to do later in AfterTriggerEndQuery anyway, and it ends up saving rescanning of just about the same events we'd have gotten rid of with delete_ok = true. In v10 and HEAD, we also have to be careful to mop up any per-table after_trig_events pointers that would become dangling. This is slightly annoying, but I don't think that normal use-cases will traverse this code path often enough for it to be a performance problem. It's pretty hard to hit this in practice because of the unlikelihood of the query_stack getting resized at just the wrong time. Nonetheless, it's definitely a live bug of ancient standing, so back-patch to all supported branches. Discussion: https://postgr.es/m/2891.1505419542@sss.pgh.pa.us	2017-09-17 14:50:01 -04:00
Tom Lane	fd31f9f033	Ensure that BEFORE STATEMENT triggers fire the right number of times. Commit `0f79440fb` introduced mechanism to keep AFTER STATEMENT triggers from firing more than once per statement, which was formerly possible if more than one FK enforcement action had to be applied to a given table. Add a similar mechanism for BEFORE STATEMENT triggers, so that we don't have the unexpected situation of firing BEFORE STATEMENT triggers more often than AFTER STATEMENT. As with the previous patch, back-patch to v10. Discussion: https://postgr.es/m/22315.1505584992@sss.pgh.pa.us	2017-09-17 12:16:38 -04:00

426 Commits