postgresql

Commit Graph

Author	SHA1	Message	Date
Andres Freund	4c640f4f38	Fix STRICT check for strict aggregates with NULL ORDER BY columns. I (Andres) broke this unintentionally in `69c3936a14`, by checking strictness for all input expressions computed for an aggregate, rather than just the input for the aggregate transition function. Reported-By: Ondřej Bouda Bisected-By: Tom Lane Diagnosed-By: Andrew Gierth Discussion: https://postgr.es/m/2a505161-2727-2473-7c46-591ed108ac52@email.cz Backpatch: 11-, like `69c3936a14`	2018-11-03 14:48:42 -07:00
Thomas Munro	1ce4a807e2	Fix NULL handling in multi-batch Parallel Hash Left Join. NULL keys in left joins were skipped when building batch files. Repair, by making the keep_nulls argument to ExecHashGetHashValue() depend on whether this is a left outer join, as we do in other paths. Bug #15475. Thinko in `1804284042`. Back-patch to 11. Reported-by: Paul Schaap Diagnosed-by: Andrew Gierth Dicussion: https://postgr.es/m/15475-11a7a783fed72a36%40postgresql.org	2018-11-03 11:05:35 +13:00
Magnus Hagander	fbec7459aa	Fix spelling errors and typos in comments Author: Daniel Gustafsson <daniel@yesql.se>	2018-11-02 13:56:52 +01:00
Tom Lane	14a158f9bf	Fix interaction of CASE and ArrayCoerceExpr. An array-type coercion appearing within a CASE that has a constant (after const-folding) test expression was mangled by the planner, causing all the elements of the resulting array to be equal to the coerced value of the CASE's test expression. This is my oversight in commit c12d570fa: that changed ArrayCoerceExpr to use a subexpression involving a CaseTestExpr, and I didn't notice that eval_const_expressions needed an adjustment to keep from folding such a CaseTestExpr to a constant when it's inside a suitable CASE. This is another in what's getting to be a depressingly long line of bugs associated with misidentification of the referent of a CaseTestExpr. We're overdue to redesign that mechanism; but any such fix is unlikely to be back-patchable into v11. As a stopgap, fix eval_const_expressions to do what it must here. Also add a bunch of comments pointing out the restrictions and assumptions that are needed to make this work at all. Also fix a related oversight: contain_context_dependent_node() was not aware of the relationship of ArrayCoerceExpr to CaseTestExpr. That was somewhat fail-soft, in that the outcome of a wrong answer would be to prevent optimizations that could have been made, but let's fix it while we're at it. Per bug #15471 from Matt Williams. Back-patch to v11 where the faulty logic came in. Discussion: https://postgr.es/m/15471-1117f49271989bad@postgresql.org	2018-10-30 15:26:11 -04:00
Tom Lane	26cb82030f	Improve some comments related to executor result relations. es_leaf_result_relations doesn't exist; perhaps this was an old name for es_tuple_routing_result_relations, or maybe this comment has gone unmaintained through multiple rounds of whacking the code around. Related comment in execnodes.h was both obsolete and ungrammatical.	2018-10-17 16:41:00 -04:00
Andres Freund	02a30a09f9	Correct constness of system attributes in heap.c & prerequisites. This allows the compiler / linker to mark affected pages as read-only. There's a fair number of pre-requisite changes, to allow the const properly be propagated. Most of consts were already required for correctness anyway, just not represented on the type-level. Arguably we could be more aggressive in using consts in related code, but.. This requires using a few of the types underlying typedefs that removes pointers (e.g. const NameData *) as declaring the typedefed type constant doesn't have the same meaning (it makes the variable const, not what it points to). Discussion: https://postgr.es/m/20181015200754.7y7zfuzsoux2c4ya@alap3.anarazel.de	2018-10-16 09:44:43 -07:00
Andres Freund	c5257345ef	Move TupleTableSlots boolean member into one flag variable. There's several reasons for this change: 1) It reduces the total size of TupleTableSlot / reduces alignment padding, making the commonly accessed members fit into a single cacheline (but we currently do not force proper alignment, so that's not yet guaranteed to be helpful) 2) Combining the booleans into a flag allows to combine read/writes from memory. 3) With the upcoming slot abstraction changes, it allows to have core and extended flags, in a memory efficient way. Author: Ashutosh Bapat and Andres Freund Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de	2018-10-15 18:23:25 -07:00
Andres Freund	9d906f1119	Move generic slot support functions from heaptuple.c into execTuples.c. heaptuple.c was never a particular good fit for slot_getattr(), slot_getsomeattrs() and slot_getmissingattrs(), but in upcoming changes slots will be made more abstract (allowing slots that contain different types of tuples), making it clearly the wrong place. Note that slot_deform_tuple() remains in it's current place, as it clearly deals with a HeapTuple. getmissingattrs() also remains, but it's less clear that that's correct - but execTuples.c wouldn't be the right place. Author: Ashutosh Bapat. Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de	2018-10-15 15:17:04 -07:00
Tom Lane	82ff0cc91d	Advance transaction timestamp for intra-procedure transactions. Per discussion, this behavior seems less astonishing than not doing so. Peter Eisentraut and Tom Lane Discussion: https://postgr.es/m/20180920234040.GC29981@momjian.us	2018-10-08 16:16:36 -04:00
Tom Lane	f9eb7c14b0	Avoid O(N^2) cost in ExecFindRowMark(). If there are many ExecRowMark structs, we spent O(N^2) time in ExecFindRowMark during executor startup. Once upon a time this was not of great concern, but the addition of native partitioning has squeezed out enough other costs that this can become the dominant overhead in some use-cases for tables with many partitions. To fix, simply replace that List data structure with an array. This adds a little bit of cost to execCurrentOf(), but not much, and anyway that code path is neither of large importance nor very efficient now. If we ever decide it is a bottleneck, constructing a hash table for lookup-by-tableoid would likely be the thing to do. Per complaint from Amit Langote, though this is different from his fix proposal. Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-08 10:41:34 -04:00
Tom Lane	52ed730d51	Remove some unnecessary fields from Plan trees. In the wake of commit `f2343653f`, we no longer need some fields that were used before to control executor lock acquisitions: * PlannedStmt.nonleafResultRelations can go away entirely. * partitioned_rels can go away from Append, MergeAppend, and ModifyTable. However, ModifyTable still needs to know the RT index of the partition root table if any, which was formerly kept in the first entry of that list. Add a new field "rootRelation" to remember that. rootRelation is partly redundant with nominalRelation, in that if it's set it will have the same value as nominalRelation. However, the latter field has a different purpose so it seems best to keep them distinct. Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-07 14:33:17 -04:00
Tom Lane	29ef2b310d	Restore sane locking behavior during parallel query. Commit `9a3cebeaa` changed things so that parallel workers didn't obtain any lock of their own on tables they access. That was clearly a bad idea, but I'd mistakenly supposed that it was the intended end result of the series of patches for simplifying the executor's lock management. Undo that change in relation_open(), and adjust ExecOpenScanRelation() so that it gets the correct lock if inside a parallel worker. In passing, clean up some more obsolete comments about when locks are acquired. Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-06 15:49:37 -04:00
Tom Lane	f2343653f5	Remove more redundant relation locking during executor startup. We already have appropriate locks on every relation listed in the query's rangetable before we reach the executor. Take the next step in exploiting that knowledge by removing code that worries about taking locks on non-leaf result relations in a partitioned table. In particular, get rid of ExecLockNonLeafAppendTables and a stanza in InitPlan that asserts we already have locks on certain such tables. In passing, clean up some now-obsolete comments in InitPlan. Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-06 15:12:51 -04:00
Tom Lane	c87cb5f7a6	Allow btree comparison functions to return INT_MIN. Historically we forbade datatype-specific comparison functions from returning INT_MIN, so that it would be safe to invert the sort order just by negating the comparison result. However, this was never really safe for comparison functions that directly return the result of memcmp(), strcmp(), etc, as POSIX doesn't place any such restriction on those library functions. Buildfarm results show that at least on recent Linux on s390x, memcmp() actually does return INT_MIN sometimes, causing sort failures. The agreed-on answer is to remove this restriction and fix relevant call sites to not make such an assumption; code such as "res = -res" should be replaced by "INVERT_COMPARE_RESULT(res)". The same is needed in a few places that just directly negated the result of memcmp or strcmp. To help find places having this problem, I've also added a compile option to nbtcompare.c that causes some of the commonly used comparators to return INT_MIN/INT_MAX instead of their usual -1/+1. It'd likely be a good idea to have at least one buildfarm member running with "-DSTRESS_SORT_INT_MIN". That's far from a complete test of course, but it should help to prevent fresh introductions of such bugs. This is a longstanding portability hazard, so back-patch to all supported branches. Discussion: https://postgr.es/m/20180928185215.ffoq2xrq5d3pafna@alap3.anarazel.de	2018-10-05 16:01:29 -04:00
Tom Lane	d73f4c74dd	In the executor, use an array of pointers to access the rangetable. Instead of doing a lot of list_nth() accesses to es_range_table, create a flattened pointer array during executor startup and index into that to get at individual RangeTblEntrys. This eliminates one source of O(N^2) behavior with lots of partitions. (I'm not exactly convinced that it's the most important source, but it's an easy one to fix.) Amit Langote and David Rowley Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-04 15:48:17 -04:00
Tom Lane	9ddef36278	Centralize executor's opening/closing of Relations for rangetable entries. Create an array estate->es_relations[] paralleling the es_range_table, and store references to Relations (relcache entries) there, so that any given RT entry is opened and closed just once per executor run. Scan nodes typically still call ExecOpenScanRelation, but ExecCloseScanRelation is no more; relation closing is now done centrally in ExecEndPlan. This is slightly more complex than one would expect because of the interactions with relcache references held in ResultRelInfo nodes. The general convention is now that ResultRelInfo->ri_RelationDesc does not represent a separate relcache reference and so does not need to be explicitly closed; but there is an exception for ResultRelInfos in the es_trig_target_relations list, which are manufactured by ExecGetTriggerResultRel and have to be cleaned up by ExecCleanUpTriggerState. (That much was true all along, but these ResultRelInfos are now more different from others than they used to be.) To allow the partition pruning logic to make use of es_relations[] rather than having its own relcache references, adjust PartitionedRelPruneInfo to store an RT index rather than a relation OID. Amit Langote, reviewed by David Rowley and Jesper Pedersen, some mods by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-04 14:03:42 -04:00
Tom Lane	9a3cebeaa7	Change executor to just Assert that table locks were already obtained. Instead of locking tables during executor startup, just Assert that suitable locks were obtained already during the parse/plan pipeline (or re-obtained by the plan cache). This must be so, else we have a hazard that concurrent DDL has invalidated the plan. This is pretty inefficient as well as undercommented, but it's all going to go away shortly, so I didn't try hard. This commit is just another attempt to use the buildfarm to see if we've missed anything in the plan to simplify the executor's table management. Note that the change needed here in relation_open() exposes that parallel workers now really are accessing tables without holding any lock of their own, whereas they were not doing that before this commit. This does not give me a warm fuzzy feeling about that aspect of parallel query; it does not seem like a good design, and we now know that it's had exactly no actual testing. I think that we should modify parallel query so that that change can be reverted. Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-03 16:05:12 -04:00
Andres Freund	c03c1449c0	Fix issues around EXPLAIN with JIT. I (Andres) was more than a bit hasty in committing `33001fd7a7` after last minute changes, leading to a number of problems (jit output was only shown for JIT in parallel workers, and just EXPLAIN without ANALYZE didn't work). Lukas luckily found these issues quickly. Instead of combining instrumentation in in standard_ExecutorEnd(), do so on demand in the new ExplainPrintJITSummary(). Also update a documentation example of the JIT output, changed in `52050ad8eb`. Author: Lukas Fittl, with minor changes by me Discussion: https://postgr.es/m/CAP53PkxmgJht69pabxBXJBM+0oc6kf3KHMborLP7H2ouJ0CCtQ@mail.gmail.com Backpatch: 11, where JIT compilation was introduced	2018-10-03 12:48:37 -07:00
Tom Lane	6e35939feb	Change rewriter/planner/executor/plancache to depend on RTE rellockmode. Instead of recomputing the required lock levels in all these places, just use what commit `fdba460a2` made the parser store in the RTE fields. This already simplifies the code measurably in these places, and follow-on changes will remove a bunch of no-longer-needed infrastructure. In a few cases, this change causes us to acquire a higher lock level than we did before. This is OK primarily because said higher lock level should've been acquired already at query parse time; thus, we're saving a useless extra trip through the shared lock manager to acquire a lesser lock alongside the original lock. The only known exception to this is that re-execution of a previously planned SELECT FOR UPDATE/SHARE query, for a table that uses ROW_MARK_REFERENCE or ROW_MARK_COPY methods, might have gotten only AccessShareLock before. Now it will get RowShareLock like the first execution did, which seems fine. While there's more to do, push it in this state anyway, to let the buildfarm help verify that nothing bad happened. Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-02 14:43:09 -04:00
Andres Freund	cc2905e963	Use slots more widely in tuple mapping code and make naming more consistent. It's inefficient to use a single slot for mapping between tuple descriptors for multiple tuples, as previously done when using ConvertPartitionTupleSlot(), as that means the slot's tuple descriptors change for every tuple. Previously we also, via ConvertPartitionTupleSlot(), built new tuples after the mapping even in cases where we, immediately afterwards, access individual columns again. Refactor the code so one slot, on demand, is used for each partition. That avoids having to change the descriptor (and allows to use the more efficient "fixed" tuple slots). Then use slot->slot mapping, to avoid unnecessarily forming a tuple. As the naming between the tuple and slot mapping functions wasn't consistent, rename them to execute_attr_map_{tuple,slot}. It's likely that we'll also rename convert_tuples_by_* to denote that these functions "only" build a map, but that's left for later. Author: Amit Khandekar and Amit Langote, editorialized by me Reviewed-By: Amit Langote, Amit Khandekar, Andres Freund Discussion: https://postgr.es/m/CAJ3gD9fR0wRNeAE8VqffNTyONS_UfFPRpqxhnD9Q42vZB+Jvpg@mail.gmail.com https://postgr.es/m/e4f9d743-cd4b-efb0-7574-da21d86a7f36%40lab.ntt.co.jp Backpatch: -	2018-10-02 11:14:26 -07:00
Tom Lane	fdba460a26	Create an RTE field to record the query's lock mode for each relation. Add RangeTblEntry.rellockmode, which records the appropriate lock mode for each RTE_RELATION rangetable entry (either AccessShareLock, RowShareLock, or RowExclusiveLock depending on the RTE's role in the query). This patch creates the field and makes all creators of RTE nodes fill it in reasonably, but for the moment nothing much is done with it. The plan is to replace assorted post-parser logic that re-determines the right lockmode to use with simple uses of rte->rellockmode. For now, just add Asserts in each of those places that the rellockmode matches what they are computing today. (In some cases the match isn't perfect, so the Asserts are weaker than you might expect; but this seems OK, as per discussion.) This passes check-world for me, but it seems worth pushing in this state to see if the buildfarm finds any problems in cases I failed to test. catversion bump due to change of stored rules. Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-09-30 13:55:51 -04:00
Andres Freund	10763358c3	Remove absolete function TupleDescGetSlot(). TupleDescGetSlot() was kept around for backward compatibility for user-written SRFs. With the TupleTableSlot abstraction work, that code will need to be version specific anyway, so there's no point in keeping the function around any longer. Author: Ashutosh Bapat Reviewed-By: Andres Freund Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de	2018-09-25 16:28:57 -07:00
Andres Freund	29c94e03c7	Split ExecStoreTuple into ExecStoreHeapTuple and ExecStoreBufferHeapTuple. Upcoming changes introduce further types of tuple table slots, in preparation of making table storage pluggable. New storage methods will have different representation of tuples, therefore the slot accessor should refer explicitly to heap tuples. Instead of just renaming the functions, split it into one function that accepts heap tuples not residing in buffers, and one accepting ones in buffers. Previously one function was used for both, but that was a bit awkward already, and splitting will allow us to represent slot types for tuples in buffers and normal memory separately. This is split out from the patch introducing abstract slots, as this largely consists out of mechanical changes. Author: Ashutosh Bapat Reviewed-By: Andres Freund Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de	2018-09-25 16:27:48 -07:00
Andres Freund	bbdfbb9154	Remove function list from prologue of execTuples.c. That section is never in sync with the actual routines available and their functionality. Author: Ashutosh Bapat Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de	2018-09-25 16:27:48 -07:00
Alvaro Herrera	5913b9bbf3	Remove obsolete comment The documented shortcoming was actually fixed in `4c728f3829` so the comment is not true anymore.	2018-09-25 17:55:22 -03:00
Andres Freund	33001fd7a7	Collect JIT instrumentation from workers. Previously, when using parallel query, EXPLAIN (ANALYZE)'s JIT compilation timings did not include the overhead from doing so on the workers. Fix that. We do so by simply aggregating the cost of doing JIT compilation on workers and the leader together. Arguably that's not quite accurate, because the total time spend doing so is spent in parallel - but it's hard to do much better. For additional detail, when VERBOSE is specified, the stats for workers are displayed separately. Author: Amit Khandekar and Andres Freund Discussion: https://postgr.es/m/CAJ3gD9eLrz51RK_gTkod+71iDcjpB_N8eC6vU2AW-VicsAERpQ@mail.gmail.com Backpatch: 11-	2018-09-25 13:12:44 -07:00
Tom Lane	89b280e139	Fix failure in WHERE CURRENT OF after rewinding the referenced cursor. In a case where we have multiple relation-scan nodes in a cursor plan, such as a scan of an inheritance tree, it's possible to fetch from a given scan node, then rewind the cursor and fetch some row from an earlier scan node. In such a case, execCurrent.c mistakenly thought that the later scan node was still active, because ExecReScan hadn't done anything to make it look not-active. We'd get some sort of failure in the case of a SeqScan node, because the node's scan tuple slot would be pointing at a HeapTuple whose t_self gets reset to invalid by heapam.c. But it seems possible that for other relation scan node types we'd actually return a valid tuple TID to the caller, resulting in updating or deleting a tuple that shouldn't have been considered current. To fix, forcibly clear the ScanTupleSlot in ExecScanReScan. Another issue here, which seems only latent at the moment but could easily become a live bug in future, is that rewinding a cursor does not necessarily lead to immediately applying ExecReScan to every scan-level node in the plan tree. Upper-level nodes will think that they can postpone that call if their child node is already marked with chgParam flags. I don't see a way for that to happen today in a plan tree that's simple enough for execCurrent.c's search_plan_tree to understand, but that's one heck of a fragile assumption. So, add some logic in search_plan_tree to detect chgParam flags being set on nodes that it descended to/through, and assume that that means we should consider lower scan nodes to be logically reset even if their ReScan call hasn't actually happened yet. Per bug #15395 from Matvey Arye. This has been broken for a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/153764171023.14986.280404050547008575@wrigleys.postgresql.org	2018-09-23 16:05:45 -04:00
Tom Lane	07a3af0ff8	Fix parsetree representation of XMLTABLE(XMLNAMESPACES(DEFAULT ...)). The original coding for XMLTABLE thought it could represent a default namespace by a T_String Value node with a null string pointer. That's not okay, though; in particular outfuncs.c/readfuncs.c are not on board with such a representation, meaning you'll get a null pointer crash if you try to store a view or rule containing this construct. To fix, change the parsetree representation so that we have a NULL list element, instead of a bogus Value node. This isn't really a functional limitation since default XML namespaces aren't yet implemented in the executor; you'd just get "DEFAULT namespace is not supported" anyway. But crashes are not nice, so back-patch to v10 where this syntax was added. Ordinarily we'd consider a parsetree representation change to be un-backpatchable; but since existing releases would crash on the way to storing such constructs, there can't be any existing views/rules to be incompatible with. Per report from Andrey Lepikhov. Discussion: https://postgr.es/m/3690074f-abd2-56a9-144a-aa5545d7a291@postgrespro.ru	2018-09-17 13:16:32 -04:00
Tom Lane	1f4a920b73	Fix failure with initplans used conditionally during EvalPlanQual rechecks. The EvalPlanQual machinery assumes that any initplans (that is, uncorrelated sub-selects) used during an EPQ recheck would have already been evaluated during the main query; this is implicit in the fact that execPlan pointers are not copied into the EPQ estate's es_param_exec_vals. But it's possible for that assumption to fail, if the initplan is only reached conditionally. For example, a sub-select inside a CASE expression could be reached during a recheck when it had not been previously, if the CASE test depends on a column that was just updated. This bug is old, appearing to date back to my rewrite of EvalPlanQual in commit `9f2ee8f28`, but was not detected until Kyle Samson reported a case. To fix, force all not-yet-evaluated initplans used within the EPQ plan subtree to be evaluated at the start of the recheck, before entering the EPQ environment. This could be inefficient, if such an initplan is expensive and goes unused again during the recheck --- but that's piling one layer of improbability atop another. It doesn't seem worth adding more complexity to prevent that, at least not in the back branches. It was convenient to use the new-in-v11 ExecEvalParamExecParams function to implement this, but I didn't like either its name or the specifics of its API, so revise that. Back-patch all the way. Rather than rewrite the patch to avoid depending on bms_next_member() in the oldest branches, I chose to back-patch that function into 9.4 and 9.3. (This isn't the first time back-patches have needed that, and it exhausted my patience.) I also chose to back-patch some test cases added by commits `71404af2a` and `342a1ffa2` into 9.4 and 9.3, so that the 9.x versions of eval-plan-qual.spec are all the same. Andrew Gierth diagnosed the problem and contributed the added test cases, though the actual code changes are by me. Discussion: https://postgr.es/m/A033A40A-B234-4324-BE37-272279F7B627@tripadvisor.com	2018-09-15 13:42:33 -04:00
Alvaro Herrera	6b78231d91	Move PartitionDispatchData struct definition to execPartition.c There's no reason to expose the struct definition, so don't. Author: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> Discussion: https://postgr.es/m/d3fa24c1-bc65-7133-81df-6474387ccc4f@lab.ntt.co.jp	2018-09-14 19:06:57 -03:00
Tom Lane	361844fe56	Save/restore SPI's global variables in SPI_connect() and SPI_finish(). This patch removes two sources of interference between nominally independent functions when one SPI-using function calls another, perhaps without knowing that it does so. Chapman Flack pointed out that xml.c's query_to_xml_internal() expects SPI_tuptable and SPI_processed to stay valid across datatype output function calls; but it's possible that such a call could involve re-entrant use of SPI. It seems likely that there are similar hazards elsewhere, if not in the core code then in third-party SPI users. Previously SPI_finish() reset SPI's API globals to zeroes/nulls, which would typically make for a crash in such a situation. Restoring them to the values they had at SPI_connect() seems like a considerably more useful behavior, and it still meets the design goal of not leaving any dangling pointers to tuple tables of the function being exited. Also, cause SPI_connect() to reset these variables to zeroes/nulls after saving them. This prevents interference in the opposite direction: it's possible that a SPI-using function that's only ever been tested standalone contains assumptions that these variables start out as zeroes. That was the case as long as you were the outermost SPI user, but not so much for an inner user. Now it's consistent. Report and fix suggestion by Chapman Flack, actual patch by me. Back-patch to all supported branches. Discussion: https://postgr.es/m/9fa25bef-2e4f-1c32-22a4-3ad0723c4a17@anastigmatix.net	2018-09-07 20:09:57 -04:00
Andrew Gierth	520acab171	Set scan direction appropriately for SubPlans (bug #15336 ) When executing a SubPlan in an expression, the EState's direction field was left alone, resulting in an attempt to execute the subplan backwards if it was encountered during a backwards scan of a cursor. Also, though much less likely, it was possible to reach the execution of an InitPlan while in backwards-scan state. Repair by saving/restoring estate->es_direction and forcing forward scan mode in the relevant places. Backpatch all the way, since this has been broken since 8.3 (prior to commit `c7ff7663e`, SubPlans had their own EStates rather than sharing the parent plan's, so there was no confusion over scan direction). Per bug #15336 reported by Vladimir Baranoff; analysis and patch by me, review by Tom Lane. Discussion: https://postgr.es/m/153449812167.1304.1741624125628126322@wrigleys.postgresql.org	2018-08-17 15:44:13 +01:00
Alvaro Herrera	1eb9221585	Fix executor prune failure when plan already pruned In a multi-layer partitioning setup, if at plan time all the sub-partitions are pruned but the intermediate one remains, the executor later throws a spurious error that there's nothing to prune. That is correct, but there's no reason to throw an error. Therefore, don't. Reported-by: Andreas Seltenreich <seltenreich@gmx.de> Author: David Rowley <david.rowley@2ndquadrant.com> Discussion: https://postgr.es/m/87in4h98i0.fsf@ansel.ydns.eu	2018-08-16 12:53:43 -03:00
Amit Kapila	4f9a97e417	Adjust comment atop ExecShutdownNode. After commits `a315b967cc` and `b805b63ac2`, part of the comment atop ExecShutdownNode is redundant. Adjust it. Author: Amit Kapila Backpatch-through: 10 where both the mentioned commits are present. Discussion: https://postgr.es/m/86137f17-1dfb-42f9-7421-82fd786b04a1@anayrat.info	2018-08-13 10:04:39 +05:30
Amit Kapila	2cd0acfdad	Prohibit shutting down resources if there is a possibility of back up. Currently, we release the asynchronous resources as soon as it is evident that no more rows will be needed e.g. when a Limit is filled. This can be problematic especially for custom and foreign scans where we can scan backward. Fix that by disallowing the shutting down of resources in such cases. Reported-by: Robert Haas Analysed-by: Robert Haas and Amit Kapila Author: Amit Kapila Reviewed-by: Robert Haas Backpatch-through: 9.6 where this code was introduced Discussion: https://postgr.es/m/86137f17-1dfb-42f9-7421-82fd786b04a1@anayrat.info	2018-08-13 08:22:18 +05:30
Andrew Gierth	07172d5aff	Avoid query-lifetime memory leaks in XMLTABLE (bug #15321 ) Multiple calls to XMLTABLE in a query (e.g. laterally applying it to a table with an xml column, an important use-case) were leaking large amounts of memory into the per-query context, blowing up memory usage. Repair by reorganizing memory context usage in nodeTableFuncscan; use the usual per-tuple context for row-by-row evaluations instead of perValueCxt, and use the explicitly created context -- renamed from perValueCxt to perTableCxt -- for arguments and state for each individual table-generation operation. Backpatch to PG10 where this code was introduced. Original report by IRC user begriffs; analysis and patch by me. Reviewed by Tom Lane and Pavel Stehule. Discussion: https://postgr.es/m/153394403528.10284.7530399040974170549@wrigleys.postgresql.org	2018-08-13 01:59:45 +01:00
Andrew Dunstan	5c047fd709	Revert changes in execMain.c from commit `16828d5c02` These changes were put in at some stage of the development process, but are unnecessary and should not have made it into the final patch. Mea culpa. Per gripe from Andreas Freund Backpatch to REL_11_STABLE	2018-08-10 16:09:25 -04:00
Amit Kapila	85c9d3475e	Fix buffer usage stats for parallel nodes. The buffer usage stats is accounted only for the execution phase of the node. For Gather and Gather Merge nodes, such stats are accumulated at the time of shutdown of workers which is done after execution of node due to which we missed to account them for such nodes. Fix it by treating nodes as running while we shut down them. We can also miss accounting for a Limit node when Gather or Gather Merge is beneath it, because it can finish the execution before shutting down such nodes. So we allow a Limit node to shut down the resources before it completes the execution. In the passing fix the gather node code to allow workers to shut down as soon as we find that all the tuples from the workers have been retrieved. The original code use to do that, but is accidently removed by commit `01edb5c7fc`. Reported-by: Adrien Nayrat Author: Amit Kapila and Robert Haas Reviewed-by: Robert Haas and Andres Freund Backpatch-through: 9.6 where this code was introduced Discussion: https://postgr.es/m/86137f17-1dfb-42f9-7421-82fd786b04a1@anayrat.info	2018-08-03 11:02:02 +05:30
Amit Kapila	ccc84a956b	Match the buffer usage tracking for leader and worker backends. In the leader backend, we don't track the buffer usage for ExecutorStart phase whereas in worker backend we track it for ExecutorStart phase as well. This leads to different value for buffer usage stats for the parallel and non-parallel query. Change the code so that worker backend also starts tracking buffer usage after ExecutorStart. Author: Amit Kapila and Robert Haas Reviewed-by: Robert Haas and Andres Freund Backpatch-through: 9.6 where this code was introduced Discussion: https://postgr.es/m/86137f17-1dfb-42f9-7421-82fd786b04a1@anayrat.info	2018-08-03 09:11:37 +05:30
Tom Lane	1c2cb2744b	Fix run-time partition pruning for appends with multiple source rels. The previous coding here supposed that if run-time partitioning applied to a particular Append/MergeAppend plan, then all child plans of that node must be members of a single partitioning hierarchy. This is totally wrong, since an Append could be formed from a UNION ALL: we could have multiple hierarchies sharing the same Append, or child plans that aren't part of any hierarchy. To fix, restructure the related plan-time and execution-time data structures so that we can have a separate list or array for each partitioning hierarchy. Also track subplans that are not part of any hierarchy, and make sure they don't get pruned. Per reports from Phil Florent and others. Back-patch to v11, since the bug originated there. David Rowley, with a lot of cosmetic adjustments by me; thanks also to Amit Langote for review. Discussion: https://postgr.es/m/HE1PR03MB17068BB27404C90B5B788BCABA7B0@HE1PR03MB1706.eurprd03.prod.outlook.com	2018-08-01 19:42:52 -04:00
Alvaro Herrera	91bc213d90	Fix unnoticed variable shadowing in previous commit Per buildfarm.	2018-08-01 17:04:57 -04:00
Alvaro Herrera	1c9bb02d8e	Fix per-tuple memory leak in partition tuple routing Some operations were being done in a longer-lived memory context, causing intra-query leaks. It's not noticeable unless you're doing a large COPY, but if you are, it eats enough memory to cause a problem. Co-authored-by: Kohei KaiGai <kaigai@heterodb.com> Co-authored-by: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/CAOP8fzYtVFWZADq4c=KoTAqgDrHWfng+AnEPEZccyxqxPVbbWQ@mail.gmail.com	2018-08-01 16:29:15 -04:00
Peter Eisentraut	0d5f05cde0	Allow multi-inserts during COPY into a partitioned table CopyFrom allows multi-inserts to be used for non-partitioned tables, but this was disabled for partitioned tables. The reason for this appeared to be that the tuple may not belong to the same partition as the previous tuple did. Not allowing multi-inserts here greatly slowed down imports into partitioned tables. These could take twice as long as a copy to an equivalent non-partitioned table. It seems wise to do something about this, so this change allows the multi-inserts by flushing the so-far inserted tuples to the partition when the next tuple does not belong to the same partition, or when the buffer fills. This improves performance when the next tuple in the stream commonly belongs to the same partition as the previous tuple. In cases where the target partition changes on every tuple, using multi-inserts slightly slows the performance. To get around this we track the average size of the batches that have been inserted and adaptively enable or disable multi-inserts based on the size of the batch. Some testing was done and the regression only seems to exist when the average size of the insert batch is close to 1, so let's just enable multi-inserts when the average size is at least 1.3. More performance testing might reveal a better number for, this, but since the slowdown was only 1-2% it does not seem critical enough to spend too much time calculating it. In any case it may depend on other factors rather than just the size of the batch. Allowing multi-inserts for partitions required a bit of work around the per-tuple memory contexts as we must flush the tuples when the next tuple does not belong the same partition. In which case there is no good time to reset the per-tuple context, as we've already built the new tuple by this time. In order to work around this we maintain two per-tuple contexts and just switch between them every time the partition changes and reset the old one. This does mean that the first of each batch of tuples is not allocated in the same memory context as the others, but that does not matter since we only reset the context once the previous batch has been inserted. Author: David Rowley <david.rowley@2ndquadrant.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>	2018-08-01 10:23:09 +02:00
Alvaro Herrera	d25d45e4d9	Verify range bounds to bms_add_range when necessary Now that the bms_add_range boundary protections are gone, some alternative ones are needed in a few places. Author: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> Discussion: https://postgr.es/m/3437ccf8-a144-55ff-1e2f-fc16b437823b@lab.ntt.co.jp	2018-07-30 18:45:39 -04:00
Robert Haas	3e32109049	Use key and partdesc from PartitionDispatch where possible. Instead of repeatedly fishing the data out of the relcache entry, let's use the version that we cached in the PartitionDispatch. We could alternatively rip out the PartitionDispatch fields altogether, but it doesn't make much sense to have them and not use them; before this patch, partdesc was set but altogether unused. Amit Langote and I both thought using them was a litle better than removing them, so this patch takes that approach. Discussion: http://postgr.es/m/CA+TgmobFnxcaW-Co-XO8=yhJ5pJXoNkCj6Z7jm9Mwj9FGv-D7w@mail.gmail.com	2018-07-27 09:40:52 -04:00
Andres Freund	3acc4acd9b	LLVMJIT: Release JIT context after running ExprContext shutdown callbacks. Due to inlining it previously was possible that an ExprContext's shutdown callback pointed to a JITed function. As the JIT context previously was shut down before the shutdown callbacks were called, that could lead to segfaults. Fix the ordering. Reported-By: Dmitry Dolgov Author: Andres Freund Discussion: https://postgr.es/m/CA+q6zcWO7CeAJtHBxgcHn_hj+PenM=tvG0RJ93X1uEJ86+76Ug@mail.gmail.com Backpatch: 11-, where JIT compilation was added	2018-07-25 16:31:49 -07:00
Heikki Linnakangas	99fdebaf2d	Rephrase a few comments for clarity. I was confused by what "intended to be parallel serially" meant, until Robert Haas and David G. Johnston explained it. Rephrase the comment to make it more clear, using David's suggested wording. Discussion: https://www.postgresql.org/message-id/1fec9022-41e8-e484-70ce-2179b08c2092%40iki.fi	2018-07-19 16:08:09 +03:00
Heikki Linnakangas	405cb356d6	Fix comment. This comment was copy-pasted from nodeAppend.c to nodeMergeAppend.c, but while committing `5220bb7533`, I modified wrong copy of it. Spotted by David Rowley	2018-07-19 15:39:06 +03:00
Heikki Linnakangas	5220bb7533	Expand run-time partition pruning to work with MergeAppend This expands the support for the run-time partition pruning which was added for Append in `499be013de` to also allow unneeded subnodes of a MergeAppend to be removed. Author: David Rowley Discussion: https://www.postgresql.org/message-id/CAKJS1f_F_V8D7Wu-HVdnH7zCUxhoGK8XhLLtd%3DCu85qDZzXrgg%40mail.gmail.com	2018-07-19 13:49:43 +03:00
Heikki Linnakangas	6b387179ba	Fix misc typos, mostly in comments. A collection of typos I happened to spot while reading code, as well as grepping for common mistakes. Backpatch to all supported versions, as applicable, to avoid conflicts when backporting other commits in the future.	2018-07-18 16:17:32 +03:00
Amit Kapila	40ca70ebcc	Allow using the updated tuple while moving it to a different partition. An update that causes the tuple to be moved to a different partition was missing out on re-constructing the to-be-updated tuple, based on the latest tuple in the update chain. Instead, it's simply deleting the latest tuple and inserting a new tuple in the new partition based on the old tuple. Commit `2f17844104` didn't consider this case, so some of the updates were getting lost. In passing, change the argument order for output parameter in ExecDelete and add some commentary about it. Reported-by: Pavan Deolasee Author: Amit Khandekar, with minor changes by me Reviewed-by: Dilip Kumar, Amit Kapila and Alvaro Herrera Backpatch-through: 11 Discussion: https://postgr.es/m/CAJ3gD9fRbEzDqdeDq1jxqZUb47kJn+tQ7=Bcgjc8quqKsDViKQ@mail.gmail.com	2018-07-12 12:51:39 +05:30
Tom Lane	ff4f889164	Fix bugs with degenerate window ORDER BY clauses in GROUPS/RANGE mode. nodeWindowAgg.c failed to cope with the possibility that no ordering columns are defined in the window frame for GROUPS mode or RANGE OFFSET mode, leading to assertion failures or odd errors, as reported by Masahiko Sawada and Lukas Eder. In RANGE OFFSET mode, an ordering column is really required, so add an Assert about that. In GROUPS mode, the code would work, except that the node initialization code wasn't in sync with the execution code about when to set up tuplestore read pointers and spare slots. Fix the latter for consistency's sake (even though I think the changes described below make the out-of-sync cases unreachable for now). Per SQL spec, a single ordering column is required for RANGE OFFSET mode, and at least one ordering column is required for GROUPS mode. The parser enforced the former but not the latter; add a check for that. We were able to reach the no-ordering-column cases even with fully spec compliant queries, though, because the planner would drop partitioning and ordering columns from the generated plan if they were redundant with earlier columns according to the redundant-pathkey logic, for instance "PARTITION BY x ORDER BY y" in the presence of a "WHERE x=y" qual. While in principle that's an optimization that could save some pointless comparisons at runtime, it seems unlikely to be meaningful in the real world. I think this behavior was not so much an intentional optimization as a side-effect of an ancient decision to construct the plan node's ordering-column info by reverse-engineering the PathKeys of the input path. If we give up redundant-column removal then it takes very little code to generate the plan node info directly from the WindowClause, ensuring that we have the expected number of ordering columns in all cases. (If anyone does complain about this, the planner could perhaps be taught to remove redundant columns only when it's safe to do so, ie not in RANGE OFFSET mode. But I doubt anyone ever will.) With these changes, the WindowAggPath.winpathkeys field is not used for anything anymore, so remove it. The test cases added here are not actually very interesting given the removal of the redundant-column-removal logic, but they would represent important corner cases if anyone ever tries to put that back. Tom Lane and Masahiko Sawada. Back-patch to v11 where RANGE OFFSET and GROUPS modes were added. Discussion: https://postgr.es/m/CAD21AoDrWqycq-w_+Bx1cjc+YUhZ11XTj9rfxNiNDojjBx8Fjw@mail.gmail.com Discussion: https://postgr.es/m/153086788677.17476.8002640580496698831@wrigleys.postgresql.org	2018-07-11 12:07:20 -04:00
Peter Eisentraut	2e78c5b522	Fix assert in nested SQL procedure call When executing CALL in PL/pgSQL, we need to set a snapshot before invoking the to-be-called procedure. Otherwise, the to-be-called procedure might end up running without a snapshot. For LANGUAGE SQL procedures, this would result in an assertion failure. (For most other languages, this is usually not a problem, because those use SPI and SPI sets snapshots in most cases.) Setting the snapshot restores the behavior of how CALL worked when it was handled as a generic SQL statement in PL/pgSQL (exec_stmt_execsql()). This change revealed another problem: In SPI_commit(), we popped the active snapshot before committing the transaction, to avoid "snapshot %p still active" errors. However, there is no particular reason why only at most one snapshot should be on the stack. So change this to pop all active snapshots instead of only one.	2018-07-06 23:25:44 +02:00
Andrew Dunstan	1e9c858090	pgindent run prior to branching	2018-06-30 12:25:49 -04:00
Amit Kapila	2e61c50785	Fix thinko in comments. A slot can not be stored in a tuple but it's vice versa. Reported-by: Ashutosh Bapat Author: Ashutosh Bapat Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/CAFjFpRcHhNhXdegyJv3KKDWrwO1_NB_KYZM_ZSDeMOZaL1A5jQ@mail.gmail.com	2018-06-27 18:05:24 +05:30
Tom Lane	19832753f1	Fix some ill-chosen names for globally-visible partition support functions. "compute_hash_value" is particularly gratuitously generic, but IMO all of these ought to have names clearly related to partitioning.	2018-06-13 13:18:02 -04:00
Tom Lane	e23bae82cf	Fix up run-time partition pruning's use of relcache's partition data. The previous coding saved pointers into the partitioned table's relcache entry, but then closed the relcache entry, causing those pointers to nominally become dangling. Actual trouble would be seen in the field only if a relcache flush occurred mid-query, but that's hardly out of the question. While we could fix this by copying all the data in question at query start, it seems better to just hold the relcache entry open for the whole query. While at it, improve the handling of support-function lookups: do that once per query not once per pruning test. There's still something to be desired here, in that we fail to exploit the possibility of caching data across queries in the fn_extra fields of the relcache's FmgrInfo structs, which could happen if we just used those structs in-place rather than copying them. However, combining that with the possibility of per-query lookups of cross-type comparison functions seems to require changes in the APIs of a lot of the pruning support functions, so it's too invasive to consider as part of this patch. A win would ensue only for complex partition key data types (e.g. arrays), so it may not be worth the trouble. David Rowley and Tom Lane Discussion: https://postgr.es/m/17850.1528755844@sss.pgh.pa.us	2018-06-13 12:03:26 -04:00
Tom Lane	69025c5a07	Improve ExecFindInitialMatchingSubPlans's subplan renumbering logic. We don't need two passes if we scan child partitions before parents, as that way the children's present_parts are up to date before they're needed. I (tgl) think there's actually a bug being fixed here, for the case of an intermediate partitioned table with no direct leaf children, but haven't attempted to construct a test case to prove it. David Rowley Discussion: https://postgr.es/m/CAKJS1f-6GODRNgEtdPxCnAPme2h2hTztB6LmtfdmcYAAOE0kQg@mail.gmail.com	2018-06-11 17:35:53 -04:00
Alvaro Herrera	5b0c7e2f75	Don't needlessly check the partition contraint twice Starting with commit `f0e44751d7`, ExecConstraints was in charge of running the partition constraint; commit `19c47e7c82` modified that so that caller could request to skip that checking depending on some conditions, but that commit and `15ce775faa` together introduced a small bug there which caused ExecInsert to request skipping the constraint check but have this not be honored -- in effect doing the check twice. This could have been fixed in a very small patch, but on further analysis of the involved function and its callsites, it turns out to be simpler to give the responsibility of checking the partition constraint fully to the caller, and return ExecConstraints to its original (pre-partitioning) shape where it only checked tuple descriptor-related constraints. Each caller must do partition constraint checking on its own schedule, which is more convenient after commit `2f17844104` anyway. Reported-by: David Rowley Author: David Rowley, Álvaro Herrera Reviewed-by: Amit Langote, Amit Khandekar, Simon Riggs Discussion: https://postgr.es/m/CAKJS1f8w8+awsxgea8wt7_UX8qzOQ=Tm1LD+U1fHqBAkXxkW2w@mail.gmail.com	2018-06-11 17:12:16 -04:00
Peter Eisentraut	387543f7bd	Make new error code name match SQL standard more closely Discussion: https://www.postgresql.org/message-id/dff3d555-bea4-ac24-29b2-29521b9d08e8%402ndquadrant.com	2018-06-11 11:15:28 -04:00
Tom Lane	321f648a31	Assorted cosmetic cleanup of run-time-partition-pruning code. Use "subplan" rather than "subnode" to refer to the child plans of a partitioning Append; this seems a bit more specific and hence clearer. Improve assorted comments. No non-cosmetic changes. David Rowley and Tom Lane Discussion: https://postgr.es/m/CAFj8pRBjrufA3ocDm8o4LPGNye9Y+pm1b9kCwode4X04CULG3g@mail.gmail.com	2018-06-10 18:24:34 -04:00
Tom Lane	939449de0e	Relocate partition pruning structs to a saner place. These struct definitions were originally dropped into primnodes.h, which is a poor choice since that's mainly intended for primitive expression node types; these are not in that category. What they are is auxiliary info in Plan trees, so move them to plannodes.h. For consistency, also relocate some related code that was apparently placed with the aid of a dartboard. There's no interesting code changes in this commit, just reshuffling. David Rowley and Tom Lane Discussion: https://postgr.es/m/CAFj8pRBjrufA3ocDm8o4LPGNye9Y+pm1b9kCwode4X04CULG3g@mail.gmail.com	2018-06-10 16:30:14 -04:00
Tom Lane	73b7f48f78	Improve run-time partition pruning to handle any stable expression. The initial coding of the run-time-pruning feature only coped with cases where the partition key(s) are compared to Params. That is a bit silly; we can allow it to work with any non-Var-containing stable expression, as long as we take special care with expressions containing PARAM_EXEC Params. The code is hardly any longer this way, and it's considerably clearer (IMO at least). Per gripe from Pavel Stehule. David Rowley, whacked around a bit by me Discussion: https://postgr.es/m/CAFj8pRBjrufA3ocDm8o4LPGNye9Y+pm1b9kCwode4X04CULG3g@mail.gmail.com	2018-06-10 15:22:32 -04:00
Thomas Munro	86a2218eb0	Limit Parallel Hash's bucket array to MaxAllocSize. Make sure that we don't exceed MaxAllocSize when increasing the number of buckets. Perhaps later we'll remove that limit and use DSA_ALLOC_HUGE, but for now just prevent further increases like the non-parallel code. This change avoids the error from bug report #15225. Author: Thomas Munro Reviewed-By: Tom Lane Reported-by: Frits Jalvingh Discussion: https://postgr.es/m/152802081668.26724.16985037679312485972%40wrigleys.postgresql.org	2018-06-10 20:30:25 +12:00
Peter Eisentraut	acad8b409a	Fix typo	2018-06-08 11:55:12 -04:00
Heikki Linnakangas	57f06a7611	Fix obsolete comment. The 'orig_slot' argument was removed in commit `c0a8ae7be3`, but that commit forgot to update the comment. Author: Amit Langote Discussion: https://www.postgresql.org/message-id/194ac4bf-7b4a-c887-bf26-bc1a85ea995a@lab.ntt.co.jp	2018-06-07 09:56:22 +03:00
Tom Lane	f755a152d4	Improve spelling of new FINALFUNC_MODIFY aggregate attribute. I'd used SHARABLE as a value originally, but Peter Eisentraut points out that dictionaries agree that SHAREABLE is the preferred spelling. Run around and change that before it's too late. Discussion: https://postgr.es/m/d2e1afd4-659c-50d6-1b20-7cfd3675e909@2ndquadrant.com	2018-05-21 11:41:42 -04:00
Robert Haas	b949bbcb7e	Further adjust comment in get_partition_dispatch_recurse. In editing `09b12d52db` I made it wrong; fix that and try to more clearly explain the situation. Patch by me, reviewed by David Rowley and Amit Langote Discussion: http://postgr.es/m/CA+TgmobAq+mA5hzm0a5OS38qQY5758DDDGqa3sBJN4hvir-H9w@mail.gmail.com	2018-05-18 16:11:22 -04:00
Robert Haas	09b12d52db	Improve comment in get_partition_dispatch_recurse. David Rowley, reviewed by Amit Langote, and revised a bit by me. Discussion: http://postgr.es/m/CAKJS1f9yyimYyFzbHM4EwE+tkj4jvrHqSH0H4S4Kbas=UFpc9Q@mail.gmail.com	2018-05-16 10:47:57 -04:00
Tom Lane	05ca21b878	Fix type checking for support functions of parallel VARIADIC aggregates. The impact of VARIADIC on the combine/serialize/deserialize support functions of an aggregate wasn't thought through carefully. There is actually no impact, because variadicity isn't passed through to these functions (and it doesn't seem like it would need to be). However, lookup_agg_function was mistakenly told to check things as though it were passed through. The net result was that it was impossible to declare an aggregate that had both VARIADIC input and parallelism support functions. In passing, fix a runtime check in nodeAgg.c for the combine function's strictness to make its error message agree with the creation-time check. The previous message was actually backwards, and it doesn't seem like there's a good reason to have two versions of this message text anyway. Back-patch to 9.6 where parallel aggregation was introduced. Alexey Bashtanov; message fix by me Discussion: https://postgr.es/m/f86dde87-fef4-71eb-0480-62754aaca01b@imap.cc	2018-05-15 15:06:53 -04:00
Peter Eisentraut	30c66e77be	Fix SPI error cleanup and memory leak Since the SPI stack has been moved from TopTransactionContext to TopMemoryContext, setting _SPI_stack to NULL in AtEOXact_SPI() leaks memory. In fact, we don't need to do that anymore: We just leave the allocated stack around for the next SPI use. Also, refactor the SPI cleanup so that it is run both at transaction end and when returning to the main loop on an exception. The latter is necessary when a procedure calls a COMMIT or ROLLBACK command that itself causes an error.	2018-05-03 08:39:15 -04:00
Tom Lane	41c912cad1	Clean up warnings from -Wimplicit-fallthrough. Recent gcc can warn about switch-case fall throughs that are not explicitly labeled as intentional. This seems like a good thing, so clean up the warnings exposed thereby by labeling all such cases with comments that gcc will recognize. In files that already had one or more suitable comments, I generally matched the existing style of those. Otherwise I went with /* FALLTHROUGH /, which is one of the spellings approved at the more-restrictive-than-default level -Wimplicit-fallthrough=4. (At the default level you can also spell it / FALL ?THRU */, and it's not picky about case. What you can't do is include additional text in the same comment, so some existing comments containing versions of this aren't good enough.) Testing with gcc 8.0.1 (Fedora 28's current version), I found that I also had to put explicit "break"s after elog(ERROR) or ereport(ERROR); apparently, for this purpose gcc doesn't recognize that those don't return. That seems like possibly a gcc bug, but it's fine because in most places we did that anyway; so this amounts to a visit from the style police. Discussion: https://postgr.es/m/15083.1525207729@sss.pgh.pa.us	2018-05-01 19:35:08 -04:00
Robert Haas	37a3058bc7	Fix interaction of foreign tuple routing with remote triggers. Without these fixes, changes to the inserted tuple made by remote triggers are ignored when building local RETURNING tuples. In the core code, call ExecInitRoutingInfo at a later point from within ExecInitPartitionInfo so that the FDW callback gets invoked after the returning list has been built. But move CheckValidResultRel out of ExecInitRoutingInfo so that it can happen at an earlier stage. In postgres_fdw, refactor assorted deparsing functions to work with the RTE rather than the PlannerInfo, which saves us having to construct a fake PlannerInfo in cases where we don't have a real one. Then, we can pass down a constructed RTE that yields the correct deparse result when no real one exists. Unfortunately, this necessitates a hack that understands how the core code manages RT indexes for update tuple routing, which is ugly, but we don't have a better idea right now. Original report, analysis, and patch by Etsuro Fujita. Heavily refactored by me. Then worked over some more by Amit Langote. Discussion: http://postgr.es/m/5AD4882B.10002@lab.ntt.co.jp	2018-05-01 13:21:46 -04:00
Tom Lane	bdf46af748	Post-feature-freeze pgindent run. Discussion: https://postgr.es/m/15719.1523984266@sss.pgh.pa.us	2018-04-26 14:47:16 -04:00
Alvaro Herrera	bd4aad3239	Update ExecInitPartitionInfo comment Remove the words "if not already done." This obsolete wording corresponds to an early development version of what became `edd44738bc`. Author: Etsuro Fujita Reviewed-by: Amit Langote Discussion: https://postgr.es/m/5ADF117B.5030606@lab.ntt.co.jp	2018-04-24 23:00:48 -03:00
Alvaro Herrera	1957f8dabf	Initialize ExprStates once in run-time partition pruning Instead of doing ExecInitExpr every time a Param needs to be evaluated in run-time partition pruning, do it once during run-time pruning set-up and cache the exprstate in PartitionPruneContext, saving a lot of work. Author: David Rowley Reviewed-by: Amit Langote, Álvaro Herrera Discussion: https://postgr.es/m/CAKJS1f8-x+q-90QAPDu_okhQBV4DPEtPz8CJ=m0940GyT4DA4w@mail.gmail.com	2018-04-24 14:03:10 -03:00
Tom Lane	a66c03f698	Add missing "static" marker. Per pademelon.	2018-04-21 11:21:18 -04:00
Alvaro Herrera	79b2e52615	Remove quick path in ExecInitPartitionInfo for equal tupdescs I added this "optimization" on top of Amit Langote's `158b7bc6d7`, but the quick path is never taken because the partition uses a different pg_type oid than its parent table (causing equalTupleDescs to return false). Changing that requires more analysis and is too considered dangerous at this point in the cycle, so revert it. We might make it work someday, but not for pg11. Discussion: https://postgr.es/m/825031be-942c-8c24-6163-13c27f217a3d@lab.ntt.co.jp	2018-04-19 16:46:53 -03:00
Alvaro Herrera	b7e2cbc5b4	Update Append's idea of first_partial_plan It turns out that after runtime partition pruning, Append's first_partial_plan does not accurately represent partial plans to run, if any of those got pruned. This could limit participation of workers in some partial subplans, if other subplans got pruned. Fix it by keeping an index of the first valid partial subplan in the state node, determined at execnode Init time. Author: David Rowley, with cosmetic changes by me. Discussion: https://postgr.es/m/CAKJS1f8o2Yd=rOP=Et3A0FWgF+gSAOkFSU6eNhnGzTPV7nN8sQ@mail.gmail.com	2018-04-17 16:25:02 -03:00
Alvaro Herrera	158b7bc6d7	Ignore whole-rows in INSERT/CONFLICT with partitioned tables We had an Assert() preventing whole-row expressions from being used in the SET clause of INSERT ON CONFLICT, but it seems unnecessary, given some tests, so remove it. Add a new test to exercise the case. Still at ExecInitPartitionInfo, we used map_partition_varattnos (which constructs an attribute map, then calls map_variable_attnos) using the same two relations many times in different expressions and with different parameters. Constructing the map over and over is a waste. To avoid this repeated work, construct the map once, and use map_variable_attnos() directly instead. Author: Amit Langote, per comments by me (Álvaro) Discussion: https://postgr.es/m/20180326142016.m4st5e34chrzrknk@alvherre.pgsql	2018-04-16 15:52:28 -03:00
Alvaro Herrera	da6f3e45dd	Reorganize partitioning code There's been a massive addition of partitioning code in PostgreSQL 11, with little oversight on its placement, resulting in a catalog/partition.c with poorly defined boundaries and responsibilities. This commit tries to set a couple of distinct modules to separate things a little bit. There are no code changes here, only code movement. There are three new files: src/backend/utils/cache/partcache.c src/include/partitioning/partdefs.h src/include/utils/partcache.h The previous arrangement of #including catalog/partition.h almost everywhere is no more. Authors: Amit Langote and Álvaro Herrera Discussion: https://postgr.es/m/98e8d509-790a-128c-be7f-e48a5b2d8d97@lab.ntt.co.jp https://postgr.es/m/11aa0c50-316b-18bb-722d-c23814f39059@lab.ntt.co.jp https://postgr.es/m/143ed9a4-6038-76d4-9a55-502035815e68@lab.ntt.co.jp https://postgr.es/m/20180413193503.nynq7bnmgh6vs5vm@alvherre.pgsql	2018-04-14 21:12:14 -03:00
Simon Riggs	08ea7a2291	Revert MERGE patch This reverts commits `d204ef6377`, `83454e3c2b` and a few more commits thereafter (complete list at the end) related to MERGE feature. While the feature was fully functional, with sufficient test coverage and necessary documentation, it was felt that some parts of the executor and parse-analyzer can use a different design and it wasn't possible to do that in the available time. So it was decided to revert the patch for PG11 and retry again in the future. Thanks again to all reviewers and bug reporters. List of commits reverted, in reverse chronological order: `f1464c5380` Improve parse representation for MERGE `ddb4158579` MERGE syntax diagram correction `530e69e59b` Allow cpluspluscheck to pass by renaming variable `01b88b4df5` MERGE minor errata `3af7b2b0d4` MERGE fix variable warning in non-assert builds `a5d86181ec` MERGE INSERT allows only one VALUES clause `4b2d44031f` MERGE post-commit review `4923550c20` Tab completion for MERGE `aa3faa3c7a` WITH support in MERGE `83454e3c2b` New files for MERGE `d204ef6377` MERGE SQL Command following SQL:2016 Author: Pavan Deolasee Reviewed-by: Michael Paquier	2018-04-12 11:22:56 +01:00
Alvaro Herrera	15a8f8caad	Fix IndexOnlyScan counter for heap fetches in parallel mode The HeapFetches counter was using a simple value in IndexOnlyScanState, which fails to propagate values from parallel workers; so the counts are wrong when IndexOnlyScan runs in parallel. Move it to Instrumentation, like all the other counters. While at it, change INSERT ON CONFLICT conflicting tuple counter to use the new ntuples2 instead of nfiltered2, which is a blatant misuse. Discussion: https://postgr.es/m/20180409215851.idwc75ct2bzi6tea@alvherre.pgsql	2018-04-10 15:56:15 -03:00
Alvaro Herrera	468abb8f7a	Fix incorrect logic for choosing the next Parallel Append subplan In `499be013de` support for pruning unneeded Append subnodes was added. The logic in that commit was not correctly checking if the next subplan was in fact a valid subplan. This could cause parallel workers processes to be given a subplan to work on which didn't require any work. Per code review following an otherwise unexplained regression failure in buildfarm member Pademelon. (We haven't been able to reproduce the failure, so this is a bit of a blind fix in terms of whether it'll actually fix it; but it is a clear bug nonetheless). In passing, also add a comment to explain what first_partial_plan means. Author: David Rowley Discussion: https://postgr.es/m/CAKJS1f_E5r05hHUVG3UmCQJ49DGKKHtN=SHybD44LdzBn+CJng@mail.gmail.com	2018-04-09 17:23:49 -03:00
Alvaro Herrera	d7a95f06a1	Minor comment updates Fix a couple of typos, and update a comment about why we set a BMS to NULL. Author: David Rowley Discussion: http://postgr.es/m/CAKJS1f-tux=KdUz6ENJ9GHM_V2qgxysadYiOyQS9Ko9PTteVhQ@mail.gmail.com	2018-04-09 11:17:35 -03:00
Tom Lane	cefa387153	Merge catalog/pg_foo_fn.h headers back into pg_foo.h headers. Traditionally, include/catalog/pg_foo.h contains extern declarations for functions in backend/catalog/pg_foo.c, in addition to its function as the authoritative definition of the pg_foo catalog's rowtype. In some cases, we'd been forced to split out those extern declarations into separate pg_foo_fn.h headers so that the catalog definitions could be #include'd by frontend code. That problem is gone as of commit `9c0a0de4c`, so let's undo the splits to make things less confusing. Discussion: https://postgr.es/m/23690.1523031777@sss.pgh.pa.us	2018-04-08 14:35:29 -04:00
Alvaro Herrera	499be013de	Support partition pruning at execution time Existing partition pruning is only able to work at plan time, for query quals that appear in the parsed query. This is good but limiting, as there can be parameters that appear later that can be usefully used to further prune partitions. This commit adds support for pruning subnodes of Append which cannot possibly contain any matching tuples, during execution, by evaluating Params to determine the minimum set of subnodes that can possibly match. We support more than just simple Params in WHERE clauses. Support additionally includes: 1. Parameterized Nested Loop Joins: The parameter from the outer side of the join can be used to determine the minimum set of inner side partitions to scan. 2. Initplans: Once an initplan has been executed we can then determine which partitions match the value from the initplan. Partition pruning is performed in two ways. When Params external to the plan are found to match the partition key we attempt to prune away unneeded Append subplans during the initialization of the executor. This allows us to bypass the initialization of non-matching subplans meaning they won't appear in the EXPLAIN or EXPLAIN ANALYZE output. For parameters whose value is only known during the actual execution then the pruning of these subplans must wait. Subplans which are eliminated during this stage of pruning are still visible in the EXPLAIN output. In order to determine if pruning has actually taken place, the EXPLAIN ANALYZE must be viewed. If a certain Append subplan was never executed due to the elimination of the partition then the execution timing area will state "(never executed)". Whereas, if, for example in the case of parameterized nested loops, the number of loops stated in the EXPLAIN ANALYZE output for certain subplans may appear lower than others due to the subplan having been scanned fewer times. This is due to the list of matching subnodes having to be evaluated whenever a parameter which was found to match the partition key changes. This commit required some additional infrastructure that permits the building of a data structure which is able to perform the translation of the matching partition IDs, as returned by get_matching_partitions, into the list index of a subpaths list, as exist in node types such as Append, MergeAppend and ModifyTable. This allows us to translate a list of clauses into a Bitmapset of all the subpath indexes which must be included to satisfy the clause list. Author: David Rowley, based on an earlier effort by Beena Emerson Reviewers: Amit Langote, Robert Haas, Amul Sul, Rajkumar Raghuwanshi, Jesper Pedersen Discussion: https://postgr.es/m/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A@mail.gmail.com	2018-04-07 17:54:39 -03:00
Andres Freund	f16241bef7	Raise error when affecting tuple moved into different partition. When an update moves a row between partitions (supported since `2f17844104`), our normal logic for following update chains in READ COMMITTED mode doesn't work anymore. Cross partition updates are modeled as an delete from the old and insert into the new partition. No ctid chain exists across partitions, and there's no convenient space to introduce that link. Not throwing an error in a partitioned context when one would have been thrown without partitioning is obviously problematic. This commit introduces infrastructure to detect when a tuple has been moved, not just plainly deleted. That allows to throw an error when encountering a deletion that's actually a move, while attempting to following a ctid chain. The row deleted as part of a cross partition update is marked by pointing it's t_ctid to an invalid block, instead of self as a normal update would. That was deemed to be the least invasive and most future proof way to represent the knowledge, given how few infomask bits are there to be recycled (there's also some locking issues with using infomask bits). External code following ctid chains should be updated to check for moved tuples. The most likely consequence of not doing so is a missed error. Author: Amul Sul, editorialized by me Reviewed-By: Amit Kapila, Pavan Deolasee, Andres Freund, Robert Haas Discussion: http://postgr.es/m/CAAJ_b95PkwojoYfz0bzXU8OokcTVGzN6vYGCNVUukeUDrnF3dw@mail.gmail.com	2018-04-07 13:24:27 -07:00
Teodor Sigaev	8224de4f42	Indexes with INCLUDE columns and their support in B-tree This patch introduces INCLUDE clause to index definition. This clause specifies a list of columns which will be included as a non-key part in the index. The INCLUDE columns exist solely to allow more queries to benefit from index-only scans. Also, such columns don't need to have appropriate operator classes. Expressions are not supported as INCLUDE columns since they cannot be used in index-only scans. Index access methods supporting INCLUDE are indicated by amcaninclude flag in IndexAmRoutine. For now, only B-tree indexes support INCLUDE clause. In B-tree indexes INCLUDE columns are truncated from pivot index tuples (tuples located in non-leaf pages and high keys). Therefore, B-tree indexes now might have variable number of attributes. This patch also provides generic facility to support that: pivot tuples contain number of their attributes in t_tid.ip_posid. Free 13th bit of t_info is used for indicating that. This facility will simplify further support of index suffix truncation. The changes of above are backward-compatible, pg_upgrade doesn't need special handling of B-tree indexes for that. Bump catalog version Author: Anastasia Lubennikova with contribition by Alexander Korotkov and me Reviewed by: Peter Geoghegan, Tomas Vondra, Antonin Houska, Jeff Janes, David Rowley, Alexander Korotkov Discussion: https://www.postgresql.org/message-id/flat/56168952.4010101@postgrespro.ru	2018-04-07 23:00:39 +03:00
Robert Haas	3d956d9562	Allow insert and update tuple routing and COPY for foreign tables. Also enable this for postgres_fdw. Etsuro Fujita, based on an earlier patch by Amit Langote. The larger patch series of which this is a part has been reviewed by Amit Langote, David Fetter, Maksim Milyutin, Álvaro Herrera, Stephen Frost, and me. Minor documentation changes to the final version by me. Discussion: http://postgr.es/m/29906a26-da12-8c86-4fb9-d8f88442f2b9@lab.ntt.co.jp	2018-04-06 19:22:03 -04:00
Simon Riggs	01b88b4df5	MERGE minor errata	2018-04-05 13:19:13 +01:00
Simon Riggs	3af7b2b0d4	MERGE fix variable warning in non-assert builds Author: Jesper Pedersen	2018-04-05 13:02:29 +01:00
Simon Riggs	4b2d44031f	MERGE post-commit review Review comments from Andres Freund * Consolidate code into AfterTriggerGetTransitionTable() * Rename nodeMerge.c to execMerge.c * Rename nodeMerge.h to execMerge.h * Move MERGE handling in ExecInitModifyTable() into a execMerge.c ExecInitMerge() * Move mt_merge_subcommands flags into execMerge.h * Rename opt_and_condition to opt_merge_when_and_condition * Wordsmith various comments Author: Pavan Deolasee Reviewer: Simon Riggs	2018-04-05 09:54:07 +01:00
Bruce Momjian	242408dbef	C comment: mention null handling in BuildTupleFromCStrings() Discussion: https://postgr.es/m/CAFjFpRcF-wNbe0w-m3NpkEwr9shmOZ=GoESOzd2Wog9h55J8sA@mail.gmail.com Author: Ashutosh Bapat	2018-04-03 14:01:14 -04:00
Simon Riggs	83454e3c2b	New files for MERGE	2018-04-03 10:22:21 +01:00
Simon Riggs	d204ef6377	MERGE SQL Command following SQL:2016 MERGE performs actions that modify rows in the target table using a source table or query. MERGE provides a single SQL statement that can conditionally INSERT/UPDATE/DELETE rows a task that would other require multiple PL statements. e.g. MERGE INTO target AS t USING source AS s ON t.tid = s.sid WHEN MATCHED AND t.balance > s.delta THEN UPDATE SET balance = t.balance - s.delta WHEN MATCHED THEN DELETE WHEN NOT MATCHED AND s.delta > 0 THEN INSERT VALUES (s.sid, s.delta) WHEN NOT MATCHED THEN DO NOTHING; MERGE works with regular and partitioned tables, including column and row security enforcement, as well as support for row, statement and transition triggers. MERGE is optimized for OLTP and is parameterizable, though also useful for large scale ETL/ELT. MERGE is not intended to be used in preference to existing single SQL commands for INSERT, UPDATE or DELETE since there is some overhead. MERGE can be used statically from PL/pgSQL. MERGE does not yet support inheritance, write rules, RETURNING clauses, updatable views or foreign tables. MERGE follows SQL Standard per the most recent SQL:2016. Includes full tests and documentation, including full isolation tests to demonstrate the concurrent behavior. This version written from scratch in 2017 by Simon Riggs, using docs and tests originally written in 2009. Later work from Pavan Deolasee has been both complex and deep, leaving the lead author credit now in his hands. Extensive discussion of concurrency from Peter Geoghegan, with thanks for the time and effort contributed. Various issues reported via sqlsmith by Andreas Seltenreich Authors: Pavan Deolasee, Simon Riggs Reviewer: Peter Geoghegan, Amit Langote, Tomas Vondra, Simon Riggs Discussion: https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com	2018-04-03 09:28:16 +01:00
Simon Riggs	aa5877bb26	Revert "MERGE SQL Command following SQL:2016" This reverts commit `e6597dc353`.	2018-04-02 21:36:38 +01:00
Simon Riggs	7cf8a5c302	Revert "Modified files for MERGE" This reverts commit `354f13855e`.	2018-04-02 21:34:15 +01:00
Simon Riggs	354f13855e	Modified files for MERGE	2018-04-02 21:12:47 +01:00
Simon Riggs	e6597dc353	MERGE SQL Command following SQL:2016 MERGE performs actions that modify rows in the target table using a source table or query. MERGE provides a single SQL statement that can conditionally INSERT/UPDATE/DELETE rows a task that would other require multiple PL statements. e.g. MERGE INTO target AS t USING source AS s ON t.tid = s.sid WHEN MATCHED AND t.balance > s.delta THEN UPDATE SET balance = t.balance - s.delta WHEN MATCHED THEN DELETE WHEN NOT MATCHED AND s.delta > 0 THEN INSERT VALUES (s.sid, s.delta) WHEN NOT MATCHED THEN DO NOTHING; MERGE works with regular and partitioned tables, including column and row security enforcement, as well as support for row, statement and transition triggers. MERGE is optimized for OLTP and is parameterizable, though also useful for large scale ETL/ELT. MERGE is not intended to be used in preference to existing single SQL commands for INSERT, UPDATE or DELETE since there is some overhead. MERGE can be used statically from PL/pgSQL. MERGE does not yet support inheritance, write rules, RETURNING clauses, updatable views or foreign tables. MERGE follows SQL Standard per the most recent SQL:2016. Includes full tests and documentation, including full isolation tests to demonstrate the concurrent behavior. This version written from scratch in 2017 by Simon Riggs, using docs and tests originally written in 2009. Later work from Pavan Deolasee has been both complex and deep, leaving the lead author credit now in his hands. Extensive discussion of concurrency from Peter Geoghegan, with thanks for the time and effort contributed. Various issues reported via sqlsmith by Andreas Seltenreich Authors: Pavan Deolasee, Simon Riggs Reviewers: Peter Geoghegan, Amit Langote, Tomas Vondra, Simon Riggs Discussion: https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com	2018-04-02 21:04:35 +01:00
Tom Lane	0b11a674fb	Fix a boatload of typos in C comments. Justin Pryzby Discussion: https://postgr.es/m/20180331105640.GK28454@telsasoft.com	2018-04-01 15:01:28 -04:00
Andrew Dunstan	ed69864350	Small cleanups in fast default code. Problems identified by Andres Freund and Haribabu Kommi	2018-04-01 08:16:18 +09:30
Bruce Momjian	20b4323bd1	C comments: "a" <--> "an" corrections Reported-by: Michael Paquier, Abhijit Menon-Sen Discussion: https://postgr.es/m/20180305045854.GB2266@paquier.xyz Author: Michael Paquier, Abhijit Menon-Sen, me	2018-03-29 15:18:53 -04:00
Peter Eisentraut	d92bc83c48	PL/pgSQL: Nested CALL with transactions So far, a nested CALL or DO in PL/pgSQL would not establish a context where transaction control statements were allowed. This fixes that by handling CALL and DO specially in PL/pgSQL, passing the atomic/nonatomic execution context through and doing the required management around transaction boundaries. Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>	2018-03-28 13:31:27 -04:00
Andrew Dunstan	16828d5c02	Fast ALTER TABLE ADD COLUMN with a non-NULL default Currently adding a column to a table with a non-NULL default results in a rewrite of the table. For large tables this can be both expensive and disruptive. This patch removes the need for the rewrite as long as the default value is not volatile. The default expression is evaluated at the time of the ALTER TABLE and the result stored in a new column (attmissingval) in pg_attribute, and a new column (atthasmissing) is set to true. Any existing row when fetched will be supplied with the attmissingval. New rows will have the supplied value or the default and so will never need the attmissingval. Any time the table is rewritten all the atthasmissing and attmissingval settings for the attributes are cleared, as they are no longer needed. The most visible code change from this is in heap_attisnull, which acquires a third TupleDesc argument, allowing it to detect a missing value if there is one. In many cases where it is known that there will not be any (e.g. catalog relations) NULL can be passed for this argument. Andrew Dunstan, heavily modified from an original patch from Serge Rielau. Reviewed by Tom Lane, Andres Freund, Tomas Vondra and David Rowley. Discussion: https://postgr.es/m/31e2e921-7002-4c27-59f5-51f08404c858@2ndQuadrant.com	2018-03-28 10:43:52 +10:30
Tom Lane	442accc3fe	Allow memory contexts to have both fixed and variable ident strings. Originally, we treated memory context names as potentially variable in all cases, and therefore always copied them into the context header. Commit `9fa6f00b1` rethought this a little bit and invented a distinction between fixed and variable names, skipping the copy step for the former. But we can make things both simpler and more useful by instead allowing there to be two parts to a context's identification, a fixed "name" and an optional, variable "ident". The name supplied in the context create call is now required to be a compile-time-constant string in all cases, as it is never copied but just pointed to. The "ident" string, if wanted, is supplied later. This is needed because typically we want the ident to be stored inside the context so that it's cleaned up automatically on context deletion; that means it has to be copied into the context before we can set the pointer. The cost of this approach is basically just an additional pointer field in struct MemoryContextData, which isn't much overhead, and is bought back entirely in the AllocSet case by not needing a headerSize field anymore, since we no longer have to cope with variable header length. In addition, we can simplify the internal interfaces for memory context creation still further, saving a few cycles there. And it's no longer true that a custom identifier disqualifies a context from participating in aset.c's freelist scheme, so possibly there's some win on that end. All the places that were using non-compile-time-constant context names are adjusted to put the variable info into the "ident" instead. This allows more effective identification of those contexts in many cases; for example, subsidary contexts of relcache entries are now identified by both type (e.g. "index info") and relname, where before you got only one or the other. Contexts associated with PL function cache entries are now identified more fully and uniformly, too. I also arranged for plancache contexts to use the query source string as their identifier. This is basically free for CachedPlanSources, as they contained a copy of that string already. We pay an extra pstrdup to do it for CachedPlans. That could perhaps be avoided, but it would make things more fragile (since the CachedPlanSource is sometimes destroyed first). I suspect future improvements in error reporting will require CachedPlans to have a copy of that string anyway, so it's not clear that it's worth moving mountains to avoid it now. This also changes the APIs for context statistics routines so that the context-specific routines no longer assume that output goes straight to stderr, nor do they know all details of the output format. This is useful immediately to reduce code duplication, and it also allows for external code to do something with stats output that's different from printing to stderr. The reason for pushing this now rather than waiting for v12 is that it rethinks some of the API changes made by commit `9fa6f00b1`. Seems better for extension authors to endure just one round of API changes not two. Discussion: https://postgr.es/m/CAB=Je-FdtmFZ9y9REHD7VsSrnCkiBhsA4mdsLKSPauwXtQBeNA@mail.gmail.com	2018-03-27 16:46:51 -04:00
Andres Freund	32af96b2b1	JIT tuple deforming in LLVM JIT provider. Performing JIT compilation for deforming gains performance benefits over unJITed deforming from compile-time knowledge of the tuple descriptor. Fixed column widths, NOT NULLness, etc can be taken advantage of. Right now the JITed deforming is only used when deforming tuples as part of expression evaluation (and obviously only if the descriptor is known). It's likely to be beneficial in other cases, too. By default tuple deforming is JITed whenever an expression is JIT compiled. There's a separate boolean GUC controlling it, but that's expected to be primarily useful for development and benchmarking. Docs will follow in a later commit containing docs for the whole JIT feature. Author: Andres Freund Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de	2018-03-26 12:57:19 -07:00
Alvaro Herrera	555ee77a96	Handle INSERT .. ON CONFLICT with partitioned tables Commit `eb7ed3f306` enabled unique constraints on partitioned tables, but one thing that was not working properly is INSERT/ON CONFLICT. This commit introduces a new node keeps state related to the ON CONFLICT clause per partition, and fills it when that partition is about to be used for tuple routing. Author: Amit Langote, Álvaro Herrera Reviewed-by: Etsuro Fujita, Pavan Deolasee Discussion: https://postgr.es/m/20180228004602.cwdyralmg5ejdqkq@alvherre.pgsql	2018-03-26 10:43:54 -03:00
Tom Lane	3a2cb59887	Remove useless if-test. Coverity complained that this check is pointless, and it's right. There is no case where we'd call ExecutorStart with a null plannedstmt, and if we did, it'd have crashed before here. Thinko in commit `cc415a56d`.	2018-03-25 14:54:16 -04:00
Andres Freund	2a0faed9d7	Add expression compilation support to LLVM JIT provider. In addition to the interpretation of expressions (which back evaluation of WHERE clauses, target list projection, aggregates transition values etc) support compiling expressions to native code, using the infrastructure added in earlier commits. To avoid duplicating a lot of code, only support emitting code for cases that are likely to be performance critical. For expression steps that aren't deemed that, use the existing interpreter. The generated code isn't great - some architectural changes are required to address that. But this already yields a significant speedup for some analytics queries, particularly with WHERE clauses filtering a lot, or computing multiple aggregates. Author: Andres Freund Tested-By: Thomas Munro Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de Disable JITing for VALUES() nodes. VALUES() nodes are only ever executed once. This is primarily helpful for debugging, when forcing JITing even for cheap queries. Author: Andres Freund Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de	2018-03-22 14:45:59 -07:00
Andres Freund	cc415a56d0	Basic planner and executor integration for JIT. This adds simple cost based plan time decision about whether JIT should be performed. jit_above_cost, jit_optimize_above_cost are compared with the total cost of a plan, and if the cost is above them JIT is performed / optimization is performed respectively. For that PlannedStmt and EState have a jitFlags (es_jit_flags) field that stores information about what JIT operations should be performed. EState now also has a new es_jit field, which can store a JitContext. When there are no errors the context is released in standard_ExecutorEnd(). It is likely that the default values for jit_[optimize_]above_cost will need to be adapted further, but in my test these values seem to work reasonably. Author: Andres Freund, with feedback by Peter Eisentraut Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de	2018-03-22 11:51:58 -07:00
Andres Freund	4c0000b839	Handle EEOP_FUNCEXPR_[STRICT_]FUSAGE out of line. This isn't a very common op, and it doesn't seem worth duplicating for JIT. Author: Andres Freund	2018-03-20 17:32:21 -07:00
Alvaro Herrera	ee0a1fc84e	Remove unnecessary members from ModifyTableState and ExecInsert These values can be obtained from the ModifyTable node which is already a part of both the ModifyTableState and ExecInsert. Author: Álvaro Herrera, Amit Langote Reviewed-by: Peter Geoghegan Discussion: https://postgr.es/m/20180316151303.rml2p5wffn3o6qy6@alvherre.pgsql	2018-03-19 18:09:43 -03:00
Alvaro Herrera	839a8eb2b3	Expand comment a little bit The previous commit removed a comment that was a bit more verbose than its replacement.	2018-03-19 18:01:27 -03:00
Alvaro Herrera	6666ee49f4	Fix state reversal after partition tuple routing We make some changes to ModifyTableState and the EState it uses whenever we route tuples to partitions; but we weren't restoring properly in all cases, possibly causing crashes when partitions with different tuple descriptors are targeted by tuples inserted in the same command. Refactor some code, creating ExecPrepareTupleRouting, to encapsulate the needed state changing logic, and have it invoked one level above its current place (ie. put it in ExecModifyTable instead of ExecInsert); this makes it all more readable. Add a test case to exercise this. We don't support having views as partitions; and since only views can have INSTEAD OF triggers, there is no point in testing for INSTEAD OF when processing insertions into a partitioned table. Remove code that appears to support this (but which is actually never relevant.) In passing, fix location of some very confusing comments in ModifyTableState. Reported-by: Amit Langote Author: Etsuro Fujita, Amit Langote Discussion: https://postgr/es/m/0473bf5c-57b1-f1f7-3d58-455c2230bc5f@lab.ntt.co.jp	2018-03-19 17:45:53 -03:00
Tom Lane	8f5ac44043	Fix WHERE CURRENT OF when the referenced cursor uses an index-only scan. "UPDATE/DELETE WHERE CURRENT OF cursor_name" failed, with an error message like "cannot extract system attribute from virtual tuple", if the cursor was using a index-only scan for the target table. Fix it by digging the current TID out of the indexscan state. It seems likely that the same failure could occur for CustomScan plans and perhaps some FDW plan types, so that leaving this to be treated as an internal error with an obscure message isn't as good an idea as it first seemed. Hence, add a bit of heaptuple.c infrastructure to let us deliver a more on-topic message. I chose to make the message match what you get for the case where execCurrentOf can't identify the target scan node at all, "cursor "foo" is not a simply updatable scan of table "bar"". Perhaps it should be different, but we can always adjust that later. In the future, it might be nice to provide hooks that would let custom scan providers and/or FDWs deal with this in other ways; but that's not a suitable topic for a back-patchable bug fix. It's been like this all along, so back-patch to all supported branches. Yugo Nagata and Tom Lane Discussion: https://postgr.es/m/20180201013349.937dfc5f.nagata@sraoss.co.jp	2018-03-17 14:59:49 -04:00
Tom Lane	9e17bdb8a5	Fix query-lifespan memory leakage in repeatedly executed hash joins. ExecHashTableCreate allocated some memory that wasn't freed by ExecHashTableDestroy, specifically the per-hash-key function information. That's not a huge amount of data, but if one runs a query that repeats a hash join enough times, it builds up. Fix by arranging for the data in question to be kept in the hashtable's hashCxt instead of leaving it "loose" in the query-lifespan executor context. (This ensures that we'll also clean up anything that the hash functions allocate in fn_mcxt.) Per report from Amit Khandekar. It's been like this forever, so back-patch to all supported branches. Discussion: https://postgr.es/m/CAJ3gD9cFofAWGvcxLOxDHC=B0hjtW8yGmUsF2hdGh97CM38=7g@mail.gmail.com	2018-03-16 16:03:45 -04:00
Peter Eisentraut	33803f67f1	Support INOUT arguments in procedures In a top-level CALL, the values of INOUT arguments will be returned as a result row. In PL/pgSQL, the values are assigned back to the input arguments. In other languages, the same convention as for return a record from a function is used. That does not require any code changes in the PL implementations. Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>	2018-03-14 12:07:28 -04:00
Stephen Frost	97d18ce27d	Fix comment for ExecProcessReturning resultRelInfo is the argument for the function, not projectReturning. Author: Etsuro Fujita Discussion: https://postgr.es/m/5AA8E11E.1040609@lab.ntt.co.jp	2018-03-14 09:28:08 -04:00
Andres Freund	d06aba240d	Fix parent node of WCO expressions in partitioned tables. Since `edd44738bc` WCO expressions of partitioned tables are initialized with the first subplan as parent. That's not correct, as the correct context is the ModifyTableState node. That's also what is used for RETURNING processing, initialized nearby. This appears not to cause any visible problems for in core code, but is problematic for in development patch. Discussion: https://postgr.es/m/20180303043818.tnvlo243bgy7una3@alap3.anarazel.de	2018-03-05 17:49:59 -08:00
Peter Eisentraut	fd1a421fe6	Add prokind column, replacing proisagg and proiswindow The new column distinguishes normal functions, procedures, aggregates, and window functions. This replaces the existing columns proisagg and proiswindow, and replaces the convention that procedures are indicated by prorettype == 0. Also change prorettype to be VOIDOID for procedures. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Michael Paquier <michael@paquier.xyz>	2018-03-02 13:48:33 -05:00
Robert Haas	ce1663cdcd	Fix assertion failure when Parallel Append is run serially. Parallel-aware plan nodes must be prepared to run without parallelism if it's not possible at execution time for whatever reason. Commit `ab72716778`, which introduced Parallel Append, overlooked this. Rajkumar Raghuwanshi reported this problem, and I included his test case in this patch. The code changes are by me. Discussion: http://postgr.es/m/CAKcux6=WqkUudLg1GLZZ7fc5ScWC1+Y9qD=pAHeqy32WoeJQvw@mail.gmail.com	2018-02-28 10:58:27 -05:00
Tom Lane	e98a4de7d2	Use the correct tuplestore read pointer in a NamedTuplestoreScan. Tom Kazimiers reported that transition tables don't work correctly when they are scanned by more than one executor node. That's because commit `18ce3a4ab` allocated separate read pointers for each executor node, as it must, but failed to make them active at the appropriate times. Repair. Thomas Munro Discussion: https://postgr.es/m/20180224034748.bixarv6632vbxgeb%40dewberry.localdomain	2018-02-27 15:56:51 -05:00
Tom Lane	9fe802c818	Fix brown-paper-bag bug in commit `0a459cec96`. RANGE_OFFSET comparisons need to examine the first ORDER BY column, which isn't necessarily the first column in the incoming tuples. No idea how this slipped through initial testing. Per bug #15082 from Zhou Digoal. Discussion: https://postgr.es/m/151939899974.1461.9411971793110285476@wrigleys.postgresql.org	2018-02-23 15:11:40 -05:00
Robert Haas	edd44738bc	Be lazier about partition tuple routing. It's not necessary to fully initialize the executor data structures for partitions to which no tuples are ever routed. Consider, for example, an INSERT statement that inserts only one row: it only cares about the partition to which that one row is routed. The new function ExecInitPartitionInfo performs the initialization in question only when a particular partition is about to receive a tuple. This includes creating, validating, and saving a pointer to the ResultRelInfo, setting up for speculative insertions, translating WCOs and initializing the resulting expressions, translating returning lists and building the appropriate projection information, and setting up a tuple conversion map. One thing that's not deferred is locking the child partitions; that seems desirable but would need more thought. Still, testing shows that this makes single-row inserts significantly faster on a table with many partitions without harming the bulk-insert case. Amit Langote, reviewed by Etsuro Fujita, with a few changes by me Discussion: http://postgr.es/m/8975331d-d961-cbdd-f862-fdd3d97dc2d0@lab.ntt.co.jp	2018-02-22 10:55:54 -05:00
Robert Haas	810e7e264a	Remove extra word from comment. Etsuro Fujita Discussion: http://postgr.es/m/5A8EAF74.5010905@lab.ntt.co.jp	2018-02-22 10:08:03 -05:00
Tom Lane	159efe4af4	Fix misbehavior of CTE-used-in-a-subplan during EPQ rechecks. An updating query that reads a CTE within an InitPlan or SubPlan could get incorrect results if it updates rows that are concurrently being modified. This is caused by CteScanNext supposing that nothing inside its recursive ExecProcNode call could change which read pointer is selected in the CTE's shared tuplestore. While that's normally true because of scoping considerations, it can break down if an EPQ plan tree gets built during the call, because EvalPlanQualStart builds execution trees for all subplans whether they're going to be used during the recheck or not. And it seems like a pretty shaky assumption anyway, so let's just reselect our own read pointer here. Per bug #14870 from Andrei Gorita. This has been broken since CTEs were implemented, so back-patch to all supported branches. Discussion: https://postgr.es/m/20171024155358.1471.82377@wrigleys.postgresql.org	2018-02-19 16:00:31 -05:00
Tom Lane	8c44802b6e	Remove redundant initialization of a local variable. In what was doubtless a typo, commit `bf6c614a2` introduced a duplicate initialization of a local variable. This made Coverity unhappy, as well as pretty much anybody reading the code. We don't even have a real use for the local variable, so just remove it.	2018-02-18 23:32:56 -05:00
Andres Freund	ad7dbee368	Allow tupleslots to have a fixed tupledesc, use in executor nodes. The reason for doing so is that it will allow expression evaluation to optimize based on the underlying tupledesc. In particular it will allow to JIT tuple deforming together with the expression itself. For that expression initialization needs to be moved after the relevant slots are initialized - mostly unproblematic, except in the case of nodeWorktablescan.c. After doing so there's no need for ExecAssignResultType() and ExecAssignResultTypeFromTL() anymore, as all former callers have been converted to create a slot with a fixed descriptor. When creating a slot with a fixed descriptor, tts_values/isnull can be allocated together with the main slot, reducing allocation overhead and increasing cache density a bit. Author: Andres Freund Discussion: https://postgr.es/m/20171206093717.vqdxe5icqttpxs3p@alap3.anarazel.de	2018-02-16 21:17:38 -08:00
Andres Freund	bf6c614a2f	Do execGrouping.c via expression eval machinery, take two. This has a performance benefit on own, although not hugely so. The primary benefit is that it will allow for to JIT tuple deforming and comparator invocations. Large parts of this were previously committed (`773aec7aa`), but the commit contained an omission around cross-type comparisons and was thus reverted. Author: Andres Freund Discussion: https://postgr.es/m/20171129080934.amqqkke2zjtekd4t@alap3.anarazel.de	2018-02-16 14:38:13 -08:00
Andres Freund	2a41507dab	Revert "Do execGrouping.c via expression eval machinery." This reverts commit `773aec7aa9`. There's an unresolved issue in the reverted commit: It only creates one comparator function, but in for the nodeSubplan.c case we need more (c.f. FindTupleHashEntry vs LookupTupleHashEntry calls in nodeSubplan.c). This isn't too difficult to fix, but it's not entirely trivial either. The fact that the issue only causes breakage on 32bit systems shows that the current test coverage isn't that great. To avoid turning half the buildfarm red till those two issues are addressed, revert.	2018-02-15 22:39:18 -08:00
Andres Freund	773aec7aa9	Do execGrouping.c via expression eval machinery. This has a performance benefit on own, although not hugely so. The primary benefit is that it will allow for to JIT tuple deforming and comparator invocations. Author: Andres Freund Discussion: https://postgr.es/m/20171129080934.amqqkke2zjtekd4t@alap3.anarazel.de	2018-02-15 21:55:31 -08:00
Tom Lane	4b93f57999	Make plpgsql use its DTYPE_REC code paths for composite-type variables. Formerly, DTYPE_REC was used only for variables declared as "record"; variables of named composite types used DTYPE_ROW, which is faster for some purposes but much less flexible. In particular, the ROW code paths are entirely incapable of dealing with DDL-caused changes to the number or data types of the columns of a row variable, once a particular plpgsql function has been parsed for the first time in a session. And, since the stored representation of a ROW isn't a tuple, there wasn't any easy way to deal with variables of domain-over-composite types, since the domain constraint checking code would expect the value to be checked to be a tuple. A lesser, but still real, annoyance is that ROW format cannot represent a true NULL composite value, only a row of per-field NULL values, which is not exactly the same thing. Hence, switch to using DTYPE_REC for all composite-typed variables, whether "record", named composite type, or domain over named composite type. DTYPE_ROW remains but is used only for its native purpose, to represent a fixed-at-compile-time list of variables, for instance the targets of an INTO clause. To accomplish this without taking significant performance losses, introduce infrastructure that allows storing composite-type variables as "expanded objects", similar to the "expanded array" infrastructure introduced in commit `1dc5ebc90`. A composite variable's value is thereby kept (most of the time) in the form of separate Datums, so that field accesses and updates are not much more expensive than they were in the ROW format. This holds the line, more or less, on performance of variables of named composite types in field-access-intensive microbenchmarks, and makes variables declared "record" perform much better than before in similar tests. In addition, the logic involved with enforcing composite-domain constraints against updates of individual fields is in the expanded record infrastructure not plpgsql proper, so that it might be reusable for other purposes. In further support of this, introduce a typcache feature for assigning a unique-within-process identifier to each distinct tuple descriptor of interest; in particular, DDL alterations on composite types result in a new identifier for that type. This allows very cheap detection of the need to refresh tupdesc-dependent data. This improves on the "tupDescSeqNo" idea I had in commit 687f096ea: that assigned identifying sequence numbers to successive versions of individual composite types, but the numbers were not unique across different types, nor was there support for assigning numbers to registered record types. In passing, allow plpgsql functions to accept as well as return type "record". There was no good reason for the old restriction, and it was out of step with most of the other PLs. Tom Lane, reviewed by Pavel Stehule Discussion: https://postgr.es/m/8962.1514399547@sss.pgh.pa.us	2018-02-13 18:52:21 -05:00
Robert Haas	e44dd84325	Avoid listing the same ResultRelInfo in more than one EState list. Doing so causes EXPLAIN ANALYZE to show trigger statistics multiple times. Commit `2f17844104` seems to be to blame for this. Amit Langote, revieed by Amit Khandekar, Etsuro Fujita, and me.	2018-02-08 14:29:05 -05:00
Robert Haas	88fdc70060	Fix possible infinite loop with Parallel Append. When the previously-chosen plan was non-partial, all pa_finished flags for partial plans are now set, and pa_next_plan has not yet been set to INVALID_SUBPLAN_INDEX, the previous code could go into an infinite loop. Report by Rajkumar Raghuwanshi. Patch by Amit Khandekar and me. Review by Kyotaro Horiguchi. Discussion: http://postgr.es/m/CAJ3gD9cf43z78qY=U=H0HvOEN341qfRO-vLpnKPSviHeWgJQ5w@mail.gmail.com	2018-02-08 12:31:48 -05:00
Tom Lane	0a459cec96	Support all SQL:2011 options for window frame clauses. This patch adds the ability to use "RANGE offset PRECEDING/FOLLOWING" frame boundaries in window functions. We'd punted on that back in the original patch to add window functions, because it was not clear how to do it in a reasonably data-type-extensible fashion. That problem is resolved here by adding the ability for btree operator classes to provide an "in_range" support function that defines how to add or subtract the RANGE offset value. Factoring it this way also allows the operator class to avoid overflow problems near the ends of the datatype's range, if it wishes to expend effort on that. (In the committed patch, the integer opclasses handle that issue, but it did not seem worth the trouble to avoid overflow failures for datetime types.) The patch includes in_range support for the integer_ops opfamily (int2/int4/int8) as well as the standard datetime types. Support for other numeric types has been requested, but that seems like suitable material for a follow-on patch. In addition, the patch adds GROUPS mode which counts the offset in ORDER-BY peer groups rather than rows, and it adds the frame_exclusion options specified by SQL:2011. As far as I can see, we are now fully up to spec on window framing options. Existing behaviors remain unchanged, except that I changed the errcode for a couple of existing error reports to meet the SQL spec's expectation that negative "offset" values should be reported as SQLSTATE 22013. Internally and in relevant parts of the documentation, we now consistently use the terminology "offset PRECEDING/FOLLOWING" rather than "value PRECEDING/FOLLOWING", since the term "value" is confusingly vague. Oliver Ford, reviewed and whacked around some by me Discussion: https://postgr.es/m/CAGMVOdu9sivPAxbNN0X+q19Sfv9edEPv=HibOJhB14TJv_RCQg@mail.gmail.com	2018-02-07 00:06:56 -05:00
Robert Haas	2320945731	Fix incorrect grammar. Etsuro Fujita Discussion: http://postgr.es/m/5A7981EA.8020201@lab.ntt.co.jp	2018-02-06 15:50:13 -05:00
Tom Lane	05d0f13f07	Skip setting up shared instrumentation for Hash node if not needed. We don't need to set up the shared space for hash join instrumentation data if instrumentation hasn't been requested. Let's follow the example of the similar Sort node code and save a few cycles by skipping that when we can. This reverts commit `d59ff4ab3` and instead allows us to use the safer choice of passing noError = false to shm_toc_lookup in ExecHashInitializeWorker, since if we reach that call there should be a TOC entry to be found. Thomas Munro Discussion: https://postgr.es/m/E1ehkoZ-0005uW-43%40gemulon.postgresql.org	2018-02-04 22:14:07 -05:00
Tom Lane	d59ff4ab31	Fix another instance of unsafe coding for shm_toc_lookup failure. One or another author of commit `5bcf389ec` seems to have thought that computing an offset from a NULL pointer would yield another NULL pointer. There may possibly be architectures where that works, but common machines don't work like that. Per a quick code review of places calling shm_toc_lookup and not using noError = false.	2018-02-02 18:32:05 -05:00
Robert Haas	9da0cc3528	Support parallel btree index builds. To make this work, tuplesort.c and logtape.c must also support parallelism, so this patch adds that infrastructure and then applies it to the particular case of parallel btree index builds. Testing to date shows that this can often be 2-3x faster than a serial index build. The model for deciding how many workers to use is fairly primitive at present, but it's better than not having the feature. We can refine it as we get more experience. Peter Geoghegan with some help from Rushabh Lathia. While Heikki Linnakangas is not an author of this patch, he wrote other patches without which this feature would not have been possible, and therefore the release notes should possibly credit him as an author of this feature. Reviewed by Claudio Freire, Heikki Linnakangas, Thomas Munro, Tels, Amit Kapila, me. Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com	2018-02-02 13:32:44 -05:00
Robert Haas	9222c0d9ed	Add new function WaitForParallelWorkersToAttach. Once this function has been called, we know that all workers have started and attached to their error queues -- so if any of them subsequently exit uncleanly, we'll be sure to throw an ERROR promptly. Otherwise, users of the ParallelContext machinery must be careful not to wait forever for a worker that has failed to start. Parallel query manages to work without needing this for reasons explained in new comments added by this patch, but it's a useful primitive for other parallel operations, such as the pending patch to make creating a btree index run in parallel. Amit Kapila, revised by me. Additional review by Peter Geoghegan. Discussion: http://postgr.es/m/CAA4eK1+e2MzyouF5bg=OtyhDSX+=Ao=3htN=T-r_6s3gCtKFiw@mail.gmail.com	2018-02-02 09:00:59 -05:00
Andres Freund	c12693d8f3	Introduce ExecQualAndReset() helper. It's a common task to evaluate a qual and reset the corresponding expression context. Currently that requires storing the result of the qual eval, resetting the context, and then reacting on the result. As that's awkward several places only reset the context next time through a node. That's not great, so introduce a helper that evaluates and resets. It's a bit ugly that it currently uses MemoryContextReset() instead of ResetExprContext(), but that seems easier than reordering all of executor.h. Author: Andres Freund Discussion: https://postgr.es/m/20180109222544.f7loxrunqh3xjl5f@alap3.anarazel.de	2018-01-29 12:19:12 -08:00
Andres Freund	fc96c69425	Initialize unused ExprEvalStep fields. ExecPushExprSlots didn't initialize ExprEvalStep's resvalue/resnull steps as it didn't use them. That caused wrong valgrind warnings for an upcoming patch, so zero-intialize. Also zero-initialize all scratch ExprEvalStep's allocated on the stack, to avoid issues with similar future omissions of non-critial data.	2018-01-29 12:01:07 -08:00
Andres Freund	c068f87723	Improve bit perturbation in TupleHashTableHash. The changes in `b81b5a96f4` did not fully address the issue, because the bit-mixing of the IV into the final hash-key didn't prevent clustering in the input-data survive in the output data. This didn't cause a lot of problems because of the additional growth conditions added `d4c62a6b62`. But as we want to rein those in due to explosive growth in some edges, this needs to be fixed. Author: Andres Freund Discussion: https://postgr.es/m/20171127185700.1470.20362@wrigleys.postgresql.org Backpatch: 10, where simplehash was introduced	2018-01-29 11:24:57 -08:00
Tom Lane	2e668c522e	Avoid crash during EvalPlanQual recheck of an inner indexscan. Commit `09529a70b` changed nodeIndexscan.c and nodeIndexonlyscan.c to postpone initialization of the indexscan proper until the first tuple fetch. It overlooked the question of mark/restore behavior, which means that if some caller attempts to mark the scan before the first tuple fetch, you get a null pointer dereference. The only existing user of mark/restore is nodeMergejoin.c, which (somewhat accidentally) will never attempt to set a mark before the first inner tuple unless the inner child node is a Material node. Hence the case can't arise normally, so it seems sufficient to document the assumption at both ends. However, during an EvalPlanQual recheck, ExecScanFetch doesn't call IndexNext but just returns the jammed-in test tuple. Therefore, if we're doing a recheck in a plan tree with a mergejoin with inner indexscan, it's possible to reach ExecIndexMarkPos with iss_ScanDesc still null, as reported by Guo Xiang Tan in bug #15032. Really, when there's a test tuple supplied during an EPQ recheck, touching the index at all is the wrong thing: rather, the behavior of mark/restore ought to amount to saving and restoring the es_epqScanDone flag. We can avoid finding a place to actually save the flag, for the moment, because given the assumption that no caller will set a mark before fetching a tuple, es_epqScanDone must always be set by the time we try to mark. So the actual behavior change required is just to not reach the index access if a test tuple is supplied. The set of plan node types that need to consider this issue are those that support EPQ test tuples (i.e., call ExecScan()) and also support mark/restore; which is to say, IndexScan, IndexOnlyScan, and perhaps CustomScan. It's tempting to try to fix the problem in one place by teaching ExecMarkPos() itself about EPQ; but ExecMarkPos supports some plan types that aren't Scans, and also it seems risky to make assumptions about what a CustomScan wants to do here. Also, the most likely future change here is to decide that we do need to support marks placed before the first tuple, which would require additional work in IndexScan and IndexOnlyScan in any case. Hence, fix the EPQ issue in nodeIndexscan.c and nodeIndexonlyscan.c, accepting the small amount of code duplicated thereby, and leave it to CustomScan providers to fix this bug if they have it. Back-patch to v10 where commit `09529a70b` came in. In earlier branches, the index_markpos() call is a waste of cycles when EPQ is active, but no more than that, so it doesn't seem appropriate to back-patch further. Discussion: https://postgr.es/m/20180126074932.3098.97815@wrigleys.postgresql.org	2018-01-27 13:52:24 -05:00
Tom Lane	bb415675d8	Add missing "static" markers. Per buildfarm.	2018-01-25 14:32:28 -05:00
Robert Haas	945f71db84	Avoid referencing off the end of subplan_partition_offsets. Report by buildfarm member skink and Tom Lane. Analysis by me. Patch by Amit Khandekar. Discussion: http://postgr.es/m/CAJ3gD9fVA1iXQYhfqHP5n_TEd4U9=V8TL_cc-oKRnRmxgdvJrQ@mail.gmail.com	2018-01-24 16:34:51 -05:00
Tom Lane	434e6e1484	Improve implementation of pg_attribute_always_inline. Avoid compiler warnings on MSVC (which doesn't want to see both __forceinline and inline) and ancient GCC (which doesn't have __attribute__((always_inline))). Don't force inline-ing when building at -O0, as the programmer is probably hoping for exact source-to-object-line correspondence in that case. (For the moment this only works for GCC; maybe we can extend it later.) Make pg_attribute_always_inline be syntactically a drop-in replacement for inline, rather than an additional wart. And improve the comments. Thomas Munro and Michail Nikolaev, small tweaks by me Discussion: https://postgr.es/m/32278.1514863068@sss.pgh.pa.us Discussion: https://postgr.es/m/CANtu0oiYp74brgntKOxgg1FK5+t8uQ05guSiFU6FYz_5KUhr6Q@mail.gmail.com	2018-01-23 23:07:13 -05:00
Peter Eisentraut	8561e4840c	Transaction control in PL procedures In each of the supplied procedural languages (PL/pgSQL, PL/Perl, PL/Python, PL/Tcl), add language-specific commit and rollback functions/commands to control transactions in procedures in that language. Add similar underlying functions to SPI. Some additional cleanup so that transaction commit or abort doesn't blow away data structures still used by the procedure call. Add execution context tracking to CALL and DO statements so that transaction control commands can only be issued in top-level procedure and block calls, not function calls or other procedure or block calls. - SPI Add a new function SPI_connect_ext() that is like SPI_connect() but allows passing option flags. The only option flag right now is SPI_OPT_NONATOMIC. A nonatomic SPI connection can execute transaction control commands, otherwise it's not allowed. This is meant to be passed down from CALL and DO statements which themselves know in which context they are called. A nonatomic SPI connection uses different memory management. A normal SPI connection allocates its memory in TopTransactionContext. For nonatomic connections we use PortalContext instead. As the comment in SPI_connect_ext() (previously SPI_connect()) indicates, one could potentially use PortalContext in all cases, but it seems safest to leave the existing uses alone, because this stuff is complicated enough already. SPI also gets new functions SPI_start_transaction(), SPI_commit(), and SPI_rollback(), which can be used by PLs to implement their transaction control logic. - portalmem.c Some adjustments were made in the code that cleans up portals at transaction abort. The portal code could already handle a command committing a transaction and continuing (e.g., VACUUM), but it was not quite prepared for a command aborting a transaction and continuing. In AtAbort_Portals(), remove the code that marks an active portal as failed. As the comment there already predicted, this doesn't work if the running command wants to keep running after transaction abort. And it's actually not necessary, because pquery.c is careful to run all portal code in a PG_TRY block and explicitly runs MarkPortalFailed() if there is an exception. So the code in AtAbort_Portals() is never used anyway. In AtAbort_Portals() and AtCleanup_Portals(), we need to be careful not to clean up active portals too much. This mirrors similar code in PreCommit_Portals(). - PL/Perl Gets new functions spi_commit() and spi_rollback() - PL/pgSQL Gets new commands COMMIT and ROLLBACK. Update the PL/SQL porting example in the documentation to reflect that transactions are now possible in procedures. - PL/Python Gets new functions plpy.commit and plpy.rollback. - PL/Tcl Gets new commands commit and rollback. Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>	2018-01-22 08:43:06 -05:00
Robert Haas	2f17844104	Allow UPDATE to move rows between partitions. When an UPDATE causes a row to no longer match the partition constraint, try to move it to a different partition where it does match the partition constraint. In essence, the UPDATE is split into a DELETE from the old partition and an INSERT into the new one. This can lead to surprising behavior in concurrency scenarios because EvalPlanQual rechecks won't work as they normally did; the known problems are documented. (There is a pending patch to improve the situation further, but it needs more review.) Amit Khandekar, reviewed and tested by Amit Langote, David Rowley, Rajkumar Raghuwanshi, Dilip Kumar, Amul Sul, Thomas Munro, Álvaro Herrera, Amit Kapila, and me. A few final revisions by me. Discussion: http://postgr.es/m/CAJ3gD9do9o2ccQ7j7+tSgiE1REY65XRiMb=yJO3u3QhyP8EEPQ@mail.gmail.com	2018-01-19 15:33:06 -05:00

1 2 3 4 5 ...

1966 Commits