postgresql

Commit Graph

Author	SHA1	Message	Date
Robert Haas	e5253fdc4f	Add parallel_leader_participation GUC. Sometimes, for testing, it's useful to have the leader do nothing but read tuples from workers; and it's possible that could work out better even in production. Thomas Munro, reviewed by Amit Kapila and by me. A few final tweaks by me. Discussion: http://postgr.es/m/CAEepm=2U++Lp3bNTv2Bv_kkr5NE2pOyHhxU=G0YTa4ZhSYhHiw@mail.gmail.com	2017-11-15 08:23:18 -05:00
Robert Haas	5edc63bda6	Account for the effect of lossy pages when costing bitmap scans. Dilip Kumar, reviewed by Alexander Kuzmenkov, Amul Sul, and me. Some final adjustments by me. Discussion: http://postgr.es/m/CAFiTN-sYtqUOXQ4SpuhTv0Z9gD0si3YxZGv_PQAAMX8qbOotcg@mail.gmail.com	2017-11-10 16:50:50 -05:00
Tom Lane	7b6c075471	Teach planner to account for HAVING quals in aggregation plan nodes. For some reason, we have never accounted for either the evaluation cost or the selectivity of filter conditions attached to Agg and Group nodes (which, in practice, are always conditions from a HAVING clause). Applying our regular selectivity logic to post-grouping conditions is a bit bogus, but it's surely better than taking the selectivity as 1.0. Perhaps someday the extended-statistics mechanism can be taught to provide statistics that would help us in getting non-default estimates here. Per a gripe from Benjamin Coutu. This is surely a bug fix, but I'm hesitant to back-patch because of the prospect of destabilizing existing plan choices. Given that it took us this long to notice the bug, it's probably not hurting too many people in the field. Discussion: https://postgr.es/m/20968.1509486337@sss.pgh.pa.us	2017-11-02 11:24:12 -04:00
Robert Haas	f49842d1ee	Basic partition-wise join functionality. Instead of joining two partitioned tables in their entirety we can, if it is an equi-join on the partition keys, join the matching partitions individually. This involves teaching the planner about "other join" rels, which are related to regular join rels in the same way that other member rels are related to baserels. This can use significantly more CPU time and memory than regular join planning, because there may now be a set of "other" rels not only for every base relation but also for every join relation. In most practical cases, this probably shouldn't be a problem, because (1) it's probably unusual to join many tables each with many partitions using the partition keys for all joins and (2) if you do that scenario then you probably have a big enough machine to handle the increased memory cost of planning and (3) the resulting plan is highly likely to be better, so what you spend in planning you'll make up on the execution side. All the same, for now, turn this feature off by default. Currently, we can only perform joins between two tables whose partitioning schemes are absolutely identical. It would be nice to cope with other scenarios, such as extra partitions on one side or the other with no match on the other side, but that will have to wait for a future patch. Ashutosh Bapat, reviewed and tested by Rajkumar Raghuwanshi, Amit Langote, Rafia Sabih, Thomas Munro, Dilip Kumar, Antonin Houska, Amit Khandekar, and by me. A few final adjustments by me. Discussion: http://postgr.es/m/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com Discussion: http://postgr.es/m/CAFjFpRcitjfrULr5jfuKWRPsGUX0LQ0k8-yG0Qw2+1LBGNpMdw@mail.gmail.com	2017-10-06 11:11:10 -04:00
Tom Lane	c12d570fa1	Support arrays over domains. Allowing arrays with a domain type as their element type was left un-done in the original domain patch, but not for any very good reason. This omission leads to such surprising results as array_agg() not working on a domain column, because the parser can't identify a suitable output type for the polymorphic aggregate. In order to fix this, first clean up the APIs of coerce_to_domain() and some internal functions in parse_coerce.c so that we consistently pass around a CoercionContext along with CoercionForm. Previously, we sometimes passed an "isExplicit" boolean flag instead, which is strictly less information; and coerce_to_domain() didn't even get that, but instead had to reverse-engineer isExplicit from CoercionForm. That's contrary to the documentation in primnodes.h that says that CoercionForm only affects display and not semantics. I don't think this change fixes any live bugs, but it makes things more consistent. The main reason for doing it though is that now build_coercion_expression() receives ccontext, which it needs in order to be able to recursively invoke coerce_to_target_type(). Next, reimplement ArrayCoerceExpr so that the node does not directly know any details of what has to be done to the individual array elements while performing the array coercion. Instead, the per-element processing is represented by a sub-expression whose input is a source array element and whose output is a target array element. This simplifies life in parse_coerce.c, because it can build that sub-expression by a recursive invocation of coerce_to_target_type(). The executor now handles the per-element processing as a compiled expression instead of hard-wired code. The main advantage of this is that we can use a single ArrayCoerceExpr to handle as many as three successive steps per element: base type conversion, typmod coercion, and domain constraint checking. 
The old code used two stacked ArrayCoerceExprs to handle type + typmod coercion, which was pretty inefficient, and adding yet another array deconstruction to do domain constraint checking seemed very unappetizing. In the case where we just need a single, very simple coercion function, doing this straightforwardly leads to a noticeable increase in the per-array-element runtime cost. Hence, add an additional shortcut evalfunc in execExprInterp.c that skips unnecessary overhead for that specific form of expression. The runtime speed of simple cases is within 1% or so of where it was before, while cases that previously required two levels of array processing are significantly faster. Finally, create an implicit array type for every domain type, as we do for base types, enums, etc. Everything except the array-coercion case seems to just work without further effort. Tom Lane, reviewed by Andrew Dunstan Discussion: https://postgr.es/m/9852.1499791473@sss.pgh.pa.us	2017-09-30 13:40:56 -04:00
Andrew Dunstan	28ae524bbf	Quieten warnings about unused variables. These variables are only ever written to in assertion-enabled builds, and the latest Microsoft compilers complain about such variables in non-assertion-enabled builds. Apparently they don't worry so much about variables that are written to but not read from, so most of our PG_USED_FOR_ASSERTS_ONLY variables don't cause the problem. Discussion: https://postgr.es/m/7800.1505950322@sss.pgh.pa.us	2017-09-21 08:41:14 -04:00
Tom Lane	4867d7f62f	Avoid out-of-memory in a hash join with many duplicate inner keys. The executor is capable of splitting buckets during a hash join if too much memory is being used by a small number of buckets. However, this only helps if a bucket's population is actually divisible; if all the hash keys are alike, the tuples still end up in the same new bucket. This can result in an OOM failure if there are enough inner keys with identical hash values. The planner's cost estimates will bias it against choosing a hash join in such situations, but not by so much that it will never do so. To mitigate the OOM hazard, explicitly estimate the hash bucket space needed by just the inner side's most common value, and if that would exceed work_mem then add disable_cost to the hash cost estimate. This approach doesn't account for the possibility that two or more common values would share the same hash value. On the other hand, work_mem is normally a fairly conservative bound, so that eating two or more times that much space is probably not going to kill us. If we have no stats about the inner side, ignore this consideration. There was some discussion of making a conservative assumption, but that would effectively result in disabling hash join whenever we lack stats, which seems like an overreaction given how seldom the problem manifests in the field. Per a complaint from David Hinkle. Although this could be viewed as a bug fix, the lack of similar complaints weighs against back-patching; indeed we waited for v11 because it seemed already rather late in the v10 cycle to be making plan choice changes like this one. Discussion: https://postgr.es/m/32013.1487271761@sss.pgh.pa.us	2017-08-15 14:05:53 -04:00
Tom Lane	decb08ebdf	Code review for NextValueExpr expression node type. Add missing infrastructure for this node type, notably in ruleutils.c where its lack could demonstrably cause EXPLAIN to fail. Add outfuncs/readfuncs support. (outfuncs support is useful today for debugging purposes. The readfuncs support may never be needed, since at present it would only matter for parallel query and NextValueExpr should never appear in a parallelizable query; but it seems like a bad idea to have a primnode type that isn't fully supported here.) Teach planner infrastructure that NextValueExpr is a volatile, parallel-unsafe, non-leaky expression node with cost cpu_operator_cost. Given its limited scope of usage, there might be no live bug today from the lack of that knowledge, but it's certainly going to bite us on the rear someday. Teach pg_stat_statements about the new node type, too. While at it, also teach cost_qual_eval() that MinMaxExpr, SQLValueFunction, XmlExpr, and CoerceToDomain should be charged as cpu_operator_cost. Failing to do this for SQLValueFunction was an oversight in my commit `0bb51aa96`. The others are longer-standing oversights, but no time like the present to fix them. (In principle, CoerceToDomain could have cost much higher than this, but it doesn't presently seem worth trying to examine the domain's constraints here.) Modify execExprInterp.c to execute NextValueExpr as an out-of-line function; it seems quite unlikely to me that it's worth insisting that it be inlined in all expression eval methods. Besides, providing the out-of-line function doesn't stop anyone from inlining if they want to. Adjust some places where NextValueExpr support had been inserted with the aid of a dartboard rather than keeping it in the same order as elsewhere. Discussion: https://postgr.es/m/23862.1499981661@sss.pgh.pa.us	2017-07-14 15:25:43 -04:00
Tom Lane	382ceffdf7	Phase 3 of pgindent updates. Don't move parenthesized lines to the left, even if that means they flow past the right margin. By default, BSD indent lines up statement continuation lines that are within parentheses so that they start just to the right of the preceding left parenthesis. However, traditionally, if that resulted in the continuation line extending to the right of the desired right margin, then indent would push it left just far enough to not overrun the margin, if it could do so without making the continuation line start to the left of the current statement indent. That makes for a weird mix of indentations unless one has been completely rigid about never violating the 80-column limit. This behavior has been pretty universally panned by Postgres developers. Hence, disable it with indent's new -lpl switch, so that parenthesized lines are always lined up with the preceding left paren. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us	2017-06-21 15:35:54 -04:00
Tom Lane	c7b8998ebb	Phase 2 of pgindent updates. Change pg_bsd_indent to follow upstream rules for placement of comments to the right of code, and remove pgindent hack that caused comments following #endif to not obey the general rule. Commit `e3860ffa4d` wasn't actually using the published version of pg_bsd_indent, but a hacked-up version that tried to minimize the amount of movement of comments to the right of code. The situation of interest is where such a comment has to be moved to the right of its default placement at column 33 because there's code there. BSD indent has always moved right in units of tab stops in such cases --- but in the previous incarnation, indent was working in 8-space tab stops, while now it knows we use 4-space tabs. So the net result is that in about half the cases, such comments are placed one tab stop left of before. This is better all around: it leaves more room on the line for comment text, and it means that in such cases the comment uniformly starts at the next 4-space tab stop after the code, rather than sometimes one and sometimes two tabs after. Also, ensure that comments following #endif are indented the same as comments following other preprocessor commands such as #else. That inconsistency turns out to have been self-inflicted damage from a poorly-thought-through post-indent "fixup" in pgindent. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us	2017-06-21 15:19:25 -04:00
Tom Lane	e3860ffa4d	Initial pgindent run with pg_bsd_indent version 2.0. The new indent version includes numerous fixes thanks to Piotr Stefaniak. The main changes visible in this commit are: * Nicer formatting of function-pointer declarations. * No longer unexpectedly removes spaces in expressions using casts, sizeof, or offsetof. * No longer wants to add a space in "struct structname varname", as well as some similar cases for const- or volatile-qualified pointers. Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely. * Fixes bug where comments following declarations were sometimes placed with no space separating them from the code. * Fixes some odd decisions for comments following case labels. * Fixes some cases where comments following code were indented to less than the expected column 33. On the less good side, it now tends to put more whitespace around typedef names that are not listed in typedefs.list. This might encourage us to put more effort into typedef name collection; it's not really a bug in indent itself. There are more changes coming after this round, having to do with comment indentation and alignment of lines appearing within parentheses. I wanted to limit the size of the diffs to something that could be reviewed without one's eyes completely glazing over, so it seemed better to split up the changes as much as practical. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us	2017-06-21 14:39:04 -04:00
Tom Lane	d8e6b84bd2	Avoid regressions in foreign-key-based selectivity estimates. David Rowley found that the "use the smallest per-column selectivity" heuristic applied in some cases by get_foreign_key_join_selectivity() was badly off if the FK columns are independent, producing estimates much worse than we got before that code was added in 9.6. One case where that heuristic was used was for LEFT and FULL outer joins with the referenced rel on the outside of the join. But we should not really need to special-case those here. eqjoinsel() never has had such a special case; the correction is applied by calc_joinrel_size_estimate() instead. Let's just estimate such cases like inner joins and rely on that later adjustment. (I think there was something of a thinko here, in that the comments seem to be thinking about the selectivity as defined for semi/anti joins; but that shouldn't apply to left/full joins.) Add a regression test exercising such a case to show that this is sane in at least some cases. The other case where we used that heuristic was for SEMI/ANTI outer joins, either if the referenced rel was on the outside, or if it was on the inside but was part of a join within the RHS. In either case, the FK doesn't give us a lot of traction towards estimating the selectivity. To ensure that we don't have regressions from what happened before 9.6, let's punt by ignoring the FK in such cases and applying the traditional selectivity calculation. (We might be able to improve on that later, but for now I just want to be sure it's not worse than 9.5.) Report and patch by David Rowley, simplified a bit by me. Back-patch to 9.6 where this code was added. Discussion: https://postgr.es/m/CAKJS1f8NO8oCDcxrteohG6O72uU1saEVT9qX=R8pENr5QWerXw@mail.gmail.com	2017-06-19 15:33:41 -04:00
Tom Lane	23886581b5	Fix old corner-case logic error in final_cost_nestloop(). When costing a nestloop with stop-at-first-inner-match semantics, and a non-indexscan inner path, final_cost_nestloop() wants to charge the full scan cost of the inner rel at least once, with additional scans charged at inner_rescan_run_cost which might be less. However the logic for doing this effectively assumed that outer_matched_rows is at least 1. If it's zero, which is not unlikely for a small outer rel, we ended up charging inner_run_cost plus N times inner_rescan_run_cost, as much as double the correct charge for an outer rel with only one row that we're betting won't be matched. (Unless the inner rel is materialized, in which case it has very small inner_rescan_run_cost and the cost is not so far off what it should have been.) The upshot of this was that the planner had a tendency to select plans that failed to make effective use of the stop-at-first-inner-match semantics, and that might have Materialize nodes in them even when the predicted number of executions of the Materialize subplan was only 1. This was not so obvious before commit `9c7f5229a`, because the case only arose in connection with semi/anti joins where there's not freedom to reverse the join order. But with the addition of unique-inner joins, it could result in some fairly bad planning choices, as reported by Teodor Sigaev. Indeed, some of the test cases added by that commit have plans that look dubious on closer inspection, and are changed by this patch. Fix the logic to ensure that we don't charge for too many inner scans. I chose to adjust it so that the full-freight scan cost is associated with an unmatched outer row if possible, not a matched one, since that seems like a better model of what would happen at runtime. This is a longstanding bug, but given the lesser impact in back branches, and the lack of field complaints, I won't risk a back-patch. 
Discussion: https://postgr.es/m/CAKJS1f-LzkUsFxdJ_-Luy38orQ+AdEXM5o+vANR+-pHAWPSecg@mail.gmail.com	2017-06-03 13:48:15 -04:00
Bruce Momjian	a6fd7b7a5f	Post-PG 10 beta1 pgindent run. perltidy run not included.	2017-05-17 16:31:56 -04:00
Tom Lane	8f0530f580	Improve castNode notation by introducing list-extraction-specific variants. This extends the castNode() notation introduced by commit `5bcab1114` to provide, in one step, extraction of a list cell's pointer and coercion to a concrete node type. For example, "lfirst_node(Foo, lc)" is the same as "castNode(Foo, lfirst(lc))". Almost half of the uses of castNode that have appeared so far include a list extraction call, so this is pretty widely useful, and it saves a few more keystrokes compared to the old way. As with the previous patch, back-patch the addition of these macros to pg_list.h, so that the notation will be available when back-patching. Patch by me, after an idea of Andrew Gierth's. Discussion: https://postgr.es/m/14197.1491841216@sss.pgh.pa.us	2017-04-10 13:51:53 -04:00
Tom Lane	9c7f5229ad	Optimize joins when the inner relation can be proven unique. If there can certainly be no more than one matching inner row for a given outer row, then the executor can move on to the next outer row as soon as it's found one match; there's no need to continue scanning the inner relation for this outer row. This saves useless scanning in nestloop and hash joins. In merge joins, it offers the opportunity to skip mark/restore processing, because we know we have not advanced past the first possible match for the next outer row. Of course, the devil is in the details: the proof of uniqueness must depend only on joinquals (not otherquals), and if we want to skip mergejoin mark/restore then it must depend only on merge clauses. To avoid adding more planning overhead than absolutely necessary, the present patch errs in the conservative direction: there are cases where inner_unique or skip_mark_restore processing could be used, but it will not do so because it's not sure that the uniqueness proof depended only on "safe" clauses. This could be improved later. David Rowley, reviewed and rather heavily editorialized on by me Discussion: https://postgr.es/m/CAApHDvqF6Sw-TK98bW48TdtFJ+3a7D2mFyZ7++=D-RyPsL76gw@mail.gmail.com	2017-04-07 22:20:13 -04:00
Simon Riggs	ac2b095088	Reset API of clause_selectivity(). Discussion: https://postgr.es/m/CAKJS1f9yurJQW9pdnzL+rmOtsp2vOytkpXKGnMFJEO-qz5O5eA@mail.gmail.com	2017-04-06 19:10:51 -04:00
Simon Riggs	2686ee1b7c	Collect and use multi-column dependency stats. Follow-on patch in the multi-variate statistics patch series. CREATE STATISTICS s1 WITH (dependencies) ON (a, b) FROM t; ANALYZE; will collect dependency stats on (a, b) and then use the measured dependency in subsequent query planning. Commit `7b504eb282` added CREATE STATISTICS with n-distinct coefficients. These are now specified using the mutually exclusive option WITH (ndistinct). Author: Tomas Vondra, David Rowley Reviewed-by: Kyotaro HORIGUCHI, Álvaro Herrera, Dean Rasheed, Robert Haas and many other comments and contributions Discussion: https://postgr.es/m/56f40b20-c464-fad2-ff39-06b668fac47c@2ndquadrant.com	2017-04-05 18:00:42 -04:00
Kevin Grittner	18ce3a4ab2	Add infrastructure to support EphemeralNamedRelation references. A QueryEnvironment concept is added, which allows new types of objects to be passed into queries from parsing on through execution. At this point, the only thing implemented is a collection of EphemeralNamedRelation objects -- relations which can be referenced by name in queries, but do not exist in the catalogs. The only type of ENR implemented is NamedTuplestore, but provision is made to add more types fairly easily. An ENR can carry its own TupleDesc or reference a relation in the catalogs by relid. Although these features can be used without SPI, convenience functions are added to SPI so that ENRs can easily be used by code run through SPI. The initial use of all this is going to be transition tables in AFTER triggers, but that will be added to each PL as a separate commit. An incidental effect of this patch is to produce a more informative error message if an attempt is made to modify the contents of a CTE from a referencing DML statement. No tests previously covered that possibility, so one is added. Kevin Grittner and Thomas Munro Reviewed by Heikki Linnakangas, David Fetter, and Thomas Munro with valuable comments and suggestions from many others	2017-03-31 23:17:18 -05:00
Andrew Gierth	b5635948ab	Support hashed aggregation with grouping sets. This extends the Aggregate node with two new features: HashAggregate can now run multiple hashtables concurrently, and a new strategy MixedAggregate populates hashtables while doing sorted grouping. The planner will now attempt to save as many sorts as possible when planning grouping sets queries, while not exceeding work_mem for the estimated combined sizes of all hashtables used. No SQL-level changes are required. There should be no user-visible impact other than the new EXPLAIN output and possible changes to result ordering when ORDER BY was not used (which affected a few regression tests). The enable_hashagg option is respected. Author: Andrew Gierth Reviewers: Mark Dilger, Andres Freund Discussion: https://postgr.es/m/87vatszyhj.fsf@news-spur.riddles.org.uk	2017-03-27 04:20:54 +01:00
Andres Freund	b8d7f053c5	Faster expression evaluation and targetlist projection. This replaces the old, recursive tree-walk based evaluation, with non-recursive, opcode dispatch based, expression evaluation. Projection is now implemented as part of expression evaluation. This both leads to significant performance improvements, and makes future just-in-time compilation of expressions easier. The speed gains primarily come from: - non-recursive implementation reduces stack usage / overhead - simple sub-expressions are implemented with a single jump, without function calls - sharing some state between different sub-expressions - reduced amount of indirect/hard to predict memory accesses by laying out operation metadata sequentially; including the avoidance of nearly all of the previously used linked lists - more code has been moved to expression initialization, avoiding constant re-checks at evaluation time Future just-in-time compilation (JIT) has become easier, as demonstrated by released patches intended to be merged in a later release, for primarily two reasons: Firstly, due to a stricter split between expression initialization and evaluation, less code has to be handled by the JIT. Secondly, due to the non-recursive nature of the generated "instructions", less performance-critical code-paths can easily be shared between interpreted and compiled evaluation. The new framework allows for significant future optimizations. E.g.: - basic infrastructure to later reduce the per executor-startup overhead of expression evaluation, by caching state in prepared statements. That'd be helpful in OLTPish scenarios where initialization overhead is measurable. - optimizing the generated "code". A number of proposals for potential work have already been made. - optimizing the interpreter. Similarly a number of proposals have been made here too.
The move of logic into the expression initialization step leads to some backward-incompatible changes: - Function permission checks are now done during expression initialization, whereas previously they were done during execution. In edge cases this can lead to errors being raised that previously wouldn't have been, e.g. a NULL array being coerced to a different array type previously didn't perform checks. - The set of domain constraints to be checked is now evaluated once during expression initialization; previously it was re-built every time a domain check was evaluated. For normal queries this doesn't change much, but e.g. for plpgsql functions, which cache ExprStates, the old set could stick around longer. The behavior around this might still change. Author: Andres Freund, with significant changes by Tom Lane, changes by Heikki Linnakangas Reviewed-By: Tom Lane, Heikki Linnakangas Discussion: https://postgr.es/m/20161206034955.bh33paeralxbtluv@alap3.anarazel.de	2017-03-25 14:52:06 -07:00
Robert Haas	1ea60ad602	Fix failure to use clamp_row_est() for parallel joins. Commit `0c2070cefa` neglected to use clamp_row_est() where it should have done so. Patch by me. Report by Amit Kapila. Discussion: http://postgr.es/m/CAA4eK1KPm8RYa1Kun3ZmQj9pb723b-EFN70j47Pid1vn3ByquA@mail.gmail.com	2017-03-15 12:28:54 -04:00
Robert Haas	2609e91fcf	Fix regression in parallel planning against inheritance tables. Commit `51ee6f3160` accidentally changed the behavior around inheritance hierarchies; before, we always considered parallel paths even for very small inheritance children, because otherwise an inheritance hierarchy with even one small child wouldn't be eligible for parallelism. That exception was inadvertently removed; put it back. In passing, also adjust the degree-of-parallelism computation for index-only scans not to consider the number of heap pages fetched. Otherwise, we'll avoid parallel index-only scans on tables that are mostly all-visible, which isn't especially logical. Robert Haas and Amit Kapila, per a report from Ashutosh Sharma. Discussion: http://postgr.es/m/CAE9k0PmgSoOHRd60SHu09aRVTHRSs8s6pmyhJKWHxWw9C_x+XA@mail.gmail.com	2017-03-14 14:33:14 -04:00
Alvaro Herrera	a9c074ba7e	Silence unused variable compiler warning. Fallout from fcec6caafa2: mark a variable in set_tablefunc_size_estimates as used for asserts only. Also, the planner_rte_fetch() call is pointless with assertions disabled, so enclose it in a USE_ASSERT_CHECKING #ifdef; fix the same problem in set_subquery_size_estimates(). First problem noted by David Rowley, whose compiler is noisier than mine in this regard.	2017-03-13 19:02:38 -03:00
Robert Haas	355d3993c5	Add a Gather Merge executor node. Like Gather, we spawn multiple workers and run the same plan in each one; however, Gather Merge is used when each worker produces the same output ordering and we want to preserve that output ordering while merging together the streams of tuples from various workers. (In a way, Gather Merge is like a hybrid of Gather and MergeAppend.) This works out to a win if it saves us from having to perform an expensive Sort. In cases where only a small amount of data would need to be sorted, it may actually be faster to use a regular Gather node and then sort the results afterward, because Gather Merge sometimes needs to wait synchronously for tuples whereas a pure Gather generally doesn't. But if this avoids an expensive sort then it's a win. Rushabh Lathia, reviewed and tested by Amit Kapila, Thomas Munro, and Neha Sharma, and reviewed and revised by me. Discussion: http://postgr.es/m/CAGPqQf09oPX-cQRpBKS0Gq49Z+m6KBxgxd_p9gX8CKk_d75HoQ@mail.gmail.com	2017-03-09 07:49:29 -05:00
Robert Haas	f35742ccb7	Support parallel bitmap heap scans. The index is scanned by a single process, but then all cooperating processes can iterate jointly over the resulting set of heap blocks. In the future, we might also want to support using a parallel bitmap index scan to set up for a parallel bitmap heap scan, but that's a job for another day. Dilip Kumar, with some corrections and cosmetic changes by me. The larger patch set of which this is a part has been reviewed and tested by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia Sabih, Haribabu Kommi, Thomas Munro, and me. Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com	2017-03-08 12:05:43 -05:00
Alvaro Herrera	fcec6caafa	Support XMLTABLE query expression. XMLTABLE is defined by the SQL/XML standard as a feature that allows turning XML-formatted data into relational form, so that it can be used as a <table primary> in the FROM clause of a query. This new construct provides significant simplicity and performance benefit for XML data processing; what in a client-side custom implementation was reported to take 20 minutes can be executed in 400ms using XMLTABLE. (The same functionality was said to take 10 seconds using nested PostgreSQL XPath function calls, and 5 seconds using XMLReader under PL/Python). The implemented syntax deviates slightly from what the standard requires. First, the standard indicates that the PASSING clause is optional and that multiple XML input documents may be given to it; we make it mandatory and accept a single document only. Second, we don't currently support a default namespace to be specified. This implementation relies on a new executor node based on a hardcoded method table. (Because the grammar is fixed, there is no extensibility in the current approach; further constructs can be implemented on top of this such as JSON_TABLE, but they require changes to core code.) Author: Pavel Stehule, Álvaro Herrera Extensively reviewed by: Craig Ringer Discussion: https://postgr.es/m/CAFj8pRAgfzMD-LoSmnMGybD0WsEznLHWap8DO79+-GTRAPR4qA@mail.gmail.com	2017-03-08 12:40:26 -03:00
Peter Eisentraut	38d103763d	Make more use of castNode()	2017-02-21 11:59:09 -05:00
Robert Haas	5262f7a4fc	Add optimizer and executor support for parallel index scans. In combination with `569174f1be`, which taught the btree AM how to perform parallel index scans, this allows parallel index scan plans on btree indexes. This infrastructure should be general enough to support parallel index scans for other index AMs as well, if someone updates them to support parallel scans. Amit Kapila, reviewed and tested by Anastasia Lubennikova, Tushar Ahuja, and Haribabu Kommi, and me.	2017-02-15 13:53:24 -05:00
Robert Haas	da08a65989	Refactor bitmap heap scan estimation of heap pages fetched. Currently, we only need this logic in order to cost a Bitmap Heap Scan. But a pending patch for Parallel Bitmap Heap Scan also uses it to help figure out how many workers to use for the scan, which has to be determined prior to costing. So, move the logic to a separate function to make that easier. Dilip Kumar. The patch series of which this is a part has been reviewed by Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia Sabih, Haribabu Kommi, and me; it is not clear from the email discussion which of those people have looked specifically at this part. Discussion: http://postgr.es/m/CAFiTN-v3QYNJEZnnmKCeATuLbN-h9tMVfeEF0+BrouYDqjXgwg@mail.gmail.com	2017-01-27 16:28:47 -05:00
Robert Haas	0c2070cefa	Fix cardinality estimates for parallel joins. For a partial path, the cardinality estimate needs to reflect the number of rows we think each worker will see, rather than the total number of rows; otherwise, costing will go wrong. The previous coding got this completely wrong for parallel joins. Unfortunately, this change may destabilize plans for users of 9.6 who have enabled parallel query, but since 9.6 is still fairly new I'm hoping expectations won't be too settled yet. Also, this is really a brown-paper-bag bug, so leaving it unfixed for the entire lifetime of 9.6 seems unwise. Related reports (whose import I initially failed to recognize) by Tomas Vondra and Tom Lane. Discussion: http://postgr.es/m/CA+TgmoaDxZ5z5Kw_oCQoymNxNoVaTCXzPaODcOuao=CzK8dMZw@mail.gmail.com	2017-01-13 13:34:10 -05:00
Bruce Momjian	1d25779284	Update copyright via script for 2017	2017-01-03 13:48:53 -05:00
Tom Lane	7fa93eec4e	Fix FK-based join selectivity estimation for semi/antijoins. This case wasn't thought through sufficiently in commit `100340e2d`. It's true that the FK proves that every outer row has a match in the inner table, but we forgot that some of the inner rows might be filtered away by WHERE conditions located within the semijoin's RHS. If the RHS is just one table, we can reasonably take the semijoin selectivity as equal to the fraction of the referenced table's rows that are expected to survive its restriction clauses. If the RHS is a join, it's not clear how much of the referenced table might get through the join, so fall back to the same rule we were already using for other outer-join cases: use the minimum of the regular per-clause selectivity estimates. This gives the same result as if we hadn't considered the FK at all when there's a single FK column, but it should still help for multi-column FKs, which is the case that `100340e2d` is really meant to help with. Back-patch to 9.6 where the previous commit came in. Discussion: https://postgr.es/m/16149.1481835103@sss.pgh.pa.us	2016-12-17 15:28:54 -05:00
Tom Lane	34ca090570	Adjust cost_merge_append() to reflect use of binaryheap_replace_first(). Commit `7a2fe9bd0` improved merge append so that replacement of a tuple takes log(N) operations, not twice log(N). Since cost_merge_append knew about that explicitly, we should adjust it. This probably makes little difference in practice, but the obsolete comment is confusing. Ideally this would have been put in in 9.3 with the underlying behavior change; but I'm not going to back-patch it, since there's some small chance of changing a plan choice that somebody's optimized for. Thomas Munro Discussion: <CAEepm=0WQBSvuYcMOUj4Ga4NXpu2J=ejZcE=e=eiTjTX-6_gDw@mail.gmail.com>	2016-11-05 13:48:11 -04:00
Tom Lane	69995c3b3f	Fix cost_rescan() to account for multi-batch hashing correctly. cost_rescan assumed that we don't need to rebuild the hash table when rescanning a hash join. However, that's currently only true for single-batch joins; for a multi-batch join we must charge full freight. This probably has escaped notice because we'd be unlikely to put a hash join on the inside of a nestloop anyway. Nonetheless, it's wrong. Fix in HEAD, but don't backpatch for fear of destabilizing plans in stable releases.	2016-07-27 17:45:05 -04:00
Tom Lane	c89d507649	Round rowcount estimate for a partial path to an integer. I'd been wondering why I was sometimes seeing fractional rowcount estimates in parallel-query situations, and this seems to be the reason. (You won't see the fractional parts in EXPLAIN, because it prints rowcounts with %.0f, but they are apparent in the debugger.) A fractional rowcount is not any saner for a partial path than any other kind of path, and it's equally likely to break cost estimation for higher paths, so apply clamp_row_est() like we do in other places.	2016-07-03 14:53:46 -04:00
Tom Lane	3154e16737	Dodge compiler bug in Visual Studio 2013. VS2013 apparently has a problem with taking the address of a formal parameter in some cases. We do that elsewhere without trouble, but in this case the address is being passed to a subroutine that will probably get inlined, so maybe the combination of those things is what tickles the bug. Anyway, introducing an extra copy of the parameter value is enough to work around it. Per trouble report from Umair Shahid. Report: <CAM184AcjqKYZSdQqBHDrnENXHhW=mXbUC46QYPJ=nAh0gUHCGA@mail.gmail.com>	2016-06-29 19:07:19 -04:00
Tom Lane	100340e2dc	Restore foreign-key-aware estimation of join relation sizes. This patch provides a new implementation of the logic added by commit `137805f89` and later removed by `77ba61080`. It differs from the original primarily in expending much less effort per joinrel in large queries, which it accomplishes by doing most of the matching work once per query not once per joinrel. Hopefully, it's also less buggy and better commented. The never-documented enable_fkey_estimates GUC remains gone. There remains work to be done to make the selectivity estimates account for nulls in FK referencing columns; but that was true of the original patch as well. We may be able to address this point later in beta. In the meantime, any error should be in the direction of overestimating rather than underestimating joinrel sizes, which seems like the direction we want to err in. Tomas Vondra and Tom Lane Discussion: <31041.1465069446@sss.pgh.pa.us>	2016-06-18 15:22:34 -04:00
Robert Haas	c9ce4a1c61	Eliminate "parallel degree" terminology. This terminology provoked widespread complaints. So, instead, rename the GUC max_parallel_degree to max_parallel_workers_per_gather (leaving room for a possible future GUC max_parallel_workers that acts as a system-wide limit), and rename the parallel_degree reloption to parallel_workers. Rename structure members to match. These changes create a dump/restore hazard for users of PostgreSQL 9.6beta1 who have set the reloption (or applied the GUC using ALTER USER or ALTER DATABASE).	2016-06-09 10:00:26 -04:00
Tom Lane	77ba610805	Revert "Use Foreign Key relationships to infer multi-column join selectivity". This commit reverts `137805f89` as well as the associated commits `015e88942`, `5306df283`, and `68d704edb`. We found multiple bugs in this feature, and there was concern about possible planner slowdown (though to be fair, exhibiting a very large slowdown proved difficult). The way forward requires a considerable rewrite, which may or may not be possible to accomplish in time for beta2. In my judgment reviewing the rewrite will be easier to accomplish starting from a clean slate, so let's temporarily revert what's there now. This also leaves us in a safe state if it turns out to be necessary to postpone the rewrite to the next development cycle. Discussion: <20160429102531.GA13701@huehner.biz>	2016-06-07 17:21:17 -04:00
Robert Haas	68d704edbf	Minimal fix for crash bug in quals_match_foreign_key. Discussion is still underway as to whether to revert the entire patch that added this function, but that discussion may not conclude before beta1. So, in the meantime, let's do at least this much. David Rowley	2016-05-06 15:00:55 -04:00
Robert Haas	77cd477c4b	Enable parallel query by default. Change max_parallel_degree default from 0 to 2. It is possible that this is not a good idea, or that we should go with 1 worker rather than 2, but we won't find out without trying it. Along the way, reword the documentation for max_parallel_degree a little bit to hopefully make it more clear. Discussion: 20160420174631.3qjjhpwsvvx5bau5@alap3.anarazel.de	2016-04-26 08:35:58 -04:00
Robert Haas	0711803775	Use quicksort, not replacement selection, for external sorting. We still use replacement selection for the first run of the sort only and only when the number of tuples is relatively small. Otherwise, the first run, and subsequent runs in all cases, are produced using quicksort. This tends to be faster except perhaps for very small amounts of working memory. Peter Geoghegan, reviewed by Tomas Vondra, Jeff Janes, Mithun Cy, Greg Stark, and me.	2016-04-08 02:36:26 -04:00
Simon Riggs	137805f89a	Use Foreign Key relationships to infer multi-column join selectivity. In cases where joins use multiple columns we currently assess each join separately causing gross mis-estimates for join cardinality. This patch adds use of FK information for the first time into the planner. When FKs are present and we have multi-column join information, plan estimates will be drastically improved. Cases with multiple FKs are handled, though partial matches are ignored currently. Net effect is substantial performance improvements for joins in many common cases. Additional planning time is isolated to cases that are currently performing poorly, measured at 0.08 - 0.15 ms. Please watch for planner performance regressions; circumstances seem unlikely but the law of unintended consequences may apply somewhen. Additional complex tests welcome to prove this before release. Tests can be performed using SET enable_fkey_estimates = on | off using scripts provided during Hackers discussions, message id: 552335D9.3090707@2ndquadrant.com Authors: Tomas Vondra and David Rowley Reviewed and tested by Simon Riggs, adding comments only	2016-04-08 02:51:09 +01:00
Tom Lane	de94e2af18	Run pgindent on a batch of (mostly-planner-related) source files. Getting annoyed at the amount of unrelated chatter I get from pgindent'ing Rowley's unique-joins patch. Re-indent all the files it touches.	2016-04-06 11:34:02 -04:00
Tom Lane	f9aefcb91f	Support using index-only scans with partial indexes in more cases. Previously, the planner would reject an index-only scan if any restriction clause for its table used a column not available from the index, even if that restriction clause would later be dropped from the plan entirely because it's implied by the index's predicate. This is a fairly common situation for partial indexes because predicates using columns not included in the index are often the most useful kind of predicate, and we have to duplicate (or at least imply) the predicate in the WHERE clause in order to get the index to be considered at all. So index-only scans were essentially unavailable with such partial indexes. To fix, we have to do detection of implied-by-predicate clauses much earlier in the planner. This patch puts it in check_index_predicates (nee check_partial_indexes), meaning it gets done for every partial index, whereas we previously only considered this issue at createplan time, so that the work was only done for an index actually selected for use. That could result in a noticeable planning slowdown for queries against tables with many partial indexes. However, testing suggested that there isn't really a significant cost, especially not with reasonable numbers of partial indexes. We do get a small additional benefit, which is that cost_index is more accurate since it correctly discounts the evaluation cost of clauses that will be removed. We can also avoid considering such clauses as potential indexquals, which saves useless matching cycles in the case where the predicate columns aren't in the index, and prevents generating bogus plans that double-count the clause's selectivity when the columns are in the index. Tomas Vondra and Kyotaro Horiguchi, reviewed by Kevin Grittner and Konstantin Knizhnik, and whacked around a little by me	2016-03-31 14:49:10 -04:00
Tom Lane	76281aa964	Avoid a couple of zero-divide scenarios in the planner. cost_subplan() supposed that the given subplan must have plan_rows > 0, which as far as I can tell was true until recent refactoring of the code in createplan.c; but now that code allows the Result for a provably empty subquery to have plan_rows = 0. Rather than undo that change, put in a clamp to prevent zero divide. get_cheapest_fractional_path() likewise supposed that best_path->rows > 0. This assumption has been wrong for longer. It's actually harmless given IEEE float math, because a positive value divided by zero gives +Infinity and compare_fractional_path_costs() will do the right thing with that. Still, best not to assume that. final_cost_nestloop() also seems to have some risks in this area, so borrow the clamping logic already present in the mergejoin cost functions. Lastly, remove unnecessary clamp_row_est() in planner.c's calls to get_number_of_groups(). The only thing that function does with path_rows is pass it to estimate_num_groups() which already has an internal clamp, so we don't need the extra call; and if we did, the callers are arguably the wrong place for it anyway. First two items reported by Piotr Stefaniak, the others are products of my nosing around for similar problems. No back-patch since there's no evidence that problems arise in the back branches.	2016-03-26 12:03:12 -04:00
Robert Haas	e06a38965b	Support parallel aggregation. Parallel workers can now partially aggregate the data and pass the transition values back to the leader, which can combine the partial results to produce the final answer. David Rowley, based on earlier work by Haribabu Kommi. Reviewed by Álvaro Herrera, Tomas Vondra, Amit Kapila, James Sewell, and me.	2016-03-21 09:30:18 -04:00
Tom Lane	307c78852f	Rethink representation of PathTargets. In commit `19a541143a` I did not make PathTarget a subtype of Node, and embedded a RelOptInfo's reltarget directly into it rather than having a separately-allocated Node. In hindsight that was misguided micro-optimization, enabled by the fact that at that point we didn't have any Paths with custom PathTargets. Now that PathTarget processing has been fleshed out some more, it's easier to see that it's better to have PathTarget as an independent Node type, even if it does cost us one more palloc to create a RelOptInfo. So change it while we still can. This commit just changes the representation, without doing anything more interesting than that.	2016-03-14 16:59:59 -04:00
Tom Lane	cf8e7b16a5	Spell "parallel" correctly. Per David Rowley.	2016-03-07 21:48:17 -05:00

341 Commits