postgresql

Commit Graph

Author	SHA1	Message	Date
Tom Lane	385d396b80	Split up a couple of long-running regression test scripts. The point of this change is to increase the potential for parallelism while running the core regression tests. Most people these days are using parallel testing modes on multi-core machines, so we might as well try a bit harder to keep multiple cores busy. Hence, a test that runs much longer than others in its parallel group is a candidate to be sub-divided. In this patch, create_index.sql and join.sql are split up. I haven't changed the content of the tests in any way, just moved them. I moved create_index.sql's SP-GiST-related tests into a new script create_index_spgist, and moved its btree multilevel page deletion test over to the existing script btree_index. (btree_index is a more natural home for that test, and it's shorter than others in its parallel group, so this doesn't hurt total runtime of that group.) There might be room for more aggressive splitting of create_index, but this is enough to improve matters considerably. Likewise, I moved join.sql's "exercises for the hash join code" into a new file join_hash. Those exercises contributed three-quarters of the script's runtime. Which might well be excessive ... but for the moment, I'm satisfied with shoving them into a different parallel group, where they can share runtime with the roughly-equally-lengthy gist test. (Note for anybody following along at home: there are interesting interactions between the runtimes of create_index and anything running in parallel with it, because the tests of CREATE INDEX CONCURRENTLY in that file will repeatedly block waiting for concurrent transactions to commit. As committed in this patch, create_index and create_index_spgist have roughly equal runtimes, but that's mostly an artifact of forced synchronization of the CONCURRENTLY tests; when run serially, create_index is much faster. A followup patch will reduce the runtime of create_index_spgist and thereby also create_index.) Discussion: https://postgr.es/m/735.1554935715@sss.pgh.pa.us	2019-04-11 16:15:54 -04:00
Tom Lane	0a9d7e1f6d	Ensure dummy paths have correct required_outer if rel is parameterized. The assertions added by commits `34ea1ab7f` et al found another problem: set_dummy_rel_pathlist and mark_dummy_rel were failing to label the dummy paths they create with the correct outer_relids, in case the relation is necessarily parameterized due to having lateral references in its tlist. It's likely that this has no user-visible consequences in production builds, at the moment; but still an assertion failure is a bad thing, so back-patch the fix. Per bug #15694 from Roman Zharkov (via Alexander Lakhin) and an independent report by Tushar Ahuja. Discussion: https://postgr.es/m/15694-74f2ca97e7044f7f@postgresql.org Discussion: https://postgr.es/m/7d72ab20-c725-3ce2-f99d-4e64dd8a0de6@enterprisedb.com	2019-03-14 12:16:36 -04:00
Tom Lane	24d08f3c0a	Fix mark-and-restore-skipping test case to not be a self-join. There isn't any good reason for this test to be a self-join rather than a join between separate tables, except that it saved a couple of SQL commands for setup. A proposed patch to optimize away self-joins breaks the test, so adjust it to avoid that happening. Discussion: https://postgr.es/m/64486b0b-0404-e39e-322d-0801154901f3@postgrespro.ru	2019-02-21 18:55:29 -05:00
Tom Lane	4be058fe9e	In the planner, replace an empty FROM clause with a dummy RTE. The fact that "SELECT expression" has no base relations has long been a thorn in the side of the planner. It makes it hard to flatten a sub-query that looks like that, or is a trivial VALUES() item, because the planner generally uses relid sets to identify sub-relations, and such a sub-query would have an empty relid set if we flattened it. prepjointree.c contains some baroque logic that works around this in certain special cases --- but there is a much better answer. We can replace an empty FROM clause with a dummy RTE that acts like a table of one row and no columns, and then there are no such corner cases to worry about. Instead we need some logic to get rid of useless dummy RTEs, but that's simpler and covers more cases than what was there before. For really trivial cases, where the query is just "SELECT expression" and nothing else, there's a hazard that adding the extra RTE makes for a noticeable slowdown; even though it's not much processing, there's not that much for the planner to do overall. However testing says that the penalty is very small, close to the noise level. In more complex queries, this is able to find optimizations that we could not find before. The new RTE type is called RTE_RESULT, since the "scan" plan type it gives rise to is a Result node (the same plan we produced for a "SELECT expression" query before). To avoid confusion, rename the old ResultPath path type to GroupResultPath, reflecting that it's only used in degenerate grouping cases where we know the query produces just one grouped row. (It wouldn't work to unify the two cases, because there are different rules about where the associated quals live during query_planner.) Note: although this touches readfuncs.c, I don't think a catversion bump is required, because the added case can't occur in stored rules, only plans. Patch by me, reviewed by David Rowley and Mark Dilger Discussion: https://postgr.es/m/15944.1521127664@sss.pgh.pa.us	2019-01-28 17:54:23 -05:00
Tom Lane	7d4a10e260	Use PlaceHolderVars within the quals of a FULL JOIN. This prevents failures in cases where we pull up a constant or var-free expression from a subquery and put it into a full join's qual. That can result in not recognizing the qual as containing a mergejoin-able or hashjoin-able condition. A PHV prevents the problem because it is still recognized as belonging to the side of the join the subquery is in. I'm not very sure about the net effect of this change on plan quality. In "typical" cases where the join keys are Vars, nothing changes. In an affected case, the PHV-wrapped expression is less likely to be seen as equal to PHV-less instances below the join, but more likely to be seen as equal to similar expressions above the join, so it may end up being a wash. In the one existing case where there's any visible change in a regression-test plan, it amounts to referencing a lower computation of a COALESCE result instead of recomputing it, which seems like a win. Given my uncertainty about that and the lack of field complaints, no back-patch, even though this is a very ancient problem. Discussion: https://postgr.es/m/32090.1539378124@sss.pgh.pa.us	2018-10-14 13:07:29 -04:00
Tom Lane	a11b3bd37f	Fix misprocessing of equivalence classes involving record_eq(). canonicalize_ec_expression() is supposed to agree with coerce_type() as to whether a RelabelType should be inserted to make a subexpression be valid input for the operators of a given opclass. However, it did the wrong thing with named-composite-type inputs to record_eq(): it put in a RelabelType to RECORDOID, which the parser doesn't. In some cases this was harmless because all code paths involving a particular equivalence class did the same thing, but in other cases this would result in failing to recognize a composite-type expression as being a member of an equivalence class that it actually is a member of. The most obvious bad effect was to fail to recognize that an index on a composite column could provide the sort order needed for a mergejoin on that column, as reported by Teodor Sigaev. I think there might be other, subtler, cases that result in misoptimization. It also seems possible that an unwanted RelabelType would sometimes get into an emitted plan --- but because record_eq and friends don't examine the declared type of their input expressions, that would not create any visible problems. To fix, just treat RECORDOID as if it were a polymorphic type, which in some sense it is. We might want to consider formalizing that a bit more someday, but for the moment this seems to be the only place where an IsPolymorphicType() test ought to include RECORDOID as well. This has been broken for a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/a6b22369-e3bf-4d49-f59d-0c41d3551e81@sigaev.ru	2018-05-16 13:46:23 -04:00
Tom Lane	e5d83995e9	Fix incorrect handling of join clauses pushed into parameterized paths. In some cases a clause attached to an outer join can be pushed down into the outer join's RHS even though the clause is not degenerate --- this can happen if we choose to make a parameterized path for the RHS. If the clause ends up attached to a lower outer join, we'd misclassify it as being a "join filter" not a plain "filter" condition at that node, leading to wrong query results. To fix, teach extract_actual_join_clauses to examine each join clause's required_relids, not just its is_pushed_down flag. (The latter now seems vestigial, or at least in need of rethinking, but we won't do anything so invasive as redefining it in a bug-fix patch.) This has been wrong since we introduced parameterized paths in 9.2, though it's evidently hard to hit given the lack of previous reports. The test case used here involves a lateral function call, and I think that a lateral reference may be required to get the planner to select a broken plan; though I wouldn't swear to that. In any case, even if LATERAL is needed to trigger the bug, it still affects all supported branches, so back-patch to all. Per report from Andreas Karlsson. Thanks to Andrew Gierth for preliminary investigation. Discussion: https://postgr.es/m/f8128b11-c5bf-3539-48cd-234178b2314d@proxel.se	2018-04-19 15:49:30 -04:00
Tom Lane	2cf8c7aa48	Clean up duplicate table and function names in regression tests. Many of the objects we create during the regression tests are put in the public schema, so that using the same names in different regression tests creates a hazard of test failures if any two such scripts run concurrently. This patch cleans up a bunch of latent hazards of that sort, as well as two live hazards. The current situation in this regard is far worse than it was a year or two back, because practically all of the partitioning-related test cases have reused table names with enthusiasm. I despaired of cleaning up that mess within the five most-affected tests (create_table, alter_table, insert, update, inherit); fortunately those don't run concurrently. Other than partitioning problems, most of the issues boil down to using names like "foo", "bar", "tmp", etc, without thought for the fact that other test scripts might use similar names concurrently. I've made an effort to make all such names more specific. One of the live hazards was that commit `7421f4b8` caused with.sql to create a table named "test", conflicting with a similarly-named table in alter_table.sql; this was exposed in the buildfarm recently. The other one was that join.sql and transactions.sql both create tables named "foo" and "bar"; but join.sql's uses of those names date back only to December or so. Since commit `7421f4b8` was back-patched to v10, back-patch a minimal fix for that problem. The rest of this is just future-proofing. Discussion: https://postgr.es/m/4627.1521070268@sss.pgh.pa.us	2018-03-15 17:09:02 -04:00
Tom Lane	9afd513df0	Fix planner failures with overlapping mergejoin clauses in an outer join. Given overlapping or partially redundant join clauses, for example t1 JOIN t2 ON t1.a = t2.x AND t1.b = t2.x the planner's EquivalenceClass machinery will ordinarily refactor the clauses as "t1.a = t1.b AND t1.a = t2.x", so that join processing doesn't see multiple references to the same EquivalenceClass in a list of join equality clauses. However, if the join is outer, it's incorrect to derive a restriction clause on the outer side from the join conditions, so the clause refactoring does not happen and we end up with overlapping join conditions. The code that attempted to deal with such cases had several subtle bugs, which could result in "left and right pathkeys do not match in mergejoin" or "outer pathkeys do not match mergeclauses" planner errors, if the selected join plan type was a mergejoin. (It does not appear that any actually incorrect plan could have been emitted.) The core of the problem really was failure to recognize that the outer and inner relations' pathkeys have different relationships to the mergeclause list. A join's mergeclause list is constructed by reference to the outer pathkeys, so it will always be ordered the same as the outer pathkeys, but this cannot be presumed true for the inner pathkeys. If the inner sides of the mergeclauses contain multiple references to the same EquivalenceClass ({t2.x} in the above example) then a simplistic rendering of the required inner sort order is like "ORDER BY t2.x, t2.x", but the pathkey machinery recognizes that the second sort column is redundant and throws it away. The mergejoin planning code failed to account for that behavior properly. One error was to try to generate cut-down versions of the mergeclause list from cut-down versions of the inner pathkeys in the same way as the initial construction of the mergeclause list from the outer pathkeys was done; this could lead to choosing a mergeclause list that fails to match the outer pathkeys. The other problem was that the pathkey cross-checking code in create_mergejoin_plan treated the inner and outer pathkey lists identically, whereas actually the expectations for them must be different. That led to false "pathkeys do not match" failures in some cases, and in principle could have led to failure to detect bogus plans in other cases, though there is no indication that such bogus plans could be generated. Reported by Alexander Kuzmenkov, who also reviewed this patch. This has been broken for years (back to around 8.3 according to my testing), so back-patch to all supported branches. Discussion: https://postgr.es/m/5dad9160-4632-0e47-e120-8e2082000c01@postgrespro.ru	2018-02-23 13:47:33 -05:00
Tom Lane	bb94ce4d26	Teach reparameterize_path() to handle AppendPaths. If we're inside a lateral subquery, there may be no unparameterized paths for a particular child relation of an appendrel, in which case we must be able to create similarly-parameterized paths for each other child relation, else the planner will fail with "could not devise a query plan for the given query". This means that there are situations where we'd better be able to reparameterize at least one path for each child. This calls into question the assumption in reparameterize_path() that it can just punt if it feels like it. However, the only case that is known broken right now is where the child is itself an appendrel so that all its paths are AppendPaths. (I think possibly I disregarded that in the original coding on the theory that nested appendrels would get folded together --- but that only happens after reparameterize_path(), so it's not excused from handling a child AppendPath.) Given that this code's been like this since 9.3 when LATERAL was introduced, it seems likely we'd have heard of other cases by now if there were a larger problem. Per report from Elvis Pranskevichus. Back-patch to 9.3. Discussion: https://postgr.es/m/5981018.zdth1YWmNy@hammer.magicstack.net	2018-01-23 16:50:34 -05:00
Tom Lane	934c7986f4	Tweak parallel hash join test case in hopes of improving stability. This seems to make things better on gaur, let's see what the rest of the buildfarm thinks. Thomas Munro Discussion: https://postgr.es/m/CAEepm=1uuT8iJxMEsR=jL+3zEi87DB2v0+0H9o_rUXXCZPZT3A@mail.gmail.com	2018-01-04 01:06:58 -05:00
Andres Freund	1804284042	Add parallel-aware hash joins. Introduce parallel-aware hash joins that appear in EXPLAIN plans as Parallel Hash Join with Parallel Hash. While hash joins could already appear in parallel queries, they were previously always parallel-oblivious and had a partial subplan only on the outer side, meaning that the work of the inner subplan was duplicated in every worker. After this commit, the planner will consider using a partial subplan on the inner side too, using the Parallel Hash node to divide the work over the available CPU cores and combine its results in shared memory. If the join needs to be split into multiple batches in order to respect work_mem, then workers process different batches as much as possible and then work together on the remaining batches. The advantages of a parallel-aware hash join over a parallel-oblivious hash join used in a parallel query are that it: * avoids wasting memory on duplicated hash tables * avoids wasting disk space on duplicated batch files * divides the work of building the hash table over the CPUs One disadvantage is that there is some communication between the participating CPUs which might outweigh the benefits of parallelism in the case of small hash tables. This is avoided by the planner's existing reluctance to supply partial plans for small scans, but it may be necessary to estimate synchronization costs in future if that situation changes. Another is that outer batch 0 must be written to disk if multiple batches are required. A potential future advantage of parallel-aware hash joins is that right and full outer joins could be supported, since there is a single set of matched bits for each hashtable, but that is not yet implemented. A new GUC enable_parallel_hash is defined to control the feature, defaulting to on. Author: Thomas Munro Reviewed-By: Andres Freund, Robert Haas Tested-By: Rafia Sabih, Prabhat Sahu Discussion: https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com https://postgr.es/m/CAEepm=37HKyJ4U6XOLi=JgfSHM3o6B-GaeO-6hkOmneTDkH+Uw@mail.gmail.com	2017-12-21 00:43:41 -08:00
Robert Haas	7d3583ad9a	Test instrumentation of Hash nodes with parallel query. Commit `8526bcb2df` fixed bugs related to both Sort and Hash, but only added a test case for Sort. This adds a test case for Hash to match. Thomas Munro Discussion: http://postgr.es/m/CAEepm=2-LRnfwUBZDqQt+XAcd0af_ykNyyVvP3h1uB1AQ=e-eA@mail.gmail.com	2017-12-19 15:29:08 -05:00
Andres Freund	5bcf389ecf	Fix EXPLAIN ANALYZE of hash join when the leader doesn't participate. If a hash join appears in a parallel query, there may be no hash table available for explain.c to inspect even though a hash table may have been built in other processes. This could happen either because parallel_leader_participation was set to off or because the leader happened to hit the end of the outer relation immediately (even though the complete relation is not empty) and decided not to build the hash table. Commit `bf11e7ee` introduced a way for workers to exchange instrumentation via the DSM segment for Sort nodes even though they are not parallel-aware. This commit does the same for Hash nodes, so that explain.c has a way to find instrumentation data from an arbitrary participant that actually built the hash table. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm%3D3DUQC2-z252N55eOcZBer6DPdM%3DFzrxH9dZc5vYLsjaA%40mail.gmail.com	2017-12-05 10:55:56 -08:00
Tom Lane	7ca25b7de6	Fix neqjoinsel's behavior for semi/anti join cases. Previously, this function estimated the selectivity as 1 minus eqjoinsel() for the negator equality operator, regardless of join type (I think there was an expectation that eqjoinsel would handle the join type). But actually this is completely wrong for semijoin cases: the fraction of the LHS that has a non-matching row is not one minus the fraction of the LHS that has a matching row. In reality a semijoin with <> will nearly always succeed: it can only fail when the RHS is empty, or it contains a single distinct value that is equal to the particular LHS value, or the LHS value is null. The only one of those things we should have much confidence in estimating is the fraction of LHS values that are null, so let's just take the selectivity as 1 minus outer nullfrac. Per coding convention, antijoin should be estimated the same as semijoin. Arguably this is a bug fix, but in view of the lack of field complaints and the risk of destabilizing plans, no back-patch. Thomas Munro, reviewed by Ashutosh Bapat Discussion: https://postgr.es/m/CAEepm=270ze2hVxWkJw-5eKzc3AB4C9KpH3L2kih75R5pdSogg@mail.gmail.com	2017-11-29 22:00:37 -05:00
Andres Freund	fa330f9adf	Add some regression tests that exercise hash join code. Although hash joins are already tested by many queries, these tests systematically cover the four different states we can reach as part of the strategy for respecting work_mem. Author: Thomas Munro Reviewed-By: Andres Freund	2017-11-29 16:06:50 -08:00
Robert Haas	57eebca03a	Fix create_lateral_join_info to handle dead relations properly. Commit `0a480502b0` broke it. Report by Andreas Seltenreich. Fix by Ashutosh Bapat. Discussion: http://postgr.es/m/874ls2vrnx.fsf@ansel.ydns.eu	2017-09-20 10:20:10 -04:00
Robert Haas	0a480502b0	Expand partitioned table RTEs level by level, without flattening. Flattening the partitioning hierarchy at this stage makes various desirable optimizations difficult. The original use case for this patch was partition-wise join, which wants to match up the partitions in one partitioning hierarchy with those in another such hierarchy. However, it now seems that it will also be useful in making partition pruning work using the PartitionDesc rather than constraint exclusion, because with a flattened expansion, we have no easy way to figure out which PartitionDescs apply to which leaf tables in a multi-level partition hierarchy. As it turns out, we end up creating both rte->inh and !rte->inh RTEs for each intermediate partitioned table, just as we previously did for the root table. This seems unnecessary since the partitioned tables have no storage and are not scanned. We might want to go back and rejigger things so that no partitioned tables (including the parent) need !rte->inh RTEs, but that seems to require some adjustments not related to the core purpose of this patch. Ashutosh Bapat, reviewed by me and by Amit Langote. Some final adjustments by me. Discussion: http://postgr.es/m/CAFjFpRd=1venqLL7oGU=C1dEkuvk2DJgvF+7uKbnPHaum1mvHQ@mail.gmail.com	2017-09-14 15:41:08 -04:00
Tom Lane	d8e6b84bd2	Avoid regressions in foreign-key-based selectivity estimates. David Rowley found that the "use the smallest per-column selectivity" heuristic applied in some cases by get_foreign_key_join_selectivity() was badly off if the FK columns are independent, producing estimates much worse than we got before that code was added in 9.6. One case where that heuristic was used was for LEFT and FULL outer joins with the referenced rel on the outside of the join. But we should not really need to special-case those here. eqjoinsel() never has had such a special case; the correction is applied by calc_joinrel_size_estimate() instead. Let's just estimate such cases like inner joins and rely on that later adjustment. (I think there was something of a thinko here, in that the comments seem to be thinking about the selectivity as defined for semi/anti joins; but that shouldn't apply to left/full joins.) Add a regression test exercising such a case to show that this is sane in at least some cases. The other case where we used that heuristic was for SEMI/ANTI outer joins, either if the referenced rel was on the outside, or if it was on the inside but was part of a join within the RHS. In either case, the FK doesn't give us a lot of traction towards estimating the selectivity. To ensure that we don't have regressions from what happened before 9.6, let's punt by ignoring the FK in such cases and applying the traditional selectivity calculation. (We might be able to improve on that later, but for now I just want to be sure it's not worse than 9.5.) Report and patch by David Rowley, simplified a bit by me. Back-patch to 9.6 where this code was added. Discussion: https://postgr.es/m/CAKJS1f8NO8oCDcxrteohG6O72uU1saEVT9qX=R8pENr5QWerXw@mail.gmail.com	2017-06-19 15:33:41 -04:00
Tom Lane	92a43e4857	Reduce semijoins with unique inner relations to plain inner joins. If the inner relation can be proven unique, that is it can have no more than one matching row for any row of the outer query, then we might as well implement the semijoin as a plain inner join, allowing substantially more freedom to the planner. This is a form of outer join strength reduction, but it can't be implemented in reduce_outer_joins() because we don't have enough info about the individual relations at that stage. Instead do it much like remove_useless_joins(): once we've built base relations, we can make another pass over the SpecialJoinInfo list and get rid of any entries representing reducible semijoins. This is essentially a followon to the inner-unique patch (commit `9c7f5229a`) and makes use of the proof machinery that that patch created. We need only minor refactoring of innerrel_is_unique's API to support this usage. Per performance complaint from Teodor Sigaev. Discussion: https://postgr.es/m/f994fc98-389f-4a46-d1bc-c42e05cb43ed@sigaev.ru	2017-05-01 14:53:42 -04:00
Tom Lane	2057a58d16	Fix mis-optimization of semijoins with more than one LHS relation. The inner-unique patch (commit `9c7f5229a`) supposed that if we're considering a JOIN_UNIQUE_INNER join path, we can always set inner_unique for the join, because the inner path produced by create_unique_path should be unique relative to the outer relation. However, that's true only if we're considering joining to the whole outer relation --- otherwise we may be applying only some of the join quals, and so the inner path might be non-unique from the perspective of this join. Adjust the test to only believe that we can set inner_unique if we have the whole semijoin LHS on the outer side. There is more that can be done in this area, but this commit is only intended to provide the minimal fix needed to get correct plans. Per report from Teodor Sigaev. Thanks to David Rowley for preliminary investigation. Discussion: https://postgr.es/m/f994fc98-389f-4a46-d1bc-c42e05cb43ed@sigaev.ru	2017-05-01 14:39:11 -04:00
Tom Lane	9c7f5229ad	Optimize joins when the inner relation can be proven unique. If there can certainly be no more than one matching inner row for a given outer row, then the executor can move on to the next outer row as soon as it's found one match; there's no need to continue scanning the inner relation for this outer row. This saves useless scanning in nestloop and hash joins. In merge joins, it offers the opportunity to skip mark/restore processing, because we know we have not advanced past the first possible match for the next outer row. Of course, the devil is in the details: the proof of uniqueness must depend only on joinquals (not otherquals), and if we want to skip mergejoin mark/restore then it must depend only on merge clauses. To avoid adding more planning overhead than absolutely necessary, the present patch errs in the conservative direction: there are cases where inner_unique or skip_mark_restore processing could be used, but it will not do so because it's not sure that the uniqueness proof depended only on "safe" clauses. This could be improved later. David Rowley, reviewed and rather heavily editorialized on by me Discussion: https://postgr.es/m/CAApHDvqF6Sw-TK98bW48TdtFJ+3a7D2mFyZ7++=D-RyPsL76gw@mail.gmail.com	2017-04-07 22:20:13 -04:00
Andres Freund	7c5d8c16e1	Add explicit ORDER BY to a few tests that exercise hash-join code. A proposed patch, also by Thomas and in the same thread, would change the output order of these. Independent of the follow-up patches getting committed, nailing down the order in these specific tests at worst seems harmless. Author: Thomas Munro Discussion: https://postgr.es/m/CAEepm=1D4-tP7j7UAgT_j4ZX2j4Ehe1qgZQWFKBMb8F76UW5Rg@mail.gmail.com	2017-02-08 16:58:21 -08:00
Heikki Linnakangas	181bdb90ba	Fix typos in comments. Backpatch to all supported versions, where applicable, to make backpatching of future fixes go more smoothly. Josh Soref Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com	2017-02-06 11:33:58 +02:00
Tom Lane	207d5a656e	Fix mishandling of equivalence-class tests in parameterized plans. Given a three-or-more-way equivalence class, such as X.Y = Y.Y = Z.Z, it was possible for the planner to omit one of the quals needed to enforce that all members of the equivalence class are actually equal. This only happened in the case of a parameterized join node for two of the relations, that is a plan tree like Nested Loop -> Scan X -> Nested Loop -> Scan Y -> Scan Z Filter: Z.Z = X.X The eclass machinery normally expects to apply X.X = Y.Y when those two relations are joined, but in this shape of plan tree they aren't joined until the top node --- and, if the lower nested loop is marked as parameterized by X, the top node will assume that the relevant eclass condition(s) got pushed down into the lower node. On the other hand, the scan of Z assumes that it's only responsible for constraining Z.Z to match any one of the other eclass members. So one or another of the required quals sometimes fell between the cracks, depending on whether consideration of the eclass in get_joinrel_parampathinfo() for the lower nested loop chanced to generate X.X = Y.Y or X.X = Z.Z as the appropriate constraint there. If it generated the latter, it'd erroneously suppose that the Z scan would take care of matters. To fix, force X.X = Y.Y to be generated and applied at that join node when this case occurs. This is extremely hard to hit in practice, because various planner behaviors conspire to mask the problem; starting with the fact that the planner doesn't really like to generate a parameterized plan of the above shape. (It might have been impossible to hit it before we tweaked things to allow this plan shape for star-schema cases.) Many thanks to Alexander Kirkouski for submitting a reproducible test case. The bug can be demonstrated in all branches back to 9.2 where parameterized paths were introduced, so back-patch that far.	2016-04-29 20:19:38 -04:00
Tom Lane	80f66a9ad0	Fix planner failure with full join in RHS of left join. Given a left join containing a full join in its righthand side, with the left join's joinclause referencing only one side of the full join (in a non-strict fashion, so that the full join doesn't get simplified), the planner could fail with "failed to build any N-way joins" or related errors. This happened because the full join was seen as overlapping the left join's RHS, and then recent changes within join_is_legal() caused that function to conclude that the full join couldn't validly be formed. Rather than try to rejigger join_is_legal() yet more to allow this, I think it's better to fix initsplan.c so that the required join order is explicit in the SpecialJoinInfo data structure. The previous coding there essentially ignored full joins, relying on the fact that we don't flatten them in the joinlist data structure to preserve their ordering. That's sufficient to prevent a wrong plan from being formed, but as this example shows, it's not sufficient to ensure that the right plan will be formed. We need to work a bit harder to ensure that the right plan looks sane according to the SpecialJoinInfos. Per bug #14105 from Vojtech Rylko. This was apparently induced by commit `8703059c6` (though now that I've seen it, I wonder whether there are related cases that could have failed before that); so back-patch to all active branches. Unfortunately, that patch also went into 9.0, so this bug is a regression that won't be fixed in that branch.	2016-04-21 20:05:58 -04:00
Tom Lane	d4c3a156cb	Remove GROUP BY columns that are functionally dependent on other columns. If a GROUP BY clause includes all columns of a non-deferred primary key, as well as other columns of the same relation, those other columns are redundant and can be dropped from the grouping; the pkey is enough to ensure that each row of the table corresponds to a separate group. Getting rid of the excess columns will reduce the cost of the sorting or hashing needed to implement GROUP BY, and can indeed remove the need for a sort step altogether. This seems worth testing for since many query authors are not aware of the GROUP-BY-primary-key exception to the rule about queries not being allowed to reference non-grouped-by columns in their targetlists or HAVING clauses. Thus, redundant GROUP BY items are not uncommon. Also, we can make the test pretty cheap in most queries where it won't help by not looking up a rel's primary key until we've found that at least two of its columns are in GROUP BY. David Rowley, reviewed by Julien Rouhaud	2016-02-11 17:34:59 -05:00
Tom Lane	f867ce5518	ExecHashRemoveNextSkewBucket must physically copy tuples to main hashtable. Commit `45f6240a8f` added an assumption in ExecHashIncreaseNumBatches and ExecHashIncreaseNumBuckets that they could find all tuples in the main hash table by iterating over the "dense storage" introduced by that patch. However, ExecHashRemoveNextSkewBucket continued its old practice of simply re-linking deleted skew tuples into the main table's hashchains. Hence, such tuples got lost during any subsequent increase in nbatch or nbuckets, and would never get joined, as reported in bug #13908 from Seth P. I (tgl) think that the aforesaid commit has got multiple design issues and should be reworked rather completely; but there is no time for that right now, so band-aid the problem by making ExecHashRemoveNextSkewBucket physically copy deleted skew tuples into the "dense storage" arena. The added test case is able to exhibit the problem by means of fooling the planner with a WHERE condition that it will underestimate the selectivity of, causing the initial nbatch estimate to be too small. Tomas Vondra and Tom Lane. Thanks to David Johnston for initial investigation into the bug report.	2016-02-07 12:29:32 -05:00
Tom Lane	acfcd45cac	Still more fixes for planner's handling of LATERAL references. More fuzz testing by Andreas Seltenreich exposed that the planner did not cope well with chains of lateral references. If relation X references Y laterally, and Y references Z laterally, then we will have to scan X on the inside of a nestloop with Z, so for all intents and purposes X is laterally dependent on Z too. The planner did not understand this and would generate intermediate joins that could not be used. While that was usually harmless except for wasting some planning cycles, under the right circumstances it would lead to "failed to build any N-way joins" or "could not devise a query plan" planner failures. To fix that, convert the existing per-relation lateral_relids and lateral_referencers relid sets into their transitive closures; that is, they now show all relations on which a rel is directly or indirectly laterally dependent. This not only fixes the chained-reference problem but allows some of the relevant tests to be made substantially simpler and faster, since they can be reduced to simple bitmap manipulations instead of searches of the LateralJoinInfo list. Also, when a PlaceHolderVar that is due to be evaluated at a join contains lateral references, we should treat those references as indirect lateral dependencies of each of the join's base relations. This prevents us from trying to join any individual base relations to the lateral reference source before the join is formed, which again cannot work. Andreas' testing also exposed another oversight in the "dangerous PlaceHolderVar" test added in commit `85e5e222b1`. Simply rejecting unsafe join paths in joinpath.c is insufficient, because in some cases we will end up rejecting all possible paths for a particular join, again leading to "could not devise a query plan" failures. The restriction has to be known also to join_is_legal and its cohort functions, so that they will not select a join for which that will happen. I chose to move the supporting logic into joinrels.c where the latter functions are. Back-patch to 9.3 where LATERAL support was introduced.	2015-12-11 14:22:20 -05:00
Tom Lane	7e19db0c09	Fix another oversight in checking if a join with LATERAL refs is legal. It was possible for the planner to decide to join a LATERAL subquery to the outer side of an outer join before the outer join itself is completed. Normally that's fine because of the associativity rules, but it doesn't work if the subquery contains a lateral reference to the inner side of the outer join. In such a situation the outer join must be done first. join_is_legal() missed this consideration and would allow the join to be attempted, but the actual path-building code correctly decided that no valid join path could be made, sometimes leading to planner errors such as "failed to build any N-way joins". Per report from Andreas Seltenreich. Back-patch to 9.3 where LATERAL support was added.	2015-12-07 17:42:11 -05:00
Tom Lane	6a0779a397	Improve regression test case to avoid depending on system catalog stats. In commit `95f4e59c32` I added a regression test case that examined the plan of a query on system catalogs. That isn't a terribly great idea because the catalogs tend to change from version to version, or even within a version if someone makes an unrelated regression-test change that populates the catalogs a bit differently. Usually I try to make planner test cases rely on test tables that have not changed since Berkeley days, but I got sloppy in this case because the submitted crasher example queried the catalogs and I didn't spend enough time on rewriting it. But it was a problem waiting to happen, as I was rudely reminded when I tried to port that patch into Salesforce's Postgres variant :-(. So spend a little more effort and rewrite the query to not use any system catalogs. I verified that this version still provokes the Assert if 95f4e59c32866716's code fix is reverted. I also removed the EXPLAIN output from the test, as it turns out that the assertion occurs while considering a plan that isn't the one ultimately selected anyway; so there's no value in risking any cross-platform variation in that printout. Back-patch to 9.2, like the previous patch.	2015-08-13 13:25:22 -04:00
Tom Lane	cfe30a72fa	Undo mistaken tightening in join_is_legal(). One of the changes I made in commit `8703059c6b` turns out not to have been such a good idea: we still need the exception in join_is_legal() that allows a join if both inputs already overlap the RHS of the special join we're checking. Otherwise we can miss valid plans, and might indeed fail to find a plan at all, as in recent report from Andreas Seltenreich. That code was added way back in commit `c17117649b`, but I failed to include a regression test case then; my bad. Put it back with a better explanation, and a test this time. The logic does end up a bit different than before though: I now believe it's appropriate to make this check first, thereby allowing such a case whether or not we'd consider the previous SJ(s) to commute with this one. (Presumably, we already decided they did; but it was confusing to have this consideration in the middle of the code that was handling the other case.) Back-patch to all active branches, like the previous patch.	2015-08-12 21:19:03 -04:00
Tom Lane	68fa28f771	Postpone extParam/allParam calculations until the very end of planning. Until now we computed these Param ID sets at the end of subquery_planner, but that approach depends on subquery_planner returning a concrete Plan tree. We would like to switch over to returning one or more Paths for a subquery, and in that representation the necessary details aren't fully fleshed out (not to mention that we don't really want to do this work for Paths that end up getting discarded). Hence, refactor so that we can compute the param ID sets at the end of planning, just before set_plan_references is run. The main change necessary to make this work is that we need to capture the set of outer-level Param IDs available to the current query level before exiting subquery_planner, since the outer levels' plan_params lists are transient. (That's not going to pose a problem for returning Paths, since all the work involved in producing that data is part of expression preprocessing, which will continue to happen before Paths are produced.) On the plus side, this change gets rid of several existing kluges. Eventually I'd like to get rid of SS_finalize_plan altogether in favor of doing this work during set_plan_references, but that will require some complex rejiggering because SS_finalize_plan needs to visit subplans and initplans before the main plan. So leave that idea for another day.	2015-08-11 23:48:37 -04:00
Tom Lane	4200a92862	Further mucking with PlaceHolderVar-related restrictions on join order. Commit `85e5e222b1` turns out not to have taken care of all cases of the partially-evaluatable-PlaceHolderVar problem found by Andreas Seltenreich's fuzz testing. I had set it up to check for risky PHVs only in the event that we were making a star-schema-based exception to the param_source_rels join ordering heuristic. However, it turns out that the problem can occur even in joins that satisfy the param_source_rels heuristic, in which case allow_star_schema_join() isn't consulted. Refactor so that we check for risky PHVs whenever the proposed join has any remaining parameterization. Back-patch to 9.2, like the previous patch (except for the regression test case, which only works back to 9.3 because it uses LATERAL). Note that this discovery implies that problems of this sort could've occurred in 9.2 and up even before the star-schema patch; though I've not tried to prove that experimentally.	2015-08-10 17:18:17 -04:00
Tom Lane	89db83922a	Further adjustments to PlaceHolderVar removal. A new test case from Andreas Seltenreich showed that we were still a bit confused about removing PlaceHolderVars during join removal. Specifically, remove_rel_from_query would remove a PHV that was used only underneath the removable join, even if the place where it's used was the join partner relation and not the join clause being deleted. This would lead to a "too late to create a new PlaceHolderInfo" error later on. We can defend against that by checking ph_eval_at to see if the PHV could possibly be getting used at some partner rel. Also improve some nearby LATERAL-related logic. I decided that the check on ph_lateral needed to take precedence over the check on ph_needed, in case there's a lateral reference underneath the join being considered. (That may be impossible, but I'm not convinced of it, and it's easy enough to defend against the case.) Also, I realized that remove_rel_from_query's logic for updating LateralJoinInfos is dead code, because we don't build those at all until after join removal. Back-patch to 9.3. Previous versions didn't have the LATERAL issues, of course, and they also didn't attempt to remove PlaceHolderInfos during join removal. (I'm starting to wonder if changing that was really such a great idea.)	2015-08-07 14:13:50 -04:00
Tom Lane	bab163e121	Fix old oversight in join removal logic. Commit `9e7e29c75a` introduced an Assert that join removal didn't reduce the eval_at set of any PlaceHolderVar to empty. At first glance it looks like join_is_removable ensures that's true --- but actually, the loop in join_is_removable skips PlaceHolderVars that are not referenced above the join due to be removed. So, if we don't want any empty eval_at sets, the right thing to do is to delete any now-unreferenced PlaceHolderVars from the data structure entirely. Per fuzz testing by Andreas Seltenreich. Back-patch to 9.3 where the aforesaid Assert was added.	2015-08-06 22:14:27 -04:00
Tom Lane	8703059c6b	Further fixes for degenerate outer join clauses. Further testing revealed that commit `f69b4b9495` was still a few bricks shy of a load: minor tweaking of the previous test cases resulted in the same wrong-outer-join-order problem coming back. After study I concluded that my previous changes in make_outerjoininfo() were just accidentally masking the problem, and should be reverted in favor of forcing syntactic join order whenever an upper outer join's predicate doesn't mention a lower outer join's LHS. This still allows the chained-outer-joins style that is the normally optimizable case. I also tightened things up some more in join_is_legal(). It seems to me on review that what's really happening in the exception case where we ignore a mismatched special join is that we're allowing the proposed join to associate into the RHS of the outer join we're comparing it to. As such, we should always insist that the proposed join be a left join, which eliminates a bunch of rather dubious argumentation. The case where we weren't enforcing that was the one that was already known buggy anyway (it had a violatable Assert before the aforesaid commit) so it hardly deserves a lot of deference. Back-patch to all active branches, like the previous patch. The added regression test case failed in all branches back to 9.1, and I think it's only an unrelated change in costing calculations that kept 9.0 from choosing a broken plan.	2015-08-06 15:35:46 -04:00
Tom Lane	85e5e222b1	Fix a PlaceHolderVar-related oversight in star-schema planning patch. In commit `b514a7460d`, I changed the planner so that it would allow nestloop paths to remain partially parameterized, ie the inner relation might need parameters from both the current outer relation and some upper-level outer relation. That's fine so long as we're talking about distinct parameters; but the patch also allowed creation of nestloop paths for cases where the inner relation's parameter was a PlaceHolderVar whose eval_at set included the current outer relation and some upper-level one. That does not work. In principle we could allow such a PlaceHolderVar to be evaluated at the lower join node using values passed down from the upper relation along with values from the join's own outer relation. However, nodeNestloop.c only supports simple Vars not arbitrary expressions as nestloop parameters. createplan.c is also a few bricks shy of being able to handle such cases; it misplaces the PlaceHolderVar parameters in the plan tree, which is why the visible symptoms of this bug are "plan should not reference subplan's variable" and "failed to assign all NestLoopParams to plan nodes" planner errors. Adding the necessary complexity to make this work doesn't seem like it would be repaid in significantly better plans, because in cases where such a PHV exists, there is probably a corresponding join order constraint that would allow a good plan to be found without using the star-schema exception. Furthermore, adding complexity to nodeNestloop.c would create a run-time penalty even for plans where this whole consideration is irrelevant. So let's just reject such paths instead. Per fuzz testing by Andreas Seltenreich; the added regression test is based on his example query. Back-patch to 9.2, like the previous patch.	2015-08-04 14:55:50 -04:00
Tom Lane	f69b4b9495	Fix some planner issues with degenerate outer join clauses. An outer join clause that didn't actually reference the RHS (perhaps only after constant-folding) could confuse the join order enforcement logic, leading to wrong query results. Also, nested occurrences of such things could trigger an Assertion that on reflection seems incorrect. Per fuzz testing by Andreas Seltenreich. The practical use of such cases seems thin enough that it's not too surprising we've not heard field reports about it. This has been broken for a long time, so back-patch to all active branches.	2015-08-01 20:57:41 -04:00
Tom Lane	a6492ff897	Fix an oversight in checking whether a join with LATERAL refs is legal. In many cases, we can implement a semijoin as a plain innerjoin by first passing the righthand-side relation through a unique-ification step. However, one of the cases where this does NOT work is where the RHS has a LATERAL reference to the LHS; that makes the RHS dependent on the LHS so that unique-ification is meaningless. joinpath.c understood this, and so would not generate any join paths of this kind ... but join_is_legal neglected to check for the case, so it would think that we could do it. The upshot would be a "could not devise a query plan for the given query" failure once we had failed to generate any join paths at all for the bogus join pair. Back-patch to 9.3 where LATERAL was added.	2015-07-31 19:26:33 -04:00
Tom Lane	95f4e59c32	Remove an unsafe Assert, and explain join_clause_is_movable_into() better. join_clause_is_movable_into() is approximate, in the sense that it might sometimes return "false" when actually it would be valid to push the given join clause down to the specified level. This is okay ... but there was an Assert in get_joinrel_parampathinfo() that's only safe if the answers are always exact. Comment out the Assert, and add a bunch of commentary to clarify what's going on. Per fuzz testing by Andreas Seltenreich. The added regression test is a pretty silly query, but it's based on his crasher example. Back-patch to 9.2 where the faulty logic was introduced.	2015-07-28 13:20:39 -04:00
Tom Lane	fca8e59c1c	Fix oversight in flattening of subqueries with empty FROM. I missed a restriction that commit `f4abd0241d` should have enforced: we can't pull up an empty-FROM subquery if it's under an outer join, because then we'd need to wrap its output columns in PlaceHolderVars. As the code currently stands, the PHVs end up with empty relid sets, which doesn't work (and is correctly caught by an Assert). It's possible that this could be fixed by assigning the PHVs the relid sets of the parent FromExpr/JoinExpr, but getting that to work is more complication than I care to add right now; indeed it's likely that we'll never bother, since pulling up empty-FROM subqueries is a rather marginal optimization anyway. Per report from Andreas Seltenreich. Back-patch to 9.5 where the faulty code was added.	2015-07-26 17:44:27 -04:00
Tom Lane	358eaa01bf	Make entirely-dummy appendrels get marked as such in set_append_rel_size. The planner generally expects that the estimated rowcount of any relation is at least one row, unless it has been proven empty by constraint exclusion or similar mechanisms, which is marked by installing a dummy path as the rel's cheapest path (cf. IS_DUMMY_REL). When I split up allpaths.c's processing of base rels into separate set_base_rel_sizes and set_base_rel_pathlists steps, the intention was that dummy rels would get marked as such during the "set size" step; this is what justifies an Assert in indxpath.c's get_loop_count that other relations should either be dummy or have positive rowcount. Unfortunately I didn't get that quite right for append relations: if all the child rels have been proven empty then set_append_rel_size would come up with a rowcount of zero, which is correct, but it didn't then do set_dummy_rel_pathlist. (We would have ended up with the right state after set_append_rel_pathlist, but that's too late, if we generate indexpaths for some other rel first.) In addition to fixing the actual bug, I installed an Assert enforcing this convention in set_rel_size; that then allows simplification of a couple of now-redundant tests for zero rowcount in set_append_rel_size. Also, to cover the possibility that third-party FDWs have been careless about not returning a zero rowcount estimate, apply clamp_row_est to whatever an FDW comes up with as the rows estimate. Per report from Andreas Seltenreich. Back-patch to 9.2. Earlier branches did not have the separation between set_base_rel_sizes and set_base_rel_pathlists steps, so there was no intermediate state where an appendrel would have had inconsistent rowcount and pathlist. It's possible that adding the Assert to set_rel_size would be a good idea in older branches too; but since they're not under development any more, it's likely not worth the trouble.	2015-07-26 16:19:08 -04:00
Tom Lane	3cf8686014	Prevent improper reordering of antijoins vs. outer joins. An outer join appearing within the RHS of an antijoin can't commute with the antijoin, but somehow I missed teaching make_outerjoininfo() about that. In Teodor Sigaev's recent trouble report, this manifests as a "could not find RelOptInfo for given relids" error within eqjoinsel(); but I think silently wrong query results are possible too, if the planner misorders the joins and doesn't happen to trigger any internal consistency checks. It's broken as far back as we had antijoins, so back-patch to all supported branches.	2015-04-25 16:44:27 -04:00
Tom Lane	ca6805338f	Fix incorrect matching of subexpressions in outer-join plan nodes. Previously we would re-use input subexpressions in all expression trees attached to a Join plan node. However, if it's an outer join and the subexpression appears in the nullable-side input, this is potentially incorrect for apparently-matching subexpressions that came from above the outer join (ie, targetlist and qpqual expressions), because the executor will treat the subexpression value as NULL when maybe it should not be. The case is fairly hard to hit because (a) you need a non-strict subexpression (else NULL is correct), and (b) we don't usually compute expressions in the outputs of non-toplevel plan nodes. But we might do so if the expressions are sort keys for a mergejoin, for example. Probably in the long run we should make a more explicit distinction between Vars appearing above and below an outer join, but that will be a major planner redesign and not at all back-patchable. For the moment, just hack set_join_references so that it will not match any non-Var expressions coming from nullable inputs to expressions that came from above the join. (This is somewhat overkill, in that a strict expression could still be matched, but it doesn't seem worth the effort to check that.) Per report from Qingqing Zhou. The added regression test case is based on his example. This has been broken for a very long time, so back-patch to all active branches.	2015-04-04 19:55:15 -04:00
Tom Lane	f4abd0241d	Support flattening of empty-FROM subqueries and one-row VALUES tables. We can't handle this in the general case due to limitations of the planner's data representations; but we can allow it in many useful cases, by being careful to flatten only when we are pulling a single-row subquery up into a FROM (or, equivalently, inner JOIN) node that will still have at least one remaining relation child. Per discussion of an example from Kyotaro Horiguchi.	2015-03-11 23:18:03 -04:00
Tom Lane	b746d0c32d	Fix old bug in get_loop_count(). While poking at David Kubečka's issue I noticed an ancient logic error in get_loop_count(): it used 1.0 as a "no data yet" indicator, but since that is actually a valid rowcount estimate, this doesn't work. If we have one input relation with 1.0 as rowcount and then another one with a larger rowcount, we should use 1.0 as the result, but we picked the larger rowcount instead. (I think when I coded this, I recognized the conflict, but mistakenly thought that the logic would pick the desired count anyway.) Fixing this changed the plan for one existing regression test case. Since the point of that test is to exercise creation of a particular shape of nestloop plan, I tweaked the query a little bit so it still results in the same plan choice. This is definitely a bug, but I'm hesitant to back-patch since it might change plan choices unexpectedly, and anyway failure to implement a heuristic precisely as intended is a pretty low-grade bug.	2015-03-11 22:53:32 -04:00
Robert Haas	e529cd4ffa	Suggest to the user the column they may have meant to reference. Error messages informing the user that no such column exists can sometimes provoke a perplexed response. This often happens due to a subtle typo in the column name or, perhaps less likely, in the alias name. To speed discovery of what the real issue is in such cases, we'll now search the range table for approximate matches. If there are one or two such matches that are good enough to think that they might be what the user intended to type, and better than all other approximate matches, we'll issue a hint suggesting that the user might have intended to reference those columns. Peter Geoghegan and Robert Haas	2015-03-11 10:44:04 -04:00
Tom Lane	b514a7460d	Fix planning of star-schema-style queries. Part of the intent of the parameterized-path mechanism was to handle star-schema queries efficiently, but some overly-restrictive search limiting logic added in commit `e2fa76d80b` prevented such cases from working as desired. Fix that and add a regression test about it. Per gripe from Marc Cousin. This is arguably a bug rather than a new feature, so back-patch to 9.2 where parameterized paths were introduced.	2015-02-28 12:43:04 -05:00
Tom Lane	1b4cc493d2	Preserve AND/OR flatness while extracting restriction OR clauses. The code I added in commit `f343a880d5` was careless about preserving AND/OR flatness: it could create a structure with an OR node directly underneath another one. That breaks an assumption that's fairly important for planning efficiency, not to mention triggering various Asserts (as reported by Benjamin Smith). Add a trifle more logic to handle the case properly.	2014-09-09 18:35:31 -04:00
Tom Lane	f15821eefd	Allow join removal in some cases involving a left join to a subquery. We can remove a left join to a relation if the relation's output is provably distinct for the columns involved in the join clause (considering only equijoin clauses) and the relation supplies no variables needed above the join. Previously, the join removal logic could only prove distinctness by reference to unique indexes of a table. This patch extends the logic to consider subquery relations, wherein distinctness might be proven by reference to GROUP BY, DISTINCT, etc. We actually already had some code to check that a subquery's output was provably distinct, but it was hidden inside pathnode.c; which was a pretty bad place for it really, since that file is mostly boilerplate Path construction and comparison. Move that code to analyzejoins.c, which is arguably a more appropriate location, and is certainly the site of the new usage for it. David Rowley, reviewed by Simon Riggs	2014-07-15 21:12:43 -04:00
Tom Lane	ab76208e3d	Forward-port regression test for bug #10587 into 9.3 and HEAD. Although this bug is already fixed in post-9.2 branches, the case triggering it is quite different from what was under consideration at the time. It seems worth memorializing this example in HEAD just to make sure it doesn't get broken again in future. Extracted from commit `187ae17300`.	2014-06-09 21:37:18 -04:00
Tom Lane	a16d421ca4	Revert "Auto-tune effective_cache size to be 4x shared buffers" This reverts commit `ee1e5662d8`, as well as a remarkably large number of followup commits, which were mostly concerned with the fact that the implementation didn't work terribly well. It still doesn't: we probably need some rather basic work in the GUC infrastructure if we want to fully support GUCs whose default varies depending on the value of another GUC. Meanwhile, it also emerged that there wasn't really consensus in favor of the definition the patch tried to implement (ie, effective_cache_size should default to 4 times shared_buffers). So whack it all back to where it was. In a followup commit, I'll do what was recently agreed to, which is to simply change the default to a higher value.	2014-05-08 20:49:38 -04:00
Tom Lane	043f6ff05d	Fix bogus handling of "postponed" lateral quals. When pulling a "postponed" qual from a LATERAL subquery up into the quals of an outer join, we must make sure that the postponed qual is included in those seen by make_outerjoininfo(). Otherwise we might compute a too-small min_lefthand or min_righthand for the outer join, leading to "JOIN qualification cannot refer to other relations" failures from distribute_qual_to_rels. Subtler errors in the created plan seem possible, too, if the extra qual would only affect join ordering constraints. Per bug #9041 from David Leverton. Back-patch to 9.3.	2014-01-30 14:51:16 -05:00
Tom Lane	158b7fa6a3	Disallow LATERAL references to the target table of an UPDATE/DELETE. On second thought, commit `0c051c9008` was over-hasty: rather than allowing this case, we ought to reject it for now. That leaves the field clear for a future feature that allows the target table to be re-specified in the FROM (or USING) clause, which will enable left-joining the target table to something else. We can then also allow LATERAL references to such an explicitly re-specified target table. But allowing them right now will create ambiguities or worse for such a feature, and it isn't something we documented 9.3 as supporting. While at it, add a convenience subroutine to avoid having several copies of the ereport for disalllowed-LATERAL-reference cases.	2014-01-11 19:03:12 -05:00
Tom Lane	0c051c9008	Fix LATERAL references to target table of UPDATE/DELETE. I failed to think much about UPDATE/DELETE when implementing LATERAL :-(. The implemented behavior ended up being that subqueries in the FROM or USING clause (respectively) could access the update/delete target table as though it were a lateral reference; which seems fine if they said LATERAL, but certainly ought to draw an error if they didn't. Fix it so you get a suitable error when you omit LATERAL. Per report from Emre Hasegeli.	2014-01-07 15:25:27 -05:00
Tom Lane	f343a880d5	Extract restriction OR clauses whether or not they are indexable. It's possible to extract a restriction OR clause from a join clause that has the form of an OR-of-ANDs, if each sub-AND includes a clause that mentions only one specific relation. While PG has been aware of that idea for many years, the code previously only did it if it could extract an indexable OR clause. On reflection, though, that seems a silly limitation: adding a restriction clause can be a win by reducing the number of rows that have to be filtered at the join step, even if we have to test the clause as a plain filter clause during the scan. This should be especially useful for foreign tables, where the change can cut the number of rows that have to be retrieved from the foreign server; but testing shows it can win even on local tables. Per a suggestion from Robert Haas. As a heuristic, I made the code accept an extracted restriction clause if its estimated selectivity is less than 0.9, which will probably result in accepting extracted clauses just about always. We might need to tweak that later based on experience. Since the code no longer has even a weak connection to Path creation, remove orindxpath.c and create a new file optimizer/util/orclauses.c. There's some additional janitorial cleanup of now-dead code that needs to happen, but it seems like that's a fit subject for a separate commit.	2013-12-30 12:24:37 -05:00
Tom Lane	b5e0a2a384	Tweak placement of explicit ANALYZE commands in the regression tests. Make the COPY test, which loads most of the large static tables used in the tests, also explicitly ANALYZE those tables. This allows us to get rid of various ad-hoc, and rather redundant, ANALYZE commands that had gotten stuck into various test scripts over time to ensure we got consistent plan choices. (We could have done a database-wide ANALYZE, but that would cause stats to get attached to the small static tables too, which results in plan changes compared to the historical behavior. I'm not sure that's a good idea, so not going that far for now.) Back-patch to 9.0, since 9.0 and 9.1 are currently sometimes failing regression tests for lack of an "ANALYZE tenk1" in the subselect test. There's no need for this in 8.4 since we didn't print any plans back then.	2013-12-11 15:09:15 -05:00
Tom Lane	f19e92ed04	Flatten join alias Vars before pulling up targetlist items from a subquery. pullup_replace_vars()'s decisions about whether a pulled-up replacement expression needs to be wrapped in a PlaceHolderVar depend on the assumption that what looks like a Var behaves like a Var. However, if the Var is a join alias reference, later flattening of join aliases might replace the Var with something that's not a Var at all, and should have been wrapped. To fix, do a forcible pass of flatten_join_alias_vars() on the subquery targetlist before we start to copy items out of it. We'll re-run that processing on the pulled-up expressions later, but that's harmless. Per report from Ken Tanzer; the added regression test case is based on his example. This bug has been there since the PlaceHolderVar mechanism was invented, but has escaped detection because the circumstances that trigger it are fairly narrow. You need a flattenable query underneath an outer join, which contains another flattenable query inside a join of its own, with a dangerous expression (a constant or something else non-strict) in that one's targetlist. Having seen this, I'm wondering if it wouldn't be prudent to do all alias-variable flattening earlier, perhaps even in the rewriter. But that would probably not be a back-patchable change.	2013-11-22 14:37:21 -05:00
Tom Lane	f3b3b8d5be	Compute correct em_nullable_relids in get_eclass_for_sort_expr(). Bug #8591 from Claudio Freire demonstrates that get_eclass_for_sort_expr must be able to compute valid em_nullable_relids for any new equivalence class members it creates. I'd worried about this in the commit message for `db9f0e1d9a`, but claimed that it wasn't a problem because multi-member ECs should already exist when it runs. That is transparently wrong, though, because this function is also called by initialize_mergeclause_eclasses, which runs during deconstruct_jointree. The example given in the bug report (which the new regression test item is based upon) fails because the COALESCE() expression is first seen by initialize_mergeclause_eclasses rather than process_equivalence. Fixing this requires passing the appropriate nullable_relids set to get_eclass_for_sort_expr, and it requires new code to compute that set for top-level expressions such as ORDER BY, GROUP BY, etc. We store the top-level nullable_relids in a new field in PlannerInfo to avoid computing it many times. In the back branches, I've added the new field at the end of the struct to minimize ABI breakage for planner plugins. There doesn't seem to be a good alternative to changing get_eclass_for_sort_expr's API signature, though. There probably aren't any third-party extensions calling that function directly; moreover, if there are, they probably need to think about what to pass for nullable_relids anyway. Back-patch to 9.2, like the previous patch in this area.	2013-11-15 16:46:18 -05:00
Tom Lane	648bd05b13	Re-allow duplicate aliases within aliased JOINs. Although the SQL spec forbids duplicate table aliases, historically we've allowed queries like SELECT ... FROM tab1 x CROSS JOIN (tab2 x CROSS JOIN tab3 y) z on the grounds that the aliased join (z) hides the aliases within it, therefore there is no conflict between the two RTEs named "x". The LATERAL patch broke this, on the misguided basis that "x" could be ambiguous if tab3 were a LATERAL subquery. To avoid breaking existing queries, it's better to allow this situation and complain only if tab3 actually does contain an ambiguous reference. We need only remove the check that was throwing an error, because the column lookup code is already prepared to handle ambiguous references. Per bug #8444.	2013-11-11 10:42:57 -05:00
Bruce Momjian	ee1e5662d8	Auto-tune effective_cache size to be 4x shared buffers	2013-10-08 12:12:24 -04:00
Tom Lane	c64de21e96	Fix qual-clause-misplacement issues with pulled-up LATERAL subqueries. In an example such as SELECT * FROM i LEFT JOIN LATERAL (SELECT * FROM j WHERE i.n = j.n) j ON true; it is safe to pull up the LATERAL subquery into its parent, but we must then treat the "i.n = j.n" clause as a qual clause of the LEFT JOIN. The previous coding in deconstruct_recurse mistakenly labeled the clause as "is_pushed_down", resulting in wrong semantics if the clause were applied at the join node, as per an example submitted awhile ago by Jeremy Evans. To fix, postpone processing of such clauses until we return back up to the appropriate recursion depth in deconstruct_recurse. In addition, tighten the is-safe-to-pull-up checks in is_simple_subquery; we previously missed the possibility that the LATERAL subquery might itself contain an outer join that makes lateral references in lower quals unsafe. A regression test case equivalent to Jeremy's example was already in my commit of yesterday, but was giving the wrong results because of this bug. This patch fixes the expected output for that, and also adds a test case for the second problem.	2013-08-19 13:19:41 -04:00
Tom Lane	9e7e29c75a	Fix planner problems with LATERAL references in PlaceHolderVars. The planner largely failed to consider the possibility that a PlaceHolderVar's expression might contain a lateral reference to a Var coming from somewhere outside the PHV's syntactic scope. We had a previous report of a problem in this area, which I tried to fix in a quick-hack way in commit `4da6439bd8`, but Antonin Houska pointed out that there were still some problems, and investigation turned up other issues. This patch largely reverts that commit in favor of a more thoroughly thought-through solution. The new theory is that a PHV's ph_eval_at level cannot be higher than its original syntactic level. If it contains lateral references, those don't change the ph_eval_at level, but rather they create a lateral-reference requirement for the ph_eval_at join relation. The code in joinpath.c needs to handle that. Another issue is that createplan.c wasn't handling nested PlaceHolderVars properly. In passing, push knowledge of lateral-reference checks for join clauses into join_clause_is_movable_to. This is mainly so that FDWs don't need to deal with it. This patch doesn't fix the original join-qual-placement problem reported by Jeremy Evans (and indeed, one of the new regression test cases shows the wrong answer because of that). But the PlaceHolderVar problems need to be fixed before that issue can be addressed, so committing this separately seems reasonable.	2013-08-17 20:22:37 -04:00
Tom Lane	1b1d3d92c3	Remove ph_may_need from PlaceHolderInfo, with attendant simplifications. The planner logic that attempted to make a preliminary estimate of the ph_needed levels for PlaceHolderVars seems to be completely broken by lateral references. Fortunately, the potential join order optimization that this code supported seems to be of relatively little value in practice; so let's just get rid of it rather than trying to fix it. Getting rid of this allows fairly substantial simplifications in placeholder.c, too, so planning in such cases should be a bit faster. Issue noted while pursuing bugs reported by Jeremy Evans and Antonin Houska, though this doesn't in itself fix either of their reported cases. What this does do is prevent an Assert crash in the kind of query illustrated by the added regression test. (I'm not sure that the plan for that query is stable enough across platforms to be usable as a regression test output ... but we'll soon find out from the buildfarm.) Back-patch to 9.3. The problem case can't arise without LATERAL, so no need to touch older branches.	2013-08-14 18:38:47 -04:00
Tom Lane	db9f0e1d9a	Postpone creation of pathkeys lists to fix bug #8049 . This patch gets rid of the concept of, and infrastructure for, non-canonical PathKeys; we now only ever create canonical pathkey lists. The need for non-canonical pathkeys came from the desire to have grouping_planner initialize query_pathkeys and related pathkey lists before calling query_planner. However, since query_planner didn't actually do anything with those lists before they'd been made canonical, we can get rid of the whole mess by just not creating the lists at all until the point where we formerly canonicalized them. There are several ways in which we could implement that without making query_planner itself deal with grouping/sorting features (which are supposed to be the province of grouping_planner). I chose to add a callback function to query_planner's API; other alternatives would have required adding more fields to PlannerInfo, which while not bad in itself would create an ABI break for planner-related plugins in the 9.2 release series. This still breaks ABI for anything that calls query_planner directly, but it seems somewhat unlikely that there are any such plugins. I had originally conceived of this change as merely a step on the way to fixing bug #8049 from Teun Hoogendoorn; but it turns out that this fixes that bug all by itself, as per the added regression test. The reason is that now get_eclass_for_sort_expr is adding the ORDER BY expression at the end of EquivalenceClass creation not the start, and so anything that is in a multi-member EquivalenceClass has already been created with correct em_nullable_relids. I am suspicious that there are related scenarios in which we still need to teach get_eclass_for_sort_expr to compute correct nullable_relids, but am not eager to risk destabilizing either 9.2 or 9.3 to fix bugs that are only hypothetical. So for the moment, do this and stop here. Back-patch to 9.2 but not to earlier branches, since they don't exhibit this bug for lack of join-clause-movement logic that depends on em_nullable_relids being correct. (We might have to revisit that choice if any related bugs turn up.) In 9.2, don't change the signature of make_pathkeys_for_sortclauses nor remove canonicalize_pathkeys, so as not to risk more plugin breakage than we have to.	2013-04-29 14:50:03 -04:00
Tom Lane	2378d79ab2	Make LATERAL implicit for functions in FROM. The SQL standard does not have general functions-in-FROM, but it does allow UNNEST() there (see the <collection derived table> production), and the semantics of that are defined to include lateral references. So spec compliance requires allowing lateral references within UNNEST() even without an explicit LATERAL keyword. Rather than making UNNEST() a special case, it seems best to extend this flexibility to any function-in-FROM. We'll still allow LATERAL to be written explicitly for clarity's sake, but it's now a noise word in this context. In theory this change could result in a change in behavior of existing queries, by allowing what had been an outer reference in a function-in-FROM to be captured by an earlier FROM-item at the same level. However, all pre-9.3 PG releases have a bug that causes them to match variable references to earlier FROM-items in preference to outer references (and then throw an error). So no previously-working query could contain the type of ambiguity that would risk a change of behavior. Per a suggestion from Andrew Gierth, though I didn't use his patch.	2013-01-26 16:18:42 -05:00
Tom Lane	72a4231f0c	Fix planning of non-strict equivalence clauses above outer joins. If a potential equivalence clause references a variable from the nullable side of an outer join, the planner needs to take care that derived clauses are not pushed to below the outer join; else they may use the wrong value for the variable. (The problem arises only with non-strict clauses, since if an upper clause can be proven strict then the outer join will get simplified to a plain join.) The planner attempted to prevent this type of error by checking that potential equivalence clauses aren't outerjoin-delayed as a whole, but actually we have to check each side separately, since the two sides of the clause will get moved around separately if it's treated as an equivalence. Bugs of this type can be demonstrated as far back as 7.4, even though releases before 8.3 had only a very ad-hoc notion of equivalence clauses. In addition, we neglected to account for the possibility that such clauses might have nonempty nullable_relids even when not outerjoin-delayed; so the equivalence-class machinery lacked logic to compute correct nullable_relids values for clauses it constructs. This oversight was harmless before 9.2 because we were only using RestrictInfo.nullable_relids for OR clauses; but as of 9.2 it could result in pushing constructed equivalence clauses to incorrect places. (This accounts for bug #7604 from Bill MacArthur.) Fix the first problem by adding a new test check_equivalence_delay() in distribute_qual_to_rels, and fix the second one by adding code in equivclass.c and called functions to set correct nullable_relids for generated clauses. Although I believe the second part of this is not currently necessary before 9.2, I chose to back-patch it anyway, partly to keep the logic similar across branches and partly because it seems possible we might find other reasons why we need valid values of nullable_relids in the older branches. Add regression tests illustrating these problems. In 9.0 and up, also add test cases checking that we can push constants through outer joins, since we've broken that optimization before and I nearly broke it again with an overly simplistic patch for this problem.	2012-10-18 12:30:10 -04:00
Tom Lane	3b8968f252	Rethink heuristics for choosing index quals for parameterized paths. Some experimentation with examples similar to bug #7539 has convinced me that indxpath.c's original implementation of parameterized-path generation was several bricks shy of a load. In general, if we are relying on a particular outer rel or set of outer rels for a parameterized path, the path should use every indexable join clause that's available from that rel or rels. Any join clauses that get left out of the indexqual will end up getting applied as plain filter quals (qpquals), and that's generally a significant loser compared to having the index AM enforce them. (This is particularly true with btree, which can skip the index scan entirely if it can see that the given indexquals are mutually contradictory.) The original heuristics failed to ensure this, though, and were overly complicated anyway. Rewrite to make the code explicitly identify each useful set of outer rels and then select all applicable join clauses for each one. The one plan that changes in the regression tests is in fact for the better according to the planner's cost estimates. (Note: this is not a correctness issue but just a matter of plan quality. I don't yet know what is going on in bug #7539, but I don't expect this change to fix that.)	2012-09-16 17:58:09 -04:00
Tom Lane	6d2c8c0e2a	Drop cheap-startup-cost paths during add_path() if we don't need them. We can detect whether the planner top level is going to care at all about cheap startup cost (it will only do so if query_planner's tuple_fraction argument is greater than zero). If it isn't, we might as well discard paths immediately whose only advantage over others is cheap startup cost. This turns out to get rid of quite a lot of paths in complex queries --- I saw planner runtime reduction of more than a third on one large query. Since add_path isn't currently passed the PlannerInfo "root", the easiest way to tell it whether to do this was to add a bool flag to RelOptInfo. That's a bit redundant, since all relations in a given query level will have the same setting. But in the future it's possible that we'd refine the control decision to work on a per-relation basis, so this seems like a good arrangement anyway. Per my suggestion of a few months ago.	2012-09-01 18:16:24 -04:00
Tom Lane	4da6439bd8	Fix mark_placeholder_maybe_needed to handle LATERAL references. If a PlaceHolderVar contains a pulled-up LATERAL reference, its minimum possible evaluation level might be higher in the join tree than its original syntactic location. That in turn affects the ph_needed level for any contained PlaceHolderVars (that is, those PHVs had better propagate up the join tree at least to the evaluation level of the outer PHV). We got this mostly right, but mark_placeholder_maybe_needed() failed to account for the effect, and in consequence could leave the inner PHVs with ph_may_need less than what their ultimate ph_needed value will be. That's bad because it could lead to failure to select a join order that will allow evaluation of the inner PHV at a valid location. Fix that, and add an Assert that checks that we don't ever set ph_needed to more than ph_may_need.	2012-09-01 13:56:46 -04:00
Tom Lane	da3df99870	Fix LATERAL references to join alias variables. I had thought this case worked already, but perhaps I didn't re-test it after adding extract_lateral_references() ...	2012-08-31 17:44:31 -04:00
Tom Lane	9ff79b9d4e	Fix up planner infrastructure to support LATERAL properly. This patch takes care of a number of problems having to do with failure to choose valid join orders and incorrect handling of lateral references pulled up from subqueries. Notable changes: * Add a LateralJoinInfo data structure similar to SpecialJoinInfo, to represent join ordering constraints created by lateral references. (I first considered extending the SpecialJoinInfo structure, but the semantics are different enough that a separate data structure seems better.) Extend join_is_legal() and related functions to prevent trying to form unworkable joins, and to ensure that we will consider joins that satisfy lateral references even if the joins would be clauseless. * Fill in the infrastructure needed for the last few types of relation scan paths to support parameterization. We'd have wanted this eventually anyway, but it is necessary now because a relation that gets pulled up out of a UNION ALL subquery may acquire a reltargetlist containing lateral references, meaning that its paths have to be parameterized whether or not we have any code that can push join quals down into the scan. * Compute data about lateral references early in query_planner(), and save in RelOptInfo nodes, to avoid repetitive calculations later. * Assorted corner-case bug fixes. There's probably still some bugs left, but this is a lot closer to being real than it was before.	2012-08-26 22:50:23 -04:00
Tom Lane	084a29c94f	Another round of planner fixes for LATERAL. Formerly, subquery pullup had no need to examine other entries in the range table, since they could not contain any references to the subquery being pulled up. That's no longer true with LATERAL, so now we need to be able to visit rangetable subexpressions to replace Vars referencing the pulled-up subquery. Also, this means that extract_lateral_references must be unsurprised at encountering lateral PlaceHolderVars, since such might be created when pulling up a subquery that's underneath an outer join with respect to the lateral reference.	2012-08-18 14:10:17 -04:00
Tom Lane	c1774d2c81	More fixes for planner's handling of LATERAL. Re-allow subquery pullup for LATERAL subqueries, except when the subquery is below an outer join and contains lateral references to relations outside that outer join. If we pull up in such a case, we risk introducing lateral cross-references into outer joins' ON quals, which is something the code is entirely unprepared to cope with right now; and I'm not sure it'll ever be worth coping with. Support lateral refs in VALUES (this seems to be the only additional path type that needs such support as a consequence of re-allowing subquery pullup). Put in a slightly hacky fix for joinpath.c's refusal to consider parameterized join paths even when there cannot be any unparameterized ones. This was causing "could not devise a query plan for the given query" failures in queries involving more than two FROM items. Put in an even more hacky fix for distribute_qual_to_rels() being unhappy with join quals that contain references to rels outside their syntactic scope; which is to say, disable that test altogether. Need to think about how to preserve some sort of debugging cross-check here, while not expending more cycles than befits a debugging cross-check.	2012-08-12 16:01:26 -04:00
Tom Lane	e76af54137	Fix some issues with LATERAL(SELECT UNION ALL SELECT). The LATERAL marking has to be propagated down to the UNION leaf queries when we pull them up. Also, fix the formerly stubbed-off set_append_rel_pathlist(). It does already have enough smarts to cope with making a parameterized Append path at need; it just has to not assume that there must be an unparameterized path.	2012-08-11 18:42:56 -04:00
Tom Lane	5ebaaa4944	Implement SQL-standard LATERAL subqueries. This patch implements the standard syntax of LATERAL attached to a sub-SELECT in FROM, and also allows LATERAL attached to a function in FROM, since set-returning function calls are expected to be one of the principal use-cases. The main change here is a rewrite of the mechanism for keeping track of which relations are visible for column references while the FROM clause is being scanned. The parser "namespace" lists are no longer lists of bare RTEs, but are lists of ParseNamespaceItem structs, which carry an RTE pointer as well as some visibility-controlling flags. Aside from supporting LATERAL correctly, this lets us get rid of the ancient hacks that required rechecking subqueries and JOIN/ON and function-in-FROM expressions for invalid references after they were initially parsed. Invalid column references are now always correctly detected on sight. In passing, remove assorted parser error checks that are now dead code by virtue of our having gotten rid of add_missing_from, as well as some comments that are obsolete for the same reason. (It was mainly add_missing_from that caused so much fudging here in the first place.) The planner support for this feature is very minimal, and will be improved in future patches. It works well enough for testing purposes, though. catversion bump forced due to new field in RangeTblEntry.	2012-08-07 19:02:54 -04:00
Tom Lane	5b7b5518d0	Revise parameterized-path mechanism to fix assorted issues. This patch adjusts the treatment of parameterized paths so that all paths with the same parameterization (same set of required outer rels) for the same relation will have the same rowcount estimate. We cache the rowcount estimates to ensure that property, and hopefully save a few cycles too. Doing this makes it practical for add_path_precheck to operate without a rowcount estimate: it need only assume that paths with different parameterizations never dominate each other, which is close enough to true anyway for coarse filtering, because normally a more-parameterized path should yield fewer rows thanks to having more join clauses to apply. In add_path, we do the full nine yards of comparing rowcount estimates along with everything else, so that we can discard parameterized paths that don't actually have an advantage. This fixes some issues I'd found with add_path rejecting parameterized paths on the grounds that they were more expensive than not-parameterized ones, even though they yielded many fewer rows and hence would be cheaper once subsequent joining was considered. To make the same-rowcounts assumption valid, we have to require that any parameterized path enforce all join clauses that could be obtained from the particular set of outer rels, even if not all of them are useful for indexing. This is required at both base scans and joins. It's a good thing anyway since the net impact is that join quals are checked at the lowest practical level in the join tree. Hence, discard the original rather ad-hoc mechanism for choosing parameterization joinquals, and build a better one that has a more principled rule for when clauses can be moved. The original rule was actually buggy anyway for lack of knowledge about which relations are part of an outer join's outer side; getting this right requires adding an outer_relids field to RestrictInfo.	2012-04-19 15:53:47 -04:00
Tom Lane	e3ffd05b02	Weaken the planner's tests for relevant joinclauses. We should be willing to cross-join two small relations if that allows us to use an inner indexscan on a large relation (that is, the potential indexqual for the large table requires both smaller relations). This worked in simple cases but fell apart as soon as there was a join clause to a fourth relation, because the existence of any two-relation join clause caused the planner to not consider clauseless joins between other base relations. The added regression test shows an example case adapted from a recent complaint from Benoit Delbosc. Adjust have_relevant_joinclause, have_relevant_eclass_joinclause, and has_relevant_eclass_joinclause to consider that a join clause mentioning three or more relations is sufficient grounds for joining any subset of those relations, even if we have to do so via a cartesian join. Since such clauses are relatively uncommon, this shouldn't affect planning speed on typical queries; in fact it should help a bit, because the latter two functions in particular get significantly simpler. Although this is arguably a bug fix, I'm not going to risk back-patching it, since it might have currently-unforeseen consequences.	2012-04-13 16:07:17 -04:00
Tom Lane	8279eb4191	Fix planner's handling of outer PlaceHolderVars within subqueries. For some reason, in the original coding of the PlaceHolderVar mechanism I had supposed that PlaceHolderVars couldn't propagate into subqueries. That is of course entirely possible. When it happens, we need to treat an outer-level PlaceHolderVar much like an outer Var or Aggref, that is SS_replace_correlation_vars() needs to replace the PlaceHolderVar with a Param, and then when building the finished SubPlan we have to provide the PlaceHolderVar expression as an actual parameter for the SubPlan. The handling of the contained expression is a bit delicate but it can be treated exactly like an Aggref's expression. In addition to the missing logic in subselect.c, prepjointree.c was failing to search subqueries for PlaceHolderVars that need their relids adjusted during subquery pullup. It looks like everyplace else that touches PlaceHolderVars got it right, though. Per report from Mark Murawski. In 9.1 and HEAD, queries affected by this oversight would fail with "ERROR: Upper-level PlaceHolderVar found where not expected". But in 9.0 and 8.4, you'd silently get possibly-wrong answers, since the value transmitted into the subquery wouldn't go to null when it should.	2012-03-24 16:21:39 -04:00
Tom Lane	7e3bf99baa	Fix handling of PlaceHolderVars in nestloop parameter management. If we use a PlaceHolderVar from the outer relation in an inner indexscan, we need to reference the PlaceHolderVar as such as the value to be passed in from the outer relation. The previous code effectively tried to reconstruct the PHV from its component expression, which doesn't work since (a) the Vars therein aren't necessarily bubbled up far enough, and (b) it would be the wrong semantics anyway because of the possibility that the PHV is supposed to have gone to null at some point before the current join. Point (a) led to "variable not found in subplan target list" planner errors, but point (b) would have led to silently wrong answers. Per report from Roger Niederland.	2011-11-03 00:50:58 -04:00
Tom Lane	77ba232564	Fix nested PlaceHolderVar expressions that appear only in targetlists. A PlaceHolderVar's expression might contain another, lower-level PlaceHolderVar. If the outer PlaceHolderVar is used, the inner one certainly will be also, and so we have to make sure that both of them get into the placeholder_list with correct ph_may_need values during the initial pre-scan of the query (before deconstruct_jointree starts). We did this correctly for PlaceHolderVars appearing in the query quals, but overlooked the issue for those appearing in the top-level targetlist; with the result that nested placeholders referenced only in the targetlist did not work correctly, as illustrated in bug #6154. While at it, add some error checking to find_placeholder_info to ensure that we don't try to create new placeholders after it's too late to do so; they have to all be created before deconstruct_jointree starts. Back-patch to 8.4 where the PlaceHolderVar mechanism was introduced.	2011-08-09 00:50:07 -04:00
Tom Lane	eb22950510	Fix PlaceHolderVar mechanism's interaction with outer joins. The point of a PlaceHolderVar is to allow a non-strict expression to be evaluated below an outer join, after which its value bubbles up like a Var and can be forced to NULL when the outer join's semantics require that. However, there was a serious design oversight in that, namely that we didn't ensure that there was actually a correct place in the plan tree to evaluate the placeholder :-(. It may be necessary to delay evaluation of an outer join to ensure that a placeholder that should be evaluated below the join can be evaluated there. Per recent bug report from Kirill Simonov. Back-patch to 8.4 where the PlaceHolderVar mechanism was introduced.	2010-09-28 14:19:00 -04:00
Tom Lane	c8c03d72e1	Fix another join removal bug: the check on PlaceHolderVars was wrong. The previous coding would decide that join removal was unsafe upon finding a PlaceHolderVar that needed to be evaluated at the inner rel and then used above the join. However, this fails to cover the case of PlaceHolderVars that refer to both the inner rel and some other rels. Per bug report from Andrus.	2010-09-25 19:03:50 -04:00
Tom Lane	4e97631e6a	Fix join-removal logic for pseudoconstant and outerjoin-delayed quals. In these cases a qual can get marked with the removable rel in its required_relids, but this is just to schedule its evaluation correctly, not because it really depends on the rel. We were assuming that, in effect, we could throw away all quals so marked, which is nonsense. Tighten up the logic to be a little more paranoid about which quals belong to the outer join being considered for removal, and arrange for all quals that don't belong to be updated so they will still get evaluated correctly. Also fix another problem that happened to be exposed by this test case, which was that make_join_rel() was failing to notice some cases where a constant-false qual could be used to prove a join relation empty. If it's a pushed-down constant false, then the relation is empty even if it's an outer join, because the qual applies after the outer join expansion. Per report from Nathan Grange. Back-patch into 9.0.	2010-09-14 23:15:29 +00:00
Tom Lane	7df4cf7fd3	Fix oversight in join removal patch: we have to delete the removed relation from SpecialJoinInfo relid sets as well. Per example from Vaclav Novotny.	2010-05-23 16:34:38 +00:00
Tom Lane	b78f6264eb	Rework join-removal logic as per recent discussion. In particular this fixes things so that it works for cases where nested removals are possible. The overhead of the optimization should be significantly less, as well.	2010-03-28 22:59:34 +00:00
Tom Lane	8d3c4aa614	Fix an oversight in join-removal optimization: we have to check not only for plain Vars that are generated in the inner rel and used above the join, but also for PlaceHolderVars. Per report from Oleg K.	2010-03-22 13:57:16 +00:00
Tom Lane	90f4c2d960	Add support for doing FULL JOIN ON FALSE. While this is really a rather peculiar variant of UNION ALL, and so wouldn't likely get written directly as-is, it's possible for it to arise as a result of simplification of less-obviously-silly queries. In particular, now that we can do flattening of subqueries that have constant outputs and are underneath an outer join, it's possible for the case to result from simplification of queries of the type exhibited in bug #5263. Back-patch to 8.4 to avoid a functionality regression for this type of query.	2010-01-05 23:25:36 +00:00
Tom Lane	57c9dff9d1	Fix subquery pullup to wrap a PlaceHolderVar around the entire RowExpr that's generated for a whole-row Var referencing the subquery, when the subquery is in the nullable side of an outer join. The previous coding instead put PlaceHolderVars around the elements of the RowExpr. The effect was that when the outer join made the subquery outputs go to null, the whole-row Var produced ROW(NULL,NULL,...) rather than just NULL. There are arguments afoot about whether those things ought to be semantically indistinguishable, but for the moment they are not entirely so, and the planner needs to take care that its machinations preserve the difference. Per bug #5025. Making this feasible required refactoring ResolveNew() to allow more caller control over what is substituted for a Var. I chose to make ResolveNew() a wrapper around a new general-purpose function replace_rte_variables(). I also fixed the ancient bogosity that ResolveNew might fail to set a query's hasSubLinks field after inserting a SubLink in it. Although all current callers make sure that happens anyway, we've had bugs of that sort before, and it seemed like a good time to install a proper solution. Back-patch to 8.4. The problem can be demonstrated clear back to 8.0, but the fix would be too invasive in earlier branches; not to mention that people may be depending on the subtly-incorrect behavior. The 8.4 series is new enough that fixing this probably won't cause complaints, but it might in older branches. Also, 8.4 shows the incorrect behavior in more cases than older branches do, because it is able to flatten subqueries in more cases.	2009-09-02 17:52:24 +00:00
Tom Lane	501255114d	Add a simple test case covering a join against an inheritance tree, since we're evidently not testing that at all right now :-(	2009-08-13 17:14:38 +00:00
Tom Lane	a43b190e3c	Fix a thinko in join_is_legal: when we decide we can implement a semijoin by unique-ifying the RHS and then inner-joining to some other relation, that is not grounds for violating the RHS of some other outer join. Noticed while regression-testing new GEQO code, which will blindly follow any path that join_is_legal says is legal, and then complain later if that leads to a dead end. I'm not certain that this can result in any visible failure in 8.4: the mistake may always be masked by the fact that subsequent attempts to join the rest of the RHS of the other join will fail. But I'm not certain it can't, either, and it's definitely not operating as intended. So back-patch. The added regression test depends on the new no-failures-allowed logic that I'm about to commit in GEQO, so no point back-patching that.	2009-07-19 20:32:48 +00:00
Tom Lane	fb18055998	Repair bug #4926 "too few pathkeys for mergeclauses". This example shows that the sanity checking I added to create_mergejoin_plan() in 8.3 was a few bricks shy of a load: the mergeclauses could reference pathkeys in a noncanonical order such as x,y,x, not only cases like x,x,y which is all that the code had allowed for. The odd cases only turn up when using redundant clauses in an outer join condition, which is why no one had noticed before.	2009-07-17 23:19:34 +00:00
Tom Lane	e549722a8b	Get rid of the rather fuzzily defined FlattenedSubLink node type in favor of making pull_up_sublinks() construct a full-blown JoinExpr tree representation of IN/EXISTS SubLinks that it is able to convert to semi or anti joins. This makes pull_up_sublinks() a shade more complex, but the gain in semantic clarity is worth it. I still have more to do in this area to address the previously-discussed problems, but this commit in itself fixes at least one bug in HEAD, as shown by added regression test case.	2009-02-25 03:30:38 +00:00
Peter Eisentraut	8987c115f1	Alter regression test cases that rely on the sort order of "aa". Some locales (da_DK, fo_FO, kl_GL, nb_NO, nn_NO in glibc) sort "aa" after "z".	2009-01-19 13:38:47 +00:00
Tom Lane	dcc2334736	Consider a clause to be outerjoin_delayed if it references the nullable side of any lower outer join, even if it also references the non-nullable side and so could not get pushed below the outer join anyway. We need this in case the clause is an OR clause: if it doesn't get marked outerjoin_delayed, create_or_index_quals() could pull an indexable restriction for the nullable side out of it, leading to wrong results as demonstrated by today's bug report from toruvinn. (See added regression test case for an example.) In principle this has been wrong for quite a while. In practice I don't think any branch before 8.3 can really show the failure, because create_or_index_quals() will only pull out indexable conditions, and before 8.3 those were always strict. So though we might have improperly generated null-extended rows in the outer join, they'd get discarded from the result anyway. The gating factor that makes the failure visible is that 8.3 considers "col IS NULL" to be indexable. Hence I'm not going to risk back-patching further than 8.3.	2008-06-27 20:54:37 +00:00
Tom Lane	6a6522529f	Fix some planner issues found while investigating Kevin Grittner's report of poorer planning in 8.3 than 8.2: 1. After pushing a constant across an outer join --- ie, given "a LEFT JOIN b ON (a.x = b.y) WHERE a.x = 42", we can deduce that b.y is sort of equal to 42, in the sense that we needn't fetch any b rows where it isn't 42 --- loop to see if any additional deductions can be made. Previous releases did that by recursing, but I had mistakenly thought that this was no longer necessary given the EquivalenceClass machinery. 2. Allow pushing constants across outer join conditions even if the condition is outerjoin_delayed due to a lower outer join. This is safe as long as the condition is strict and we re-test it at the upper join. 3. Keep the outer-join clause even if we successfully push a constant across it. This is necessary in the outerjoin_delayed case, but even in the simple case, it seems better to do this to ensure that the join search order heuristics will consider the join as reasonable to make. Mark such a clause as having selectivity 1.0, though, since it's not going to eliminate very many rows after application of the constant condition. 4. Tweak have_relevant_eclass_joinclause to report that two relations are joinable when they have vars that are equated to the same constant. We won't actually generate any joinclause from such an EquivalenceClass, but again it seems that in such a case it's a good idea to consider the join as worth costing out. 5. Fix a bug in select_mergejoin_clauses that was exposed by these changes: we have to reject candidate mergejoin clauses if either side was equated to a constant, because we can't construct a canonical pathkey list for such a clause. This is an implementation restriction that might be worth fixing someday, but it doesn't seem critical to get it done for 8.3.	2008-01-09 20:42:29 +00:00
Tom Lane	b4c806faa8	Rewrite make_outerjoininfo's construction of min_lefthand and min_righthand sets for outer joins, in the light of bug #3588 and additional thought and experimentation. The original methodology was fatally flawed for nests of more than two outer joins: it got the relationships between adjacent joins right, but didn't always come to the right conclusions about whether a join could be interchanged with one two or more levels below it. This was largely caused by a mistaken idea that we should use the min_lefthand + min_righthand sets of a sub-join as the minimum left or right input set of an upper join when we conclude that the sub-join can't commute with the upper one. If there's a still-lower join that the sub-join can commute with, this method led us to think that that one could commute with the topmost join; which it can't. Another problem (not directly connected to bug #3588) was that make_outerjoininfo's processing-order-dependent method for enforcing outer join identity #3 didn't work right: if we decided that join A could safely commute with lower join B, we dropped all information about sub-joins under B that join A could perhaps not safely commute with, because we removed B's entire min_righthand from A's. To fix, make an explicit computation of all inner join combinations that occur below an outer join, and add to that the full syntactic relsets of any lower outer joins that we determine it can't commute with. This method gives much more direct enforcement of the outer join rearrangement identities, and it turns out not to cost a lot of additional bookkeeping. Thanks to Richard Harris for the bug report and test case.	2007-08-31 01:44:06 +00:00
Tom Lane	ed5d55dafe	Fix a bug in the original implementation of redundant-join-clause removal: clauses in which one side or the other references both sides of the join cannot be removed as redundant, because that expression won't have been constrained below the join. Per report from Sergey Burladyan. CVS HEAD does not contain this bug due to EquivalenceClass rewrite, but it seems wise to include the regression test for it anyway.	2007-07-31 19:53:37 +00:00
Tom Lane	11086f2f2b	Repair planner bug introduced in 8.2 by ability to rearrange outer joins: in cases where a sub-SELECT inserts a WHERE clause between two outer joins, that clause may prevent us from re-ordering the two outer joins. The code was considering only the joins' own ON-conditions in determining reordering safety, which is not good enough. Add a "delay_upper_joins" flag to OuterJoinInfo to flag that we have detected such a clause and higher-level outer joins shouldn't be permitted to commute with this one. (This might seem overly coarse, but given the current rules for OJ reordering, it's sufficient AFAICT.) The failure case is actually pretty narrow: it needs a WHERE clause within the RHS of a left join that checks the RHS of a lower left join, but is not strict for that RHS (else we'd have simplified the lower join to a plain join). Even then no failure will be manifest unless the planner chooses to rearrange the join order. Per bug report from Adam Terrey.	2007-05-22 23:23:58 +00:00

1 2 3 4

170 Commits