Commit Graph

98 Commits

Author SHA1 Message Date
Tom Lane d7153c5fad Fix best_inner_indexscan to return both the cheapest-total-cost and
cheapest-startup-cost innerjoin indexscans, and make joinpath.c consider
both of these (when different) as the inside of a nestloop join.  The
original design was based on the assumption that indexscan paths always
have negligible startup cost, and so total cost is the only important
figure of merit; an assumption that's obviously broken by bitmap
indexscans.  This oversight could lead to choosing poor plans in cases
where fast-start behavior is more important than total cost, such as
LIMIT and IN queries.  8.1-vintage brain fade exposed by an example from
Chuck D.
2007-05-22 01:40:33 +00:00
Tom Lane fa92d21a48 Avoid running build_index_pathkeys() in situations where there cannot
possibly be any useful pathkeys --- to wit, queries with neither any
join clauses nor any ORDER BY request.  It's nearly free to check for
this case and it saves a useful fraction of the planning time for simple
queries.
2007-04-15 20:09:28 +00:00
Tom Lane 6bef118b01 Restructure code that is responsible for ensuring that clauseless joins are
considered when it is necessary to do so because of a join-order restriction
(that is, an outer-join or IN-subselect construct).  The former coding was a
bit ad-hoc and inconsistent, and it missed some cases, as exposed by Mario
Weilguni's recent bug report.  His specific problem was that an IN could be
turned into a "clauseless" join due to constant-propagation removing the IN's
joinclause, and if the IN's subselect involved more than one relation and
there was more than one such IN linking to the same upper relation, then the
only valid join orders involve "bushy" plans but we would fail to consider the
specific paths needed to get there.  (See the example case added to the join
regression test.)  On examining the code I wonder if there weren't some other
problem cases too; in particular it seems that GEQO was defending against a
different set of corner cases than the main planner was.  There was also an
efficiency problem, in that when we did realize we needed a clauseless join
because of an IN, we'd consider clauseless joins against every other relation
whether this was sensible or not.  It seems a better design is to use the
outer-join and in-clause lists as a backup heuristic, just as the rule of
joining only where there are joinclauses is a heuristic: we'll join two
relations if they have a usable joinclause *or* this might be necessary to
satisfy an outer-join or IN-clause join order restriction.  I refactored the
code to have just one place considering this instead of three, and made sure
that it covered all the cases that any of them had been considering.

Backpatch as far as 8.1 (which has only the IN-clause form of the disease).
By rights 8.0 and 7.4 should have the bug too, but they accidentally fail
to fail, because the joininfo structure used in those releases preserves some
memory of there having once been a joinclause between the inner and outer
sides of an IN, and so it leads the code in the right direction anyway.
I'll be conservative and not touch them.
2007-02-16 00:14:01 +00:00
Tom Lane f41803bb39 Refactor planner's pathkeys data structure to create a separate, explicit
representation of equivalence classes of variables.  This is an extensive
rewrite, but it brings a number of benefits:
* planner no longer fails in the presence of "incomplete" operator families
that don't offer operators for every possible combination of datatypes.
* avoid generating and then discarding redundant equality clauses.
* remove bogus assumption that derived equalities always use operators
named "=".
* mergejoins can work with a variety of sort orders (e.g., descending) now,
instead of tying each mergejoinable operator to exactly one sort order.
* better recognition of redundant sort columns.
* can make use of equalities appearing underneath an outer join.
2007-01-20 20:45:41 +00:00
Bruce Momjian 29dccf5fe0 Update CVS HEAD for 2007 copyright. Back branches are typically not
back-stamped for this.
2007-01-05 22:20:05 +00:00
Tom Lane 8a30cc2127 Make the planner estimate costs for nestloop inner indexscans on the basis
that the Mackert-Lohmann formula applies across all the repetitions of the
nestloop, not just each scan independently.  We use the M-L formula to
estimate the number of pages fetched from the index as well as from the table;
that isn't what it was designed for, but it seems reasonably applicable
anyway.  This makes large numbers of repetitions look much cheaper than
before, which accords with many reports we've received of overestimation
of the cost of a nestloop.  Also, change the index access cost model to
charge random_page_cost per index leaf page touched, while explicitly
not counting anything for access to metapage or upper tree pages.  This
may all need tweaking after we get some field experience, but in simple
tests it seems to be giving saner results than before.  The main thing
is to get the infrastructure in place to let cost_index() and amcostestimate
functions take repeated scans into account at all.  Per my recent proposal.

Note: this patch changes pg_proc.h, but I did not force initdb because
the changes are basically cosmetic --- the system does not look into
pg_proc to decide how to call an index amcostestimate function, and
there's no way to call such a function from SQL at all.
2006-06-06 17:59:58 +00:00
Bruce Momjian f2f5b05655 Update copyright for 2006. Update scripts. 2006-03-05 15:59:11 +00:00
Tom Lane a1b7e70c5f Fix code that checks to see if an index can be considered to match the query's
requested sort order.  It was assuming that build_index_pathkeys always
generates a pathkey per index column, which was not true if implied equality
deduction had determined that two index columns were effectively equated to
each other.  Simplest fix seems to be to install an option that causes
build_index_pathkeys to support this behavior as well as the original one.
Per report from Brian Hirt.
2006-01-29 17:27:42 +00:00
Tom Lane e3b9852728 Teach planner how to rearrange join order for some classes of OUTER JOIN.
Per my recent proposal.  I ended up basing the implementation on the
existing mechanism for enforcing valid join orders of IN joins --- the
rules for valid outer-join orders are somewhat similar.
2005-12-20 02:30:36 +00:00
Tom Lane 290166f934 Teach planner and executor to handle ScalarArrayOpExpr as an indexable
qualification when the underlying operator is indexable and useOr is true.
That is, indexkey op ANY (ARRAY[...]) is effectively translated into an
OR combination of one indexscan for each array element.  This only works
for bitmap index scans, of course, since regular indexscans no longer
support OR'ing of scans.  There are still some loose ends to clean up
before changing 'x IN (list)' to translate as a ScalarArrayOpExpr;
for instance predtest.c ought to be taught about it.  But this gets the
basic functionality in place.
2005-11-25 19:47:50 +00:00
Bruce Momjian 1dc3498251 Standard pgindent run for 8.1. 2005-10-15 02:49:52 +00:00
Tom Lane 4e5fbb34b3 Change the division of labor between grouping_planner and query_planner
so that the latter estimates the number of groups that grouping will
produce.  This is needed because it is primarily query_planner that
makes the decision between fast-start and fast-finish plans, and in the
original coding it was unable to make more than a crude rule-of-thumb
choice when the query involved grouping.  This revision helps us make
saner choices for queries like SELECT ... GROUP BY ... LIMIT, as in a
recent example from Mark Kirkwood.  Also move the responsibility for
canonicalizing sort_pathkeys and group_pathkeys into query_planner;
this information has to be available anyway to support the first change,
and doing it this way lets us get rid of compare_noncanonical_pathkeys
entirely.
2005-08-27 22:13:44 +00:00
Tom Lane a4ca842319 Fix a bunch of bad interactions between partial indexes and the new
planning logic for bitmap indexscans.  Partial indexes create corner
cases in which a scan might be done with no explicit index qual conditions,
and the code wasn't handling those cases nicely.  Also be a little
tenser about eliminating redundant clauses in the generated plan.
Per report from Dmitry Karasik.
2005-07-28 20:26:22 +00:00
Tom Lane 2f1210629c Separate predicate-testing code out of indxpath.c, making it a module
in its own right.  As proposed by Simon Riggs, but with some editorializing
of my own.
2005-06-10 22:25:37 +00:00
Tom Lane 9ab4d98168 Remove planner's private fields from Query struct, and put them into
a new PlannerInfo struct, which is passed around instead of the bare
Query in all the planning code.  This commit is essentially just a
code-beautification exercise, but it does open the door to making
larger changes to the planner data structures without having to muck
with the widely-known Query struct.
2005-06-05 22:32:58 +00:00
Tom Lane 5b05185262 Remove support for OR'd indexscans internal to a single IndexScan plan
node, as this behavior is now better done as a bitmap OR indexscan.
This allows considerable simplification in nodeIndexscan.c itself as
well as several planner modules concerned with indexscan plan generation.
Also we can improve the sharing of code between regular and bitmap
indexscans, since they are now working with nigh-identical Plan nodes.
2005-04-25 01:30:14 +00:00
Tom Lane bc843d3960 First cut at planner support for bitmap index scans. Lots to do yet,
but the code is basically working.  Along the way, rewrite the entire
approach to processing OR index conditions, and make it work in join
cases for the first time ever.  orindxpath.c is now basically obsolete,
but I left it in for the time being to allow easy comparison testing
against the old implementation.
2005-04-22 21:58:32 +00:00
Tom Lane addc42c339 Create the planner mechanism for optimizing simple MIN and MAX queries
into indexscans on matching indexes.  For the moment, it only handles
int4 and text datatypes; next step is to add a column to pg_aggregate
so that all MIN/MAX aggregates can be handled.  Per my recent proposal.
2005-04-11 23:06:57 +00:00
Tom Lane 926e8a00d3 Add a back-link from IndexOptInfo structs to their parent RelOptInfo
structs.  There are many places in the planner where we were passing
both a rel and an index to subroutines, and now need only pass the
index struct.  Notationally simpler, and perhaps a tad faster.
2005-03-27 06:29:49 +00:00
Tom Lane febc9a613c Expand the 'special index operator' machinery to handle special cases
for boolean indexes.  Previously we would only use such an index with
WHERE clauses like 'indexkey = true' or 'indexkey = false'.  The new
code transforms the cases 'indexkey', 'NOT indexkey', 'indexkey IS TRUE',
and 'indexkey IS FALSE' into one of these.  While this is only marginally
useful in itself, I intend soon to change constant-expression simplification
so that 'foo = true' and 'foo = false' are reduced to just 'foo' and
'NOT foo' ... which would lose the ability to use boolean indexes for
such queries at all, if the indexscan machinery couldn't make the
reverse transformation.
2005-03-26 23:29:20 +00:00
Tom Lane 94e4778a31 The result of a FULL or RIGHT join can't be assumed to be sorted by the
left input's sorting, because null rows may be inserted at various points.
Per report from Ferenc Lutischá¸n.
2005-01-23 02:21:36 +00:00
PostgreSQL Daemon 2ff501590b Tag appropriate files for rc3
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
2004-12-31 22:04:05 +00:00
Tom Lane 26112850ec Fix OR-index-scan planner to recognize that a partial index is usable
for scanning one term of an OR clause if the index's predicate is implied
by that same OR clause term (possibly in conjunction with top-level WHERE
clauses).  Per recent example from Dawid Kuroczko,
http://archives.postgresql.org/pgsql-performance/2004-10/msg00095.php
Also, fix a very long-standing bug in index predicate testing, namely the
bizarre ordering of decomposition of predicate and restriction clauses.
AFAICS the correct way is to break down the predicate all the way, and
then for each component term see if you can prove it from the entire
restriction set.  The original coding had a purely-implementation-artifact
distinction between ANDing at the top level and ANDing below that, and
proceeded to get the decomposition order wrong everywhere below the top
level, with the result that even slightly complicated AND/OR predicates
could not be proven.  For instance, given
create index foop on foo(f2) where f1=42 or f1=1
    or (f1 = 11 and f2 = 55);
the old code would fail to match this index to the query
select * from foo where f1 = 11 and f2 = 55;
when it obviously ought to match.
2004-10-11 22:57:00 +00:00
Bruce Momjian b6b71b85bc Pgindent run for 8.0. 2004-08-29 05:07:03 +00:00
Bruce Momjian da9a8649d8 Update copyright to 2004. 2004-08-29 04:13:13 +00:00
Tom Lane fa559a86ee Adjust indexscan planning logic to keep RestrictInfo nodes associated
with index qual clauses in the Path representation.  This saves a little
work during createplan and (probably more importantly) allows reuse of
cached selectivity estimates during indexscan planning.  Also fix latent
bug: wrong plan would have been generated for a 'special operator' used
in a nestloop-inner-indexscan join qual, because the special operator
would not have gotten into the list of quals to recheck.  This bug is
only latent because at present the special-operator code could never
trigger on a join qual, but sooner or later someone will want to do it.
2004-01-05 23:39:54 +00:00
Tom Lane 9091e8d1b2 Add the ability to extract OR indexscan conditions from OR-of-AND
join conditions in which each OR subclause includes a constraint on
the same relation.  This implements the other useful side-effect of
conversion to CNF format, without its unpleasant side-effects.  As
per pghackers discussion of a few weeks ago.
2004-01-05 05:07:36 +00:00
Tom Lane 6cb1c0238b Rewrite OR indexscan processing to be more flexible. We can now for the
first time generate an OR indexscan for a two-column index when the WHERE
condition is like 'col1 = foo AND (col2 = bar OR col2 = baz)' --- before,
the OR had to be on the first column of the index or we'd not notice the
possibility of using it.  Some progress towards extracting OR indexscans
from subclauses of an OR that references multiple relations, too, although
this code is #ifdef'd out because it needs more work.
2004-01-04 00:07:32 +00:00
PostgreSQL Daemon 55b113257c make sure the $Id tags are converted to $PostgreSQL as well ... 2003-11-29 22:41:33 +00:00
Bruce Momjian f3c3deb7d0 Update copyrights to 2003. 2003-08-04 02:40:20 +00:00
Bruce Momjian 089003fb46 pgindent run. 2003-08-04 00:43:34 +00:00
Tom Lane f45df8c014 Cause CHAR(n) to TEXT or VARCHAR conversion to automatically strip trailing
blanks, in hopes of reducing the surprise factor for newbies.  Remove
redundant operators for VARCHAR (it depends wholly on TEXT operations now).
Clean up resolution of ambiguous operators/functions to avoid surprising
choices for domains: domains are treated as equivalent to their base types
and binary-coercibility is no longer considered a preference item when
choosing among multiple operators/functions.  IsBinaryCoercible now correctly
reflects the notion that you need *only* relabel the type to get from type
A to type B: that is, a domain is binary-coercible to its base type, but
not vice versa.  Various marginal cleanup, including merging the essentially
duplicate resolution code in parse_func.c and parse_oper.c.  Improve opr_sanity
regression test to understand about binary compatibility (using pg_cast),
and fix a couple of small errors in the catalogs revealed thereby.
Restructure "special operator" handling to fetch operators via index opclasses
rather than hardwiring assumptions about names (cleans up the pattern_ops
stuff a little).
2003-05-26 00:11:29 +00:00
Tom Lane 056467ec6b Teach planner how to propagate pathkeys from sub-SELECTs in FROM up to
the outer query.  (The implementation is a bit klugy, but it would take
nontrivial restructuring to make it nicer, which this is probably not
worth.)  This avoids unnecessary sort steps in examples like
SELECT foo,count(*) FROM (SELECT ... ORDER BY foo,bar) sub GROUP BY foo
which means there is now a reasonable technique for controlling the
order of inputs to custom aggregates, even in the grouping case.
2003-02-15 20:12:41 +00:00
Tom Lane 9f5f212475 Allow the planner to collapse explicit inner JOINs together, rather than
necessarily following the JOIN syntax to develop the query plan.  The old
behavior is still available by setting GUC variable JOIN_COLLAPSE_LIMIT
to 1.  Also create a GUC variable FROM_COLLAPSE_LIMIT to control the
similar decision about when to collapse sub-SELECT lists into their parent
lists.  (This behavior existed already, but the limit was always
GEQO_THRESHOLD/2; now it's separately adjustable.)
2003-01-25 23:10:30 +00:00
Tom Lane f5e83662d0 Modify planner's implied-equality-deduction code so that when a set
of known-equal expressions includes any constant expressions (including
Params from outer queries), we actively suppress any 'var = var'
clauses that are or could be deduced from the set, generating only the
deducible 'var = const' clauses instead.  The idea here is to push down
the restrictions implied by the equality set to base relations whenever
possible.  Once we have applied the 'var = const' clauses, the 'var = var'
clauses are redundant, and should be suppressed both to save work at
execution and to avoid double-counting restrictivity.
2003-01-24 03:58:44 +00:00
Tom Lane 9f76d0d926 Fix GEQO to work again in CVS tip, by being more careful about memory
allocation in best_inner_indexscan().  While at it, simplify GEQO's
interface to the main planner --- make_join_rel() offers exactly the
API it really wants, whereas calling make_rels_by_clause_joins() and
make_rels_by_clauseless_joins() required jumping through hoops.
Rewrite gimme_tree for clarity (sometimes iteration is much better than
recursion), and approximately halve GEQO's runtime by recognizing that
tours of the forms (a,b,c,d,...) and (b,a,c,d,...) are equivalent
because of symmetry in make_join_rel().
2002-12-16 21:30:30 +00:00
Tom Lane a0bf885f9e Phase 2 of read-only-plans project: restructure expression-tree nodes
so that all executable expression nodes inherit from a common supertype
Expr.  This is somewhat of an exercise in code purity rather than any
real functional advance, but getting rid of the extra Oper or Func node
formerly used in each operator or function call should provide at least
a little space and speed improvement.
initdb forced by changes in stored-rules representation.
2002-12-12 15:49:42 +00:00
Tom Lane 04c8785c7b Restructure planning of nestloop inner indexscans so that the set of usable
joinclauses is determined accurately for each join.  Formerly, the code only
considered joinclauses that used all of the rels from the outer side of the
join; thus for example
	FROM (a CROSS JOIN b) JOIN c ON (c.f1 = a.x AND c.f2 = b.y)
could not exploit a two-column index on c(f1,f2), since neither of the
qual clauses would be in the joininfo list it looked in.  The new code does
this correctly, and also is able to eliminate redundant clauses, thus fixing
the problem noted 24-Oct-02 by Hans-Jürgen Schönig.
2002-11-24 21:52:15 +00:00
Bruce Momjian d84fe82230 Update copyright to 2002. 2002-06-20 20:29:54 +00:00
Bruce Momjian ea08e6cd55 New pgindent run with fixes suggested by Tom. Patch manually reviewed,
initdb/regression tests pass.
2001-11-05 17:46:40 +00:00
Bruce Momjian 6783b2372e Another pgindent run. Fixes enum indenting, and improves #endif
spacing.  Also adds space for one-line comments.
2001-10-28 06:26:15 +00:00
Bruce Momjian b81844b173 pgindent run on all C files. Java run to follow. initdb/regression
tests pass.
2001-10-25 05:50:21 +00:00
Tom Lane 6254465d06 Extend code that deduces implied equality clauses to detect whether a
clause being added to a particular restriction-clause list is redundant
with those already in the list.  This avoids useless work at runtime,
and (perhaps more importantly) keeps the selectivity estimation routines
from generating too-small estimates of numbers of output rows.
Also some minor improvements in OPTIMIZER_DEBUG displays.
2001-10-18 16:11:42 +00:00
Tom Lane f933766ba7 Restructure pg_opclass, pg_amop, and pg_amproc per previous discussions in
pgsql-hackers.  pg_opclass now has a row for each opclass supported by each
index AM, not a row for each opclass name.  This allows pg_opclass to show
directly whether an AM supports an opclass, and furthermore makes it possible
to store additional information about an opclass that might be AM-dependent.
pg_opclass and pg_amop now store "lossy" and "haskeytype" information that we
previously expected the user to remember to provide in CREATE INDEX commands.
Lossiness is no longer an index-level property, but is associated with the
use of a particular operator in a particular index opclass.

Along the way, IndexSupportInitialize now uses the syscaches to retrieve
pg_amop and pg_amproc entries.  I find this reduces backend launch time by
about ten percent, at the cost of a couple more special cases in catcache.c's
IndexScanOK.

Initial work by Oleg Bartunov and Teodor Sigaev, further hacking by Tom Lane.

initdb forced.
2001-08-21 16:36:06 +00:00
Tom Lane cdd230d628 Improve planning of OR indexscan plans: for quals like
WHERE (a = 1 or a = 2) and b = 42
and an index on (a,b), include the clause b = 42 in the indexquals
generated for each arm of the OR clause.  Essentially this is an index-
driven conversion from CNF to DNF.  Implementation is a bit klugy, but
better than not exploiting the extra quals at all ...
2001-06-05 17:13:52 +00:00
Tom Lane be03eb25f3 Modify optimizer data structures so that IndexOptInfo lists built for
create_index_paths are not immediately discarded, but are available for
subsequent planner work.  This allows avoiding redundant syscache lookups
in several places.  Change interface to operator selectivity estimation
procedures to allow faster and more flexible estimation.
Initdb forced due to change of pg_proc entries for selectivity functions!
2001-05-20 20:28:20 +00:00
Bruce Momjian 9e1552607a pgindent run. Make it all clean. 2001-03-22 04:01:46 +00:00
Bruce Momjian 623bf843d2 Change Copyright from PostgreSQL, Inc to PostgreSQL Global Development Group. 2001-01-24 19:43:33 +00:00
Bruce Momjian 7df3bb50f0 Add all possible config file options. 2001-01-24 18:37:31 +00:00
Tom Lane ea166f1146 Planner speedup hacking. Avoid saving useless pathkeys, so that path
comparison does not consider paths different when they differ only in
uninteresting aspects of sort order.  (We had a special case of this
consideration for indexscans already, but generalize it to apply to
ordered join paths too.)  Be stricter about what is a canonical pathkey
to allow faster pathkey comparison.  Cache canonical pathkeys and
dispersion stats for left and right sides of a RestrictInfo's clause,
to avoid repeated computation.  Total speedup will depend on number of
tables in a query, but I see about 4x speedup of planning phase for
a sample seven-table query.
2000-12-14 22:30:45 +00:00