Commit Graph

347 Commits

Author SHA1 Message Date
Tom Lane e540b97248 Fix an oversight in the 8.2 patch that improved mergejoin performance by
inserting a materialize node above an inner-side sort node, when the sort is
expected to spill to disk.  (The materialize protects the sort from having
to support mark/restore, allowing it to do its final merge pass on-the-fly.)
We neglected to teach cost_mergejoin about that hack, so it was failing to
include the materialize's costs in the estimated cost of the mergejoin.
The materialize's costs are generally going to be pretty negligible in
comparison to the sort's, so this is only a small error and probably not
worth back-patching; but it's still wrong.

In the similar case where a materialize is inserted to protect an inner-side
node that can't do mark/restore at all, it's still true that the materialize
should not spill to disk, and so we should cost it cheaply rather than
expensively.

Noted while thinking about a question from Tom Raney.
2008-09-05 21:07:29 +00:00
Tom Lane e006a24ad1 Implement SEMI and ANTI joins in the planner and executor. (Semijoins replace
the old JOIN_IN code, but antijoins are new functionality.)  Teach the planner
to convert appropriate EXISTS and NOT EXISTS subqueries into semi and anti
joins respectively.  Also, LEFT JOINs with suitable upper-level IS NULL
filters are recognized as being anti joins.  Unify the InClauseInfo and
OuterJoinInfo infrastructure into "SpecialJoinInfo".  With that change,
it becomes possible to associate a SpecialJoinInfo with every join attempt,
which permits some cleanup of join selectivity estimation.  That needs to be
taken much further than this patch does, but the next step is to change the
API for oprjoin selectivity functions, which seems like material for a
separate patch.  So for the moment the output size estimates for semi and
especially anti joins are quite bogus.
2008-08-14 18:48:00 +00:00
Tom Lane 2d1d96b1ce Teach the system how to use hashing for UNION. (INTERSECT/EXCEPT will follow,
but seem like a separate patch since most of the remaining work is on the
executor side.)  I took the opportunity to push selection of the grouping
operators for set operations into the parser where it belongs.  Otherwise this
is just a small exercise in making prepunion.c consider both alternatives.

As with the recent DISTINCT patch, this means we can UNION on datatypes that
can hash but not sort, and it means that UNION without ORDER BY is no longer
certain to produce sorted output.
2008-08-07 01:11:52 +00:00
Tom Lane 9511304752 Rearrange the querytree representation of ORDER BY/GROUP BY/DISTINCT items
as per my recent proposal:

1. Fold SortClause and GroupClause into a single node type SortGroupClause.
We were already relying on them to be struct-equivalent, so using two node
tags wasn't accomplishing much except to get in the way of comparing items
with equal().

2. Add an "eqop" field to SortGroupClause to carry the associated equality
operator.  This is cheap for the parser to get at the same time it's looking
up the sort operator, and storing it eliminates the need for repeated
not-so-cheap lookups during planning.  In future this will also let us
represent GROUP/DISTINCT operations on datatypes that have hash opclasses
but no btree opclasses (ie, they have equality but no natural sort order).
The previous representation simply didn't work for that, since its only
indicator of comparison semantics was a sort operator.

3. Add a hasDistinctOn boolean to struct Query to explicitly record whether
the distinctClause came from DISTINCT or DISTINCT ON.  This allows removing
some complicated and not 100% bulletproof code that attempted to figure
that out from the distinctClause alone.

This patch doesn't in itself create any new capability, but it's necessary
infrastructure for future attempts to use hash-based grouping for DISTINCT
and UNION/INTERSECT/EXCEPT.
2008-08-02 21:32:01 +00:00
Tom Lane ff673f558a Fix convert_IN_to_join to properly handle the case where the subselect's
output is not of the same type that's needed for the IN comparison (ie,
where the parser inserted an implicit coercion above the subselect result).
We should record the coerced expression, not just a raw Var referencing
the subselect output, as the quantity that needs to be unique-ified if
we choose to implement the IN as Unique followed by a plain join.

As of 8.3 this error was causing crashes, as seen in bug #4113 from Javier
Hernandez, because the executor was being told to hash or sort the raw
subselect output column using operators appropriate to the coerced type.

In prior versions there was no crash because the executor chose the
hash or sort operators for itself based on the column type it saw.
However, that's still not really right, because what's unique for one data
type might not be unique for another.  In corner cases we could get multiple
outputs of a row that should appear only once, as demonstrated by the
regression test case included in this commit.

However, this patch doesn't apply cleanly to 8.2 or before, and the code
involved has shifted enough over time that I'm hesitant to try to back-patch.
Given the lack of complaints from the field about such corner cases, I think
the bug may not be important enough to risk breaking other things with a
back-patch.
2008-04-21 20:54:15 +00:00
Bruce Momjian 9098ab9e32 Update copyrights in source tree to 2008. 2008-01-01 19:46:01 +00:00
Bruce Momjian fdf5a5efb7 pgindent run for 8.3. 2007-11-15 21:14:46 +00:00
Tom Lane d26559dbf3 Teach tuplesort.c about "top N" sorting, in which only the first N tuples
need be returned.  We keep a heap of the current best N tuples and sift-up
new tuples into it as we scan the input.  For M input tuples this means
only about M*log(N) comparisons instead of M*log(M), not to mention a lot
less workspace when N is small --- avoiding spill-to-disk for large M
is actually the most attractive thing about it.  Patch includes planner
and executor support for invoking this facility in ORDER BY ... LIMIT
queries.  Greg Stark, with some editorialization by moi.
2007-05-04 01:13:45 +00:00
Tom Lane afcf09dd90 Some further performance tweaks for planning large inheritance trees that
are mostly excluded by constraints: do the CE test a bit earlier to save
some adjust_appendrel_attrs() work on excluded children, and arrange to
use array indexing rather than rt_fetch() to fetch RTEs in the main body
of the planner.  The latter is something I'd wanted to do for awhile anyway,
but seeing list_nth_cell() as 35% of the runtime gets one's attention.
2007-04-21 21:01:45 +00:00
Tom Lane ab05eedecc Add support for cross-type hashing in hashed subplans (hashed IN/NOT IN cases
that aren't turned into true joins).  Since this is the last missing bit of
infrastructure, go ahead and fill out the hash integer_ops and float_ops
opfamilies with cross-type operators.  The operator family project is now
DONE ... er, except for documentation ...
2007-02-06 02:59:15 +00:00
Tom Lane f41803bb39 Refactor planner's pathkeys data structure to create a separate, explicit
representation of equivalence classes of variables.  This is an extensive
rewrite, but it brings a number of benefits:
* planner no longer fails in the presence of "incomplete" operator families
that don't offer operators for every possible combination of datatypes.
* avoid generating and then discarding redundant equality clauses.
* remove bogus assumption that derived equalities always use operators
named "=".
* mergejoins can work with a variety of sort orders (e.g., descending) now,
instead of tying each mergejoinable operator to exactly one sort order.
* better recognition of redundant sort columns.
* can make use of equalities appearing underneath an outer join.
2007-01-20 20:45:41 +00:00
Tom Lane a191a169d6 Change the planner-to-executor API so that the planner tells the executor
which comparison operators to use for plan nodes involving tuple comparison
(Agg, Group, Unique, SetOp).  Formerly the executor looked up the default
equality operator for the datatype, which was really pretty shaky, since it's
possible that the data being fed to the node is sorted according to some
nondefault operator class that could have an incompatible idea of equality.
The planner knows what it has sorted by and therefore can provide the right
equality operator to use.  Also, this change moves a couple of catalog lookups
out of the executor and into the planner, which should help startup time for
pre-planned queries by some small amount.  Modify the planner to remove some
other cavalier assumptions about always being able to use the default
operators.  Also add "nulls first/last" info to the Plan node for a mergejoin
--- neither the executor nor the planner can cope yet, but at least the API is
in place.
2007-01-10 18:06:05 +00:00
Bruce Momjian 29dccf5fe0 Update CVS HEAD for 2007 copyright. Back branches are typically not
back-stamped for this.
2007-01-05 22:20:05 +00:00
Tom Lane a78fcfb512 Restructure operator classes to allow improved handling of cross-data-type
cases.  Operator classes now exist within "operator families".  While most
families are equivalent to a single class, related classes can be grouped
into one family to represent the fact that they are semantically compatible.
Cross-type operators are now naturally adjunct parts of a family, without
having to wedge them into a particular opclass as we had done originally.

This commit restructures the catalogs and cleans up enough of the fallout so
that everything still works at least as well as before, but most of the work
needed to actually improve the planner's behavior will come later.  Also,
there are not yet CREATE/DROP/ALTER OPERATOR FAMILY commands; the only way
to create a new family right now is to allow CREATE OPERATOR CLASS to make
one by default.  I owe some more documentation work, too.  But that can all
be done in smaller pieces once this infrastructure is in place.
2006-12-23 00:43:13 +00:00
Bruce Momjian f99a569a2e pgindent run for 8.2. 2006-10-04 00:30:14 +00:00
Joe Conway 9caafda579 Add support for multi-row VALUES clauses as part of INSERT statements
(e.g. "INSERT ... VALUES (...), (...), ...") and elsewhere as allowed
by the spec. (e.g. similar to a FROM clause subselect). initdb required.
Joe Conway and Tom Lane.
2006-08-02 01:59:48 +00:00
Tom Lane 98359c3e3f In the recent changes to make the planner account better for cache
effects in a nestloop inner indexscan, I had only dealt with plain index
scans and the index portion of bitmap scans.  But there will be cache
benefits for the heap accesses of bitmap scans too, so fix
cost_bitmap_heap_scan() to account for that.
2006-07-22 15:41:56 +00:00
Bruce Momjian e0522505bd Remove 576 references of include files that were not needed. 2006-07-14 14:52:27 +00:00
Tom Lane cffd89ca73 Revise the planner's handling of "pseudoconstant" WHERE clauses, that is
clauses containing no variables and no volatile functions.  Such a clause
can be used as a one-time qual in a gating Result plan node, to suppress
plan execution entirely when it is false.  Even when the clause is true,
putting it in a gating node wins by avoiding repeated evaluation of the
clause.  In previous PG releases, query_planner() would do this for
pseudoconstant clauses appearing at the top level of the jointree, but
there was no ability to generate a gating Result deeper in the plan tree.
To fix it, get rid of the special case in query_planner(), and instead
process pseudoconstant clauses through the normal RestrictInfo qual
distribution mechanism.  When a pseudoconstant clause is found attached to
a path node in create_plan(), pull it out and generate a gating Result at
that point.  This requires special-casing pseudoconstants in selectivity
estimation and cost_qual_eval, but on the whole it's pretty clean.
It probably even makes the planner a bit faster than before for the normal
case of no pseudoconstants, since removing pull_constant_clauses saves one
useless traversal of the qual tree.  Per gripe from Phil Frost.
2006-07-01 18:38:33 +00:00
Tom Lane 8a30cc2127 Make the planner estimate costs for nestloop inner indexscans on the basis
that the Mackert-Lohmann formula applies across all the repetitions of the
nestloop, not just each scan independently.  We use the M-L formula to
estimate the number of pages fetched from the index as well as from the table;
that isn't what it was designed for, but it seems reasonably applicable
anyway.  This makes large numbers of repetitions look much cheaper than
before, which accords with many reports we've received of overestimation
of the cost of a nestloop.  Also, change the index access cost model to
charge random_page_cost per index leaf page touched, while explicitly
not counting anything for access to metapage or upper tree pages.  This
may all need tweaking after we get some field experience, but in simple
tests it seems to be giving saner results than before.  The main thing
is to get the infrastructure in place to let cost_index() and amcostestimate
functions take repeated scans into account at all.  Per my recent proposal.

Note: this patch changes pg_proc.h, but I did not force initdb because
the changes are basically cosmetic --- the system does not look into
pg_proc to decide how to call an index amcostestimate function, and
there's no way to call such a function from SQL at all.
2006-06-06 17:59:58 +00:00
Bruce Momjian f2f5b05655 Update copyright for 2006. Update scripts. 2006-03-05 15:59:11 +00:00
Tom Lane da27c0a1ef Teach tid-scan code to make use of "ctid = ANY (array)" clauses, so that
"ctid IN (list)" will still work after we convert IN to ScalarArrayOpExpr.
Make some minor efficiency improvements while at it, such as ensuring that
multiple TIDs are fetched in physical heap order.  And fix EXPLAIN so that
it shows what's really going on for a TID scan.
2005-11-26 22:14:57 +00:00
Bruce Momjian 1dc3498251 Standard pgindent run for 8.1. 2005-10-15 02:49:52 +00:00
Tom Lane 37c443eefd Fix compare_fuzzy_path_costs() to behave a bit more sanely. The original
coding would ignore startup cost differences of less than 1% of the
estimated total cost; which was OK for normal planning but highly not OK
if a very small LIMIT was applied afterwards, so that startup cost becomes
the name of the game.  Instead, compare startup and total costs fuzzily
but independently.  This changes the plan selected for two queries in the
regression tests; adjust expected-output files for resulting changes in
row order.  Per reports from Dawid Kuroczko and Sam Mason.
2005-07-22 19:12:02 +00:00
Tom Lane 0182951bc8 Fix overenthusiastic optimization of 'x IN (SELECT DISTINCT ...)' and related
cases: we can't just consider whether the subquery's output is unique on its
own terms, we have to check whether the set of output columns we are going to
use will be unique.  Per complaint from Luca Pireddu and test case from
Michael Fuhr.
2005-07-15 17:09:26 +00:00
Tom Lane 9ab4d98168 Remove planner's private fields from Query struct, and put them into
a new PlannerInfo struct, which is passed around instead of the bare
Query in all the planning code.  This commit is essentially just a
code-beautification exercise, but it does open the door to making
larger changes to the planner data structures without having to muck
with the widely-known Query struct.
2005-06-05 22:32:58 +00:00
Tom Lane 3531383224 Just noticed that you can't Query-Cancel a long planner run, because
no part of the planner did CHECK_FOR_INTERRUPTS().  Add one in a
suitably strategic spot.
2005-06-03 19:00:12 +00:00
Tom Lane 5b05185262 Remove support for OR'd indexscans internal to a single IndexScan plan
node, as this behavior is now better done as a bitmap OR indexscan.
This allows considerable simplification in nodeIndexscan.c itself as
well as several planner modules concerned with indexscan plan generation.
Also we can improve the sharing of code between regular and bitmap
indexscans, since they are now working with nigh-identical Plan nodes.
2005-04-25 01:30:14 +00:00
Tom Lane bc843d3960 First cut at planner support for bitmap index scans. Lots to do yet,
but the code is basically working.  Along the way, rewrite the entire
approach to processing OR index conditions, and make it work in join
cases for the first time ever.  orindxpath.c is now basically obsolete,
but I left it in for the time being to allow easy comparison testing
against the old implementation.
2005-04-22 21:58:32 +00:00
Tom Lane 14c7fba3f7 Rethink original decision to use AND/OR Expr nodes to represent bitmap
logic operations during planning.  Seems cleaner to create two new Path
node types, instead --- this avoids duplication of cost-estimation code.
Also, create an enable_bitmapscan GUC parameter to control use of bitmap
plans.
2005-04-21 19:18:13 +00:00
Tom Lane e6f7edb9d5 Install some slightly realistic cost estimation for bitmap index scans. 2005-04-21 02:28:02 +00:00
Tom Lane 4a8c5d0375 Create executor and planner-backend support for decoupled heap and index
scans, using in-memory tuple ID bitmaps as the intermediary.  The planner
frontend (path creation and cost estimation) is not there yet, so none
of this code can be executed.  I have tested it using some hacked planner
code that is far too ugly to see the light of day, however.  Committing
now so that the bulk of the infrastructure changes go in before the tree
drifts under me.
2005-04-19 22:35:18 +00:00
Tom Lane ad161bcc8a Merge Resdom nodes into TargetEntry nodes to simplify code and save a
few palloc's.  I also chose to eliminate the restype and restypmod fields
entirely, since they are redundant with information stored in the node's
contained expression; re-examining the expression at need seems simpler
and more reliable than trying to keep restype/restypmod up to date.

initdb forced due to change in contents of stored rules.
2005-04-06 16:34:07 +00:00
Tom Lane 926e8a00d3 Add a back-link from IndexOptInfo structs to their parent RelOptInfo
structs.  There are many places in the planner where we were passing
both a rel and an index to subroutines, and now need only pass the
index struct.  Notationally simpler, and perhaps a tad faster.
2005-03-27 06:29:49 +00:00
Tom Lane febc9a613c Expand the 'special index operator' machinery to handle special cases
for boolean indexes.  Previously we would only use such an index with
WHERE clauses like 'indexkey = true' or 'indexkey = false'.  The new
code transforms the cases 'indexkey', 'NOT indexkey', 'indexkey IS TRUE',
and 'indexkey IS FALSE' into one of these.  While this is only marginally
useful in itself, I intend soon to change constant-expression simplification
so that 'foo = true' and 'foo = false' are reduced to just 'foo' and
'NOT foo' ... which would lose the ability to use boolean indexes for
such queries at all, if the indexscan machinery couldn't make the
reverse transformation.
2005-03-26 23:29:20 +00:00
Tom Lane 595ed2a855 Make the behavior of HAVING without GROUP BY conform to the SQL spec.
Formerly, if such a clause contained no aggregate functions we mistakenly
treated it as equivalent to WHERE.  Per spec it must cause the query to
be treated as a grouped query of a single group, the same as appearance
of aggregate functions would do.  Also, the HAVING filter must execute
after aggregate function computation even if it itself contains no
aggregate functions.
2005-03-10 23:21:26 +00:00
PostgreSQL Daemon 2ff501590b Tag appropriate files for rc3
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
2004-12-31 22:04:05 +00:00
Bruce Momjian b6b71b85bc Pgindent run for 8.0. 2004-08-29 05:07:03 +00:00
Bruce Momjian da9a8649d8 Update copyright to 2004. 2004-08-29 04:13:13 +00:00
Tom Lane fcbc438727 Label CVS tip as 8.0devel instead of 7.5devel. Adjust various comments
and documentation to reference 8.0 instead of 7.5.
2004-08-04 21:34:35 +00:00
Tom Lane 80c6847cc5 Desultory de-FastList-ification. RelOptInfo.reltargetlist is back to
being a plain List.
2004-06-01 03:03:05 +00:00
Neil Conway 72b6ad6313 Use the new List API function names throughout the backend, and disable the
list compatibility API by default. While doing this, I decided to keep
the llast() macro around and introduce llast_int() and llast_oid() variants.
2004-05-30 23:40:41 +00:00
Neil Conway d0b4399d81 Reimplement the linked list data structure used throughout the backend.
In the past, we used a 'Lispy' linked list implementation: a "list" was
merely a pointer to the head node of the list. The problem with that
design is that it makes lappend() and length() linear time. This patch
fixes that problem (and others) by maintaining a count of the list
length and a pointer to the tail node along with each head node pointer.
A "list" is now a pointer to a structure containing some meta-data
about the list; the head and tail pointers in that structure refer
to ListCell structures that maintain the actual linked list of nodes.

The function names of the list API have also been changed to, I hope,
be more logically consistent. By default, the old function names are
still available; they will be disabled-by-default once the rest of
the tree has been updated to use the new API names.
2004-05-26 04:41:50 +00:00
Neil Conway 1812d3b233 Remove the last traces of Joe Hellerstein's "xfunc" optimization. Patch
from Alvaro Herrera. Also, removed lispsort.c, since it is no longer
used.
2004-04-25 18:23:57 +00:00
Tom Lane 8d9a28eeef Use fuzzy comparison of path costs in add_path(), so that paths with the
same path keys and nearly equivalent costs will be considered redundant.
The exact nature of the fuzziness may get adjusted later based on current
discussions, but no one has shot a hole in the basic idea yet ...
2004-03-29 19:58:04 +00:00
Tom Lane 03e2a47e0b Teach is_distinct_query to recognize that GROUP BY forces a subquery's
output to be distinct, if all the GROUP BY columns appear in the output.
Per suggestion from Dennis Haney.
2004-03-02 16:42:20 +00:00
Tom Lane 391c3811a2 Rename SortMem and VacuumMem to work_mem and maintenance_work_mem.
Make btree index creation and initial validation of foreign-key constraints
use maintenance_work_mem rather than work_mem as their memory limit.
Add some code to guc.c to allow these variables to be referenced by their
old names in SHOW and SET commands, for backwards compatibility.
2004-02-03 17:34:04 +00:00
Tom Lane 864412fd0a Recognize that IN subqueries return already-unique results if they use
UNION/INTERSECT/EXCEPT (without ALL).  This adds on to the previous
optimization for subqueries using DISTINCT.
2004-01-19 03:49:41 +00:00
Tom Lane fa559a86ee Adjust indexscan planning logic to keep RestrictInfo nodes associated
with index qual clauses in the Path representation.  This saves a little
work during createplan and (probably more importantly) allows reuse of
cached selectivity estimates during indexscan planning.  Also fix latent
bug: wrong plan would have been generated for a 'special operator' used
in a nestloop-inner-indexscan join qual, because the special operator
would not have gotten into the list of quals to recheck.  This bug is
only latent because at present the special-operator code could never
trigger on a join qual, but sooner or later someone will want to do it.
2004-01-05 23:39:54 +00:00
Tom Lane 5c74ce23db Improve UniquePath logic to detect the case where the input is already
known unique (eg, it is a SELECT DISTINCT ... subquery), and not do a
redundant unique-ification step.
2004-01-05 18:04:39 +00:00
Tom Lane 9091e8d1b2 Add the ability to extract OR indexscan conditions from OR-of-AND
join conditions in which each OR subclause includes a constraint on
the same relation.  This implements the other useful side-effect of
conversion to CNF format, without its unpleasant side-effects.  As
per pghackers discussion of a few weeks ago.
2004-01-05 05:07:36 +00:00
PostgreSQL Daemon 969685ad44 $Header: -> $PostgreSQL Changes ... 2003-11-29 19:52:15 +00:00
Bruce Momjian f3c3deb7d0 Update copyrights to 2003. 2003-08-04 02:40:20 +00:00
Bruce Momjian 089003fb46 pgindent run. 2003-08-04 00:43:34 +00:00
Tom Lane 45708f5ebc Error message editing in backend/optimizer, backend/rewrite. 2003-07-25 00:01:09 +00:00
Tom Lane 3d09f6c560 Make cost estimates for SubqueryScan more realistic: charge cpu_tuple_cost
for each row processed, and don't forget the evaluation cost of any
restriction clauses attached to the node.  Per discussion with Greg Stark.
2003-07-14 22:35:54 +00:00
Tom Lane 835bb975d8 Restructure building of join relation targetlists so that a join plan
node emits only those vars that are actually needed above it in the
plan tree.  (There were comments in the code suggesting that this was
done at some point in the dim past, but for a long time we have just
made join nodes emit everything that either input emitted.)  Aside from
being marginally more efficient, this fixes the problem noted by Peter
Eisentraut where a join above an IN-implemented-as-join might fail,
because the subplan targetlist constructed in the latter case didn't
meet the expectation of including everything.
Along the way, fix some places that were O(N^2) in the targetlist
length.  This is not all the trouble spots for wide queries by any
means, but it's a step forward.
2003-06-29 23:05:05 +00:00
Tom Lane cb02610e50 Adjust nestloop-with-inner-indexscan plan generation so that we catch
some cases of redundant clauses that were formerly not caught.  We have
to special-case this because the clauses involved never get attached to
the same join restrictlist and so the existing logic does not notice
that they are redundant.
2003-06-15 22:51:45 +00:00
Tom Lane f45df8c014 Cause CHAR(n) to TEXT or VARCHAR conversion to automatically strip trailing
blanks, in hopes of reducing the surprise factor for newbies.  Remove
redundant operators for VARCHAR (it depends wholly on TEXT operations now).
Clean up resolution of ambiguous operators/functions to avoid surprising
choices for domains: domains are treated as equivalent to their base types
and binary-coercibility is no longer considered a preference item when
choosing among multiple operators/functions.  IsBinaryCoercible now correctly
reflects the notion that you need *only* relabel the type to get from type
A to type B: that is, a domain is binary-coercible to its base type, but
not vice versa.  Various marginal cleanup, including merging the essentially
duplicate resolution code in parse_func.c and parse_oper.c.  Improve opr_sanity
regression test to understand about binary compatibility (using pg_cast),
and fix a couple of small errors in the catalogs revealed thereby.
Restructure "special operator" handling to fetch operators via index opclasses
rather than hardwiring assumptions about names (cleans up the pattern_ops
stuff a little).
2003-05-26 00:11:29 +00:00
Tom Lane 056467ec6b Teach planner how to propagate pathkeys from sub-SELECTs in FROM up to
the outer query.  (The implementation is a bit klugy, but it would take
nontrivial restructuring to make it nicer, which this is probably not
worth.)  This avoids unnecessary sort steps in examples like
SELECT foo,count(*) FROM (SELECT ... ORDER BY foo,bar) sub GROUP BY foo
which means there is now a reasonable technique for controlling the
order of inputs to custom aggregates, even in the grouping case.
2003-02-15 20:12:41 +00:00
Tom Lane c15a4c2aef Replace planner's representation of relation sets, per pghackers discussion.
Instead of Lists of integers, we now store variable-length bitmap sets.
This should be faster as well as less error-prone.
2003-02-08 20:20:55 +00:00
Tom Lane 70fba70430 Upgrade cost estimation for joins, per discussion with Bradley Baetz.
Try to model the effect of rescanning input tuples in mergejoins;
account for JOIN_IN short-circuiting where appropriate.  Also, recognize
that mergejoin and hashjoin clauses may now be more than single operator
calls, so we have to charge appropriate execution costs.
2003-01-27 20:51:54 +00:00
Tom Lane e2114817c7 Implement choice between hash-based and sort-based grouping for doing
DISTINCT processing on the output of an IN sub-select.
2003-01-22 00:07:00 +00:00
Tom Lane bdfbfde1b1 IN clauses appearing at top level of WHERE can now be handled as joins.
There are two implementation techniques: the executor understands a new
JOIN_IN jointype, which emits at most one matching row per left-hand row,
or the result of the IN's sub-select can be fed through a DISTINCT filter
and then joined as an ordinary relation.
Along the way, some minor code cleanup in the optimizer; notably, break
out most of the jointree-rearrangement preprocessing in planner.c and
put it in a new file prep/prepjointree.c.
2003-01-20 18:55:07 +00:00
Tom Lane 1fd0c59e25 Phase 1 of read-only-plans project: cause executor state nodes to point
to plan nodes, not vice-versa.  All executor state nodes now inherit from
struct PlanState.  Copying of plan trees has been simplified by not
storing a list of SubPlans in Plan nodes (eliminating duplicate links).
The executor still needs such a list, but it can build it during
ExecutorStart since it has to scan the plan tree anyway.
No initdb forced since no stored-on-disk structures changed, but you
will need a full recompile because of node-numbering changes.
2002-12-05 15:50:39 +00:00
Tom Lane 935969415a Be more realistic about plans involving Materialize nodes: take their
cost into account while planning.
2002-11-30 05:21:03 +00:00
Tom Lane ddb2d78de0 Upgrade planner and executor to allow multiple hash keys for a hash join,
instead of only one.  This should speed up planning (only one hash path
to consider for a given pair of relations) as well as allow more effective
hashing, when there are multiple hashable joinclauses.
2002-11-30 00:08:22 +00:00
Tom Lane 04c8785c7b Restructure planning of nestloop inner indexscans so that the set of usable
joinclauses is determined accurately for each join.  Formerly, the code only
considered joinclauses that used all of the rels from the outer side of the
join; thus for example
	FROM (a CROSS JOIN b) JOIN c ON (c.f1 = a.x AND c.f2 = b.y)
could not exploit a two-column index on c(f1,f2), since neither of the
qual clauses would be in the joininfo list it looked in.  The new code does
this correctly, and also is able to eliminate redundant clauses, thus fixing
the problem noted 24-Oct-02 by Hans-Jürgen Schönig.
2002-11-24 21:52:15 +00:00
Tom Lane f6dba10e62 First phase of implementing hash-based grouping/aggregation. An AGG plan
node now does its own grouping of the input rows, and has no need for a
preceding GROUP node in the plan pipeline.  This allows elimination of
the misnamed tuplePerGroup option for GROUP, and actually saves more code
in nodeGroup.c than it costs in nodeAgg.c, as well as being presumably
faster.  Restructure the API of query_planner so that we do not commit to
using a sorted or unsorted plan in query_planner; instead grouping_planner
makes the decision.  (Right now it isn't any smarter than query_planner
was, but that will change as soon as it has the option to select a hash-
based aggregation step.)  Despite all the hackery, no initdb needed since
only in-memory node types changed.
2002-11-06 00:00:45 +00:00
Bruce Momjian d84fe82230 Update copyright to 2002. 2002-06-20 20:29:54 +00:00
Tom Lane f9e4f611a1 First pass at set-returning-functions in FROM, by Joe Conway with
some kibitzing from Tom Lane.  Not everything works yet, and there's
no documentation or regression test, but let's commit this so Joe
doesn't need to cope with tracking changes in so many files ...
2002-05-12 20:10:05 +00:00
Bruce Momjian b81844b173 pgindent run on all C files. Java run to follow. initdb/regression
tests pass.
2001-10-25 05:50:21 +00:00
Tom Lane f31dc0ada7 Partial indexes work again, courtesy of Martijn van Oosterhout.
Note: I didn't force an initdb, figuring that one today was enough.
However, there is a new function in pg_proc.h, and pg_dump won't be
able to dump partial indexes until you add that function.
2001-07-16 05:07:00 +00:00
Tom Lane 7c579fa12d Further work on making use of new statistics in planner. Adjust APIs
of costsize.c routines to pass Query root, so that costsize can figure
more things out by itself and not be so dependent on its callers to tell
it everything it needs to know.  Use selectivity of hash or merge clause
to estimate number of tuples processed internally in these joins
(this is more useful than it would've been before, since eqjoinsel is
somewhat more accurate than before).
2001-06-05 05:26:05 +00:00
Tom Lane be03eb25f3 Modify optimizer data structures so that IndexOptInfo lists built for
create_index_paths are not immediately discarded, but are available for
subsequent planner work.  This allows avoiding redundant syscache lookups
in several places.  Change interface to operator selectivity estimation
procedures to allow faster and more flexible estimation.
Initdb forced due to change of pg_proc entries for selectivity functions!
2001-05-20 20:28:20 +00:00
Tom Lane f905d65ee3 Rewrite of planner statistics-gathering code. ANALYZE is now available as
a separate statement (though it can still be invoked as part of VACUUM, too).
pg_statistic redesigned to be more flexible about what statistics are
stored.  ANALYZE now collects a list of several of the most common values,
not just one, plus a histogram (not just the min and max values).  Random
sampling is used to make the process reasonably fast even on very large
tables.  The number of values and histogram bins collected is now
user-settable via an ALTER TABLE command.

There is more still to do; the new stats are not being used everywhere
they could be in the planner.  But the remaining changes for this project
should be localized, and the behavior is already better than before.

A not-very-related change is that sorting now makes use of btree comparison
routines if it can find one, rather than invoking '<' twice.
2001-05-07 00:43:27 +00:00
Bruce Momjian 9e1552607a pgindent run. Make it all clean. 2001-03-22 04:01:46 +00:00
Bruce Momjian 623bf843d2 Change Copyright from PostgreSQL, Inc to PostgreSQL Global Development Group. 2001-01-24 19:43:33 +00:00
Tom Lane ea166f1146 Planner speedup hacking. Avoid saving useless pathkeys, so that path
comparison does not consider paths different when they differ only in
uninteresting aspects of sort order.  (We had a special case of this
consideration for indexscans already, but generalize it to apply to
ordered join paths too.)  Be stricter about what is a canonical pathkey
to allow faster pathkey comparison.  Cache canonical pathkeys and
dispersion stats for left and right sides of a RestrictInfo's clause,
to avoid repeated computation.  Total speedup will depend on number of
tables in a query, but I see about 4x speedup of planning phase for
a sample seven-table query.
2000-12-14 22:30:45 +00:00
Tom Lane 6543d81d65 Restructure handling of inheritance queries so that they work with outer
joins, and clean things up a good deal at the same time.  Append plan node
no longer hacks on rangetable at runtime --- instead, all child tables are
given their own RT entries during planning.  Concept of multiple target
tables pushed up into execMain, replacing bug-prone implementation within
nodeAppend.  Planner now supports generating Append plans for inheritance
sets either at the top of the plan (the old way) or at the bottom.  Expanding
at the bottom is appropriate for tables used as sources, since they may
appear inside an outer join; but we must still expand at the top when the
target of an UPDATE or DELETE is an inheritance set, because we actually need
a different targetlist and junkfilter for each target table in that case.
Fortunately a target table can't be inside an outer join...  Bizarre mutual
recursion between union_planner and prepunion.c is gone --- in fact,
union_planner doesn't really have much to do with union queries anymore,
so I renamed it grouping_planner.
2000-11-12 00:37:02 +00:00
Bruce Momjian b32685a999 Add proofreader's changes to docs.
Fix misspelling of disbursion to dispersion.
2000-10-05 19:48:34 +00:00
Tom Lane 3a94e789f5 Subselects in FROM clause, per ISO syntax: FROM (SELECT ...) [AS] alias.
(Don't forget that an alias is required.)  Views reimplemented as expanding
to subselect-in-FROM.  Grouping, aggregates, DISTINCT in views actually
work now (he says optimistically).  No UNION support in subselects/views
yet, but I have some ideas about that.  Rule-related permissions checking
moved out of rewriter and into executor.
INITDB REQUIRED!
2000-09-29 18:21:41 +00:00
Tom Lane ed5003c584 First cut at full support for OUTER JOINs. There are still a few loose
ends to clean up (see my message of same date to pghackers), but mostly
it works.  INITDB REQUIRED!
2000-09-12 21:07:18 +00:00
Bruce Momjian a12a23f0d0 Remove unused include files. Do not touch /port or includes used by defines. 2000-05-30 00:49:57 +00:00
Bruce Momjian 52f77df613 Ye-old pgindent run. Same 4-space tabs. 2000-04-12 17:17:23 +00:00
Tom Lane 1d5e7a6f46 Repair logic flaw in cost estimator: cost_nestloop() was estimating CPU
costs using the inner path's parent->rows count as the number of tuples
processed per inner scan iteration.  This is wrong when we are using an
inner indexscan with indexquals based on join clauses, because the rows
count in a Relation node reflects the selectivity of the restriction
clauses for that rel only.  Upshot was that if join clause was very
selective, we'd drastically overestimate the true cost of the join.
Fix is to calculate correct output-rows estimate for an inner indexscan
when the IndexPath node is created and save it in the path node.
Change of path node doesn't require initdb, since path nodes don't
appear in saved rules.
2000-03-22 22:08:35 +00:00
Tom Lane 3cbcb78a3d Plug some more memory leaks in the planner. It still leaks like a sieve,
but this is as good as it'll get for this release...
2000-02-18 23:47:31 +00:00
Tom Lane b1577a7c78 New cost model for planning, incorporating a penalty for random page
accesses versus sequential accesses, a (very crude) estimate of the
effects of caching on random page accesses, and cost to evaluate WHERE-
clause expressions.  Export critical parameters for this model as SET
variables.  Also, create SET variables for the planner's enable flags
(enable_seqscan, enable_indexscan, etc) so that these can be controlled
more conveniently than via PGOPTIONS.

Planner now estimates both startup cost (cost before retrieving
first tuple) and total cost of each path, so it can optimize queries
with LIMIT on a reasonable basis by interpolating between these costs.
Same facility is a win for EXISTS(...) subqueries and some other cases.

Redesign pathkey representation to achieve a major speedup in planning
(I saw as much as 5X on a 10-way join); also minor changes in planner
to reduce memory consumption by recycling discarded Path nodes and
not constructing unnecessary lists.

Minor cleanups to display more-plausible costs in some cases in
EXPLAIN output.

Initdb forced by change in interface to index cost estimation
functions.
2000-02-15 20:49:31 +00:00
Tom Lane d8733ce674 Repair planning bugs caused by my misguided removal of restrictinfo link
fields in JoinPaths --- turns out that we do need that after all :-(.
Also, rearrange planner so that only one RelOptInfo is created for a
particular set of joined base relations, no matter how many different
subsets of relations it can be created from.  This saves memory and
processing time compared to the old method of making a bunch of RelOptInfos
and then removing the duplicates.  Clean up the jointree iteration logic;
not sure if it's better, but I sure find it more readable and plausible
now, particularly for the case of 'bushy plans'.
2000-02-07 04:41:04 +00:00
Bruce Momjian 5c25d60244 Add:
* Portions Copyright (c) 1996-2000, PostgreSQL, Inc

to all files copyright Regents of Berkeley.  Man, that's a lot of files.
2000-01-26 05:58:53 +00:00
Tom Lane 71ed7eb494 Revise handling of index-type-specific indexscan cost estimation, per
pghackers discussion of 5-Jan-2000.  The amopselect and amopnpages
estimators are gone, and in their place is a per-AM amcostestimate
procedure (linked to from pg_am, not pg_amop).
2000-01-22 23:50:30 +00:00
Tom Lane 166b5c1def Another round of planner/optimizer work. This is just restructuring and
code cleanup; no major improvements yet.  However, EXPLAIN does produce
more intuitive outputs for nested loops with indexscans now...
2000-01-09 00:26:47 +00:00
Bruce Momjian 6f9ff92cc0 Tid access method feature from Hiroshi Inoue, Inoue@tpf.co.jp 1999-11-23 20:07:06 +00:00
Tom Lane e6381966c1 Major planner/optimizer revision: get rid of PathOrder node type,
store all ordering information in pathkeys lists (which are now lists of
lists of PathKeyItem nodes, not just lists of lists of vars).  This was
a big win --- the code is smaller and IMHO more understandable than it
was, even though it handles more cases.  I believe the node changes will
not force an initdb for anyone; planner nodes don't show up in stored
rules.
1999-08-16 02:17:58 +00:00
Tom Lane e1fad50a5d Revise generation of hashjoin paths: generate one path per
hashjoinable clause, not one path for a randomly-chosen element of each
set of clauses with the same join operator.  That is, if you wrote
   SELECT ... WHERE t1.f1 = t2.f2 and t1.f3 = t2.f4,
and both '=' ops were the same opcode (say, all four fields are int4),
then the system would either consider hashing on f1=f2 or on f3=f4,
but it would *not* consider both possibilities.  Boo hiss.
Also, revise estimation of hashjoin costs to include a penalty when the
inner join var has a high disbursion --- ie, the most common value is
pretty common.  This tends to lead to badly skewed hash bucket occupancy
and way more comparisons than you'd expect on average.
I imagine that the cost calculation still needs tweaking, but at least
it generates a more reasonable plan than before on George Young's example.
1999-08-06 04:00:17 +00:00
Tom Lane 30da344cb1 Update comments about clause selectivity estimation. 1999-07-30 22:34:19 +00:00
Tom Lane 04578a9180 Further cleanups of indexqual processing: simplify control
logic in indxpath.c, avoid generation of redundant indexscan paths for the
same relation and index.
1999-07-30 04:07:25 +00:00
Tom Lane 7d572886d6 Fix coredump seen when doing mergejoin between indexed tables,
for example in the regression test database, try
select * from tenk1 t1, tenk1 t2 where t1.unique1 = t2.unique2;
6.5 has this same bug ...
1999-07-30 00:56:17 +00:00
Tom Lane 9e7e29e6c9 First cut at doing LIKE/regex indexing optimization in
optimizer rather than parser.  This has many advantages, such as not
getting fooled by chance uses of operator names ~ and ~~ (the operators
are identified by OID now), and not creating useless comparison operations
in contexts where the comparisons will not actually be used as indexquals.
The new code also recognizes exact-match LIKE and regex patterns, and
produces an = indexqual instead of >= and <=.

This change does NOT fix the problem with non-ASCII locales: the code
still doesn't know how to generate an upper bound indexqual for non-ASCII
collation order.  But it's no worse than before, just the same deficiency
in a different place...

Also, dike out loc_restrictinfo fields in Plan nodes.  These were doing
nothing useful in the absence of 'expensive functions' optimization,
and they took a considerable amount of processing to fill in.
1999-07-27 03:51:11 +00:00
Tom Lane 49ed4dd779 Further work on planning of indexscans. Cleaned up interfaces
to index_selectivity so that it can be handed an indexqual clause list
rather than a bunch of assorted derivative data.
1999-07-25 23:07:26 +00:00
Tom Lane ac4913a0dd Clean up messy clause-selectivity code in clausesel.c; repair bug
identified by Hiroshi (incorrect cost attributed to OR clauses
after multiple passes through set_rest_selec()).  I think the code
was trying to allow selectivities of OR subclauses to be passed in
from outside, but noplace was actually passing any useful data, and
set_rest_selec() was passing wrong data.

Restructure representation of "indexqual" in IndexPath nodes so that
it is the same as for indxqual in completed IndexScan nodes: namely,
a toplevel list with an entry for each pass of the index scan, having
sublists that are implicitly-ANDed index qual conditions for that pass.
You don't want to know what the old representation was :-(

Improve documentation of OR-clause indexscan functions.

Remove useless 'notclause' field from RestrictInfo nodes.  (This might
force an initdb for anyone who has stored rules containing RestrictInfos,
but I do not think that RestrictInfo ever appears in completed plans.)
1999-07-24 23:21:14 +00:00
Bruce Momjian a71802e12e Final cleanup. 1999-07-16 05:00:38 +00:00
Bruce Momjian 9b645d481c Update #include cleanups 1999-07-16 03:14:30 +00:00
Bruce Momjian 2e6b1e63a3 Remove unused #includes in *.c files. 1999-07-15 22:40:16 +00:00
Bruce Momjian 4b2c2850bf Clean up #include in /include directory. Add scripts for checking includes. 1999-07-15 15:21:54 +00:00
Bruce Momjian fcff1cdf4e Another pgindent run. Sorry folks. 1999-05-25 22:43:53 +00:00
Bruce Momjian 07842084fe pgindent run over code. 1999-05-25 16:15:34 +00:00
Marc G. Fournier 8c3e8a8a0e From: Tatsuo Ishii <t-ishii@sra.co.jp>
Ok. I made patches replacing all of "#if FALSE" or "#if 0" to "#ifdef
NOT_USED" for current. I have tested these patches in that the
postgres binaries are identical.
1999-02-21 03:49:55 +00:00
Bruce Momjian 75cccd0ad3 pathkeys fixes 1999-02-20 19:02:43 +00:00
Bruce Momjian 0ff2733355 Update pathkeys comparison function. 1999-02-20 18:01:02 +00:00
Bruce Momjian 31cce21fb0 Fix bushy plans. Cleanup. 1999-02-18 00:49:48 +00:00
Bruce Momjian ba2883b264 Remove duplicate geqo functions, and more optimizer cleanup 1999-02-15 03:22:37 +00:00
Bruce Momjian 6724a50787 Change my-function-name-- to my_function_name, and optimizer renames. 1999-02-13 23:22:53 +00:00
Bruce Momjian c0d17c7aee JoinPath -> NestPath for nested loop. 1999-02-12 06:43:53 +00:00
Bruce Momjian 3fdb9bb9c7 Fix optimizer and make faster. 1999-02-12 05:57:08 +00:00
Bruce Momjian 55d0465009 optimizer update 1999-02-12 02:37:52 +00:00
Bruce Momjian 34ecb9d850 Optimizer cleanups. 1999-02-11 21:05:28 +00:00
Bruce Momjian c873fcdaf4 Optimizer cleanup. 1999-02-11 17:21:51 +00:00
Bruce Momjian 8dc2209f71 optimizer cleanup 1999-02-11 17:03:17 +00:00
Bruce Momjian 6de25f09b1 Optimizer cleanup. 1999-02-11 17:00:49 +00:00
Bruce Momjian 4ea3f728e9 More optimization. 1999-02-11 16:09:41 +00:00
Bruce Momjian d244df95db More optimizer speedups. 1999-02-11 14:59:09 +00:00
Bruce Momjian 129543e22d optimizer cleanup 1999-02-11 05:29:08 +00:00
Bruce Momjian dbd80c97f4 Optimizer fix for samekeys() and cost fixes for longer optimizer keys. 1999-02-11 04:08:44 +00:00
Bruce Momjian 9dbb0efb0b Optmizer cleanup 1999-02-10 21:02:50 +00:00
Bruce Momjian f859c81c18 Rename Path.keys to Path.pathkeys. Too many 'keys' used for other things. 1999-02-10 03:52:54 +00:00
Bruce Momjian fe35ffe7e0 Major optimizer improvement for joining a large number of tables. 1999-02-09 03:51:42 +00:00
Bruce Momjian 54e5d25666 Optimizer cleanup. 1999-02-08 04:29:25 +00:00
Bruce Momjian a553760845 Optimizer cleanup. 1999-02-06 17:29:30 +00:00
Bruce Momjian ae12e25263 Update optimizer comments. 1999-02-04 19:20:12 +00:00
Bruce Momjian 18fbe4142f More optimizer renaming HInfo -> HashInfo. 1999-02-04 01:47:02 +00:00
Bruce Momjian 9322950aa4 Cleanup of source files where 'return' or 'var =' is alone on a line. 1999-02-03 21:18:02 +00:00
Bruce Momjian 8d9237d485 Optimizer rename ClauseInfo -> RestrictInfo. Update optimizer README. 1999-02-03 20:15:53 +00:00
Bruce Momjian 4090d17fee SET_ARGS cleanup 1999-02-02 23:53:26 +00:00
Bruce Momjian 748e300317 Fix for AND/OR handling. 1998-09-21 15:41:28 +00:00
Bruce Momjian fa1a8d6a97 OK, folks, here is the pgindent output. 1998-09-01 04:40:42 +00:00
Bruce Momjian af74855a60 Renaming cleanup, no pgindent yet. 1998-09-01 03:29:17 +00:00
Bruce Momjian d9be0ff432 MergeSort was sometimes called mergejoin and was confusing. Now
it is now only mergejoin.
1998-08-04 16:44:31 +00:00
Bruce Momjian 584f9438ca Rename Rel to RelOptInfo. 1998-07-18 04:22:52 +00:00
Bruce Momjian 6bd323c6b3 Remove un-needed braces around single statements. 1998-06-15 19:30:31 +00:00
Bruce Momjian a32450a585 pgindent run before 6.3 release, with Thomas' requested changes. 1998-02-26 04:46:47 +00:00
Bruce Momjian 59f6a57e59 Used modified version of indent that understands over 100 typedefs. 1997-09-08 21:56:23 +00:00
Bruce Momjian 319dbfa736 Another PGINDENT run that changes variable indenting and case label indenting. Also static variable indenting. 1997-09-08 02:41:22 +00:00
Bruce Momjian 1ccd423235 Massive commit to run PGINDENT on all *.c and *.h files. 1997-09-07 05:04:48 +00:00
Bruce Momjian a1635450b3 Cleanups needed for indent. Remove }; 1997-09-05 18:13:45 +00:00
Marc G. Fournier d146305065 Patches for Vadim's multikey indexing... 1997-03-18 18:41:37 +00:00
Marc G. Fournier d31084e9d1 Postgres95 1.01 Distribution - Virgin Sources 1996-07-09 06:22:35 +00:00