postgresql/src/backend/optimizer/README

Summary
-------

The optimizer generates optimial query plans by doing several steps:

1) Take each relation in a query, and make a RelOptInfo structure for
it.  Find each way of accessing the relation, called a Path, including
sequential and index scans, and add it to RelOptInfo.pathlist.  Also
create RelOptInfo.joininfo that lists all the joins that involve this
relation.  For example, the WHERE clause "tab1.col1 = tab2.col1"
generates a JoinInfo for tab1 listing tab2 as an unjoined relation, and
tab2's joininfo shows tab1 as an unjoined relation.

2) Join each RelOptInfo to other RelOptInfo as specified in
RelOptInfo.joininfo.  At this point each RelOptInfo is a single
relation, so you are joining every relation to the other relations as
joined in the WHERE clause.

Joins occur using two RelOptInfos.  One is outer, the other inner. 
Outers drive lookups of values in the inner.  In a nested loop, lookups
of values in the inner occur by scanning to find each matching inner
row.  In a mergejoin, inner and outer rows are ordered, and are accessed
in order, so only one scan of inner is required to perform the entire
join.  In a hashjoin, inner rows are hashed for lookups.

Each unique join combination becomes a new RelOptInfo.  The RelOptInfo
is now the joining of two relations.  RelOptInfo.pathlist are various
paths to create the joined result, having different orderings depending
on the join method used.

3) At this point, every RelOptInfo is joined to each other again, with
a new relation added to each RelOptInfo.  This continues until all
relations have been joined into one RelOptInfo, and the cheapest Path is
chosen.

    SELECT  *
    FROM    tab1, tab2, tab3, tab4
    WHERE   tab1.col = tab2.col AND
        tab2.col = tab3.col AND
        tab3.col = tab4.col

    Tables 1, 2, 3, and 4 are joined as:
    {1 2},{2 3},{3 4}
    {1 2 3},{2 3 4}
    {1 2 3 4}

    SELECT  *
    FROM    tab1, tab2, tab3, tab4
    WHERE   tab1.col = tab2.col AND
        tab1.col = tab3.col AND
        tab1.col = tab4.col

    Tables 1, 2, 3, and 4 are joined as:
    {1 2},{1 3},{1 4}
    {1 2 3},{1 3 4},{1,2,4}
    {1 2 3 4}

In the default left-handed joins, each RelOptInfo adds one
single-relation RelOptInfo in each join pass, and the added RelOptInfo
is always the inner relation in the join.  In right-handed joins, the
added RelOptInfo is the outer relation in the join.  In bushy plans,
multi-relation RelOptInfo's can be joined to other multi-relation
RelOptInfo's. 

Optimizer Functions
-------------------

These directories take the Query structure returned by the parser, and
generate a plan used by the executor.  The /plan directory generates the
plan, the /path generates all possible ways to join the tables, and
/prep handles special cases like inheritance.  /utils is utility stuff.

planner()
 handle inheritance by processing separately
-init_query_planner()
  preprocess target list
  preprocess qualifications(WHERE)
--query_planner()
   cnfify()
    Summary:

     Simple cases with all AND's are handled by removing the AND's:

     convert:   a = 1 AND b = 2 AND c = 3
     to:        a = 1, b = 2, c = 3

     Qualifications with OR's are handled differently.  OR's inside AND
     clauses are not modified drastically:

     convert:   a = 1 AND b = 2 AND (c = 3 OR d = 4)
     to:        a = 1, b = 2, c = 3 OR d = 4

     OR's in the upper level are more complex to handle:

     convert:   (a = 1 AND b = 2) OR c = 3
     to:        (a = 1 OR c = 3) AND (b = 2 OR c = 3)
     finally:   (a = 1 OR c = 3), (b = 2 OR c = 3)

     These clauses all have to be true for a result to be returned,
     so the optimizer can choose the most restrictive clauses.

   pull out constants from target list
   get a target list that only contains column names, no expressions
   if none, then return
---subplanner()
    make list of relations in target
    make list of relations in where clause
    split up the qual into restrictions (a=1) and joins (b=c)
    find relation clauses can do merge sort and hash joins
----make_one_rel()
     set_base_rel_pathlist()
      find scan and all index paths for each relation
      find selectivity of columns used in joins
-----make_one_rel_by_joins()
      jump to geqo if needed
      again:
       make_rels_by_joins():
        for each joinrel:
         make_rels_by_clause_joins()
          for each rel's joininfo list:
           if a join from the join clause adds only one relation, do the join
         or make_rels_by_clauseless_joins()
       update_rels_pathlist_for_joins()
        generate nested,merge,hash join paths for new rel's created above
       merge_rels_with_same_relids()
        merge RelOptInfo paths that have the same relids because of joins
       rels_set_cheapest()
        set cheapest path
       if all relations in one RelOptInfo, return
   do group(GROUP)
   do aggregate
   put back constants
   re-flatten target list
 make unique(DISTINCT)
 make sort(ORDER BY)


Optimizer Structures
--------------------

RelOptInfo      - a relation or joined relations

 RestrictInfo   - restriction clauses, like "x = 3"
 JoinInfo       - join clauses, including the relids needed for the join

 Path           - every way to generate a RelOptInfo(sequential,index,joins)
  IndexPath     - index scans
  NestPath      - nested joins
  MergePath     - merge joins
  HashPath      - hash joins

 PathOrder      - every ordering type (sort, merge of relations)
Major optimizer improvement for joining a large number of tables. 1999-02-09 04:51:42 +01:00			`Summary`
			`-------`

Optimizer cleanup. 1999-02-08 05:29:25 +01:00			`The optimizer generates optimial query plans by doing several steps:`

Update optimizer readme. 1999-02-15 23:19:01 +01:00			`1) Take each relation in a query, and make a RelOptInfo structure for`
			`it. Find each way of accessing the relation, called a Path, including`
optimizer cleanup 1999-02-19 03:05:20 +01:00			`sequential and index scans, and add it to RelOptInfo.pathlist. Also`
			`create RelOptInfo.joininfo that lists all the joins that involve this`
optimizer cleanup 1999-02-19 06:18:06 +01:00			`relation. For example, the WHERE clause "tab1.col1 = tab2.col1"`
			`generates a JoinInfo for tab1 listing tab2 as an unjoined relation, and`
			`tab2's joininfo shows tab1 as an unjoined relation.`
Optimizer cleanup. 1999-02-08 05:29:25 +01:00
optimizer cleanup 1999-02-19 03:05:20 +01:00			`2) Join each RelOptInfo to other RelOptInfo as specified in`
			`RelOptInfo.joininfo. At this point each RelOptInfo is a single`
			`relation, so you are joining every relation to the other relations as`
			`joined in the WHERE clause.`
Major optimizer improvement for joining a large number of tables. 1999-02-09 04:51:42 +01:00
			`Joins occur using two RelOptInfos. One is outer, the other inner.`
			`Outers drive lookups of values in the inner. In a nested loop, lookups`
			`of values in the inner occur by scanning to find each matching inner`
Update optimizer readme. 1999-02-15 23:19:01 +01:00			`row. In a mergejoin, inner and outer rows are ordered, and are accessed`
			`in order, so only one scan of inner is required to perform the entire`
			`join. In a hashjoin, inner rows are hashed for lookups.`
Major optimizer improvement for joining a large number of tables. 1999-02-09 04:51:42 +01:00
			`Each unique join combination becomes a new RelOptInfo. The RelOptInfo`
Update optimizer readme. 1999-02-15 23:19:01 +01:00			`is now the joining of two relations. RelOptInfo.pathlist are various`
Major optimizer improvement for joining a large number of tables. 1999-02-09 04:51:42 +01:00			`paths to create the joined result, having different orderings depending`
			`on the join method used.`

			`3) At this point, every RelOptInfo is joined to each other again, with`
			`a new relation added to each RelOptInfo. This continues until all`
			`relations have been joined into one RelOptInfo, and the cheapest Path is`
			`chosen.`

Update optimizer readme. 1999-02-15 23:19:01 +01:00			`SELECT *`
			`FROM tab1, tab2, tab3, tab4`
			`WHERE tab1.col = tab2.col AND`
			`tab2.col = tab3.col AND`
			`tab3.col = tab4.col`

			`Tables 1, 2, 3, and 4 are joined as:`
			`{1 2},{2 3},{3 4}`
			`{1 2 3},{2 3 4}`
			`{1 2 3 4}`

			`SELECT *`
			`FROM tab1, tab2, tab3, tab4`
			`WHERE tab1.col = tab2.col AND`
			`tab1.col = tab3.col AND`
			`tab1.col = tab4.col`

			`Tables 1, 2, 3, and 4 are joined as:`
			`{1 2},{1 3},{1 4}`
			`{1 2 3},{1 3 4},{1,2,4}`
			`{1 2 3 4}`

			`In the default left-handed joins, each RelOptInfo adds one`
			`single-relation RelOptInfo in each join pass, and the added RelOptInfo`
			`is always the inner relation in the join. In right-handed joins, the`
			`added RelOptInfo is the outer relation in the join. In bushy plans,`
			`multi-relation RelOptInfo's can be joined to other multi-relation`
			`RelOptInfo's.`
Optimizer cleanup. 1999-02-08 05:29:25 +01:00
More optimizer cleanups. 1999-02-04 04:19:11 +01:00			`Optimizer Functions`
			`-------------------`

Optimizer rename ClauseInfo -> RestrictInfo. Update optimizer README. 1999-02-03 21:15:53 +01:00			`These directories take the Query structure returned by the parser, and`
Add optimizer README file. 1997-12-17 19:02:33 +01:00			`generate a plan used by the executor. The /plan directory generates the`
			`plan, the /path generates all possible ways to join the tables, and`
			`/prep handles special cases like inheritance. /utils is utility stuff.`

			`planner()`
			`handle inheritance by processing separately`
			`-init_query_planner()`
			`preprocess target list`
			`preprocess qualifications(WHERE)`
			`--query_planner()`
Cleanup optimizer function names and clarify code. 1998-08-10 04:26:40 +02:00			`cnfify()`
			`Summary:`

			`Simple cases with all AND's are handled by removing the AND's:`

			`convert: a = 1 AND b = 2 AND c = 3`
			`to: a = 1, b = 2, c = 3`

			`Qualifications with OR's are handled differently. OR's inside AND`
			`clauses are not modified drastically:`

			`convert: a = 1 AND b = 2 AND (c = 3 OR d = 4)`
			`to: a = 1, b = 2, c = 3 OR d = 4`

			`OR's in the upper level are more complex to handle:`

			`convert: (a = 1 AND b = 2) OR c = 3`
			`to: (a = 1 OR c = 3) AND (b = 2 OR c = 3)`
			`finally: (a = 1 OR c = 3), (b = 2 OR c = 3)`

			`These clauses all have to be true for a result to be returned,`
			`so the optimizer can choose the most restrictive clauses.`

Add optimizer README file. 1997-12-17 19:02:33 +01:00			`pull out constants from target list`
			`get a target list that only contains column names, no expressions`
			`if none, then return`
			`---subplanner()`
			`make list of relations in target`
			`make list of relations in where clause`
Update optimizer readme. 1999-02-15 23:19:01 +01:00			`split up the qual into restrictions (a=1) and joins (b=c)`
			`find relation clauses can do merge sort and hash joins`
			`----make_one_rel()`
			`set_base_rel_pathlist()`
			`find scan and all index paths for each relation`
			`find selectivity of columns used in joins`
			`-----make_one_rel_by_joins()`
Add optimizer README file. 1997-12-17 19:02:33 +01:00			`jump to geqo if needed`
			`again:`
Update optimizer readme. 1999-02-15 23:19:01 +01:00			`make_rels_by_joins():`
Add optimizer README file. 1997-12-17 19:02:33 +01:00			`for each joinrel:`
Update optimizer readme. 1999-02-15 23:19:01 +01:00			`make_rels_by_clause_joins()`
			`for each rel's joininfo list:`
Add optimizer README file. 1997-12-17 19:02:33 +01:00			`if a join from the join clause adds only one relation, do the join`
Update optimizer readme. 1999-02-15 23:19:01 +01:00			`or make_rels_by_clauseless_joins()`
			`update_rels_pathlist_for_joins()`
			`generate nested,merge,hash join paths for new rel's created above`
			`merge_rels_with_same_relids()`
			`merge RelOptInfo paths that have the same relids because of joins`
			`rels_set_cheapest()`
			`set cheapest path`
			`if all relations in one RelOptInfo, return`
Add optimizer README file. 1997-12-17 19:02:33 +01:00			`do group(GROUP)`
			`do aggregate`
			`put back constants`
			`re-flatten target list`
			`make unique(DISTINCT)`
			`make sort(ORDER BY)`
Optimizer rename ClauseInfo -> RestrictInfo. Update optimizer README. 1999-02-03 21:15:53 +01:00


			`Optimizer Structures`
			`--------------------`
More optimizer cleanups. 1999-02-04 04:19:11 +01:00
Update optimizer readme. 1999-02-15 23:19:01 +01:00			`RelOptInfo - a relation or joined relations`
More optimizer cleanups. 1999-02-04 04:19:11 +01:00
optimizer cleanup 1999-02-19 03:05:20 +01:00			`RestrictInfo - restriction clauses, like "x = 3"`
			`JoinInfo - join clauses, including the relids needed for the join`
More optimizer cleanups. 1999-02-04 04:19:11 +01:00
Update optimizer readme. 1999-02-15 23:19:01 +01:00			`Path - every way to generate a RelOptInfo(sequential,index,joins)`
			`IndexPath - index scans`
			`NestPath - nested joins`
			`MergePath - merge joins`
			`HashPath - hash joins`
More optimizer renaming HInfo -> HashInfo. 1999-02-04 02:47:02 +01:00
Update optimizer readme. 1999-02-15 23:19:01 +01:00			`PathOrder - every ordering type (sort, merge of relations)`