postgresql

Commit Graph

Author	SHA1	Message	Date
Tom Lane	a8fe109ac1	Fix thinko in hash cost estimation: average frequency should be computed from total number of distinct values in whole relation, not # distinct values we expect to have after restriction clauses are applied.	2001-06-10 02:59:35 +00:00
Tom Lane	7c579fa12d	Further work on making use of new statistics in planner. Adjust APIs of costsize.c routines to pass Query root, so that costsize can figure more things out by itself and not be so dependent on its callers to tell it everything it needs to know. Use selectivity of hash or merge clause to estimate number of tuples processed internally in these joins (this is more useful than it would've been before, since eqjoinsel is somewhat more accurate than before).	2001-06-05 05:26:05 +00:00
Tom Lane	be03eb25f3	Modify optimizer data structures so that IndexOptInfo lists built for create_index_paths are not immediately discarded, but are available for subsequent planner work. This allows avoiding redundant syscache lookups in several places. Change interface to operator selectivity estimation procedures to allow faster and more flexible estimation. Initdb forced due to change of pg_proc entries for selectivity functions!	2001-05-20 20:28:20 +00:00
Tom Lane	c23bc6fbb0	First cut at making indexscan cost estimates depend on correlation between index order and table order.	2001-05-09 23:13:37 +00:00
Tom Lane	6cda3ad8fe	Cause planner to make use of average-column-width statistic that is now collected by ANALYZE. Also, add some modest amount of intelligence to guesses that are used for varlena columns in the absence of any ANALYZE statistics. The 'width' reported by EXPLAIN is finally something less than totally bogus for varlena columns ... and, in consequence, hashjoin estimating should be a little better ...	2001-05-09 00:35:09 +00:00
Tom Lane	f905d65ee3	Rewrite of planner statistics-gathering code. ANALYZE is now available as a separate statement (though it can still be invoked as part of VACUUM, too). pg_statistic redesigned to be more flexible about what statistics are stored. ANALYZE now collects a list of several of the most common values, not just one, plus a histogram (not just the min and max values). Random sampling is used to make the process reasonably fast even on very large tables. The number of values and histogram bins collected is now user-settable via an ALTER TABLE command. There is more still to do; the new stats are not being used everywhere they could be in the planner. But the remaining changes for this project should be localized, and the behavior is already better than before. A not-very-related change is that sorting now makes use of btree comparison routines if it can find one, rather than invoking '<' twice.	2001-05-07 00:43:27 +00:00
Tom Lane	a43f20cb0a	Tweak nestloop costing to weight restart cost of inner path more heavily. Without this, it was making some pretty silly decisions about whether an expensive sub-SELECT should be the inner or outer side of a join...	2001-04-25 22:04:37 +00:00
Bruce Momjian	9e1552607a	pgindent run. Make it all clean.	2001-03-22 04:01:46 +00:00
Tom Lane	b29f68f611	Take OUTER JOIN semantics into account when estimating the size of join relations. It's not very bright, but at least it now knows that A LEFT JOIN B must produce at least as many rows as are in A ...	2001-02-16 00:03:08 +00:00
Tom Lane	83b4ab53ad	Update a couple of obsolete comments.	2001-02-15 17:46:40 +00:00
Bruce Momjian	623bf843d2	Change Copyright from PostgreSQL, Inc to PostgreSQL Global Development Group.	2001-01-24 19:43:33 +00:00
Tom Lane	17b843d677	Cache eval cost of qualification expressions in RestrictInfo nodes to avoid repeated evaluations in cost_qual_eval(). This turns out to save a useful fraction of planning time. No change to external representation of RestrictInfo --- although that node type doesn't appear in stored rules anyway.	2000-12-12 23:33:34 +00:00
Bruce Momjian	b32685a999	Add proofreader's changes to docs. Fix misspelling of disbursion to dispersion.	2000-10-05 19:48:34 +00:00
Tom Lane	3a94e789f5	Subselects in FROM clause, per ISO syntax: FROM (SELECT ...) [AS] alias. (Don't forget that an alias is required.) Views reimplemented as expanding to subselect-in-FROM. Grouping, aggregates, DISTINCT in views actually work now (he says optimistically). No UNION support in subselects/views yet, but I have some ideas about that. Rule-related permissions checking moved out of rewriter and into executor. INITDB REQUIRED!	2000-09-29 18:21:41 +00:00
Tom Lane	1ee26b7764	Reimplement nodeMaterial to use a temporary BufFile (or even memory, if the materialized tupleset is small enough) instead of a temporary relation. This was something I was thinking of doing anyway for performance, and Jan says he needs it for TOAST because he doesn't want to cope with toasting noname relations. With this change, the 'noname table' support in heap.c is dead code, and I have accordingly removed it. Also clean up 'noname' plan handling in planner --- nonames are either sort or materialize plans, and it seems less confusing to handle them separately under those names.	2000-06-18 22:44:35 +00:00
Peter Eisentraut	6a68f42648	The heralded `Grand Unified Configuration scheme' (GUC) That means you can now set your options in either or all of $PGDATA/configuration, some postmaster option (--enable-fsync=off), or set a SET command. The list of options is in backend/utils/misc/guc.c, documentation will be written post haste. pg_options is gone, so is that pq_geqo config file. Also removed were backend -K, -Q, and -T options (no longer applicable, although -d0 does the same as -Q). Added to configure an --enable-syslog option. changed all callers from TPRINTF to elog(DEBUG)	2000-05-31 00:28:42 +00:00
Tom Lane	0f1e39643d	Third round of fmgr updates: eliminate calls using fmgr() and fmgr_faddr() in favor of new-style calls. Lots of cleanup of sloppy casts to use XXXGetDatum and DatumGetXXX ...	2000-05-30 04:25:00 +00:00
Bruce Momjian	a12a23f0d0	Remove unused include files. Do not touch /port or includes used by defines.	2000-05-30 00:49:57 +00:00
Tom Lane	25442d8d2f	Correct oversight in hashjoin cost estimation: nodeHash sizes its hash table for an average of NTUP_PER_BUCKET tuples/bucket, but cost_hashjoin was assuming a target load of one tuple/bucket. This was causing a noticeable underestimate of hashjoin costs.	2000-04-18 05:43:02 +00:00
Bruce Momjian	52f77df613	Ye-old pgindent run. Same 4-space tabs.	2000-04-12 17:17:23 +00:00
Tom Lane	9c38a8d296	Further tweaking of indexscan cost estimates.	2000-04-09 04:31:37 +00:00
Tom Lane	e55985d3be	Tweak indexscan cost estimation: round estimated # of tuples visited up to next integer. Previously, if selectivity was small, we could compute very tiny scan cost on the basis of estimating that only 0.001 tuple would be fetched, which is silly. This naturally led to some rather silly plans...	2000-03-30 00:53:30 +00:00
Tom Lane	1d5e7a6f46	Repair logic flaw in cost estimator: cost_nestloop() was estimating CPU costs using the inner path's parent->rows count as the number of tuples processed per inner scan iteration. This is wrong when we are using an inner indexscan with indexquals based on join clauses, because the rows count in a Relation node reflects the selectivity of the restriction clauses for that rel only. Upshot was that if join clause was very selective, we'd drastically overestimate the true cost of the join. Fix is to calculate correct output-rows estimate for an inner indexscan when the IndexPath node is created and save it in the path node. Change of path node doesn't require initdb, since path nodes don't appear in saved rules.	2000-03-22 22:08:35 +00:00
Tom Lane	6217a8c7ba	Fix some bogosities in the code that deals with estimating the fraction of tuples we are going to retrieve from a sub-SELECT. Must have been half asleep when I did this code the first time :-(	2000-03-14 02:23:15 +00:00
Tom Lane	b1577a7c78	New cost model for planning, incorporating a penalty for random page accesses versus sequential accesses, a (very crude) estimate of the effects of caching on random page accesses, and cost to evaluate WHERE- clause expressions. Export critical parameters for this model as SET variables. Also, create SET variables for the planner's enable flags (enable_seqscan, enable_indexscan, etc) so that these can be controlled more conveniently than via PGOPTIONS. Planner now estimates both startup cost (cost before retrieving first tuple) and total cost of each path, so it can optimize queries with LIMIT on a reasonable basis by interpolating between these costs. Same facility is a win for EXISTS(...) subqueries and some other cases. Redesign pathkey representation to achieve a major speedup in planning (I saw as much as 5X on a 10-way join); also minor changes in planner to reduce memory consumption by recycling discarded Path nodes and not constructing unnecessary lists. Minor cleanups to display more-plausible costs in some cases in EXPLAIN output. Initdb forced by change in interface to index cost estimation functions.	2000-02-15 20:49:31 +00:00
Tom Lane	d8733ce674	Repair planning bugs caused by my misguided removal of restrictinfo link fields in JoinPaths --- turns out that we do need that after all :-(. Also, rearrange planner so that only one RelOptInfo is created for a particular set of joined base relations, no matter how many different subsets of relations it can be created from. This saves memory and processing time compared to the old method of making a bunch of RelOptInfos and then removing the duplicates. Clean up the jointree iteration logic; not sure if it's better, but I sure find it more readable and plausible now, particularly for the case of 'bushy plans'.	2000-02-07 04:41:04 +00:00
Bruce Momjian	5c25d60244	Add: * Portions Copyright (c) 1996-2000, PostgreSQL, Inc to all files copyright Regents of Berkeley. Man, that's a lot of files.	2000-01-26 05:58:53 +00:00
Tom Lane	8449df8a67	First cut at unifying regular selectivity estimation with indexscan selectivity estimation wasn't right. This is better...	2000-01-23 02:07:00 +00:00
Tom Lane	71ed7eb494	Revise handling of index-type-specific indexscan cost estimation, per pghackers discussion of 5-Jan-2000. The amopselect and amopnpages estimators are gone, and in their place is a per-AM amcostestimate procedure (linked to from pg_am, not pg_amop).	2000-01-22 23:50:30 +00:00
Tom Lane	166b5c1def	Another round of planner/optimizer work. This is just restructuring and code cleanup; no major improvements yet. However, EXPLAIN does produce more intuitive outputs for nested loops with indexscans now...	2000-01-09 00:26:47 +00:00
Bruce Momjian	6f9ff92cc0	Tid access method feature from Hiroshi Inoue, Inoue@tpf.co.jp	1999-11-23 20:07:06 +00:00
Tom Lane	78114cd4d4	Further planner/optimizer cleanups. Move all set_tlist_references and fix_opids processing to a single recursive pass over the plan tree executed at the very tail end of planning, rather than haphazardly here and there at different places. Now that tlist Vars do not get modified until the very end, it's possible to get rid of the klugy var_equal and match_varid partial-matching routines, and just use plain equal() throughout the optimizer. This is a step towards allowing merge and hash joins to be done on expressions instead of only Vars ...	1999-08-22 20:15:04 +00:00
Tom Lane	e1fad50a5d	Revise generation of hashjoin paths: generate one path per hashjoinable clause, not one path for a randomly-chosen element of each set of clauses with the same join operator. That is, if you wrote SELECT ... WHERE t1.f1 = t2.f2 and t1.f3 = t2.f4, and both '=' ops were the same opcode (say, all four fields are int4), then the system would either consider hashing on f1=f2 or on f3=f4, but it would not consider both possibilities. Boo hiss. Also, revise estimation of hashjoin costs to include a penalty when the inner join var has a high disbursion --- ie, the most common value is pretty common. This tends to lead to badly skewed hash bucket occupancy and way more comparisons than you'd expect on average. I imagine that the cost calculation still needs tweaking, but at least it generates a more reasonable plan than before on George Young's example.	1999-08-06 04:00:17 +00:00
Bruce Momjian	a71802e12e	Final cleanup.	1999-07-16 05:00:38 +00:00
Bruce Momjian	2e6b1e63a3	Remove unused #includes in *.c files.	1999-07-15 22:40:16 +00:00
Bruce Momjian	e9c977da7d	Fix spelling of variable name.	1999-07-07 09:36:45 +00:00
Bruce Momjian	9f7ac20e57	Cleanup of min tuple size.	1999-07-07 09:27:28 +00:00
Bruce Momjian	1391098851	Fix misspelling.	1999-07-07 09:11:15 +00:00
Bruce Momjian	fcff1cdf4e	Another pgindent run. Sorry folks.	1999-05-25 22:43:53 +00:00
Bruce Momjian	07842084fe	pgindent run over code.	1999-05-25 16:15:34 +00:00
Tom Lane	605d84941d	Clean up cost_sort some more: most callers were double-counting the cost of reading the source data.	1999-05-01 19:47:42 +00:00
Tom Lane	7a7ba33536	Clean up some bogosities in path cost estimation, like sometimes estimating an index scan of a table to be cheaper than a sequential scan of the same tuples...	1999-04-30 04:01:44 +00:00
Tom Lane	e91f43a122	Fix potential overflow problems when relation size exceeds 2gig. Fix failure to reliably put the smaller relation on the inside of a hashjoin.	1999-04-05 02:07:07 +00:00
Bruce Momjian	ba2883b264	Remove duplicate geqo functions, and more optimizer cleanup	1999-02-15 03:22:37 +00:00
Bruce Momjian	6724a50787	Change my-function-name-- to my_function_name, and optimizer renames.	1999-02-13 23:22:53 +00:00
Bruce Momjian	ad4b27ac3f	Optimizer cleanup.	1999-02-12 17:25:05 +00:00
Bruce Momjian	c0d17c7aee	JoinPath -> NestPath for nested loop.	1999-02-12 06:43:53 +00:00
Bruce Momjian	9dbb0efb0b	Optmizer cleanup	1999-02-10 21:02:50 +00:00
Bruce Momjian	f859c81c18	Rename Path.keys to Path.pathkeys. Too many 'keys' used for other things.	1999-02-10 03:52:54 +00:00
Bruce Momjian	318e593f03	Rename Temp to Noname for noname tables.	1999-02-09 17:03:14 +00:00

76 Commits