There are two implementation techniques: the executor understands a new
JOIN_IN jointype, which emits at most one matching row per left-hand row,
or the result of the IN's sub-select can be fed through a DISTINCT filter
and then joined as an ordinary relation.
Along the way, some minor code cleanup in the optimizer; notably, break
out most of the jointree-rearrangement preprocessing in planner.c and
put it in a new file prep/prepjointree.c.
containing a volatile function), rather than only on 'Var = Var' clauses
as before. This makes it practical to do flatten_join_alias_vars at the
start of planning, which in turn eliminates a bunch of klugery inside the
planner to deal with alias vars. As a free side effect, we now detect
implied equality of non-Var expressions; for example in
SELECT ... WHERE a.x = b.y and b.y = 42
we will deduce a.x = 42 and use that as a restriction qual on a. Also,
we can remove the restriction introduced 12/5/02 to prevent pullup of
subqueries whose targetlists contain sublinks.
Still TODO: make statistical estimation routines in selfuncs.c and costsize.c
smarter about expressions that are more complex than plain Vars. The need
for this is considerably greater now that we have to be able to estimate
the suitability of merge and hash join techniques on such expressions.
costs for expression evaluation, not only per-tuple cost as before.
This extension is needed in order to deal realistically with hashed or
materialized sub-selects.
computation: reduce the bucket number mod nbatch. This changes the
association between original bucket numbers and batches, but that
doesn't matter. Minor other cleanups in hashjoin code to help
centralize decisions.
allocation in best_inner_indexscan(). While at it, simplify GEQO's
interface to the main planner --- make_join_rel() offers exactly the
API it really wants, whereas calling make_rels_by_clause_joins() and
make_rels_by_clauseless_joins() required jumping through hoops.
Rewrite gimme_tree for clarity (sometimes iteration is much better than
recursion), and approximately halve GEQO's runtime by recognizing that
tours of the forms (a,b,c,d,...) and (b,a,c,d,...) are equivalent
because of symmetry in make_join_rel().
a per-query memory context created by CreateExecutorState --- and destroyed
by FreeExecutorState. This provides a final solution to the longstanding
problem of memory leaked by various ExecEndNode calls.
in the planned representation of a subplan at all any more, only SubPlan.
This means subselect.c doesn't scribble on its input anymore, which seems
like a good thing; and there are no longer three different possible
interpretations of a SubLink. Simplify node naming and improve comments
in primnodes.h. No change to stored rules, though.
execution state trees, and ExecEvalExpr takes an expression state tree
not an expression plan tree. The plan tree is now read-only as far as
the executor is concerned. Next step is to begin actually exploiting
this property.
so that all executable expression nodes inherit from a common supertype
Expr. This is somewhat of an exercise in code purity rather than any
real functional advance, but getting rid of the extra Oper or Func node
formerly used in each operator or function call should provide at least
a little space and speed improvement.
initdb forced by changes in stored-rules representation.
instead of only one. This should speed up planning (only one hash path
to consider for a given pair of relations) as well as allow more effective
hashing, when there are multiple hashable joinclauses.
joinclauses is determined accurately for each join. Formerly, the code only
considered joinclauses that used all of the rels from the outer side of the
join; thus for example
FROM (a CROSS JOIN b) JOIN c ON (c.f1 = a.x AND c.f2 = b.y)
could not exploit a two-column index on c(f1,f2), since neither of the
qual clauses would be in the joininfo list it looked in. The new code does
this correctly, and also is able to eliminate redundant clauses, thus fixing
the problem noted 24-Oct-02 by Hans-Jürgen Schönig.
parameter to allow it to be forced off for comparison purposes.
Add ORDER BY clauses to a bunch of regression test queries that will
otherwise produce randomly-ordered output in the new regime.
node now does its own grouping of the input rows, and has no need for a
preceding GROUP node in the plan pipeline. This allows elimination of
the misnamed tuplePerGroup option for GROUP, and actually saves more code
in nodeGroup.c than it costs in nodeAgg.c, as well as being presumably
faster. Restructure the API of query_planner so that we do not commit to
using a sorted or unsorted plan in query_planner; instead grouping_planner
makes the decision. (Right now it isn't any smarter than query_planner
was, but that will change as soon as it has the option to select a hash-
based aggregation step.) Despite all the hackery, no initdb needed since
only in-memory node types changed.
Ray Ontko 28-June-02. Also, fix prefix_selectivity for NAME lefthand
variables (it was bogusly assuming binary compatibility), and adjust
make_greater_string() to not call pg_mbcliplen() with invalid multibyte
data (this last per bug report that I can't find at the moment, but it
was in July '02).
to be flexible about assignment casts without introducing ambiguity in
operator/function resolution. Introduce a well-defined promotion hierarchy
for numeric datatypes (int2->int4->int8->numeric->float4->float8).
Change make_const to initially label numeric literals as int4, int8, or
numeric (never float8 anymore).
Explicitly mark Func and RelabelType nodes to indicate whether they came
from a function call, explicit cast, or implicit cast; use this to do
reverse-listing more accurately and without so many heuristics.
Explicit casts to char, varchar, bit, varbit will truncate or pad without
raising an error (the pre-7.2 behavior), while assigning to a column without
any explicit cast will still raise an error for wrong-length data like 7.3.
This more nearly follows the SQL spec than 7.2 behavior (we should be
reporting a 'completion condition' in the explicit-cast cases, but we have
no mechanism for that, so just do silent truncation).
Fix some problems with enforcement of typmod for array elements;
it didn't work at all in 'UPDATE ... SET array[n] = foo', for example.
Provide a generalized array_length_coerce() function to replace the
specialized per-array-type functions that used to be needed (and were
missing for NUMERIC as well as all the datetime types).
Add missing conversions int8<->float4, text<->numeric, oid<->int8.
initdb forced.
> src/backend/optimizer/path/indxpath.c; see the "special indexable
> operators" stuff near the bottom of that file. (It's a bit of a crock
> that this code is hardwired there, and not somehow accessed through a
> system catalog, but it's what we've got at the moment.)
The attached patch re-enables a bytea right hand argument (as compared
to a text right hand argument), and enables index usage, for bytea LIKE
Joe Conway
Reused the Expr node to hold DISTINCT which strongly resembles
the existing OP info. Define DISTINCT_EXPR which strongly resembles
the existing OPER_EXPR opType, but with handling for NULLs required
by SQL99.
We have explicit support for single-element DISTINCT comparisons
all the way through to the executor. But, multi-element DISTINCTs
are handled by expanding into a comparison tree in gram.y as is done for
other row comparisons. Per discussions, it might be desirable to move
this into one or more purpose-built nodes to be handled in the backend.
Define the optional ROW keyword and token per SQL99.
This allows single-element row constructs, which were formerly disallowed
due to shift/reduce conflicts with parenthesized a_expr clauses.
Define the SQL99 TREAT() function. Currently, use as a synonym for CAST().
comments on one of the optimizer functions a lot more
clear, adds a summary of the recent KSQO discussion to the
comments in the code, adds regression tests for the bug with
sequence state Tom fixed recently and another reg. test, and
removes some PostQuel legacy stuff: ExecAppend -> ExecInsert,
ExecRetrieve -> ExecSelect, etc.
Error messages remain unchanged until a vote.
Neil Conway
comments on one of the optimizer functions a lot more
clear, adds a summary of the recent KSQO discussion to the
comments in the code, adds regression tests for the bug with
sequence state Tom fixed recently and another reg. test, and
removes some PostQuel legacy stuff: ExecAppend -> ExecInsert,
ExecRetrieve -> ExecSelect, etc. This was changed because the
elog() messages from this routine are user-visible, so we
should be using the SQL terms.
Neil Conway
yesterday's proposal to pghackers. Also remove unnecessary parameters
to heap_beginscan, heap_rescan. I modified pg_proc.h to reflect the
new numbers of parameters for the AM interface routines, but did not
force an initdb because nothing actually looks at those fields.
returns-set boolean field in Func and Oper nodes. This allows cleaner,
more reliable tests for expressions returning sets in the planner and
parser. For example, a WHERE clause returning a set is now detected
and complained of in the parser, not only at runtime.
some kibitzing from Tom Lane. Not everything works yet, and there's
no documentation or regression test, but let's commit this so Joe
doesn't need to cope with tracking changes in so many files ...
qualified operator names directly, for example CREATE OPERATOR myschema.+
( ... ). To qualify an operator name in an expression you need to write
OPERATOR(myschema.+) (thanks to Peter for suggesting an escape hatch).
I also took advantage of having to reformat pg_operator to fix something
that'd been bugging me for a while: mergejoinable operators should have
explicit links to the associated cross-data-type comparison operators,
rather than hardwiring an assumption that they are named < and >.
volatile), rather than the old cachable/noncachable distinction. This
allows indexscan optimizations in many places where we formerly didn't.
Also, add a pronamespace column to pg_proc (it doesn't do anything yet,
however).
now has an RTE of its own, and references to its outputs now are Vars
referencing the JOIN RTE, rather than CASE-expressions. This allows
reverse-listing in ruleutils.c to use the correct alias easily, rather
than painfully reverse-engineering the alias namespace as it used to do.
Also, nested FULL JOINs work correctly, because the result of the inner
joins are simple Vars that the planner can cope with. This fixes a bug
reported a couple times now, notably by Tatsuo on 18-Nov-01. The alias
Vars are expanded into COALESCE expressions where needed at the very end
of planning, rather than during parsing.
Also, beginnings of support for showing plan qualifier expressions in
EXPLAIN. There are probably still cases that need work.
initdb forced due to change of stored-rule representation.
o Change all current CVS messages of NOTICE to WARNING. We were going
to do this just before 7.3 beta but it has to be done now, as you will
see below.
o Change current INFO messages that should be controlled by
client_min_messages to NOTICE.
o Force remaining INFO messages, like from EXPLAIN, VACUUM VERBOSE, etc.
to always go to the client.
o Remove INFO from the client_min_messages options and add NOTICE.
Seems we do need three non-ERROR elog levels to handle the various
behaviors we need for these messages.
Regression passed.
now just below FATAL in server_min_messages. Added more text to
highlight ordering difference between it and client_min_messages.
---------------------------------------------------------------------------
REALLYFATAL => PANIC
STOP => PANIC
New INFO level the prints to client by default
New LOG level the prints to server log by default
Cause VACUUM information to print only to the client
NOTICE => INFO where purely information messages are sent
DEBUG => LOG for purely server status messages
DEBUG removed, kept as backward compatible
DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1 added
DebugLvl removed in favor of new DEBUG[1-5] symbols
New server_min_messages GUC parameter with values:
DEBUG[5-1], INFO, NOTICE, ERROR, LOG, FATAL, PANIC
New client_min_messages GUC parameter with values:
DEBUG[5-1], LOG, INFO, NOTICE, ERROR, FATAL, PANIC
Server startup now logged with LOG instead of DEBUG
Remove debug_level GUC parameter
elog() numbers now start at 10
Add test to print error message if older elog() values are passed to elog()
Bootstrap mode now has a -d that requires an argument, like postmaster
both input streams to the end. If one variable's range is much less
than the other, an indexscan-based merge can win by not scanning all
of the other table. Per example from Reinhard Max.
set-returning functions in its target list. This ensures that we
won't rewrite the query in a way that places set-returning functions
into quals (WHERE clauses). Cf. bug reports from Joe Conway.
clauses per path key. Indeed, we *must* do so or we will be unable to
form a valid plan for FULL JOIN with overlapping join conditions, eg
select * from a full join b on
a.v1 = b.v1 and a.v2 = b.v2 and a.v1 = b.v2.
mergeclauses in RIGHT/FULL join cases, just like the other routines have.
I'm not quite sure why I thought it didn't need one --- but Nick
Fankhauser's recent bug report proves that it does.
clause being added to a particular restriction-clause list is redundant
with those already in the list. This avoids useless work at runtime,
and (perhaps more importantly) keeps the selectivity estimation routines
from generating too-small estimates of numbers of output rows.
Also some minor improvements in OPTIMIZER_DEBUG displays.
pgsql-hackers. pg_opclass now has a row for each opclass supported by each
index AM, not a row for each opclass name. This allows pg_opclass to show
directly whether an AM supports an opclass, and furthermore makes it possible
to store additional information about an opclass that might be AM-dependent.
pg_opclass and pg_amop now store "lossy" and "haskeytype" information that we
previously expected the user to remember to provide in CREATE INDEX commands.
Lossiness is no longer an index-level property, but is associated with the
use of a particular operator in a particular index opclass.
Along the way, IndexSupportInitialize now uses the syscaches to retrieve
pg_amop and pg_amproc entries. I find this reduces backend launch time by
about ten percent, at the cost of a couple more special cases in catcache.c's
IndexScanOK.
Initial work by Oleg Bartunov and Teodor Sigaev, further hacking by Tom Lane.
initdb forced.
clauses are equal(), before trying to match them up using btree opclass
inference rules. This allows it to recognize many simple cases involving
non-btree operations, for example 'x IS NULL'. Clean up code a little.
has a DISTINCT ON clause, per bug report from Anthony Wood. While at it,
improve the DISTINCT-ON-clause recognizer routine to not be fooled by out-
of-order DISTINCT lists.
Note: I didn't force an initdb, figuring that one today was enough.
However, there is a new function in pg_proc.h, and pg_dump won't be
able to dump partial indexes until you add that function.
IS TRUE, etc, with some degree of verisimilitude. Split out
selectivity support functions from builtins.h into a new header
file selfuncs.h, so as to reduce the number of header files builtins.h
must depend on. Fix a few missing inclusions exposed thereby.
From Joe Conway, with some kibitzing from Tom Lane.
should be computed from total number of distinct values in whole
relation, not # distinct values we expect to have after restriction
clauses are applied.
WHERE (a = 1 or a = 2) and b = 42
and an index on (a,b), include the clause b = 42 in the indexquals
generated for each arm of the OR clause. Essentially this is an index-
driven conversion from CNF to DNF. Implementation is a bit klugy, but
better than not exploiting the extra quals at all ...
of costsize.c routines to pass Query root, so that costsize can figure
more things out by itself and not be so dependent on its callers to tell
it everything it needs to know. Use selectivity of hash or merge clause
to estimate number of tuples processed internally in these joins
(this is more useful than it would've been before, since eqjoinsel is
somewhat more accurate than before).
create_index_paths are not immediately discarded, but are available for
subsequent planner work. This allows avoiding redundant syscache lookups
in several places. Change interface to operator selectivity estimation
procedures to allow faster and more flexible estimation.
Initdb forced due to change of pg_proc entries for selectivity functions!
collected by ANALYZE. Also, add some modest amount of intelligence to
guesses that are used for varlena columns in the absence of any ANALYZE
statistics. The 'width' reported by EXPLAIN is finally something less
than totally bogus for varlena columns ... and, in consequence, hashjoin
estimating should be a little better ...
a separate statement (though it can still be invoked as part of VACUUM, too).
pg_statistic redesigned to be more flexible about what statistics are
stored. ANALYZE now collects a list of several of the most common values,
not just one, plus a histogram (not just the min and max values). Random
sampling is used to make the process reasonably fast even on very large
tables. The number of values and histogram bins collected is now
user-settable via an ALTER TABLE command.
There is more still to do; the new stats are not being used everywhere
they could be in the planner. But the remaining changes for this project
should be localized, and the behavior is already better than before.
A not-very-related change is that sorting now makes use of btree comparison
routines if it can find one, rather than invoking '<' twice.
join clauses. The mergejoin executor wants all the join clauses to appear
as merge quals, not as extra joinquals, for these kinds of joins. But the
planner would consider plans in which partially-sorted input paths were
used, leading to only some of the join clauses becoming merge quals.
This is fine for inner/left joins, not fine for right/full joins.
1. If there is exactly one pg_operator entry of the right name and oprkind,
oper() and related routines would return that entry whether its input type
had anything to do with the request or not. This is just premature
optimization: we shouldn't return the single candidate until after we verify
that it really is a valid candidate, ie, is at least coercion-compatible
with the given types.
2. oper() and related routines only promise a coercion-compatible result.
Unfortunately, there were quite a few callers that assumed the returned
operator is binary-compatible with the given datatype; they would proceed
to call it without making any datatype coercions. These callers include
sorting, grouping, aggregation, and VACUUM ANALYZE. In general I think
it is appropriate for these callers to require an exact or binary-compatible
match, so I've added a new routine compatible_oper() that only succeeds if
it can find an operator that doesn't require any run-time conversions.
Callers now call oper() or compatible_oper() depending on whether they are
prepared to deal with type conversion or not.
The upshot of these bugs is revealed by the following silliness in PL/Tcl's
selftest: it creates an operator @< on int4, and then tries to use it to
sort a char(N) column. The system would let it do that :-( (and evidently
has done so since 6.3 :-( :-(). The result in this case was just a silly
sort order, but the reverse combination would've provoked coredump from
trying to dereference integers. With this fix you get more reasonable
behavior:
pltcl_test=# select * from T_pkey1 order by key1, key2 using @<;
ERROR: Unable to identify an operator '@<' for types 'bpchar' and 'bpchar'
You will have to retype this query using an explicit cast
try to push restrictions on the view down into the view subquery,
so that they can become indexscan quals or what-have-you rather than
being applied at the top level of the subquery. 7.0 and before were
able to do this, though in a much klugier way, and I'd hate to have
anyone complaining that 7.1 is stupider than 7.0 ...
comparison does not consider paths different when they differ only in
uninteresting aspects of sort order. (We had a special case of this
consideration for indexscans already, but generalize it to apply to
ordered join paths too.) Be stricter about what is a canonical pathkey
to allow faster pathkey comparison. Cache canonical pathkeys and
dispersion stats for left and right sides of a RestrictInfo's clause,
to avoid repeated computation. Total speedup will depend on number of
tables in a query, but I see about 4x speedup of planning phase for
a sample seven-table query.
avoid repeated evaluations in cost_qual_eval(). This turns out to save
a useful fraction of planning time. No change to external representation
of RestrictInfo --- although that node type doesn't appear in stored
rules anyway.
re-adopt these settings at every postmaster or standalone-backend startup.
This should fix problems with indexes becoming corrupt due to failure to
provide consistent locale environment for postmaster at all times. Also,
refuse to start up a non-locale-enabled compilation in a database originally
initdb'd with a non-C locale. Suppress LIKE index optimization if locale
is not "C" or "POSIX" (are there any other locales where it's safe?).
Issue NOTICE during initdb if selected locale disables LIKE optimization.
maintained for each cache entry. A cache entry will not be freed until
the matching ReleaseSysCache call has been executed. This eliminates
worries about cache entries getting dropped while still in use. See
my posting to pg-hackers of even date for more info.
joins, and clean things up a good deal at the same time. Append plan node
no longer hacks on rangetable at runtime --- instead, all child tables are
given their own RT entries during planning. Concept of multiple target
tables pushed up into execMain, replacing bug-prone implementation within
nodeAppend. Planner now supports generating Append plans for inheritance
sets either at the top of the plan (the old way) or at the bottom. Expanding
at the bottom is appropriate for tables used as sources, since they may
appear inside an outer join; but we must still expand at the top when the
target of an UPDATE or DELETE is an inheritance set, because we actually need
a different targetlist and junkfilter for each target table in that case.
Fortunately a target table can't be inside an outer join... Bizarre mutual
recursion between union_planner and prepunion.c is gone --- in fact,
union_planner doesn't really have much to do with union queries anymore,
so I renamed it grouping_planner.
SQL92 semantics, including support for ALL option. All three can be used
in subqueries and views. DISTINCT and ORDER BY work now in views, too.
This rewrite fixes many problems with cross-datatype UNIONs and INSERT/SELECT
where the SELECT yields different datatypes than the INSERT needs. I did
that by making UNION subqueries and SELECT in INSERT be treated like
subselects-in-FROM, thereby allowing an extra level of targetlist where the
datatype conversions can be inserted safely.
INITDB NEEDED!
(Don't forget that an alias is required.) Views reimplemented as expanding
to subselect-in-FROM. Grouping, aggregates, DISTINCT in views actually
work now (he says optimistically). No UNION support in subselects/views
yet, but I have some ideas about that. Rule-related permissions checking
moved out of rewriter and into executor.
INITDB REQUIRED!
query representation. Note that GEQO_RELS setting is now interpreted
as the number of top-level items in the FROM list, not necessarily the
number of relations in the query. This seems appropriate since we are
only doing join-path searching over the top-level items.
for example, an SQL function can be used in a functional index. (I make
no promises about speed, but it'll work ;-).) Clean up and simplify
handling of functions returning sets.
right thing with variable-free clauses that contain noncachable functions,
such as 'WHERE random() < 0.5' --- these are evaluated once per
potential output tuple. Expressions that contain only Params are
now candidates to be indexscan quals --- for example, 'var = ($1 + 1)'
can now be indexed. Cope with RelabelType nodes atop potential indexscan
variables --- this oversight prevents 7.0.* from recognizing some
potentially indexscanable situations.
from Param nodes, per discussion a few days ago on pghackers. Add new
expression node type FieldSelect that implements the functionality where
it's actually needed. Clean up some other unused fields in Func nodes
as well.
NOTE: initdb forced due to change in stored expression trees for rules.
to use with a multiple-key index. Formerly we would only extract clauses
that had to do with the first key of the index, which was correct but
didn't exploit the index fully.
mergejoinable qual clauses, and add them to the query quals. For
example, WHERE a = b AND b = c will cause us to add AND a = c.
This is necessary to ensure that it's safe to use these variables
as interchangeable sort keys, which is something 7.0 knows how to do.
Should provide a useful improvement in planning ability, too.
them, but forgot to attach relevant restriction clauses, so that the
plan represented a scan over the whole table with restrictions applied
as qpquals not indexquals. Another day, another bug...
materialized tupleset is small enough) instead of a temporary relation.
This was something I was thinking of doing anyway for performance, and Jan
says he needs it for TOAST because he doesn't want to cope with toasting
noname relations. With this change, the 'noname table' support in heap.c
is dead code, and I have accordingly removed it. Also clean up 'noname'
plan handling in planner --- nonames are either sort or materialize plans,
and it seems less confusing to handle them separately under those names.
(ie, parameters instead of consts) will be treated as a range query.
We do not know the actual selectivities involved, but it seems like
a good idea to use a smaller estimate than we would use for two unrelated
inequalities.
That means you can now set your options in either or all of $PGDATA/configuration,
some postmaster option (--enable-fsync=off), or set a SET command. The list of
options is in backend/utils/misc/guc.c, documentation will be written post haste.
pg_options is gone, so is that pq_geqo config file. Also removed were backend -K,
-Q, and -T options (no longer applicable, although -d0 does the same as -Q).
Added to configure an --enable-syslog option.
changed all callers from TPRINTF to elog(DEBUG)
key call sites are changed, but most called functions are still oldstyle.
An exception is that the PL managers are updated (so, for example, NULL
handling now behaves as expected in plperl and plpgsql functions).
NOTE initdb is forced due to added column in pg_proc.
cases where joinclauses were present but some joins have to be made
by cartesian-product join anyway. An example is
SELECT * FROM a,b,c WHERE (a.f1 + b.f2 + c.f3) = 0;
Even though all the rels have joinclauses, we must join two of them
in cartesian style before we can use the join clause...
table for an average of NTUP_PER_BUCKET tuples/bucket, but cost_hashjoin
was assuming a target load of one tuple/bucket. This was causing a
noticeable underestimate of hashjoin costs.
(LIKE and regexp matches). These are not yet referenced in pg_operator,
so by default the system will continue to use eqsel/neqsel.
Also, tweak convert_to_scalar() logic so that common prefixes of strings
are stripped off, allowing better accuracy when all strings in a table
share a common prefix.
to next integer. Previously, if selectivity was small, we could compute
very tiny scan cost on the basis of estimating that only 0.001 tuple
would be fetched, which is silly. This naturally led to some rather
silly plans...
to avoid undue sensitivity to roundoff error, believe that a zero
or slightly negative range estimate should represent a small
positive selectivity, rather than falling back on a generic default
estimate.
use a default value that's fairly small. We were generating a result
of about 0.1, but I think 0.01 is probably better --- want to encourage
use of an indexscan in this situation.
costs using the inner path's parent->rows count as the number of tuples
processed per inner scan iteration. This is wrong when we are using an
inner indexscan with indexquals based on join clauses, because the rows
count in a Relation node reflects the selectivity of the restriction
clauses for that rel only. Upshot was that if join clause was very
selective, we'd drastically overestimate the true cost of the join.
Fix is to calculate correct output-rows estimate for an inner indexscan
when the IndexPath node is created and save it in the path node.
Change of path node doesn't require initdb, since path nodes don't
appear in saved rules.
running gcc and HP's cc with warnings cranked way up. Signed vs unsigned
comparisons, routines declared static and then defined not-static,
that kind of thing. Tedious, but perhaps useful...
accesses versus sequential accesses, a (very crude) estimate of the
effects of caching on random page accesses, and cost to evaluate WHERE-
clause expressions. Export critical parameters for this model as SET
variables. Also, create SET variables for the planner's enable flags
(enable_seqscan, enable_indexscan, etc) so that these can be controlled
more conveniently than via PGOPTIONS.
Planner now estimates both startup cost (cost before retrieving
first tuple) and total cost of each path, so it can optimize queries
with LIMIT on a reasonable basis by interpolating between these costs.
Same facility is a win for EXISTS(...) subqueries and some other cases.
Redesign pathkey representation to achieve a major speedup in planning
(I saw as much as 5X on a 10-way join); also minor changes in planner
to reduce memory consumption by recycling discarded Path nodes and
not constructing unnecessary lists.
Minor cleanups to display more-plausible costs in some cases in
EXPLAIN output.
Initdb forced by change in interface to index cost estimation
functions.
fields in JoinPaths --- turns out that we do need that after all :-(.
Also, rearrange planner so that only one RelOptInfo is created for a
particular set of joined base relations, no matter how many different
subsets of relations it can be created from. This saves memory and
processing time compared to the old method of making a bunch of RelOptInfos
and then removing the duplicates. Clean up the jointree iteration logic;
not sure if it's better, but I sure find it more readable and plausible
now, particularly for the case of 'bushy plans'.
nonoverlap_sets() and is_subset() to list.c, where they should have lived
to begin with, and rename to nonoverlap_setsi and is_subseti since they
only work on integer lists.
extracting from an AND subclause just those opclauses that are relevant
for a particular index. For example, we can now consider using an index
on x to process WHERE (x = 1 AND y = 2) OR (x = 2 AND y = 4) OR ...
(ie, WHERE x > lowbound AND x < highbound). It's not very bright yet
but it does something useful. Also, rename intltsel/intgtsel to
scalarltsel/scalargtsel to reflect usage better. Extend convert_to_scalar
to do something a little bit useful with string data types. Still need
to make it do something with date/time datatypes, but I'll wait for
Thomas's datetime unification dust to settle first. Eventually the
routine ought not have any type-specific knowledge at all; it ought to
be calling a type-dependent routine found via a pg_type column; but
that's a task for another day.
pghackers discussion of 5-Jan-2000. The amopselect and amopnpages
estimators are gone, and in their place is a per-AM amcostestimate
procedure (linked to from pg_am, not pg_amop).
Make all system indexes unique.
Make all cache loads use system indexes.
Rename *rel to *relid in inheritance tables.
Rename cache names to be clearer.
additional argument specifying the kind of lock to acquire/release (or
'NoLock' to do no lock processing). Ensure that all relations are locked
with some appropriate lock level before being examined --- this ensures
that relevant shared-inval messages have been processed and should prevent
problems caused by concurrent VACUUM. Fix several bugs having to do with
mismatched increment/decrement of relation ref count and mismatched
heap_open/close (which amounts to the same thing). A bogus ref count on
a relation doesn't matter much *unless* a SI Inval message happens to
arrive at the wrong time, which is probably why we got away with this
sloppiness for so long. Repair missing grab of AccessExclusiveLock in
DROP TABLE, ALTER/RENAME TABLE, etc, as noted by Hiroshi.
Recommend 'make clean all' after pulling this update; I modified the
Relation struct layout slightly.
Will post further discussion to pghackers list shortly.
conditions. There are some pretty bogus heuristics in prepqual.c that
try to decide whether to output CNF or DNF format; they need to be replaced,
likely. Right now the code is probably too willing to choose DNF form,
which might hurt performance in some cases that used to work OK.
But at least we have a foundation to build on.
was rejecting negative attnums as bogus, which of course they are not.
Add code to get_attdisbursion to produce a useful value for OID attribute,
since VACUUM does not store stats for system attributes.
Also, repair bug that's been in eqjoinsel for a long time: it was taking
the max of the two columns' disbursions, whereas it should use the min.
and fix_opids processing to a single recursive pass over the plan tree
executed at the very tail end of planning, rather than haphazardly here
and there at different places. Now that tlist Vars do not get modified
until the very end, it's possible to get rid of the klugy var_equal and
match_varid partial-matching routines, and just use plain equal()
throughout the optimizer. This is a step towards allowing merge and
hash joins to be done on expressions instead of only Vars ...
sort order down into planner, instead of handling it only at the very top
level of the planner. This fixes many things. An explicit sort is now
avoided if there is a cheaper alternative (typically an indexscan) not
only for ORDER BY, but also for the internal sort of GROUP BY. It works
even when there is no other reason (such as a WHERE condition) to consider
the indexscan. It works for indexes on functions. It works for indexes
on functions, backwards. It's just so cool...
CAUTION: I have changed the representation of SortClause nodes, therefore
THIS UPDATE BREAKS STORED RULES. You will need to initdb.
store all ordering information in pathkeys lists (which are now lists of
lists of PathKeyItem nodes, not just lists of lists of vars). This was
a big win --- the code is smaller and IMHO more understandable than it
was, even though it handles more cases. I believe the node changes will
not force an initdb for anyone; planner nodes don't show up in stored
rules.
commuted (ie, the index var appears on the right). These are now handled
the same way as merge and hash join quals that need to be commuted: the
actual reversing of the clause only happens if we actually choose the path
and generate a plan from it. Furthermore, the clause is only reversed in
the 'indexqual' field of the plan, not in the 'indxqualorig' field. This
allows the clause to still be recognized and removed from qpquals of upper
level join plans. Also, simplify and generalize match_clause_to_indexkey;
now it recognizes binary-compatible indexes for join as well as restriction
clauses.
hashjoinable clause, not one path for a randomly-chosen element of each
set of clauses with the same join operator. That is, if you wrote
SELECT ... WHERE t1.f1 = t2.f2 and t1.f3 = t2.f4,
and both '=' ops were the same opcode (say, all four fields are int4),
then the system would either consider hashing on f1=f2 or on f3=f4,
but it would *not* consider both possibilities. Boo hiss.
Also, revise estimation of hashjoin costs to include a penalty when the
inner join var has a high disbursion --- ie, the most common value is
pretty common. This tends to lead to badly skewed hash bucket occupancy
and way more comparisons than you'd expect on average.
I imagine that the cost calculation still needs tweaking, but at least
it generates a more reasonable plan than before on George Young's example.
rels that the inner path needs to join to, but it was only checking for
the first one. Failure could only have been observed with an OR-clause
that mentions 3 or more tables, and then only if the bogus path was
actually selected as cheapest ...
optimizer rather than parser. This has many advantages, such as not
getting fooled by chance uses of operator names ~ and ~~ (the operators
are identified by OID now), and not creating useless comparison operations
in contexts where the comparisons will not actually be used as indexquals.
The new code also recognizes exact-match LIKE and regex patterns, and
produces an = indexqual instead of >= and <=.
This change does NOT fix the problem with non-ASCII locales: the code
still doesn't know how to generate an upper bound indexqual for non-ASCII
collation order. But it's no worse than before, just the same deficiency
in a different place...
Also, dike out loc_restrictinfo fields in Plan nodes. These were doing
nothing useful in the absence of 'expensive functions' optimization,
and they took a considerable amount of processing to fill in.
The only place it was being used was as temporary storage in indxpath.c,
and the logic was wrong: the same restrictinfo node could get chosen to
carry the info for two different joins. Right fix is to return a second
list of unjoined-relids parallel to the list of clause groups.
identified by Hiroshi (incorrect cost attributed to OR clauses
after multiple passes through set_rest_selec()). I think the code
was trying to allow selectivities of OR subclauses to be passed in
from outside, but noplace was actually passing any useful data, and
set_rest_selec() was passing wrong data.
Restructure representation of "indexqual" in IndexPath nodes so that
it is the same as for indxqual in completed IndexScan nodes: namely,
a toplevel list with an entry for each pass of the index scan, having
sublists that are implicitly-ANDed index qual conditions for that pass.
You don't want to know what the old representation was :-(
Improve documentation of OR-clause indexscan functions.
Remove useless 'notclause' field from RestrictInfo nodes. (This might
force an initdb for anyone who has stored rules containing RestrictInfos,
but I do not think that RestrictInfo ever appears in completed plans.)
remove optimizer's arbitrary limit on how large a join it will use hashing
for. (The limit was too large to prevent the problems we'd been seeing,
anyway...)
so remove them from MergeJoin node. Hack together a partial
solution for commuted mergejoin operators --- yesterday
a mergejoin int4 = int8 would crash if the planner decided to
commute it, today it works. The planner's representation of
mergejoins really needs a rewrite though.
Also, further testing of mergejoin ops in opr_sanity regress test.
Ok. I made patches replacing all of "#if FALSE" or "#if 0" to "#ifdef
NOT_USED" for current. I have tested these patches in that the
postgres binaries are identical.
patch is applied:
Rewrite rules on relation level work fine now.
Event qualifications on insert/update/delete rules work
fine now.
I added the new keyword OLD to reference the CURRENT
tuple. CURRENT will be removed in 6.5.
Update rules can reference NEW and OLD in the rule
qualification and the actions.
Insert/update/delete rules on views can be established to
let them behave like real tables.
For insert/update/delete rules multiple actions are
supported now. The actions can also be surrounded by
parantheses to make psql happy. Multiple actions are
required if update to a view requires updates to multiple
tables.
Regular users are permitted to create/drop rules on
tables they have RULE permissions for
(DefineQueryRewrite() is now able to get around the
access restrictions on pg_rewrite). This enables view
creation for regular users too. This required an extra
boolean parameter to pg_parse_and_plan() that tells to
set skipAcl on all rangetable entries of the resulting
queries. There is a new function
pg_exec_query_acl_override() that could be used by
backend utilities to use this facility.
All rule actions (not only views) inherit the permissions
of the event relations owner. Sample: User A creates
tables T1 and T2, creates rules that log
INSERT/UPDATE/DELETE on T1 in T2 (like in the regression
tests for rules I created) and grants ALL but RULE on T1
to user B. User B can now fully access T1 and the
logging happens in T2. But user B cannot access T2 at
all, only the rule actions can. And due to missing RULE
permissions on T1, user B cannot disable logging.
Rules on the attribute level are disabled (they don't
work properly and since regular users are now permitted
to create rules I decided to disable them).
Rules on select must have exactly one action that is a
select (so select rules must be a view definition).
UPDATE NEW/OLD rules are disabled (still broken, but
triggers can do it).
There are two new system views (pg_rule and pg_view) that
show the definition of the rules or views so the db admin
can see what the users do. They use two new functions
pg_get_ruledef() and pg_get_viewdef() that are builtins.
The functions pg_get_ruledef() and pg_get_viewdef() could
be used to implement rule and view support in pg_dump.
PostgreSQL is now the only database system I know, that
has rewrite rules on the query level. All others (where I
found a rule statement at all) use stored database
procedures or the like (triggers as we call them) for
active rules (as some call them).
Future of the rule system:
The now disabled parts of the rule system (attribute
level, multiple actions on select and update new stuff)
require a complete new rewrite handler from scratch. The
old one is too badly wired up.
After 6.4 I'll start to work on a new rewrite handler,
that fully supports the attribute level rules, multiple
actions on select and update new. This will be available
for 6.5 so we get full rewrite rule capabilities.
Jan
no longer returns buffer pointer, can be gotten from scan;
descriptor; bootstrap can create multi-key indexes;
pg_procname index now is multi-key index; oidint2, oidint4, oidname
are gone (must be removed from regression tests); use System Cache
rather than sequential scan in many places; heap_modifytuple no
longer takes buffer parameter; remove unused buffer parameter in
a few other functions; oid8 is not index-able; remove some use of
single-character variable names; cleanup Buffer variables usage
and scan descriptor looping; cleaned up allocation and freeing of
tuples; 18k lines of diff;
indices for restriction clauses containing a constant.
Note that if an index does not match directly (usually because the types
on both side of the clause don't match), and if a binary-compatible index
is identified, then the operator function will be replaced by a new
one. Should not be a problem, but be sure that if types are listed as
being binary compatible (in parse_coerce.h) then the comparison functions
are also binary-compatible, giving equivalent results.
1. Removes the unnecessary "#define AbcRegProcedure 123"'s from
pg_proc.h.
2. Changes those #defines to use the names already defined in
fmgr.h.
3. Forces the make of fmgr.h in backend/Makefile instead of having
it
made as a dependency in access/common/Makefile *hack*hack*hack*
4. Rearranged the #includes to a less helter-skelter arrangement,
also
changing <file.h> to "file.h" to signify a non-system header.
5. Removed "pg_proc.h" from files where its only purpose was for
the
#defines removed in item #1.
6. Added "fmgr.h" to each file changed for completeness sake.
Turns out that #6 was not necessary for some files because fmgr.h
was being included in a roundabout way SIX levels deep by the first
include.
"access/genam.h"
->"access/relscan.h"
->"utils/rel.h"
->"access/strat.h"
->"access/skey.h"
->"fmgr.h"
So adding fmgr.h really didn't add anything to the compile, hopefully
just made it clearer to the programmer.
S Darren.
Attached you'll find a (big) patch that fixes make dep and make
depend in all Makefiles where I found it to be appropriate.
It also removes the dependency in Makefile.global for NAMEDATALEN
and OIDNAMELEN by making backend/catalog/genbki.sh and bin/initdb/initdb.sh
a little smarter.
This no longer requires initdb.sh that is turned into initdb with
a sed script when installing Postgres, hence initdb.sh should be
renamed to initdb (after the patch has been applied :-) )
This patch is against the 6.3 sources, as it took a while to
complete.
Please review and apply,
Cheers,
Jeroen van Vianen
yyerror ones from bison. It also includes a few 'enhancements' to
the C programming style (which are, of course, personal).
The other patch removes the compilation of backend/lib/qsort.c, as
qsort() is a standard function in stdlib.h and can be used any
where else (and it is). It was only used in
backend/optimizer/geqo/geqo_pool.c, backend/optimizer/path/predmig.c,
and backend/storage/page/bufpage.c
> > Some or all of these changes might not be appropriate for v6.3,
since we > > are in beta testing and since they do not affect the
current functionality. > > For those cases, how about submitting
patches based on the final v6.3 > > release?
There's more to come. Please review these patches. I ran the
regression tests and they only failed where this was expected
(random, geo, etc).
Cheers,
Jeroen
==========================================
What follows is a set of diffs that cleans up the usage of BLCKSZ.
As a side effect, the person compiling the code can change the
value of BLCKSZ _at_their_own_risk_. By that, I mean that I've
tried it here at 4096 and 16384 with no ill-effects. A value
of 4096 _shouldn't_ affect much as far as the kernel/file system
goes, but making it bigger than 8192 can have severe consequences
if you don't know what you're doing. 16394 worked for me, _BUT_
when I went to 32768 and did an initdb, the SCSI driver broke and
the partition that I was running under went to hell in a hand
basket. Had to reboot and do a good bit of fsck'ing to fix things up.
The patch can be safely applied though. Just leave BLCKSZ = 8192
and everything is as before. It basically only cleans up all of the
references to BLCKSZ in the code.
If this patch is applied, a comment in the config.h file though above
the BLCKSZ define with warning about monkeying around with it would
be a good idea.
Darren darrenk@insightdist.com
(Also cleans up some of the #includes in files referencing BLCKSZ.)
==========================================
Makefile.global.
End result, if all goes well, should allow for much easier porting, since
there will no longer be a concept of a "port". Most, if not everything,
*should* be determined by configure, or by the compiler itself. Still
work to be done though :)
for join-relations. Sizes already computed by
prune_rel_paths():compute_joinrel_size().
joinrels.c:
< if ( _use_right_sided_plans_ )
---
> if ( _use_right_sided_plans_ &&
> length (outer_rel->relids) > 1 )
- r_plans are useful when outer_rel is join-relation... It
decreases the size of search space...
2. PageWeights are variables now.
3. Fixed using ceil((double)selec*indextuples) as estimation
for expected heap pages: ceil((double)selec*relpages) now.
Subject: [HACKERS] linux/alpha patches
These patches lay the groundwork for a Linux/Alpha port. The port doesn't
actually work unless you tweak the linker to put all the pointers in the
first 32 bits of the address space, but it's at least a start. It
implements the test-and-set instruction in Alpha assembly, and also fixes
a lot of pointer-to-integer conversions, which is probably good anyway.
2. IndexScanableOperand now uses match_indexkey_operand
instead of equal_indexkey_var (if we have some index on attribute X
then we shouldn't use it for 'where some_func(X) OP CONST').
/usr/include/limits.h (which quiets the costsize.c warnings)...under
FreeBSD, /usr/include/limits.h *includes* machine/limits.h, while under
Solaris, there is no such things as /usr/include/machine...
Problem with Solaris pointed out by Mark Wahl
The problem is that the function arguments are not considered as possible key
candidates for index scan and so only a sequential scan is possible inside
the body of a function. I have therefore made some patches to the optimizer
so that indices are now used also by functions. I have also moved the plan
debug message from pg_eval to pg_plan so that it is printed also for plans
genereated for function execution. I had also to add an index rescan to the
executor because it ignored the parameters set in the execution state, they
were flagged as runtime variables in ExecInitIndexScan but then never used
by the executor so that the scan were always done with any key=1. Very odd.
This means that an index rescan is now done twice for each function execution
which uses an index, the first time when the index scan is initialized and
the second when the actual function arguments are finally available for the
execution. I don't know what is the cost of an double index scan but I
suppose it is anyway less than the cost of a full sequential scan, at leat
for large tables. This is my patch, you must also add -DINDEXSCAN_PATCH in
Makefile.global to enable the changes.
Submitted by: Massimo Dal Zotto <dz@cs.unitn.it>
Select queries with an isnull or notnull clause, like "select * where
somefield isnull", crash the backend if the table has at least one index.
If the indices are deleted the queries work again. Also the explain
command fail in the same way.
The is caused by a bug in subroutine of the optimizer which doesn't check
null values in the clauses.
Submitted by: Massimo Dal Zotto <dz@cs.unitn.it>