Commit Graph

393 Commits

Author SHA1 Message Date
Alvaro Herrera b93f5a5673 Move Trigger and TriggerDesc structs out of rel.h into a new reltrigger.h
This lets us stop including rel.h into execnodes.h, which is a widely
used header.
2011-07-04 14:35:58 -04:00
Bruce Momjian 6560407c7d Pgindent run before 9.1 beta2. 2011-06-09 14:32:50 -04:00
Tom Lane 92647fc4b9 Avoid possible divide-by-zero in gincostestimate.
Per report from Jeff Janes.
2011-04-21 19:28:36 -04:00
Tom Lane d64713df7e Pass collations to functions in FunctionCallInfoData, not FmgrInfo.
Since collation is effectively an argument, not a property of the function,
FmgrInfo is really the wrong place for it; and this becomes critical in
cases where a cached FmgrInfo is used for varying purposes that might need
different collation settings.  Fix by passing it in FunctionCallInfoData
instead.  In particular this allows a clean fix for bug #5970 (record_cmp
not working).  This requires touching a bit more code than the original
method, but nobody ever thought that collations would not be an invasive
patch...
2011-04-12 19:19:24 -04:00
Tom Lane 3f5d2fe302 Be more wary of missing statistics in eqjoinsel_semi().
In particular, if we don't have real ndistinct estimates for both sides,
fall back to assuming that half of the left-hand rows have join partners.
This is what was done in 8.2 and 8.3 (cf nulltestsel() in those versions).
It's pretty stupid but it won't lead us to think that an antijoin produces
no rows out, as seen in recent example from Uwe Schroeder.
2011-04-12 01:59:34 -04:00
Peter Eisentraut 5caa3479c2 Clean up most -Wunused-but-set-variable warnings from gcc 4.6
This warning is new in gcc 4.6 and part of -Wall.  This patch cleans
up most of the noise, but there are some still warnings that are
trickier to remove.
2011-04-11 22:28:45 +03:00
Tom Lane 3c381a55b0 Teach pattern_fixed_prefix() about collations.
This is necessary, not optional, now that ILIKE and regexes are collation
aware --- else we might derive a wrong comparison constant for index
optimized pattern matches.
2011-04-11 12:28:28 -04:00
Bruce Momjian bf50caf105 pgindent run before PG 9.1 beta 1. 2011-04-10 11:42:00 -04:00
Tom Lane 466dac8656 Fix make_greater_string to not have an undocumented collation assumption.
The previous coding worked only if ltproc->fn_collation was always either
DEFAULT_COLLATION_OID or a C-compatible locale.  While that's true at the
moment, it wasn't documented (and in fact wasn't true when this code was
committed...).  But it only takes a couple more lines to make its internal
caching behavior locale-aware, so let's do that.
2011-04-08 17:40:20 -04:00
Tom Lane 7208fae18f Clean up cruft around collation initialization for tupdescs and scankeys.
I found actual bugs in GiST and plpgsql; the rest of this is cosmetic
but meant to decrease the odds of future bugs of omission.
2011-03-26 18:28:40 -04:00
Tom Lane bfa4440ca5 Pass collation to makeConst() instead of looking it up internally.
In nearly all cases, the caller already knows the correct collation, and
in a number of places, the value the caller has handy is more correct than
the default for the type would be.  (In particular, this patch makes it
significantly less likely that eval_const_expressions will result in
changing the exposed collation of an expression.)  So an internal lookup
is both expensive and wrong.
2011-03-25 20:10:42 -04:00
Tom Lane b310b6e31c Revise collation derivation method and expression-tree representation.
All expression nodes now have an explicit output-collation field, unless
they are known to only return a noncollatable data type (such as boolean
or record).  Also, nodes that can invoke collation-aware functions store
a separate field that is the collation value to pass to the function.
This avoids confusion that arises when a function has collatable inputs
and noncollatable output type, or vice versa.

Also, replace the parser's on-the-fly collation assignment method with
a post-pass over the completed expression tree.  This allows us to use
a more complex (and hopefully more nearly spec-compliant) assignment
rule without paying for it in extra storage in every expression node.

Fix assorted bugs in the planner's handling of collations by making
collation one of the defining properties of an EquivalenceClass and
by converting CollateExprs into discardable RelabelType nodes during
expression preprocessing.
2011-03-19 20:30:08 -04:00
Tom Lane 696d1f7f06 Make all comparisons done for/with statistics use the default collation.
While this will give wrong answers when estimating selectivity for a
comparison operator that's using a non-default collation, the estimation
error probably won't be large; and anyway the former approach created
estimation errors of its own by trying to use a histogram that might have
been computed with some other collation.  So we'll adopt this simplified
approach for now and perhaps improve it sometime in the future.

This patch incorporates changes from Andres Freund to make sure that
selfuncs.c passes a valid collation OID to any datatype-specific function
it calls, in case that function wants collation information.  Said OID will
now always be DEFAULT_COLLATION_OID, but at least we won't get errors.
2011-03-12 16:30:36 -05:00
Tom Lane a2095f7fb5 Fix bogus test for hypothetical indexes in get_actual_variable_range().
That function was supposing that indexoid == 0 for a hypothetical index,
but that is not likely to be true in any non-toy implementation of an index
adviser, since assigning a fake OID is the only way to know at EXPLAIN time
which hypothetical index got selected.  Fix by adding a flag to
IndexOptInfo to mark hypothetical indexes.  Back-patch to 9.0 where
get_actual_variable_range() was added.

Gurjeet Singh
2011-02-16 19:24:45 -05:00
Peter Eisentraut 414c5a2ea6 Per-column collation support
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.

Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
2011-02-08 23:04:18 +02:00
Tom Lane 4d1b76e49e Fix up gincostestimate for new extractQuery API.
The only reason this wasn't crashing while testing the core anyarray
operators was that it was disabled for those cases because of passing the
wrong type information to get_opfamily_proc :-(.  So fix that too, and make
it insist on finding the support proc --- in hindsight, silently doing
nothing is not as sane a coping mechanism as all that.
2011-01-08 20:26:13 -05:00
Bruce Momjian 5d950e3b0c Stamp copyrights for year 2011. 2011-01-01 13:18:15 -05:00
Tom Lane f2ba1e994c Avoid unexpected conversion overflow in planner for distant date values.
The "date" type supports a wider range of dates than int64 timestamps do.
However, there is pre-int64-timestamp code in the planner that assumes that
all date values can be converted to timestamp with impunity.  Fortunately,
what we really need out of the conversion is always a double (float8)
value; so even when the date is out of timestamp's range it's possible to
produce a sane answer.  All we need is a code path that doesn't try to
force the result into int64.  Per trouble report from David Rericha.

Back-patch to all supported versions.  Although this is surely a corner
case, there's not much point in advertising a date range wider than
timestamp's if we will choke on such values in unexpected places.
2010-12-28 22:49:57 -05:00
Tom Lane d583f10b7e Create core infrastructure for KNNGIST.
This is a heavily revised version of builtin_knngist_core-0.9.  The
ordering operators are no longer mixed in with actual quals, which would
have confused not only humans but significant parts of the planner.
Instead, ordering operators are carried separately throughout planning and
execution.

Since the API for ambeginscan and amrescan functions had to be changed
anyway, this commit takes the opportunity to rationalize that a bit.
RelationGetIndexScan no longer forces a premature index_rescan call;
instead, callers of index_beginscan must call index_rescan too.  Aside from
making the AM-side initialization logic a bit less peculiar, this has the
advantage that we do not make a useless extra am_rescan call when there are
runtime key values.  AMs formerly could not assume that the key values
passed to amrescan were actually valid; now they can.

Teodor Sigaev and Tom Lane
2010-12-02 20:51:37 -05:00
Tom Lane c0b5fac701 Simplify and speed up mapping of index opfamilies to pathkeys.
Formerly we looked up the operators associated with each index (caching
them in relcache) and then the planner looked up the btree opfamily
containing such operators in order to build the btree-centric pathkey
representation that describes the index's sort order.  This is quite
pointless for btree indexes: we might as well just use the index's opfamily
information directly.  That saves syscache lookup cycles during planning,
and furthermore allows us to eliminate the relcache's caching of operators
altogether, which may help in reducing backend startup time.

I added code to plancat.c to perform the same type of double lookup
on-the-fly if it's ever faced with a non-btree amcanorder index AM.
If such a thing actually becomes interesting for production, we should
replace that logic with some more-direct method for identifying the
corresponding btree opfamily; but it's not worth spending effort on now.

There is considerably more to do pursuant to my recent proposal to get rid
of sort-operator-based representations of sort orderings, but this patch
grabs some of the low-hanging fruit.  I'll look at the remainder of that
work after the current commitfest.
2010-11-29 12:30:43 -05:00
Tom Lane 529cb267a6 Improve handling of domains over arrays.
This patch eliminates various bizarre behaviors caused by sloppy thinking
about the difference between a domain type and its underlying array type.
In particular, the operation of updating one element of such an array
has to be considered as yielding a value of the underlying array type,
*not* a value of the domain, because there's no assurance that the
domain's CHECK constraints are still satisfied.  If we're intending to
store the result back into a domain column, we have to re-cast to the
domain type so that constraints are re-checked.

For similar reasons, such a domain can't be blindly matched to an ANYARRAY
polymorphic parameter, because the polymorphic function is likely to apply
array-ish operations that could invalidate the domain constraints.  For the
moment, we just forbid such matching.  We might later wish to insert an
automatic downcast to the underlying array type, but such a change should
also change matching of domains to ANYELEMENT for consistency.

To ensure that all such logic is rechecked, this patch removes the original
hack of setting a domain's pg_type.typelem field to match its base type;
the typelem will always be zero instead.  In those places where it's really
okay to look through the domain type with no other logic changes, use the
newly added get_base_element_type function in place of get_element_type.
catversion bumped due to change in pg_type contents.

Per bug #5717 from Richard Huxton and subsequent discussion.
2010-10-21 16:07:17 -04:00
Tom Lane 48c7d9f6ff Improve GIN indexscan cost estimation.
The better estimate requires more statistics than we previously stored:
in particular, counts of "entry" versus "data" pages within the index,
as well as knowledge of the number of distinct key values.  We collect
this information during initial index build and update it during VACUUM,
storing the info in new fields on the index metapage.  No initdb is
required because these fields will read as zeroes in a pre-existing
index, and the new gincostestimate code is coded to behave (reasonably)
sanely if they are zeroes.

Teodor Sigaev, reviewed by Jan Urbanski, Tom Lane, and Itagaki Takahiro.
2010-10-17 20:52:32 -04:00
Magnus Hagander 9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Bruce Momjian 65e806cba1 pgindent run for 9.0 2010-02-26 02:01:40 +00:00
Robert Haas e26c539e9f Wrap calls to SearchSysCache and related functions using macros.
The purpose of this change is to eliminate the need for every caller
of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists,
GetSysCacheOid, and SearchSysCacheList to know the maximum number
of allowable keys for a syscache entry (currently 4).  This will
make it far easier to increase the maximum number of keys in a
future release should we choose to do so, and it makes the code
shorter, too.

Design and review by Tom Lane.
2010-02-14 18:42:19 +00:00
Robert Haas d86d51a958 Support ALTER TABLESPACE name SET/RESET ( tablespace_options ).
This patch only supports seq_page_cost and random_page_cost as parameters,
but it provides the infrastructure to scalably support many more.
In particular, we may want to add support for effective_io_concurrency,
but I'm leaving that as future work for now.

Thanks to Tom Lane for design help and Alvaro Herrera for the review.
2010-01-05 21:54:00 +00:00
Tom Lane 40608e7f94 When estimating the selectivity of an inequality "column > constant" or
"column < constant", and the comparison value is in the first or last
histogram bin or outside the histogram entirely, try to fetch the actual
column min or max value using an index scan (if there is an index on the
column).  If successful, replace the lower or upper histogram bound with
that value before carrying on with the estimate.  This limits the
estimation error caused by moving min/max values when the comparison
value is close to the min or max.  Per a complaint from Josh Berkus.

It is tempting to consider using this mechanism for mergejoinscansel as well,
but that would inject index fetches into main-line join estimation not just
endpoint cases.  I'm refraining from that until we can get a better handle
on the costs of doing this type of lookup.
2010-01-04 02:44:40 +00:00
Bruce Momjian 0239800893 Update copyright for the year 2010. 2010-01-02 16:58:17 +00:00
Tom Lane 29c4ad9829 Support "x IS NOT NULL" clauses as indexscan conditions. This turns out
to be just a minor extension of the previous patch that made "x IS NULL"
indexable, because we can treat the IS NOT NULL condition as if it were
"x < NULL" or "x > NULL" (depending on the index's NULLS FIRST/LAST option),
just like IS NULL is treated like "x = NULL".  Aside from any possible
usefulness in its own right, this is an important improvement for
index-optimized MAX/MIN aggregates: it is now reliably possible to get
a column's min or max value cheaply, even when there are a lot of nulls
cluttering the interesting end of the index.
2010-01-01 21:53:49 +00:00
Tom Lane 649b5ec7c8 Add the ability to store inheritance-tree statistics in pg_statistic,
and teach ANALYZE to compute such stats for tables that have subclasses.
Per my proposal of yesterday.

autovacuum still needs to be taught about running ANALYZE on parent tables
when their subclasses change, but the feature is useful even without that.
2009-12-29 20:11:45 +00:00
Tom Lane ab61df9e52 Remove regex_flavor GUC, so that regular expressions are always "advanced"
style by default.  Per discussion, there seems to be hardly anything that
really relies on being able to change the regex flavor, so the ability to
select it via embedded options ought to be enough for any stragglers.
Also, if we didn't remove the GUC, we'd really be morally obligated to
mark the regex functions non-immutable, which'd possibly create performance
issues.
2009-10-21 20:38:58 +00:00
Tom Lane a2a8c7a662 Support hex-string input and output for type BYTEA.
Both hex format and the traditional "escape" format are automatically
handled on input.  The output format is selected by the new GUC variable
bytea_output.

As committed, bytea_output defaults to HEX, which is an *incompatible
change*.  We will keep it this way for awhile for testing purposes, but
should consider whether to switch to the more backwards-compatible
default of ESCAPE before 8.5 is released.

Peter Eisentraut
2009-08-04 16:08:37 +00:00
Bruce Momjian d747140279 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list
provided by Andrew.
2009-06-11 14:49:15 +00:00
Tom Lane 1d97c19a0f Fix estimate_num_groups() to not fail on PlaceHolderVars, per report from
Stefan Kaltenbrunner.  The most reasonable behavior (at least for the near
term) seems to be to ignore the PlaceHolderVar and examine its argument
instead.  In support of this, change the API of pull_var_clause() to allow
callers to request recursion into PlaceHolderVars.  Currently
estimate_num_groups() is the only customer for that behavior, but where
there's one there may be others.
2009-04-19 19:46:33 +00:00
Tom Lane ce6e31de9c Teach the planner to treat a partial unique index as proving a variable is
unique for a particular query, if the index predicate is satisfied.  This
requires a bit of reordering of operations so that we check the predicates
before doing any selectivity estimates, but shouldn't really cause any
noticeable slowdown.  Per a comment from Michal Politowski.
2009-02-15 20:16:21 +00:00
Bruce Momjian 511db38ace Update copyright for 2009. 2009-01-01 17:24:05 +00:00
Tom Lane 7f3eba30c9 When estimating without benefit of MCV lists (suggesting that one or both
inputs is unique or nearly so), make eqjoinsel() clamp the ndistinct estimates
to be not more than the estimated number of rows coming from the input
relations.  This allows the estimate to change in response to the selectivity
of restriction conditions on the inputs.

This is a pretty narrow patch and maybe we should be more aggressive about
similarly clamping ndistinct in other cases; but I'm worried about
double-counting the effects of the restriction conditions.  However, it seems
to help for the case exhibited by Grzegorz Jaskiewicz (antijoin against a
small subset of a relation), so let's try this for awhile.
2008-10-23 00:24:50 +00:00
Tom Lane e6ae3b5dbf Add a concept of "placeholder" variables to the planner. These are variables
that represent some expression that we desire to compute below the top level
of the plan, and then let that value "bubble up" as though it were a plain
Var (ie, a column value).

The immediate application is to allow sub-selects to be flattened even when
they are below an outer join and have non-nullable output expressions.
Formerly we couldn't flatten because such an expression wouldn't properly
go to NULL when evaluated above the outer join.  Now, we wrap it in a
PlaceHolderVar and arrange for the actual evaluation to occur below the outer
join.  When the resulting Var bubbles up through the join, it will be set to
NULL if necessary, yielding the correct results.  This fixes a planner
limitation that's existed since 7.1.

In future we might want to use this mechanism to re-introduce some form of
Hellerstein's "expensive functions" optimization, ie place the evaluation of
an expensive function at the most suitable point in the plan tree.
2008-10-21 20:42:53 +00:00
Tom Lane 2dbc0ca937 Dept of second thoughts: let's make sure that get_index_stats_hook is only
applied to expression indexes, not to plain relations.  The original coding
in btcostestimate conflated the two cases, but it's not hard to use
get_relation_stats_hook instead when we're looking to the underlying relation.
2008-09-28 20:42:12 +00:00
Tom Lane 7b7df9f0b1 Add hooks to let plugins override the planner's lookups in pg_statistic.
Simon Riggs, with some editorialization by me.
2008-09-28 19:51:40 +00:00
Tom Lane e5536e77a5 Move exprType(), exprTypmod(), expression_tree_walker(), and related routines
into nodes/nodeFuncs, so as to reduce wanton cross-subsystem #includes inside
the backend.  There's probably more that should be done along this line,
but this is a start anyway.
2008-08-25 22:42:34 +00:00
Tom Lane d4af2a6481 Clean up the loose ends in selectivity estimation left by my patch for semi
and anti joins.  To do this, pass the SpecialJoinInfo struct for the current
join as an additional optional argument to operator join selectivity
estimation functions.  This allows the estimator to tell not only what kind
of join is being formed, but which variable is on which side of the join;
a requirement long recognized but not dealt with till now.  This also leaves
the door open for future improvements in the estimators, such as accounting
for the null-insertion effects of lower outer joins.  I didn't do anything
about that in the current patch but the information is in principle deducible
from what's passed.

The patch also clarifies the definition of join selectivity for semi/anti
joins: it's the fraction of the left input that has (at least one) match
in the right input.  This allows getting rid of some very fuzzy thinking
that I had committed in the original 7.4-era IN-optimization patch.
There's probably room to estimate this better than the present patch does,
but at least we know what to estimate.

Since I had to touch CREATE OPERATOR anyway to allow a variant signature
for join estimator functions, I took the opportunity to add a couple of
additional checks that were missing, per my recent message to -hackers:
* Check that estimator functions return float8;
* Require execute permission at the time of CREATE OPERATOR on the
operator's function as well as the estimator functions;
* Require ownership of any pre-existing operator that's modified by
the command.
I also moved the lookup of the functions out of OperatorCreate() and
into operatorcmds.c, since that seemed more consistent with most of
the other catalog object creation processes, eg CREATE TYPE.
2008-08-16 00:01:38 +00:00
Tom Lane e006a24ad1 Implement SEMI and ANTI joins in the planner and executor. (Semijoins replace
the old JOIN_IN code, but antijoins are new functionality.)  Teach the planner
to convert appropriate EXISTS and NOT EXISTS subqueries into semi and anti
joins respectively.  Also, LEFT JOINs with suitable upper-level IS NULL
filters are recognized as being anti joins.  Unify the InClauseInfo and
OuterJoinInfo infrastructure into "SpecialJoinInfo".  With that change,
it becomes possible to associate a SpecialJoinInfo with every join attempt,
which permits some cleanup of join selectivity estimation.  That needs to be
taken much further than this patch does, but the next step is to change the
API for oprjoin selectivity functions, which seems like material for a
separate patch.  So for the moment the output size estimates for semi and
especially anti joins are quite bogus.
2008-08-14 18:48:00 +00:00
Tom Lane 170063cd1e Fix estimate_num_groups() to assume that GROUP BY expressions yielding boolean
results always contribute two groups, regardless of the expression contents.
This is very substantially more accurate than the regular heuristic for
certain boolean tests like "col IS NULL".  Per gripe from Sam Mason.

Back-patch to all supported releases, since the behavior of
estimate_num_groups() hasn't changed all that much since 7.4.
2008-07-07 20:24:55 +00:00
Alvaro Herrera f8c4d7db60 Restructure some header files a bit, in particular heapam.h, by removing some
unnecessary #include lines in it.  Also, move some tuple routine prototypes and
macros to htup.h, which allows removal of heapam.h inclusion from some .c
files.

For this to work, a new header file access/sysattr.h needed to be created,
initially containing attribute numbers of system columns, for pg_dump usage.

While at it, make contrib ltree, intarray and hstore header files more
consistent with our header style.
2008-05-12 00:00:54 +00:00
Tom Lane 226837e57e Since createplan.c no longer cares whether index operators are lossy, it has
no particular need to do get_op_opfamily_properties() while building an
indexscan plan.  Postpone that lookup until executor start.  This simplifies
createplan.c a lot more than it complicates nodeIndexscan.c, and makes things
more uniform since we already had to do it that way for RowCompare
expressions.  Should be a bit faster too, at least for plans that aren't
re-used many times, since we avoid palloc'ing and perhaps copying the
intermediate list data structure.
2008-04-13 20:51:21 +00:00
Tom Lane 220db7ccd8 Simplify and standardize conversions between TEXT datums and ordinary C
strings.  This patch introduces four support functions cstring_to_text,
cstring_to_text_with_len, text_to_cstring, and text_to_cstring_buffer, and
two macros CStringGetTextDatum and TextDatumGetCString.  A number of
existing macros that provided variants on these themes were removed.

Most of the places that need to make such conversions now require just one
function or macro call, in place of the multiple notational layers that used
to be needed.  There are no longer any direct calls of textout or textin,
and we got most of the places that were using handmade conversions via
memcpy (there may be a few still lurking, though).

This commit doesn't make any serious effort to eliminate transient memory
leaks caused by detoasting toasted text objects before they reach
text_to_cstring.  We changed PG_GETARG_TEXT_P to PG_GETARG_TEXT_PP in a few
places where it was easy, but much more could be done.

Brendan Jurd and Tom Lane
2008-03-25 22:42:46 +00:00
Tom Lane 164899db1c Revert thinko introduced into prefix_selectivity() by my recent patch:
make_greater_string needs the < procedure not the >= one.  Spotted by
Peter.
2008-03-17 17:13:54 +00:00
Tom Lane f4230d2937 Change patternsel() so that instead of switching from a pure
pattern-examination heuristic method to purely histogram-driven selectivity at
histogram size 100, we compute both estimates and use a weighted average.
The weight put on the heuristic estimate decreases linearly with histogram
size, dropping to zero for 100 or more histogram entries.
Likewise in ltreeparentsel().  After a patch by Greg Stark, though I
reorganized the logic a bit to give the caller of histogram_selectivity()
more control.
2008-03-09 00:32:09 +00:00
Tom Lane 422495d0da Modify prefix_selectivity() so that it will never estimate the selectivity
of the generated range condition var >= 'foo' AND var < 'fop' as being less
than what eqsel() would estimate for var = 'foo'.  This is intuitively
reasonable and it gets rid of the need for some entirely ad-hoc coding we
formerly used to reject bogus estimates.  The basic problem here is that
if the prefix is more than a few characters long, the two boundary values
are too close together to be distinguishable by comparison to the column
histogram, resulting in a selectivity estimate of zero, which is often
not very sane.  Change motivated by an example from Peter Eisentraut.

Arguably this is a bug fix, but I'll refrain from back-patching it
for the moment.
2008-03-08 22:41:38 +00:00
Bruce Momjian 9098ab9e32 Update copyrights in source tree to 2008. 2008-01-01 19:46:01 +00:00
Tom Lane 9fd8843647 Fix mergejoin cost estimation so that we consider the statistical ranges of
the two join variables at both ends: not only trailing rows that need not be
scanned because there cannot be a match on the other side, but initial rows
that will be scanned without possibly having a match.  This allows a more
realistic estimate of startup cost to be made, per recent pgsql-performance
discussion.  In passing, fix a couple of bugs that had crept into
mergejoinscansel: it was not quite up to speed for the task of estimating
descending-order scans, which is a new requirement in 8.3.
2007-12-08 21:05:11 +00:00
Bruce Momjian f6e8730d11 Re-run pgindent with updated list of typedefs. (Updated README should
avoid this problem in the future.)
2007-11-15 22:25:18 +00:00
Bruce Momjian fdf5a5efb7 pgindent run for 8.3. 2007-11-15 21:14:46 +00:00
Tom Lane a96fa85025 Second pass at improving LIKE/regex estimation in non-C locales. It turns
out that it's actually quite likely that a string that is an extension of
the given prefix will sort as larger than the "greater" string our previous
code created.  To provide some defense against that, do the comparisons
against a modified string instead of just the bare prefix.  We tack on
"Z", "z", "y", or "9", whichever is seen as largest in the current locale.
Testing suggests that this is sufficient at least for cases involving
ASCII data.
2007-11-09 20:10:02 +00:00
Tom Lane 2de946be6a Improve the performance of LIKE/regex estimation in non-C locales, by making
make_greater_string() try harder to generate a string that's actually greater
than its input string.  Before we just assumed that making a string that was
memcmp-greater was enough, but it is easy to generate examples where this is
not so when the locale is not C.  Instead, loop until the relevant comparison
function agrees that the generated string is greater than the input.

Unfortunately this is probably not enough to guarantee that the generated
string is greater than all extensions of the input, so we cannot relax the
restriction to C locale for the LIKE/regex index optimization.  But it should
at least improve the odds of getting a useful selectivity estimate in
prefix_selectivity().  Per example from Guillaume Smet.

Backpatch to 8.1, mainly because that's what the complainant is using...
2007-11-07 22:37:24 +00:00
Tom Lane 9542287123 Fix patternsel() and callers to do the right thing for NOT LIKE and the other
negated-match operators.  patternsel had been using the supplied operator as
though it were a positive-match operator, and thus obtaining a wrong result,
which was even more wrong after the caller subtracted it from 1.  Seems
cleanest to give patternsel an explicit "negate" argument so that it knows
what's going on.  Also install the same factorization scheme for pattern
join selectivity estimators; even though they are just stubs at the
moment, this may keep someone from making the same type of mistake when
they get filled out.  Per report from Greg Mullane.

Backpatch to 8.2 --- previous releases do not show the problem because
patternsel() doesn't actually use the operator directly.
2007-11-07 21:00:37 +00:00
Tom Lane 0ee5a39862 Apply a band-aid fix for the problem that 8.2 and up completely misestimate
the number of rows likely to be produced by a query such as
	SELECT * FROM t1 LEFT JOIN t2 USING (key) WHERE t2.key IS NULL;
What this is doing is selecting for t1 rows with no match in t2, and thus
it may produce a significant number of rows even if the t2.key table column
contains no nulls at all.  8.2 thinks the table column's null fraction is
relevant and thus may estimate no rows out, which results in terrible plans
if there are more joins above this one.  A proper fix for this will involve
passing much more information about the context of a clause to the selectivity
estimator functions than we ever have.  There's no time left to write such a
patch for 8.3, and it wouldn't be back-patchable into 8.2 anyway.  Instead,
put in an ad-hoc test to defeat the normal table-stats-based estimation when
an IS NULL test is evaluated at an outer join, and just use a constant
estimate instead --- I went with 0.5 for lack of a better idea.  This won't
catch every case but it will catch the typical ways of writing such queries,
and it seems unlikely to make things worse for other queries.
2007-08-31 23:35:22 +00:00
Tom Lane 140d4ebcb4 Tsearch2 functionality migrates to core. The bulk of this work is by
Oleg Bartunov and Teodor Sigaev, but I did a lot of editorializing,
so anything that's broken is probably my fault.

Documentation is nonexistent as yet, but let's land the patch so we can
get some portability testing done.
2007-08-21 01:11:32 +00:00
Magnus Hagander 343a9a27a9 Check return code from strxfrm on Windows since it has a
non-standard way of indicating errors, so we don't try to
allocate INT_MAX bytes to store a result in.
2007-05-05 17:05:48 +00:00
Tom Lane afcf09dd90 Some further performance tweaks for planning large inheritance trees that
are mostly excluded by constraints: do the CE test a bit earlier to save
some adjust_appendrel_attrs() work on excluded children, and arrange to
use array indexing rather than rt_fetch() to fetch RTEs in the main body
of the planner.  The latter is something I'd wanted to do for awhile anyway,
but seeing list_nth_cell() as 35% of the runtime gets one's attention.
2007-04-21 21:01:45 +00:00
Tom Lane f02a82b6ad Make 'col IS NULL' clauses be indexable conditions.
Teodor Sigaev, with some kibitzing from Tom Lane.
2007-04-06 22:33:43 +00:00
Tom Lane bf94076348 Fix array coercion expressions to ensure that the correct volatility is
seen by code inspecting the expression.  The best way to do this seems
to be to drop the original representation as a function invocation, and
instead make a special expression node type that represents applying
the element-type coercion function to each array element.  In this way
the element function is exposed and will be checked for volatility.
Per report from Guillaume Smet.
2007-03-27 23:21:12 +00:00
Tom Lane 54d20024c1 Fix some problems with selectivity estimation for partial indexes.
First, genericcostestimate() was being way too liberal about including
partial-index conditions in its selectivity estimate, resulting in
substantial underestimates for situations such as an indexqual "x = 42"
used with an index on x "WHERE x >= 40 AND x < 50".  While the code is
intentionally set up to favor selecting partial indexes when available,
this was too much...

Second, choose_bitmap_and() was likewise easily fooled by cases of this
type, since it would similarly think that the partial index had selectivity
independent of the indexqual.

Fixed by using predicate_implied_by() rather than simple equality checks
to determine redundancy.  This is a good deal more expensive but I don't
see much alternative.  At least the extra cost is only paid when there's
actually a partial index under consideration.

Per report from Jeff Davis.  I'm not going to risk back-patching this,
though.
2007-03-21 22:18:12 +00:00
Tom Lane 0f4ff460c4 Fix up the remaining places where the expression node structure would lose
available information about the typmod of an expression; namely, Const,
ArrayRef, ArrayExpr, and EXPR and ARRAY SubLinks.  In the ArrayExpr and
SubLink cases it wasn't really the data structure's fault, but exprTypmod()
being lazy.  This seems like a good idea in view of the expected increase in
typmod usage from Teodor's work to allow user-defined types to have typmods.
In particular this responds to the concerns we had about eliminating the
special-purpose hack that exprTypmod() used to have for BPCHAR Consts.
We can now tell whether or not such a Const has been cast to a specific
length, and report or display properly if so.

initdb forced due to changes in stored rules.
2007-03-17 00:11:05 +00:00
Tom Lane 234a02b2a8 Replace direct assignments to VARATT_SIZEP(x) with SET_VARSIZE(x, len).
Get rid of VARATT_SIZE and VARATT_DATA, which were simply redundant with
VARSIZE and VARDATA, and as a consequence almost no code was using the
longer names.  Rename the length fields of struct varlena and various
derived structures to catch anyplace that was accessing them directly;
and clean up various places so caught.  In itself this patch doesn't
change any behavior at all, but it is necessary infrastructure if we hope
to play any games with the representation of varlena headers.
Greg Stark and Tom Lane
2007-02-27 23:48:10 +00:00
Tom Lane eab6b8b27e Turn the rangetable used by the executor into a flat list, and avoid storing
useless substructure for its RangeTblEntry nodes.  (I chose to keep using the
same struct node type and just zero out the link fields for unneeded info,
rather than making a separate ExecRangeTblEntry type --- it seemed too
fragile to have two different rangetable representations.)

Along the way, put subplans into a list in the toplevel PlannedStmt node,
and have SubPlan nodes refer to them by list index instead of direct pointers.
Vadim wanted to do that years ago, but I never understood what he was on about
until now.  It makes things a *whole* lot more robust, because we can stop
worrying about duplicate processing of subplans during expression tree
traversals.  That's been a constant source of bugs, and it's finally gone.

There are some consequent simplifications yet to be made, like not using
a separate EState for subplans in the executor, but I'll tackle that later.
2007-02-22 22:00:26 +00:00
Tom Lane 7c5e5439d2 Get rid of some old and crufty global variables in the planner. When
this code was last gone over, there wasn't really any alternative to
globals because we didn't have the PlannerInfo struct being passed all
through the planner code.  Now that we do, we can restructure things
to avoid non-reentrancy.  I'm fooling with this because otherwise I'd
have had to add another global variable for the planned compact
range table list.
2007-02-19 07:03:34 +00:00
Teodor Sigaev 61f621b506 Revert gincostestimate changes. 2007-01-31 16:54:51 +00:00
Teodor Sigaev d4c6da1527 Allow GIN's extractQuery method to signal that nothing can satisfy the query.
In this case extractQuery should returns -1 as nentries. This changes
prototype of extractQuery method to use int32* instead of uint32* for
nentries argument.
Based on that gincostestimate may see two corner cases: nothing will be found
or seqscan should be used.

Per proposal at http://archives.postgresql.org/pgsql-hackers/2007-01/msg01581.php

PS tsearch_core patch should be sightly modified to support changes, but I'm
waiting a verdict about reviewing of tsearch_core patch.
2007-01-31 15:09:45 +00:00
Tom Lane a053437d9e Dept of second thoughts: the IQ of estimate_array_length() needs to be
kept on par with that of scalararraysel(), else estimates that should
track might not.  Hence teach it about binary-compatible cases, too.
2007-01-28 02:53:34 +00:00
Tom Lane af18f6ad85 Fix scalararraysel() to cope with binary-compatible cases, such as text[]
versus varchar[].  This oversight probably explains Ryan Holmes' recent
complaint --- he was getting a generic selectivity estimate instead of
anything intelligent.
2007-01-28 01:37:38 +00:00
Tom Lane 4f06c688c7 Put back planner's ability to cache the results of mergejoinscansel(),
which I had removed in the first cut of the EquivalenceClass rewrite to
simplify that patch a little.  But it's still important --- in a four-way
join problem mergejoinscansel() was eating about 40% of the planning time
according to gprof.  Also, improve the EquivalenceClass code to re-use
join RestrictInfos rather than generating fresh ones for each join
considered.  This saves some memory space but more importantly improves
the effectiveness of caching planning info in RestrictInfos.
2007-01-22 20:00:40 +00:00
Tom Lane f41803bb39 Refactor planner's pathkeys data structure to create a separate, explicit
representation of equivalence classes of variables.  This is an extensive
rewrite, but it brings a number of benefits:
* planner no longer fails in the presence of "incomplete" operator families
that don't offer operators for every possible combination of datatypes.
* avoid generating and then discarding redundant equality clauses.
* remove bogus assumption that derived equalities always use operators
named "=".
* mergejoins can work with a variety of sort orders (e.g., descending) now,
instead of tying each mergejoinable operator to exactly one sort order.
* better recognition of redundant sort columns.
* can make use of equalities appearing underneath an outer join.
2007-01-20 20:45:41 +00:00
Tom Lane 4431758229 Support ORDER BY ... NULLS FIRST/LAST, and add ASC/DESC/NULLS FIRST/NULLS LAST
per-column options for btree indexes.  The planner's support for this is still
pretty rudimentary; it does not yet know how to plan mergejoins with
nondefault ordering options.  The documentation is pretty rudimentary, too.
I'll work on improving that stuff later.

Note incompatible change from prior behavior: ORDER BY ... USING will now be
rejected if the operator is not a less-than or greater-than member of some
btree opclass.  This prevents less-than-sane behavior if an operator that
doesn't actually define a proper sort ordering is selected.
2007-01-09 02:14:16 +00:00
Bruce Momjian 29dccf5fe0 Update CVS HEAD for 2007 copyright. Back branches are typically not
back-stamped for this.
2007-01-05 22:20:05 +00:00
Tom Lane d6061d2f31 Fix regex_fixed_prefix() to cope reasonably well with regex patterns of the
form '^(foo)$'.  Before, these could never be optimized into indexscans.
The recent changes to make psql and pg_dump generate such patterns (for \d
commands and -t and related switches, respectively) therefore represented
a big performance hit for people with large pg_class catalogs, as seen in
recent gripe from Erik Jones.  While at it, be more paranoid about
case-sensitivity checking in multibyte encodings, and fix some other
corner cases in which a regex might be interpreted too liberally.
2007-01-03 22:39:26 +00:00
Tom Lane a78fcfb512 Restructure operator classes to allow improved handling of cross-data-type
cases.  Operator classes now exist within "operator families".  While most
families are equivalent to a single class, related classes can be grouped
into one family to represent the fact that they are semantically compatible.
Cross-type operators are now naturally adjunct parts of a family, without
having to wedge them into a particular opclass as we had done originally.

This commit restructures the catalogs and cleans up enough of the fallout so
that everything still works at least as well as before, but most of the work
needed to actually improve the planner's behavior will come later.  Also,
there are not yet CREATE/DROP/ALTER OPERATOR FAMILY commands; the only way
to create a new family right now is to allow CREATE OPERATOR CLASS to make
one by default.  I owe some more documentation work, too.  But that can all
be done in smaller pieces once this infrastructure is in place.
2006-12-23 00:43:13 +00:00
Tom Lane 281f40187f Fix some planner bugs exposed by reports from Arjen van der Meijden. These
are all in new-in-8.2 logic associated with indexability of ScalarArrayOpExpr
(IN-clauses) or amortization of indexscan costs across repeated indexscans
on the inside of a nestloop.  In particular:

Fix some logic errors in the estimation for multiple scans induced by a
ScalarArrayOpExpr indexqual.

Include a small cost component in bitmap index scans to reflect the costs of
manipulating the bitmap itself; this is mainly to prevent a bitmap scan from
appearing to have the same cost as a plain indexscan for fetching a single
tuple.

Also add a per-index-scan-startup CPU cost component; while prior releases
were clearly too pessimistic about the cost of repeated indexscans, the
original 8.2 coding allowed the cost of an indexscan to effectively go to zero
if repeated often enough, which is overly optimistic.

Pay some attention to index correlation when estimating costs for a nestloop
inner indexscan: this is significant when the plan fetches multiple heap
tuples per iteration, since high correlation means those tuples are probably
on the same or adjacent heap pages.
2006-12-15 18:42:26 +00:00
Bruce Momjian f99a569a2e pgindent run for 8.2. 2006-10-04 00:30:14 +00:00
Tom Lane bfd1ffa948 Change patternsel (LIKE/regex selectivity estimation) so that if there
is a large enough histogram, it will use the number of matches in the
histogram to derive a selectivity estimate, rather than the admittedly
pretty bogus heuristics involving examining the pattern contents.  I set
'large enough' at 100, but perhaps we should change that later.  Also
apply the same technique in contrib/ltree's <@ and @> estimator.  Per
discussion with Stefan Kaltenbrunner and Matteo Beccati.
2006-09-20 19:50:21 +00:00
Tom Lane b74c543685 Improve usage of effective_cache_size parameter by assuming that all the
tables in the query compete for cache space, not just the one we are
currently costing an indexscan for.  This seems more realistic, and it
definitely will help in examples recently exhibited by Stefan
Kaltenbrunner.  To get the total size of all the tables involved, we must
tweak the handling of 'append relations' a bit --- formerly we looked up
information about the child tables on-the-fly during set_append_rel_pathlist,
but it needs to be done before we start doing any cost estimation, so
push it into the add_base_rels_to_query scan.
2006-09-19 22:49:53 +00:00
Bruce Momjian 9a7483714f Work around bug in strxfmt() but in MS VS2005.
William ZHANG
2006-07-26 17:17:28 +00:00
Tom Lane 8dcaea7be0 Add a fudge factor to genericcostestimate() to prevent the planner from
thinking that indexes of different sizes are equally attractive.  Per
gripe from Jim Nasby.  (I remain unconvinced that there's such a problem
in existing releases, but CVS HEAD definitely has got a problem because
of its new count-only-leaf-pages approach to indexscan costing.)
2006-07-24 01:19:48 +00:00
Bruce Momjian e0522505bd Remove 576 references of include files that were not needed. 2006-07-14 14:52:27 +00:00
Tom Lane 08ccdf020e Fix oversight in planning for multiple indexscans driven by
ScalarArrayOpExpr index quals: we were estimating the right total
number of rows returned, but treating the index-access part of the
cost as if a single scan were fetching that many consecutive index
tuples.  Actually we should treat it as a multiple indexscan, and
if there are enough of 'em the Mackert-Lohman discount should kick in.
2006-07-01 22:07:23 +00:00
Tom Lane 8a30cc2127 Make the planner estimate costs for nestloop inner indexscans on the basis
that the Mackert-Lohmann formula applies across all the repetitions of the
nestloop, not just each scan independently.  We use the M-L formula to
estimate the number of pages fetched from the index as well as from the table;
that isn't what it was designed for, but it seems reasonably applicable
anyway.  This makes large numbers of repetitions look much cheaper than
before, which accords with many reports we've received of overestimation
of the cost of a nestloop.  Also, change the index access cost model to
charge random_page_cost per index leaf page touched, while explicitly
not counting anything for access to metapage or upper tree pages.  This
may all need tweaking after we get some field experience, but in simple
tests it seems to be giving saner results than before.  The main thing
is to get the infrastructure in place to let cost_index() and amcostestimate
functions take repeated scans into account at all.  Per my recent proposal.

Note: this patch changes pg_proc.h, but I did not force initdb because
the changes are basically cosmetic --- the system does not look into
pg_proc to decide how to call an index amcostestimate function, and
there's no way to call such a function from SQL at all.
2006-06-06 17:59:58 +00:00
Tom Lane eed6c9ed7e Add a GUC parameter seq_page_cost, and use that everywhere we formerly
assumed that a sequential page fetch has cost 1.0.  This patch doesn't
in itself change the system's behavior at all, but it opens the door to
people adopting other units of measurement for EXPLAIN costs.  Also, if
we ever decide it's worth inventing per-tablespace access cost settings,
this change provides a workable intellectual framework for that.
2006-06-05 02:49:58 +00:00
Teodor Sigaev 8a3631f8d8 GIN: Generalized Inverted iNdex.
text[], int4[], Tsearch2 support for GIN.
2006-05-02 11:28:56 +00:00
Tom Lane 427c6b5b98 Avoid assuming that statistics for a parent relation reflect the properties of
the union of its child relations as well.  This might have been a good idea
when it was originally coded, but it's a fatally bad idea when inheritance is
being used for partitioning.  It's better to have no stats at all than
completely misleading stats.  Per report from Mark Liberman.

The bug arguably exists all the way back, but I've only patched HEAD and 8.1
because we weren't particularly trying to support partitioning before 8.1.

Eventually we ought to look at deriving union statistics instead of just
punting, but for now the drop kick looks good.
2006-05-02 04:34:18 +00:00
Tom Lane 0f0a33099c Generalize mcv_selectivity() to support both VAR OP CONST and CONST OP VAR
cases.  This was not needed in the existing uses within selfuncs.c, but if
we're gonna export it for general use, the extra generality seems helpful.
Motivated by looking at ltree example.
2006-04-27 17:52:40 +00:00
Tom Lane a3c1a11fc1 If we're going to expose VariableStatData for contrib modules to use,
then we should export a reasonable set of the supporting routines too.
2006-04-27 00:46:59 +00:00
Bruce Momjian 59d61409cd Move ltree parentsel() selectivity function into /contrib/ltree. 2006-04-26 22:33:36 +00:00
Bruce Momjian b3e4aefcfb Enhanced containment selectivity function for /contrib/ltree
Matteo Beccati
2006-04-26 18:28:34 +00:00
Tom Lane efe222268f Eliminate some no-longer-needed workarounds for palloc's old behavior
of rejecting palloc(0).  Also, tweak like_selectivity() to avoid assuming
the presented pattern is nonempty; although that assumption is valid,
it doesn't really help much, and the new coding is more correct anyway
since it properly handles redundant wildcards.  In combination these
changes should eliminate a Coverity warning noted by Martijn.
2006-04-20 17:50:18 +00:00
Bruce Momjian f2f5b05655 Update copyright for 2006. Update scripts. 2006-03-05 15:59:11 +00:00
Tom Lane 3a0a16cb7e Allow row comparisons to be used as indexscan qualifications.
This completes the project to upgrade our handling of row comparisons.
2006-01-25 20:29:24 +00:00
Tom Lane 34f8ee9737 Add selectivity-calculation code for RowCompareExpr nodes. Simplistic,
but a lot better than nothing at all ...
2006-01-14 00:14:12 +00:00
Tom Lane ce8fd39e15 Improve patternsel() by applying the operator itself to each value
listed in the column's most-common-values statistics entry.  This gives
us an exact selectivity result for the portion of the column population
represented by the MCV list, which can be a big leg up in accuracy if
that's a large fraction of the population.  The heuristics involving
pattern contents and prefix are applied only to the part of the population
not included in the MCV list.
2006-01-10 17:35:52 +00:00
Tom Lane 290166f934 Teach planner and executor to handle ScalarArrayOpExpr as an indexable
qualification when the underlying operator is indexable and useOr is true.
That is, indexkey op ANY (ARRAY[...]) is effectively translated into an
OR combination of one indexscan for each array element.  This only works
for bitmap index scans, of course, since regular indexscans no longer
support OR'ing of scans.  There are still some loose ends to clean up
before changing 'x IN (list)' to translate as a ScalarArrayOpExpr;
for instance predtest.c ought to be taught about it.  But this gets the
basic functionality in place.
2005-11-25 19:47:50 +00:00
Bruce Momjian 436a2956d8 Re-run pgindent, fixing a problem where comment lines after a blank
comment line where output as too long, and update typedefs for /lib
directory.  Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).

Backpatch to 8.1.X.
2005-11-22 18:17:34 +00:00
Tom Lane 2a8d3d83ef R-tree is dead ... long live GiST. 2005-11-07 17:36:47 +00:00
Bruce Momjian 1dc3498251 Standard pgindent run for 8.1. 2005-10-15 02:49:52 +00:00
Tom Lane 0cc0d0822d Document that get_attstatsslot/free_attstatsslot only need to be passed
valid type information if they are asked to fetch the values part of a
pg_statistic slot; these arguments are unneeded if fetching only the
numbers part.  Use this to save a catcache lookup in btcostestimate,
which is looking like a bit of a hotspot in recent profiling.  Not a
big savings, but since it's essentially free, might as well do it.
2005-10-11 17:27:14 +00:00
Tom Lane 303e089df5 Clean up possibly-uninitialized-variable warnings reported by gcc 4.x. 2005-09-24 22:54:44 +00:00
Tom Lane 8889685555 Suppress signed-vs-unsigned-char warnings. 2005-09-24 17:53:28 +00:00
Bruce Momjian 9dbd00b0e2 Remove unnecessary parentheses in assignments.
Add spaces where needed.
Reference time interval variables as tinterval.
2005-07-21 04:41:43 +00:00
Bruce Momjian a536b2dd80 Add time/date macros for code clarity:
#define DAYS_PER_YEAR   365.25
	#define MONTHS_PER_YEAR 12
	#define DAYS_PER_MONTH  30
	#define HOURS_PER_DAY   24
2005-07-21 03:56:25 +00:00
Bruce Momjian db05f4a7eb Add 'day' field to INTERVAL so 1 day interval can be distinguished from
24 hours. This is very helpful for daylight savings time:

	select '2005-05-03 00:00:00 EST'::timestamp with time zone + '24 hours';
	      ?column?
	----------------------
	2005-05-04 01:00:00-04

	select '2005-05-03 00:00:00 EST'::timestamp with time zone + '1 day';
	      ?column?
	----------------------
	2005-05-04 01:00:00-04

Michael Glaesemann
2005-07-20 16:42:32 +00:00
Bruce Momjian 7f0b690334 Improve comments for AdjustIntervalForTypmod.
Blank line adjustments.
2005-07-12 16:05:12 +00:00
Tom Lane b5f7cff84f Clean up the rather historically encumbered interface to now() and
current time: provide a GetCurrentTimestamp() function that returns
current time in the form of a TimestampTz, instead of separate time_t
and microseconds fields.  This is what all the callers really want
anyway, and it eliminates low-level dependencies on AbsoluteTime,
which is a deprecated datatype that will have to disappear eventually.
2005-06-29 22:51:57 +00:00
Tom Lane c186c93148 Change the planner to allow indexscan qualification clauses to use
nonconsecutive columns of a multicolumn index, as per discussion around
mid-May (pghackers thread "Best way to scan on-disk bitmaps").  This
turns out to require only minimal changes in btree, and so far as I can
see none at all in GiST.  btcostestimate did need some work, but its
original assumption that index selectivity == heap selectivity was
quite bogus even before this.
2005-06-13 23:14:49 +00:00
Tom Lane 2f1210629c Separate predicate-testing code out of indxpath.c, making it a module
in its own right.  As proposed by Simon Riggs, but with some editorializing
of my own.
2005-06-10 22:25:37 +00:00
Tom Lane 9ab4d98168 Remove planner's private fields from Query struct, and put them into
a new PlannerInfo struct, which is passed around instead of the bare
Query in all the planning code.  This commit is essentially just a
code-beautification exercise, but it does open the door to making
larger changes to the planner data structures without having to muck
with the widely-known Query struct.
2005-06-05 22:32:58 +00:00
Tom Lane 1e85fa2008 patternsel() was improperly stripping RelabelType from the derived
expressions it constructed, causing scalarineqsel to become confused
if the underlying variable was of a domain type.  Per report from
Kevin Grittner.
2005-06-01 17:05:11 +00:00
Tom Lane 5b05185262 Remove support for OR'd indexscans internal to a single IndexScan plan
node, as this behavior is now better done as a bitmap OR indexscan.
This allows considerable simplification in nodeIndexscan.c itself as
well as several planner modules concerned with indexscan plan generation.
Also we can improve the sharing of code between regular and bitmap
indexscans, since they are now working with nigh-identical Plan nodes.
2005-04-25 01:30:14 +00:00
Tom Lane 162bd08b3f Completion of project to use fixed OIDs for all system catalogs and
indexes.  Replace all heap_openr and index_openr calls by heap_open
and index_open.  Remove runtime lookups of catalog OID numbers in
various places.  Remove relcache's support for looking up system
catalogs by name.  Bulky but mostly very boring patch ...
2005-04-14 20:03:27 +00:00
Tom Lane a5dda5dc3a Second try at making examine_variable and friends behave sanely in
cases with binary-compatible relabeling.  My first try was implicitly
assuming that all operators scalarineqsel is used for have binary-
compatible datatypes on both sides ... which is very wrong of course.
Per report from Michael Fuhr.
2005-04-01 20:31:50 +00:00
Tom Lane bf3dbb5881 First steps towards index scans with heap access decoupled from index
access: define new index access method functions 'amgetmulti' that can
fetch multiple TIDs per call.  (The functions exist but are totally
untested as yet.)  Since I was modifying pg_am anyway, remove the
no-longer-needed 'rel' parameter from amcostestimate functions, and
also remove the vestigial amowner column that was creating useless
work for Alvaro's shared-object-dependencies project.
Initdb forced due to changes in pg_am.
2005-03-27 23:53:05 +00:00
Tom Lane 9d388e1f39 Fix a pair of related issues with estimation of inequalities that involve
binary-compatible relabeling of one or both operands.  examine_variable
should avoid stripping RelabelType from non-variable expressions, so that
they will continue to have the correct type; and convert_to_scalar should
just use that type and ignore the other input type.  This isn't perfect
but it beats failing entirely.  Per example from Michael Fuhr.
2005-03-26 20:55:39 +00:00
Bruce Momjian e3d7de6b99 Rename canonical encodings, per Peter:
UNICODE => UTF8
	ALT => WIN866
	WIN => WIN1251
	TCVN => WIN1258

The old codes continue to work.
2005-03-07 04:30:55 +00:00
Tom Lane 849074f9ae Revise hash join code so that we can increase the number of batches
on-the-fly, and thereby avoid blowing out memory when the planner has
underestimated the hash table size.  Hash join will now obey the
work_mem limit with some faithfulness.  Per my recent proposal
(hash aggregate part isn't done yet though).
2005-03-06 22:15:05 +00:00
Tom Lane a3f945a1b2 Adjust estimate_num_groups() to not clamp per-relation group count
estimate to less than the number of values estimated for any one grouping
Var, as suggested by Manfred.  This is intuitively right, and what's
more it puts the plan choices in the subselect regression test back the
way they were before ...
2005-02-01 23:08:13 +00:00
Tom Lane 875b0c62fa When dealing with multiple grouping columns coming from the same table,
clamp the estimated number of groups to table row count over 10, instead
of table row count; this reflects a heuristic that people probably won't
group over a near-unique set of columns, and the knowledge that we don't
currently have any way to estimate the correlation of the columns better
than guessing.  This change creates a trivial plan change in one of the
regression tests.
2005-01-28 20:34:27 +00:00
PostgreSQL Daemon 2ff501590b Tag appropriate files for rc3
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
2004-12-31 22:04:05 +00:00
Tom Lane 76e8a87f15 Teach regex_fixed_prefix() the correct handling of advanced regex
escapes --- they aren't simply quoted characters.  Problem noted by
Antti Salmela.  Also fix problem with incorrect handling of multibyte
characters when followed by a quantifier.
2004-12-02 02:45:07 +00:00
Tom Lane 547bb4a7f2 Use a hopefully-more-reliable method of detecting default selectivity
estimates when combining the estimates for a range query.  As pointed out
by Miquel van Smoorenburg, the existing check for an impossible combined
result would quite possibly fail to detect one default and one non-default
input.  It seems better to use the default range query estimate in such
cases.  To do so, add a check for an estimate of exactly DEFAULT_INEQ_SEL.
This is a bit ugly because it introduces additional coupling between
clauselist_selectivity and scalarltsel/scalargtsel, but it's not like
there wasn't plenty already...
2004-11-09 00:34:46 +00:00
Tom Lane 84c7cef5eb Fix estimate_num_groups to be able to use expression-index statistics
when there is an expressional index matching a GROUP BY item.
2004-09-18 19:39:50 +00:00
Bruce Momjian 15d3f9f6b7 Another pgindent run with lib typedefs added. 2004-08-30 02:54:42 +00:00
Bruce Momjian b6b71b85bc Pgindent run for 8.0. 2004-08-29 05:07:03 +00:00
Bruce Momjian da9a8649d8 Update copyright to 2004. 2004-08-29 04:13:13 +00:00
Tom Lane fcbc438727 Label CVS tip as 8.0devel instead of 7.5devel. Adjust various comments
and documentation to reference 8.0 instead of 7.5.
2004-08-04 21:34:35 +00:00
Tom Lane 7643bed58e When using extended-query protocol, postpone planning of unnamed statements
until Bind is received, so that actual parameter values are visible to the
planner.  Make use of the parameter values for estimation purposes (but
don't fold them into the actual plan).  This buys back most of the
potential loss of plan quality that ensues from using out-of-line
parameters instead of putting literal values right into the query text.

This patch creates a notion of constant-folding expressions 'for
estimation purposes only', in which case we can be more aggressive than
the normal eval_const_expressions() logic can be.  Right now the only
difference in behavior is inserting bound values for Params, but it will
be interesting to look at other possibilities.  One that we've seen
come up repeatedly is reducing now() and related functions to current
values, so that queries like ... WHERE timestampcol > now() - '1 day'
have some chance of being planned effectively.

Oliver Jowett, with some kibitzing from Tom Lane.
2004-06-11 01:09:22 +00:00
Neil Conway 72b6ad6313 Use the new List API function names throughout the backend, and disable the
list compatibility API by default. While doing this, I decided to keep
the llast() macro around and introduce llast_int() and llast_oid() variants.
2004-05-30 23:40:41 +00:00
Neil Conway d0b4399d81 Reimplement the linked list data structure used throughout the backend.
In the past, we used a 'Lispy' linked list implementation: a "list" was
merely a pointer to the head node of the list. The problem with that
design is that it makes lappend() and length() linear time. This patch
fixes that problem (and others) by maintaining a count of the list
length and a pointer to the tail node along with each head node pointer.
A "list" is now a pointer to a structure containing some meta-data
about the list; the head and tail pointers in that structure refer
to ListCell structures that maintain the actual linked list of nodes.

The function names of the list API have also been changed to, I hope,
be more logically consistent. By default, the old function names are
still available; they will be disabled-by-default once the rest of
the tree has been updated to use the new API names.
2004-05-26 04:41:50 +00:00
Tom Lane df79b847fe genericcostestimate() neglected to include qual startup cost in
indexTotalCost.  I think this may not make any real difference in 7.4,
but it definitely is a problem with CVS tip's new equation.
2004-02-27 21:44:34 +00:00
Tom Lane a536ed53bc Make use of statistics on index expressions. There are still some
corner cases that could stand improvement, but it does all the basic
stuff.  A byproduct is that the selectivity routines are no longer
constrained to working on simple Vars; we might in future be able to
improve the behavior for subexpressions that don't match indexes.
2004-02-17 00:52:53 +00:00
Tom Lane 9fe097577e Avoid generating invalid character encoding sequences in make_greater_string.
Not sure how this mistake evaded detection for so long.
2004-02-02 03:07:08 +00:00
Tom Lane de816a03c4 Repair misestimation of indexscan CPU costs. When an indexqual contains
a run-time key (that is, a nonconstant expression compared to the index
variable), the key is evaluated just once per scan, but we were charging
costs as though it were evaluated once per visited index entry.
2004-01-17 20:09:35 +00:00
Neil Conway 192ad63bd7 More janitorial work: remove the explicit casting of NULL literals to a
pointer type when it is not necessary to do so.

For future reference, casting NULL to a pointer type is only necessary
when (a) invoking a function AND either (b) the function has no prototype
OR (c) the function is a varargs function.
2004-01-07 18:56:30 +00:00
Tom Lane fa559a86ee Adjust indexscan planning logic to keep RestrictInfo nodes associated
with index qual clauses in the Path representation.  This saves a little
work during createplan and (probably more importantly) allows reuse of
cached selectivity estimates during indexscan planning.  Also fix latent
bug: wrong plan would have been generated for a 'special operator' used
in a nestloop-inner-indexscan join qual, because the special operator
would not have gotten into the list of quals to recheck.  This bug is
only latent because at present the special-operator code could never
trigger on a join qual, but sooner or later someone will want to do it.
2004-01-05 23:39:54 +00:00
Tom Lane 5c5b911fcc Using canonicalize_qual() to get rid of duplicate index predicate
conditions is overkill; set_union() does the job about as well, and
much more efficiently.  Furthermore this avoids assuming that
canonicalize_qual() will check for duplicate clauses at all, which
it may not always do.
2003-12-29 22:22:45 +00:00
Tom Lane c607bd693f Clean up the usage of canonicalize_qual(): in particular, be consistent
about whether it is applied before or after eval_const_expressions().
I believe there were some corner cases where the system would fail to
recognize that a partial index is applicable because of the previous
inconsistency.  Store normal rather than 'implicit AND' representations
of constraints and index predicates in the catalogs.
initdb forced due to representation change of constraints/predicates.
2003-12-28 21:57:37 +00:00
Joe Conway 53e7c1363a Repair indexed bytea like operations, and related selectivity
functionality. Per bug report by Alvar Freude:
http://archives.postgresql.org/pgsql-bugs/2003-12/msg00022.php
2003-12-07 04:14:10 +00:00
PostgreSQL Daemon 969685ad44 $Header: -> $PostgreSQL Changes ... 2003-11-29 19:52:15 +00:00
Tom Lane fa5c8a055a Cross-data-type comparisons are now indexable by btrees, pursuant to my
pghackers proposal of 8-Nov.  All the existing cross-type comparison
operators (int2/int4/int8 and float4/float8) have appropriate support.
The original proposal of storing the right-hand-side datatype as part of
the primary key for pg_amop and pg_amproc got modified a bit in the event;
it is easier to store zero as the 'default' case and only store a nonzero
when the operator is actually cross-type.  Along the way, remove the
long-since-defunct bigbox_ops operator class.
2003-11-12 21:15:59 +00:00
Tom Lane 64c1fc7257 Avoid division by zero in estimate_num_groups() when table has no rows. 2003-10-16 21:37:54 +00:00
Peter Eisentraut feb4f44d29 Message editing: remove gratuitous variations in message wording, standardize
terms, add some clarifications, fix some untranslatable attempts at dynamic
message building.
2003-09-25 06:58:07 +00:00
Bruce Momjian 46785776c4 Another pgindent run with updated typedefs. 2003-08-08 21:42:59 +00:00
Bruce Momjian f3c3deb7d0 Update copyrights to 2003. 2003-08-04 02:40:20 +00:00