Commit Graph

349 Commits

Author SHA1 Message Date
Tom Lane 39df0f150c Allow planner to use expression-index stats for function calls in WHERE.
Previously, a function call appearing at the top level of WHERE had a
hard-wired selectivity estimate of 0.3333333, a kludge conveniently dated
in the source code itself to July 1992.  The expectation at the time was
that somebody would soon implement estimator support functions analogous
to those for operators; but no such code has appeared, nor does it seem
likely to in the near future.  We do have an alternative solution though,
at least for immutable functions on single relations: creating an
expression index on the function call will allow ANALYZE to gather stats
about the function's selectivity.  But the code in clause_selectivity()
failed to make use of such data even if it exists.

Refactor so that that will happen.  I chose to make it try this technique
for any clause type for which clause_selectivity() doesn't have a special
case, not just functions.  To avoid adding unnecessary overhead in the
common case where we don't learn anything new, make selfuncs.c provide an
API that hooks directly to examine_variable() and then var_eq_const(),
rather than the previous coding which laboriously constructed an OpExpr
only so that it could be expensively deconstructed again.

I preserved the behavior that the default estimate for a function call
is 0.3333333.  (For any other expression node type, it's 0.5, as before.)
I had originally thought to make the default be 0.5 across the board, but
changing a default estimate that's survived for twenty-three years seems
like something not to do without a lot more testing than I care to put
into it right now.

Per a complaint from Jehan-Guillaume de Rorthais.  Back-patch into 9.5,
but not further, at least for the moment.
2015-09-24 18:35:46 -04:00
Tom Lane aad663a0b4 Reduce number of bytes examined by convert_one_string_to_scalar().
Previously, convert_one_string_to_scalar() would examine up to 20 bytes of
the input string, producing a scalar conversion with theoretical precision
far greater than is of any possible use considering the other limitations
on the accuracy of the resulting selectivity estimate.  (I think this
choice might pre-date the caller-level logic that strips any common prefix
of the strings; before that, there could have been value in scanning the
strings far enough to use all the precision available in a double.)

Aside from wasting cycles to little purpose, this choice meant that the
"denom" variable could grow to as much as 256^21 = 3.74e50, which could
overflow in some non-IEEE float arithmetics.  While we don't really support
any machines with non-IEEE arithmetic anymore, this still seems like quite
an unnecessary platform dependency.  Limit the scan to 12 bytes instead,
thus limiting "denom" to 256^13 = 2.03e31, a value more likely to be
computable everywhere.

Per testing by Greg Stark, which showed overflow failures in our standard
regression tests on VAX.
2015-08-23 15:15:47 -04:00
Tom Lane 8693ebe37d Avoid some zero-divide hazards in the planner.
Although I think on all modern machines floating division by zero
results in Infinity not SIGFPE, we still don't want infinities
running around in the planner's costing estimates; too much risk
of that leading to insane behavior.

grouping_planner() failed to consider the possibility that final_rel
might be known dummy and hence have zero rowcount.  (I wonder if it
would be better to set a rows estimate of 1 for dummy relations?
But at least in the back branches, changing this convention seems
like a bad idea, so I'll leave that for another day.)

Make certain that get_variable_numdistinct() produces a nonzero result.
The case that can be shown to be broken is with stadistinct < 0.0 and
small ntuples; we did not prevent the result from rounding to zero.
For good luck I applied clamp_row_est() to all the nonconstant return
values.

In ExecChooseHashTableSize(), Assert that we compute positive nbuckets
and nbatch.  I know of no reason to think this isn't the case, but it
seems like a good safety check.

Per reports from Piotr Stefaniak.  Back-patch to all active branches.
2015-07-30 12:11:23 -04:00
Noah Misch be8b06c364 Revoke support for strxfrm() that write past the specified array length.
This formalizes a decision implicit in commit
4ea51cdfe8 and adds clean detection of
affected systems.  Vendor updates are available for each such known bug.
Back-patch to 9.5, where the aforementioned commit first appeared.
2015-07-08 20:44:21 -04:00
Andres Freund f3d3118532 Support GROUPING SETS, CUBE and ROLLUP.
This SQL standard functionality allows to aggregate data by different
GROUP BY clauses at once. Each grouping set returns rows with columns
grouped by in other sets set to NULL.

This could previously be achieved by doing each grouping as a separate
query, conjoined by UNION ALLs. Besides being considerably more concise,
grouping sets will in many cases be faster, requiring only one scan over
the underlying data.

The current implementation of grouping sets only supports using sorting
for input. Individual sets that share a sort order are computed in one
pass. If there are sets that don't share a sort order, additional sort &
aggregation steps are performed. These additional passes are sourced by
the previous sort step; thus avoiding repeated scans of the source data.

The code is structured in a way that adding support for purely using
hash aggregation or a mix of hashing and sorting is possible. Sorting
was chosen to be supported first, as it is the most generic method of
implementation.

Instead of, as in an earlier versions of the patch, representing the
chain of sort and aggregation steps as full blown planner and executor
nodes, all but the first sort are performed inside the aggregation node
itself. This avoids the need to do some unusual gymnastics to handle
having to return aggregated and non-aggregated tuples from underlying
nodes, as well as having to shut down underlying nodes early to limit
memory usage.  The optimizer still builds Sort/Agg node to describe each
phase, but they're not part of the plan tree, but instead additional
data for the aggregation node. They're a convenient and preexisting way
to describe aggregation and sorting.  The first (and possibly only) sort
step is still performed as a separate execution step. That retains
similarity with existing group by plans, makes rescans fairly simple,
avoids very deep plans (leading to slow explains) and easily allows to
avoid the sorting step if the underlying data is sorted by other means.

A somewhat ugly side of this patch is having to deal with a grammar
ambiguity between the new CUBE keyword and the cube extension/functions
named cube (and rollup). To avoid breaking existing deployments of the
cube extension it has not been renamed, neither has cube been made a
reserved keyword. Instead precedence hacking is used to make GROUP BY
cube(..) refer to the CUBE grouping sets feature, and not the function
cube(). To actually group by a function cube(), unlikely as that might
be, the function name has to be quoted.

Needs a catversion bump because stored rules may change.

Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund
Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas
    Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule
Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-16 03:46:31 +02:00
Andrew Dunstan cb9fa802b3 Add new OID alias type regnamespace
Catalog version bumped

Kyotaro HORIGUCHI
2015-05-09 13:36:52 -04:00
Andrew Dunstan 0c90f6769d Add new OID alias type regrole
The new type has the scope of whole the database cluster so it doesn't
behave the same as the existing OID alias types which have database
scope,
concerning object dependency. To avoid confusion constants of the new
type are prohibited from appearing where dependencies are made involving
it.

Also, add a note to the docs about possible MVCC violation and
optimization issues, which are general over the all reg* types.

Kyotaro Horiguchi
2015-05-09 13:06:49 -04:00
Tom Lane a5c29d37aa Avoid unused-variable warning in non-assert builds.
Oversight in my commit b9896198cf.
2015-03-04 22:00:36 -05:00
Tom Lane b9896198cf Fix cost estimation for indexscans on expensive indexed expressions.
genericcostestimate() and friends used the cost of the entire indexqual
expressions as the charge for initial evaluation of indexscan arguments.
But of course the index column is not evaluated, only the other side
of the qual expression, so this was a bad overestimate if the index
column was an expensive expression.

To fix, refactor the logic in this area so that there's a single routine
charged with deconstructing index quals and figuring out what is the index
column and what is the comparison expression.  This is more or less free in
the case of btree indexes, since btcostestimate() was doing equivalent
deconstruction already.  It probably adds a bit of new overhead in the cases
of other index types, but not a lot.  (In the case of GIN I think I saved
something by getting rid of code that wasn't aware that the index column
associations were already available "for free".)

Per recent gripe from Jeff Janes.

Arguably this is a bug fix, but I'm hesitant to back-patch because of the
possibility of destabilizing plan choices that people may be happy with.
2015-03-03 23:23:24 -05:00
Bruce Momjian 4baaf863ec Update copyright for 2015
Backpatch certain files through 9.0
2015-01-06 11:43:47 -05:00
Alvaro Herrera 7516f52594 BRIN: Block Range Indexes
BRIN is a new index access method intended to accelerate scans of very
large tables, without the maintenance overhead of btrees or other
traditional indexes.  They work by maintaining "summary" data about
block ranges.  Bitmap index scans work by reading each summary tuple and
comparing them with the query quals; all pages in the range are returned
in a lossy TID bitmap if the quals are consistent with the values in the
summary tuple, otherwise not.  Normal index scans are not supported
because these indexes do not store TIDs.

As new tuples are added into the index, the summary information is
updated (if the block range in which the tuple is added is already
summarized) or not; in the latter case, a subsequent pass of VACUUM or
the brin_summarize_new_values() function will create the summary
information.

For data types with natural 1-D sort orders, the summary info consists
of the maximum and the minimum values of each indexed column within each
page range.  This type of operator class we call "Minmax", and we
supply a bunch of them for most data types with B-tree opclasses.
Since the BRIN code is generalized, other approaches are possible for
things such as arrays, geometric types, ranges, etc; even for things
such as enum types we could do something different than minmax with
better results.  In this commit I only include minmax.

Catalog version bumped due to new builtin catalog entries.

There's more that could be done here, but this is a good step forwards.

Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera,
with contribution by Heikki Linnakangas.

Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas.
Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo.

PS:
  The research leading to these results has received funding from the
  European Union's Seventh Framework Programme (FP7/2007-2013) under
  grant agreement n° 318633.
2014-11-07 16:38:14 -03:00
Bruce Momjian 0a78320057 pgindent run for 9.4
This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not effect backpatching.
2014-05-06 12:12:18 -04:00
Bruce Momjian 886c0be3f6 C comments: remove odd blank lines after #ifdef WIN32 lines 2014-03-13 01:34:42 -04:00
Heikki Linnakangas 17d787a3b1 Items on GIN data pages are no longer always 6 bytes; update gincostestimate.
Also improve the comments a bit.
2014-03-12 20:52:22 +02:00
Tom Lane fccebe421d Use SnapshotDirty rather than an active snapshot to probe index endpoints.
If there are lots of uncommitted tuples at the end of the index range,
get_actual_variable_range() ends up fetching each one and doing an MVCC
visibility check on it, until it finally hits a visible tuple.  This is
bad enough in isolation, considering that we don't need an exact answer
only an approximate one.  But because the tuples are not yet committed,
each visibility check does a TransactionIdIsInProgress() test, which
involves scanning the ProcArray.  When multiple sessions do this
concurrently, the ensuing contention results in horrid performance loss.
20X overall throughput loss on not-too-complicated queries is easy to
demonstrate in the back branches (though someone's made it noticeably
less bad in HEAD).

We can dodge the problem fairly effectively by using SnapshotDirty rather
than a normal MVCC snapshot.  This will cause the index probe to take
uncommitted tuples as good, so that we incur only one tuple fetch and test
even if there are many such tuples.  The extent to which this degrades the
estimate is debatable: it's possible the result is actually a more accurate
prediction than before, if the endmost tuple has become committed by the
time we actually execute the query being planned.  In any case, it's not
very likely that it makes the estimate a lot worse.

SnapshotDirty will still reject tuples that are known committed dead, so
we won't give bogus answers if an invalid outlier has been deleted but not
yet vacuumed from the index.  (Because btrees know how to mark such tuples
dead in the index, we shouldn't have a big performance problem in the case
that there are many of them at the end of the range.)  This consideration
motivates not using SnapshotAny, which was also considered as a fix.

Note: the back branches were using SnapshotNow instead of an MVCC snapshot,
but the problem and solution are the same.

Per performance complaints from Bartlomiej Romanski, Josh Berkus, and
others.  Back-patch to 9.0, where the issue was introduced (by commit
40608e7f94).
2014-02-25 16:04:06 -05:00
Tom Lane 77585bce03 Do ScalarArrayOp estimation correctly when array is a stable expression.
Most estimation functions apply estimate_expression_value to see if they
can reduce an expression to a constant; the key difference is that it
allows evaluation of stable as well as immutable functions in hopes of
ending up with a simple Const node.  scalararraysel didn't get the memo
though, and neither did gincost_opexpr/gincost_scalararrayopexpr.  Fix
that, and remove a now-unnecessary estimate_expression_value step in the
subsidiary function scalararraysel_containment.

Per complaint from Alexey Klyukin.  Back-patch to 9.3.  The problem
goes back further, but I'm hesitant to change estimation behavior in
long-stable release branches.
2014-02-21 17:10:46 -05:00
Bruce Momjian 7e04792a1c Update copyright for 2014
Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.
2014-01-07 16:05:30 -05:00
Tom Lane ebefbb5fde Fix failure with whole-row reference to a subquery.
Simple oversight in commit 1cb108efb0 ---
recursively examining a subquery output column is only sane if the
original Var refers to a single output column.  Found by Kevin Grittner.
2013-11-11 16:36:27 -05:00
Robert Haas 3483f4332d Don't use SnapshotNow in get_actual_variable_range.
Instead, use the active snapshot.  Per Tom Lane, this function is
most interested in knowing the range of tuples our scan will actually
see.

This is another step towards full removal of SnapshotNow.
2013-07-25 14:30:00 -04:00
Tom Lane b32a25c3d5 Fix booltestsel() for case where we have NULL stats but not MCV stats.
In a boolean column that contains mostly nulls, ANALYZE might not find
enough non-null values to populate the most-common-values stats,
but it would still create a pg_statistic entry with stanullfrac set.
The logic in booltestsel() for this situation did the wrong thing for
"col IS NOT TRUE" and "col IS NOT FALSE" tests, forgetting that null
values would satisfy these tests (so that the true selectivity would
be close to one, not close to zero).  Per bug #8274.

Fix by Andrew Gierth, some comment-smithing by me.
2013-07-24 00:44:09 -04:00
Bruce Momjian 9af4159fce pgindent run for release 9.3
This is the first run of the Perl-based pgindent script.  Also update
pgindent instructions.
2013-05-29 16:58:43 -04:00
Tom Lane 69cc60dcfd Guard against input_rows == 0 in estimate_num_groups().
This case doesn't normally happen, because the planner usually clamps
all row estimates to at least one row; but I found that it can arise
when dealing with relations excluded by constraints.  Without a defense,
estimate_num_groups() can return zero, which leads to divisions by zero
inside the planner as well as assertion failures in the executor.

An alternative fix would be to change set_dummy_rel_pathlist() to make
the size estimate for a dummy relation 1 row instead of 0, but that seemed
pretty ugly; and probably someday we'll want to drop the convention that
the minimum rowcount estimate is 1 row.

Back-patch to 8.4, as the problem can be demonstrated that far back.
2013-05-10 17:15:30 -04:00
Tom Lane 3ccae48f44 Support indexing of regular-expression searches in contrib/pg_trgm.
This works by extracting trigrams from the given regular expression,
in generally the same spirit as the previously-existing support for
LIKE searches, though of course the details are far more complicated.

Currently, only GIN indexes are supported.  We might be able to make
it work with GiST indexes later.

The implementation includes adding API functions to backend/regex/
to provide a view of the search NFA created from a regular expression.
These functions are meant to be generic enough to be supportable in
a standalone version of the regex library, should that ever happen.

Alexander Korotkov, reviewed by Heikki Linnakangas and Tom Lane
2013-04-09 01:06:54 -04:00
Tom Lane 31f38f28b0 Redesign the planner's handling of index-descent cost estimation.
Historically we've used a couple of very ad-hoc fudge factors to try to
get the right results when indexes of different sizes would satisfy a
query with the same number of index leaf tuples being visited.  In
commit 21a39de580 I tweaked one of these
fudge factors, with results that proved disastrous for larger indexes.
Commit bf01e34b55 fudged it some more,
but still with not a lot of principle behind it.

What seems like a better way to address these issues is to explicitly model
index-descent costs, since that's what's really at stake when considering
diferent indexes with similar leaf-page-level costs.  We tried that once
long ago, and found that charging random_page_cost per page descended
through was way too much, because upper btree levels tend to stay in cache
in real-world workloads.  However, there's still CPU costs to think about,
and the previous fudge factors can be seen as a crude attempt to account
for those costs.  So this patch replaces those fudge factors with explicit
charges for the number of tuple comparisons needed to descend the index
tree, plus a small charge per page touched in the descent.  The cost
multipliers are chosen so that the resulting charges are in the vicinity of
the historical (pre-9.2) fudge factors for indexes of up to about a million
tuples, while not ballooning unreasonably beyond that, as the old fudge
factor did (even more so in 9.2).

To make this work accurately for btree indexes, add some code that allows
extraction of the known root-page height from a btree.  There's no
equivalent number readily available for other index types, but we can use
the log of the number of index pages as an approximate substitute.

This seems like too much of a behavioral change to risk back-patching,
but it should improve matters going forward.  In 9.2 I'll just revert
the fudge-factor change.
2013-01-11 12:56:58 -05:00
Bruce Momjian bd61a623ac Update copyrights for 2013
Fully update git head, and update back branches in ./COPYRIGHT and
legal.sgml files.
2013-01-01 17:15:01 -05:00
Tom Lane bf01e34b55 Tweak genericcostestimate's fudge factor for index size.
To provide some bias against using a large index when a small one would do
as well, genericcostestimate adds a "fudge factor", which for a long time
was random_page_cost * index_pages/10000.  However, this can grow to be the
dominant term in indexscan cost estimates when the index involved is large
enough, a behavior that was never intended.  Change to a ln(1 + n/10000)
formulation, which has nearly the same behavior up to a few hundred pages
but tails off significantly thereafter.  (A log curve seems correct on
first principles, since what we're trying to account for here is index
descent costs, which are typically logarithmic.)  Per bug #7619 from Niko
Kiirala.

Possibly this change should get back-patched, but I'm hesitant to mess with
cost estimates in stable branches.
2012-10-24 16:25:40 -04:00
Tom Lane 1f91c8ca1d Avoid planner crash/Assert failure with joins to unflattened subqueries.
examine_simple_variable supposed that any RTE_SUBQUERY rel it gets pointed
at must have been planned already.  However, this isn't a safe assumption
because we must do selectivity estimation while generating indexscan paths,
and that code might look at join clauses involving a rel that the loop in
set_base_rel_sizes() hasn't reached yet.  The simplest fix is to play dumb
in such a situation, that is give up trying to extract any stats for the
Var.  This could possibly be improved by making a separate pass over the
RTE list to plan each unflattened subquery before we start the main
planning work --- but that would be pretty invasive and it doesn't seem
worth it, for now at least.  (We couldn't just break set_base_rel_sizes()
into two loops: the prescan would need to handle all subquery rels in the
query, not only those in the current join subproblem.)

This bug was introduced in commit 1cb108efb0,
although I think that subsequent changes may have exposed it more than it
was originally.  Per bug #7580 from Maxim Boguk.
2012-10-03 13:37:53 -04:00
Alvaro Herrera c219d9b0a5 Split tuple struct defs from htup.h to htup_details.h
This reduces unnecessary exposure of other headers through htup.h, which
is very widely included by many files.

I have chosen to move the function prototypes to the new file as well,
because that means htup.h no longer needs to include tupdesc.h.  In
itself this doesn't have much effect in indirect inclusion of tupdesc.h
throughout the tree, because it's also required by execnodes.h; but it's
something to explore in the future, and it seemed best to do the htup.h
change now while I'm busy with it.
2012-08-30 16:52:35 -04:00
Tom Lane 628cbb50ba Re-implement extraction of fixed prefixes from regular expressions.
To generate btree-indexable conditions from regex WHERE conditions (such as
WHERE indexed_col ~ '^foo'), we need to be able to identify any fixed
prefix that a regex might have; that is, find any string that must be a
prefix of all strings satisfying the regex.  We used to do that with
entirely ad-hoc code that looked at the source text of the regex.  It
didn't know very much about regex syntax, which mostly meant that it would
fail to identify some optimizable cases; but Viktor Rosenfeld reported that
it would produce actively wrong answers for quantified parenthesized
subexpressions, such as '^(foo)?bar'.  Rather than trying to extend the
ad-hoc code to cover this, let's get rid of it altogether in favor of
identifying prefixes by examining the compiled form of a regex.

To do this, I've added a new entry point "pg_regprefix" to the regex library;
hopefully it is defined in a sufficiently general fashion that it can remain
in the library when/if that code gets split out as a standalone project.

Since this bug has been there for a very long time, this fix needs to get
back-patched.  However it depends on some other recent commits (particularly
the addition of wchar-to-database-encoding conversion), so I'll commit this
separately and then go to work on back-porting the necessary fixes.
2012-07-10 14:54:37 -04:00
Tom Lane 00dac6000d Refactor pattern_fixed_prefix() to avoid dealing in incomplete patterns.
Previously, pattern_fixed_prefix() was defined to return whatever fixed
prefix it could extract from the pattern, plus the "rest" of the pattern.
That definition was sensible for LIKE patterns, but not so much for
regexes, where reconstituting a valid pattern minus the prefix could be
quite tricky (certainly the existing code wasn't doing that correctly).
Since the only thing that callers ever did with the "rest" of the pattern
was to pass it to like_selectivity() or regex_selectivity(), let's cut out
the middle-man and just have pattern_fixed_prefix's subroutines do this
directly.  Then pattern_fixed_prefix can return a simple selectivity
number, and the question of how to cope with partial patterns is removed
from its API specification.

While at it, adjust the API spec so that callers who don't actually care
about the pattern's selectivity (which is a lot of them) can pass NULL for
the selectivity pointer to skip doing the work of computing a selectivity
estimate.

This patch is only an API refactoring that doesn't actually change any
processing, other than allowing a little bit of useless work to be skipped.
However, it's necessary infrastructure for my upcoming fix to regex prefix
extraction, because after that change there won't be any simple way to
identify the "rest" of the regex, not even to the low level of fidelity
needed by regex_selectivity.  We can cope with that if regex_fixed_prefix
and regex_selectivity communicate directly, but not if we have to work
within the old API.  Hence, back-patch to all active branches.
2012-07-09 23:22:55 -04:00
Tom Lane e7ef6d7e24 Fix planner to pass correct collation to operator selectivity estimators.
We can do this without creating an API break for estimation functions
by passing the collation using the existing fmgr functionality for
passing an input collation as a hidden parameter.

The need for this was foreseen at the outset, but we didn't get around to
making it happen in 9.1 because of the decision to sort all pg_statistic
histograms according to the database's default collation.  That meant that
selectivity estimators generally need to use the default collation too,
even if they're estimating for an operator that will do something
different.  The reason it's suddenly become more interesting is that
regexp interpretation also uses a collation (for its LC_TYPE not LC_COLLATE
property), and we no longer want to use the wrong collation when examining
regexps during planning.  It's not that the selectivity estimate is likely
to change much from this; rather that we are thinking of caching compiled
regexps during planner estimation, and we won't get the intended benefit
if we cache them with a different collation than the executor will use.

Back-patch to 9.1, both because the regexp change is likely to get
back-patched and because we might as well get this right in all
collation-supporting branches, in case any third-party code wants to
rely on getting the collation.  The patch turns out to be minuscule
now that I've done it ...
2012-07-08 23:51:08 -04:00
Bruce Momjian 927d61eeff Run pgindent on 9.2 source tree in preparation for first 9.3
commit-fest.
2012-06-10 15:20:04 -04:00
Tom Lane 65fd91333e Fix an Assert that turns out to be reachable after all.
estimate_num_groups() gets unhappy with
	create table empty();
	select * from empty except select * from empty e2;
I can't see any actual use-case for such a query (and the table is illegal
per SQL spec), but it seems like a good idea that it not cause an assert
failure.
2012-04-09 11:58:24 -04:00
Peter Eisentraut 0e85abd658 Clean up compiler warnings from unused variables with asserts disabled
For those variables only used when asserts are enabled, use a new
macro PG_USED_FOR_ASSERTS_ONLY, which expands to
__attribute__((unused)) when asserts are not enabled.
2012-03-21 23:33:10 +02:00
Tom Lane 66a7e6bae9 Improve estimation of IN/NOT IN by assuming array elements are distinct.
In constructs such as "x IN (1,2,3,4)" and "x <> ALL(ARRAY[1,2,3,4])",
we formerly always used a general-purpose assumption that the probability
of success is independent for each comparison of "x" to an array element.
But in real-world usage of these constructs, that's a pretty poor
assumption; it's much saner to assume that the array elements are distinct
and so the match probabilities are disjoint.  Apply that assumption if the
operator appears to behave as equality (for ANY) or inequality (for ALL).
But fall back to the normal independent-probabilities calculation if this
yields an impossible result, ie probability > 1 or < 0.  We could protect
ourselves against bad estimates even more by explicitly checking for equal
array elements, but that is expensive and doesn't seem worthwhile: doing
it would amount to optimizing for poorly-written queries at the expense
of well-written ones.

Daniele Varrazzo and Tom Lane, after a suggestion by Ants Aasma
2012-03-07 22:59:49 -05:00
Tom Lane 0e5e167aae Collect and use element-frequency statistics for arrays.
This patch improves selectivity estimation for the array <@, &&, and @>
(containment and overlaps) operators.  It enables collection of statistics
about individual array element values by ANALYZE, and introduces
operator-specific estimators that use these stats.  In addition,
ScalarArrayOpExpr constructs of the forms "const = ANY/ALL (array_column)"
and "const <> ANY/ALL (array_column)" are estimated by treating them as
variants of the containment operators.

Since we still collect scalar-style stats about the array values as a
whole, the pg_stats view is expanded to show both these stats and the
array-style stats in separate columns.  This creates an incompatible change
in how stats for tsvector columns are displayed in pg_stats: the stats
about lexemes are now displayed in the array-related columns instead of the
original scalar-related columns.

There are a few loose ends here, notably that it'd be nice to be able to
suppress either the scalar-style stats or the array-element stats for
columns for which they're not useful.  But the patch is in good enough
shape to commit for wider testing.

Alexander Korotkov, reviewed by Noah Misch and Nathan Boley
2012-03-03 20:20:57 -05:00
Tom Lane 4767bc8ff2 Improve statistics estimation to make some use of DISTINCT in sub-queries.
Formerly, we just punted when trying to estimate stats for variables coming
out of sub-queries using DISTINCT, on the grounds that whatever stats we
might have for underlying table columns would be inapplicable.  But if the
sub-query has only one DISTINCT column, we can consider its output variable
as being unique, which is useful information all by itself.  The scope of
this improvement is pretty narrow, but it costs nearly nothing, so we might
as well do it.  Per discussion with Andres Freund.

This patch differs from the draft I submitted yesterday in updating various
comments about vardata.isunique (to reflect its extended meaning) and in
tweaking the interaction with security_barrier views.  There does not seem
to be a reason why we can't use this sort of knowledge even when the
sub-query is such a view.
2012-02-16 17:34:00 -05:00
Tom Lane 21a39de580 Tweak index costing for problems with partial indexes.
btcostestimate() makes an estimate of the number of index tuples that will
be visited based on knowledge of which index clauses can actually bound the
scan within nbtree.  However, it forgot to account for partial indexes in
this calculation, with the result that the cost of the index scan could be
significantly overestimated for a partial index.  Fix that by merging the
predicate with the abbreviated indexclause list, in the same way as we do
with the full list to estimate how many heap tuples will be visited.

Also, slightly increase the "fudge factor" that's meant to give preference
to smaller indexes over larger ones.  While this is applied to all indexes,
it's most important for partial indexes since it can be the only factor
that makes a partial index look cheaper than a similar full index.
Experimentation shows that the existing value is so small as to easily get
swamped by noise such as page-boundary-roundoff behavior.  I'm tempted to
kick it up more than this, but will refrain for now.

Per report from Ruben Blanco.  These are long-standing issues, but given
the lack of prior complaints I'm not going to risk changing planner
behavior in back branches by back-patching.
2012-01-29 18:37:14 -05:00
Tom Lane e2fa76d80b Use parameterized paths to generate inner indexscans more flexibly.
This patch fixes the planner so that it can generate nestloop-with-
inner-indexscan plans even with one or more levels of joining between
the indexscan and the nestloop join that is supplying the parameter.
The executor was fixed to handle such cases some time ago, but the
planner was not ready.  This should improve our plans in many situations
where join ordering restrictions formerly forced complete table scans.

There is probably a fair amount of tuning work yet to be done, because
of various heuristics that have been added to limit the number of
parameterized paths considered.  However, we are not going to find out
what needs to be adjusted until the code gets some real-world use, so
it's time to get it in there where it can be tested easily.

Note API change for index AM amcostestimate functions.  I'm not aware of
any non-core index AMs, but if there are any, they will need minor
adjustments.
2012-01-27 19:26:38 -05:00
Bruce Momjian e126958c2e Update copyright notices for year 2012. 2012-01-01 18:01:58 -05:00
Tom Lane 472d3935a2 Rethink representation of index clauses' mapping to index columns.
In commit e2c2c2e8b1 I made use of nested
list structures to show which clauses went with which index columns, but
on reflection that's a data structure that only an old-line Lisp hacker
could love.  Worse, it adds unnecessary complication to the many places
that don't much care which clauses go with which index columns.  Revert
to the previous arrangement of flat lists of clauses, and instead add a
parallel integer list of column numbers.  The places that care about the
pairing can chase both lists with forboth(), while the places that don't
care just examine one list the same as before.

The only real downside to this is that there are now two more lists that
need to be passed to amcostestimate functions in case they care about
column matching (which btcostestimate does, so not passing the info is not
an option).  Rather than deal with 11-argument amcostestimate functions,
pass just the IndexPath and expect the functions to extract fields from it.
That gets us down to 7 arguments which is better than 11, and it seems
more future-proof against likely additions to the information we keep
about an index path.
2011-12-24 19:03:21 -05:00
Tom Lane e2c2c2e8b1 Improve planner's handling of duplicated index column expressions.
It's potentially useful for an index to repeat the same indexable column
or expression in multiple index columns, if the columns have different
opclasses.  (If they share opclasses too, the duplicate column is pretty
useless, but nonetheless we've allowed such cases since 9.0.)  However,
the planner failed to cope with this, because createplan.c was relying on
simple equal() matching to figure out which index column each index qual
is intended for.  We do have that information available upstream in
indxpath.c, though, so the fix is to not flatten the multi-level indexquals
list when putting it into an IndexPath.  Then we can rely on the sublist
structure to identify target index columns in createplan.c.  There's a
similar issue for index ORDER BYs (the KNNGIST feature), so introduce a
multi-level-list representation for that too.  This adds a bit more
representational overhead, but we might more or less buy that back by not
having to search for matching index columns anymore in createplan.c;
likewise btcostestimate saves some cycles.

Per bug #6351 from Christian Rudolph.  Likely symptoms include the "btree
index keys must be ordered by attribute" failure shown there, as well as
"operator MMMM is not a member of opfamily NNNN".

Although this is a pre-existing problem that can be demonstrated in 9.0 and
9.1, I'm not going to back-patch it, because the API changes in the planner
seem likely to break things such as index plugins.  The corner cases where
this matters seem too narrow to justify possibly breaking things in a minor
release.
2011-12-23 18:45:14 -05:00
Robert Haas 0e4611c023 Add a security_barrier option for views.
When a view is marked as a security barrier, it will not be pulled up
into the containing query, and no quals will be pushed down into it,
so that no function or operator chosen by the user can be applied to
rows not exposed by the view.  Views not configured with this
option cannot provide robust row-level security, but will perform far
better.

Patch by KaiGai Kohei; original problem report by Heikki Linnakangas
(in October 2009!).  Review (in earlier versions) by Noah Misch and
others.  Design advice by Tom Lane and myself.  Further review and
cleanup by me.
2011-12-22 16:16:31 -05:00
Tom Lane 1db5af2794 Fix gincostestimate to handle ScalarArrayOpExpr reasonably.
The original coding of this function overlooked the possibility that
it could be passed anything except simple OpExpr indexquals.  But
ScalarArrayOpExpr is possible too, and the code would probably crash
(and surely give ridiculous answers) in such a case.  Add logic to try
to estimate sanely for such cases.

In passing, fix the treatment of inner-indexscan cost estimation: it was
failing to scale up properly for multiple iterations of a nestloop.
(I think somebody might've thought that index_pages_fetched() is linear,
but of course it's not.)

Report, diagnosis, and preliminary patch by Marti Raudsepp; I refactored
it a bit and fixed the cost estimation.

Back-patch into 9.1 where the bogus code was introduced.
2011-12-20 19:57:34 -05:00
Tom Lane 8daeb5ddd6 Add SP-GiST (space-partitioned GiST) index access method.
SP-GiST is comparable to GiST in flexibility, but supports non-balanced
partitioned search structures rather than balanced trees.  As described at
PGCon 2011, this new indexing structure can beat GiST in both index build
time and query speed for search problems that it is well matched to.

There are a number of areas that could still use improvement, but at this
point the code seems committable.

Teodor Sigaev and Oleg Bartunov, with considerable revisions by Tom Lane
2011-12-17 16:42:30 -05:00
Tom Lane eb5834d5af Further improvement of make_greater_string.
Make sure that it considers all the possibilities that the old code did,
instead of trying only one possibility per character position.  To keep the
runtime in bounds, instead tweak the character incrementers to not try
every possible multibyte character code.  Remove unnecessary logic to
restore the old character value on failure.  Additional comment and
formatting cleanup.
2011-10-30 12:22:11 -04:00
Robert Haas 78d523b633 Improve make_greater_string() with encoding-specific incrementers.
This infrastructure doesn't in any way guarantee that the character
we produce will sort before the one we incremented; but it does at least
make it much more likely that we'll end up with something that is a valid
character, which improves our chances.

Kyotaro Horiguchi, with various adjustments by me.
2011-10-29 14:22:20 -04:00
Tom Lane 0f39d5050d Don't trust deferred-unique indexes for join removal.
The uniqueness condition might fail to hold intra-transaction, and assuming
it does can give incorrect query results.  Per report from Marti Raudsepp,
though this is not his proposed patch.

Back-patch to 9.0, where both these features were introduced.  In the
released branches, add the new IndexOptInfo field to the end of the struct,
to try to minimize ABI breakage for third-party code that may be examining
that struct.
2011-10-23 00:43:39 -04:00
Tom Lane 9e8da0f757 Teach btree to handle ScalarArrayOpExpr quals natively.
This allows "indexedcol op ANY(ARRAY[...])" conditions to be used in plain
indexscans, and particularly in index-only scans.
2011-10-16 15:39:24 -04:00
Tom Lane 79edb2b1dc Fix recursion into previously planned sub-query in examine_simple_variable.
This code was looking at the sub-Query tree as seen in the parent query's
RangeTblEntry; but that's the pristine parser output, and what we need to
look at is the tree as it stands at the completion of planning.  Otherwise
we might pick up a Var that references a subquery that got flattened and
hence has no RelOptInfo in the subroot.  Per report from Peter Geoghegan.
2011-09-29 18:13:16 -04:00