Commit Graph

82 Commits

Author SHA1 Message Date
Tom Lane 40608e7f94 When estimating the selectivity of an inequality "column > constant" or
"column < constant", and the comparison value is in the first or last
histogram bin or outside the histogram entirely, try to fetch the actual
column min or max value using an index scan (if there is an index on the
column).  If successful, replace the lower or upper histogram bound with
that value before carrying on with the estimate.  This limits the
estimation error caused by moving min/max values when the comparison
value is close to the min or max.  Per a complaint from Josh Berkus.

It is tempting to consider using this mechanism for mergejoinscansel as well,
but that would inject index fetches into main-line join estimation not just
endpoint cases.  I'm refraining from that until we can get a better handle
on the costs of doing this type of lookup.
2010-01-04 02:44:40 +00:00
Bruce Momjian 0239800893 Update copyright for the year 2010. 2010-01-02 16:58:17 +00:00
Tom Lane 21d11e7ee2 Avoid unnecessary copying of source string when generating a cloned TParser.
For long source strings the copying results in O(N^2) behavior, and the
multiplier can be significant if wide-char conversion is involved.

Andres Freund, reviewed by Kevin Grittner.
2009-12-15 20:37:17 +00:00
Tom Lane 908854209b Avoid core dump on empty thesaurus dictionary.
Per report from Robert Gravsjö.
2009-11-30 16:38:31 +00:00
Peter Eisentraut 66363e8d6d Make text search parser accept underscores in XML attributes (bug #5075) 2009-11-15 13:57:01 +00:00
Peter Eisentraut f1c5247563 Simplify a few makefile rules since install-sh can now install multiple
files in one run.
2009-10-26 21:33:01 +00:00
Tom Lane dd6de24e69 Remove duplicate variable initializations identified by clang static checker.
One of these represents a nontrivial bug (a promptly-leaked palloc), so
backpatch.

Greg Stark
2009-08-30 16:53:31 +00:00
Peter Eisentraut 9d182ef002 Update of install-sh, mkinstalldirs, and associated configury
Update install-sh to that from Autoconf 2.63, plus our Darwin-specific
changes (which I simplified a bit).  install-sh is now able to install
multiple files in one run, so we could simplify our makefiles sometime.

install-sh also now has a -d option to create directories, so we don't need
mkinstalldirs anymore.

Use AC_PROG_MKDIR_P in configure.in, so we can use mkdir -p when available
instead of install-sh -d.  For consistency with the rest of the world,
the corresponding make variable has been renamed from $(mkinstalldirs) to
$(MKDIR_P).
2009-08-26 22:24:44 +00:00
Teodor Sigaev a88a48011c Introduce filtering dictionary support to tsearch. Propagate --nolocale option
to CREATE DATABASE command in pg_regress to allow correct checking of
locale-sensitive contrib modules.
2009-08-18 10:30:41 +00:00
Teodor Sigaev abd8c94ff9 Add prefix support for synonym dictionary 2009-08-14 14:53:20 +00:00
Peter Eisentraut de160e2c00 Make backend header files C++ safe
This alters various incidental uses of C++ key words to use other similar
identifiers, so that a C++ compiler won't choke outright.  You still
(probably) need extern "C" { }; around the inclusion of backend headers.

based on a patch by Kurt Harriman <harriman@acm.org>

Also add a script cpluspluscheck to check for C++ compatibility in the
future.  As of right now, this passes without error for me.
2009-07-16 06:33:46 +00:00
Bruce Momjian d747140279 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list
provided by Andrew.
2009-06-11 14:49:15 +00:00
Tom Lane a734979e0a Fix tsquerysel() to not fail on an empty TSQuery. Per report from
Tatsuo Ishii.
2009-06-03 18:42:13 +00:00
Teodor Sigaev e43bb5beb7 Some languages have symbols with zero display's width or/and vowels/signs which
are not an alphabetic character although they are not word-breakers too.
So, treat them as part of word.

Per off-list discussion with Dibyendra Hyoju <dibyendra@gmail.com> and
and Bal Krishna Bal <balkrishna7bal@gmail.com> about Nepali language and
Devanagari alphabet.
2009-03-11 16:03:40 +00:00
Teodor Sigaev 42831729f7 Prevent recursion during parse of email-like string with multiple '@'.
Patch by Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
2009-03-10 17:32:14 +00:00
Teodor Sigaev 32032d42b5 Fix usage of char2wchar/wchar2char. Changes:
- pg_wchar and wchar_t could have different size, so char2wchar
  doesn't call pg_mb2wchar_with_len to prevent out-of-bound
  memory bug
- make char2wchar/wchar2char symmetric, now they should not be
  called with C-locale because mbstowcs/wcstombs oftenly doesn't
  work correct with C-locale.
- Text parser uses pg_mb2wchar_with_len directly in case of
  C-locale and multibyte encoding

Per bug report by Hiroshi Inoue <inoue@tpf.co.jp> and
following discussion.

Backpatch up to 8.2 when multybyte support was implemented in tsearch.
2009-03-02 15:10:09 +00:00
Teodor Sigaev b5b3134813 Fix incorrect dereferencing of char* to array's index.
Per Tommy Gildseth <tommy.gildseth@usit.uio.no> report
2009-01-29 16:22:10 +00:00
Teodor Sigaev 41d17e042b Fix URL generation in headline. Only tag lexeme will be replaced by space.
Per http://archives.postgresql.org/pgsql-bugs/2008-12/msg00013.php
2009-01-15 16:33:59 +00:00
Teodor Sigaev 8fd07a35ba Fix generation too long headline with ShortWords.
Per http://archives.postgresql.org/pgsql-hackers/2008-09/msg01088.php
2009-01-15 16:33:28 +00:00
Bruce Momjian 511db38ace Update copyright for 2009. 2009-01-01 17:24:05 +00:00
Tom Lane 301194f8ea Reduce the scaling factor for attstattarget to number-of-lexemes from 100
to 10, to compensate for the recent change in default statistics target.
The original number was pulled out of the air anyway :-(, but it was picked
in the context of the old default, so holding the default size of the
MCELEM array constant seems the best thing.  Per discussion.
2008-12-15 15:06:31 +00:00
Tom Lane 65e3ea7641 Increase the default value of default_statistics_target from 10 to 100,
and its maximum value from 1000 to 10000.  ALTER TABLE SET STATISTICS
similarly now allows a value up to 10000.  Per discussion.
2008-12-13 19:13:44 +00:00
Heikki Linnakangas a93b3b98cd Fix bug in the tsvector stats collection function, which caused a crash if
the sample contains just a one tsvector, containing only one lexeme.
2008-11-27 21:17:39 +00:00
Tom Lane 2b74d45c1b pg_do_encoding_conversion cannot return NULL (at least not unless the input
is NULL), so remove some useless tests for the case.
2008-11-10 15:18:40 +00:00
Teodor Sigaev 2a0083ede8 Improve headeline generation. Now headline can contain
several fragments a-la Google.

Sushant Sinha <sushant354@gmail.com>
2008-10-17 18:05:19 +00:00
Teodor Sigaev 906b7e5f6c Fix small bug in headline generation.
Patch from Sushant Sinha <sushant354@gmail.com>
http://archives.postgresql.org/pgsql-hackers/2008-07/msg00785.php
2008-10-17 17:27:46 +00:00
Tom Lane 4e57668da4 Create a selectivity estimation function for the text search @@ operator.
Jan Urbanski
2008-09-19 19:03:41 +00:00
Tom Lane 6f6d863258 Create a type-specific typanalyze routine for tsvector, which collects stats
on the most common individual lexemes in place of the mostly-useless default
behavior of counting duplicate tsvectors.  Future work: create selectivity
estimation functions that actually do something with these stats.

(Some other things we ought to look at doing: using the Lossy Counting
algorithm in compute_minimal_stats, and using the element-counting idea for
stats on regular arrays.)

Jan Urbanski
2008-07-14 00:51:46 +00:00
Tom Lane 30dc388a0d Fix a few places that were non-multibyte-safe in tsearch configuration file
parsing.  Per bug #4253 from Giorgio Valoti.
2008-06-19 16:52:24 +00:00
Tom Lane fbeb9da22b Improve error reporting for problems in text search configuration files
by installing an error context subroutine that will provide the file name
and line number for all errors detected while reading a config file.
Some of the reader routines were already doing that in an ad-hoc way for
errors detected directly in the reader, but it didn't help for problems
detected in subroutines, such as encoding violations.

Back-patch to 8.3 because 8.3 is where people will be trying to debug
configuration files.
2008-06-18 20:55:42 +00:00
Bruce Momjian 9de09c087d Move wchar2char() and char2wchar() from tsearch into /mb to be easier to
use for other modules;  also move pnstrdup().

Clean up code slightly.
2008-06-18 18:42:54 +00:00
Bruce Momjian dc69c0362f Move USE_WIDE_UPPER_LOWER define to c.h, and remove TS_USE_WIDE and use
USE_WIDE_UPPER_LOWER instead.
2008-06-17 16:09:06 +00:00
Tom Lane e6dbcb72fa Extend GIN to support partial-match searches, and extend tsquery to support
prefix matching using this facility.

Teodor Sigaev and Oleg Bartunov
2008-05-16 16:31:02 +00:00
Alvaro Herrera f8c4d7db60 Restructure some header files a bit, in particular heapam.h, by removing some
unnecessary #include lines in it.  Also, move some tuple routine prototypes and
macros to htup.h, which allows removal of heapam.h inclusion from some .c
files.

For this to work, a new header file access/sysattr.h needed to be created,
initially containing attribute numbers of system columns, for pg_dump usage.

While at it, make contrib ltree, intarray and hstore header files more
consistent with our header style.
2008-05-12 00:00:54 +00:00
Tom Lane 220db7ccd8 Simplify and standardize conversions between TEXT datums and ordinary C
strings.  This patch introduces four support functions cstring_to_text,
cstring_to_text_with_len, text_to_cstring, and text_to_cstring_buffer, and
two macros CStringGetTextDatum and TextDatumGetCString.  A number of
existing macros that provided variants on these themes were removed.

Most of the places that need to make such conversions now require just one
function or macro call, in place of the multiple notational layers that used
to be needed.  There are no longer any direct calls of textout or textin,
and we got most of the places that were using handmade conversions via
memcpy (there may be a few still lurking, though).

This commit doesn't make any serious effort to eliminate transient memory
leaks caused by detoasting toasted text objects before they reach
text_to_cstring.  We changed PG_GETARG_TEXT_P to PG_GETARG_TEXT_PP in a few
places where it was easy, but much more could be done.

Brendan Jurd and Tom Lane
2008-03-25 22:42:46 +00:00
Tom Lane 7953fdcd9e Add a CaseSensitive parameter to synonym dictionaries.
Simon Riggs
2008-03-10 03:01:28 +00:00
Teodor Sigaev 3b8bca335d Fix memory arrangement of tsquery after removing stop words. It causes
a unused memory holes in tsquery.

Per report by Richard Huxton <dev@archonet.com>.

It was working well because in fact tsquery->size is not used for any
kind of operation except comparing tsqueries. So, in HEAD it's enough to
fix to_tsquery function, but for previous version it's needed to
remove optimization in CompareTSQ to prevent requirement of renew all
stored tsquery.
2008-03-07 14:30:20 +00:00
Bruce Momjian 910bc51862 When text search string is too long, in error message report actual and
maximum number of bytes allowed.
2008-03-05 15:50:37 +00:00
Peter Eisentraut 0474dcb608 Refactor backend makefiles to remove lots of duplicate code 2008-02-19 10:30:09 +00:00
Peter Eisentraut a345dcd2f7 Observe errors in makefile 2008-02-18 16:04:32 +00:00
Tom Lane 716e8b8374 Fix RS_isRegis() to agree exactly with RS_compile()'s idea of what's a valid
regis.  Correct the latter's oversight that a bracket-expression needs to be
terminated.  Reduce the ereports to elogs, since they are now not expected to
ever be hit (thus addressing Alvaro's original complaint).
In passing, const-ify the string argument to RS_compile.
2008-01-21 02:46:11 +00:00
Teodor Sigaev cd42dd5a17 Fix core dump with buffer-overrun by too long infinitive. Add checking of using
fixed length arrays to prevent array's overrun. Per report by
Hannes Dorbath <light@theendofthetunnel.de> and comments by Tom.
2008-01-16 13:01:03 +00:00
Tom Lane deb7deda26 Tweak new error message to conform to style guidelines. 2008-01-15 18:22:47 +00:00
Teodor Sigaev f7807f1de8 Add check of headline method presence. Per report by Yoshiyuki Asaba <y-asaba@sraoss.co.jp> 2008-01-15 17:16:01 +00:00
Bruce Momjian 9098ab9e32 Update copyrights in source tree to 2008. 2008-01-01 19:46:01 +00:00
Peter Eisentraut f5f1355dc4 Wording improvements 2007-12-27 13:02:48 +00:00
Tom Lane bb0e3011f8 Make a cleanup pass over error reports in tsearch code. Use ereport
for user-facing errors, fix some poor choices of errcode, adhere to
message style guide.
2007-11-28 21:56:30 +00:00
Peter Eisentraut a238bd146d Proper capitalization of Ispell 2007-11-28 15:42:46 +00:00
Peter Eisentraut 2609345c85 Improve terminology 2007-11-28 13:30:36 +00:00
Bruce Momjian 43e082fc98 Change a stop word on the right-hand-side in the thesaurus file to be an
ERROR, not NOTICE.
2007-11-28 04:24:38 +00:00