Commit Graph

30 Commits

Author SHA1 Message Date
Neil Conway
8e5a10d46c This patch makes the error message strings throughout the backend
more compliant with the error message style guide. In particular,
errdetail should begin with a capital letter and end with a period,
whereas errmsg should not. I also fixed a few related issues in
passing, such as fixing the repeated misspelling of "lexeme" in
contrib/tsearch2 (per Tom's suggestion).
2006-03-01 06:30:32 +00:00
Teodor Sigaev
dde9457294 Fix and improve compound word support. These changes cannot be applied to
previous versions without recreating tsvector fields...

Thanks to Alexander Presber <aljoscha@weisshuhn.de> for discovering the problem.
2006-02-20 17:51:05 +00:00
Teodor Sigaev
01f2172ec1 Allow "'" symbol in affixes ("'s" affix in English): it was disallowed during
multibyte support work.
Add line number to error output during affix file parsing.
2006-02-10 12:56:14 +00:00
Teodor Sigaev
46a25ce6a9 1 Fix bug with very short words: prefix and suffix might overlap.
Sorry, but the fix can't be applied to previous versions: it requires
  refilling tsvector...
2 Small optimization of load time for huge dictionaries
3 Use palloc instead of malloc while loading the dict file
2006-02-09 18:04:20 +00:00
Teodor Sigaev
a6fefc866c Check number of affixes to prevent core dump with zero number of affixes 2006-02-06 15:45:34 +00:00
Teodor Sigaev
7ac8a4be89 Multibyte encodings support for ISpell dictionary 2005-12-21 13:05:49 +00:00
Teodor Sigaev
cb4ea994c6 Improve support of multibyte encoding:
- tsvector_(in|out)
- tsquery_(in|out)
- to_tsvector
- to_tsquery, plainto_tsquery
- 'simple' dictionary
2005-12-12 11:10:12 +00:00
Bruce Momjian
1dc3498251 Standard pgindent run for 8.1. 2005-10-15 02:49:52 +00:00
Tom Lane
8a65b820e2 Suppress signed-vs-unsigned-char warnings in contrib. 2005-09-24 19:14:05 +00:00
Bruce Momjian
21634e513f Add extra argument for new pg_regexec API. 2005-07-10 18:31:59 +00:00
Tom Lane
c0e0d3e2e9 Avoid unnecessary dependence on u_int16_t, per buildfarm failure.
(It doesn't compile on HPUX either...)
2005-01-26 18:49:39 +00:00
Teodor Sigaev
324300bc7c Improve support of agglutinative languages (queries with compound words).
regression=# select to_tsquery( '\'fotballklubber\'');
                   to_tsquery
------------------------------------------------
 'fotball' & 'klubb' | 'fot' & 'ball' & 'klubb'
(1 row)

So, the dictionary interface changed: the lexize method of a dictionary should return
a pointer to an array of TSLexeme structs instead of char**. The last element should
have TSLexeme->lexeme == NULL.

typedef struct {
        /* number of the split variant this lexeme belongs to, for example
                Word 'fotballklubber' (Norwegian) can be split in two ways:
                ( fotball, klubb ) and ( fot, ball, klubb ). So, dictionary
                should return:
                nvariant        lexeme
                1               fotball
                1               klubb
                2               fot
                2               ball
                2               klubb

        */
        uint16  nvariant;

        /* currently unused */
        uint16  flags;

        /* C-string */
        char    *lexeme;
} TSLexeme;
2005-01-25 15:24:38 +00:00
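The array layout described in the commit above can be illustrated with a minimal sketch of a consumer. It assumes only what the message states: the array ends at the first element whose lexeme is NULL, and consecutive elements with the same nvariant belong to one split. The helper name format_variants is hypothetical, and uint16_t stands in for PostgreSQL's uint16 so the code compiles outside the backend. It is not the tsearch2 implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Mirrors the struct from the commit message; uint16_t replaces
   PostgreSQL's uint16 (an assumption made for standalone compilation). */
typedef struct {
    uint16_t nvariant;  /* which split variant this lexeme belongs to */
    uint16_t flags;     /* currently unused */
    char    *lexeme;    /* C-string; NULL marks the end of the array */
} TSLexeme;

/* Hypothetical helper: render a NULL-terminated TSLexeme array as
   'a' & 'b' | 'c' & 'd'. Lexemes of the same variant are joined with
   " & ", and a change of nvariant starts a new alternative with " | ".
   Returns 0 on success, -1 if the output buffer is too small. */
static int
format_variants(const TSLexeme *lex, char *out, size_t cap)
{
    size_t used = 0;
    const TSLexeme *p;

    assert(lex != NULL && out != NULL && cap > 0);
    out[0] = '\0';

    for (p = lex; p->lexeme != NULL; p++) {
        const char *sep = "";
        int n;

        /* No separator before the first lexeme. */
        if (p != lex)
            sep = (p->nvariant == p[-1].nvariant) ? " & " : " | ";

        n = snprintf(out + used, cap - used, "%s'%s'", sep, p->lexeme);
        if (n < 0 || (size_t) n >= cap - used)
            return -1;  /* encoding error or truncated output */
        used += (size_t) n;
    }
    return 0;
}
```

Fed the five rows from the comment in the struct (fotball, klubb with nvariant 1; fot, ball, klubb with nvariant 2) followed by a terminating element with a NULL lexeme, the helper builds a string of the same shape as the to_tsquery output quoted in the commit message.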
Teodor Sigaev
5b354d2c7e Fixes:
1 Report an error message instead of doing nothing in case of an error in regex
2 Malloced storage for mask, find and repl parts of Affix. These parts may be
  quite large in real life (for example in Czech, thanks to moje <moje@kalhotky.net>)
2005-01-11 16:07:55 +00:00
Bruce Momjian
b6b71b85bc Pgindent run for 8.0. 2004-08-29 05:07:03 +00:00
Teodor Sigaev
df9d87f608 Previous commit wasn't complete... 2004-06-23 11:29:58 +00:00
Teodor Sigaev
de55c0cef6 1 Fix affixes with void replacement (AFAIK, it's only Russian)
2 Optimize regex execution
2004-06-23 11:06:11 +00:00
Teodor Sigaev
7cb55d21ed Fix memory leak with pg_regexec 2004-05-31 13:55:19 +00:00
Teodor Sigaev
d222bb4d5e Fix memory leak with pg_regcomp 2004-05-31 13:52:57 +00:00
Teodor Sigaev
11864ab657 Win32 related patch by Darko Prenosil. Small correction by Teodor 2004-05-31 13:29:43 +00:00
Tom Lane
a90b2a035f Suppress 'uninitialized variable' warning emitted by some (not all)
versions of gcc.  The code is correct AFAICS, but it requires slightly
more analysis than usual to see that the variable can't be used uninitialized.
2004-05-07 13:09:12 +00:00
Tom Lane
0bd61548ab Solve the 'Turkish problem' with undesirable locale behavior for case
conversion of basic ASCII letters.  Remove all uses of strcasecmp and
strncasecmp in favor of new functions pg_strcasecmp and pg_strncasecmp;
remove most but not all direct uses of toupper and tolower in favor of
pg_toupper and pg_tolower.  These functions use the same notions of
case folding already developed for identifier case conversion.  I left
the straight locale-based folding in place for situations where we are
just manipulating user data and not trying to match it to built-in
strings --- for example, the SQL upper() function is still locale
dependent.  Perhaps this will prove not to be what's wanted, but at
the moment we can initdb and pass regression tests in Turkish locale.
2004-05-07 00:24:59 +00:00
Tom Lane
47fe0517fc Fix some portability issues (reliance on gcc-isms). 2004-04-01 23:44:38 +00:00
Teodor Sigaev
125d69cd9b Fix signed char in comparison and check memory allocation 2003-12-18 19:27:53 +00:00
Teodor Sigaev
565dc5d1ae Fix integer types to use definition from c.h. Per bug report by Patrick Boulay <patrick.boulay@medrium.com> 2003-12-10 15:54:58 +00:00
Teodor Sigaev
6de3fe3c0d Avoid strndup conflict with glibc 2003-12-04 12:21:11 +00:00
Teodor Sigaev
cabdf460d3 Fix use of free instead of pfree 2003-11-28 12:09:02 +00:00
Teodor Sigaev
c63c1946a2 Optimize. Improve ispell support for compound words. This work was sponsored by ABC Startsiden AS. 2003-11-17 17:34:35 +00:00
Bruce Momjian
089003fb46 pgindent run. 2003-08-04 00:43:34 +00:00
Tom Lane
8fd5b3ed67 Error message editing in contrib (mostly by Joe Conway --- thanks Joe!) 2003-07-24 17:52:50 +00:00
Teodor Sigaev
b88605337e tsearch2 module 2003-07-21 10:27:44 +00:00