Commit Graph

171 Commits

Author SHA1 Message Date
Teodor Sigaev
dd92a8c33f Fix type in return value 2006-11-21 18:31:28 +00:00
Teodor Sigaev
419fe7cd1b Fix bug http://archives.postgresql.org/pgsql-bugs/2006-10/msg00258.php.
Fix string's length calculation for recoding, fix strlower() to avoid wrong
assumption about length of recoded string (was: recoded string is no greater
that source, it may not true for multibyte encodings)
Thanks to Thomas H. <me@alternize.com> and Magnus Hagander <mha@sollentuna.net>
2006-11-20 14:03:30 +00:00
Neil Conway
9d6f26325f Fix two typos. 2006-11-08 19:06:15 +00:00
Teodor Sigaev
092ed294fc New README, forgotten when docs was updated 2006-11-08 16:00:29 +00:00
Teodor Sigaev
bf028fa8a6 Add description of new features 2006-10-31 16:23:05 +00:00
Tom Lane
e378f82e00 Make use of qsort_arg in several places that were formerly using klugy
static variables.  This avoids any risk of potential non-reentrancy,
and in particular offers a much cleaner workaround for the Intel compiler
bug that was affecting ginutil.c.
2006-10-05 17:57:40 +00:00
Tom Lane
c48f2e3124 Improve error messages from to_tsquery per yesterday's discussion:
provide the bad input, and be sure to mention that we are talking about
a tsearch query.
2006-10-04 17:52:52 +00:00
Bruce Momjian
f99a569a2e pgindent run for 8.2. 2006-10-04 00:30:14 +00:00
Bruce Momjian
26ffa627ac Update tsearch2 README.
Robert Treat
2006-10-02 22:32:10 +00:00
Tom Lane
41dcc65c0e Rename the uninstall scripts for contrib/lo and contrib/tsearch2 to
match the convention that foo's uninstall script is uninstall_foo.sql.
Also, stop installing lo_test.sql, which really ought to be made into
a regression test anyway (though it's unclear how to avoid a dependency
on the current OID counter...)
2006-09-11 15:14:46 +00:00
Tom Lane
684ad6a92f Rename contrib contains/contained-by operators to @> and <@, per discussion. 2006-09-10 17:36:52 +00:00
Bruce Momjian
0c4f2894f9 Use '' rather than \' for literal single quotes in strings in
/contrib/tsearch2.

Teodor Sigaev
2006-09-02 22:03:30 +00:00
Teodor Sigaev
72a3582522 Add description of tsvector type layout 2006-08-29 13:57:34 +00:00
Teodor Sigaev
3a214ab0f1 Remove pos comparison in silly_cmp_tsvector(): it is not a semantically significant 2006-08-29 13:39:20 +00:00
Teodor Sigaev
9711782628 Fix incorrect length of lexemes in silly_cmp_tsvector() 2006-08-29 13:31:54 +00:00
Teodor Sigaev
74dbba701f Fix regression tests: after changing comparing function
order is changed.
2006-08-25 07:39:08 +00:00
Teodor Sigaev
8f91e2b607 Fix compare bug for tsvector: problem was in aligment. Per Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> and Phil Frost <indigo@bitglue.com> 2006-08-24 17:37:34 +00:00
Tom Lane
a7143b3088 Fix some makefiles that fail to yield good results from 'make -qp'.
This doesn't really matter for ordinary building of Postgres, but it's
useful for automated checks, such as my just-committed pgcheckdefines.
2006-07-15 03:33:14 +00:00
Tom Lane
ae643747b1 Fix a passel of recently-committed violations of the rule 'thou shalt
have no other gods before c.h'.  Also remove some demonstrably redundant
#include lines, mostly of <errno.h> which was added to c.h years ago.
2006-07-14 05:28:29 +00:00
Bruce Momjian
66c15dfda1 Adjust /contrib for new include file contents. 2006-07-13 16:57:31 +00:00
Bruce Momjian
ac230e7431 Alphabetically order reference to include files, "S"-"Z". 2006-07-11 18:26:11 +00:00
Bruce Momjian
0ff3461bcc Alphabetically order reference to include files, "N" - "S". 2006-07-11 17:26:59 +00:00
Teodor Sigaev
234163649e GIN improvements
- Replace sorted array of entries in maintenance_work_mem to binary tree,
  this should improve create performance.
- More precisely calculate allocated memory, eliminate leaks
  with user-defined extractValue()
- Improve wordings in tsearch2
2006-07-11 16:55:34 +00:00
Bruce Momjian
fa601357fb Sort reference of include files, "A" - "F". 2006-07-11 16:35:33 +00:00
Bruce Momjian
c5133e5920 Allow /contrib include files to compile on their own. 2006-07-10 22:06:11 +00:00
Teodor Sigaev
1f7ef548ec Changes
* new split algorithm (as proposed in http://archives.postgresql.org/pgsql-hackers/2006-06/msg00254.php)
  * possible call pickSplit() for second and below columns
  * add spl_(l|r)datum_exists to GIST_SPLITVEC -
    pickSplit should check its values to use already defined
    spl_(l|r)datum for splitting. pickSplit should set
    spl_(l|r)datum_exists to 'false' (if they was 'true') to
    signal to caller about using spl_(l|r)datum.
  * support for old pickSplit(): not very optimal
    but correct split
* remove 'bytes' field from GISTENTRY: in any case size of
  value is defined by it's type.
* split GIST_SPLITVEC to two structures: one for using in picksplit
  and second - for internal use.
* some code refactoring
* support of subsplit to rtree opclasses

TODO: add support of subsplit to contrib modules
2006-06-28 12:00:14 +00:00
Teodor Sigaev
04e9704b9e Now ispell dictionary can eat dictionaries in MySpell format,
used by OpenOffice. Dictionaries are placed at
http://lingucomponent.openoffice.org/spell_dic.html
Dictionary automatically recognizes format of files.

Warning. MySpell's format has limitation with compound
word support: it's impossible to mark affix as
compound-only affix. So for norwegian, german etc
languages it's recommended to use original ispell format.
For that reason I don't want to remove my2ispell
scripts, it's has workaround at least for norwegian language.
2006-06-09 13:25:59 +00:00
Teodor Sigaev
92bcb5abe0 Allow do not lexize words in substitution.
Docs will be submitted some later, now it's at
 http://www.sai.msu.su/~megera/oddmuse/index.cgi/Thesaurus_dictionary
2006-06-06 16:25:55 +00:00
Teodor Sigaev
a513ce2dff Fix wrong NOTICE/ERROR levels 2006-06-02 18:03:06 +00:00
Teodor Sigaev
efe1d427da Distinguish between stop-word recognized in thesaurus_lexize() 2006-06-02 17:55:40 +00:00
Teodor Sigaev
c7faf45160 Add more strict check of stop and non-recognized words,
allow only recognized words in thezaurus configuration file.
2006-06-02 15:35:42 +00:00
Teodor Sigaev
c269f0f1e2 fix comparison with SPI_processed 2006-05-31 14:53:41 +00:00
Teodor Sigaev
22505f4703 Add thesaurus dictionary which can replace N>0 lexemes by M>0 lexemes.
It required some changes in lexize algorithm, but interface with
dictionaries stays compatible with old dictionaries.

Funded by Georgia Public Library Service and LibLime, Inc.
2006-05-31 14:05:31 +00:00
Tom Lane
a0ffab351e Magic blocks don't do us any good unless we use 'em ... so install one
in every shared library.
2006-05-30 22:12:16 +00:00
Bruce Momjian
19892feb3c Back out \' change for tsearch2, broke regression tests. 2006-05-19 04:39:47 +00:00
Bruce Momjian
cc84163fa9 Use SQL standard '' rather than \' in /contrib. Backpatch to 8.1.X. 2006-05-19 02:38:47 +00:00
Teodor Sigaev
8a3631f8d8 GIN: Generalized Inverted iNdex.
text[], int4[], Tsearch2 support for GIN.
2006-05-02 11:28:56 +00:00
Teodor Sigaev
e30df619cd Fix stupid mistake in rank_cd_def cleanup 2006-04-10 09:56:52 +00:00
Bruce Momjian
f3d99d160d Add CVS tag lines to files that were lacking them. 2006-03-11 04:38:42 +00:00
Teodor Sigaev
38c4fe87ac Significantly improve ranking:
1) rank_cd now use weight of lexemes
2) rank_cd and rank can use any combination of normalization methods:
        no normalization
        normalization by log(length of document)
        -----/------- by length of document
        -----/------- by number of unique word in document
        -----/------- by log(number of unique word in document)
        -----/------- by number of covers (only rank_cd)

Improve cover's search.

TODO: changes in documentation
2006-03-02 19:07:19 +00:00
Neil Conway
8e5a10d46c This patch makes the error message strings throughout the backend
more compliant with the error message style guide. In particular,
errdetail should begin with a capital letter and end with a period,
whereas errmsg should not. I also fixed a few related issues in
passing, such as fixing the repeated misspelling of "lexeme" in
contrib/tsearch2 (per Tom's suggestion).
2006-03-01 06:30:32 +00:00
Peter Eisentraut
7f4f42fa10 Clean up CREATE FUNCTION syntax usage in contrib and elsewhere, in
particular get rid of single quotes around language names and old WITH ()
construct.
2006-02-27 16:09:50 +00:00
Teodor Sigaev
dde9457294 Fixing and improve compound word support. This changes cannot be applied to
previous version iwthout recreating tsvector fields...

Thanks to Alexander Presber <aljoscha@weisshuhn.de> to discover a problem.
2006-02-20 17:51:05 +00:00
Tom Lane
b35fdaaa1a Clean up some signedness warnings. 2006-02-10 15:57:58 +00:00
Teodor Sigaev
01f2172ec1 Allow "'" symbol in affixes ("'s" affix in english): it was diallowed during
multibyte support work.
Add line number to error output during affix file parsing.
2006-02-10 12:56:14 +00:00
Teodor Sigaev
011c520cb6 renew output of regression test accordingly to
http://archives.postgresql.org/pgsql-committers/2006-02/msg00089.php
2006-02-10 11:18:40 +00:00
Teodor Sigaev
46a25ce6a9 1 Fix bug with very short word: prefix and suffix might be overlapped,
sorry but fix can't be applyed to previous version: it's require
  refill tsvector...
2 Small optimize of load time for huge dictionaries
3 use palloc instead of malloc during load dict file
2006-02-09 18:04:20 +00:00
Teodor Sigaev
a6fefc866c Check number of affixes to prevent core dump with zero number of affixes 2006-02-06 15:45:34 +00:00
Teodor Sigaev
5e2707c45f Snowball multibyte. It's a pity, but snowball sources is very diferent for multibyte and
singlebyte encodings, so we should have snowball for every encodings.

I hope that finalize multibyte support work in tsearch2, but testing is needed...
2006-01-27 16:32:31 +00:00
Teodor Sigaev
80324fb1e3 Fix typeing as Tom suggest 2006-01-23 14:24:06 +00:00