Commit Graph

21 Commits

Author SHA1 Message Date
Teodor Sigaev 22505f4703 Add thesaurus dictionary which can replace N>0 lexemes by M>0 lexemes.
It required some changes in lexize algorithm, but interface with
dictionaries stays compatible with old dictionaries.

Funded by Georgia Public Library Service and LibLime, Inc.
2006-05-31 14:05:31 +00:00
Teodor Sigaev 8a3631f8d8 GIN: Generalized Inverted iNdex.
text[], int4[], Tsearch2 support for GIN.
2006-05-02 11:28:56 +00:00
Teodor Sigaev 38c4fe87ac Significantly improve ranking:
1) rank_cd now use weight of lexemes
2) rank_cd and rank can use any combination of normalization methods:
        no normalization
        normalization by log(length of document)
        -----/------- by length of document
        -----/------- by number of unique word in document
        -----/------- by log(number of unique word in document)
        -----/------- by number of covers (only rank_cd)

Improve cover's search.

TODO: changes in documentation
2006-03-02 19:07:19 +00:00
Teodor Sigaev 011c520cb6 renew output of regression test accordingly to
http://archives.postgresql.org/pgsql-committers/2006-02/msg00089.php
2006-02-10 11:18:40 +00:00
Teodor Sigaev 5e2707c45f Snowball multibyte. It's a pity, but snowball sources is very diferent for multibyte and
singlebyte encodings, so we should have snowball for every encodings.

I hope that finalize multibyte support work in tsearch2, but testing is needed...
2006-01-27 16:32:31 +00:00
Teodor Sigaev c52795d18a Text parser rewritten:
- supports multibyte encodings
        - more strict rules for lexemes
        - flex isn't used
Add:
        - tsquery plainto_tsquery(text)
          Function makes tsquery from plain text.
        - &&, ||, !! operation for tsquery for combining
          tsquery from it's parts:  'foo & bar' || 'asd' => 'foo & bar | asd'
2005-11-21 12:27:57 +00:00
Teodor Sigaev 134bed8089 Fix rwrite(ARRAY) on 64-bit boxes:
Instead of getting elements of array manually call deconstruct_array
2005-11-09 09:26:04 +00:00
Teodor Sigaev 0645663e6c New features for tsearch2:
1 Comparison operation for tsquery
2 Btree index on tsquery
3 numnode(tsquery) - returns 'length' of tsquery
4 tsquery @ tsquery, tsquery ~ tsquery - contains, contained for tsquery.
  Note: They don't gurantee exact result, only MAY BE, so it
  useful only for speed up rewrite functions
5 GiST index support for @,~
6 rewrite():
        select rewrite(orig, what, to);
        select rewrite(ARRAY[orig, what, to]) from tsquery_table;
        select rewrite(orig, 'select what, to from tsquery_table;');
7 significantly improve cover algorithm
2005-11-08 17:08:46 +00:00
Teodor Sigaev 21b748e76a 1 Fix problem with lost precision in rank with OR-ed lexemes
2 Allow tsquery_in to input void tsquery: resolve dump/restore problem with tsquery
2005-10-28 13:05:06 +00:00
Bruce Momjian bb3cce4ec9 Add E'' syntax so eventually normal strings can treat backslashes
literally.

Add GUC variables:

        "escape_string_warning" - warn about backslashes in non-E strings
        "escape_string_syntax" - supports E'' syntax?
        "standard_compliant_strings" - treats backslashes literally in ''

Update code to use E'' when escapes are used.
2005-06-26 03:04:37 +00:00
Tom Lane b04e70b11e Adjust tsearch2.sql to avoid use of COPY FROM STDIN, so as to
simplify life for the Win32 installer.  Per Dave Page.
2004-09-14 03:58:54 +00:00
Teodor Sigaev bb89237531 1 Eliminate duplicate field HLWORD->skip
2 Rework support for html tags in parser
3 add HighlightAll to headline function for generating highlighted
  whole text with saved html tags
2004-06-28 16:19:09 +00:00
Teodor Sigaev 09bc52fe73 Fix stupid bug in installcheck 2004-06-23 09:43:43 +00:00
Teodor Sigaev a6ea6457fa Stat function now can show statistics per weight of lexemes 2004-05-28 15:36:49 +00:00
Teodor Sigaev eebdfcdbe6 1 Minimize memory allocation for void (but not null) value.
2 Add silly ordering for ts_vector to aim grouping, union, except etc. Don't use BTree opclass (tsvector_ops).
2004-03-25 16:56:10 +00:00
Teodor Sigaev 0b1ee9b5a3 fix hlfinditem function. Thanks to "Stphane Bidoul" <stephane.bidoul@softwareag.com>.
The 'word' variable there is initialised from
the prs->words array, but immediately after,
that array may be reallocated, thus leaving
word pointing to unallocated memory.
2003-09-22 13:32:33 +00:00
Teodor Sigaev 61366a9503 More accuracy works with stopwords in queries 2003-08-28 12:23:24 +00:00
Teodor Sigaev dd2870f76f Add ts_debug function for debugging configurations 2003-08-06 09:19:21 +00:00
Tom Lane 6ed071bca5 Update contrib regression tests for recent error message editing. 2003-08-01 02:38:09 +00:00
Teodor Sigaev 8f146a9077 Fix output to psql:tsearch2.sql:13: NOTICE: ... "pg_ts_dict_pkey" 2003-07-21 15:15:19 +00:00
Teodor Sigaev b88605337e tsearch2 module 2003-07-21 10:27:44 +00:00