This adds mention of my latest tweak to the tsearch2/pg_trgm
integration.  It is much better to create a word list of unstemmed words
than stemmed ones.

Chris K-L
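
To illustrate why an unstemmed word list matters here, the following is a minimal Python sketch, not pg_trgm's actual code. It assumes a trigram scheme that pads each word with two leading spaces and one trailing space and scores similarity as shared trigrams over total distinct trigrams. The misspelling "integraton" and the stem "integr" (roughly what an English stemmer might store for "integration") are hypothetical examples, not taken from the commit.

```python
def trigrams(word):
    """Return the set of 3-character substrings of a padded, lowercased word."""
    # Assumed padding: two spaces in front, one behind.
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both words."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


misspelled = "integraton"    # hypothetical user typo
unstemmed = "integration"    # what a 'simple' word list would hold
stemmed = "integr"           # assumed stem, for illustration only

print(round(similarity(misspelled, unstemmed), 2))  # 0.64
print(round(similarity(misspelled, stemmed), 2))    # 0.5
```

Under these assumptions the unstemmed entry scores higher, and it is also a real word that could be shown back to a user, whereas a stem such as "integr" is not.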
Tom Lane 2004-11-27 00:01:02 +00:00
parent c2e5631760
commit b82323e05e
1 changed files with 8 additions and 4 deletions

@@ -100,11 +100,15 @@ Tsearch2 Integration
 The first step is to generate an auxiliary table containing all
 the unique words in the Tsearch2 index:
-CREATE TABLE words AS
-SELECT word FROM stat('SELECT vector FROM documents');
+CREATE TABLE words AS SELECT word FROM
+stat('SELECT to_tsvector(''simple'', bodytext) FROM documents');
-Where 'documents' is the table that contains the Tsearch2 index
-column 'vector', of type 'tsvector'.
+Where 'documents' is a table that has a text field 'bodytext'
+that TSearch2 is used to search. The use of the 'simple' dictionary
+with the to_tsvector function, instead of just using the already
+existing vector is to avoid creating a list of already stemmed
+words. This way, only the original, unstemmed words are added
+to the word list.
 Next, create a trigram index on the word column: