From b82323e05e57d7c4fb7a8eab9f27eb059d28309a Mon Sep 17 00:00:00 2001
From: Tom Lane
Date: Sat, 27 Nov 2004 00:01:02 +0000
Subject: [PATCH] This adds mention of my latest tweak to the tsearch2/pg_trgm integration.

It is much better to create a word list of unstemmed words than stemmed
ones.

Chris K-L
---
 contrib/pg_trgm/README.pg_trgm | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/contrib/pg_trgm/README.pg_trgm b/contrib/pg_trgm/README.pg_trgm
index ac2eb012de..608c30c455 100644
--- a/contrib/pg_trgm/README.pg_trgm
+++ b/contrib/pg_trgm/README.pg_trgm
@@ -100,11 +100,15 @@ Tsearch2 Integration
     The first step is to generate an auxiliary table containing all the
     unique words in the Tsearch2 index:
 
-        CREATE TABLE words AS
-        SELECT word FROM stat('SELECT vector FROM documents');
+        CREATE TABLE words AS SELECT word FROM
+          stat('SELECT to_tsvector(''simple'', bodytext) FROM documents');
 
-    Where 'documents' is the table that contains the Tsearch2 index
-    column 'vector', of type 'tsvector'.
+    Where 'documents' is a table that has a text field 'bodytext'
+    that TSearch2 is used to search. The use of the 'simple' dictionary
+    with the to_tsvector function, instead of just using the already
+    existing vector is to avoid creating a list of already stemmed
+    words. This way, only the original, unstemmed words are added
+    to the word list.
 
     Next, create a trigram index on the word column:
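To see why an unstemmed word list gives better suggestions, here is a minimal Python sketch that approximates pg_trgm's trigram similarity for single words. It assumes the behaviour described in the current pg_trgm documentation (lowercase the word, pad it with two leading spaces and one trailing space, and score two words as shared trigrams divided by the union of their trigrams), which may differ in detail from the 2004 code. The helpers `trigrams`, `similarity` and `best_match` are hypothetical, and the stemmed list is an assumed example of stemmer output, not something taken from Tsearch2.

```python
def trigrams(word: str) -> set[str]:
    """Approximate pg_trgm trigram extraction for one alphanumeric word
    (assumption: the real extension also splits on non-alphanumerics)."""
    padded = "  " + word.lower() + " "  # two leading pads, one trailing
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by the union of both trigram sets,
    assumed to mirror pg_trgm's similarity() for single words."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def best_match(query: str, words: list[str]) -> tuple[str, float]:
    """Return the word-list entry most similar to the query, with its score."""
    return max(((w, similarity(query, w)) for w in words), key=lambda p: p[1])


if __name__ == "__main__":
    unstemmed = ["happiness", "running", "connection"]
    stemmed = ["happi", "run", "connect"]  # hypothetical stemmer output
    for name, words in (("unstemmed", unstemmed), ("stemmed", stemmed)):
        word, score = best_match("hapiness", words)  # misspelled query
        print(f"{name}: {word} ({score:.2f})")
```

Run as a script, it should report happiness at about 0.73 for the unstemmed list but only happi at 0.25 for the stemmed list. That score is below pg_trgm's default similarity threshold of 0.3, and even if it matched, happi is not a word worth suggesting back to the user.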