postgresql/doc/src/sgml/dict-xsyn.sgml

150 lines
4.3 KiB
Plaintext

<!-- doc/src/sgml/dict-xsyn.sgml -->
<sect1 id="dict-xsyn" xreflabel="dict_xsyn">
<title>dict_xsyn &mdash; example synonym full-text search dictionary</title>
<indexterm zone="dict-xsyn">
<primary>dict_xsyn</primary>
</indexterm>
<para>
<filename>dict_xsyn</filename> (Extended Synonym Dictionary) is an example of an
add-on dictionary template for full-text search. This dictionary type
replaces words with groups of their synonyms, and so makes it possible to
search for a word using any of its synonyms.
</para>
<sect2 id="dict-xsyn-config">
<title>Configuration</title>
<para>
A <literal>dict_xsyn</literal> dictionary accepts the following options:
</para>
<itemizedlist>
<listitem>
<para>
<literal>matchorig</literal> controls whether the original word is accepted by
the dictionary. Default is <literal>true</literal>.
</para>
</listitem>
<listitem>
<para>
<literal>matchsynonyms</literal> controls whether the synonyms are
accepted by the dictionary. Default is <literal>false</literal>.
</para>
</listitem>
<listitem>
<para>
<literal>keeporig</literal> controls whether the original word is included in
the dictionary's output. Default is <literal>true</literal>.
</para>
</listitem>
<listitem>
<para>
<literal>keepsynonyms</literal> controls whether the synonyms are included in
the dictionary's output. Default is <literal>true</literal>.
</para>
</listitem>
<listitem>
<para>
<literal>rules</literal> is the base name of the file containing the list of
synonyms. This file must be stored in
<filename>$SHAREDIR/tsearch_data/</filename> (where <literal>$SHAREDIR</literal> means
the <productname>PostgreSQL</productname> installation's shared-data directory).
Its name must end in <literal>.rules</literal> (which is not to be included in
the <literal>rules</literal> parameter).
</para>
</listitem>
</itemizedlist>
<para>
The rules file has the following format:
</para>
<itemizedlist>
<listitem>
<para>
Each line represents a group of synonyms for a single word, which is
given first on the line. Synonyms are separated by whitespace, thus:
<programlisting>
word syn1 syn2 syn3
</programlisting>
</para>
</listitem>
<listitem>
<para>
The sharp (<literal>#</literal>) sign is a comment delimiter. It may appear at
any position in a line. The rest of the line will be skipped.
</para>
</listitem>
</itemizedlist>
<para>
Look at <filename>xsyn_sample.rules</filename>, which is installed in
<filename>$SHAREDIR/tsearch_data/</filename>, for an example.
</para>
</sect2>
<sect2 id="dict-xsyn-usage">
<title>Usage</title>
<para>
Installing the <literal>dict_xsyn</literal> extension creates a text search
template <literal>xsyn_template</literal> and a dictionary <literal>xsyn</literal>
based on it, with default parameters. You can alter the
parameters, for example
<programlisting>
mydb# ALTER TEXT SEARCH DICTIONARY xsyn (RULES='my_rules', KEEPORIG=false);
ALTER TEXT SEARCH DICTIONARY
</programlisting>
or create new dictionaries based on the template.
</para>
<para>
To test the dictionary, you can try
<programlisting>
mydb=# SELECT ts_lexize('xsyn', 'word');
ts_lexize
-----------------------
{syn1,syn2,syn3}
mydb# ALTER TEXT SEARCH DICTIONARY xsyn (RULES='my_rules', KEEPORIG=true);
ALTER TEXT SEARCH DICTIONARY
mydb=# SELECT ts_lexize('xsyn', 'word');
ts_lexize
-----------------------
{word,syn1,syn2,syn3}
mydb# ALTER TEXT SEARCH DICTIONARY xsyn (RULES='my_rules', KEEPORIG=false, MATCHSYNONYMS=true);
ALTER TEXT SEARCH DICTIONARY
mydb=# SELECT ts_lexize('xsyn', 'syn1');
ts_lexize
-----------------------
{syn1,syn2,syn3}
mydb# ALTER TEXT SEARCH DICTIONARY xsyn (RULES='my_rules', KEEPORIG=true, MATCHORIG=false, KEEPSYNONYMS=false);
ALTER TEXT SEARCH DICTIONARY
mydb=# SELECT ts_lexize('xsyn', 'syn1');
ts_lexize
-----------------------
{word}
</programlisting>
Real-world usage will involve including it in a text search
configuration as described in <xref linkend="textsearch"/>.
That might look like this:
<programlisting>
ALTER TEXT SEARCH CONFIGURATION english
ALTER MAPPING FOR word, asciiword WITH xsyn, english_stem;
</programlisting>
</para>
</sect2>
</sect1>