mirror of
https://git.postgresql.org/git/postgresql.git
synced 2024-09-30 18:41:16 +02:00
806eb92c01
Jeff Janes Discussion: https://postgr.es/m/CAMkU=1xRcs_BUPzR0+V3WndaCAv0E_m3h6aUEJ8NF-sY1nnHsw@mail.gmail.com
<!-- doc/src/sgml/dict-int.sgml -->

<sect1 id="dict-int" xreflabel="dict_int">
 <title>dict_int</title>

 <indexterm zone="dict-int">
  <primary>dict_int</primary>
 </indexterm>

 <para>
  <filename>dict_int</filename> is an example of an add-on dictionary template
  for full-text search.  The motivation for this example dictionary is to
  control the indexing of integers (signed and unsigned), allowing such
  numbers to be indexed while preventing excessive growth in the number of
  unique words, which would greatly degrade search performance.
 </para>

 <para>
  This module is considered <quote>trusted</quote>, that is, it can be
  installed by non-superusers who have <literal>CREATE</literal> privilege
  on the current database.
 </para>

 <sect2>
  <title>Configuration</title>

  <para>
   The dictionary accepts three options:
  </para>

  <itemizedlist>
   <listitem>
    <para>
     The <literal>maxlen</literal> parameter specifies the maximum number of
     digits allowed in an integer word.  The default value is 6.
    </para>
   </listitem>
   <listitem>
    <para>
     The <literal>rejectlong</literal> parameter specifies whether an overlength
     integer should be truncated or ignored.  If <literal>rejectlong</literal> is
     <literal>false</literal> (the default), the dictionary returns the first
     <literal>maxlen</literal> digits of the integer.  If <literal>rejectlong</literal> is
     <literal>true</literal>, the dictionary treats an overlength integer as a stop
     word, so that it will not be indexed.  Note that this also means that
     such an integer cannot be searched for.
    </para>
   </listitem>
   <listitem>
    <para>
     The <literal>absval</literal> parameter specifies whether leading
     <quote><literal>+</literal></quote> or <quote><literal>-</literal></quote>
     signs should be removed from integer words.  The default
     is <literal>false</literal>.  When <literal>true</literal>, the sign is
     removed before <literal>maxlen</literal> is applied.
    </para>
   </listitem>
  </itemizedlist>
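
  <para>
   As a rough illustration of how the three options combine, the following
   sketch defines a separate dictionary from the
   <literal>intdict_template</literal> template and probes it with
   <function>ts_lexize</function>.  The dictionary name
   <literal>intdict_strict</literal> and the option values are hypothetical,
   chosen only for this example:

<programlisting>
-- intdict_strict is a hypothetical name; the option values are illustrative
CREATE TEXT SEARCH DICTIONARY intdict_strict (
    TEMPLATE = intdict_template,
    MAXLEN = 4,
    REJECTLONG = true,
    ABSVAL = true
);

-- four digits: within maxlen
SELECT ts_lexize('intdict_strict', '1234');

-- absval = true: the sign is removed before maxlen is applied
SELECT ts_lexize('intdict_strict', '-1234');

-- five digits: overlength, and rejectlong = true makes it a stop word
SELECT ts_lexize('intdict_strict', '12345');
</programlisting>

   <function>ts_lexize</function> reports a stop word as an empty array, so
   the last query should return <literal>{}</literal> rather than a truncated
   number.
  </para>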
 </sect2>

 <sect2>
  <title>Usage</title>

  <para>
   Installing the <literal>dict_int</literal> extension creates a text search
   template <literal>intdict_template</literal> and a dictionary <literal>intdict</literal>
   based on it, with the default parameters.  You can alter the
   parameters, for example

<programlisting>
mydb# ALTER TEXT SEARCH DICTIONARY intdict (MAXLEN = 4, REJECTLONG = true);
ALTER TEXT SEARCH DICTIONARY
</programlisting>

   or create new dictionaries based on the template.
  </para>
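
  <para>
   To check which options a dictionary currently carries, or to return one to
   its default, something along these lines should work.  This is a sketch
   that assumes the standard <structname>pg_ts_dict</structname> catalog and
   the documented behavior of <command>ALTER TEXT SEARCH DICTIONARY</command>,
   where naming an option without a value removes the stored setting:

<programlisting>
-- show the options currently stored for intdict
SELECT dictname, dictinitoption FROM pg_ts_dict WHERE dictname = 'intdict';

-- drop the stored MAXLEN setting so that the default applies again
ALTER TEXT SEARCH DICTIONARY intdict (MAXLEN);
</programlisting>
  </para>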

  <para>
   To test the dictionary, you can try

<programlisting>
mydb# select ts_lexize('intdict', '12345678');
 ts_lexize
-----------
 {123456}
</programlisting>

   but real-world usage will involve including it in a text search
   configuration as described in <xref linkend="textsearch"/>.
   That might look like this:

<programlisting>
ALTER TEXT SEARCH CONFIGURATION english
    ALTER MAPPING FOR int, uint WITH intdict;
</programlisting>

  </para>
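
  <para>
   To avoid modifying the built-in <literal>english</literal> configuration,
   one option is to work on a copy and inspect the result with
   <function>ts_debug</function>.  In this sketch the configuration name
   <literal>english_int</literal> and the sample text are made up for
   illustration:

<programlisting>
-- english_int is a hypothetical name for a private copy of the configuration
CREATE TEXT SEARCH CONFIGURATION english_int ( COPY = pg_catalog.english );

ALTER TEXT SEARCH CONFIGURATION english_int
    ALTER MAPPING FOR int, uint WITH intdict;

-- show which dictionary handles each token and the lexemes it produces
SELECT alias, token, dictionaries, lexemes
  FROM ts_debug('english_int', 'invoice 12345678 and -42');

-- build a tsvector through the new mapping
SELECT to_tsvector('english_int', 'invoice 12345678 and -42');
</programlisting>
  </para>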
 </sect2>

</sect1>