<!-- doc/src/sgml/unaccent.sgml -->

<sect1 id="unaccent" xreflabel="unaccent">
 <title>unaccent</title>

 <indexterm zone="unaccent">
  <primary>unaccent</primary>
 </indexterm>

 <para>
  <filename>unaccent</> is a text search dictionary that removes accents
  (diacritic signs) from lexemes.
  It is a filtering dictionary: its output is always passed to the next
  dictionary in the list (if any), whereas an ordinary dictionary ends
  processing of a token once it recognizes it.  This allows
  accent-insensitive processing for full text search.
 </para>

 <para>
  The current implementation of <filename>unaccent</> cannot be used as a
  normalizing dictionary for the <filename>thesaurus</filename> dictionary.
 </para>

 <sect2>
  <title>Configuration</title>

  <para>
   An <literal>unaccent</> dictionary accepts the following options:
  </para>
  <itemizedlist>
   <listitem>
    <para>
     <literal>RULES</> is the base name of the file containing the list of
     translation rules.  This file must be stored in
     <filename>$SHAREDIR/tsearch_data/</> (where <literal>$SHAREDIR</> means
     the <productname>PostgreSQL</> installation's shared-data directory).
     Its name must end in <literal>.rules</> (which is not to be included in
     the <literal>RULES</> parameter).
    </para>
   </listitem>
  </itemizedlist>
  <para>
   The rules file has the following format:
  </para>
  <itemizedlist>
   <listitem>
    <para>
     Each line represents a pair: an accented character followed by the
     unaccented character that replaces it, separated by whitespace.  The
     first is translated into the second.  For example,
<programlisting>
&Agrave;        A
&Aacute;        A
&Acirc;        A
&Atilde;        A
&Auml;        A
&Aring;        A
&AElig;        A
</programlisting>
    </para>
   </listitem>
  </itemizedlist>

  <para>
   A more complete rules file, directly useful for most European
   languages, is provided as <filename>unaccent.rules</>.  It is placed
   in <filename>$SHAREDIR/tsearch_data/</> when the <filename>unaccent</>
   module is installed.
  </para>
 </sect2>

 <sect2>
  <title>Usage</title>

  <para>
   Installing the <literal>unaccent</> extension creates a text
   search template <literal>unaccent</> and a dictionary <literal>unaccent</>
   based on it.  The <literal>unaccent</> dictionary has the default
   parameter setting <literal>RULES='unaccent'</>, which makes it immediately
   usable with the standard <filename>unaccent.rules</> file.
   If you wish, you can alter the parameter, for example

<programlisting>
mydb=# ALTER TEXT SEARCH DICTIONARY unaccent (RULES='my_rules');
</programlisting>

   or create new dictionaries based on the template.
  </para>
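
  <para>
   As a sketch of the second option, a separate dictionary can be created
   from the template, which leaves the default <literal>unaccent</>
   dictionary unchanged.  The names <literal>my_unaccent</> and
   <literal>my_rules</> are hypothetical, and the example assumes that a file
   <filename>my_rules.rules</> already exists in
   <filename>$SHAREDIR/tsearch_data/</>:
<programlisting>
-- my_unaccent and my_rules are illustrative names, not shipped objects.
-- RULES gives the base name only; the .rules suffix is implied.
CREATE TEXT SEARCH DICTIONARY my_unaccent (
    TEMPLATE = unaccent,
    RULES = 'my_rules'
);
</programlisting>
   The new dictionary can then be named in <literal>ALTER TEXT SEARCH
   CONFIGURATION ... ALTER MAPPING</> in place of <literal>unaccent</>.
  </para>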

  <para>
   To test the dictionary, you can try:
<programlisting>
mydb=# SELECT ts_lexize('unaccent','H&ocirc;tel');
 ts_lexize
-----------
 {Hotel}
(1 row)
</programlisting>
  </para>

  <para>
   Here is an example showing how to insert the
   <filename>unaccent</> dictionary into a text search configuration:
<programlisting>
mydb=# CREATE TEXT SEARCH CONFIGURATION fr ( COPY = french );
mydb=# ALTER TEXT SEARCH CONFIGURATION fr
        ALTER MAPPING FOR hword, hword_part, word
        WITH unaccent, french_stem;
mydb=# SELECT to_tsvector('fr','H&ocirc;tels de la Mer');
    to_tsvector
-------------------
 'hotel':1 'mer':4
(1 row)

mydb=# SELECT to_tsvector('fr','H&ocirc;tel de la Mer') @@ to_tsquery('fr','Hotels');
 ?column?
----------
 t
(1 row)

mydb=# SELECT ts_headline('fr','H&ocirc;tel de la Mer',to_tsquery('fr','Hotels'));
      ts_headline
------------------------
 &lt;b&gt;H&ocirc;tel&lt;/b&gt; de la Mer
(1 row)
</programlisting>
  </para>
 </sect2>

 <sect2>
 <title>Functions</title>

 <para>
  The <function>unaccent()</> function removes accents (diacritic signs) from
  a given string.  It is essentially a wrapper around the
  <filename>unaccent</> dictionary, but it can be used outside normal
  text search contexts.
 </para>

 <indexterm>
  <primary>unaccent</primary>
 </indexterm>

<synopsis>
unaccent(<optional><replaceable class="PARAMETER">dictionary</replaceable>, </optional> <replaceable class="PARAMETER">string</replaceable>) returns <type>text</type>
</synopsis>

 <para>
  For example:
<programlisting>
SELECT unaccent('unaccent', 'H&ocirc;tel');
SELECT unaccent('H&ocirc;tel');
</programlisting>
 </para>
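
 <para>
  A minimal sketch of use outside text search, assuming the default
  <filename>unaccent.rules</> file is in effect: applying
  <function>unaccent()</> to both operands gives an accent-insensitive
  comparison.  The comparison is still case-sensitive.
<programlisting>
-- Both sides are unaccented before the comparison, so the accented and
-- unaccented spellings are treated as the same string.
SELECT unaccent('H&ocirc;tel') = unaccent('Hotel');
</programlisting>
 </para>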
 </sect2>

</sect1>