<!-- $Header: /cvsroot/pgsql/doc/src/sgml/charset.sgml,v 2.14 2001/11/12 19:19:39 petere Exp $ -->
<chapter id="charset">
|
|
|
|
<title>Localization</>
|
|
|
|
|
|
|
|
<abstract>
|
|
|
|
<para>
|
|
|
|
Describes the available localization features from the point of
|
|
|
|
view of the administrator.
|
|
|
|
</para>
|
|
|
|
</abstract>
|
2000-09-12 07:37:09 +02:00
|
|
|
|
|
|
|
<para>
|
2000-09-30 18:58:20 +02:00
|
|
|
<productname>PostgreSQL</productname> supports localization with
|
|
|
|
three approaches:
|
2000-09-12 07:37:09 +02:00
|
|
|
|
|
|
|
<itemizedlist>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
2000-09-30 18:58:20 +02:00
|
|
|
Using the locale features of the operating system to provide
|
|
|
|
locale-specific collation order, number formatting, and other
|
|
|
|
aspects.
|
2000-09-12 07:37:09 +02:00
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Using explicit multiple-byte character sets defined in the
|
2000-09-30 18:58:20 +02:00
|
|
|
<productname>PostgreSQL</productname> server to support languages
|
|
|
|
that require more characters than will fit into a single byte,
|
|
|
|
and to provide character set recoding between client and server.
|
|
|
|
The number of supported character sets is fixed at the time the
|
|
|
|
server is compiled, and internal operations such as string
|
|
|
|
comparisons require expansion of each character into a 32-bit
|
|
|
|
word.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Single-byte character recoding provides a more lightweight
solution for users of multiple, yet single-byte, character sets.
|
2000-09-12 07:37:09 +02:00
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</itemizedlist>
|
|
|
|
</para>
|
<sect1 id="locale">
|
|
|
|
<title>Locale Support</title>
|
|
|
|
|
2001-11-12 20:19:39 +01:00
|
|
|
<indexterm zone="locale"><primary>locale</></>
|
|
|
|
|
2000-09-30 18:58:20 +02:00
|
|
|
<para>
|
|
|
|
<firstterm>Locale</> support refers to an application respecting
|
|
|
|
cultural preferences regarding alphabets, sorting, number
|
|
|
|
formatting, etc. <productname>PostgreSQL</> uses the standard ISO
|
2001-09-10 01:52:12 +02:00
|
|
|
C and <acronym>POSIX</acronym>-like locale facilities provided by the server operating
|
2001-01-19 05:47:50 +01:00
|
|
|
system. For additional information refer to the documentation of your
|
2000-09-30 18:58:20 +02:00
|
|
|
system.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<sect2>
|
|
|
|
<title>Overview</>
|
|
|
|
|
|
|
|
<para>
|
2001-01-19 05:47:50 +01:00
|
|
|
Locale support is not built into <productname>PostgreSQL</> by
|
2000-09-30 18:58:20 +02:00
|
|
|
default; to enable it, supply the <option>--enable-locale</> option
|
|
|
|
to the <filename>configure</> script:
|
|
|
|
<informalexample>
|
|
|
|
<screen>
|
|
|
|
<prompt>$ </><userinput>./configure --enable-locale</>
|
|
|
|
</screen>
|
|
|
|
</informalexample>
|
|
|
|
Locale support only affects the server; all clients are compatible
|
|
|
|
with servers with or without locale support.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The information about which particular cultural rules to use is
|
|
|
|
determined by standard environment variables. If you are getting
|
|
|
|
localized behavior from other programs you probably have them set
|
|
|
|
up already. The simplest way to set the localization information
|
|
|
|
is the <envar>LANG</> variable, for example:
|
|
|
|
<programlisting>
|
|
|
|
export LANG=sv_SE
|
|
|
|
</programlisting>
|
|
|
|
This sets the locale to Swedish (<literal>sv</>) as spoken in
|
|
|
|
Sweden (<literal>SE</>). Other possibilities might be
|
|
|
|
<literal>en_US</> (U.S. English) and <literal>fr_CA</> (Canadian
French). If more than one character set can be useful for a locale
|
|
|
|
then the specifications look like this:
|
|
|
|
<literal>cs_CZ.ISO8859-2</>. What locales are available under what
|
|
|
|
names on your system depends on what was provided by the operating
|
|
|
|
system vendor and what was installed.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Occasionally it is useful to mix rules from several locales, e.g.,
use U.S. collation rules but Spanish messages. To do that, a set of
environment variables exists, each of which overrides the default of
<envar>LANG</> for a particular category:
|
|
|
|
|
|
|
|
<informaltable>
|
|
|
|
<tgroup cols="2">
|
|
|
|
<tbody>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><envar>LC_COLLATE</></>
|
2000-09-30 18:58:20 +02:00
|
|
|
<entry>String sort order</>
|
|
|
|
</row>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><envar>LC_CTYPE</></>
|
2001-04-20 17:52:33 +02:00
|
|
|
<entry>Character classification (What is a letter? The upper-case equivalent?)</>
|
2000-09-30 18:58:20 +02:00
|
|
|
</row>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><envar>LC_MESSAGES</></>
|
2000-09-30 18:58:20 +02:00
|
|
|
<entry>Language of messages</>
|
|
|
|
</row>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><envar>LC_MONETARY</></>
|
2000-09-30 18:58:20 +02:00
|
|
|
<entry>Formatting of currency amounts</>
|
|
|
|
</row>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><envar>LC_NUMERIC</></>
|
2000-09-30 18:58:20 +02:00
|
|
|
<entry>Formatting of numbers</>
|
|
|
|
</row>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><envar>LC_TIME</></>
|
2000-09-30 18:58:20 +02:00
|
|
|
<entry>Formatting of dates and times</>
|
|
|
|
</row>
|
|
|
|
</tbody>
|
|
|
|
</tgroup>
|
|
|
|
</informaltable>
|
|
|
|
|
|
|
|
<envar>LC_MESSAGES</> only affects the messages that come from the
|
|
|
|
operating system, not <productname>PostgreSQL</>.
|
|
|
|
</para>
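<para>
 The override rule can be sketched in C as follows. This is only an
 illustration, not code from <productname>PostgreSQL</productname>; the
 function name <function>effective_locale</function> is hypothetical. It
 also assumes the usual POSIX rule that <envar>LC_ALL</envar>, a variable
 not described above, takes precedence over both the category variable
 and <envar>LANG</envar>, and that the <literal>C</literal> locale applies
 when nothing is set.
</para>

```c
#include <stddef.h>

/* True if an environment value is set and non-empty. */
static int
is_set(const char *value)
{
    return value != NULL && value[0] != '\0';
}

/*
 * Hypothetical helper: pick the locale name that applies to one category.
 * lc_all, lc_category and lang stand for the values of LC_ALL, the
 * category's own variable (for example LC_COLLATE) and LANG. Pass NULL
 * for a variable that is unset.
 */
const char *
effective_locale(const char *lc_all, const char *lc_category,
                 const char *lang)
{
    if (is_set(lc_all))
        return lc_all;          /* assumed POSIX rule: LC_ALL wins */
    if (is_set(lc_category))
        return lc_category;     /* category variable overrides LANG */
    if (is_set(lang))
        return lang;            /* general default */
    return "C";                 /* nothing set: behave as the C locale */
}
```

<para>
 In a real program the three arguments would typically come from
 <function>getenv()</function>. With <envar>LANG</envar> set to
 <literal>sv_SE</literal> and <envar>LC_COLLATE</envar> set to
 <literal>en_US</literal>, the sketch selects <literal>en_US</literal>
 for collation, while categories without their own variable fall back to
 <literal>sv_SE</literal>.
</para>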
|
|
|
|
|
|
|
|
<para>
|
|
|
|
If you want the system to behave as if it had no locale support,
|
|
|
|
use the special locale <literal>C</> or <literal>POSIX</>, or
|
|
|
|
simply unset all locale related variables.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
2001-01-19 05:47:50 +01:00
|
|
|
Note that the locale behavior is determined by the environment
|
|
|
|
variables seen by the server, not by the environment of any client.
|
|
|
|
Therefore, be careful to set these variables before starting the
|
|
|
|
postmaster.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The <envar>LC_COLLATE</> and <envar>LC_CTYPE</> variables affect the
|
|
|
|
sort order of indexes. Therefore, these values must be kept fixed
|
|
|
|
for any particular database cluster, or indexes on text columns will
|
|
|
|
become corrupt. <productname>PostgreSQL</productname> enforces this
|
|
|
|
by recording the values of <envar>LC_COLLATE</> and <envar>LC_CTYPE</>
|
|
|
|
that are seen by <command>initdb</>. The server automatically adopts
|
|
|
|
those two values when it is started; only the other <envar>LC_</>
|
|
|
|
categories can be set from the environment at server startup.
|
|
|
|
In short, only one collation order can be used in a database cluster,
|
|
|
|
and it is chosen at <command>initdb</> time.
|
2000-09-30 18:58:20 +02:00
|
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<sect2>
|
|
|
|
<title>Benefits</>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Locale support influences in particular the following features:
|
|
|
|
|
|
|
|
<itemizedlist>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Sort order in <command>ORDER BY</> queries.
|
2001-11-12 20:19:39 +01:00
|
|
|
<indexterm><primary>ORDER BY</></>
|
2000-09-30 18:58:20 +02:00
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
The <function>to_char</> family of functions
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
The <literal>LIKE</> and <literal>~</> operators for pattern
|
|
|
|
matching
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</itemizedlist>
|
|
|
|
</para>
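<para>
 The effect on sort order can be tried outside the database with the C
 library function <function>strcoll()</function>, which compares strings
 according to <envar>LC_COLLATE</envar>. The following sketch sorts an
 array of strings under a named locale. The helper name
 <function>sort_with_locale</function> is hypothetical, and the code is
 only meant to show locale-dependent comparison in general; it is not
 taken from the <productname>PostgreSQL</productname> source.
</para>

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort() callback: compare two strings with the current LC_COLLATE rules. */
static int
compare_collated(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    return strcoll(*sa, *sb);
}

/*
 * Hypothetical helper: sort an array of strings under the named collation
 * locale. Returns 0 on success, -1 if the C library rejects the locale.
 * It changes the process-wide LC_COLLATE setting and does not restore it.
 */
int
sort_with_locale(const char **strings, size_t count, const char *locale_name)
{
    assert(strings != NULL && locale_name != NULL);

    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return -1;
    qsort(strings, count, sizeof(strings[0]), compare_collated);
    return 0;
}
```

<para>
 Under the <literal>C</literal> locale the comparison follows byte
 values, so all upper-case ASCII letters sort before the lower-case ones.
 A locale such as <literal>sv_SE</literal>, if installed, may order the
 same strings differently.
</para>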
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The only severe drawback of using the locale support in
|
|
|
|
<productname>PostgreSQL</> is its speed. So use locale only if you
|
2001-01-19 05:47:50 +01:00
|
|
|
actually need it. It should be noted in particular that selecting
|
|
|
|
a non-C locale disables index optimizations for <literal>LIKE</> and
|
|
|
|
<literal>~</> operators, which can make a huge difference in the
|
|
|
|
speed of searches that use those operators.
|
2000-09-30 18:58:20 +02:00
|
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<sect2>
|
|
|
|
<title>Problems</>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
If locale support doesn't work in spite of the explanation above,
|
2001-09-10 01:52:12 +02:00
|
|
|
check that the locale support in your operating system is correctly configured.
|
2000-09-30 18:58:20 +02:00
|
|
|
To check whether a given locale is installed and functional you
|
|
|
|
can use <application>Perl</>, for example. Perl also supports
locales, and if a locale is broken <command>perl -v</> will
complain with something like this:
|
|
|
|
<screen>
<prompt>$</> <userinput>export LC_CTYPE='not_exist'</>
<prompt>$</> <userinput>perl -v</>
<computeroutput>
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LC_ALL = (unset),
        LC_CTYPE = "not_exist",
        LANG = (unset)
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
</computeroutput>
</screen>
|
|
|
|
</para>
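<para>
 A similar check can be made from C without Perl. The following sketch
 asks the C library whether it accepts a locale name;
 <function>setlocale()</function> returns a null pointer when the
 requested locale cannot be honored. The helper name
 <function>locale_is_available</function> is hypothetical, and only the
 <envar>LC_CTYPE</envar> category is tested.
</para>

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: report whether the C library accepts a locale name
 * for the LC_CTYPE category. Returns 1 if it does, 0 if it does not (or
 * if the current setting could not be saved). The previous setting is
 * restored before returning.
 */
int
locale_is_available(const char *name)
{
    const char *current;
    char       *saved;
    int         ok;

    assert(name != NULL);

    /* Query the current setting; the result may be overwritten later. */
    current = setlocale(LC_CTYPE, NULL);
    if (current == NULL)
        return 0;
    saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return 0;
    strcpy(saved, current);

    ok = (setlocale(LC_CTYPE, name) != NULL);

    setlocale(LC_CTYPE, saved);     /* put things back as they were */
    free(saved);
    return ok;
}
```

<para>
 The names <literal>C</literal> and <literal>POSIX</literal> are always
 accepted. A name the system does not know, such as the
 <literal>not_exist</literal> value used above, is normally rejected,
 although some C libraries accept arbitrary names.
</para>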
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Check that your locale files are in the right location. Possible
|
2001-09-10 01:52:12 +02:00
|
|
|
locations include: <filename>/usr/lib/locale</filename> (<systemitem class="osname">Linux</>,
|
|
|
|
<systemitem class="osname">Solaris</>), <filename>/usr/share/locale</filename> (<systemitem class="osname">Linux</>),
|
|
|
|
<filename>/usr/lib/nls/loc</filename> (<systemitem class="osname">DUX 4.0</>). Check the locale
|
2000-09-30 18:58:20 +02:00
|
|
|
man page of your system if you are not sure.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The directory <filename>src/test/locale</> contains a test suite
|
|
|
|
for <productname>PostgreSQL</>'s locale support.
|
|
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
|
2000-09-29 22:21:34 +02:00
|
|
|
<sect1 id="multibyte">
|
2000-09-30 18:58:20 +02:00
|
|
|
<title>Multibyte Support</title>
|
2000-09-12 07:37:09 +02:00
|
|
|
|
2001-11-12 20:19:39 +01:00
|
|
|
<indexterm zone="multibyte"><primary>multibyte</></>
|
|
|
|
|
2000-09-12 07:37:09 +02:00
|
|
|
<note>
|
|
|
|
<title>Author</title>
|
|
|
|
|
|
|
|
<para>
|
2000-12-22 22:51:58 +01:00
|
|
|
Tatsuo Ishii (<email>ishii@postgresql.org</email>),
|
2000-09-12 07:37:09 +02:00
|
|
|
last updated 2000-03-22.
|
|
|
|
Check <ulink
|
|
|
|
url="http://www.sra.co.jp/people/t-ishii/PostgreSQL/">Tatsuo's
|
|
|
|
web site</ulink> for more information.
|
|
|
|
</para>
|
|
|
|
</note>
|
|
|
|
|
|
|
|
<para>
|
2000-09-30 18:58:20 +02:00
|
|
|
Multibyte (<acronym>MB</acronym>) support is intended to allow
|
2000-09-12 07:37:09 +02:00
|
|
|
<productname>PostgreSQL</productname> to handle
|
2001-09-10 01:52:12 +02:00
|
|
|
multiple-byte character sets such as <acronym>EUC</> (Extended Unix Code), Unicode and
|
|
|
|
Mule internal code. With <acronym>MB</acronym> enabled you can use multibyte
|
2000-09-12 07:37:09 +02:00
|
|
|
character sets in regular expressions (regexp), LIKE, and some
|
|
|
|
other functions. The default
|
|
|
|
encoding system is selected while initializing your
|
|
|
|
<productname>PostgreSQL</productname> installation using
|
|
|
|
<application>initdb</application>. Note that this can be
|
|
|
|
overridden when you create a database using
|
|
|
|
<application>createdb</application> or by using the SQL command
|
|
|
|
CREATE DATABASE. So you can have multiple databases each with
|
|
|
|
a different encoding system.
|
|
|
|
</para>
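<para>
 To see why functions need to be aware of multibyte characters, consider
 that the number of characters in a string is no longer the number of
 bytes. The following C sketch counts characters in a UTF-8 string. It
 is an illustration only: <function>utf8_char_count</function> is a
 hypothetical name, the input is assumed to be valid UTF-8, and the
 server's own routines are not shown here.
</para>

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: count the characters in a NUL-terminated UTF-8
 * string. Every character has exactly one byte that is not a
 * continuation byte (bit pattern 10xxxxxx), so counting those bytes
 * counts the characters. Assumes valid UTF-8 input.
 */
size_t
utf8_char_count(const char *s)
{
    size_t count = 0;

    assert(s != NULL);

    for (; *s != '\0'; s++)
    {
        if (((unsigned char) *s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

<para>
 For a string of two Japanese characters that occupy three bytes each,
 <function>strlen()</function> reports six bytes while the sketch
 reports two characters.
</para>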
|
|
|
|
|
|
|
|
<para>
|
|
|
|
<acronym>MB</acronym> also fixes some problems concerning 8-bit single byte
|
2001-01-19 05:47:50 +01:00
|
|
|
character sets including ISO8859. (I would not say all problems
|
2000-09-12 07:37:09 +02:00
|
|
|
have been fixed. I just confirmed that the regression test ran fine
|
|
|
|
and a few French characters could be used with the patch. Please let
|
|
|
|
me know if you find any problem while using 8-bit characters.)
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<sect2>
|
|
|
|
<title>Enabling MB</title>
|
|
|
|
|
|
|
|
<para>
|
2001-01-19 05:47:50 +01:00
|
|
|
Run configure with the multibyte option:
|
2000-09-12 07:37:09 +02:00
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
% ./configure --enable-multibyte[=<replaceable>encoding_system</replaceable>]
|
|
|
|
</programlisting>
|
|
|
|
|
|
|
|
where <replaceable>encoding_system</replaceable> can be one of the
|
|
|
|
values in the following table:
|
|
|
|
|
|
|
|
<table tocentry="1">
|
2001-10-09 20:46:00 +02:00
|
|
|
<title>Character Set Encodings</title>
|
2000-09-12 07:37:09 +02:00
|
|
|
<titleabbrev>Encodings</titleabbrev>
|
|
|
|
<tgroup cols="2">
|
|
|
|
<thead>
|
|
|
|
<row>
|
|
|
|
<entry>Encoding</entry>
|
|
|
|
<entry>Description</entry>
|
|
|
|
</row>
|
|
|
|
</thead>
|
|
|
|
<tbody>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><literal>SQL_ASCII</literal></entry>
|
|
|
|
<entry><acronym>ASCII</acronym></entry>
|
2000-09-12 07:37:09 +02:00
|
|
|
</row>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><literal>EUC_JP</literal></entry>
|
|
|
|
<entry>Japanese <acronym>EUC</></entry>
|
2000-09-12 07:37:09 +02:00
|
|
|
</row>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><literal>EUC_CN</literal></entry>
|
|
|
|
<entry>Chinese <acronym>EUC</></entry>
|
2000-09-12 07:37:09 +02:00
|
|
|
</row>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><literal>EUC_KR</literal></entry>
|
|
|
|
<entry>Korean <acronym>EUC</></entry>
|
2000-09-12 07:37:09 +02:00
|
|
|
</row>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><literal>EUC_TW</literal></entry>
|
|
|
|
<entry>Taiwan <acronym>EUC</acronym></entry>
|
2000-09-12 07:37:09 +02:00
|
|
|
</row>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><literal>UNICODE</literal></entry>
|
|
|
|
<entry>Unicode (<acronym>UTF</acronym>-8)</entry>
|
2000-09-12 07:37:09 +02:00
|
|
|
</row>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><literal>MULE_INTERNAL</literal></entry>
|
2000-09-12 07:37:09 +02:00
|
|
|
<entry>Mule internal</entry>
|
|
|
|
</row>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><literal>LATIN1</literal></entry>
|
2000-09-12 07:37:09 +02:00
|
|
|
<entry>ISO 8859-1: English and Western European languages</entry>
|
|
|
|
</row>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><literal>LATIN2</literal></entry>
|
2000-09-12 07:37:09 +02:00
|
|
|
<entry>ISO 8859-2: English and Central European languages</entry>
|
|
|
|
</row>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><literal>LATIN3</literal></entry>
|
2000-09-12 07:37:09 +02:00
|
|
|
<entry>ISO 8859-3: English and South European languages</entry>
|
|
|
|
</row>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><literal>LATIN4</literal></entry>
|
2000-09-12 07:37:09 +02:00
|
|
|
<entry>ISO 8859-4: English and North European languages</entry>
|
|
|
|
</row>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><literal>LATIN5</literal></entry>
|
2000-09-12 07:37:09 +02:00
|
|
|
<entry>ISO 8859-5: English and Cyrillic-script languages</entry>
|
|
|
|
</row>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><literal>KOI8</literal></entry>
|
|
|
|
<entry><acronym>KOI</acronym>8-R(U)</entry>
|
2000-09-12 07:37:09 +02:00
|
|
|
</row>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><literal>WIN</literal></entry>
|
2000-09-12 07:37:09 +02:00
|
|
|
<entry>Windows CP1251</entry>
|
|
|
|
</row>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><literal>ALT</literal></entry>
|
2000-09-12 07:37:09 +02:00
|
|
|
<entry>Windows CP866</entry>
|
|
|
|
</row>
|
|
|
|
</tbody>
|
|
|
|
</tgroup>
|
|
|
|
</table>
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Here is an example of configuring
|
|
|
|
<productname>PostgreSQL</productname> to use a Japanese encoding by
|
|
|
|
default:
|
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
% ./configure --enable-multibyte=EUC_JP
|
|
|
|
</programlisting>
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
If the encoding system is omitted (<literal>./configure --enable-multibyte</literal>),
<literal>SQL_ASCII</literal> is assumed.
|
|
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<sect2>
|
|
|
|
<title>Setting the Encoding</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
<application>initdb</application> defines the default encoding
|
|
|
|
for a <productname>PostgreSQL</productname> installation. For example:
|
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
% initdb -E EUC_JP
|
|
|
|
</programlisting>
|
|
|
|
|
2001-09-10 01:52:12 +02:00
|
|
|
sets the default encoding to <literal>EUC_JP</literal> (Extended Unix Code for Japanese).
|
2001-09-13 17:55:24 +02:00
|
|
|
Note that you can use <option>--encoding</option> instead of <option>-E</option> if you prefer
|
2000-09-12 07:37:09 +02:00
|
|
|
to type longer option strings.
|
|
|
|
If no <option>-E</option> or <option>--encoding</option> option is given, the encoding
|
2001-01-19 05:47:50 +01:00
|
|
|
specified at configure time is used.
|
2000-09-12 07:37:09 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
You can create a database with a different encoding:
|
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
% createdb -E EUC_KR korean
|
|
|
|
</programlisting>
|
|
|
|
|
2001-09-10 01:52:12 +02:00
|
|
|
This will create a database named <database>korean</database> with <literal>EUC_KR</literal> encoding.
|
2001-01-19 05:47:50 +01:00
|
|
|
Another way to accomplish this is to use a SQL command:
|
2000-09-12 07:37:09 +02:00
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
CREATE DATABASE korean WITH ENCODING = 'EUC_KR';
|
|
|
|
</programlisting>
|
|
|
|
|
|
|
|
The encoding of a database is recorded in the
<literal>encoding</literal> column of the
<literal>pg_database</literal> system catalog.
You can display it with the <option>-l</option> option or the <command>\l</command> command of
<application>psql</application>.
|
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
$ psql -l
            List of databases
   Database    |  Owner  |   Encoding
---------------+---------+---------------
 euc_cn        | t-ishii | EUC_CN
 euc_jp        | t-ishii | EUC_JP
 euc_kr        | t-ishii | EUC_KR
 euc_tw        | t-ishii | EUC_TW
 mule_internal | t-ishii | MULE_INTERNAL
 regression    | t-ishii | SQL_ASCII
 template1     | t-ishii | EUC_JP
 test          | t-ishii | EUC_JP
 unicode       | t-ishii | UNICODE
(9 rows)
|
|
|
|
</programlisting>
|
|
|
|
</para>
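<para>
 The order in which the encoding of a new database is chosen can be
 summarized by the following C sketch. It is purely illustrative:
 <function>resolve_database_encoding</function> is a hypothetical
 function, not part of <productname>PostgreSQL</productname>, and its
 parameters stand for the values given to <command>createdb -E</command>,
 <command>initdb -E</command> and
 <command>configure --enable-multibyte</command>.
</para>

```c
#include <stddef.h>

/* True if an option value was given and is non-empty. */
static int
option_given(const char *value)
{
    return value != NULL && value[0] != '\0';
}

/*
 * Hypothetical helper that mirrors the fallback order described in this
 * section. Pass NULL for an option that was not given.
 *
 *   createdb -E / CREATE DATABASE ... WITH ENCODING
 *     overrides initdb -E
 *     overrides configure --enable-multibyte=encoding
 *     overrides the built-in default SQL_ASCII
 */
const char *
resolve_database_encoding(const char *createdb_encoding,
                          const char *initdb_encoding,
                          const char *configure_encoding)
{
    if (option_given(createdb_encoding))
        return createdb_encoding;
    if (option_given(initdb_encoding))
        return initdb_encoding;
    if (option_given(configure_encoding))
        return configure_encoding;
    return "SQL_ASCII";
}
```

<para>
 With <command>initdb -E EUC_JP</command> and
 <command>createdb -E EUC_KR korean</command> as in the examples above,
 the sketch yields <literal>EUC_KR</literal> for the
 <database>korean</database> database and <literal>EUC_JP</literal> for
 databases created without an explicit encoding.
</para>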
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<sect2>
|
|
|
|
<title>Automatic encoding translation between backend and
|
|
|
|
frontend</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
<productname>PostgreSQL</productname> supports an automatic
|
|
|
|
encoding translation between backend
|
|
|
|
and frontend for some encodings.
|
|
|
|
|
|
|
|
<table tocentry="1">
|
2001-10-09 20:46:00 +02:00
|
|
|
<title>Client/Server Character Set Encodings</title>
|
2000-09-12 07:37:09 +02:00
|
|
|
<titleabbrev>Communication Encodings</titleabbrev>
|
|
|
|
<tgroup cols="2">
|
|
|
|
<thead>
|
|
|
|
<row>
|
|
|
|
<entry>Server Encoding</entry>
|
|
|
|
<entry>Available Client Encodings</entry>
|
|
|
|
</row>
|
|
|
|
</thead>
|
|
|
|
<tbody>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><literal>EUC_JP</literal></entry>
|
|
|
|
<entry><literal>EUC_JP</literal>, <literal>SJIS</literal></entry>
|
2000-09-12 07:37:09 +02:00
|
|
|
</row>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><literal>EUC_TW</literal></entry>
|
|
|
|
<entry><literal>EUC_TW</literal>, <literal>BIG5</literal></entry>
|
2000-09-12 07:37:09 +02:00
|
|
|
</row>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><literal>LATIN2</literal></entry>
|
|
|
|
<entry><literal>LATIN2</literal>, <literal>WIN1250</literal></entry>
|
2000-09-12 07:37:09 +02:00
|
|
|
</row>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><literal>LATIN5</literal></entry>
|
|
|
|
<entry><literal>LATIN5</literal>, <literal>WIN</literal>, <literal>ALT</literal></entry>
|
2000-09-12 07:37:09 +02:00
|
|
|
</row>
|
|
|
|
<row>
|
2001-09-10 01:52:12 +02:00
|
|
|
<entry><literal>MULE_INTERNAL</literal></entry>
|
|
|
|
<entry><literal>EUC_JP</literal>, <literal>SJIS</literal>, <literal>EUC_KR</literal>, <literal>EUC_CN</literal>,
|
|
|
|
<literal>EUC_TW</literal>, <literal>BIG5</literal>, <literal>LATIN1</literal> to <literal>LATIN5</literal>,
|
|
|
|
<literal>WIN</literal>, <literal>ALT</literal>, <literal>WIN1250</literal></entry>
|
2000-09-12 07:37:09 +02:00
|
|
|
</row>
|
|
|
|
</tbody>
|
|
|
|
</tgroup>
|
|
|
|
</table>
|
|
|
|
</para>
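<para>
 The table can also be read as a simple lookup. The following C sketch
 transcribes it into an array and checks whether a client encoding is
 listed for a server encoding. The type and function names are
 hypothetical, the range <literal>LATIN1</literal> to
 <literal>LATIN5</literal> is written out in full, and names are compared
 case-sensitively, which is a simplification.
</para>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CLIENTS 16

/* One row of the table: a server encoding and its listed client encodings. */
struct translation_row
{
    const char *server;
    const char *clients[MAX_CLIENTS];   /* unused slots are NULL */
};

/* Data transcribed from the table above. */
static const struct translation_row translation_rows[] = {
    {"EUC_JP", {"EUC_JP", "SJIS"}},
    {"EUC_TW", {"EUC_TW", "BIG5"}},
    {"LATIN2", {"LATIN2", "WIN1250"}},
    {"LATIN5", {"LATIN5", "WIN", "ALT"}},
    {"MULE_INTERNAL", {"EUC_JP", "SJIS", "EUC_KR", "EUC_CN", "EUC_TW",
                       "BIG5", "LATIN1", "LATIN2", "LATIN3", "LATIN4",
                       "LATIN5", "WIN", "ALT", "WIN1250"}},
};

/*
 * Hypothetical helper: return 1 if the table lists the client encoding
 * for the given server encoding, 0 otherwise.
 */
int
client_encoding_listed(const char *server, const char *client)
{
    size_t row, col;
    size_t nrows = sizeof(translation_rows) / sizeof(translation_rows[0]);

    assert(server != NULL && client != NULL);

    for (row = 0; row < nrows; row++)
    {
        if (strcmp(translation_rows[row].server, server) != 0)
            continue;
        for (col = 0;
             col < MAX_CLIENTS && translation_rows[row].clients[col] != NULL;
             col++)
        {
            if (strcmp(translation_rows[row].clients[col], client) == 0)
                return 1;
        }
    }
    return 0;
}
```

<para>
 A result of 0 only means that the pair does not appear in the table. It
 says nothing about cases the table does not cover, such as a client
 that uses the same encoding as a server encoding not listed here.
</para>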
|
|
|
|
|
|
|
|
<para>
|
|
|
|
To enable the automatic encoding translation, you have to tell
<productname>PostgreSQL</productname> the encoding you would like
to use in the frontend. There are
several ways to accomplish this.
|
|
|
|
|
|
|
|
<itemizedlist>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Using the <command>\encoding</command> command in
|
|
|
|
<application>psql</application>.
|
|
|
|
<command>\encoding</command> allows you to change the frontend
encoding on the fly. For
|
2001-09-10 01:52:12 +02:00
|
|
|
example, to change the encoding to <literal>SJIS</literal>, type:
|
2000-09-12 07:37:09 +02:00
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
\encoding SJIS
|
|
|
|
</programlisting>
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
|
2001-09-10 01:52:12 +02:00
|
|
|
Using <application>libpq</> functions.
|
2000-09-12 07:37:09 +02:00
|
|
|
<command>\encoding</command> itself calls
<function>PQsetClientEncoding()</function> to do this.
|
2000-09-12 07:37:09 +02:00
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
int PQsetClientEncoding(PGconn *<replaceable>conn</replaceable>, const char *<replaceable>encoding</replaceable>)
|
|
|
|
</programlisting>
|
|
|
|
|
|
|
|
where <replaceable>conn</replaceable> is a connection to the backend,
|
|
|
|
and <replaceable>encoding</replaceable> is an encoding you
|
|
|
|
want to use. If it successfully sets the encoding, it returns 0,
|
|
|
|
otherwise -1. The current encoding for this connection can be shown by
|
|
|
|
using:
|
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
int PQclientEncoding(const PGconn *<replaceable>conn</replaceable>)
|
|
|
|
</programlisting>
|
|
|
|
|
2001-09-13 17:55:24 +02:00
|
|
|
Note that it returns the encoding id, not the encoding symbol string
|
2001-09-10 01:52:12 +02:00
|
|
|
such as <literal>EUC_JP</literal>. To convert an encoding id to an encoding symbol, you
|
2000-09-12 07:37:09 +02:00
|
|
|
can use:
|
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
char *pg_encoding_to_char(int <replaceable>encoding_id</replaceable>)
|
|
|
|
</programlisting>
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Using <command>SET CLIENT_ENCODING TO</command>.
|
|
|
|
|
2001-01-19 05:47:50 +01:00
|
|
|
Setting the frontend side encoding can be done by this SQL command:
|
2000-09-12 07:37:09 +02:00
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
SET CLIENT_ENCODING TO 'encoding';
|
|
|
|
</programlisting>
|
|
|
|
|
2001-09-13 17:55:24 +02:00
|
|
|
You can also use the SQL92 syntax <literal>SET NAMES</literal> for this purpose:
|
2000-09-12 07:37:09 +02:00
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
SET NAMES 'encoding';
|
|
|
|
</programlisting>
|
|
|
|
|
2001-01-19 05:47:50 +01:00
|
|
|
To query the current frontend encoding:
|
2000-09-12 07:37:09 +02:00
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
SHOW CLIENT_ENCODING;
|
|
|
|
</programlisting>
|
|
|
|
|
|
|
|
To return to the default encoding:
|
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
RESET CLIENT_ENCODING;
|
|
|
|
</programlisting>
|
|
|
|
</para>
|
|
|
|
</listitem>
|
2001-01-19 05:47:50 +01:00
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Using <envar>PGCLIENTENCODING</envar>.
|
|
|
|
|
|
|
|
If the environment variable <envar>PGCLIENTENCODING</envar> is defined
|
|
|
|
in the client's environment, that client encoding is automatically
|
|
|
|
selected when a backend connection is made. (This can subsequently
|
|
|
|
be overridden using any of the other methods mentioned above.)
|
|
|
|
</para>
|
|
|
|
</listitem>
|
2000-09-12 07:37:09 +02:00
|
|
|
</itemizedlist>
|
|
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<sect2>
|
|
|
|
<title>About Unicode</title>
|
|
|
|
|
2001-11-12 20:19:39 +01:00
|
|
|
<indexterm><primary>Unicode</></>
|
|
|
|
|
2000-09-12 07:37:09 +02:00
|
|
|
<para>
|
|
|
|
An automatic encoding translation between Unicode and other
|
2000-12-20 01:44:49 +01:00
|
|
|
encodings has been supported since PostgreSQL 7.1.
|
|
|
|
Because this requires huge conversion tables, it's not enabled by default.
|
|
|
|
To enable this feature, run configure with the
|
2001-09-10 01:52:12 +02:00
|
|
|
<option>--enable-unicode-conversion</option> option. Note that this requires
|
|
|
|
the <option>--enable-multibyte</option> option also.
|
2000-09-12 07:37:09 +02:00
|
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<sect2>
|
|
|
|
<title>What happens if the translation is not possible?</title>
|
|
|
|
|
|
|
|
<para>
|
2001-09-10 01:52:12 +02:00
|
|
|
Suppose you choose <literal>EUC_JP</literal> for the backend and <literal>LATIN1</literal> for the frontend.
Some Japanese characters then cannot be translated into <literal>LATIN1</literal>. In
this case, a character that cannot be represented in the <literal>LATIN1</literal> character set
is transformed as:
|
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
(HEXA DECIMAL)
|
|
|
|
</programlisting>
|
|
|
|
</para>
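<para>
 As a rough illustration of this kind of substitution, the following C
 sketch writes the bytes of one character as hexadecimal digits inside
 parentheses. The text above only states the general form, so the exact
 layout used here (lower-case digits, all bytes of the character in one
 pair of parentheses) is an assumption, and
 <function>format_untranslatable</function> is a hypothetical name rather
 than a server function.
</para>

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: write the bytes of one untranslatable character
 * into out as parenthesized hexadecimal. The layout is an assumption
 * made for this sketch. Returns the number of characters written, not
 * counting the terminating NUL, or -1 if out is too small.
 */
int
format_untranslatable(const unsigned char *bytes, size_t nbytes,
                      char *out, size_t outsize)
{
    size_t needed = 2 * nbytes + 3;     /* '(' + digits + ')' + NUL */
    size_t i;
    char  *p = out;

    assert(bytes != NULL && out != NULL);

    if (outsize < needed)
        return -1;

    *p++ = '(';
    for (i = 0; i < nbytes; i++)
    {
        /* Two hex digits per byte; sprintf also writes a NUL we overwrite. */
        sprintf(p, "%02x", (unsigned int) bytes[i]);
        p += 2;
    }
    *p++ = ')';
    *p = '\0';
    return (int) (p - out);
}
```

<para>
 Given the two bytes 0xA4 and 0xA2, the sketch produces the six
 characters <literal>(a4a2)</literal>.
</para>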
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<sect2>
|
|
|
|
<title>References</title>
|
|
|
|
|
|
|
|
<para>
|
2001-01-19 05:47:50 +01:00
|
|
|
These are good sources to start learning about various kinds of encoding
|
2000-09-12 07:37:09 +02:00
|
|
|
systems.
|
|
|
|
|
2001-10-09 20:46:00 +02:00
|
|
|
<variablelist>
|
|
|
|
<varlistentry>
|
2001-10-31 21:35:02 +01:00
|
|
|
<term><ulink url="ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf"></ulink></term>
|
2001-10-09 20:46:00 +02:00
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Detailed explanations of <literal>EUC_JP</literal>,
|
|
|
|
<literal>EUC_CN</literal>, <literal>EUC_KR</literal>,
|
|
|
|
<literal>EUC_TW</literal> appear in section 3.2.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
2001-10-31 21:35:02 +01:00
|
|
|
<term><ulink url="http://www.unicode.org/"></ulink></term>
|
2001-10-09 20:46:00 +02:00
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
The web site of the Unicode Consortium
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>RFC 2044</term>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
<acronym>UTF</acronym>-8 is defined here.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
</variablelist>
|
2000-09-12 07:37:09 +02:00
|
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<sect2>
|
|
|
|
<title>History</title>
|
|
|
|
|
2001-10-09 20:46:00 +02:00
|
|
|
<literallayout class="monospaced">
|
2000-12-20 01:44:49 +01:00
|
|
|
Dec 7, 2000
|
|
|
|
* An automatic encoding translation between Unicode and other
  encodings is implemented
|
|
|
|
* Changes above will appear in 7.1
|
|
|
|
|
2000-09-12 07:37:09 +02:00
|
|
|
May 20, 2000
|
|
|
|
* SJIS UDC (NEC selection IBM kanji) support contributed
|
|
|
|
by Eiji Tokuya
|
|
|
|
* Changes above will appear in 7.0.1
|
|
|
|
|
|
|
|
Mar 22, 2000
|
|
|
|
* Add new libpq functions PQsetClientEncoding, PQclientEncoding
|
|
|
|
* ./configure --with-mb=EUC_JP
|
|
|
|
now deprecated. use
|
|
|
|
./configure --enable-multibyte=EUC_JP
|
|
|
|
instead
|
|
|
|
* Add SQL_ASCII regression test case
|
|
|
|
* Add SJIS User Defined Character (UDC) support
|
|
|
|
* All of above will appear in 7.0
|
|
|
|
|
|
|
|
July 11, 1999
|
|
|
|
* Add support for WIN1250 (Windows Czech) as a client encoding
|
|
|
|
(contributed by Pavel Behal)
|
|
|
|
* fix some compiler warnings (contributed by Tomoaki Nishiyama)
|
|
|
|
|
|
|
|
Mar 23, 1999
|
|
|
|
* Add support for KOI8(KOI8-R), WIN(CP1251), ALT(CP866)
|
|
|
|
(thanks Oleg Broytmann for testing)
|
|
|
|
* Fix problem with MB and locale
|
|
|
|
|
|
|
|
Jan 26, 1999
|
|
|
|
* Add support for Big5 for frontend encoding
|
|
|
|
(you need to create a database with EUC_TW to use Big5)
|
|
|
|
* Add regression test case for EUC_TW
|
2000-12-22 22:51:58 +01:00
|
|
|
(contributed by Jonah Kuo <email>jonahkuo@mail.ttn.com.tw</email>)
|
2000-09-12 07:37:09 +02:00
|
|
|
|
|
|
|
Dec 15, 1998
|
|
|
|
* Bugs related to SQL_ASCII support fixed
|
|
|
|
|
|
|
|
Nov 5, 1998
|
|
|
|
* 6.4 release. In this version, pg_database has "encoding"
|
|
|
|
column that represents the database encoding
|
|
|
|
|
|
|
|
Jul 22, 1998
|
|
|
|
* determine encoding at initdb/createdb rather than compile time
|
|
|
|
* support for PGCLIENTENCODING when issuing COPY command
|
|
|
|
* support for SQL92 syntax "SET NAMES"
|
|
|
|
* support for LATIN2-5
|
|
|
|
* add UNICODE regression test case
|
|
|
|
* new test suite for MB
|
|
|
|
* clean up source files
|
|
|
|
|
|
|
|
Jun 5, 1998
|
|
|
|
* add support for the encoding translation between the backend
|
|
|
|
and the frontend
|
|
|
|
* new command SET CLIENT_ENCODING etc. added
|
|
|
|
* add support for LATIN1 character set
|
|
|
|
* enhance 8-bit cleanness
|
|
|
|
|
|
|
|
April 21, 1998 some enhancements/fixes
|
|
|
|
* character_length(), position(), substring() are now aware of
|
|
|
|
multi-byte characters
|
|
|
|
* add octet_length()
|
|
|
|
* add --with-mb option to configure
|
|
|
|
* new regression tests for EUC_KR
|
2000-12-22 22:51:58 +01:00
|
|
|
(contributed by Soonmyung Hong <email>hong@lunaris.hanmesoft.co.kr</email>)
|
2000-09-12 07:37:09 +02:00
|
|
|
* add some test cases to the EUC_JP regression test
|
|
|
|
* fix problem in regress/regress.sh in case of System V
|
|
|
|
* fix toupper(), tolower() to handle 8bit chars
|
|
|
|
|
|
|
|
Mar 25, 1998 MB PL2 is incorporated into PostgreSQL 6.3.1
|
|
|
|
|
|
|
|
Mar 10, 1998 PL2 released
|
|
|
|
* add regression test for EUC_JP, EUC_CN and MULE_INTERNAL
|
|
|
|
* add an English document (this file)
|
|
|
|
* fix problems concerning 8-bit single byte characters
|
|
|
|
|
|
|
|
Mar 1, 1998 PL1 released
|
2001-10-09 20:46:00 +02:00
|
|
|
</literallayout>
|
2000-09-12 07:37:09 +02:00
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<sect2>
|
|
|
|
<title>WIN1250 on Windows/ODBC</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
<!--
|
|
|
|
[Here is a good documentation explaining how to use WIN1250 on
|
2001-01-19 05:47:50 +01:00
|
|
|
Windows/ODBC from Pavel Behal]
|
2000-09-12 07:37:09 +02:00
|
|
|
|
|
|
|
Version: 0.91 for PgSQL 6.5
|
|
|
|
Author: Pavel Behal
|
|
|
|
Revised by: Tatsuo Ishii
|
2000-12-22 22:51:58 +01:00
|
|
|
Email: behal@opf.slu.cz
|
2000-09-12 07:37:09 +02:00
|
|
|
Licence: The Same as PostgreSQL
|
|
|
|
|
|
|
|
Sorry for my English and C code, I'm not native :-)
|
|
|
|
|
|
|
|
!!!!!!!!!!!!!!!!!!!!!!!!! NO WARRANTY !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
|
|
|
|
-->
|
|
|
|
|
|
|
|
The WIN1250 character set on Windows client platforms can be used
|
|
|
|
with <productname>PostgreSQL</productname> with locale support
|
|
|
|
enabled.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The following should be kept in mind:
|
|
|
|
|
|
|
|
<itemizedlist>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Success depends on correctly configured system locales. This has been tested
with <systemitem class="osname">Red Hat 6.0</> and <systemitem
class="osname">Slackware 3.6</>, using the <literal>cs_CZ.iso8859-2</literal> locale.
</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Never try to set the server multibyte database encoding to WIN1250.
|
|
|
|
Always use LATIN2 instead, since there is no WIN1250 locale
on Unix.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
The WIN1250 encoding is usable only for Windows ODBC clients.
Characters are recoded on the fly so that they are displayed and
stored back correctly.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</itemizedlist>
|
|
|
|
</para>
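<para>
The on-the-fly recoding described above can be illustrated outside
the server. The following Python sketch is an illustration only and is
not part of <productname>Postgres</productname>: it assumes that
Python's <literal>cp1250</literal> and <literal>iso8859_2</literal>
codecs correspond to WIN1250 and LATIN2, and the sample text and
function names are invented for the example.
</para>

<programlisting>
```python
# Illustration only: Python's cp1250 and iso8859_2 codecs stand in for
# the WIN1250 (client) and LATIN2 (server) encodings.
SAMPLE = "šťastný žák"  # hypothetical Czech sample text


def win1250_to_latin2(data: bytes) -> bytes:
    """Recode bytes received from a Windows client for storage."""
    return data.decode("cp1250").encode("iso8859_2")


def latin2_to_win1250(data: bytes) -> bytes:
    """Recode stored bytes for display on a Windows client."""
    return data.decode("iso8859_2").encode("cp1250")


win1250_bytes = SAMPLE.encode("cp1250")
latin2_bytes = SAMPLE.encode("iso8859_2")

# Letters whose byte value differs between the two encodings; these are
# the ones that would be corrupted without recoding.
differing = [
    ch for ch in sorted(set(SAMPLE))
    if ch.encode("cp1250") != ch.encode("iso8859_2")
]

if __name__ == "__main__":
    print("letters that need recoding:", "".join(differing))
    print("round trip ok:",
          latin2_to_win1250(win1250_to_latin2(win1250_bytes)) == win1250_bytes)
```
</programlisting>

<para>
Because several Czech letters have different byte values in the two
encodings, storing WIN1250 bytes unchanged in a LATIN2 database would
also break locale-based sorting, which is why the client encoding is
declared instead.
</para>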
|
|
|
|
|
|
|
|
<para>
|
|
|
|
When running, it is important to remember the following:
|
|
|
|
|
|
|
|
<itemizedlist>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
With this configuration, the sort order depends on your
<envar>LC_<replaceable>x</replaceable></envar> settings. Do not be
confused by the regression test results, since they do not use
locale.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
A locale such as <literal>ch</literal> is sorted correctly
only if your system
supports that locale; older systems may not, but newer ones
(e.g., Red Hat 6.0) do.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Monetary values must be entered as '<literal>162,50</literal>' (note
the comma inside the single quotes).
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
At the time of writing (early 1999), this configuration has
|
|
|
|
not received extensive testing. Please let us know of any
|
|
|
|
changes you had to make!
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</itemizedlist>
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<procedure>
|
|
|
|
<title>WIN1250 on Windows/ODBC</title>
|
|
|
|
<step>
|
|
|
|
<para>
Compile <productname>Postgres</productname> with locale enabled
and the multibyte encoding set to <literal>LATIN2</literal>.
|
|
|
|
</para>
|
|
|
|
</step>
|
|
|
|
|
|
|
|
<step>
|
|
|
|
<para>
Set up your installation. Do not forget to create locale
variables in your profile (environment). For example (this may
|
|
|
|
not be correct for <emphasis>your</emphasis> environment):
|
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
LC_ALL=cs_CZ.ISO8859-2
|
|
|
|
LC_COLLATE=cs_CZ.ISO8859-2
|
|
|
|
LC_CTYPE=cs_CZ.ISO8859-2
|
|
|
|
LC_MONETARY=cs_CZ.ISO8859-2
|
|
|
|
LC_NUMERIC=cs_CZ.ISO8859-2
|
|
|
|
LC_TIME=cs_CZ.ISO8859-2
|
|
|
|
</programlisting>
|
|
|
|
</para>
|
|
|
|
</step>
|
|
|
|
|
|
|
|
<step>
|
|
|
|
<para>
|
|
|
|
The postmaster must be started with these locale variables set.
|
|
|
|
</para>
|
|
|
|
</step>
|
|
|
|
|
|
|
|
<step>
|
|
|
|
<para>
|
|
|
|
Try it with Czech-language data, and run a query that sorts it.
|
|
|
|
</para>
|
|
|
|
</step>
|
|
|
|
|
|
|
|
<step>
|
|
|
|
<para>
Install the ODBC driver for <productname>PostgreSQL</productname> on your Windows machine.
</para>
|
|
|
|
</step>
|
|
|
|
|
|
|
|
<step>
|
|
|
|
<para>
|
|
|
|
Set up your data source properly. Include this line in the
<literal>Connect Settings</literal> field of your ODBC configuration dialog:
|
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
SET CLIENT_ENCODING = 'WIN1250';
|
|
|
|
</programlisting>
|
|
|
|
</para>
|
|
|
|
</step>
|
|
|
|
|
|
|
|
<step>
|
|
|
|
<para>
|
|
|
|
Now repeat the same test, this time from Windows through ODBC.
|
|
|
|
</para>
|
|
|
|
</step>
|
|
|
|
</procedure>
|
|
|
|
</sect2>
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<sect1 id="recode">
|
|
|
|
<title>Single-byte Character Set Recoding</>
|
|
|
|
<!-- formerly in README.charsets, by Josef Balatka, <balatka@email.cz> -->
|
|
|
|
|
|
|
|
<para>
|
|
|
|
You can set up this feature with the <option>--enable-recode</> option
|
|
|
|
to <filename>configure</>. This option was formerly described as
|
|
|
|
<quote>Cyrillic recode support</>, which understates its
capabilities. It can be used for <emphasis>any</> single-byte character
|
|
|
|
set recoding.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
This method uses a <filename>charset.conf</> file located in
|
|
|
|
the database directory (<envar>PGDATA</>). It's a typical
|
|
|
|
configuration text file where spaces and newlines separate items
|
|
|
|
and records, and <literal>#</> introduces comments. Three keywords with the
|
|
|
|
following syntax are recognized here:
|
|
|
|
<synopsis>
|
|
|
|
BaseCharset <replaceable>server_charset</>
|
|
|
|
RecodeTable <replaceable>from_charset</> <replaceable>to_charset</> <replaceable>file_name</>
|
|
|
|
HostCharset <replaceable>host_spec</> <replaceable>host_charset</>
|
|
|
|
</synopsis>
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
<token>BaseCharset</> defines the encoding of the database server.
|
|
|
|
All character set names are only used for mapping inside of
|
|
|
|
<filename>charset.conf</> so you can freely use typing-friendly
|
|
|
|
names.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
<token>RecodeTable</> records specify translation tables between
|
|
|
|
server and client. The file name is relative to the
|
|
|
|
<envar>PGDATA</> directory. The table file format is very
|
|
|
|
simple. There are no keywords and characters are represented by a
|
|
|
|
pair of decimal or hexadecimal (0x prefixed) values on single
|
|
|
|
lines:
|
|
|
|
<synopsis>
|
|
|
|
<replaceable>char_value</> <replaceable>translated_char_value</>
|
|
|
|
</synopsis>
|
|
|
|
</para>
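<para>
The table file format described above can be read with a few lines of
code. The following Python sketch is not the server's implementation;
it only illustrates the format. It assumes that <literal>#</literal>
comments are allowed in table files as they are in
<filename>charset.conf</filename>, and that bytes without an entry
pass through unchanged. The sample entries and function names are
hypothetical.
</para>

<programlisting>
```python
def parse_value(token: str) -> int:
    """Parse a decimal or 0x-prefixed hexadecimal character value."""
    if token.lower().startswith("0x"):
        return int(token, 16)
    return int(token, 10)


def load_recode_table(lines) -> bytes:
    """Build a 256-entry translation table from recode table lines."""
    table = bytearray(range(256))  # assumption: unmapped bytes stay as they are
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        source, target = (parse_value(tok) for tok in line.split())
        table[source] = target
    return bytes(table)


# Hypothetical table contents: one hexadecimal pair and one decimal pair.
sample_lines = [
    "# char_value translated_char_value",
    "0xB9 0x9A",
    "190 158",
]

table = load_recode_table(sample_lines)
recoded = bytes([0xB9, 0x41, 190]).translate(table)

if __name__ == "__main__":
    print(recoded.hex())
```
</programlisting>

<para>
A single 256-entry table is enough because every character in a
single-byte character set occupies exactly one byte.
</para>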
|
|
|
|
|
|
|
|
<para>
|
|
|
|
<token>HostCharset</> records define the client character set by IP
address. You can use a single IP address, an IP mask range starting
from the given address, or an IP interval (e.g., <literal>127.0.0.1</>,
<literal>192.168.1.100/24</>, <literal>192.168.1.20-192.168.1.40</>).
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The <filename>charset.conf</> file is always processed to the
end, so later records can specify exceptions to earlier
rules. In the <filename>src/data/</> directory you will find an
example <filename>charset.conf</> and a few recoding tables.
</para>
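<para>
The effect of processing all <token>HostCharset</token> records to the
end can be sketched as follows. This Python example is an illustration
under stated assumptions, not the server's code: it assumes that the
last matching record wins, that a <literal>/24</literal> style
specification denotes the network containing the given address, and
that an interval includes both end points. The character set names and
function names are hypothetical.
</para>

<programlisting>
```python
import ipaddress


def host_matches(spec: str, client_ip: str) -> bool:
    """Check a client address against one HostCharset host_spec."""
    addr = ipaddress.ip_address(client_ip)
    if "-" in spec:  # interval, e.g. 192.168.1.20-192.168.1.40
        low, high = (ipaddress.ip_address(p.strip()) for p in spec.split("-", 1))
        return low <= addr <= high
    if "/" in spec:  # mask, e.g. 192.168.1.100/24 (assumed: enclosing network)
        return addr in ipaddress.ip_network(spec, strict=False)
    return addr == ipaddress.ip_address(spec)  # single address


def client_charset(records, client_ip, default=None):
    """Scan every record; a later match overrides an earlier one."""
    chosen = default
    for spec, charset in records:
        if host_matches(spec, client_ip):
            chosen = charset
    return chosen


# Hypothetical records in file order: a general rule, then an exception.
records = [
    ("192.168.1.100/24", "win"),
    ("192.168.1.20-192.168.1.40", "koi"),
    ("127.0.0.1", "alt"),
]

if __name__ == "__main__":
    for ip in ("192.168.1.30", "192.168.1.200", "127.0.0.1", "10.0.0.1"):
        print(ip, client_charset(records, ip))
```
</programlisting>

<para>
With this ordering, general rules go first and exceptions follow them,
which matches the advice above about specifying exceptions to earlier
rules.
</para>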
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Because this solution maps character sets by the client's IP
address, it has some restrictions. You
cannot use different encodings on the same host at the same
time. It is also inconvenient when you boot your client hosts into
multiple operating systems. Nevertheless, when these restrictions are
not limiting and you do not need multibyte characters, it is a
simple and effective solution.
|
|
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
</chapter>
|
|
|
|
<!-- Keep this comment at the end of the file
|
|
|
|
Local variables:
|
|
|
|
mode:sgml
|
|
|
|
sgml-omittag:nil
|
|
|
|
sgml-shorttag:t
|
|
|
|
sgml-minimize-attributes:nil
|
|
|
|
sgml-always-quote-attributes:t
|
|
|
|
sgml-indent-step:1
|
|
|
|
sgml-indent-data:t
|
|
|
|
sgml-parent-document:nil
|
|
|
|
sgml-default-dtd-file:"./reference.ced"
|
|
|
|
sgml-exposed-tags:nil
|
|
|
|
sgml-local-catalogs:("/usr/lib/sgml/catalog")
|
|
|
|
sgml-local-ecat-files:nil
|
|
|
|
End:
|
|
|
|
-->
|