diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index c0826bdf5d..d348030e9e 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -1,4 +1,4 @@
-
+
  Localization</>
@@ -54,7 +54,7 @@
  cultural preferences regarding alphabets, sorting, number
  formatting, etc. <productname>PostgreSQL</> uses the standard ISO
  C and POSIX-like locale facilities provided by the server operating
- system. For additional information refer the documentation of your
+ system. For additional information refer to the documentation of your
  system.
  </para>
@@ -62,7 +62,7 @@
  <title>Overview</>
  <para>
- Locale support is not build into <productname>PostgreSQL</> by
+ Locale support is not built into <productname>PostgreSQL</> by
  default; to enable it, supply the <option>--enable-locale</>
  option to the <filename>configure</> script:
  <informalexample>
@@ -95,7 +95,7 @@ export LANG=sv_SE
  <para>
  Occasionally it is useful to mix rules from several locales, e.g.,
- use U.S. rules but Spanish messages. To do that a set of
+ use U.S. collation rules but Spanish messages. To do that a set of
  environment variables exist that override the default of
  <envar>LANG</> for a particular category:
@@ -141,14 +141,23 @@ export LANG=sv_SE
  </para>
  <para>
- Once you have chosen a set of localization rules this way you must
- keep them fixed for any particular database cluster. That means
- that the locales that were active when you ran <filename>initdb</>
- must be kept the same when you start the postmaster. Otherwise,
- the changed sort order can corrupt indexes or make your data
- disappear mysteriously. It is currently not possible to change the
- locales after database initialization or to use more than one set
- of locales for a given database cluster.
+ Note that the locale behavior is determined by the environment
+ variables seen by the server, not by the environment of any client.
+ Therefore, be careful to set these variables before starting the
+ postmaster.
+ </para>
+
+ <para>
+ The <envar>LC_COLLATE</> and <envar>LC_CTYPE</> variables affect the
+ sort order of indexes. Therefore, these values must be kept fixed
+ for any particular database cluster, or indexes on text columns will
+ become corrupt. <productname>Postgres</productname> enforces this
+ by recording the values of <envar>LC_COLLATE</> and <envar>LC_CTYPE</>
+ that are seen by <command>initdb</>. The server automatically adopts
+ those two values when it is started; only the other <envar>LC_</>
+ categories can be set from the environment at server startup.
+ In short, only one collation order can be used in a database cluster,
+ and it is chosen at <command>initdb</> time.
  </para>
  </sect2>
@@ -183,7 +192,10 @@ export LANG=sv_SE
  <para>
  The only severe drawback of using the locale support in
  <productname>PostgreSQL</> is its speed. So use locale only if you
- actually need it.
+ actually need it. It should be noted in particular that selecting
+ a non-C locale disables index optimizations for <literal>LIKE</> and
+ <literal>~</> operators, which can make a huge difference in the
+ speed of searches that use those operators.
  </para>
  </sect2>
@@ -261,7 +273,7 @@ perl: warning: Falling back to the standard locale ("C").
  <para>
  <acronym>MB</acronym> also fixes some problems concerning 8-bit single byte
- character sets including ISO8859. (I would not say all of problems
+ character sets including ISO8859. (I would not say all problems
  have been fixed. I just confirmed that the regression test ran fine
  and a few French characters could be used with the patch. Please
  let me know if you find any problem while using 8-bit characters.)
@@ -271,7 +283,7 @@ perl: warning: Falling back to the standard locale ("C").
  <title>Enabling MB
- Run configure with a multibyte option:
+ Run configure with the multibyte option:
  % ./configure --enable-multibyte[=encoding_system]
@@ -383,11 +395,11 @@ perl: warning: Falling back to the standard locale ("C").
  % initdb -E EUC_JP
- sets the default encoding to EUC_JP(Extended Unix Code for Japanese).
+ sets the default encoding to EUC_JP (Extended Unix Code for Japanese).
  Note that you can use "--encoding" instead of "-E" if you prefer
  to type longer option strings.
  If no -E or --encoding option is
  given, the encoding
- specified at the compile time is used.
+ specified at configure time is used.
@@ -397,8 +409,8 @@ perl: warning: Falling back to the standard locale ("C").
  % createdb -E EUC_KR korean
- will create a database named "korean" with EUC_KR encoding. The
- another way to accomplish this is to use a SQL command:
+ will create a database named "korean" with EUC_KR encoding.
+ Another way to accomplish this is to use a SQL command:
  CREATE DATABASE korean WITH ENCODING = 'EUC_KR';
@@ -527,20 +539,11 @@ char *pg_encoding_to_char(int encoding_id)
-
-
- Using PGCLIENTENCODING.
-
- If an environment variable PGCLIENTENCODING is defined in the
- frontend, an automatic encoding translation is done by the backend.
-
-
-
  Using SET CLIENT_ENCODING TO.
- Setting the frontend side encoding can be done a SQL command:
+ Setting the frontend side encoding can be done by this SQL command:
  SET CLIENT_ENCODING TO 'encoding';
@@ -552,7 +555,7 @@ SET CLIENT_ENCODING TO 'encoding';
  SET NAMES 'encoding';
- To query the current the frontend encoding:
+ To query the current frontend encoding:
  SHOW CLIENT_ENCODING;
@@ -565,6 +568,17 @@ RESET CLIENT_ENCODING;
+
+
+
+ Using PGCLIENTENCODING.
+
+ If environment variable PGCLIENTENCODING is defined
+ in the client's environment, that client encoding is automatically
+ selected when a backend connection is made. (This can subsequently
+ be overridden using any of the other methods mentioned above.)
+
+
@@ -588,7 +602,7 @@ RESET CLIENT_ENCODING;
  Suppose you choose EUC_JP for the backend, LATIN1 for the
  frontend, then some Japanese characters could not be translated into
  LATIN1. In
- this case, a letter cannot be represented in the LATIN1 character set,
+ this case, a letter that cannot be represented in the LATIN1 character set
  would be transformed as:
@@ -601,7 +615,7 @@ RESET CLIENT_ENCODING;
  References
- These are good sources to start learning various kind of encoding
+ These are good sources to start learning about various kinds of encoding
  systems.
@@ -724,8 +738,7 @@ Mar 1, 1998 PL1 released