From 972124091dc0e1e2b9ff8026fed219326fac9997 Mon Sep 17 00:00:00 2001
From: Bruce Momjian
Date: Mon, 16 Aug 1999 20:27:19 +0000
Subject: [PATCH] I've sent 3 mails to pgsql-patches. There are two files, one for doc and for src/data directories, and one minor patch for doc/README.locale. Please apply.

Oleg.
---
 doc/README.Charsets      | 113 +++++++++++++++++++++++++++++++++++++++
 doc/README.locale        |  14 ++++-
 src/data/isocz-wincz.tab |  12 +++++
 3 files changed, 138 insertions(+), 1 deletion(-)
 create mode 100644 doc/README.Charsets
 create mode 100644 src/data/isocz-wincz.tab

diff --git a/doc/README.Charsets b/doc/README.Charsets
new file mode 100644
index 0000000000..d7fa298cde
--- /dev/null
+++ b/doc/README.Charsets
@@ -0,0 +1,113 @@
+
+ PostgreSQL Charsets README
+ Josef Balatka,
+ Draft v0.1, Tue Jul 20 15:49:07 CEST 1999
+
+ This document is a brief overview of the national charsets support
+ that PostgreSQL ver. 6.5 has implemented. Various compilation options
+ and setup tips are mentioned here to be helpful in the particular use.
+
+ ---------------------------------------------------------------------------
+
+ Table of Contents
+
+ 1. Locale awareness
+
+ 2. Single-byte charsets recoding
+
+ 3. Multi-byte support/recoding
+
+ 4. Credits
+
+ ---------------------------------------------------------------------------
+
+ 1. Locale awareness
+
+ PostgreSQL server supports both locale aware and locale not aware
+ (default) operational modes. You can determine this mode during the
+ configuration stage of the installation with --enable-locale option.
+
+ If you don't use --enable-locale, the multi-language code will not be
+ compiled and PostgreSQL will behave as an ASCII compliant application.
+ This mode is useful for its speed but only provided that you don't
+ have to consider national specific chars.
+
+ With --enable-locale you will get a locale aware server using LC_*
+ environment variables to determine how to process national specifics.
+ In this case strcoll(3) and similar functions are used internally
+ so speed is somewhat lower.
+
+ Notice here that --enable-locale is sufficient when all your clients
+ use the same single-byte encoding as the database server does.
+
+ When your clients use an encoding different from the server's, then you
+ also have to use the --enable-recode or --with-mb= options on
+ the server side or a particular client that does recoding itself (e.g.
+ there exists a PostgreSQL ODBC driver for Win32 with various Cyrillic
+ encoding capability). Option --with-mb= is necessary for the
+ multi-byte charsets support.
+
+
+ 2. Single-byte charsets recoding
+
+ You can set up this feature with --enable-recode option. This option
+ is described as 'enable Cyrillic recode support' which doesn't express
+ all its power. It can be used for *any* single-byte charset recoding.
+
+ This method uses charset.conf file located in the $PGDATA directory.
+ It's a typical configuration text file where spaces and newlines
+ separate items and records and # specifies comments. Three keywords
+ with the following syntax are recognized here:
+
+ BaseCharset
+ RecodeTable
+ HostCharset
+
+ BaseCharset defines encoding of the database server. All charset
+ names are only used for mapping inside the charset.conf so you can
+ freely use typing-friendly names.
+
+ RecodeTable records specify translation table between server and client.
+ The file name is relative to the $PGDATA directory. Table file format
+ is very simple. There are no keywords and characters are represented by
+ a pair of decimal or hexadecimal (0x prefixed) values on single lines:
+
+
+
+ HostCharset records define IP address and charset. You can use a single
+ IP address, an IP mask range starting from the given address or an IP
+ interval (e.g. 127.0.0.1, 192.168.1.100/24, 192.168.1.20-192.168.1.40)
+
+ The charset.conf is always processed up to the end, so you can easily
+ specify exceptions from the previous rules. In the src/data you will
+ find charset.conf example and a few recoding tables.
+
+ As this solution is based on the client's IP address / charset mapping
+ there are obviously some restrictions as well. You can't use different
+ encoding on the same host at the same time. It's also inconvenient when
+ you boot your client hosts into several operating systems.
+ Nevertheless, when these restrictions are not limiting and you don't
+ need multi-byte chars, then it's a simple and effective solution.
+
+
+ 3. Multi-byte support/recoding
+
+ It's a new generation of charset encoding in PostgreSQL designed as a
+ more complex solution supporting both single-byte and multi-byte chars.
+ You can set up this feature with --with-mb= option.
+
+ There is no IP mapping file and recoding is controlled through the new
+ SQL statements. Recoding tables are included in the code. Many national
+ charsets are already supported and more will follow.
+
+ See doc/README.mb, doc/README.mb.jp to get detailed instruction on how
+ to use the multibyte support. In the file doc/README.locale there is
+ a particular instruction on usage of the multibyte support with Cyrillic.
+
+
+ 4. Credits
+
+ I'd like to thank the PostgreSQL development team and all contributors
+ for creating PostgreSQL. Thanks to Oleg Bartunov, Oleg Broytmann and
+ Tatsuo Ishii for opening the door into the multi-language world.
+
diff --git a/doc/README.locale b/doc/README.locale
index fb5d0e8cd2..0b5203d669 100644
--- a/doc/README.locale
+++ b/doc/README.locale
@@ -1,5 +1,17 @@
 ===========
-14 Apr 1999
+1999 Jul 21
+===========
+
+ Josef Balatka, asked us to remove RECODE and sent me
+Czech ISO-8859-2 -> WIN-1250 translation table.
+ RECODE is no longer only Cyrillic RECODE and will stay in PostgreSQL.
+
+ He also created some bits of documentation, mostly concerning RECODE -
+see README.Charsets.
+
+
+===========
+1999 Apr 14
 ===========
 
 Tatsuo Ishii updated Multibyte support extending it
diff --git a/src/data/isocz-wincz.tab b/src/data/isocz-wincz.tab
new file mode 100644
index 0000000000..b27b0555a6
--- /dev/null
+++ b/src/data/isocz-wincz.tab
@@ -0,0 +1,12 @@
+#
+# Czech ISO-8859-2 -> WIN-1250 translation table
+#
+165 188
+169 138
+171 141
+174 142
+181 190
+185 154
+187 157
+190 158
+
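
Reviewer's sketch (not part of the patch): the recode table format described in README.Charsets, and used by src/data/isocz-wincz.tab, can be illustrated with a few lines of Python. This is only an approximation under stated assumptions, not the server's actual parser. The function names `parse_recode_table` and `recode` are hypothetical, `#` is assumed to start a comment anywhere on a line, and bytes with no table entry are assumed to pass through unchanged.

```python
# Illustrative only: hypothetical helpers, not PostgreSQL code.

# Contents of src/data/isocz-wincz.tab as added by this patch.
ISOCZ_WINCZ = """\
#
# Czech ISO-8859-2 -> WIN-1250 translation table
#
165 188
169 138
171 141
174 142
181 190
185 154
187 157
190 158
"""


def parse_value(token):
    """Parse one table value: decimal, or hexadecimal when 0x-prefixed."""
    if token.lower().startswith("0x"):
        return int(token, 16)
    return int(token, 10)


def parse_recode_table(text):
    """Return a dict mapping source byte values to target byte values."""
    table = {}
    for raw_line in text.splitlines():
        # Assumption: '#' starts a comment, as in the table file above.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        src, dst = line.split()
        table[parse_value(src)] = parse_value(dst)
    return table


def recode(data, table):
    """Recode a byte string; unmapped bytes are left as they are (assumption)."""
    return bytes(table.get(b, b) for b in data)


table = parse_recode_table(ISOCZ_WINCZ)
# 0xA9 is S-caron in ISO-8859-2; the table maps it to 0x8A, its WIN-1250 code.
converted = recode(b"\xa9koda", table)
```

Only the eight characters that differ between the two encodings are listed in the table, which is why the sketch passes every other byte through untouched.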