Commit Graph

49 Commits

Author SHA1 Message Date
Bruce Momjian 0658a6a634 Formatting cleanup. 2005-12-24 16:49:48 +00:00
Tatsuo Ishii 804f6b8fc9 Fix long standing Asian multibyte charsets bug.
See:

Subject: [HACKERS] bugs with certain Asian multibyte charsets
From: Tatsuo Ishii <ishii@sraoss.co.jp>
To: pgsql-hackers@postgresql.org
Date: Sat, 24 Dec 2005 18:25:33 +0900 (JST)

for more details/
2005-12-24 09:35:36 +00:00
Peter Eisentraut 07bb9f086b Message corrections 2005-10-29 00:31:52 +00:00
Bruce Momjian 1dc3498251 Standard pgindent run for 8.1. 2005-10-15 02:49:52 +00:00
Tom Lane 8889685555 Suppress signed-vs-unsigned-char warnings. 2005-09-24 17:53:28 +00:00
Bruce Momjian 5955945828 Support 3 and 4-byte unicode characters.
John Hansen
2005-06-15 00:15:08 +00:00
Bruce Momjian e7fb9f18bf Add support for Win1252 encoding.
Roland Volkmann
2005-03-14 18:31:25 +00:00
Bruce Momjian 41e2a80f57 Update comments for new encoding names. 2005-03-14 00:19:13 +00:00
Bruce Momjian e3d7de6b99 Rename canonical encodings, per Peter:
UNICODE => UTF8
	ALT => WIN866
	WIN => WIN1251
	TCVN => WIN1258

The old codes continue to work.
2005-03-07 04:30:55 +00:00
Bruce Momjian 08e0b34bad Back out fix for Unicode characters above 0x10000 2004-12-03 01:20:33 +00:00
Bruce Momjian 4ea4f8bd06 Fix for Unicode characters above 0x10000.
John Hansen
2004-12-02 22:37:14 +00:00
Peter Eisentraut 152a101f2b Allow WIN1250 as server encoding. 2004-09-17 21:59:57 +00:00
Bruce Momjian b6b71b85bc Pgindent run for 8.0. 2004-08-29 05:07:03 +00:00
Tatsuo Ishii e8c3205037 Add PQmbdsplen() which returns the "display length" of a character.
Still some works needed:
- UTF-8, MULE_INTERNAL always returns 1
2004-03-15 10:41:26 +00:00
PostgreSQL Daemon 55b113257c make sure the $Id tags are converted to $PostgreSQL as well ... 2003-11-29 22:41:33 +00:00
Peter Eisentraut feb4f44d29 Message editing: remove gratuitous variations in message wording, standardize
terms, add some clarifications, fix some untranslatable attempts at dynamic
message building.
2003-09-25 06:58:07 +00:00
Bruce Momjian 089003fb46 pgindent run. 2003-08-04 00:43:34 +00:00
Tom Lane b6a1d25b0a Error message editing in utils/adt. Again thanks to Joe Conway for doing
the bulk of the heavy lifting ...
2003-07-27 04:53:12 +00:00
Tatsuo Ishii 38535f8e32 Fix typo in an error message 2003-01-11 06:55:11 +00:00
Bruce Momjian e50f52a074 pgindent run. 2002-09-04 20:31:48 +00:00
Peter Eisentraut 77f7763b55 Remove all traces of multibyte and locale options. Clean up comments
referring to "multibyte" where it really means character encoding.
2002-09-03 21:45:44 +00:00
Tatsuo Ishii 14f72b9a4d Add GB18030 support. Contributed by Bill Huang <bill_huanghb@ybb.ne.jp>
(ODBC support has not been committed yet. left for Hiroshi...)
2002-06-13 08:30:22 +00:00
Bruce Momjian a8bd7e1c6e > Tatsuo Ishii wrote:
> > > > It was made to cope with encoding such as an Asian bloc in 7.2Beta2.
> > > >
> > > > Added ServerEncoding
> > > >         Korean (JOHAB), Thai (WIN874),
> > > >         Vietnamese (TCVN), Arabic (WIN1256)
> > > >
> > > > Added ClientEncoding
> > > >         Simplified Chinese (GBK), Korean (UHC)
> > > >
> > > >
> > > >
> http://www.sankyo-unyu.co.jp/Pool/postgresql-7.2b2.newencoding.diff.tar.gz
> > > > (608K)
> > >
> > > Looks good.  I need some people to review this for me.
> >
> > For me they look good too. The only missing part is a
> > documentation. I will ask him to write it up. If he couldn't, I will
> > do it for him.
> > > The diff is 3mb
> > > but appears to address only additions to multibyte.  I have attached a
> > > list of files it modifies.  Also, look at the sizes of the mb/
> > > directory.  It is getting large:
> > >
> > >   4       ./CVS
> > >   6       ./Unicode/CVS
> > >   3433    ./Unicode
> > >   6197    .
> >
> > Yes. We definitely need the on-the-fly encoding addition capability:
> > i.e. CREATE CHRACTER SET in the future...
> > --
> > Tatsuo Ishii
> >
> >

Address chainge.

http://www.sankyo-unyu.co.jp/Pool/postgresql-7.2.newencoding.diff.gz

Add PsqlODBC and document ...etc patch.

Eiji Tokuya
2002-03-05 05:52:50 +00:00
Bruce Momjian 6783b2372e Another pgindent run. Fixes enum indenting, and improves #endif
spacing.  Also adds space for one-line comments.
2001-10-28 06:26:15 +00:00
Bruce Momjian b81844b173 pgindent run on all C files. Java run to follow. initdb/regression
tests pass.
2001-10-25 05:50:21 +00:00
Tatsuo Ishii d07bacd54a Add UTF-8 char >= 0x10000 check 2001-10-15 01:19:15 +00:00
Tatsuo Ishii 51053d3216 Add support for ISO-8859-6 to 16 2001-10-11 14:20:35 +00:00
Tatsuo Ishii be629abfc8 Add pg_database_encoding_max_length() function. 2001-09-23 10:59:45 +00:00
Tom Lane e3f5bc3492 Fix type_maximum_size() to give the right answer in MULTIBYTE cases.
Avoid use of prototype-less function pointers in MB code.
2001-09-21 15:27:38 +00:00
Tatsuo Ishii e1de3e0833 Implement following item in TODO:
* Reject character sequences those are not valid in their charset
2001-09-11 04:50:36 +00:00
Tatsuo Ishii 227767112c Commit Karel's patch.
-------------------------------------------------------------------
Subject: Re: [PATCHES] encoding names
From: Karel Zak <zakkr@zf.jcu.cz>
To: Peter Eisentraut <peter_e@gmx.net>
Cc: pgsql-patches <pgsql-patches@postgresql.org>
Date: Fri, 31 Aug 2001 17:24:38 +0200

On Thu, Aug 30, 2001 at 01:30:40AM +0200, Peter Eisentraut wrote:
> > 		- convert encoding 'name' to 'id'
>
> I thought we decided not to add functions returning "new" names until we
> know exactly what the new names should be, and pending schema

 Ok, the patch not to add functions.

> better
>
>     ...(): encoding name too long

 Fixed.

 I found new bug in command/variable.c in parse_client_encoding(), nobody
probably never see this error:

if (pg_set_client_encoding(encoding))
{
	elog(ERROR, "Conversion between %s and %s is not supported",
                     value, GetDatabaseEncodingName());
}

because pg_set_client_encoding() returns -1 for error and 0 as true.
It's fixed too.

 IMHO it can be apply.

		Karel
PS:

    * following files are renamed:

src/utils/mb/Unicode/KOI8_to_utf8.map  -->
        src/utils/mb/Unicode/koi8r_to_utf8.map

src/utils/mb/Unicode/WIN_to_utf8.map  -->
        src/utils/mb/Unicode/win1251_to_utf8.map

src/utils/mb/Unicode/utf8_to_KOI8.map -->
        src/utils/mb/Unicode/utf8_to_koi8r.map

src/utils/mb/Unicode/utf8_to_WIN.map -->
        src/utils/mb/Unicode/utf8_to_win1251.map

   * new file:

src/utils/mb/encname.c

   * removed file:

src/utils/mb/common.c

--
 Karel Zak  <zakkr@zf.jcu.cz>
 http://home.zf.jcu.cz/~zakkr/

 C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
2001-09-06 04:57:30 +00:00
Tatsuo Ishii b9be04e63d Add a crash gurard to pg_encoding_mblen in case of an invalid encoding
given.
2001-04-19 02:34:35 +00:00
Bruce Momjian 9e1552607a pgindent run. Make it all clean. 2001-03-22 04:01:46 +00:00
Tom Lane 572fda2711 Modify wchar conversion routines to not fetch the next byte past the end
of a counted input string.  Marinos Yannikos' recent crash report turns
out to be due to applying pg_ascii2wchar_with_len to a TEXT object that
is smack up against the end of memory.  This is the second just-barely-
reproducible bug report I have seen that traces to some bit of code
fetching one more byte than it is allowed to.  Let's be more careful
out there, boys and girls.
While at it, I changed the code to not risk a similar crash when there
is a truncated multibyte character at the end of an input string.  The
output in this case might not be the most reasonable output possible;
if anyone wants to improve it further, step right up...
2001-03-08 00:24:34 +00:00
Tatsuo Ishii 8f17e53f0e Move pg_encoding_mblen() from common.c to wchar.c. 2001-02-11 01:59:22 +00:00
Tom Lane d08741eab5 Restructure the key include files per recent pghackers discussion: there
are now separate files "postgres.h" and "postgres_fe.h", which are meant
to be the primary include files for backend .c files and frontend .c files
respectively.  By default, only include files meant for frontend use are
installed into the installation include directory.  There is a new make
target 'make install-all-headers' that adds the whole content of the
src/include tree to the installed fileset, for use by people who want to
develop server-side code without keeping the complete source tree on hand.
Cleaned up a whole lot of crufty and inconsistent header inclusions.
2001-02-10 02:31:31 +00:00
Tatsuo Ishii de53ce8131 Support for conversion between UNICODE and other encodings
currently ISO8859-[1-5] and EUC_JP are supported.
support for other encodings will be coming soon.
2000-10-12 06:06:50 +00:00
Tatsuo Ishii bfdd6a716d Change pg_mblen and pg_encoding_mblen return types from void
to int so that they return the number of whcars.
2000-08-27 10:40:48 +00:00
Bruce Momjian 52f77df613 Ye-old pgindent run. Same 4-space tabs. 2000-04-12 17:17:23 +00:00
Tatsuo Ishii 6f843e8dd8 Fix pg_euccn_mblen() so that it always returns 2 if data is not ascii.
(EUC_CN does have only code set 0 and 1)
2000-01-25 02:12:27 +00:00
Tatsuo Ishii 8f02f2252d Fix some compiler warnings (Tomoaki Nishiyama), add WIN1250 support (Pavel Behal) 1999-07-11 22:47:21 +00:00
Bruce Momjian 07842084fe pgindent run over code. 1999-05-25 16:15:34 +00:00
Tom Lane 0d99c95388 Correct potential infinite loop in pg_utf2wchar_with_len;
it failed to cover the case where high bits of char are 100 or 101.
Not sure if fix is right, but it agrees with pg_utf_mblen ... and it
doesn't lock up ...
1999-04-25 20:35:51 +00:00
Tatsuo Ishii 5ae9d85f77 Add KOI8/WIN/ALT support 1999-03-24 07:02:17 +00:00
Bruce Momjian a7ad43cd18 Included patches make some enhancements to the multi-byte support.
o allow to use Big5 (a Chinese encoding used in Taiwan) as a client
  encoding. In this case the server side encoding should be EUC_TW

o add EUC_TW and Big5 test cases to the regression and the mb test
  (contributed by Jonah Kuo)

o fix mistake in include/mb/pg_wchar.h. An encoding id for EUC_TW was
  not correct (was 3 and now is 4)

o update documents (doc/README.mb and README.mb.jp)

o update psql helpfile (bin/psql/psqlHelp.h)

--
Tatsuo Ishii
t-ishii@sra.co.jp
1999-02-02 18:51:40 +00:00
Bruce Momjian fa1a8d6a97 OK, folks, here is the pgindent output. 1998-09-01 04:40:42 +00:00
Bruce Momjian 2aab1b9a22 >Applied.
Thanks. But patches for src/backend/catalog/Makefile seems missing
in the current source tree. Please apply attached patches.

It also includes some corrections to src/backend/util/mb/wchar.c.
-- Tatsuo Ishii t-ishii@sra.co.jp
1998-08-25 04:19:16 +00:00
Bruce Momjian c0b01461db o note that now pg_database has a new attribuite "encoding" even
if MULTIBYTE is not enabled. So be sure to run initdb.

o these patches are made against the latest source tree (after
Bruce's massive patch, I think) BTW, I noticed that after running
regression, the oid field of pg_type seems disappeared.

	regression=> select oid from pg_type; ERROR:  attribute
	'oid' not found

this happens after the constraints test. This occures with/without
my patches. strange...

o pg_database_mb.h, pg_class_mb.h, pg_attribute_mb.h are no longer
used, and shoud be removed.

o GetDatabaseInfo() in utils/misc/database.c removed (actually in
#ifdef 0). seems nobody uses.

t-ishii@sra.co.jp
1998-08-24 01:14:24 +00:00
Marc G. Fournier bf00bbb0c4 I really hope that I haven't missed anything in this one...
From: t-ishii@sra.co.jp

Attached are patches to enhance the multi-byte support.  (patches are
against 7/18 snapshot)

* determine encoding at initdb/createdb rather than compile time

Now initdb/createdb has an option to specify the encoding. Also, I
modified the syntax of CREATE DATABASE to accept encoding option. See
README.mb for more details.

For this purpose I have added new column "encoding" to pg_database.
Also pg_attribute and pg_class are changed to catch up the
modification to pg_database.  Actually I haved added pg_database_mb.h,
pg_attribute_mb.h and pg_class_mb.h. These are used only when MB is
enabled. The reason having separate files is I couldn't find a way to
use ifdef or whatever in those files. I have to admit it looks
ugly. No way.

* support for PGCLIENTENCODING when issuing COPY command

commands/copy.c modified.

* support for SQL92 syntax "SET NAMES"

See gram.y.

* support for LATIN2-5
* add UNICODE regression test case
* new test suite for MB

New directory test/mb added.

* clean up source files

Basic idea is to have MB's own subdirectory for easier maintenance.
These are include/mb and backend/utils/mb.
1998-07-24 03:32:46 +00:00