Commit Graph

4 Commits

Author SHA1 Message Date
Tom Lane e17dab53ea Auto-generate file header comments in Unicode mapping files.
Some of the Unicode/*.map files had identification comments added to them,
evidently by hand.  Others did not.  Modify the generating scripts to
produce these comments automatically, and update the generated files that
lacked them.

This is just minor cleanup as a by-product of trying to verify that the
*.map files can indeed be reproduced from authoritative data.  There are a
depressingly large number that fail to reproduce from the claimed sources.
I have not touched those in this commit, except for the JIS 2004-related
files which required only a single comment update to match.

Since this only affects comments, no need to consider a back-patch.
2015-11-27 16:50:47 -05:00
Tom Lane 7730f48ede Teach UtfToLocal/LocalToUtf to support algorithmic encoding conversions.
Until now, these functions have only supported encoding conversions using
lookup tables, which is fine as long as there's not too many code points
to convert.  However, GB18030 expects all 1.1 million Unicode code points
to be convertible, which would require a ridiculously-sized lookup table.
Fortunately, a large fraction of those conversions can be expressed through
arithmetic, ie the conversions are one-to-one in certain defined ranges.
To support that, provide a callback function that is used after consulting
the lookup tables.  (This patch doesn't actually change anything about the
GB18030 conversion behavior, just provide infrastructure for fixing it.)

Since this requires changing the APIs of UtfToLocal/LocalToUtf anyway,
take the opportunity to rearrange their argument lists into what seems
to me a saner order.  And beautify the call sites by using lengthof()
instead of error-prone sizeof() arithmetic.

In passing, also mark all the lookup tables used by these calls "const".
This moves an impressive amount of stuff into the text segment, at least
on my machine, and is safer anyhow.
2015-05-14 22:27:12 -04:00
Tatsuo Ishii 188065cb5c Unicode conversion fix suggested by Jan Varga...
--------------------------------------------------
Subject: Bug in unicode conversion ...
From: Jan Varga <varga@utcru.sk>
To: t-ishii@sra.co.jp
Date: Sat, 18 Nov 2000 17:41:20 +0100 (CET)


Hi,

I tried this new feature in PostgreSQL. I found one bug.
Script UCS_to_8859.pl skips input lines which
1. code <0x80 or
2. ucs <0x100

I think second one is not good idea because some codes in ISO8859-2
have ucs <0x100 (e.g. 0xE9 - 0x00E9)
--------------------------------------------------
2000-11-26 10:40:43 +00:00
Tatsuo Ishii 1acf6f9c8e Add support for code conversion between Unicode and other encodings.
Supported encodings are: EUC_JP, EUC_CN, EUC_KR, EUC_TW, Shift JIS,
Big5, ISO8859-[1-5].
TODO: testings! and documentations...
2000-10-30 10:41:05 +00:00