Commit Graph

8 Commits

Author SHA1 Message Date
Peter Eisentraut 86a1aae764 Fix typo in comment
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/20210716.170209.175434392011070182.horikyota.ntt%40gmail.com
2021-07-22 09:37:35 +02:00
Peter Eisentraut c4ba1bee68 Update headers of generated files
The scripts were changed in c98c35cd08,
but the output files were not updated to reflect the script changes.
2018-02-24 14:54:17 -05:00
Heikki Linnakangas dd12bef58c Include array size in forward declaration.
Some compilers require it. At least Visual Studio, according to the
buildfarm, and gcc with the -pedantic flag.
2017-03-13 21:53:38 +02:00
Heikki Linnakangas aeed17d000 Use radix tree for character encoding conversions.
Replace the mapping tables used to convert between UTF-8 and other
character encodings with new radix tree-based maps. Looking up an entry in
a radix tree is much faster than a binary search in the old maps. As a
bonus, the radix tree representation is also more compact, making the
binaries slightly smaller.

The "combined" maps work the same as before, with binary search. They are
much smaller than the main tables, so it doesn't matter so much. However,
the "combined" maps are now stored in the same .map files as the main
tables. This seems more clear, since they're always used together, and
generated from the same source files.

Patch by Kyotaro Horiguchi, with lot of hacking by me at various stages.
Reviewed by Michael Paquier and Daniel Gustafsson.

Discussion: https://www.postgresql.org/message-id/20170306.171609.204324917.horiguchi.kyotaro%40lab.ntt.co.jp
2017-03-13 20:46:39 +02:00
Heikki Linnakangas 1de9cc0dcc Rewrite the perl scripts to produce our Unicode conversion tables.
Generate EUC_CN mappings from gb-18030-2000.xml, because GB2312.TXT is no
longer available.

Get UHC from windows-949-2000.xml, it's more up-to-date.

Plus tons more small changes. With these changes, the perl scripts
faithfully produce the *.map files we have in the repository, from the
external source files.

In the passing, fix the Makefile to also download CP932.TXT and CP950.TXT.

Based on patches by Kyotaro Horiguchi, reviewed by Daniel Gustafsson.

Discussion: https://postgr.es/m/08e7892a-d55c-eefe-76e6-7910bc8dd1f3@iki.fi
2016-11-30 14:54:52 +02:00
Tom Lane 7730f48ede Teach UtfToLocal/LocalToUtf to support algorithmic encoding conversions.
Until now, these functions have only supported encoding conversions using
lookup tables, which is fine as long as there's not too many code points
to convert.  However, GB18030 expects all 1.1 million Unicode code points
to be convertible, which would require a ridiculously-sized lookup table.
Fortunately, a large fraction of those conversions can be expressed through
arithmetic, ie the conversions are one-to-one in certain defined ranges.
To support that, provide a callback function that is used after consulting
the lookup tables.  (This patch doesn't actually change anything about the
GB18030 conversion behavior, just provide infrastructure for fixing it.)

Since this requires changing the APIs of UtfToLocal/LocalToUtf anyway,
take the opportunity to rearrange their argument lists into what seems
to me a saner order.  And beautify the call sites by using lengthof()
instead of error-prone sizeof() arithmetic.

In passing, also mark all the lookup tables used by these calls "const".
This moves an impressive amount of stuff into the text segment, at least
on my machine, and is safer anyhow.
2015-05-14 22:27:12 -04:00
Tatsuo Ishii 5c7f55342b Update UTF-8 <--> EUC_KR, JOHAB, UHC mappings.
Patch contributed by Chuck McDevitt
2009-05-03 01:17:41 +00:00
Bruce Momjian e6227fd0ec Add missing Unicode multibyte files. 2002-03-06 06:12:59 +00:00