postgresql/src/common/unicode
Peter Eisentraut f85a485f89 Add support for automatically updating Unicode derived files
We currently have several sets of files generated from data provided
by Unicode.  These all have ad hoc rules and instructions for updating
when new Unicode versions appear, and it's not done consistently.

This patch centralizes and automates the process and makes it part of
the release checklist.  The Unicode and CLDR versions are specified in
Makefile.global.in.  There is a new make target "update-unicode" that
downloads all the relevant files and runs the generation script.

There is also a new script for generating the table of combining
characters for ucs_wcwidth().  That table is now in a separate include
file rather than hardcoded into the middle of other code.  This is
based on the script that was used for generating
d8594d123c, but the script itself wasn't
committed at that time.

Reviewed-by: John Naylor <john.naylor@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/c8d05f42-443e-6c23-819b-05b31759a37c@2ndquadrant.com
2020-01-09 10:08:14 +01:00
..
.gitignore Add support for automatically updating Unicode derived files 2020-01-09 10:08:14 +01:00
Makefile Add support for automatically updating Unicode derived files 2020-01-09 10:08:14 +01:00
README Add support for automatically updating Unicode derived files 2020-01-09 10:08:14 +01:00
generate-norm_test_table.pl Update copyrights for 2020 2020-01-01 12:21:45 -05:00
generate-unicode_combining_table.pl Add support for automatically updating Unicode derived files 2020-01-09 10:08:14 +01:00
generate-unicode_norm_table.pl Update copyrights for 2020 2020-01-01 12:21:45 -05:00
norm_test.c Update copyrights for 2020 2020-01-01 12:21:45 -05:00

README

This directory contains tools to generate the tables in
src/include/common/unicode_norm.h, used for Unicode normalization. The
generated .h file is included in the source tree, so these are normally not
needed to build PostgreSQL, only if you need to re-generate the .h file
from the Unicode data files for some reason, e.g. to update to a new version
of Unicode.

Generating unicode_norm_table.h
-------------------------------

Run

    make update-unicode

from the top level of the source tree and commit the result.

Tests
-----

The Unicode consortium publishes a comprehensive test suite for the
normalization algorithm, in a file called NormalizationTest.txt. This
directory also contains a perl script and some C code, to run our
normalization code with all the test strings in NormalizationTest.txt.
To download NormalizationTest.txt and run the tests:

    make normalization-check

This is also run as part of the update-unicode target.