postgresql/src/common/unicode
Andres Freund 401874ab02 meson: don't require 'touch' binary, make use of 'cp' optional
We already didn't use touch (some earlier version of the meson build did),
and cp is only used for updating unicode files. The latter already depends on
the optional availability of 'wget', so doing the same for 'cp' makes sense.

Eventually we probably want a portable command for updating source code as
part of a target, but for now...

Reported-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/70e96c34-64ee-e549-8c4a-f91a7a668804@dunslane.net
2023-03-07 18:44:42 -08:00
.gitignore                               Update display widths as part of updating Unicode               2021-08-26 10:53:56 -04:00
Makefile                                 Treat Unicode codepoints of category "Format" as non-spacing    2022-09-13 16:13:33 +07:00
README                                   Add support for automatically updating Unicode derived files    2020-01-09 10:08:14 +01:00
generate-norm_test_table.pl              Update copyright for 2023                                       2023-01-02 15:00:37 -05:00
generate-unicode_east_asian_fw_table.pl  Update copyright for 2023                                       2023-01-02 15:00:37 -05:00
generate-unicode_nonspacing_table.pl     Update copyright for 2023                                       2023-01-02 15:00:37 -05:00
generate-unicode_norm_table.pl           Update copyright for 2023                                       2023-01-02 15:00:37 -05:00
generate-unicode_normprops_table.pl      Update copyright for 2023                                       2023-01-02 15:00:37 -05:00
meson.build                              meson: don't require 'touch' binary, make use of 'cp' optional  2023-03-07 18:44:42 -08:00
norm_test.c                              Update copyright for 2023                                       2023-01-02 15:00:37 -05:00

README

This directory contains tools to generate the tables in
src/include/common/unicode_norm_table.h, used for Unicode normalization. The
generated .h file is included in the source tree, so these tools are not
normally needed to build PostgreSQL. They are needed only to re-generate the
.h file from the Unicode data files, e.g. to update to a new version of
Unicode.

Generating unicode_norm_table.h
-------------------------------

Run

    make update-unicode

from the top level of the source tree and commit the result.

Tests
-----

The Unicode Consortium publishes a comprehensive test suite for the
normalization algorithm, in a file called NormalizationTest.txt. This
directory also contains a Perl script and some C code that run our
normalization code against all the test strings in NormalizationTest.txt.
To download NormalizationTest.txt and run the tests:

    make normalization-check

This is also run as part of the update-unicode target.
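
For orientation, each data line in NormalizationTest.txt has five
semicolon-terminated columns of space-separated hexadecimal code points
(source, NFC, NFD, NFKC, NFKD), followed by a '#' comment. The following is
a minimal sketch of parsing one such line. It is not the code in norm_test.c
or the Perl script; the names cp_seq, parse_test_line, MAX_CODEPOINTS and
NUM_COLUMNS are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_CODEPOINTS 32		/* assumed upper bound, for illustration only */
#define NUM_COLUMNS 5			/* source, NFC, NFD, NFKC, NFKD */

/* One column of a test line: a short sequence of code points. */
typedef struct
{
	unsigned long cp[MAX_CODEPOINTS];
	int			len;
} cp_seq;

/*
 * Parse one data line into its five columns.
 *
 * Returns 1 on success, or 0 if the line is not a well-formed data line
 * (for example a '#' comment line, an '@Part' header, or an empty line).
 */
static int
parse_test_line(const char *line, cp_seq cols[NUM_COLUMNS])
{
	const char *p = line;

	assert(line != NULL && cols != NULL);

	for (int c = 0; c < NUM_COLUMNS; c++)
	{
		cols[c].len = 0;

		for (;;)
		{
			char	   *end;
			unsigned long cp;

			while (*p == ' ')
				p++;
			if (*p == ';')
				break;			/* end of this column */

			cp = strtoul(p, &end, 16);
			if (end == p || cols[c].len >= MAX_CODEPOINTS)
				return 0;		/* not a hex number, or column too long */

			cols[c].cp[cols[c].len++] = cp;
			p = end;
		}

		if (cols[c].len == 0)
			return 0;			/* every column must hold a code point */
		p++;					/* skip the ';' terminator */
	}

	return 1;
}
```

Given the line "1E0A;1E0A;0044 0307;1E0A;0044 0307; # ...", the sketch fills
cols[0] (source) with U+1E0A and cols[2] (NFD) with U+0044 U+0307. A test
would then normalize the source column to each form and compare the result
with the corresponding column.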