postgresql/src/common/unicode
Tom Lane 0245f8db36 Pre-beta mechanical code beautification.
Run pgindent, pgperltidy, and reformat-dat-files.

This set of diffs is a bit larger than typical.  We've updated to
pg_bsd_indent 2.1.2, which properly indents variable declarations that
have multi-line initialization expressions (the continuation lines are
now indented one tab stop).  We've also updated to perltidy version
20230309 and changed some of its settings, which reduces its desire to
add whitespace to lines to make assignments etc. line up.  Going
forward, that should make for fewer random-seeming changes to existing
code.

Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
2023-05-19 17:24:48 -04:00
.gitignore Update display widths as part of updating Unicode 2021-08-26 10:53:56 -04:00
Makefile Treat Unicode codepoints of category "Format" as non-spacing 2022-09-13 16:13:33 +07:00
README Add support for automatically updating Unicode derived files 2020-01-09 10:08:14 +01:00
generate-norm_test_table.pl Pre-beta mechanical code beautification. 2023-05-19 17:24:48 -04:00
generate-unicode_east_asian_fw_table.pl Update copyright for 2023 2023-01-02 15:00:37 -05:00
generate-unicode_nonspacing_table.pl Update copyright for 2023 2023-01-02 15:00:37 -05:00
generate-unicode_norm_table.pl Pre-beta mechanical code beautification. 2023-05-19 17:24:48 -04:00
generate-unicode_normprops_table.pl Pre-beta mechanical code beautification. 2023-05-19 17:24:48 -04:00
meson.build meson: don't require 'touch' binary, make use of 'cp' optional 2023-03-07 18:44:42 -08:00
norm_test.c Update copyright for 2023 2023-01-02 15:00:37 -05:00

README

This directory contains tools to generate the Unicode lookup tables in
src/include/common/, such as unicode_norm_table.h, used for Unicode
normalization. The generated .h files are included in the source tree, so
these tools are not needed to build PostgreSQL. They are only needed to
regenerate the .h files from the Unicode data files, e.g. when updating to
a new version of Unicode.
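The generated headers hold plain C arrays that other code searches at run time. As a rough sketch of one common lookup pattern, assuming a table of sorted, non-overlapping code point ranges, a binary search answers "is this code point in the table?". The struct, table, and function names below are hypothetical, and the two ranges are illustrative sample values, not the contents of any real generated header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shape of a generated range table entry (inclusive bounds). */
typedef struct
{
    uint32_t first;
    uint32_t last;
} CodepointRange;

/*
 * Illustrative sample data only; a real generated header would be produced
 * by one of the generate-*.pl scripts from the Unicode data files.
 * Entries must be sorted by 'first' and must not overlap.
 */
static const CodepointRange example_ranges[] = {
    {0x0300, 0x036F},
    {0x0591, 0x05BD},
};

/* Binary search over the half-open window [lo, hi) of table entries. */
static bool
codepoint_in_ranges(uint32_t cp, const CodepointRange *table, size_t n)
{
    size_t lo = 0;
    size_t hi = n;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (cp < table[mid].first)
            hi = mid;           /* target lies in an earlier entry */
        else if (cp > table[mid].last)
            lo = mid + 1;       /* target lies in a later entry */
        else
            return true;        /* first <= cp <= last */
    }
    return false;
}
```

Storing ranges rather than individual code points keeps such tables compact, and sorting them lets each lookup finish in O(log n) comparisons.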

Generating the Unicode tables
-----------------------------

Run

    make update-unicode

from the top level of the source tree and commit the result.

Tests
-----

The Unicode Consortium publishes a comprehensive test suite for the
normalization algorithm in a file called NormalizationTest.txt. This
directory also contains a Perl script (generate-norm_test_table.pl) and a C
test driver (norm_test.c) that run our normalization code against every test
string in NormalizationTest.txt. To download NormalizationTest.txt and run
the tests:

    make normalization-check

This is also run as part of the update-unicode target.
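To make the NormalizationTest.txt format concrete, here is a minimal C sketch that parses one of its data lines. It assumes the published layout: five ';'-terminated columns, each a space-separated list of hexadecimal code points, optionally followed by a '#' comment, with '#' comment lines and '@Part' header lines in between. The names NormTestCase and parse_norm_test_line and the NT_MAX_CP limit are hypothetical; this is not the code used by generate-norm_test_table.pl or norm_test.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NT_FIELDS 5             /* columns c1..c5 in each data line */
#define NT_MAX_CP 32            /* assumed upper bound on code points per column */

typedef struct
{
    uint32_t cp[NT_FIELDS][NT_MAX_CP];  /* code points of each column */
    int      len[NT_FIELDS];            /* number of code points per column */
} NormTestCase;

/*
 * Parse one line of NormalizationTest.txt into 'tc'.
 * Returns 1 for a well-formed data line, 0 for comment lines, "@Part"
 * headers, empty lines, or input that does not match the assumed layout.
 */
static int
parse_norm_test_line(const char *line, NormTestCase *tc)
{
    const char *p = line;

    if (*p == '#' || *p == '@' || *p == '\0' || *p == '\n')
        return 0;

    for (int field = 0; field < NT_FIELDS; field++)
    {
        tc->len[field] = 0;
        for (;;)
        {
            char       *end;
            unsigned long cp;

            while (*p == ' ')
                p++;
            if (*p == ';')
            {
                p++;            /* ';' terminates the current column */
                break;
            }
            cp = strtoul(p, &end, 16);
            if (end == p || tc->len[field] >= NT_MAX_CP)
                return 0;       /* not a hex number, or column too long */
            tc->cp[field][tc->len[field]++] = (uint32_t) cp;
            p = end;
        }
        if (tc->len[field] == 0)
            return 0;           /* every column holds at least one code point */
    }
    return 1;                   /* any trailing "# ..." comment is ignored */
}
```

A test driver would then check each parsed case against the conformance invariants stated in the header of NormalizationTest.txt, for example c2 == NFC(c1) == NFC(c2) == NFC(c3) and c3 == NFD(c1) == NFD(c2) == NFD(c3).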