Further hacking on ICU collation creation and usage.

pg_import_system_collations() refused to create any ICU collations if
the current database's encoding didn't support ICU.  This is wrongheaded:
initdb must initialize pg_collation in an encoding-independent way
since it might be used in other databases with different encodings.
The reason for the restriction seems to be that get_icu_locale_comment()
used icu_from_uchar() to convert the UChar-format display name, and that
unsurprisingly doesn't know what to do in unsupported encodings.
But by the same token that the initial catalog contents must be
encoding-independent, we can't allow non-ASCII characters in the comment
strings.  So we don't really need icu_from_uchar() here: just check for
Unicode codes outside the ASCII range, and if there are none, the format
conversion is trivial.  If there are some, we can simply not install the
comment.  (In my testing, this affects only Norwegian Bokmål, which has
given us trouble before.)
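
As a rough illustration of the "check for non-ASCII, then transcribe" idea
above, here is a minimal standalone sketch.  It is not the committed code:
uchar16 is a hypothetical stand-in for ICU's UChar, ascii_from_utf16 is a
made-up name, and malloc is assumed in place of palloc so that the fragment
builds outside the backend.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for ICU's UChar: one UTF-16 code unit. */
typedef uint16_t uchar16;

/*
 * Return a malloc'd, NUL-terminated ASCII copy of src[0..len), or NULL if
 * any code unit lies outside the ASCII range (or if allocation fails).
 * Assumption: the caller treats NULL as "install no comment".
 */
static char *
ascii_from_utf16(const uchar16 *src, int32_t len)
{
	char	   *result;
	int32_t		i;

	assert(src != NULL && len >= 0);

	/* First pass: reject anything that is not 7-bit ASCII */
	for (i = 0; i < len; i++)
	{
		if (src[i] > 127)
			return NULL;
	}

	/* Second pass: each code unit is ASCII, so narrowing is lossless */
	result = malloc((size_t) len + 1);
	if (result == NULL)
		return NULL;
	for (i = 0; i < len; i++)
		result[i] = (char) src[i];
	result[len] = '\0';

	return result;
}
```

Because every accepted code unit is below 128, the result is byte-for-byte
the same in any ASCII-compatible server encoding, which is the property the
initial catalog contents need.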

For paranoia's sake, also check for non-ASCII characters in ICU locale
names, and skip such locales, as we do for libc locales.  I don't
currently have a reason to believe that this will ever reject anything,
but then again the libc maintainers should have known better too.

With just the import changes, ICU collations can be found in pg_collation
in databases with unsupported encodings.  This resulted in more or less
clean failures at runtime, but that's not how things act for unsupported
encodings with libc collations.  Make it work the same as our traditional
behavior for libc collations by having collation lookup check
is_encoding_supported_by_icu() for the database encoding.
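
The lookup rule described above can be sketched in isolation roughly as
follows.  This is a toy model, not backend code: the in-memory table, the
TOY_ENC_* values, the OIDs, and every toy_* name are assumptions made up for
illustration, and toy_icu_supports() merely stands in for
is_encoding_supported_by_icu().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical encoding IDs; the real values live in pg_wchar.h */
enum { TOY_ENC_UTF8 = 1, TOY_ENC_WIN874 = 2 };

typedef struct
{
	const char *name;
	int			encoding;	/* -1 means "any encoding" */
	char		provider;	/* 'c' = libc, 'i' = ICU */
	unsigned	oid;		/* 0 plays the role of InvalidOid */
} ToyCollation;

/* Made-up catalog contents */
static const ToyCollation toy_catalog[] = {
	{"de_DE", TOY_ENC_UTF8, 'c', 100},
	{"C", -1, 'c', 101},
	{"de-x-icu", -1, 'i', 102},
};

/* Stand-in for is_encoding_supported_by_icu(); assumes only UTF8 works */
static bool
toy_icu_supports(int encoding)
{
	return encoding == TOY_ENC_UTF8;
}

/* Find the entry with exactly this name and encoding, or NULL */
static const ToyCollation *
toy_find(const char *name, int encoding)
{
	size_t		i;

	for (i = 0; i < sizeof(toy_catalog) / sizeof(toy_catalog[0]); i++)
	{
		if (toy_catalog[i].encoding == encoding &&
			strcmp(toy_catalog[i].name, name) == 0)
			return &toy_catalog[i];
	}
	return NULL;
}

/* Return the collation's OID if it is usable with this encoding, else 0 */
static unsigned
toy_lookup_collation(const char *name, int encoding)
{
	const ToyCollation *entry;

	assert(name != NULL);

	/* Encoding-specific entry: an exact match always works */
	entry = toy_find(name, encoding);
	if (entry != NULL)
		return entry->oid;

	/* Any-encoding entry: libc ones always work, ICU ones only sometimes */
	entry = toy_find(name, -1);
	if (entry == NULL)
		return 0;
	if (entry->provider == 'i' && !toy_icu_supports(encoding))
		return 0;
	return entry->oid;
}
```

In this model, looking up "de-x-icu" under TOY_ENC_WIN874 behaves as if the
collation did not exist, which mirrors how an encoding-specific libc entry
for a different encoding is simply not found.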

Adjust documentation to match.  Also, expand Table 23.1 to show which
encodings are supported by ICU.

catversion bump because of likely change in pg_collation/pg_description
initial contents in ICU-enabled builds.

Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
Tom Lane 2017-06-24 13:54:15 -04:00
parent a15b47df35
commit ddb5fdc068
4 changed files with 194 additions and 87 deletions


@ -508,8 +508,8 @@ SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
operating system C library. These are the locales that most tools operating system C library. These are the locales that most tools
provided by the operating system use. Another provider provided by the operating system use. Another provider
is <literal>icu</literal>, which uses the external is <literal>icu</literal>, which uses the external
ICU<indexterm><primary>ICU</></> library. Support for ICU has to be ICU<indexterm><primary>ICU</></> library. ICU locales can only be
configured when PostgreSQL is built. used if support for ICU was configured when PostgreSQL was built.
</para> </para>
<para> <para>
@ -529,12 +529,12 @@ SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
</para> </para>
<para> <para>
A collation provided by <literal>icu</literal> maps to a named collator A collation object provided by <literal>icu</literal> maps to a named
provided by the ICU library. ICU does not support collator provided by the ICU library. ICU does not support
separate <quote>collate</quote> and <quote>ctype</quote> settings, so they separate <quote>collate</quote> and <quote>ctype</quote> settings, so
are always the same. Also, ICU collations are independent of the they are always the same. Also, ICU collations are independent of the
encoding, so there is always only one ICU collation for a given name in a encoding, so there is always only one ICU collation of a given name in
database. a database.
</para> </para>
<sect3> <sect3>
@ -566,10 +566,10 @@ SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
<para> <para>
If the operating system provides support for using multiple locales If the operating system provides support for using multiple locales
within a single program (<function>newlocale</> and related functions), within a single program (<function>newlocale</> and related functions),
or support for ICU is configured, or if support for ICU is configured,
then when a database cluster is initialized, <command>initdb</command> then when a database cluster is initialized, <command>initdb</command>
populates the system catalog <literal>pg_collation</literal> with populates the system catalog <literal>pg_collation</literal> with
collations based on all the locales it finds on the operating collations based on all the locales it finds in the operating
system at the time. system at the time.
</para> </para>
@ -602,10 +602,12 @@ SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
directly to the locales installed in the operating system, which can be directly to the locales installed in the operating system, which can be
listed using the command <literal>locale -a</literal>. In case listed using the command <literal>locale -a</literal>. In case
a <literal>libc</literal> collation is needed that has different values a <literal>libc</literal> collation is needed that has different values
for <symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol>, or new for <symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol>, or if new
locales are installed in the operating system after the database system locales are installed in the operating system after the database system
was initialized, then a new collation may be created using was initialized, then a new collation may be created using
the <xref linkend="sql-createcollation"> command. the <xref linkend="sql-createcollation"> command.
New operating system locales can also be imported en masse using
the <link linkend="functions-admin-collation"><function>pg_import_system_collations()</function></link> function.
</para> </para>
<para> <para>
@ -617,8 +619,8 @@ SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
Use of the stripped collation names is recommended, since it will Use of the stripped collation names is recommended, since it will
make one less thing you need to change if you decide to change to make one less thing you need to change if you decide to change to
another database encoding. Note however that the <literal>default</>, another database encoding. Note however that the <literal>default</>,
<literal>C</>, and <literal>POSIX</> collations, as well as all collations <literal>C</>, and <literal>POSIX</> collations can be used regardless of
provided by ICU can be used regardless of the database encoding. the database encoding.
</para> </para>
<para> <para>
@ -641,7 +643,7 @@ SELECT a COLLATE "C" &lt; b COLLATE "POSIX" FROM test1;
Collations provided by ICU are created with names in BCP 47 language tag Collations provided by ICU are created with names in BCP 47 language tag
format, with a <quote>private use</quote> format, with a <quote>private use</quote>
extension <literal>-x-icu</literal> appended, to distinguish them from extension <literal>-x-icu</literal> appended, to distinguish them from
libc locales. So <literal>de-x-icu</literal> would be an example. libc locales. So <literal>de-x-icu</literal> would be an example name.
</para> </para>
<para> <para>
@ -652,7 +654,7 @@ SELECT a COLLATE "C" &lt; b COLLATE "POSIX" FROM test1;
See <ulink url="http://userguide.icu-project.org/locale"></ulink> for See <ulink url="http://userguide.icu-project.org/locale"></ulink> for
information on ICU locale naming. <command>initdb</command> uses the ICU information on ICU locale naming. <command>initdb</command> uses the ICU
APIs to extract a set of locales with distinct collation rules to populate APIs to extract a set of locales with distinct collation rules to populate
the initial set of collations. Here are some examples collations that the initial set of collations. Here are some example collations that
might be created: might be created:
<variablelist> <variablelist>
@ -675,7 +677,7 @@ SELECT a COLLATE "C" &lt; b COLLATE "POSIX" FROM test1;
<listitem> <listitem>
<para>German collation for Austria, default variant</para> <para>German collation for Austria, default variant</para>
<para> <para>
(Note that as of this writing, there is no, (As of this writing, there is no,
say, <literal>de-DE-x-icu</literal> or <literal>de-CH-x-icu</literal>, say, <literal>de-DE-x-icu</literal> or <literal>de-CH-x-icu</literal>,
because those are equivalent to <literal>de-x-icu</literal>.) because those are equivalent to <literal>de-x-icu</literal>.)
</para> </para>
@ -701,9 +703,11 @@ SELECT a COLLATE "C" &lt; b COLLATE "POSIX" FROM test1;
</para> </para>
<para> <para>
Some (less frequently used) encodings are not supported by ICU. If the Some (less frequently used) encodings are not supported by ICU. When the
database cluster was initialized with such an encoding, no ICU collations database encoding is one of these, ICU collation entries
will be predefined. in <literal>pg_collation</literal> are ignored. Attempting to use one
will draw an error along the lines of <quote>collation "de-x-icu" for
encoding "WIN874" does not exist</>.
</para> </para>
</sect4> </sect4>
</sect3> </sect3>
@ -761,8 +765,11 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
classification) and <envar>LC_COLLATE</> (string sort order) locale classification) and <envar>LC_COLLATE</> (string sort order) locale
settings. For <literal>C</> or settings. For <literal>C</> or
<literal>POSIX</> locale, any character set is allowed, but for other <literal>POSIX</> locale, any character set is allowed, but for other
locales there is only one character set that will work correctly. libc-provided locales there is only one character set that will work
correctly.
(On Windows, however, UTF-8 encoding can be used with any locale.) (On Windows, however, UTF-8 encoding can be used with any locale.)
If you have ICU support configured, ICU-provided locales can be used
with most but not all server-side encodings.
</para> </para>
<sect2 id="multibyte-charset-supported"> <sect2 id="multibyte-charset-supported">
@ -775,13 +782,14 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<table id="charset-table"> <table id="charset-table">
<title><productname>PostgreSQL</productname> Character Sets</title> <title><productname>PostgreSQL</productname> Character Sets</title>
<tgroup cols="6"> <tgroup cols="7">
<thead> <thead>
<row> <row>
<entry>Name</entry> <entry>Name</entry>
<entry>Description</entry> <entry>Description</entry>
<entry>Language</entry> <entry>Language</entry>
<entry>Server?</entry> <entry>Server?</entry>
<entry>ICU?</entry>
<!-- <!--
The Bytes/Char field is populated by looking at the values returned The Bytes/Char field is populated by looking at the values returned
by pg_wchar_table.mblen function for each encoding. by pg_wchar_table.mblen function for each encoding.
@ -796,6 +804,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>Big Five</entry> <entry>Big Five</entry>
<entry>Traditional Chinese</entry> <entry>Traditional Chinese</entry>
<entry>No</entry> <entry>No</entry>
<entry>No</entry>
<entry>1-2</entry> <entry>1-2</entry>
<entry><literal>WIN950</>, <literal>Windows950</></entry> <entry><literal>WIN950</>, <literal>Windows950</></entry>
</row> </row>
@ -804,6 +813,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>Extended UNIX Code-CN</entry> <entry>Extended UNIX Code-CN</entry>
<entry>Simplified Chinese</entry> <entry>Simplified Chinese</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1-3</entry> <entry>1-3</entry>
<entry></entry> <entry></entry>
</row> </row>
@ -812,6 +822,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>Extended UNIX Code-JP</entry> <entry>Extended UNIX Code-JP</entry>
<entry>Japanese</entry> <entry>Japanese</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1-3</entry> <entry>1-3</entry>
<entry></entry> <entry></entry>
</row> </row>
@ -820,6 +831,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>Extended UNIX Code-JP, JIS X 0213</entry> <entry>Extended UNIX Code-JP, JIS X 0213</entry>
<entry>Japanese</entry> <entry>Japanese</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>No</entry>
<entry>1-3</entry> <entry>1-3</entry>
<entry></entry> <entry></entry>
</row> </row>
@ -828,6 +840,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>Extended UNIX Code-KR</entry> <entry>Extended UNIX Code-KR</entry>
<entry>Korean</entry> <entry>Korean</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1-3</entry> <entry>1-3</entry>
<entry></entry> <entry></entry>
</row> </row>
@ -836,6 +849,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>Extended UNIX Code-TW</entry> <entry>Extended UNIX Code-TW</entry>
<entry>Traditional Chinese, Taiwanese</entry> <entry>Traditional Chinese, Taiwanese</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1-3</entry> <entry>1-3</entry>
<entry></entry> <entry></entry>
</row> </row>
@ -844,6 +858,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>National Standard</entry> <entry>National Standard</entry>
<entry>Chinese</entry> <entry>Chinese</entry>
<entry>No</entry> <entry>No</entry>
<entry>No</entry>
<entry>1-4</entry> <entry>1-4</entry>
<entry></entry> <entry></entry>
</row> </row>
@ -852,6 +867,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>Extended National Standard</entry> <entry>Extended National Standard</entry>
<entry>Simplified Chinese</entry> <entry>Simplified Chinese</entry>
<entry>No</entry> <entry>No</entry>
<entry>No</entry>
<entry>1-2</entry> <entry>1-2</entry>
<entry><literal>WIN936</>, <literal>Windows936</></entry> <entry><literal>WIN936</>, <literal>Windows936</></entry>
</row> </row>
@ -860,6 +876,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>ISO 8859-5, <acronym>ECMA</> 113</entry> <entry>ISO 8859-5, <acronym>ECMA</> 113</entry>
<entry>Latin/Cyrillic</entry> <entry>Latin/Cyrillic</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry> <entry>1</entry>
<entry></entry> <entry></entry>
</row> </row>
@ -868,6 +885,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>ISO 8859-6, <acronym>ECMA</> 114</entry> <entry>ISO 8859-6, <acronym>ECMA</> 114</entry>
<entry>Latin/Arabic</entry> <entry>Latin/Arabic</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry> <entry>1</entry>
<entry></entry> <entry></entry>
</row> </row>
@ -876,6 +894,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>ISO 8859-7, <acronym>ECMA</> 118</entry> <entry>ISO 8859-7, <acronym>ECMA</> 118</entry>
<entry>Latin/Greek</entry> <entry>Latin/Greek</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry> <entry>1</entry>
<entry></entry> <entry></entry>
</row> </row>
@ -884,6 +903,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>ISO 8859-8, <acronym>ECMA</> 121</entry> <entry>ISO 8859-8, <acronym>ECMA</> 121</entry>
<entry>Latin/Hebrew</entry> <entry>Latin/Hebrew</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry> <entry>1</entry>
<entry></entry> <entry></entry>
</row> </row>
@ -892,6 +912,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry><acronym>JOHAB</></entry> <entry><acronym>JOHAB</></entry>
<entry>Korean (Hangul)</entry> <entry>Korean (Hangul)</entry>
<entry>No</entry> <entry>No</entry>
<entry>No</entry>
<entry>1-3</entry> <entry>1-3</entry>
<entry></entry> <entry></entry>
</row> </row>
@ -900,6 +921,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry><acronym>KOI</acronym>8-R</entry> <entry><acronym>KOI</acronym>8-R</entry>
<entry>Cyrillic (Russian)</entry> <entry>Cyrillic (Russian)</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry> <entry>1</entry>
<entry><literal>KOI8</></entry> <entry><literal>KOI8</></entry>
</row> </row>
@ -908,6 +930,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry><acronym>KOI</acronym>8-U</entry> <entry><acronym>KOI</acronym>8-U</entry>
<entry>Cyrillic (Ukrainian)</entry> <entry>Cyrillic (Ukrainian)</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry> <entry>1</entry>
<entry></entry> <entry></entry>
</row> </row>
@ -916,6 +939,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>ISO 8859-1, <acronym>ECMA</> 94</entry> <entry>ISO 8859-1, <acronym>ECMA</> 94</entry>
<entry>Western European</entry> <entry>Western European</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry> <entry>1</entry>
<entry><literal>ISO88591</></entry> <entry><literal>ISO88591</></entry>
</row> </row>
@ -924,6 +948,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>ISO 8859-2, <acronym>ECMA</> 94</entry> <entry>ISO 8859-2, <acronym>ECMA</> 94</entry>
<entry>Central European</entry> <entry>Central European</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry> <entry>1</entry>
<entry><literal>ISO88592</></entry> <entry><literal>ISO88592</></entry>
</row> </row>
@ -932,6 +957,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>ISO 8859-3, <acronym>ECMA</> 94</entry> <entry>ISO 8859-3, <acronym>ECMA</> 94</entry>
<entry>South European</entry> <entry>South European</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry> <entry>1</entry>
<entry><literal>ISO88593</></entry> <entry><literal>ISO88593</></entry>
</row> </row>
@ -940,6 +966,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>ISO 8859-4, <acronym>ECMA</> 94</entry> <entry>ISO 8859-4, <acronym>ECMA</> 94</entry>
<entry>North European</entry> <entry>North European</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry> <entry>1</entry>
<entry><literal>ISO88594</></entry> <entry><literal>ISO88594</></entry>
</row> </row>
@ -948,6 +975,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>ISO 8859-9, <acronym>ECMA</> 128</entry> <entry>ISO 8859-9, <acronym>ECMA</> 128</entry>
<entry>Turkish</entry> <entry>Turkish</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry> <entry>1</entry>
<entry><literal>ISO88599</></entry> <entry><literal>ISO88599</></entry>
</row> </row>
@ -956,6 +984,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>ISO 8859-10, <acronym>ECMA</> 144</entry> <entry>ISO 8859-10, <acronym>ECMA</> 144</entry>
<entry>Nordic</entry> <entry>Nordic</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry> <entry>1</entry>
<entry><literal>ISO885910</></entry> <entry><literal>ISO885910</></entry>
</row> </row>
@ -964,6 +993,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>ISO 8859-13</entry> <entry>ISO 8859-13</entry>
<entry>Baltic</entry> <entry>Baltic</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry> <entry>1</entry>
<entry><literal>ISO885913</></entry> <entry><literal>ISO885913</></entry>
</row> </row>
@ -972,6 +1002,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>ISO 8859-14</entry> <entry>ISO 8859-14</entry>
<entry>Celtic</entry> <entry>Celtic</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry> <entry>1</entry>
<entry><literal>ISO885914</></entry> <entry><literal>ISO885914</></entry>
</row> </row>
@ -980,6 +1011,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>ISO 8859-15</entry> <entry>ISO 8859-15</entry>
<entry>LATIN1 with Euro and accents</entry> <entry>LATIN1 with Euro and accents</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry> <entry>1</entry>
<entry><literal>ISO885915</></entry> <entry><literal>ISO885915</></entry>
</row> </row>
@ -988,6 +1020,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>ISO 8859-16, <acronym>ASRO</> SR 14111</entry> <entry>ISO 8859-16, <acronym>ASRO</> SR 14111</entry>
<entry>Romanian</entry> <entry>Romanian</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>No</entry>
<entry>1</entry> <entry>1</entry>
<entry><literal>ISO885916</></entry> <entry><literal>ISO885916</></entry>
</row> </row>
@ -996,6 +1029,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>Mule internal code</entry> <entry>Mule internal code</entry>
<entry>Multilingual Emacs</entry> <entry>Multilingual Emacs</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>No</entry>
<entry>1-4</entry> <entry>1-4</entry>
<entry></entry> <entry></entry>
</row> </row>
@ -1004,6 +1038,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>Shift JIS</entry> <entry>Shift JIS</entry>
<entry>Japanese</entry> <entry>Japanese</entry>
<entry>No</entry> <entry>No</entry>
<entry>No</entry>
<entry>1-2</entry> <entry>1-2</entry>
<entry><literal>Mskanji</>, <literal>ShiftJIS</>, <literal>WIN932</>, <literal>Windows932</></entry> <entry><literal>Mskanji</>, <literal>ShiftJIS</>, <literal>WIN932</>, <literal>Windows932</></entry>
</row> </row>
@ -1012,6 +1047,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>Shift JIS, JIS X 0213</entry> <entry>Shift JIS, JIS X 0213</entry>
<entry>Japanese</entry> <entry>Japanese</entry>
<entry>No</entry> <entry>No</entry>
<entry>No</entry>
<entry>1-2</entry> <entry>1-2</entry>
<entry></entry> <entry></entry>
</row> </row>
@ -1020,6 +1056,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>unspecified (see text)</entry> <entry>unspecified (see text)</entry>
<entry><emphasis>any</></entry> <entry><emphasis>any</></entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>No</entry>
<entry>1</entry> <entry>1</entry>
<entry></entry> <entry></entry>
</row> </row>
@ -1028,6 +1065,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>Unified Hangul Code</entry> <entry>Unified Hangul Code</entry>
<entry>Korean</entry> <entry>Korean</entry>
<entry>No</entry> <entry>No</entry>
<entry>No</entry>
<entry>1-2</entry> <entry>1-2</entry>
<entry><literal>WIN949</>, <literal>Windows949</></entry> <entry><literal>WIN949</>, <literal>Windows949</></entry>
</row> </row>
@ -1036,6 +1074,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>Unicode, 8-bit</entry> <entry>Unicode, 8-bit</entry>
<entry><emphasis>all</></entry> <entry><emphasis>all</></entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1-4</entry> <entry>1-4</entry>
<entry><literal>Unicode</></entry> <entry><literal>Unicode</></entry>
</row> </row>
@ -1044,6 +1083,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>Windows CP866</entry> <entry>Windows CP866</entry>
<entry>Cyrillic</entry> <entry>Cyrillic</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry> <entry>1</entry>
<entry><literal>ALT</></entry> <entry><literal>ALT</></entry>
</row> </row>
@ -1052,6 +1092,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>Windows CP874</entry> <entry>Windows CP874</entry>
<entry>Thai</entry> <entry>Thai</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>No</entry>
<entry>1</entry> <entry>1</entry>
<entry></entry> <entry></entry>
</row> </row>
@ -1060,6 +1101,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>Windows CP1250</entry> <entry>Windows CP1250</entry>
<entry>Central European</entry> <entry>Central European</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry> <entry>1</entry>
<entry></entry> <entry></entry>
</row> </row>
@ -1068,6 +1110,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>Windows CP1251</entry> <entry>Windows CP1251</entry>
<entry>Cyrillic</entry> <entry>Cyrillic</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry> <entry>1</entry>
<entry><literal>WIN</></entry> <entry><literal>WIN</></entry>
</row> </row>
@ -1076,6 +1119,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>Windows CP1252</entry> <entry>Windows CP1252</entry>
<entry>Western European</entry> <entry>Western European</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry> <entry>1</entry>
<entry></entry> <entry></entry>
</row> </row>
@ -1084,6 +1128,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>Windows CP1253</entry> <entry>Windows CP1253</entry>
<entry>Greek</entry> <entry>Greek</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry> <entry>1</entry>
<entry></entry> <entry></entry>
</row> </row>
@ -1092,6 +1137,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>Windows CP1254</entry> <entry>Windows CP1254</entry>
<entry>Turkish</entry> <entry>Turkish</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry> <entry>1</entry>
<entry></entry> <entry></entry>
</row> </row>
@ -1100,6 +1146,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>Windows CP1255</entry> <entry>Windows CP1255</entry>
<entry>Hebrew</entry> <entry>Hebrew</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry> <entry>1</entry>
<entry></entry> <entry></entry>
</row> </row>
@ -1108,6 +1155,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>Windows CP1256</entry> <entry>Windows CP1256</entry>
<entry>Arabic</entry> <entry>Arabic</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry> <entry>1</entry>
<entry></entry> <entry></entry>
</row> </row>
@ -1116,6 +1164,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>Windows CP1257</entry> <entry>Windows CP1257</entry>
<entry>Baltic</entry> <entry>Baltic</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry> <entry>1</entry>
<entry></entry> <entry></entry>
</row> </row>
@ -1124,6 +1173,7 @@ CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
<entry>Windows CP1258</entry> <entry>Windows CP1258</entry>
<entry>Vietnamese</entry> <entry>Vietnamese</entry>
<entry>Yes</entry> <entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry> <entry>1</entry>
<entry><literal>ABC</>, <literal>TCVN</>, <literal>TCVN5712</>, <literal>VSCII</></entry> <entry><literal>ABC</>, <literal>TCVN</>, <literal>TCVN5712</>, <literal>VSCII</></entry>
</row> </row>


@ -1914,10 +1914,61 @@ OpfamilyIsVisible(Oid opfid)
return visible; return visible;
} }
/*
* lookup_collation
* If there's a collation of the given name/namespace, and it works
* with the given encoding, return its OID. Else return InvalidOid.
*/
static Oid
lookup_collation(const char *collname, Oid collnamespace, int32 encoding)
{
Oid collid;
HeapTuple colltup;
Form_pg_collation collform;
/* Check for encoding-specific entry (exact match) */
collid = GetSysCacheOid3(COLLNAMEENCNSP,
PointerGetDatum(collname),
Int32GetDatum(encoding),
ObjectIdGetDatum(collnamespace));
if (OidIsValid(collid))
return collid;
/*
* Check for any-encoding entry. This takes a bit more work: while libc
* collations with collencoding = -1 do work with all encodings, ICU
* collations only work with certain encodings, so we have to check that
* aspect before deciding it's a match.
*/
colltup = SearchSysCache3(COLLNAMEENCNSP,
PointerGetDatum(collname),
Int32GetDatum(-1),
ObjectIdGetDatum(collnamespace));
if (!HeapTupleIsValid(colltup))
return InvalidOid;
collform = (Form_pg_collation) GETSTRUCT(colltup);
if (collform->collprovider == COLLPROVIDER_ICU)
{
if (is_encoding_supported_by_icu(encoding))
collid = HeapTupleGetOid(colltup);
else
collid = InvalidOid;
}
else
{
collid = HeapTupleGetOid(colltup);
}
ReleaseSysCache(colltup);
return collid;
}
/* /*
* CollationGetCollid * CollationGetCollid
* Try to resolve an unqualified collation name. * Try to resolve an unqualified collation name.
* Returns OID if collation found in search path, else InvalidOid. * Returns OID if collation found in search path, else InvalidOid.
*
* Note that this will only find collations that work with the current
* database's encoding.
*/ */
Oid Oid
CollationGetCollid(const char *collname) CollationGetCollid(const char *collname)
@ -1935,19 +1986,7 @@ CollationGetCollid(const char *collname)
if (namespaceId == myTempNamespace) if (namespaceId == myTempNamespace)
continue; /* do not look in temp namespace */ continue; /* do not look in temp namespace */
/* Check for database-encoding-specific entry */ collid = lookup_collation(collname, namespaceId, dbencoding);
collid = GetSysCacheOid3(COLLNAMEENCNSP,
PointerGetDatum(collname),
Int32GetDatum(dbencoding),
ObjectIdGetDatum(namespaceId));
if (OidIsValid(collid))
return collid;
/* Check for any-encoding entry */
collid = GetSysCacheOid3(COLLNAMEENCNSP,
PointerGetDatum(collname),
Int32GetDatum(-1),
ObjectIdGetDatum(namespaceId));
if (OidIsValid(collid)) if (OidIsValid(collid))
return collid; return collid;
} }
@ -1961,6 +2000,9 @@ CollationGetCollid(const char *collname)
* Determine whether a collation (identified by OID) is visible in the * Determine whether a collation (identified by OID) is visible in the
* current search path. Visible means "would be found by searching * current search path. Visible means "would be found by searching
* for the unqualified collation name". * for the unqualified collation name".
*
* Note that only collations that work with the current database's encoding
* will be considered visible.
*/ */
bool bool
CollationIsVisible(Oid collid) CollationIsVisible(Oid collid)
@ -1990,9 +2032,10 @@ CollationIsVisible(Oid collid)
{ {
/* /*
* If it is in the path, it might still not be visible; it could be * If it is in the path, it might still not be visible; it could be
* hidden by another conversion of the same name earlier in the path. * hidden by another collation of the same name earlier in the path,
* So we must do a slow check to see if this conversion would be found * or it might not work with the current DB encoding. So we must do a
* by CollationGetCollid. * slow check to see if this collation would be found by
* CollationGetCollid.
*/ */
char *collname = NameStr(collform->collname); char *collname = NameStr(collform->collname);
@ -3442,6 +3485,9 @@ PopOverrideSearchPath(void)
/* /*
* get_collation_oid - find a collation by possibly qualified name * get_collation_oid - find a collation by possibly qualified name
*
* Note that this will only find collations that work with the current
* database's encoding.
*/ */
Oid Oid
get_collation_oid(List *name, bool missing_ok) get_collation_oid(List *name, bool missing_ok)
@ -3463,17 +3509,7 @@ get_collation_oid(List *name, bool missing_ok)
if (missing_ok && !OidIsValid(namespaceId)) if (missing_ok && !OidIsValid(namespaceId))
return InvalidOid; return InvalidOid;
/* first try for encoding-specific entry, then any-encoding */ colloid = lookup_collation(collation_name, namespaceId, dbencoding);
colloid = GetSysCacheOid3(COLLNAMEENCNSP,
PointerGetDatum(collation_name),
Int32GetDatum(dbencoding),
ObjectIdGetDatum(namespaceId));
if (OidIsValid(colloid))
return colloid;
colloid = GetSysCacheOid3(COLLNAMEENCNSP,
PointerGetDatum(collation_name),
Int32GetDatum(-1),
ObjectIdGetDatum(namespaceId));
if (OidIsValid(colloid)) if (OidIsValid(colloid))
return colloid; return colloid;
} }
@ -3489,16 +3525,7 @@ get_collation_oid(List *name, bool missing_ok)
if (namespaceId == myTempNamespace) if (namespaceId == myTempNamespace)
continue; /* do not look in temp namespace */ continue; /* do not look in temp namespace */
colloid = GetSysCacheOid3(COLLNAMEENCNSP, colloid = lookup_collation(collation_name, namespaceId, dbencoding);
PointerGetDatum(collation_name),
Int32GetDatum(dbencoding),
ObjectIdGetDatum(namespaceId));
if (OidIsValid(colloid))
return colloid;
colloid = GetSysCacheOid3(COLLNAMEENCNSP,
PointerGetDatum(collation_name),
Int32GetDatum(-1),
ObjectIdGetDatum(namespaceId));
if (OidIsValid(colloid)) if (OidIsValid(colloid))
return colloid; return colloid;
} }


@ -353,6 +353,21 @@ pg_collation_actual_version(PG_FUNCTION_ARGS)
} }
/*
* Check a string to see if it is pure ASCII
*/
static bool
is_all_ascii(const char *str)
{
while (*str)
{
if (IS_HIGHBIT_SET(*str))
return false;
str++;
}
return true;
}
/* will we use "locale -a" in pg_import_system_collations? */ /* will we use "locale -a" in pg_import_system_collations? */
#if defined(HAVE_LOCALE_T) && !defined(WIN32) #if defined(HAVE_LOCALE_T) && !defined(WIN32)
#define READ_LOCALE_A_OUTPUT #define READ_LOCALE_A_OUTPUT
@ -431,7 +446,9 @@ get_icu_language_tag(const char *localename)
/* /*
* Get a comment (specifically, the display name) for an ICU locale. * Get a comment (specifically, the display name) for an ICU locale.
* The result is a palloc'd string. * The result is a palloc'd string, or NULL if we can't get a comment
* or find that it's not all ASCII. (We can *not* accept non-ASCII
* comments, because the contents of template0 must be encoding-agnostic.)
*/ */
static char * static char *
get_icu_locale_comment(const char *localename) get_icu_locale_comment(const char *localename)
@ -439,6 +456,7 @@ get_icu_locale_comment(const char *localename)
UErrorCode status; UErrorCode status;
UChar displayname[128]; UChar displayname[128];
int32 len_uchar; int32 len_uchar;
int32 i;
char *result; char *result;
status = U_ZERO_ERROR; status = U_ZERO_ERROR;
@ -446,11 +464,20 @@ get_icu_locale_comment(const char *localename)
displayname, lengthof(displayname), displayname, lengthof(displayname),
&status); &status);
if (U_FAILURE(status)) if (U_FAILURE(status))
ereport(ERROR, return NULL; /* no good reason to raise an error */
(errmsg("could not get display name for locale \"%s\": %s",
localename, u_errorName(status))));
icu_from_uchar(&result, displayname, len_uchar); /* Check for non-ASCII comment (can't use is_all_ascii for this) */
for (i = 0; i < len_uchar; i++)
{
if (displayname[i] > 127)
return NULL;
}
/* OK, transcribe */
result = palloc(len_uchar + 1);
for (i = 0; i < len_uchar; i++)
result[i] = displayname[i];
result[len_uchar] = '\0';
return result; return result;
} }
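The new tail of get_icu_locale_comment() can be tried outside the backend with the standalone sketch below. It assumes uint16_t in place of ICU's UChar and malloc() in place of palloc(). The function name ascii_from_utf16 is hypothetical. The point it illustrates is that once every UTF-16 code unit is known to be at most 127, narrowing each unit to char is a complete and lossless conversion, so no encoding-aware converter such as icu_from_uchar() is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the transcription step.  Returns a malloc'd,
 * NUL-terminated copy of the UTF-16 buffer, or NULL if any code unit is
 * outside the ASCII range (or if allocation fails).
 */
static char *
ascii_from_utf16(const uint16_t *src, int32_t len)
{
	char	   *result;
	int32_t		i;

	assert(src != NULL && len >= 0);

	/* Reject anything outside ASCII before allocating */
	for (i = 0; i < len; i++)
	{
		if (src[i] > 127)
			return NULL;
	}

	/* OK, transcribe: each code unit fits in one char */
	result = malloc((size_t) len + 1);
	if (result == NULL)
		return NULL;
	for (i = 0; i < len; i++)
		result[i] = (char) src[i];
	result[len] = '\0';
	return result;
}
```

A display name containing U+00E5 (as in "Norwegian Bokmål") makes the function return NULL, which corresponds to the case where the import code skips installing the comment.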
@ -502,7 +529,6 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
{ {
size_t len; size_t len;
int enc; int enc;
bool skip;
char alias[NAMEDATALEN]; char alias[NAMEDATALEN];
len = strlen(localebuf); len = strlen(localebuf);
@ -521,16 +547,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
* interpret the non-ASCII characters. We can't do much with * interpret the non-ASCII characters. We can't do much with
* those, so we filter them out. * those, so we filter them out.
*/ */
skip = false; if (!is_all_ascii(localebuf))
for (i = 0; i < len; i++)
{
if (IS_HIGHBIT_SET(localebuf[i]))
{
skip = true;
break;
}
}
if (skip)
{ {
elog(DEBUG1, "locale name has non-ASCII characters, skipped: \"%s\"", localebuf); elog(DEBUG1, "locale name has non-ASCII characters, skipped: \"%s\"", localebuf);
continue; continue;
@ -642,14 +659,6 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
/* Load collations known to ICU */ /* Load collations known to ICU */
#ifdef USE_ICU #ifdef USE_ICU
if (!is_encoding_supported_by_icu(GetDatabaseEncoding()))
{
ereport(NOTICE,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("encoding \"%s\" not supported by ICU",
pg_encoding_to_char(GetDatabaseEncoding()))));
}
else
{ {
int i; int i;
@ -661,6 +670,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
{ {
const char *name; const char *name;
char *langtag; char *langtag;
char *icucomment;
const char *collcollate; const char *collcollate;
UEnumeration *en; UEnumeration *en;
UErrorCode status; UErrorCode status;
@ -674,6 +684,14 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
langtag = get_icu_language_tag(name); langtag = get_icu_language_tag(name);
collcollate = U_ICU_VERSION_MAJOR_NUM >= 54 ? langtag : name; collcollate = U_ICU_VERSION_MAJOR_NUM >= 54 ? langtag : name;
/*
* Be paranoid about not allowing any non-ASCII strings into
* pg_collation
*/
if (!is_all_ascii(langtag) || !is_all_ascii(collcollate))
continue;
collid = CollationCreate(psprintf("%s-x-icu", langtag), collid = CollationCreate(psprintf("%s-x-icu", langtag),
nspid, GetUserId(), nspid, GetUserId(),
COLLPROVIDER_ICU, -1, COLLPROVIDER_ICU, -1,
@ -686,8 +704,10 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
CommandCounterIncrement(); CommandCounterIncrement();
CreateComments(collid, CollationRelationId, 0, icucomment = get_icu_locale_comment(name);
get_icu_locale_comment(name)); if (icucomment)
CreateComments(collid, CollationRelationId, 0,
icucomment);
} }
/* /*
@ -708,6 +728,14 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
langtag = get_icu_language_tag(localeid); langtag = get_icu_language_tag(localeid);
collcollate = U_ICU_VERSION_MAJOR_NUM >= 54 ? langtag : localeid; collcollate = U_ICU_VERSION_MAJOR_NUM >= 54 ? langtag : localeid;
/*
* Be paranoid about not allowing any non-ASCII strings into
* pg_collation
*/
if (!is_all_ascii(langtag) || !is_all_ascii(collcollate))
continue;
collid = CollationCreate(psprintf("%s-x-icu", langtag), collid = CollationCreate(psprintf("%s-x-icu", langtag),
nspid, GetUserId(), nspid, GetUserId(),
COLLPROVIDER_ICU, -1, COLLPROVIDER_ICU, -1,
@ -720,8 +748,10 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
CommandCounterIncrement(); CommandCounterIncrement();
CreateComments(collid, CollationRelationId, 0, icucomment = get_icu_locale_comment(localeid);
get_icu_locale_comment(localeid)); if (icucomment)
CreateComments(collid, CollationRelationId, 0,
icucomment);
} }
} }
if (U_FAILURE(status)) if (U_FAILURE(status))


@ -53,6 +53,6 @@
*/ */
/* yyyymmddN */ /* yyyymmddN */
#define CATALOG_VERSION_NO 201706231 #define CATALOG_VERSION_NO 201706241
#endif #endif