Check whether the datctype is C to determine whether t_isspace() and
related functions use isspace() or iswspace().
Previously, t_isspace() checked whether the database default collation
was C; which is incorrect when the default collation uses the ICU
provider.
Discussion: https://postgr.es/m/79e4354d9eccfdb00483146a6b9f6295202e7890.camel@j-davis.com
Reviewed-by: Peter Eisentraut
Backpatch-through: 15
As noted by Thomas Munro, CLDR 36 has added SOUND RECORDING COPYRIGHT
(U+2117), and we use CLDR 41, so this can be removed from the set of
special cases.
The set of regression tests is expanded for degree signs, which are two
of the special cases, and a fancy case with U+210C in Latin-ASCII.xml
that we have discovered about when diving into what could be done for
Cyrillic characters (this last part is material for a future patch, not
tackled yet).
While on it, some of the assertions of generate_unaccent_rules.py are
expanded to report the codepoint on which a failure is found, something
useful for debugging.
Extracted from a larger patch by the same author.
Author: Przemysław Sztoch
Discussion: https://postgr.es/m/8478da0d-3b61-d24f-80b4-ce2f5e971c60@sztoch.pl
This has required an update of the python script generating the rules,
as its format has changed in release 29. This release has also added
new punctuation and symbols, and a new set of rules has been generated
to include them. The way to find newest versions of Latin-ASCII gets
also more clearly documented.
Author: Hugh Ranalli, Michael Paquier
Discussion: https://postgr.es/m/15548-cef1b3f8de190d4f@postgresql.org
This isn't fully tested as yet, in particular I'm not sure that the
"foo--unpackaged--1.0.sql" scripts are OK. But it's time to get some
buildfarm cycles on it.
sepgsql is not converted to an extension, mainly because it seems to
require a very nonstandard installation process.
Dimitri Fontaine and Tom Lane