Commit Graph

155 Commits

Author SHA1 Message Date
Peter Eisentraut 708007dced Remove error message hints mentioning configure options
These are usually not useful since users will use packaged
distributions and won't be interested in rebuilding their installation
from source.  Also, we have only used these kinds of hints for some
features and in some places, not consistently throughout.

Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/2552aed7-d0e9-280a-54aa-2dc7073f371d%40enterprisedb.com
2022-04-08 07:41:55 +02:00
Peter Eisentraut 3a671e1f7c Fix global ICU collations for ICU < 54
createdb() didn't check for collation attributes validity, which has
to be done explicitly on ICU < 54.  It also forgot to close the ICU collator
opened during the check which leaks some memory.

To fix both, add a new check_icu_locale() that does all the appropriate
verification and close the ICU collator.

initdb also had some partial check for ICU < 54.  To have consistent error
reporting across major ICU versions, and get rid of the need to include ucol.h,
remove the partial check there.  The backend will report an error if needed
during the post-boostrap iniitialization phase.

Author: Julien Rouhaud <julien.rouhaud@free.fr>
Discussion: https://www.postgresql.org/message-id/20220319041459.qqqiqh335sga5ezj@jrouhaud
2022-03-20 10:21:45 +01:00
Peter Eisentraut f2553d4306 Add option to use ICU as global locale provider
This adds the option to use ICU as the default locale provider for
either the whole cluster or a database.  New options for initdb,
createdb, and CREATE DATABASE are used to select this.

Since some (legacy) code still uses the libc locale facilities
directly, we still need to set the libc global locale settings even if
ICU is otherwise selected.  So pg_database now has three
locale-related fields: the existing datcollate and datctype, which are
always set, and a new daticulocale, which is only set if ICU is
selected.  A similar change is made in pg_collation for consistency,
but in that case, only the libc-related fields or the ICU-related
field is set, never both.

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/5e756dd6-0e91-d778-96fd-b1bcb06c161a%402ndquadrant.com
2022-03-17 11:13:16 +01:00
Peter Eisentraut f5744f1d1e Update comment
Update a comment that assumed that libc collations don't support
versioning.  Also improve an adjacent error message a bit.
2022-02-10 09:16:17 +01:00
Peter Eisentraut 54637508f8 Change collate and ctype fields to type text
This changes the data type of the catalog fields datcollate, datctype,
collcollate, and collctype from name to text.  There wasn't ever a
really good reason for them to be of type name; presumably this was
just carried over from when they were fixed-size fields in pg_control,
first into the corresponding pg_database fields, and then to
pg_collation.  The values are not identifiers or object names, and we
don't ever look them up that way.

Changing to type text saves space in the typical case, since locale
names are typically only a few bytes long.  But it is also possible
that an ICU locale name with several customization options appended
could be longer than 63 bytes, so this also enables that case, which
was previously probably broken.

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/5e756dd6-0e91-d778-96fd-b1bcb06c161a@2ndquadrant.com
2022-01-27 08:54:25 +01:00
Peter Eisentraut b99ccd2cb2 Call pg_newlocale_from_collation() also with default collation
Previously, callers of pg_newlocale_from_collation() did not call it
if the collation was DEFAULT_COLLATION_OID and instead proceeded with
a pg_locale_t of 0.  Instead, now we call it anyway and have it return
0 if the default collation was passed.  It already did this, so we
just have to adjust the callers.  This simplifies all the call sites
and also makes future enhancements easier.

After discussion and testing, the previous comment in pg_locale.c
about avoiding this for performance reasons may have been mistaken
since it was testing a very different patch version way back when.

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/ed3baa81-7fac-7788-cc12-41e3f7917e34@enterprisedb.com
2022-01-20 09:50:18 +01:00
Bruce Momjian 27b77ecf9f Update copyright for 2022
Backpatch-through: 10
2022-01-07 19:04:57 -05:00
Michael Paquier 98105e53e0 Fix typos
Author: Lingjie Qiang
Discussion: https://postgr.es/m/OSAPR01MB71654E773F62AC88DC1FC8CC80669@OSAPR01MB7165.jpnprd01.prod.outlook.com
2021-11-30 11:05:15 +09:00
Tom Lane def5b065ff Initial pgindent and pgperltidy run for v14.
Also "make reformat-dat-files".

The only change worthy of note is that pgindent messed up the formatting
of launcher.c's struct LogicalRepWorkerId, which led me to notice that
that struct wasn't used at all anymore, so I just took it out.
2021-05-12 13:14:10 -04:00
Peter Eisentraut ec6e70c79f Refactor some error messages for easier translation 2021-05-12 07:42:51 +02:00
Thomas Munro ec48314708 Revert per-index collation version tracking feature.
Design problems were discovered in the handling of composite types and
record types that would cause some relevant versions not to be recorded.
Misgivings were also expressed about the use of the pg_depend catalog
for this purpose.  We're out of time for this release so we'll revert
and try again.

Commits reverted:

1bf946bd: Doc: Document known problem with Windows collation versions.
cf002008: Remove no-longer-relevant test case.
ef387bed: Fix bogus collation-version-recording logic.
0fb0a050: Hide internal error for pg_collation_actual_version(<bad OID>).
ff942057: Suppress "warning: variable 'collcollate' set but not used".
d50e3b1f: Fix assertion in collation version lookup.
f24b1569: Rethink extraction of collation dependencies.
257836a7: Track collation versions for indexes.
cd6f479e: Add pg_depend.refobjversion.
7d1297df: Remove pg_collation.collversion.

Discussion: https://postgr.es/m/CA%2BhUKGLhj5t1fcjqAu8iD9B3ixJtsTNqyCCD4V0aTO9kAKAjjA%40mail.gmail.com
2021-05-07 21:10:11 +12:00
Thomas Munro 8556267b2b Revert "pg_collation_actual_version() -> pg_collation_current_version()."
This reverts commit 9cf184cc05.  Name
change less well received than anticipated.

Discussion: https://postgr.es/m/afcfb97e-88a1-a540-db95-6c573b93bc2b%40eisentraut.org
2021-02-26 15:29:27 +13:00
Thomas Munro beb4480c85 Refactor get_collation_current_version().
The code paths for three different OSes finished up with three different
ways of excluding C[.xxx] and POSIX from consideration.  Merge them.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com
2021-02-22 23:32:16 +13:00
Thomas Munro 9cf184cc05 pg_collation_actual_version() -> pg_collation_current_version().
The new name seems a bit more natural.

Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com
2021-02-22 23:32:16 +13:00
Thomas Munro 0fb0a0503b Hide internal error for pg_collation_actual_version(<bad OID>).
Instead of an unsightly internal "cache lookup failed" message, just
return NULL for bad OIDs, as is the convention for other similar things.

Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com
2021-02-22 23:01:20 +13:00
Bruce Momjian ca3b37487b Update copyright for 2021
Backpatch-through: 9.5
2021-01-02 13:06:25 -05:00
Tom Lane 7ca37fb040 Use setenv() in preference to putenv().
Since at least 2001 we've used putenv() and avoided setenv(), on the
grounds that the latter was unportable and not in POSIX.  However,
POSIX added it that same year, and by now the situation has reversed:
setenv() is probably more portable than putenv(), since POSIX now
treats the latter as not being a core function.  And setenv() has
cleaner semantics too.  So, let's reverse that old policy.

This commit adds a simple src/port/ implementation of setenv() for
any stragglers (we have one in the buildfarm, but I'd not be surprised
if that code is never used in the field).  More importantly, extend
win32env.c to also support setenv().  Then, replace usages of putenv()
with setenv(), and get rid of some ad-hoc implementations of setenv()
wannabees.

Also, adjust our src/port/ implementation of unsetenv() to follow the
POSIX spec that it returns an error indicator, rather than returning
void as per the ancient BSD convention.  I don't feel a need to make
all the call sites check for errors, but the portability stub ought
to match real-world practice.

Discussion: https://postgr.es/m/2065122.1609212051@sss.pgh.pa.us
2020-12-30 12:56:06 -05:00
Tom Lane b3817f5f77 Improve hash_create()'s API for some added robustness.
Invent a new flag bit HASH_STRINGS to specify C-string hashing, which
was formerly the default; and add assertions insisting that exactly
one of the bits HASH_STRINGS, HASH_BLOBS, and HASH_FUNCTION be set.
This is in hopes of preventing recurrences of the type of oversight
fixed in commit a1b8aa1e4 (i.e., mistakenly omitting HASH_BLOBS).

Also, when HASH_STRINGS is specified, insist that the keysize be
more than 8 bytes.  This is a heuristic, but it should catch
accidental use of HASH_STRINGS for integer or pointer keys.
(Nearly all existing use-cases set the keysize to NAMEDATALEN or
more, so there's little reason to think this restriction should
be problematic.)

Tweak hash_create() to insist that the HASH_ELEM flag be set, and
remove the defaults it had for keysize and entrysize.  Since those
defaults were undocumented and basically useless, no callers
omitted HASH_ELEM anyway.

Also, remove memset's zeroing the HASHCTL parameter struct from
those callers that had one.  This has never been really necessary,
and while it wasn't a bad coding convention it was confusing that
some callers did it and some did not.  We might as well save a few
cycles by standardizing on "not".

Also improve the documentation for hash_create().

In passing, improve reinit.c's usage of a hash table by storing
the key as a binary Oid rather than a string; and, since that's
a temporary hash table, allocate it in CurrentMemoryContext for
neatness.

Discussion: https://postgr.es/m/590625.1607878171@sss.pgh.pa.us
2020-12-15 11:38:53 -05:00
Thomas Munro ca051d8b10 Add collation versions for FreeBSD.
On FreeBSD 13, use querylocale() to read the current version of libc
collations.  Similar to commits 352f6f2d for Windows and d5ac14f9 for
GNU/Linux.

Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com
2020-11-20 21:49:57 +13:00
Tom Lane ff94205787 Suppress "warning: variable 'collcollate' set but not used".
Buildfarm members that lack both HAVE_LOCALE_T and USE_ICU have been
complaining about pg_newlocale_from_collation's collcollate variable.
This is evidently fallout from commit 7d1297df0, which removed the
only usage outside those two #ifdef'd code paths.  Mark the variable
pg_attribute_unused(), like its sibling collctype, which has been that
way for a long time.
2020-11-15 12:39:49 -05:00
Thomas Munro d50e3b1f8d Fix assertion in collation version lookup.
Commit 257836a7 included an assertion that a version lookup routine is
not trying to look up "C" or "POSIX", but that case is reachable with
the user-facing SQL function pg_collation_actual_version().  Remove the
assertion.
2020-11-08 20:45:29 +13:00
Thomas Munro 9f12a3b95d Tolerate version lookup failure for old style Windows locale names.
Accept that we can't get versions for such locale names for now.  Users
will need to specify the newer language tag format to enable the
collation versioning feature.  It's not clear that we can do automatic
conversion from the old style to the new style reliably enough for this
purpose.

Unfortunately, this means that collation versioning probably won't work
for the default collation unless you provide something like en-US at
initdb or CREATE DATABASE time (though, for reasons not yet understood,
it does seem to work on some systems).  It'd be nice to find a better
solution, or document this quirk if we settle on it, but this should
unbreak the 3 failing build farm animals in the meantime.

Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com
2020-11-04 15:13:08 +13:00
Thomas Munro 257836a755 Track collation versions for indexes.
Record the current version of dependent collations in pg_depend when
creating or rebuilding an index.  When accessing the index later, warn
that the index may be corrupted if the current version doesn't match.

Thanks to Douglas Doole, Peter Eisentraut, Christoph Berg, Laurenz Albe,
Michael Paquier, Robert Haas, Tom Lane and others for very helpful
discussion.

Author: Thomas Munro <thomas.munro@gmail.com>
Author: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> (earlier versions)
Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com
2020-11-03 01:19:50 +13:00
Thomas Munro 7d1297df08 Remove pg_collation.collversion.
This model couldn't be extended to cover the default collation, and
didn't have any information about the affected database objects when the
version changed.  Remove, in preparation for a follow-up commit that
will add a new mechanism.

Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com
2020-11-03 00:44:59 +13:00
Peter Eisentraut 1784f278a6 Replace remaining StrNCpy() by strlcpy()
They are equivalent, except that StrNCpy() zero-fills the entire
destination buffer instead of providing just one trailing zero.  For
all but a tiny number of callers, that's just overhead rather than
being desirable.

Remove StrNCpy() as it is now unused.

In some cases, namestrcpy() is the more appropriate function to use.
While we're here, simplify the API of namestrcpy(): Remove the return
value, don't check for NULL input.  Nothing was using that anyway.
Also, remove a few unused name-related functions.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/44f5e198-36f6-6cdb-7fa9-60e34784daae%402ndquadrant.com
2020-08-10 23:20:37 +02:00
Amit Kapila a169155453 Fix the MSVC build for versions 2015 and later.
Visual Studio 2015 and later versions should still be able to do the same
as Visual Studio 2012, but the declaration of locale_name is missing in
_locale_t, causing the code compilation to fail, hence this falls back
instead on to enumerating all system locales by using EnumSystemLocalesEx
to find the required locale name.  If the input argument is in Unix-style
then we can get ISO Locale name directly by using GetLocaleInfoEx() with
LCType as LOCALE_SNAME.

In passing, change the documentation references of the now obsolete links.

Note that this problem occurs only with NLS enabled builds.

Author: Juan José Santamaría Flecha, Davinder Singh and Amit Kapila
Reviewed-by: Ranier Vilela and Amit Kapila
Backpatch-through: 9.5
Discussion: https://postgr.es/m/CAHzhFSFoJEWezR96um4-rg5W6m2Rj9Ud2CNZvV4NWc9tXV7aXQ@mail.gmail.com
2020-05-14 09:24:33 +05:30
Thomas Munro 352f6f2df6 Add collation versions for Windows.
On Vista and later, use GetNLSVersionEx() to request collation version
information.

Reviewed-by: Juan José Santamaría Flecha <juanjo.santamaria@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGJvqup3s%2BJowVTcacZADO6dOhfdBmvOPHLS3KXUJu41Jw%40mail.gmail.com
2020-03-25 16:04:32 +13:00
Thomas Munro 382a821907 Allow NULL version for individual collations.
Remove the documented restriction that collation providers must either
return NULL for all collations or non-NULL for all collations.

Use NULL for glibc collations like "C.UTF-8", which might otherwise lead
future proposed commits to force unnecessary index rebuilds.

Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Discussion: https://postgr.es/m/CA%2BhUKGJvqup3s%2BJowVTcacZADO6dOhfdBmvOPHLS3KXUJu41Jw%40mail.gmail.com
2020-03-25 15:53:24 +13:00
Tom Lane d677550493 Allow to_date/to_timestamp to recognize non-English month/day names.
to_char() has long allowed the TM (translation mode) prefix to
specify output of translated month or day names; but that prefix
had no effect in input format strings.  Now it does.  to_date()
and to_timestamp() will now recognize the same month or day names
that to_char() would output for the same format code.  Matching is
case-insensitive (per the active collation's notion of what that
means), just as it has always been for English month/day names
without the TM prefix.

(As per the discussion thread, there are lots of cases that this
feature will not handle, such as alternate day names.  But being
able to accept what to_char() will output seems useful enough.)

In passing, fix some shaky English and violations of message
style guidelines in jsonpath errors for the .datetime() method,
which depends on this code.

Juan José Santamaría Flecha, reviewed and modified by me,
with other commentary from Alvaro Herrera, Tomas Vondra,
Arthur Zakirov, Peter Eisentraut, Mark Dilger.

Discussion: https://postgr.es/m/CAC+AXB3u1jTngJcoC1nAHBf=M3v-jrEfo86UFtCqCjzbWS9QhA@mail.gmail.com
2020-03-03 11:06:47 -05:00
Tom Lane 5afaa2e426 Rationalize code placement between wchar.c, encnames.c, and mbutils.c.
Move all the backend-only code that'd crept into wchar.c and encnames.c
into mbutils.c.

To remove the last few #ifdef dependencies from wchar.c and encnames.c,
also make the following changes:

* Adjust get_encoding_name_for_icu to return NULL, not throw an error,
for unsupported encodings.  Its sole caller can perfectly well throw an
error instead.  (While at it, I also made this function and its sibling
is_encoding_supported_by_icu proof against out-of-range encoding IDs.)

* Remove the overlength-name error condition from pg_char_to_encoding.
It's completely silly not to treat that just like any other
the-name-is-not-in-the-table case.

Also, get rid of pg_mic_mblen --- there's no obvious reason why
conv.c shouldn't call pg_mule_mblen instead.

Other than that, this is just code movement and comment-polishing with
no functional changes.  Notably, I reordered declarations in pg_wchar.h
to show which functions are frontend-accessible and which are not.

Discussion: https://postgr.es/m/CA+TgmoYO8oq-iy8E02rD8eX25T-9SmyxKWqqks5OMHxKvGXpXQ@mail.gmail.com
2020-01-16 18:08:21 -05:00
Bruce Momjian 7559d8ebfa Update copyrights for 2020
Backpatch-through: update all files in master, backpatch legal files through 9.4
2020-01-01 12:21:45 -05:00
Thomas Munro d5ac14f9cc Use libc version as a collation version on glibc systems.
Using glibc's version string to detect potential collation definition
changes is not 100% reliable, but it's better than nothing.  Currently
this affects only collations explicitly provided by "libc".  More work
will be needed to handle the default collation.

Author: Thomas Munro, based on a suggestion from Christoph Berg
Reviewed-by: Peter Eisentraut
Discussion: https://postgr.es/m/4b76c6d4-ae5e-0dc6-7d0d-b5c796a07e34%402ndquadrant.com
2019-10-16 17:28:24 +13:00
Peter Eisentraut 50518ec296 Revert "Use libc version as a collation version on glibc systems."
This reverts commit 9f90b1d08d.

This needs some refinements in the pg_dump and pg_upgrade tests.
2019-10-09 21:36:01 +02:00
Peter Eisentraut 9f90b1d08d Use libc version as a collation version on glibc systems.
Using glibc's version number to detect potential collation definition
changes is not 100% reliable, but it's better than nothing.

Author: Thomas Munro
Reviewed-by: Peter Eisentraut
Discussion: https://postgr.es/m/4b76c6d4-ae5e-0dc6-7d0d-b5c796a07e34%402ndquadrant.com
2019-10-09 21:17:47 +02:00
Peter Eisentraut 38d8dce61f Remove some code for old unsupported versions of MSVC
As of d9dd406fe2, we require MSVC 2013,
which means _MSC_VER >= 1800.  This means that conditionals about
older versions of _MSC_VER can be removed or simplified.

Previous code was also in some cases handling MinGW, where _MSC_VER is
not defined at all, incorrectly, such as in pg_ctl.c and win32_port.h,
leading to some compiler warnings.  This should now be handled better.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
2019-10-08 10:50:54 +02:00
Michael Paquier c96581abe4 Fix inconsistencies and typos in the tree, take 11
This fixes various typos in docs and comments, and removes some orphaned
definitions.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/5da8e325-c665-da95-21e0-c8a99ea61fbf@gmail.com
2019-08-19 16:21:39 +09:00
Michael Paquier 66bde49d96 Fix inconsistencies and typos in the tree, take 10
This addresses some issues with unnecessary code comments, fixes various
typos in docs and comments, and removes some orphaned structures and
definitions.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/9aabc775-5494-b372-8bcb-4dfc0bd37c68@gmail.com
2019-08-13 13:53:41 +09:00
Alvaro Herrera 75445c1515 More message style fixes
Discussion: https://postgr.es/m/20190515183005.GA26486@alvherre.pgsql
2019-05-16 19:14:31 -04:00
Tom Lane 7ad1cd31bf Repair assorted issues in locale data extraction.
cache_locale_time (extraction of LC_TIME-related info) had never been
taught the lessons we previously learned about extraction of info related
to LC_MONETARY and LC_NUMERIC.  Specifically, commit 95a777c61 taught
PGLC_localeconv() that data coming out of localeconv() was in an encoding
determined by the relevant locale, but we didn't realize that there's a
similar issue with strftime().  And commit a4930e7ca hardened
PGLC_localeconv() against errors occurring partway through, but failed
to do likewise for cache_locale_time().  So, rearrange the latter
function to perform encoding conversion and not risk failure while
it's got the locales set to temporary values.

This time around I also changed PGLC_localeconv() to treat it as FATAL
if it can't restore the previous settings of the locale values.  There
is no reason (except possibly OOM) for that to fail, and proceeding with
the wrong locale values seems like a seriously bad idea --- especially
on Windows where we have to also temporarily change LC_CTYPE.  Also,
protect against the possibility that we can't identify the codeset
reported for LC_MONETARY or LC_NUMERIC; rather than just failing,
try to validate the data without conversion.

The user-visible symptom this fixes is that if LC_TIME is set to a locale
name that implies an encoding different from the database encoding,
non-ASCII localized day and month names would be retrieved in the wrong
encoding, leading to either unexpected encoding-conversion error reports
or wrong output from to_char().  The other possible failure modes are
unlikely enough that we've not seen reports of them, AFAIK.

The encoding conversion problems do not manifest on Windows, since
we'd already created special-case code to handle that issue there.

Per report from Juan José Santamaría Flecha.  Back-patch to all
supported versions.

Juan José Santamaría Flecha and Tom Lane

Discussion: https://postgr.es/m/CAC+AXB22So5aZm2vZe+MChYXec7gWfr-n-SK-iO091R0P_1Tew@mail.gmail.com
2019-04-23 18:51:30 -04:00
Peter Eisentraut 5e1963fb76 Collations with nondeterministic comparison
This adds a flag "deterministic" to collations.  If that is false,
such a collation disables various optimizations that assume that
strings are equal only if they are byte-wise equal.  That then allows
use cases such as case-insensitive or accent-insensitive comparisons
or handling of strings with different Unicode normal forms.

This functionality is only supported with the ICU provider.  At least
glibc doesn't appear to have any locales that work in a
nondeterministic way, so it's not worth supporting this for the libc
provider.

The term "deterministic comparison" in this context is from Unicode
Technical Standard #10
(https://unicode.org/reports/tr10/#Deterministic_Comparison).

This patch makes changes in three areas:

- CREATE COLLATION DDL changes and system catalog changes to support
  this new flag.

- Many executor nodes and auxiliary code are extended to track
  collations.  Previously, this code would just throw away collation
  information, because the eventually-called user-defined functions
  didn't use it since they only cared about equality, which didn't
  need collation information.

- String data type functions that do equality comparisons and hashing
  are changed to take the (non-)deterministic flag into account.  For
  comparison, this just means skipping various shortcuts and tie
  breakers that use byte-wise comparison.  For hashing, we first need
  to convert the input string to a canonical "sort key" using the ICU
  analogue of strxfrm().

Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/flat/1ccc668f-4cbc-0bef-af67-450b47cdfee7@2ndquadrant.com
2019-03-22 12:12:43 +01:00
Peter Eisentraut 1f050c08f9 Fix bug in support for collation attributes on older ICU versions
Unrecognized attribute names are supposed to be ignored.  But the code
would error out on an unrecognized attribute value even if it did not
recognize the attribute name.  So unrecognized attributes wouldn't
really be ignored unless the value happened to be one that matched a
recognized value.  This would break some important cases where the
attribute would be processed by ucol_open() directly.  Fix that and
add a test case.

The restructured code should also avoid compiler warnings about
initializing a UColAttribute value to -1, because the type might be an
unsigned enum.  (reported by Andres Freund)
2019-03-19 09:37:46 +01:00
Peter Eisentraut b8f9a2a69a Add support for collation attributes on older ICU versions
Starting in ICU 54, collation customization attributes can be
specified in the locale string, for example
"@colStrength=primary;colCaseLevel=yes".  Add support for this for
older ICU versions as well, by adding some minimal parsing of the
attributes in the locale string and calling ucol_setAttribute() on
them.  This is essentially what never ICU versions do internally in
ucol_open().  This was we can offer this functionality in a consistent
way in all ICU versions supported by PostgreSQL.

Also add some tests for ICU collation customization.

Reported-by: Daniel Verite <daniel@manitou-mail.org>
Discussion: https://www.postgresql.org/message-id/0270ebd4-f67c-8774-1a5a-91adfb9bb41f@2ndquadrant.com
2019-03-17 08:47:15 +01:00
Bruce Momjian 97c39498e5 Update copyright for 2019
Backpatch-through: certain files through 9.4
2019-01-02 12:44:25 -05:00
Bruce Momjian 9d4649ca49 Update copyright for 2018
Backpatch-through: certain files through 9.3
2018-01-02 23:30:12 -05:00
Tom Lane 85feb77aa0 Assume wcstombs(), towlower(), and sibling functions are always present.
These functions are required by SUS v2, which is our minimum baseline
for Unix platforms, and are present on all interesting Windows versions
as well.  Even our oldest buildfarm members have them.  Thus, we were not
testing the "!USE_WIDE_UPPER_LOWER" code paths, which explains why the bug
fixed in commit e6023ee7f escaped detection.  Per discussion, there seems
to be no more real-world value in maintaining this option.  Hence, remove
the configure-time tests for wcstombs() and towlower(), remove the
USE_WIDE_UPPER_LOWER symbol, and remove all the !USE_WIDE_UPPER_LOWER code.
There's not actually all that much of the latter, but simplifying the #if
nests is a win in itself.

Discussion: https://postgr.es/m/20170921052928.GA188913@rfd.leadboat.com
2017-09-22 11:00:58 -04:00
Tom Lane 7b86c2ac95 Improve dubious memory management in pg_newlocale_from_collation().
pg_newlocale_from_collation() used malloc() and strdup() directly,
which is generally not per backend coding style, and it didn't bother
to check for failure results, but would just SIGSEGV instead.  Also,
if one of the numerous error checks in the middle of the function
failed, the already-allocated memory would be leaked permanently.
Admittedly, it's not a lot of memory, but it could build up if this
function were called repeatedly for a bad collation.

The first two problems are easily cured by palloc'ing in TopMemoryContext
instead of calling libc directly.  We can fairly easily dodge the leakage
problem for the struct pg_locale_struct by filling in a temporary variable
and allocating permanent storage only once we reach the bottom of the
function.  It's harder to get rid of the potential leakage for ICU's copy
of the collcollate string, but at least that's only allocated after most
of the error checks; so live with that aspect.

Back-patch to v10 where this code came in, with one or another of the
ICU patches.
2017-09-20 13:52:36 -04:00
Tom Lane 514f613293 Second try at getting useful errors out of newlocale/_create_locale.
The early buildfarm returns for commit 1e165d05f are pretty awful:
not only does Windows not return a useful error, but it looks like
a lot of Unix-ish platforms don't either.  Given the number of
different errnos seen so far, guess that what's really going on is
that some newlocale() implementations fail to set errno at all.
Hence, let's try zeroing errno just before newlocale() and then
if it's still zero report as though it's ENOENT.  That should cover
the Windows case too.

It's clear that we'll have to drop the regression test case, unless
we want to maintain a separate expected-file for platforms without
HAVE_LOCALE_T.  But I'll leave it there awhile longer to see if this
actually improves matters or not.

Discussion: https://postgr.es/m/CAKKotZS-wcDcofXDCH=sidiuajE+nqHn2CGjLLX78anyDmi3gQ@mail.gmail.com
2017-08-01 17:17:20 -04:00
Tom Lane 1e165d05fe Try to deliver a sane message for _create_locale() failure on Windows.
We were just printing errno, which is certainly not gonna work on
Windows.  Now, it's not entirely clear from Microsoft's documentation
whether _create_locale() adheres to standard Windows error reporting
conventions, but let's assume it does and try to map the GetLastError
result to an errno.  If this turns out not to work, probably the best
thing to do will be to assume the error is always ENOENT on Windows.

This is a longstanding bug, but given the lack of previous field
complaints, I'm not excited about back-patching it.

Per report from Murtuza Zabuawala.

Discussion: https://postgr.es/m/CAKKotZS-wcDcofXDCH=sidiuajE+nqHn2CGjLLX78anyDmi3gQ@mail.gmail.com
2017-08-01 16:11:51 -04:00
Peter Eisentraut d8b3c81335 Refine memory allocation in ICU conversions
The simple calculations done to estimate the size of the output buffers
for ucnv_fromUChars() and ucnv_toUChars() could overflow int32_t for
large strings.  To avoid that, go the long way and run the function
first without an output buffer to get the correct output buffer size
requirement.
2017-07-01 23:08:37 -04:00
Peter Eisentraut 13a57710db Prohibit creating ICU collation with different ctype
ICU does not support "collate" and "ctype" being different, so the
collctype catalog column is ignored.  But for catalog neatness, ensure
that they are the same.
2017-06-30 11:24:00 -04:00