Disable abbreviated keys for string-sorting in non-C locales.

Unfortunately, every version of glibc thus far tested has bugs whereby
strcoll() ordering does not match strxfrm() ordering as required by
the standard.  This can result in, for example, corrupted indexes.
Disabling abbreviated keys in these cases slows down non-C-collation
string sorting considerably, but there seems to be no practical
alternative.  Users who are confident that their libc implementations
are solid in this regard can re-enable the optimization by compiling
with TRUST_STRXFRM defined.
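
The property at stake is that, for any two strings, strcmp() of their strxfrm() results must have the same sign as strcoll() on the originals. The following is a minimal sketch of a pairwise consistency check under that assumption; it is not the test used during the investigation, and the helper names (xfrm_dup, count_inconsistent_pairs, check_locale) are hypothetical. A zero result for a handful of inputs proves nothing, since the glibc bugs appear only for particular strings in particular locales.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sign of a comparison result: -1, 0, or 1. */
static int sign(int x)
{
    return (x > 0) - (x < 0);
}

/*
 * Return a malloc'd copy of strxfrm(s). Calling strxfrm() with length 0
 * (the C standard then allows a null destination) reports the length of
 * the transformed string, excluding the terminating NUL.
 */
static char *xfrm_dup(const char *s)
{
    size_t len = strxfrm(NULL, s, 0);
    char *buf = malloc(len + 1);

    if (buf == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    size_t written = strxfrm(buf, s, len + 1);
    assert(written == len);
    (void) written;
    return buf;
}

/*
 * Count the pairs in words[] for which strcoll() and strcmp() of the
 * strxfrm() blobs disagree under the current LC_COLLATE locale.
 */
static size_t count_inconsistent_pairs(const char *const *words, size_t n)
{
    size_t bad = 0;

    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            char *xi = xfrm_dup(words[i]);
            char *xj = xfrm_dup(words[j]);

            if (sign(strcoll(words[i], words[j])) != sign(strcmp(xi, xj))) {
                fprintf(stderr, "mismatch: \"%s\" vs \"%s\"\n",
                        words[i], words[j]);
                bad++;
            }
            free(xi);
            free(xj);
        }
    }
    return bad;
}

/* Switch LC_COLLATE to locale_name and run the check; -1 if unavailable. */
static long check_locale(const char *locale_name,
                         const char *const *words, size_t n)
{
    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return -1;
    return (long) count_inconsistent_pairs(words, n);
}
```

In the "C" locale strcoll() behaves like strcmp(), so the check should always report zero there; the interesting runs pass locale names such as "de_DE.UTF-8" (assumed to be installed) together with a large, varied word list.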

Users who have built indexes using PostgreSQL 9.5 or PostgreSQL 9.5.1
should REINDEX if there is a possibility that they may have been
affected by this problem.

Report by Marc-Olaf Jaschke.  Investigation mostly by Tom Lane, with
help from Peter Geoghegan, Noah Misch, Stephen Frost, and me.  Patch
by me, reviewed by Peter Geoghegan and Tom Lane.
Robert Haas 2016-03-23 15:58:34 -04:00
parent 3151f16e18
commit 3df9c374e2
1 changed file with 23 additions and 10 deletions

@@ -1832,17 +1832,30 @@ varstr_sortsupport(SortSupport ssup, Oid collid, bool bpchar)
 	}
 
 	/*
-	 * It's possible that there are platforms where the use of abbreviated
-	 * keys should be disabled at compile time.  Having only 4 byte datums
-	 * could make worst-case performance drastically more likely, for example.
-	 * Moreover, Darwin's strxfrm() implementations is known to not
-	 * effectively concentrate a significant amount of entropy from the
-	 * original string in earlier transformed blobs.  It's possible that other
-	 * supported platforms are similarly encumbered.  However, even in those
-	 * cases, the abbreviated keys optimization may win, and if it doesn't,
-	 * the "abort abbreviation" code may rescue us.  So, for now, we don't
-	 * disable this anywhere on the basis of performance.
+	 * Unfortunately, it seems that abbreviation for non-C collations is
+	 * broken on many common platforms; testing of multiple versions of glibc
+	 * reveals that, for many locales, strcoll() and strxfrm() do not return
+	 * consistent results, which is fatal to this optimization.  While no
+	 * other libc other than Cygwin has so far been shown to have a problem,
+	 * we take the conservative course of action for right now and disable
+	 * this categorically.  (Users who are certain this isn't a problem on
+	 * their system can define TRUST_STRXFRM.)
+	 *
+	 * Even apart from the risk of broken locales, it's possible that there
+	 * are platforms where the use of abbreviated keys should be disabled at
+	 * compile time.  Having only 4 byte datums could make worst-case
+	 * performance drastically more likely, for example.  Moreover, Darwin's
+	 * strxfrm() implementations is known to not effectively concentrate a
+	 * significant amount of entropy from the original string in earlier
+	 * transformed blobs.  It's possible that other supported platforms are
+	 * similarly encumbered.  So, if we ever get past disabling this
+	 * categorically, we may still want or need to disable it for particular
+	 * platforms.
 	 */
+#ifndef TRUST_STRXFRM
+	if (!collate_c)
+		abbreviate = false;
+#endif
 
 	/*
 	 * If we're using abbreviated keys, or if we're using a locale-aware
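
To see why an strcoll()/strxfrm() disagreement is fatal rather than merely slow, consider how an abbreviated comparison works: a short prefix of the strxfrm() blob is compared as an integer, and only ties fall through to the authoritative strcoll(). The sketch below models that idea under stated assumptions; the names (use_abbreviation, abbrev_key, sort_item, compare_items) and the 8-byte key are hypothetical simplifications, not PostgreSQL's SortSupport code, and the "abort abbreviation" logic is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Same shape as the patch: abbreviation stays on for the C collation and
 * is forced off otherwise, unless the build defines TRUST_STRXFRM.
 */
static bool use_abbreviation(bool collate_c)
{
    bool abbreviate = true;

    (void) collate_c;           /* unused when TRUST_STRXFRM is defined */
#ifndef TRUST_STRXFRM
    if (!collate_c)
        abbreviate = false;
#endif
    return abbreviate;
}

/*
 * Hypothetical abbreviated key: the first 8 bytes of the strxfrm() blob,
 * packed big-endian so that unsigned integer order matches memcmp() order
 * on those bytes. Short blobs are padded with zero bytes, which sort
 * before any byte that can appear inside a C string.
 */
static uint64_t abbrev_key(const char *s)
{
    size_t len = strxfrm(NULL, s, 0);
    char *blob = malloc(len + 1);
    uint64_t key = 0;

    if (blob == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    size_t written = strxfrm(blob, s, len + 1);
    assert(written == len);
    (void) written;
    for (size_t i = 0; i < sizeof key; i++) {
        unsigned char c = i < len ? (unsigned char) blob[i] : 0;
        key = (key << 8) | c;
    }
    free(blob);
    return key;
}

typedef struct {
    const char *str;
    uint64_t abbrev;            /* computed once, before sorting */
} sort_item;

static sort_item make_item(const char *s)
{
    sort_item it = { s, abbrev_key(s) };
    return it;
}

/*
 * Cheap integer comparison first; fall back to strcoll() only on a tie.
 * This is a consistent comparator only if strxfrm() order agrees with
 * strcoll() order, which is the property found broken in glibc.
 */
static int compare_items(const sort_item *a, const sort_item *b,
                         bool abbreviate)
{
    if (abbreviate && a->abbrev != b->abbrev)
        return a->abbrev < b->abbrev ? -1 : 1;
    return strcoll(a->str, b->str);
}
```

If a libc's strxfrm() ranks "x" before "y" while its strcoll() ranks "y" first, a sort using compare_items() can place rows in an order that a later strcoll()-based lookup does not expect, which is how a B-tree index can end up corrupted.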