2011-02-12 14:54:13 +01:00
|
|
|
/*-------------------------------------------------------------------------
|
|
|
|
*
|
|
|
|
* collationcmds.c
|
2011-03-11 19:20:11 +01:00
|
|
|
* collation-related commands support code
|
2011-02-12 14:54:13 +01:00
|
|
|
*
|
2022-01-08 01:04:57 +01:00
|
|
|
* Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
|
2011-02-12 14:54:13 +01:00
|
|
|
* Portions Copyright (c) 1994, Regents of the University of California
|
|
|
|
*
|
|
|
|
*
|
|
|
|
* IDENTIFICATION
|
|
|
|
* src/backend/commands/collationcmds.c
|
|
|
|
*
|
|
|
|
*-------------------------------------------------------------------------
|
|
|
|
*/
|
|
|
|
#include "postgres.h"
|
|
|
|
|
2012-08-30 22:15:44 +02:00
|
|
|
#include "access/htup_details.h"
|
2019-01-21 19:18:20 +01:00
|
|
|
#include "access/table.h"
|
2011-03-04 21:14:37 +01:00
|
|
|
#include "access/xact.h"
|
2011-02-12 14:54:13 +01:00
|
|
|
#include "catalog/dependency.h"
|
|
|
|
#include "catalog/indexing.h"
|
|
|
|
#include "catalog/namespace.h"
|
2017-03-23 20:25:34 +01:00
|
|
|
#include "catalog/objectaccess.h"
|
2011-02-12 14:54:13 +01:00
|
|
|
#include "catalog/pg_collation.h"
|
2022-10-29 22:30:15 +02:00
|
|
|
#include "catalog/pg_database.h"
|
2022-11-13 08:11:17 +01:00
|
|
|
#include "catalog/pg_namespace.h"
|
2011-02-12 14:54:13 +01:00
|
|
|
#include "commands/alter.h"
|
|
|
|
#include "commands/collationcmds.h"
|
2017-03-23 20:25:34 +01:00
|
|
|
#include "commands/comment.h"
|
2011-02-12 14:54:13 +01:00
|
|
|
#include "commands/dbcommands.h"
|
|
|
|
#include "commands/defrem.h"
|
2020-12-21 01:37:11 +01:00
|
|
|
#include "common/string.h"
|
2011-02-12 14:54:13 +01:00
|
|
|
#include "mb/pg_wchar.h"
|
|
|
|
#include "miscadmin.h"
|
2020-03-10 10:22:52 +01:00
|
|
|
#include "utils/acl.h"
|
2011-02-12 14:54:13 +01:00
|
|
|
#include "utils/builtins.h"
|
|
|
|
#include "utils/lsyscache.h"
|
2011-03-04 21:14:37 +01:00
|
|
|
#include "utils/pg_locale.h"
|
2011-03-26 04:10:07 +01:00
|
|
|
#include "utils/rel.h"
|
2011-02-12 14:54:13 +01:00
|
|
|
#include "utils/syscache.h"
|
|
|
|
|
2017-03-23 20:25:34 +01:00
|
|
|
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
typedef struct
|
|
|
|
{
|
|
|
|
char *localename; /* name of locale, as per "locale -a" */
|
|
|
|
char *alias; /* shortened alias for same */
|
|
|
|
int enc; /* encoding */
|
|
|
|
} CollAliasData;
|
|
|
|
|
|
|
|
|
2011-02-12 14:54:13 +01:00
|
|
|
/*
|
|
|
|
* CREATE COLLATION
|
|
|
|
*/
|
Change many routines to return ObjectAddress rather than OID
The changed routines are mostly those that can be directly called by
ProcessUtilitySlow; the intention is to make the affected object
information more precise, in support for future event trigger changes.
Originally it was envisioned that the OID of the affected object would
be enough, and in most cases that is correct, but upon actually
implementing the event trigger changes it turned out that ObjectAddress
is more widely useful.
Additionally, some command execution routines grew an output argument
that's an object address which provides further info about the executed
command. To wit:
* for ALTER DOMAIN / ADD CONSTRAINT, it corresponds to the address of
the new constraint
* for ALTER OBJECT / SET SCHEMA, it corresponds to the address of the
schema that originally contained the object.
* for ALTER EXTENSION {ADD, DROP} OBJECT, it corresponds to the address
of the object added to or dropped from the extension.
There's no user-visible change in this commit, and no functional change
either.
Discussion: 20150218213255.GC6717@tamriel.snowman.net
Reviewed-By: Stephen Frost, Andres Freund
2015-03-03 18:10:50 +01:00
|
|
|
ObjectAddress
|
2017-02-09 04:51:09 +01:00
|
|
|
DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_exists)
|
2011-02-12 14:54:13 +01:00
|
|
|
{
|
|
|
|
char *collName;
|
|
|
|
Oid collNamespace;
|
|
|
|
AclResult aclresult;
|
|
|
|
ListCell *pl;
|
|
|
|
DefElem *fromEl = NULL;
|
|
|
|
DefElem *localeEl = NULL;
|
|
|
|
DefElem *lccollateEl = NULL;
|
|
|
|
DefElem *lcctypeEl = NULL;
|
2017-03-23 20:25:34 +01:00
|
|
|
DefElem *providerEl = NULL;
|
2019-03-22 12:09:32 +01:00
|
|
|
DefElem *deterministicEl = NULL;
|
2021-05-07 10:17:42 +02:00
|
|
|
DefElem *versionEl = NULL;
|
2022-03-11 08:27:24 +01:00
|
|
|
char *collcollate;
|
|
|
|
char *collctype;
|
2022-03-17 11:11:21 +01:00
|
|
|
char *colliculocale;
|
2022-03-11 08:27:24 +01:00
|
|
|
bool collisdeterministic;
|
|
|
|
int collencoding;
|
|
|
|
char collprovider;
|
2021-05-07 10:17:42 +02:00
|
|
|
char *collversion = NULL;
|
2011-03-04 21:14:37 +01:00
|
|
|
Oid newoid;
|
Change many routines to return ObjectAddress rather than OID
The changed routines are mostly those that can be directly called by
ProcessUtilitySlow; the intention is to make the affected object
information more precise, in support for future event trigger changes.
Originally it was envisioned that the OID of the affected object would
be enough, and in most cases that is correct, but upon actually
implementing the event trigger changes it turned out that ObjectAddress
is more widely useful.
Additionally, some command execution routines grew an output argument
that's an object address which provides further info about the executed
command. To wit:
* for ALTER DOMAIN / ADD CONSTRAINT, it corresponds to the address of
the new constraint
* for ALTER OBJECT / SET SCHEMA, it corresponds to the address of the
schema that originally contained the object.
* for ALTER EXTENSION {ADD, DROP} OBJECT, it corresponds to the address
of the object added to or dropped from the extension.
There's no user-visible change in this commit, and no functional change
either.
Discussion: 20150218213255.GC6717@tamriel.snowman.net
Reviewed-By: Stephen Frost, Andres Freund
2015-03-03 18:10:50 +01:00
|
|
|
ObjectAddress address;
|
2011-02-12 14:54:13 +01:00
|
|
|
|
|
|
|
collNamespace = QualifiedNameGetCreationNamespace(names, &collName);
|
|
|
|
|
2022-11-13 08:11:17 +01:00
|
|
|
aclresult = object_aclcheck(NamespaceRelationId, collNamespace, GetUserId(), ACL_CREATE);
|
2011-02-12 14:54:13 +01:00
|
|
|
if (aclresult != ACLCHECK_OK)
|
2017-12-02 15:26:34 +01:00
|
|
|
aclcheck_error(aclresult, OBJECT_SCHEMA,
|
2011-02-12 14:54:13 +01:00
|
|
|
get_namespace_name(collNamespace));
|
|
|
|
|
|
|
|
foreach(pl, parameters)
|
|
|
|
{
|
Improve castNode notation by introducing list-extraction-specific variants.
This extends the castNode() notation introduced by commit 5bcab1114 to
provide, in one step, extraction of a list cell's pointer and coercion to
a concrete node type. For example, "lfirst_node(Foo, lc)" is the same
as "castNode(Foo, lfirst(lc))". Almost half of the uses of castNode
that have appeared so far include a list extraction call, so this is
pretty widely useful, and it saves a few more keystrokes compared to the
old way.
As with the previous patch, back-patch the addition of these macros to
pg_list.h, so that the notation will be available when back-patching.
Patch by me, after an idea of Andrew Gierth's.
Discussion: https://postgr.es/m/14197.1491841216@sss.pgh.pa.us
2017-04-10 19:51:29 +02:00
|
|
|
DefElem *defel = lfirst_node(DefElem, pl);
|
2011-02-12 14:54:13 +01:00
|
|
|
DefElem **defelp;
|
|
|
|
|
Avoid unnecessary use of pg_strcasecmp for already-downcased identifiers.
We have a lot of code in which option names, which from the user's
viewpoint are logically keywords, are passed through the grammar as plain
identifiers, and then matched to string literals during command execution.
This approach avoids making words into lexer keywords unnecessarily. Some
places matched these strings using plain strcmp, some using pg_strcasecmp.
But the latter should be unnecessary since identifiers would have been
downcased on their way through the parser. Aside from any efficiency
concerns (probably not a big factor), the lack of consistency in this area
creates a hazard of subtle bugs due to different places coming to different
conclusions about whether two option names are the same or different.
Hence, standardize on using strcmp() to match any option names that are
expected to have been fed through the parser.
This does create a user-visible behavioral change, which is that while
formerly all of these would work:
alter table foo set (fillfactor = 50);
alter table foo set (FillFactor = 50);
alter table foo set ("fillfactor" = 50);
alter table foo set ("FillFactor" = 50);
now the last case will fail because that double-quoted identifier is
different from the others. However, none of our documentation says that
you can use a quoted identifier in such contexts at all, and we should
discourage doing so since it would break if we ever decide to parse such
constructs as true lexer keywords rather than poor man's substitutes.
So this shouldn't create a significant compatibility issue for users.
Daniel Gustafsson, reviewed by Michael Paquier, small changes by me
Discussion: https://postgr.es/m/29405B24-564E-476B-98C0-677A29805B84@yesql.se
2018-01-27 00:25:02 +01:00
|
|
|
if (strcmp(defel->defname, "from") == 0)
|
2011-02-12 14:54:13 +01:00
|
|
|
defelp = &fromEl;
|
Avoid unnecessary use of pg_strcasecmp for already-downcased identifiers.
We have a lot of code in which option names, which from the user's
viewpoint are logically keywords, are passed through the grammar as plain
identifiers, and then matched to string literals during command execution.
This approach avoids making words into lexer keywords unnecessarily. Some
places matched these strings using plain strcmp, some using pg_strcasecmp.
But the latter should be unnecessary since identifiers would have been
downcased on their way through the parser. Aside from any efficiency
concerns (probably not a big factor), the lack of consistency in this area
creates a hazard of subtle bugs due to different places coming to different
conclusions about whether two option names are the same or different.
Hence, standardize on using strcmp() to match any option names that are
expected to have been fed through the parser.
This does create a user-visible behavioral change, which is that while
formerly all of these would work:
alter table foo set (fillfactor = 50);
alter table foo set (FillFactor = 50);
alter table foo set ("fillfactor" = 50);
alter table foo set ("FillFactor" = 50);
now the last case will fail because that double-quoted identifier is
different from the others. However, none of our documentation says that
you can use a quoted identifier in such contexts at all, and we should
discourage doing so since it would break if we ever decide to parse such
constructs as true lexer keywords rather than poor man's substitutes.
So this shouldn't create a significant compatibility issue for users.
Daniel Gustafsson, reviewed by Michael Paquier, small changes by me
Discussion: https://postgr.es/m/29405B24-564E-476B-98C0-677A29805B84@yesql.se
2018-01-27 00:25:02 +01:00
|
|
|
else if (strcmp(defel->defname, "locale") == 0)
|
2011-02-12 14:54:13 +01:00
|
|
|
defelp = &localeEl;
|
Avoid unnecessary use of pg_strcasecmp for already-downcased identifiers.
We have a lot of code in which option names, which from the user's
viewpoint are logically keywords, are passed through the grammar as plain
identifiers, and then matched to string literals during command execution.
This approach avoids making words into lexer keywords unnecessarily. Some
places matched these strings using plain strcmp, some using pg_strcasecmp.
But the latter should be unnecessary since identifiers would have been
downcased on their way through the parser. Aside from any efficiency
concerns (probably not a big factor), the lack of consistency in this area
creates a hazard of subtle bugs due to different places coming to different
conclusions about whether two option names are the same or different.
Hence, standardize on using strcmp() to match any option names that are
expected to have been fed through the parser.
This does create a user-visible behavioral change, which is that while
formerly all of these would work:
alter table foo set (fillfactor = 50);
alter table foo set (FillFactor = 50);
alter table foo set ("fillfactor" = 50);
alter table foo set ("FillFactor" = 50);
now the last case will fail because that double-quoted identifier is
different from the others. However, none of our documentation says that
you can use a quoted identifier in such contexts at all, and we should
discourage doing so since it would break if we ever decide to parse such
constructs as true lexer keywords rather than poor man's substitutes.
So this shouldn't create a significant compatibility issue for users.
Daniel Gustafsson, reviewed by Michael Paquier, small changes by me
Discussion: https://postgr.es/m/29405B24-564E-476B-98C0-677A29805B84@yesql.se
2018-01-27 00:25:02 +01:00
|
|
|
else if (strcmp(defel->defname, "lc_collate") == 0)
|
2011-02-12 14:54:13 +01:00
|
|
|
defelp = &lccollateEl;
|
Avoid unnecessary use of pg_strcasecmp for already-downcased identifiers.
We have a lot of code in which option names, which from the user's
viewpoint are logically keywords, are passed through the grammar as plain
identifiers, and then matched to string literals during command execution.
This approach avoids making words into lexer keywords unnecessarily. Some
places matched these strings using plain strcmp, some using pg_strcasecmp.
But the latter should be unnecessary since identifiers would have been
downcased on their way through the parser. Aside from any efficiency
concerns (probably not a big factor), the lack of consistency in this area
creates a hazard of subtle bugs due to different places coming to different
conclusions about whether two option names are the same or different.
Hence, standardize on using strcmp() to match any option names that are
expected to have been fed through the parser.
This does create a user-visible behavioral change, which is that while
formerly all of these would work:
alter table foo set (fillfactor = 50);
alter table foo set (FillFactor = 50);
alter table foo set ("fillfactor" = 50);
alter table foo set ("FillFactor" = 50);
now the last case will fail because that double-quoted identifier is
different from the others. However, none of our documentation says that
you can use a quoted identifier in such contexts at all, and we should
discourage doing so since it would break if we ever decide to parse such
constructs as true lexer keywords rather than poor man's substitutes.
So this shouldn't create a significant compatibility issue for users.
Daniel Gustafsson, reviewed by Michael Paquier, small changes by me
Discussion: https://postgr.es/m/29405B24-564E-476B-98C0-677A29805B84@yesql.se
2018-01-27 00:25:02 +01:00
|
|
|
else if (strcmp(defel->defname, "lc_ctype") == 0)
|
2011-02-12 14:54:13 +01:00
|
|
|
defelp = &lcctypeEl;
|
Avoid unnecessary use of pg_strcasecmp for already-downcased identifiers.
We have a lot of code in which option names, which from the user's
viewpoint are logically keywords, are passed through the grammar as plain
identifiers, and then matched to string literals during command execution.
This approach avoids making words into lexer keywords unnecessarily. Some
places matched these strings using plain strcmp, some using pg_strcasecmp.
But the latter should be unnecessary since identifiers would have been
downcased on their way through the parser. Aside from any efficiency
concerns (probably not a big factor), the lack of consistency in this area
creates a hazard of subtle bugs due to different places coming to different
conclusions about whether two option names are the same or different.
Hence, standardize on using strcmp() to match any option names that are
expected to have been fed through the parser.
This does create a user-visible behavioral change, which is that while
formerly all of these would work:
alter table foo set (fillfactor = 50);
alter table foo set (FillFactor = 50);
alter table foo set ("fillfactor" = 50);
alter table foo set ("FillFactor" = 50);
now the last case will fail because that double-quoted identifier is
different from the others. However, none of our documentation says that
you can use a quoted identifier in such contexts at all, and we should
discourage doing so since it would break if we ever decide to parse such
constructs as true lexer keywords rather than poor man's substitutes.
So this shouldn't create a significant compatibility issue for users.
Daniel Gustafsson, reviewed by Michael Paquier, small changes by me
Discussion: https://postgr.es/m/29405B24-564E-476B-98C0-677A29805B84@yesql.se
2018-01-27 00:25:02 +01:00
|
|
|
else if (strcmp(defel->defname, "provider") == 0)
|
2017-03-23 20:25:34 +01:00
|
|
|
defelp = &providerEl;
|
2019-03-22 12:09:32 +01:00
|
|
|
else if (strcmp(defel->defname, "deterministic") == 0)
|
|
|
|
defelp = &deterministicEl;
|
2021-05-07 10:17:42 +02:00
|
|
|
else if (strcmp(defel->defname, "version") == 0)
|
|
|
|
defelp = &versionEl;
|
2011-02-12 14:54:13 +01:00
|
|
|
else
|
|
|
|
{
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_SYNTAX_ERROR),
|
|
|
|
errmsg("collation attribute \"%s\" not recognized",
|
2016-09-06 18:00:00 +02:00
|
|
|
defel->defname),
|
|
|
|
parser_errposition(pstate, defel->location)));
|
2011-02-12 14:54:13 +01:00
|
|
|
break;
|
|
|
|
}
|
2021-07-18 12:08:34 +02:00
|
|
|
if (*defelp != NULL)
|
|
|
|
errorConflictingDefElem(defel, pstate);
|
2011-02-12 14:54:13 +01:00
|
|
|
*defelp = defel;
|
|
|
|
}
|
|
|
|
|
2021-07-18 12:08:34 +02:00
|
|
|
if (localeEl && (lccollateEl || lcctypeEl))
|
|
|
|
ereport(ERROR,
|
|
|
|
errcode(ERRCODE_SYNTAX_ERROR),
|
|
|
|
errmsg("conflicting or redundant options"),
|
|
|
|
errdetail("LOCALE cannot be specified together with LC_COLLATE or LC_CTYPE."));
|
|
|
|
|
|
|
|
if (fromEl && list_length(parameters) != 1)
|
2011-02-12 14:54:13 +01:00
|
|
|
ereport(ERROR,
|
2021-07-18 12:08:34 +02:00
|
|
|
errcode(ERRCODE_SYNTAX_ERROR),
|
|
|
|
errmsg("conflicting or redundant options"),
|
|
|
|
errdetail("FROM cannot be specified together with any other options."));
|
2011-02-12 14:54:13 +01:00
|
|
|
|
|
|
|
if (fromEl)
|
|
|
|
{
|
|
|
|
Oid collid;
|
|
|
|
HeapTuple tp;
|
2022-01-27 08:44:31 +01:00
|
|
|
Datum datum;
|
|
|
|
bool isnull;
|
2011-02-12 14:54:13 +01:00
|
|
|
|
Remove collation information from TypeName, where it does not belong.
The initial collations patch treated a COLLATE spec as part of a TypeName,
following what can only be described as brain fade on the part of the SQL
committee. It's a lot more reasonable to treat COLLATE as a syntactically
separate object, so that it can be added in only the productions where it
actually belongs, rather than needing to reject it in a boatload of places
where it doesn't belong (something the original patch mostly failed to do).
In addition this change lets us meet the spec's requirement to allow
COLLATE anywhere in the clauses of a ColumnDef, and it avoids unfriendly
behavior for constructs such as "foo::type COLLATE collation".
To do this, pull collation information out of TypeName and put it in
ColumnDef instead, thus reverting most of the collation-related changes in
parse_type.c's API. I made one additional structural change, which was to
use a ColumnDef as an intermediate node in AT_AlterColumnType AlterTableCmd
nodes. This provides enough room to get rid of the "transform" wart in
AlterTableCmd too, since the ColumnDef can carry the USING expression
easily enough.
Also fix some other minor bugs that have crept in in the same areas,
like failure to copy recently-added fields of ColumnDef in copyfuncs.c.
While at it, document the formerly secret ability to specify a collation
in ALTER TABLE ALTER COLUMN TYPE, ALTER TYPE ADD ATTRIBUTE, and
ALTER TYPE ALTER ATTRIBUTE TYPE; and correct some misstatements about
what the default collation selection will be when COLLATE is omitted.
BTW, the three-parameter form of format_type() should go away too,
since it just contributes to the confusion in this area; but I'll do
that in a separate patch.
2011-03-10 04:38:52 +01:00
|
|
|
collid = get_collation_oid(defGetQualifiedName(fromEl), false);
|
2011-02-12 14:54:13 +01:00
|
|
|
tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
|
|
|
|
if (!HeapTupleIsValid(tp))
|
|
|
|
elog(ERROR, "cache lookup failed for collation %u", collid);
|
|
|
|
|
2017-03-23 20:25:34 +01:00
|
|
|
collprovider = ((Form_pg_collation) GETSTRUCT(tp))->collprovider;
|
2019-03-22 12:09:32 +01:00
|
|
|
collisdeterministic = ((Form_pg_collation) GETSTRUCT(tp))->collisdeterministic;
|
2017-06-30 14:50:39 +02:00
|
|
|
collencoding = ((Form_pg_collation) GETSTRUCT(tp))->collencoding;
|
2011-02-12 14:54:13 +01:00
|
|
|
|
2022-01-27 08:44:31 +01:00
|
|
|
datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collcollate, &isnull);
|
|
|
|
if (!isnull)
|
|
|
|
collcollate = TextDatumGetCString(datum);
|
|
|
|
else
|
|
|
|
collcollate = NULL;
|
|
|
|
|
|
|
|
datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collctype, &isnull);
|
|
|
|
if (!isnull)
|
|
|
|
collctype = TextDatumGetCString(datum);
|
|
|
|
else
|
|
|
|
collctype = NULL;
|
|
|
|
|
2022-03-17 11:11:21 +01:00
|
|
|
datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_colliculocale, &isnull);
|
|
|
|
if (!isnull)
|
|
|
|
colliculocale = TextDatumGetCString(datum);
|
|
|
|
else
|
|
|
|
colliculocale = NULL;
|
|
|
|
|
2011-02-12 14:54:13 +01:00
|
|
|
ReleaseSysCache(tp);
|
2017-06-13 14:49:41 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Copying the "default" collation is not allowed because most code
|
|
|
|
* checks for DEFAULT_COLLATION_OID instead of COLLPROVIDER_DEFAULT,
|
|
|
|
* and so having a second collation with COLLPROVIDER_DEFAULT would
|
|
|
|
* not work and potentially confuse or crash some code. This could be
|
|
|
|
* fixed with some legwork.
|
|
|
|
*/
|
|
|
|
if (collprovider == COLLPROVIDER_DEFAULT)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
|
|
|
|
errmsg("collation \"default\" cannot be copied")));
|
2011-02-12 14:54:13 +01:00
|
|
|
}
|
2022-03-11 08:27:24 +01:00
|
|
|
else
|
2011-02-12 14:54:13 +01:00
|
|
|
{
|
2022-03-11 08:27:24 +01:00
|
|
|
char *collproviderstr = NULL;
|
2011-02-12 14:54:13 +01:00
|
|
|
|
2022-03-11 08:27:24 +01:00
|
|
|
collcollate = NULL;
|
|
|
|
collctype = NULL;
|
2022-03-17 11:11:21 +01:00
|
|
|
colliculocale = NULL;
|
2019-03-22 12:09:32 +01:00
|
|
|
|
2022-03-11 08:27:24 +01:00
|
|
|
if (providerEl)
|
|
|
|
collproviderstr = defGetString(providerEl);
|
2021-05-07 10:17:42 +02:00
|
|
|
|
2022-03-11 08:27:24 +01:00
|
|
|
if (deterministicEl)
|
|
|
|
collisdeterministic = defGetBoolean(deterministicEl);
|
2017-03-23 20:25:34 +01:00
|
|
|
else
|
2022-03-11 08:27:24 +01:00
|
|
|
collisdeterministic = true;
|
|
|
|
|
|
|
|
if (versionEl)
|
|
|
|
collversion = defGetString(versionEl);
|
|
|
|
|
|
|
|
if (collproviderstr)
|
|
|
|
{
|
|
|
|
if (pg_strcasecmp(collproviderstr, "icu") == 0)
|
|
|
|
collprovider = COLLPROVIDER_ICU;
|
|
|
|
else if (pg_strcasecmp(collproviderstr, "libc") == 0)
|
|
|
|
collprovider = COLLPROVIDER_LIBC;
|
|
|
|
else
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
|
|
|
|
errmsg("unrecognized collation provider: %s",
|
|
|
|
collproviderstr)));
|
|
|
|
}
|
|
|
|
else
|
|
|
|
collprovider = COLLPROVIDER_LIBC;
|
|
|
|
|
2022-03-17 11:11:21 +01:00
|
|
|
if (localeEl)
|
|
|
|
{
|
|
|
|
if (collprovider == COLLPROVIDER_LIBC)
|
|
|
|
{
|
|
|
|
collcollate = defGetString(localeEl);
|
|
|
|
collctype = defGetString(localeEl);
|
|
|
|
}
|
|
|
|
else
|
|
|
|
colliculocale = defGetString(localeEl);
|
|
|
|
}
|
2017-03-23 20:25:34 +01:00
|
|
|
|
2022-03-17 11:11:21 +01:00
|
|
|
if (lccollateEl)
|
|
|
|
collcollate = defGetString(lccollateEl);
|
|
|
|
|
|
|
|
if (lcctypeEl)
|
|
|
|
collctype = defGetString(lcctypeEl);
|
|
|
|
|
|
|
|
if (collprovider == COLLPROVIDER_LIBC)
|
|
|
|
{
|
|
|
|
if (!collcollate)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
|
|
|
|
errmsg("parameter \"lc_collate\" must be specified")));
|
|
|
|
|
|
|
|
if (!collctype)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
|
|
|
|
errmsg("parameter \"lc_ctype\" must be specified")));
|
|
|
|
}
|
|
|
|
else if (collprovider == COLLPROVIDER_ICU)
|
|
|
|
{
|
|
|
|
if (!colliculocale)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
|
|
|
|
errmsg("parameter \"locale\" must be specified")));
|
|
|
|
}
|
2011-02-12 14:54:13 +01:00
|
|
|
|
2022-03-11 08:27:24 +01:00
|
|
|
/*
|
|
|
|
* Nondeterministic collations are currently only supported with ICU
|
|
|
|
* because that's the only case where it can actually make a
|
|
|
|
* difference. So we can save writing the code for the other
|
|
|
|
* providers.
|
|
|
|
*/
|
|
|
|
if (!collisdeterministic && collprovider != COLLPROVIDER_ICU)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
|
|
|
errmsg("nondeterministic collations not supported with this provider")));
|
2019-03-22 12:09:32 +01:00
|
|
|
|
2017-06-30 14:50:39 +02:00
|
|
|
if (collprovider == COLLPROVIDER_ICU)
|
2021-09-03 22:38:55 +02:00
|
|
|
{
|
|
|
|
#ifdef USE_ICU
|
|
|
|
/*
|
|
|
|
* We could create ICU collations with collencoding == database
|
|
|
|
* encoding, but it seems better to use -1 so that it matches the
|
|
|
|
* way initdb would create ICU collations. However, only allow
|
|
|
|
* one to be created when the current database's encoding is
|
|
|
|
* supported. Otherwise the collation is useless, plus we get
|
|
|
|
* surprising behaviors like not being able to drop the collation.
|
|
|
|
*
|
|
|
|
* Skip this test when !USE_ICU, because the error we want to
|
|
|
|
* throw for that isn't thrown till later.
|
|
|
|
*/
|
|
|
|
if (!is_encoding_supported_by_icu(GetDatabaseEncoding()))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
|
|
|
errmsg("current database's encoding is not supported with this provider")));
|
|
|
|
#endif
|
2017-06-30 14:50:39 +02:00
|
|
|
collencoding = -1;
|
2021-09-03 22:38:55 +02:00
|
|
|
}
|
2017-06-30 14:50:39 +02:00
|
|
|
else
|
|
|
|
{
|
|
|
|
collencoding = GetDatabaseEncoding();
|
|
|
|
check_encoding_locale_matches(collencoding, collcollate, collctype);
|
|
|
|
}
|
2017-03-23 20:25:34 +01:00
|
|
|
}
|
|
|
|
|
2021-05-07 10:17:42 +02:00
|
|
|
if (!collversion)
|
2022-03-17 11:11:21 +01:00
|
|
|
collversion = get_collation_actual_version(collprovider, collprovider == COLLPROVIDER_ICU ? colliculocale : collcollate);
|
2021-05-07 10:17:42 +02:00
|
|
|
|
2011-03-04 21:14:37 +01:00
|
|
|
newoid = CollationCreate(collName,
|
2011-03-11 19:20:11 +01:00
|
|
|
collNamespace,
|
|
|
|
GetUserId(),
|
2017-03-23 20:25:34 +01:00
|
|
|
collprovider,
|
2019-03-22 12:09:32 +01:00
|
|
|
collisdeterministic,
|
2017-03-23 20:25:34 +01:00
|
|
|
collencoding,
|
2011-03-11 19:20:11 +01:00
|
|
|
collcollate,
|
2017-01-18 18:00:00 +01:00
|
|
|
collctype,
|
2022-03-17 11:11:21 +01:00
|
|
|
colliculocale,
|
2021-05-07 10:17:42 +02:00
|
|
|
collversion,
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
if_not_exists,
|
|
|
|
false); /* not quiet */
|
2017-01-18 18:00:00 +01:00
|
|
|
|
|
|
|
if (!OidIsValid(newoid))
|
|
|
|
return InvalidObjectAddress;
|
2011-03-04 21:14:37 +01:00
|
|
|
|
Allow creation of C/POSIX collations without depending on libc behavior.
Most of our collations code has special handling for the locale names
"C" and "POSIX", allowing those collations to be used whether or not
the system libraries think those locale names are valid, or indeed
whether said libraries even have any locale support. But we missed
handling things that way in CREATE COLLATION. This meant you couldn't
clone the C/POSIX collations, nor explicitly define a new collation
using those locale names, unless the libraries allow it. That's pretty
pointless, as well as being a violation of pg_newlocale_from_collation's
API specification.
The practical effect of this change is quite limited: it allows creating
such collations even on platforms that don't HAVE_LOCALE_T, and it allows
making "POSIX" collation objects on Windows, which before this would only
let you make "C" collation objects. Hence, even though this is a bug fix
IMO, it doesn't seem worth the trouble to back-patch.
In passing, suppress the DROP CASCADE detail messages at the end of the
collation regression test. I'm surprised we've never been bit by
message ordering issues there.
Per report from Murtuza Zabuawala.
Discussion: https://postgr.es/m/CAKKotZS-wcDcofXDCH=sidiuajE+nqHn2CGjLLX78anyDmi3gQ@mail.gmail.com
2017-08-01 19:51:05 +02:00
|
|
|
/*
|
|
|
|
* Check that the locales can be loaded. NB: pg_newlocale_from_collation
|
|
|
|
* is only supposed to be called on non-C-equivalent locales.
|
|
|
|
*/
|
2011-03-04 21:14:37 +01:00
|
|
|
CommandCounterIncrement();
|
Allow creation of C/POSIX collations without depending on libc behavior.
Most of our collations code has special handling for the locale names
"C" and "POSIX", allowing those collations to be used whether or not
the system libraries think those locale names are valid, or indeed
whether said libraries even have any locale support. But we missed
handling things that way in CREATE COLLATION. This meant you couldn't
clone the C/POSIX collations, nor explicitly define a new collation
using those locale names, unless the libraries allow it. That's pretty
pointless, as well as being a violation of pg_newlocale_from_collation's
API specification.
The practical effect of this change is quite limited: it allows creating
such collations even on platforms that don't HAVE_LOCALE_T, and it allows
making "POSIX" collation objects on Windows, which before this would only
let you make "C" collation objects. Hence, even though this is a bug fix
IMO, it doesn't seem worth the trouble to back-patch.
In passing, suppress the DROP CASCADE detail messages at the end of the
collation regression test. I'm surprised we've never been bit by
message ordering issues there.
Per report from Murtuza Zabuawala.
Discussion: https://postgr.es/m/CAKKotZS-wcDcofXDCH=sidiuajE+nqHn2CGjLLX78anyDmi3gQ@mail.gmail.com
2017-08-01 19:51:05 +02:00
|
|
|
if (!lc_collate_is_c(newoid) || !lc_ctype_is_c(newoid))
|
|
|
|
(void) pg_newlocale_from_collation(newoid);
|
|
|
|
|
|
|
|
ObjectAddressSet(address, CollationRelationId, newoid);
|
2012-12-24 00:25:03 +01:00
|
|
|
|
Change many routines to return ObjectAddress rather than OID
The changed routines are mostly those that can be directly called by
ProcessUtilitySlow; the intention is to make the affected object
information more precise, in support for future event trigger changes.
Originally it was envisioned that the OID of the affected object would
be enough, and in most cases that is correct, but upon actually
implementing the event trigger changes it turned out that ObjectAddress
is more widely useful.
Additionally, some command execution routines grew an output argument
that's an object address which provides further info about the executed
command. To wit:
* for ALTER DOMAIN / ADD CONSTRAINT, it corresponds to the address of
the new constraint
* for ALTER OBJECT / SET SCHEMA, it corresponds to the address of the
schema that originally contained the object.
* for ALTER EXTENSION {ADD, DROP} OBJECT, it corresponds to the address
of the object added to or dropped from the extension.
There's no user-visible change in this commit, and no functional change
either.
Discussion: 20150218213255.GC6717@tamriel.snowman.net
Reviewed-By: Stephen Frost, Andres Freund
2015-03-03 18:10:50 +01:00
|
|
|
return address;
|
2011-02-12 14:54:13 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
Rework order of checks in ALTER / SET SCHEMA
When attempting to move an object into the schema in which it already
was, for most objects classes we were correctly complaining about
exactly that ("object is already in schema"); but for some other object
classes, such as functions, we were instead complaining of a name
collision ("object already exists in schema"). The latter is wrong and
misleading, per complaint from Robert Haas in
CA+TgmoZ0+gNf7RDKRc3u5rHXffP=QjqPZKGxb4BsPz65k7qnHQ@mail.gmail.com
To fix, refactor the way these checks are done. As a bonus, the
resulting code is smaller and can also share some code with Rename
cases.
While at it, remove use of getObjectDescriptionOids() in error messages.
These are normally disallowed because of translatability considerations,
but this one had slipped through since 9.1. (Not sure that this is
worth backpatching, though, as it would create some untranslated
messages in back branches.)
This is loosely based on a patch by KaiGai Kohei, heavily reworked by
me.
2013-01-15 17:23:43 +01:00
|
|
|
* Subroutine for ALTER COLLATION SET SCHEMA and RENAME
|
|
|
|
*
|
|
|
|
* Is there a collation with the same name of the given collation already in
|
|
|
|
* the given namespace? If so, raise an appropriate error message.
|
2011-02-12 14:54:13 +01:00
|
|
|
*/
|
Rework order of checks in ALTER / SET SCHEMA
When attempting to move an object into the schema in which it already
was, for most objects classes we were correctly complaining about
exactly that ("object is already in schema"); but for some other object
classes, such as functions, we were instead complaining of a name
collision ("object already exists in schema"). The latter is wrong and
misleading, per complaint from Robert Haas in
CA+TgmoZ0+gNf7RDKRc3u5rHXffP=QjqPZKGxb4BsPz65k7qnHQ@mail.gmail.com
To fix, refactor the way these checks are done. As a bonus, the
resulting code is smaller and can also share some code with Rename
cases.
While at it, remove use of getObjectDescriptionOids() in error messages.
These are normally disallowed because of translatability considerations,
but this one had slipped through since 9.1. (Not sure that this is
worth backpatching, though, as it would create some untranslated
messages in back branches.)
This is loosely based on a patch by KaiGai Kohei, heavily reworked by
me.
2013-01-15 17:23:43 +01:00
|
|
|
void
|
|
|
|
IsThereCollationInNamespace(const char *collname, Oid nspOid)
|
2011-02-12 14:54:13 +01:00
|
|
|
{
|
2011-03-11 19:20:11 +01:00
|
|
|
/* make sure the name doesn't already exist in new schema */
|
|
|
|
if (SearchSysCacheExists3(COLLNAMEENCNSP,
|
Rework order of checks in ALTER / SET SCHEMA
When attempting to move an object into the schema in which it already
was, for most objects classes we were correctly complaining about
exactly that ("object is already in schema"); but for some other object
classes, such as functions, we were instead complaining of a name
collision ("object already exists in schema"). The latter is wrong and
misleading, per complaint from Robert Haas in
CA+TgmoZ0+gNf7RDKRc3u5rHXffP=QjqPZKGxb4BsPz65k7qnHQ@mail.gmail.com
To fix, refactor the way these checks are done. As a bonus, the
resulting code is smaller and can also share some code with Rename
cases.
While at it, remove use of getObjectDescriptionOids() in error messages.
These are normally disallowed because of translatability considerations,
but this one had slipped through since 9.1. (Not sure that this is
worth backpatching, though, as it would create some untranslated
messages in back branches.)
This is loosely based on a patch by KaiGai Kohei, heavily reworked by
me.
2013-01-15 17:23:43 +01:00
|
|
|
CStringGetDatum(collname),
|
2011-03-11 19:20:11 +01:00
|
|
|
Int32GetDatum(GetDatabaseEncoding()),
|
Rework order of checks in ALTER / SET SCHEMA
When attempting to move an object into the schema in which it already
was, for most objects classes we were correctly complaining about
exactly that ("object is already in schema"); but for some other object
classes, such as functions, we were instead complaining of a name
collision ("object already exists in schema"). The latter is wrong and
misleading, per complaint from Robert Haas in
CA+TgmoZ0+gNf7RDKRc3u5rHXffP=QjqPZKGxb4BsPz65k7qnHQ@mail.gmail.com
To fix, refactor the way these checks are done. As a bonus, the
resulting code is smaller and can also share some code with Rename
cases.
While at it, remove use of getObjectDescriptionOids() in error messages.
These are normally disallowed because of translatability considerations,
but this one had slipped through since 9.1. (Not sure that this is
worth backpatching, though, as it would create some untranslated
messages in back branches.)
This is loosely based on a patch by KaiGai Kohei, heavily reworked by
me.
2013-01-15 17:23:43 +01:00
|
|
|
ObjectIdGetDatum(nspOid)))
|
2011-03-11 19:20:11 +01:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_DUPLICATE_OBJECT),
|
|
|
|
errmsg("collation \"%s\" for encoding \"%s\" already exists in schema \"%s\"",
|
Rework order of checks in ALTER / SET SCHEMA
When attempting to move an object into the schema in which it already
was, for most objects classes we were correctly complaining about
exactly that ("object is already in schema"); but for some other object
classes, such as functions, we were instead complaining of a name
collision ("object already exists in schema"). The latter is wrong and
misleading, per complaint from Robert Haas in
CA+TgmoZ0+gNf7RDKRc3u5rHXffP=QjqPZKGxb4BsPz65k7qnHQ@mail.gmail.com
To fix, refactor the way these checks are done. As a bonus, the
resulting code is smaller and can also share some code with Rename
cases.
While at it, remove use of getObjectDescriptionOids() in error messages.
These are normally disallowed because of translatability considerations,
but this one had slipped through since 9.1. (Not sure that this is
worth backpatching, though, as it would create some untranslated
messages in back branches.)
This is loosely based on a patch by KaiGai Kohei, heavily reworked by
me.
2013-01-15 17:23:43 +01:00
|
|
|
collname, GetDatabaseEncodingName(),
|
|
|
|
get_namespace_name(nspOid))));
|
2011-03-11 19:20:11 +01:00
|
|
|
|
|
|
|
/* mustn't match an any-encoding entry, either */
|
|
|
|
if (SearchSysCacheExists3(COLLNAMEENCNSP,
|
Rework order of checks in ALTER / SET SCHEMA
When attempting to move an object into the schema in which it already
was, for most objects classes we were correctly complaining about
exactly that ("object is already in schema"); but for some other object
classes, such as functions, we were instead complaining of a name
collision ("object already exists in schema"). The latter is wrong and
misleading, per complaint from Robert Haas in
CA+TgmoZ0+gNf7RDKRc3u5rHXffP=QjqPZKGxb4BsPz65k7qnHQ@mail.gmail.com
To fix, refactor the way these checks are done. As a bonus, the
resulting code is smaller and can also share some code with Rename
cases.
While at it, remove use of getObjectDescriptionOids() in error messages.
These are normally disallowed because of translatability considerations,
but this one had slipped through since 9.1. (Not sure that this is
worth backpatching, though, as it would create some untranslated
messages in back branches.)
This is loosely based on a patch by KaiGai Kohei, heavily reworked by
me.
2013-01-15 17:23:43 +01:00
|
|
|
CStringGetDatum(collname),
|
2011-03-11 19:20:11 +01:00
|
|
|
Int32GetDatum(-1),
|
Rework order of checks in ALTER / SET SCHEMA
When attempting to move an object into the schema in which it already
was, for most objects classes we were correctly complaining about
exactly that ("object is already in schema"); but for some other object
classes, such as functions, we were instead complaining of a name
collision ("object already exists in schema"). The latter is wrong and
misleading, per complaint from Robert Haas in
CA+TgmoZ0+gNf7RDKRc3u5rHXffP=QjqPZKGxb4BsPz65k7qnHQ@mail.gmail.com
To fix, refactor the way these checks are done. As a bonus, the
resulting code is smaller and can also share some code with Rename
cases.
While at it, remove use of getObjectDescriptionOids() in error messages.
These are normally disallowed because of translatability considerations,
but this one had slipped through since 9.1. (Not sure that this is
worth backpatching, though, as it would create some untranslated
messages in back branches.)
This is loosely based on a patch by KaiGai Kohei, heavily reworked by
me.
2013-01-15 17:23:43 +01:00
|
|
|
ObjectIdGetDatum(nspOid)))
|
2011-03-11 19:20:11 +01:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_DUPLICATE_OBJECT),
|
|
|
|
errmsg("collation \"%s\" already exists in schema \"%s\"",
|
Rework order of checks in ALTER / SET SCHEMA
When attempting to move an object into the schema in which it already
was, for most objects classes we were correctly complaining about
exactly that ("object is already in schema"); but for some other object
classes, such as functions, we were instead complaining of a name
collision ("object already exists in schema"). The latter is wrong and
misleading, per complaint from Robert Haas in
CA+TgmoZ0+gNf7RDKRc3u5rHXffP=QjqPZKGxb4BsPz65k7qnHQ@mail.gmail.com
To fix, refactor the way these checks are done. As a bonus, the
resulting code is smaller and can also share some code with Rename
cases.
While at it, remove use of getObjectDescriptionOids() in error messages.
These are normally disallowed because of translatability considerations,
but this one had slipped through since 9.1. (Not sure that this is
worth backpatching, though, as it would create some untranslated
messages in back branches.)
This is loosely based on a patch by KaiGai Kohei, heavily reworked by
me.
2013-01-15 17:23:43 +01:00
|
|
|
collname, get_namespace_name(nspOid))));
|
2011-02-12 14:54:13 +01:00
|
|
|
}
|
2017-01-18 18:00:00 +01:00
|
|
|
|
2021-05-07 10:17:42 +02:00
|
|
|
/*
|
|
|
|
* ALTER COLLATION
|
|
|
|
*/
|
|
|
|
ObjectAddress
|
|
|
|
AlterCollation(AlterCollationStmt *stmt)
|
|
|
|
{
|
|
|
|
Relation rel;
|
|
|
|
Oid collOid;
|
|
|
|
HeapTuple tup;
|
|
|
|
Form_pg_collation collForm;
|
2022-01-27 08:44:31 +01:00
|
|
|
Datum datum;
|
2021-05-07 10:17:42 +02:00
|
|
|
bool isnull;
|
|
|
|
char *oldversion;
|
|
|
|
char *newversion;
|
|
|
|
ObjectAddress address;
|
|
|
|
|
|
|
|
rel = table_open(CollationRelationId, RowExclusiveLock);
|
|
|
|
collOid = get_collation_oid(stmt->collname, false);
|
|
|
|
|
2022-10-29 23:13:23 +02:00
|
|
|
if (collOid == DEFAULT_COLLATION_OID)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errmsg("cannot refresh version of default collation"),
|
|
|
|
errhint("Use ALTER DATABASE ... REFRESH COLLATION VERSION instead.")));
|
|
|
|
|
2022-11-13 08:11:17 +01:00
|
|
|
if (!object_ownercheck(CollationRelationId, collOid, GetUserId()))
|
2021-05-07 10:17:42 +02:00
|
|
|
aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_COLLATION,
|
|
|
|
NameListToString(stmt->collname));
|
|
|
|
|
|
|
|
tup = SearchSysCacheCopy1(COLLOID, ObjectIdGetDatum(collOid));
|
|
|
|
if (!HeapTupleIsValid(tup))
|
|
|
|
elog(ERROR, "cache lookup failed for collation %u", collOid);
|
|
|
|
|
|
|
|
collForm = (Form_pg_collation) GETSTRUCT(tup);
|
2022-01-27 08:44:31 +01:00
|
|
|
datum = SysCacheGetAttr(COLLOID, tup, Anum_pg_collation_collversion, &isnull);
|
|
|
|
oldversion = isnull ? NULL : TextDatumGetCString(datum);
|
2021-05-07 10:17:42 +02:00
|
|
|
|
2022-03-17 11:11:21 +01:00
|
|
|
datum = SysCacheGetAttr(COLLOID, tup, collForm->collprovider == COLLPROVIDER_ICU ? Anum_pg_collation_colliculocale : Anum_pg_collation_collcollate, &isnull);
|
|
|
|
if (isnull)
|
|
|
|
elog(ERROR, "unexpected null in pg_collation");
|
2022-01-27 08:44:31 +01:00
|
|
|
newversion = get_collation_actual_version(collForm->collprovider, TextDatumGetCString(datum));
|
2021-05-07 10:17:42 +02:00
|
|
|
|
|
|
|
/* cannot change from NULL to non-NULL or vice versa */
|
|
|
|
if ((!oldversion && newversion) || (oldversion && !newversion))
|
|
|
|
elog(ERROR, "invalid collation version change");
|
|
|
|
else if (oldversion && newversion && strcmp(newversion, oldversion) != 0)
|
|
|
|
{
|
|
|
|
bool nulls[Natts_pg_collation];
|
|
|
|
bool replaces[Natts_pg_collation];
|
|
|
|
Datum values[Natts_pg_collation];
|
|
|
|
|
|
|
|
ereport(NOTICE,
|
|
|
|
(errmsg("changing version from %s to %s",
|
|
|
|
oldversion, newversion)));
|
|
|
|
|
|
|
|
memset(values, 0, sizeof(values));
|
|
|
|
memset(nulls, false, sizeof(nulls));
|
|
|
|
memset(replaces, false, sizeof(replaces));
|
|
|
|
|
|
|
|
values[Anum_pg_collation_collversion - 1] = CStringGetTextDatum(newversion);
|
|
|
|
replaces[Anum_pg_collation_collversion - 1] = true;
|
|
|
|
|
|
|
|
tup = heap_modify_tuple(tup, RelationGetDescr(rel),
|
|
|
|
values, nulls, replaces);
|
|
|
|
}
|
|
|
|
else
|
|
|
|
ereport(NOTICE,
|
|
|
|
(errmsg("version has not changed")));
|
|
|
|
|
|
|
|
CatalogTupleUpdate(rel, &tup->t_self, tup);
|
|
|
|
|
|
|
|
InvokeObjectPostAlterHook(CollationRelationId, collOid, 0);
|
|
|
|
|
|
|
|
ObjectAddressSet(address, CollationRelationId, collOid);
|
|
|
|
|
|
|
|
heap_freetuple(tup);
|
|
|
|
table_close(rel, NoLock);
|
|
|
|
|
|
|
|
return address;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2017-03-23 20:25:34 +01:00
|
|
|
Datum
|
2021-02-26 03:29:27 +01:00
|
|
|
pg_collation_actual_version(PG_FUNCTION_ARGS)
|
2017-03-23 20:25:34 +01:00
|
|
|
{
|
2022-10-29 22:30:15 +02:00
|
|
|
Oid collid = PG_GETARG_OID(0);
|
|
|
|
char provider;
|
|
|
|
char *locale;
|
|
|
|
char *version;
|
|
|
|
Datum datum;
|
|
|
|
bool isnull;
|
|
|
|
|
|
|
|
if (collid == DEFAULT_COLLATION_OID)
|
|
|
|
{
|
|
|
|
/* retrieve from pg_database */
|
2017-03-23 20:25:34 +01:00
|
|
|
|
2022-10-29 22:30:15 +02:00
|
|
|
HeapTuple dbtup = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
|
|
|
|
if (!HeapTupleIsValid(dbtup))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_UNDEFINED_OBJECT),
|
|
|
|
errmsg("database with OID %u does not exist", MyDatabaseId)));
|
2021-05-07 10:17:42 +02:00
|
|
|
|
2022-10-29 22:30:15 +02:00
|
|
|
provider = ((Form_pg_database) GETSTRUCT(dbtup))->datlocprovider;
|
2021-05-07 10:17:42 +02:00
|
|
|
|
2022-10-29 22:30:15 +02:00
|
|
|
datum = SysCacheGetAttr(DATABASEOID, dbtup,
|
|
|
|
provider == COLLPROVIDER_ICU ?
|
|
|
|
Anum_pg_database_daticulocale : Anum_pg_database_datcollate,
|
|
|
|
&isnull);
|
2022-03-17 11:11:21 +01:00
|
|
|
if (isnull)
|
2022-10-29 22:30:15 +02:00
|
|
|
elog(ERROR, "unexpected null in pg_database");
|
|
|
|
|
|
|
|
locale = TextDatumGetCString(datum);
|
|
|
|
|
|
|
|
ReleaseSysCache(dbtup);
|
2022-03-17 11:11:21 +01:00
|
|
|
}
|
|
|
|
else
|
2022-10-29 22:30:15 +02:00
|
|
|
{
|
|
|
|
/* retrieve from pg_collation */
|
|
|
|
|
|
|
|
HeapTuple colltp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
|
|
|
|
if (!HeapTupleIsValid(colltp))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_UNDEFINED_OBJECT),
|
|
|
|
errmsg("collation with OID %u does not exist", collid)));
|
|
|
|
|
|
|
|
provider = ((Form_pg_collation) GETSTRUCT(colltp))->collprovider;
|
|
|
|
Assert(provider != COLLPROVIDER_DEFAULT);
|
|
|
|
datum = SysCacheGetAttr(COLLOID, colltp,
|
|
|
|
provider == COLLPROVIDER_ICU ?
|
|
|
|
Anum_pg_collation_colliculocale : Anum_pg_collation_collcollate,
|
|
|
|
&isnull);
|
|
|
|
if (isnull)
|
|
|
|
elog(ERROR, "unexpected null in pg_collation");
|
2021-05-07 10:17:42 +02:00
|
|
|
|
2022-10-29 22:30:15 +02:00
|
|
|
locale = TextDatumGetCString(datum);
|
|
|
|
|
|
|
|
ReleaseSysCache(colltp);
|
|
|
|
}
|
2017-03-23 20:25:34 +01:00
|
|
|
|
2022-10-29 22:30:15 +02:00
|
|
|
version = get_collation_actual_version(provider, locale);
|
2017-03-23 20:25:34 +01:00
|
|
|
if (version)
|
|
|
|
PG_RETURN_TEXT_P(cstring_to_text(version));
|
|
|
|
else
|
|
|
|
PG_RETURN_NULL();
|
|
|
|
}
|
|
|
|
|
2017-01-18 18:00:00 +01:00
|
|
|
|
2017-07-08 18:42:25 +02:00
|
|
|
/* will we use "locale -a" in pg_import_system_collations? */
|
|
|
|
#if defined(HAVE_LOCALE_T) && !defined(WIN32)
|
|
|
|
#define READ_LOCALE_A_OUTPUT
|
|
|
|
#endif
|
|
|
|
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
#ifdef READ_LOCALE_A_OUTPUT
|
2017-01-18 18:00:00 +01:00
|
|
|
/*
|
2017-03-23 20:25:34 +01:00
|
|
|
* "Normalize" a libc locale name, stripping off encoding tags such as
|
2017-01-18 18:00:00 +01:00
|
|
|
* ".utf8" (e.g., "en_US.utf8" -> "en_US", but "br_FR.iso885915@euro"
|
|
|
|
* -> "br_FR@euro"). Return true if a new, different name was
|
|
|
|
* generated.
|
|
|
|
*/
|
|
|
|
static bool
|
2017-03-23 20:25:34 +01:00
|
|
|
normalize_libc_locale_name(char *new, const char *old)
|
2017-01-18 18:00:00 +01:00
|
|
|
{
|
|
|
|
char *n = new;
|
|
|
|
const char *o = old;
|
|
|
|
bool changed = false;
|
|
|
|
|
|
|
|
while (*o)
|
|
|
|
{
|
|
|
|
if (*o == '.')
|
|
|
|
{
|
|
|
|
/* skip over encoding tag such as ".utf8" or ".UTF-8" */
|
|
|
|
o++;
|
|
|
|
while ((*o >= 'A' && *o <= 'Z')
|
|
|
|
|| (*o >= 'a' && *o <= 'z')
|
|
|
|
|| (*o >= '0' && *o <= '9')
|
|
|
|
|| (*o == '-'))
|
|
|
|
o++;
|
|
|
|
changed = true;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
*n++ = *o++;
|
|
|
|
}
|
|
|
|
*n = '\0';
|
|
|
|
|
|
|
|
return changed;
|
|
|
|
}
|
|
|
|
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
/*
|
|
|
|
* qsort comparator for CollAliasData items
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
cmpaliases(const void *a, const void *b)
|
|
|
|
{
|
|
|
|
const CollAliasData *ca = (const CollAliasData *) a;
|
|
|
|
const CollAliasData *cb = (const CollAliasData *) b;
|
|
|
|
|
|
|
|
/* comparing localename is enough because other fields are derived */
|
|
|
|
return strcmp(ca->localename, cb->localename);
|
|
|
|
}
|
|
|
|
#endif /* READ_LOCALE_A_OUTPUT */
|
|
|
|
|
2017-01-18 18:00:00 +01:00
|
|
|
|
2017-03-23 20:25:34 +01:00
|
|
|
#ifdef USE_ICU
|
Fix memory leakage in ICU encoding conversion, and other code review.
Callers of icu_to_uchar() neglected to pfree the result string when done
with it. This results in catastrophic memory leaks in varstr_cmp(),
because of our prevailing assumption that btree comparison functions don't
leak memory. For safety, make all the call sites clean up leaks, though
I suspect that we could get away without it in formatting.c. I audited
callers of icu_from_uchar() as well, but found no places that seemed to
have a comparable issue.
Add function API specifications for icu_to_uchar() and icu_from_uchar();
the lack of any thought-through specification is perhaps not unrelated
to the existence of this bug in the first place. Fix icu_to_uchar()
to guarantee a nul-terminated result; although no existing caller appears
to care, the fact that it would have been nul-terminated except in
extreme corner cases seems ideally designed to bite someone on the rear
someday. Fix ucnv_fromUChars() destCapacity argument --- in the worst
case, that could perhaps have led to a non-nul-terminated result, too.
Fix icu_from_uchar() to have a more reasonable definition of the function
result --- no callers are actually paying attention, so this isn't a live
bug, but it's certainly sloppily designed. Const-ify icu_from_uchar()'s
input string for consistency.
That is not the end of what needs to be done to these functions, but
it's as much as I have the patience for right now.
Discussion: https://postgr.es/m/1955.1498181798@sss.pgh.pa.us
2017-06-23 18:22:06 +02:00
|
|
|
/*
|
|
|
|
* Get the ICU language tag for a locale name.
|
|
|
|
* The result is a palloc'd string.
|
|
|
|
*/
|
2017-03-23 20:25:34 +01:00
|
|
|
static char *
|
|
|
|
get_icu_language_tag(const char *localename)
|
|
|
|
{
|
|
|
|
char buf[ULOC_FULLNAME_CAPACITY];
|
|
|
|
UErrorCode status;
|
|
|
|
|
|
|
|
status = U_ZERO_ERROR;
|
2020-11-16 21:16:39 +01:00
|
|
|
uloc_toLanguageTag(localename, buf, sizeof(buf), true, &status);
|
2017-03-23 20:25:34 +01:00
|
|
|
if (U_FAILURE(status))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errmsg("could not convert locale name \"%s\" to language tag: %s",
|
|
|
|
localename, u_errorName(status))));
|
|
|
|
|
|
|
|
return pstrdup(buf);
|
|
|
|
}
|
|
|
|
|
Fix memory leakage in ICU encoding conversion, and other code review.
Callers of icu_to_uchar() neglected to pfree the result string when done
with it. This results in catastrophic memory leaks in varstr_cmp(),
because of our prevailing assumption that btree comparison functions don't
leak memory. For safety, make all the call sites clean up leaks, though
I suspect that we could get away without it in formatting.c. I audited
callers of icu_from_uchar() as well, but found no places that seemed to
have a comparable issue.
Add function API specifications for icu_to_uchar() and icu_from_uchar();
the lack of any thought-through specification is perhaps not unrelated
to the existence of this bug in the first place. Fix icu_to_uchar()
to guarantee a nul-terminated result; although no existing caller appears
to care, the fact that it would have been nul-terminated except in
extreme corner cases seems ideally designed to bite someone on the rear
someday. Fix ucnv_fromUChars() destCapacity argument --- in the worst
case, that could perhaps have led to a non-nul-terminated result, too.
Fix icu_from_uchar() to have a more reasonable definition of the function
result --- no callers are actually paying attention, so this isn't a live
bug, but it's certainly sloppily designed. Const-ify icu_from_uchar()'s
input string for consistency.
That is not the end of what needs to be done to these functions, but
it's as much as I have the patience for right now.
Discussion: https://postgr.es/m/1955.1498181798@sss.pgh.pa.us
2017-06-23 18:22:06 +02:00
|
|
|
/*
|
|
|
|
* Get a comment (specifically, the display name) for an ICU locale.
|
Further hacking on ICU collation creation and usage.
pg_import_system_collations() refused to create any ICU collations if
the current database's encoding didn't support ICU. This is wrongheaded:
initdb must initialize pg_collation in an encoding-independent way
since it might be used in other databases with different encodings.
The reason for the restriction seems to be that get_icu_locale_comment()
used icu_from_uchar() to convert the UChar-format display name, and that
unsurprisingly doesn't know what to do in unsupported encodings.
But by the same token that the initial catalog contents must be
encoding-independent, we can't allow non-ASCII characters in the comment
strings. So we don't really need icu_from_uchar() here: just check for
Unicode codes outside the ASCII range, and if there are none, the format
conversion is trivial. If there are some, we can simply not install the
comment. (In my testing, this affects only Norwegian Bokmål, which has
given us trouble before.)
For paranoia's sake, also check for non-ASCII characters in ICU locale
names, and skip such locales, as we do for libc locales. I don't
currently have a reason to believe that this will ever reject anything,
but then again the libc maintainers should have known better too.
With just the import changes, ICU collations can be found in pg_collation
in databases with unsupported encodings. This resulted in more or less
clean failures at runtime, but that's not how things act for unsupported
encodings with libc collations. Make it work the same as our traditional
behavior for libc collations by having collation lookup take into account
whether is_encoding_supported_by_icu().
Adjust documentation to match. Also, expand Table 23.1 to show which
encodings are supported by ICU.
catversion bump because of likely change in pg_collation/pg_description
initial contents in ICU-enabled builds.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-24 19:54:15 +02:00
|
|
|
* The result is a palloc'd string, or NULL if we can't get a comment
|
|
|
|
* or find that it's not all ASCII. (We can *not* accept non-ASCII
|
|
|
|
* comments, because the contents of template0 must be encoding-agnostic.)
|
Fix memory leakage in ICU encoding conversion, and other code review.
Callers of icu_to_uchar() neglected to pfree the result string when done
with it. This results in catastrophic memory leaks in varstr_cmp(),
because of our prevailing assumption that btree comparison functions don't
leak memory. For safety, make all the call sites clean up leaks, though
I suspect that we could get away without it in formatting.c. I audited
callers of icu_from_uchar() as well, but found no places that seemed to
have a comparable issue.
Add function API specifications for icu_to_uchar() and icu_from_uchar();
the lack of any thought-through specification is perhaps not unrelated
to the existence of this bug in the first place. Fix icu_to_uchar()
to guarantee a nul-terminated result; although no existing caller appears
to care, the fact that it would have been nul-terminated except in
extreme corner cases seems ideally designed to bite someone on the rear
someday. Fix ucnv_fromUChars() destCapacity argument --- in the worst
case, that could perhaps have led to a non-nul-terminated result, too.
Fix icu_from_uchar() to have a more reasonable definition of the function
result --- no callers are actually paying attention, so this isn't a live
bug, but it's certainly sloppily designed. Const-ify icu_from_uchar()'s
input string for consistency.
That is not the end of what needs to be done to these functions, but
it's as much as I have the patience for right now.
Discussion: https://postgr.es/m/1955.1498181798@sss.pgh.pa.us
2017-06-23 18:22:06 +02:00
|
|
|
*/
|
2017-03-23 20:25:34 +01:00
|
|
|
static char *
|
|
|
|
get_icu_locale_comment(const char *localename)
|
|
|
|
{
|
|
|
|
UErrorCode status;
|
|
|
|
UChar displayname[128];
|
|
|
|
int32 len_uchar;
|
Further hacking on ICU collation creation and usage.
pg_import_system_collations() refused to create any ICU collations if
the current database's encoding didn't support ICU. This is wrongheaded:
initdb must initialize pg_collation in an encoding-independent way
since it might be used in other databases with different encodings.
The reason for the restriction seems to be that get_icu_locale_comment()
used icu_from_uchar() to convert the UChar-format display name, and that
unsurprisingly doesn't know what to do in unsupported encodings.
But by the same token that the initial catalog contents must be
encoding-independent, we can't allow non-ASCII characters in the comment
strings. So we don't really need icu_from_uchar() here: just check for
Unicode codes outside the ASCII range, and if there are none, the format
conversion is trivial. If there are some, we can simply not install the
comment. (In my testing, this affects only Norwegian Bokmål, which has
given us trouble before.)
For paranoia's sake, also check for non-ASCII characters in ICU locale
names, and skip such locales, as we do for libc locales. I don't
currently have a reason to believe that this will ever reject anything,
but then again the libc maintainers should have known better too.
With just the import changes, ICU collations can be found in pg_collation
in databases with unsupported encodings. This resulted in more or less
clean failures at runtime, but that's not how things act for unsupported
encodings with libc collations. Make it work the same as our traditional
behavior for libc collations by having collation lookup take into account
whether is_encoding_supported_by_icu().
Adjust documentation to match. Also, expand Table 23.1 to show which
encodings are supported by ICU.
catversion bump because of likely change in pg_collation/pg_description
initial contents in ICU-enabled builds.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-24 19:54:15 +02:00
|
|
|
int32 i;
|
2017-03-23 20:25:34 +01:00
|
|
|
char *result;
|
|
|
|
|
|
|
|
status = U_ZERO_ERROR;
|
Fix memory leakage in ICU encoding conversion, and other code review.
Callers of icu_to_uchar() neglected to pfree the result string when done
with it. This results in catastrophic memory leaks in varstr_cmp(),
because of our prevailing assumption that btree comparison functions don't
leak memory. For safety, make all the call sites clean up leaks, though
I suspect that we could get away without it in formatting.c. I audited
callers of icu_from_uchar() as well, but found no places that seemed to
have a comparable issue.
Add function API specifications for icu_to_uchar() and icu_from_uchar();
the lack of any thought-through specification is perhaps not unrelated
to the existence of this bug in the first place. Fix icu_to_uchar()
to guarantee a nul-terminated result; although no existing caller appears
to care, the fact that it would have been nul-terminated except in
extreme corner cases seems ideally designed to bite someone on the rear
someday. Fix ucnv_fromUChars() destCapacity argument --- in the worst
case, that could perhaps have led to a non-nul-terminated result, too.
Fix icu_from_uchar() to have a more reasonable definition of the function
result --- no callers are actually paying attention, so this isn't a live
bug, but it's certainly sloppily designed. Const-ify icu_from_uchar()'s
input string for consistency.
That is not the end of what needs to be done to these functions, but
it's as much as I have the patience for right now.
Discussion: https://postgr.es/m/1955.1498181798@sss.pgh.pa.us
2017-06-23 18:22:06 +02:00
|
|
|
len_uchar = uloc_getDisplayName(localename, "en",
|
2017-06-23 22:00:45 +02:00
|
|
|
displayname, lengthof(displayname),
|
Fix memory leakage in ICU encoding conversion, and other code review.
Callers of icu_to_uchar() neglected to pfree the result string when done
with it. This results in catastrophic memory leaks in varstr_cmp(),
because of our prevailing assumption that btree comparison functions don't
leak memory. For safety, make all the call sites clean up leaks, though
I suspect that we could get away without it in formatting.c. I audited
callers of icu_from_uchar() as well, but found no places that seemed to
have a comparable issue.
Add function API specifications for icu_to_uchar() and icu_from_uchar();
the lack of any thought-through specification is perhaps not unrelated
to the existence of this bug in the first place. Fix icu_to_uchar()
to guarantee a nul-terminated result; although no existing caller appears
to care, the fact that it would have been nul-terminated except in
extreme corner cases seems ideally designed to bite someone on the rear
someday. Fix ucnv_fromUChars() destCapacity argument --- in the worst
case, that could perhaps have led to a non-nul-terminated result, too.
Fix icu_from_uchar() to have a more reasonable definition of the function
result --- no callers are actually paying attention, so this isn't a live
bug, but it's certainly sloppily designed. Const-ify icu_from_uchar()'s
input string for consistency.
That is not the end of what needs to be done to these functions, but
it's as much as I have the patience for right now.
Discussion: https://postgr.es/m/1955.1498181798@sss.pgh.pa.us
2017-06-23 18:22:06 +02:00
|
|
|
&status);
|
2017-03-23 20:25:34 +01:00
|
|
|
if (U_FAILURE(status))
|
Further hacking on ICU collation creation and usage.
pg_import_system_collations() refused to create any ICU collations if
the current database's encoding didn't support ICU. This is wrongheaded:
initdb must initialize pg_collation in an encoding-independent way
since it might be used in other databases with different encodings.
The reason for the restriction seems to be that get_icu_locale_comment()
used icu_from_uchar() to convert the UChar-format display name, and that
unsurprisingly doesn't know what to do in unsupported encodings.
But by the same token that the initial catalog contents must be
encoding-independent, we can't allow non-ASCII characters in the comment
strings. So we don't really need icu_from_uchar() here: just check for
Unicode codes outside the ASCII range, and if there are none, the format
conversion is trivial. If there are some, we can simply not install the
comment. (In my testing, this affects only Norwegian Bokmål, which has
given us trouble before.)
For paranoia's sake, also check for non-ASCII characters in ICU locale
names, and skip such locales, as we do for libc locales. I don't
currently have a reason to believe that this will ever reject anything,
but then again the libc maintainers should have known better too.
With just the import changes, ICU collations can be found in pg_collation
in databases with unsupported encodings. This resulted in more or less
clean failures at runtime, but that's not how things act for unsupported
encodings with libc collations. Make it work the same as our traditional
behavior for libc collations by having collation lookup take into account
whether is_encoding_supported_by_icu().
Adjust documentation to match. Also, expand Table 23.1 to show which
encodings are supported by ICU.
catversion bump because of likely change in pg_collation/pg_description
initial contents in ICU-enabled builds.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-24 19:54:15 +02:00
|
|
|
return NULL; /* no good reason to raise an error */
|
|
|
|
|
2020-12-21 01:37:11 +01:00
|
|
|
/* Check for non-ASCII comment (can't use pg_is_ascii for this) */
|
Further hacking on ICU collation creation and usage.
pg_import_system_collations() refused to create any ICU collations if
the current database's encoding didn't support ICU. This is wrongheaded:
initdb must initialize pg_collation in an encoding-independent way
since it might be used in other databases with different encodings.
The reason for the restriction seems to be that get_icu_locale_comment()
used icu_from_uchar() to convert the UChar-format display name, and that
unsurprisingly doesn't know what to do in unsupported encodings.
But by the same token that the initial catalog contents must be
encoding-independent, we can't allow non-ASCII characters in the comment
strings. So we don't really need icu_from_uchar() here: just check for
Unicode codes outside the ASCII range, and if there are none, the format
conversion is trivial. If there are some, we can simply not install the
comment. (In my testing, this affects only Norwegian Bokmål, which has
given us trouble before.)
For paranoia's sake, also check for non-ASCII characters in ICU locale
names, and skip such locales, as we do for libc locales. I don't
currently have a reason to believe that this will ever reject anything,
but then again the libc maintainers should have known better too.
With just the import changes, ICU collations can be found in pg_collation
in databases with unsupported encodings. This resulted in more or less
clean failures at runtime, but that's not how things act for unsupported
encodings with libc collations. Make it work the same as our traditional
behavior for libc collations by having collation lookup take into account
whether is_encoding_supported_by_icu().
Adjust documentation to match. Also, expand Table 23.1 to show which
encodings are supported by ICU.
catversion bump because of likely change in pg_collation/pg_description
initial contents in ICU-enabled builds.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-24 19:54:15 +02:00
|
|
|
for (i = 0; i < len_uchar; i++)
|
|
|
|
{
|
|
|
|
if (displayname[i] > 127)
|
|
|
|
return NULL;
|
|
|
|
}
|
2017-03-23 20:25:34 +01:00
|
|
|
|
Further hacking on ICU collation creation and usage.
pg_import_system_collations() refused to create any ICU collations if
the current database's encoding didn't support ICU. This is wrongheaded:
initdb must initialize pg_collation in an encoding-independent way
since it might be used in other databases with different encodings.
The reason for the restriction seems to be that get_icu_locale_comment()
used icu_from_uchar() to convert the UChar-format display name, and that
unsurprisingly doesn't know what to do in unsupported encodings.
But by the same token that the initial catalog contents must be
encoding-independent, we can't allow non-ASCII characters in the comment
strings. So we don't really need icu_from_uchar() here: just check for
Unicode codes outside the ASCII range, and if there are none, the format
conversion is trivial. If there are some, we can simply not install the
comment. (In my testing, this affects only Norwegian Bokmål, which has
given us trouble before.)
For paranoia's sake, also check for non-ASCII characters in ICU locale
names, and skip such locales, as we do for libc locales. I don't
currently have a reason to believe that this will ever reject anything,
but then again the libc maintainers should have known better too.
With just the import changes, ICU collations can be found in pg_collation
in databases with unsupported encodings. This resulted in more or less
clean failures at runtime, but that's not how things act for unsupported
encodings with libc collations. Make it work the same as our traditional
behavior for libc collations by having collation lookup take into account
whether is_encoding_supported_by_icu().
Adjust documentation to match. Also, expand Table 23.1 to show which
encodings are supported by ICU.
catversion bump because of likely change in pg_collation/pg_description
initial contents in ICU-enabled builds.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-24 19:54:15 +02:00
|
|
|
/* OK, transcribe */
|
|
|
|
result = palloc(len_uchar + 1);
|
|
|
|
for (i = 0; i < len_uchar; i++)
|
|
|
|
result[i] = displayname[i];
|
|
|
|
result[len_uchar] = '\0';
|
2017-03-23 20:25:34 +01:00
|
|
|
|
|
|
|
return result;
|
|
|
|
}
|
|
|
|
#endif /* USE_ICU */
|
|
|
|
|
|
|
|
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
/*
|
|
|
|
* pg_import_system_collations: add known system collations to pg_collation
|
|
|
|
*/
|
2017-01-18 18:00:00 +01:00
|
|
|
Datum
|
|
|
|
pg_import_system_collations(PG_FUNCTION_ARGS)
|
|
|
|
{
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
Oid nspid = PG_GETARG_OID(0);
|
|
|
|
int ncreated = 0;
|
2017-01-18 18:00:00 +01:00
|
|
|
|
|
|
|
if (!superuser())
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
|
2020-01-30 17:32:04 +01:00
|
|
|
errmsg("must be superuser to import system collations")));
|
2017-01-18 18:00:00 +01:00
|
|
|
|
2021-03-09 00:21:51 +01:00
|
|
|
if (!SearchSysCacheExists1(NAMESPACEOID, ObjectIdGetDatum(nspid)))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_UNDEFINED_SCHEMA),
|
|
|
|
errmsg("schema with OID %u does not exist", nspid)));
|
|
|
|
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
/* Load collations known to libc, using "locale -a" to enumerate them */
|
|
|
|
#ifdef READ_LOCALE_A_OUTPUT
|
|
|
|
{
|
|
|
|
FILE *locale_a_handle;
|
2022-01-27 08:44:31 +01:00
|
|
|
char localebuf[LOCALE_NAME_BUFLEN];
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
int nvalid = 0;
|
|
|
|
Oid collid;
|
|
|
|
CollAliasData *aliases;
|
|
|
|
int naliases,
|
|
|
|
maxaliases,
|
|
|
|
i;
|
|
|
|
|
|
|
|
/* expansible array of aliases */
|
|
|
|
maxaliases = 100;
|
|
|
|
aliases = (CollAliasData *) palloc(maxaliases * sizeof(CollAliasData));
|
|
|
|
naliases = 0;
|
|
|
|
|
|
|
|
locale_a_handle = OpenPipeStream("locale -a", "r");
|
|
|
|
if (locale_a_handle == NULL)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not execute command \"%s\": %m",
|
|
|
|
"locale -a")));
|
2017-06-12 16:28:37 +02:00
|
|
|
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
while (fgets(localebuf, sizeof(localebuf), locale_a_handle))
|
|
|
|
{
|
|
|
|
size_t len;
|
|
|
|
int enc;
|
2022-01-27 08:44:31 +01:00
|
|
|
char alias[LOCALE_NAME_BUFLEN];
|
2017-01-18 18:00:00 +01:00
|
|
|
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
len = strlen(localebuf);
|
2017-01-18 18:00:00 +01:00
|
|
|
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
if (len == 0 || localebuf[len - 1] != '\n')
|
|
|
|
{
|
2021-09-15 00:55:15 +02:00
|
|
|
elog(DEBUG1, "skipping locale with too-long name: \"%s\"", localebuf);
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
localebuf[len - 1] = '\0';
|
2017-01-18 18:00:00 +01:00
|
|
|
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
/*
|
|
|
|
* Some systems have locale names that don't consist entirely of
|
|
|
|
* ASCII letters (such as "bokmål" or "français").
|
|
|
|
* This is pretty silly, since we need the locale itself to
|
|
|
|
* interpret the non-ASCII characters. We can't do much with
|
|
|
|
* those, so we filter them out.
|
|
|
|
*/
|
2020-12-21 01:37:11 +01:00
|
|
|
if (!pg_is_ascii(localebuf))
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
{
|
2021-09-15 00:55:15 +02:00
|
|
|
elog(DEBUG1, "skipping locale with non-ASCII name: \"%s\"", localebuf);
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
continue;
|
|
|
|
}
|
2017-01-18 18:00:00 +01:00
|
|
|
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
enc = pg_get_encoding_from_locale(localebuf, false);
|
|
|
|
if (enc < 0)
|
2017-01-18 18:00:00 +01:00
|
|
|
{
|
2021-09-15 00:55:15 +02:00
|
|
|
elog(DEBUG1, "skipping locale with unrecognized encoding: \"%s\"",
|
|
|
|
localebuf);
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
continue;
|
2017-01-18 18:00:00 +01:00
|
|
|
}
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
if (!PG_VALID_BE_ENCODING(enc))
|
2021-09-15 00:55:15 +02:00
|
|
|
{
|
|
|
|
elog(DEBUG1, "skipping locale with client-only encoding: \"%s\"", localebuf);
|
|
|
|
continue;
|
|
|
|
}
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
if (enc == PG_SQL_ASCII)
|
|
|
|
continue; /* C/POSIX are already in the catalog */
|
2017-01-18 18:00:00 +01:00
|
|
|
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
/* count valid locales found in operating system */
|
|
|
|
nvalid++;
|
2017-01-18 18:00:00 +01:00
|
|
|
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
/*
|
|
|
|
* Create a collation named the same as the locale, but quietly
|
|
|
|
* doing nothing if it already exists. This is the behavior we
|
|
|
|
* need even at initdb time, because some versions of "locale -a"
|
|
|
|
* can report the same locale name more than once. And it's
|
|
|
|
* convenient for later import runs, too, since you just about
|
|
|
|
* always want to add on new locales without a lot of chatter
|
|
|
|
* about existing ones.
|
|
|
|
*/
|
|
|
|
collid = CollationCreate(localebuf, nspid, GetUserId(),
|
2019-03-22 12:09:32 +01:00
|
|
|
COLLPROVIDER_LIBC, true, enc,
|
2022-03-17 11:11:21 +01:00
|
|
|
localebuf, localebuf, NULL,
|
2021-05-07 10:17:42 +02:00
|
|
|
get_collation_actual_version(COLLPROVIDER_LIBC, localebuf),
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
true, true);
|
|
|
|
if (OidIsValid(collid))
|
|
|
|
{
|
|
|
|
ncreated++;
|
|
|
|
|
|
|
|
/* Must do CCI between inserts to handle duplicates correctly */
|
|
|
|
CommandCounterIncrement();
|
|
|
|
}
|
2017-01-18 18:00:00 +01:00
|
|
|
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
/*
|
|
|
|
* Generate aliases such as "en_US" in addition to "en_US.utf8"
|
|
|
|
* for ease of use. Note that collation names are unique per
|
|
|
|
* encoding only, so this doesn't clash with "en_US" for LATIN1,
|
|
|
|
* say.
|
|
|
|
*
|
|
|
|
* However, it might conflict with a name we'll see later in the
|
|
|
|
* "locale -a" output. So save up the aliases and try to add them
|
|
|
|
* after we've read all the output.
|
|
|
|
*/
|
|
|
|
if (normalize_libc_locale_name(alias, localebuf))
|
|
|
|
{
|
|
|
|
if (naliases >= maxaliases)
|
|
|
|
{
|
|
|
|
maxaliases *= 2;
|
|
|
|
aliases = (CollAliasData *)
|
|
|
|
repalloc(aliases, maxaliases * sizeof(CollAliasData));
|
|
|
|
}
|
|
|
|
aliases[naliases].localename = pstrdup(localebuf);
|
|
|
|
aliases[naliases].alias = pstrdup(alias);
|
|
|
|
aliases[naliases].enc = enc;
|
|
|
|
naliases++;
|
|
|
|
}
|
|
|
|
}
|
2017-01-18 18:00:00 +01:00
|
|
|
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
ClosePipeStream(locale_a_handle);
|
2017-01-18 18:00:00 +01:00
|
|
|
|
|
|
|
/*
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
* Before processing the aliases, sort them by locale name. The point
|
|
|
|
* here is that if "locale -a" gives us multiple locale names with the
|
|
|
|
* same encoding and base name, say "en_US.utf8" and "en_US.utf-8", we
|
|
|
|
* want to pick a deterministic one of them. First in ASCII sort
|
|
|
|
* order is a good enough rule. (Before PG 10, the code corresponding
|
|
|
|
* to this logic in initdb.c had an additional ordering rule, to
|
|
|
|
* prefer the locale name exactly matching the alias, if any. We
|
|
|
|
* don't need to consider that here, because we would have already
|
|
|
|
* created such a pg_collation entry above, and that one will win.)
|
2017-01-18 18:00:00 +01:00
|
|
|
*/
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
if (naliases > 1)
|
|
|
|
qsort((void *) aliases, naliases, sizeof(CollAliasData), cmpaliases);
|
|
|
|
|
|
|
|
/* Now add aliases, ignoring any that match pre-existing entries */
|
|
|
|
for (i = 0; i < naliases; i++)
|
2017-01-18 18:00:00 +01:00
|
|
|
{
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
char *locale = aliases[i].localename;
|
|
|
|
char *alias = aliases[i].alias;
|
|
|
|
int enc = aliases[i].enc;
|
|
|
|
|
|
|
|
collid = CollationCreate(alias, nspid, GetUserId(),
|
2019-03-22 12:09:32 +01:00
|
|
|
COLLPROVIDER_LIBC, true, enc,
|
2022-03-17 11:11:21 +01:00
|
|
|
locale, locale, NULL,
|
2021-05-07 10:17:42 +02:00
|
|
|
get_collation_actual_version(COLLPROVIDER_LIBC, locale),
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
true, true);
|
|
|
|
if (OidIsValid(collid))
|
|
|
|
{
|
|
|
|
ncreated++;
|
2017-01-18 18:00:00 +01:00
|
|
|
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
CommandCounterIncrement();
|
|
|
|
}
|
|
|
|
}
|
2017-01-18 18:00:00 +01:00
|
|
|
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
/* Give a warning if "locale -a" seems to be malfunctioning */
|
|
|
|
if (nvalid == 0)
|
|
|
|
ereport(WARNING,
|
|
|
|
(errmsg("no usable system locales were found")));
|
2017-01-18 19:44:19 +01:00
|
|
|
}
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
#endif /* READ_LOCALE_A_OUTPUT */
|
2017-01-18 19:44:19 +01:00
|
|
|
|
2017-08-21 15:17:06 +02:00
|
|
|
/*
|
|
|
|
* Load collations known to ICU
|
|
|
|
*
|
|
|
|
* We use uloc_countAvailable()/uloc_getAvailable() rather than
|
|
|
|
* ucol_countAvailable()/ucol_getAvailable(). The former returns a full
|
|
|
|
* set of language+region combinations, whereas the latter only returns
|
2021-04-05 04:18:12 +02:00
|
|
|
* language+region combinations if they are distinct from the language's
|
2017-08-21 15:17:06 +02:00
|
|
|
* base collation. So there might not be a de-DE or en-GB, which would be
|
|
|
|
* confusing.
|
|
|
|
*/
|
2017-03-23 20:25:34 +01:00
|
|
|
#ifdef USE_ICU
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Start the loop at -1 to sneak in the root locale without too much
|
|
|
|
* code duplication.
|
|
|
|
*/
|
2017-08-21 15:17:06 +02:00
|
|
|
for (i = -1; i < uloc_countAvailable(); i++)
|
2017-03-23 20:25:34 +01:00
|
|
|
{
|
|
|
|
const char *name;
|
|
|
|
char *langtag;
|
Further hacking on ICU collation creation and usage.
pg_import_system_collations() refused to create any ICU collations if
the current database's encoding didn't support ICU. This is wrongheaded:
initdb must initialize pg_collation in an encoding-independent way
since it might be used in other databases with different encodings.
The reason for the restriction seems to be that get_icu_locale_comment()
used icu_from_uchar() to convert the UChar-format display name, and that
unsurprisingly doesn't know what to do in unsupported encodings.
But by the same token that the initial catalog contents must be
encoding-independent, we can't allow non-ASCII characters in the comment
strings. So we don't really need icu_from_uchar() here: just check for
Unicode codes outside the ASCII range, and if there are none, the format
conversion is trivial. If there are some, we can simply not install the
comment. (In my testing, this affects only Norwegian Bokmål, which has
given us trouble before.)
For paranoia's sake, also check for non-ASCII characters in ICU locale
names, and skip such locales, as we do for libc locales. I don't
currently have a reason to believe that this will ever reject anything,
but then again the libc maintainers should have known better too.
With just the import changes, ICU collations can be found in pg_collation
in databases with unsupported encodings. This resulted in more or less
clean failures at runtime, but that's not how things act for unsupported
encodings with libc collations. Make it work the same as our traditional
behavior for libc collations by having collation lookup take into account
whether is_encoding_supported_by_icu().
Adjust documentation to match. Also, expand Table 23.1 to show which
encodings are supported by ICU.
catversion bump because of likely change in pg_collation/pg_description
initial contents in ICU-enabled builds.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-24 19:54:15 +02:00
|
|
|
char *icucomment;
|
2022-03-17 11:11:21 +01:00
|
|
|
const char *iculocstr;
|
2017-08-05 17:48:32 +02:00
|
|
|
Oid collid;
|
2017-03-23 20:25:34 +01:00
|
|
|
|
|
|
|
if (i == -1)
|
|
|
|
name = ""; /* ICU root locale */
|
|
|
|
else
|
2017-08-21 15:17:06 +02:00
|
|
|
name = uloc_getAvailable(i);
|
2017-03-23 20:25:34 +01:00
|
|
|
|
|
|
|
langtag = get_icu_language_tag(name);
|
2022-03-17 11:11:21 +01:00
|
|
|
iculocstr = U_ICU_VERSION_MAJOR_NUM >= 54 ? langtag : name;
|
Further hacking on ICU collation creation and usage.
pg_import_system_collations() refused to create any ICU collations if
the current database's encoding didn't support ICU. This is wrongheaded:
initdb must initialize pg_collation in an encoding-independent way
since it might be used in other databases with different encodings.
The reason for the restriction seems to be that get_icu_locale_comment()
used icu_from_uchar() to convert the UChar-format display name, and that
unsurprisingly doesn't know what to do in unsupported encodings.
But by the same token that the initial catalog contents must be
encoding-independent, we can't allow non-ASCII characters in the comment
strings. So we don't really need icu_from_uchar() here: just check for
Unicode codes outside the ASCII range, and if there are none, the format
conversion is trivial. If there are some, we can simply not install the
comment. (In my testing, this affects only Norwegian Bokmål, which has
given us trouble before.)
For paranoia's sake, also check for non-ASCII characters in ICU locale
names, and skip such locales, as we do for libc locales. I don't
currently have a reason to believe that this will ever reject anything,
but then again the libc maintainers should have known better too.
With just the import changes, ICU collations can be found in pg_collation
in databases with unsupported encodings. This resulted in more or less
clean failures at runtime, but that's not how things act for unsupported
encodings with libc collations. Make it work the same as our traditional
behavior for libc collations by having collation lookup take into account
whether is_encoding_supported_by_icu().
Adjust documentation to match. Also, expand Table 23.1 to show which
encodings are supported by ICU.
catversion bump because of likely change in pg_collation/pg_description
initial contents in ICU-enabled builds.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-24 19:54:15 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Be paranoid about not allowing any non-ASCII strings into
|
|
|
|
* pg_collation
|
|
|
|
*/
|
2022-03-17 11:11:21 +01:00
|
|
|
if (!pg_is_ascii(langtag) || !pg_is_ascii(iculocstr))
|
Further hacking on ICU collation creation and usage.
pg_import_system_collations() refused to create any ICU collations if
the current database's encoding didn't support ICU. This is wrongheaded:
initdb must initialize pg_collation in an encoding-independent way
since it might be used in other databases with different encodings.
The reason for the restriction seems to be that get_icu_locale_comment()
used icu_from_uchar() to convert the UChar-format display name, and that
unsurprisingly doesn't know what to do in unsupported encodings.
But by the same token that the initial catalog contents must be
encoding-independent, we can't allow non-ASCII characters in the comment
strings. So we don't really need icu_from_uchar() here: just check for
Unicode codes outside the ASCII range, and if there are none, the format
conversion is trivial. If there are some, we can simply not install the
comment. (In my testing, this affects only Norwegian Bokmål, which has
given us trouble before.)
For paranoia's sake, also check for non-ASCII characters in ICU locale
names, and skip such locales, as we do for libc locales. I don't
currently have a reason to believe that this will ever reject anything,
but then again the libc maintainers should have known better too.
With just the import changes, ICU collations can be found in pg_collation
in databases with unsupported encodings. This resulted in more or less
clean failures at runtime, but that's not how things act for unsupported
encodings with libc collations. Make it work the same as our traditional
behavior for libc collations by having collation lookup take into account
whether is_encoding_supported_by_icu().
Adjust documentation to match. Also, expand Table 23.1 to show which
encodings are supported by ICU.
catversion bump because of likely change in pg_collation/pg_description
initial contents in ICU-enabled builds.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-24 19:54:15 +02:00
|
|
|
continue;
|
|
|
|
|
2017-03-23 20:25:34 +01:00
|
|
|
collid = CollationCreate(psprintf("%s-x-icu", langtag),
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
nspid, GetUserId(),
|
2019-03-22 12:09:32 +01:00
|
|
|
COLLPROVIDER_ICU, true, -1,
|
2022-03-17 11:11:21 +01:00
|
|
|
NULL, NULL, iculocstr,
|
|
|
|
get_collation_actual_version(COLLPROVIDER_ICU, iculocstr),
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
true, true);
|
|
|
|
if (OidIsValid(collid))
|
|
|
|
{
|
|
|
|
ncreated++;
|
2017-03-23 20:25:34 +01:00
|
|
|
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
CommandCounterIncrement();
|
|
|
|
|
Further hacking on ICU collation creation and usage.
pg_import_system_collations() refused to create any ICU collations if
the current database's encoding didn't support ICU. This is wrongheaded:
initdb must initialize pg_collation in an encoding-independent way
since it might be used in other databases with different encodings.
The reason for the restriction seems to be that get_icu_locale_comment()
used icu_from_uchar() to convert the UChar-format display name, and that
unsurprisingly doesn't know what to do in unsupported encodings.
But by the same token that the initial catalog contents must be
encoding-independent, we can't allow non-ASCII characters in the comment
strings. So we don't really need icu_from_uchar() here: just check for
Unicode codes outside the ASCII range, and if there are none, the format
conversion is trivial. If there are some, we can simply not install the
comment. (In my testing, this affects only Norwegian Bokmål, which has
given us trouble before.)
For paranoia's sake, also check for non-ASCII characters in ICU locale
names, and skip such locales, as we do for libc locales. I don't
currently have a reason to believe that this will ever reject anything,
but then again the libc maintainers should have known better too.
With just the import changes, ICU collations can be found in pg_collation
in databases with unsupported encodings. This resulted in more or less
clean failures at runtime, but that's not how things act for unsupported
encodings with libc collations. Make it work the same as our traditional
behavior for libc collations by having collation lookup take into account
whether is_encoding_supported_by_icu().
Adjust documentation to match. Also, expand Table 23.1 to show which
encodings are supported by ICU.
catversion bump because of likely change in pg_collation/pg_description
initial contents in ICU-enabled builds.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-24 19:54:15 +02:00
|
|
|
icucomment = get_icu_locale_comment(name);
|
|
|
|
if (icucomment)
|
|
|
|
CreateComments(collid, CollationRelationId, 0,
|
|
|
|
icucomment);
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
}
|
2017-03-23 20:25:34 +01:00
|
|
|
}
|
|
|
|
}
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
#endif /* USE_ICU */
|
2017-03-23 20:25:34 +01:00
|
|
|
|
Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once. All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.
The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known". This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.
While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them. And make it return a count of the number of collations
it did add, which seems like it might be helpful.
Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities. This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.
In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.
Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that. There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.
Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 20:19:48 +02:00
|
|
|
PG_RETURN_INT32(ncreated);
|
2017-01-18 18:00:00 +01:00
|
|
|
}
|