mirror of
https://git.postgresql.org/git/postgresql.git
synced 2024-09-28 03:21:50 +02:00
fb8697b31a
We have a lot of code in which option names, which from the user's
viewpoint are logically keywords, are passed through the grammar as plain
identifiers, and then matched to string literals during command execution.
This approach avoids making words into lexer keywords unnecessarily. Some
places matched these strings using plain strcmp, some using pg_strcasecmp.
But the latter should be unnecessary since identifiers would have been
downcased on their way through the parser. Aside from any efficiency
concerns (probably not a big factor), the lack of consistency in this area
creates a hazard of subtle bugs due to different places coming to different
conclusions about whether two option names are the same or different.

Hence, standardize on using strcmp() to match any option names that are
expected to have been fed through the parser.

This does create a user-visible behavioral change, which is that while
formerly all of these would work:

    alter table foo set (fillfactor = 50);
    alter table foo set (FillFactor = 50);
    alter table foo set ("fillfactor" = 50);
    alter table foo set ("FillFactor" = 50);

now the last case will fail because that double-quoted identifier is
different from the others. However, none of our documentation says that you
can use a quoted identifier in such contexts at all, and we should
discourage doing so since it would break if we ever decide to parse such
constructs as true lexer keywords rather than poor man's substitutes. So
this shouldn't create a significant compatibility issue for users.

Daniel Gustafsson, reviewed by Michael Paquier, small changes by me

Discussion: https://postgr.es/m/29405B24-564E-476B-98C0-677A29805B84@yesql.se
libstemmer
stopwords
.gitignore
dict_snowball.c
Makefile
README
snowball_func.sql.in
snowball.sql.in
src/backend/snowball/README

Snowball-Based Stemming
=======================

This module uses the word stemming code developed by the Snowball project,
http://snowball.tartarus.org/
which is released by them under a BSD-style license.

The files under src/backend/snowball/libstemmer/ and
src/include/snowball/libstemmer/ are taken directly from their libstemmer_c
distribution, with only some minor adjustments of file inclusions.  Note
that most of these files are in fact derived files, not master source.
The master sources are in the Snowball language, and are available along
with the Snowball-to-C compiler from the Snowball project.  We choose to
include the derived files in the PostgreSQL distribution because most
installations will not have the Snowball compiler available.

To update the PostgreSQL sources from a new Snowball libstemmer_c
distribution:

1. Copy the *.c files in libstemmer_c/src_c/ to src/backend/snowball/libstemmer
with replacement of "../runtime/header.h" by "header.h", for example

    for f in libstemmer_c/src_c/*.c
    do
        sed 's|\.\./runtime/header\.h|header.h|' "$f" >libstemmer/`basename "$f"`
    done

(Alternatively, if you rebuild the stemmer files from the master Snowball
sources, just omit "-r ../runtime" from the Snowball compiler switches.)

2. Copy the *.c files in libstemmer_c/runtime/ to
src/backend/snowball/libstemmer, and edit them to remove direct inclusions
of system headers such as <stdio.h> --- they should only include "header.h".
(This removal avoids portability problems on some platforms where <stdio.h>
is sensitive to largefile compilation options.)

3. Copy the *.h files in libstemmer_c/src_c/ and libstemmer_c/runtime/
to src/include/snowball/libstemmer.  At this writing the header files
do not require any changes.

4. Check whether any stemmer modules have been added or removed.  If so, edit
the OBJS list in Makefile, the list of #include's in dict_snowball.c, and the
stemmer_modules[] table in dict_snowball.c.

5. The various stopword files in stopwords/ must be downloaded individually
from pages on the snowball.tartarus.org website.  Make sure that these files
are stored in UTF-8 encoding.
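
The manual edit described in step 2 could be partly automated. The following
is only a rough sketch, not part of the documented procedure: the function
name strip_system_includes is hypothetical, and it assumes that every system
header is included with angle brackets on a line of its own. The results
should still be reviewed by hand.

```shell
#!/bin/sh
# strip_system_includes SRC_DIR DEST_DIR   (hypothetical helper)
# Copy each *.c file from SRC_DIR to DEST_DIR, dropping lines of the form
#     #include <...>
# Includes written with double quotes, such as "header.h", are kept.
strip_system_includes() {
    src_dir=$1
    dest_dir=$2
    for f in "$src_dir"/*.c
    do
        # skip the unexpanded pattern when no files match
        [ -e "$f" ] || continue
        sed '/^[[:space:]]*#[[:space:]]*include[[:space:]]*</d' "$f" \
            >"$dest_dir/$(basename "$f")"
    done
}
```

A possible invocation, run from src/backend/snowball, would be
strip_system_includes libstemmer_c/runtime libstemmer (the location of the
unpacked libstemmer_c directory is an assumption).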
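
For step 4, one way to spot added or removed stemmer modules is to compare
the *.c file names of the old and new source directories. This is a minimal
sketch under that assumption; the function name list_module_changes is
hypothetical, and the output only tells you which files differ, not which
edits Makefile and dict_snowball.c need.

```shell
#!/bin/sh
# list_module_changes OLD_DIR NEW_DIR   (hypothetical helper)
# Print "added: NAME" for *.c files present only in NEW_DIR and
# "removed: NAME" for *.c files present only in OLD_DIR.
list_module_changes() {
    old_list=$(mktemp)
    new_list=$(mktemp)
    # comm requires sorted input, so sort both listings the same way
    (cd "$1" && ls -- *.c 2>/dev/null | sort) >"$old_list"
    (cd "$2" && ls -- *.c 2>/dev/null | sort) >"$new_list"
    comm -13 "$old_list" "$new_list" | sed 's/^/added: /'
    comm -23 "$old_list" "$new_list" | sed 's/^/removed: /'
    rm -f "$old_list" "$new_list"
}
```

For example, list_module_changes libstemmer libstemmer_c/src_c would compare
the current PostgreSQL copies against a freshly unpacked distribution (both
paths are assumptions about where the directories live).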
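
The UTF-8 requirement in step 5 can be checked mechanically. The sketch
below assumes that the iconv utility is installed; the function name
check_utf8 is hypothetical. It relies on iconv exiting with a nonzero status
when its input contains a byte sequence that is invalid in the declared
source encoding.

```shell
#!/bin/sh
# check_utf8 FILE...   (hypothetical helper)
# Report every file that is not valid UTF-8.
# Returns 0 if all files are valid, 1 otherwise.
check_utf8() {
    status=0
    for f in "$@"
    do
        # converting UTF-8 to UTF-8 fails on invalid input
        if ! iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1
        then
            echo "not valid UTF-8: $f"
            status=1
        fi
    done
    return $status
}
```

Running check_utf8 stopwords/* after downloading new stopword files would
list any file that needs to be re-encoded.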