postgresql

Commit Graph

Author	SHA1	Message	Date
Bruce Momjian	29275b1d17	Update copyright for 2024 Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12	2024-01-03 20:49:05 -05:00
Michael Paquier	f19669fed3	unaccent: Fix allocation size for target characters on initial load This led to an overestimation of the size allocated for both the quoted and non-quoted cases, while using an inconsistent style. Thinkos in `59f47fb98d`. Per report from Coverity, with extra input from Tom Lane.	2023-09-25 09:31:48 +09:00
Michael Paquier	59f47fb98d	unaccent: Add support for quoted translated characters As reported in bug #18057, the extension unaccent removes in its rule file whitespace characters that are intentionally specified when building unaccent.rules from UnicodeData.txt, causing an incorrect translation for some characters like numeric symbols. This is caused by the fact that all whitespaces before and after the origin and target characters are all discarded (this limitation is documented). This commit makes possible the use of quotes around target characters, so as whitespaces can be considered part of target characters. Some target characters use a double quote, these require an extra double quote. The documentation is updated to show how to use quoted areas, generate_unaccent_rules.py is updated to generate unaccent.rules and a couple of tests are added for numeric symbols. While working on this patch, I have implemented a fake rule file to test the parsing logic implemented, which is not included here as it would just consume extra cycles in the tests, and it requires the manipulation of an installation tree to be able to work correctly. As this requires a change of format in unaccent.rules, this cannot be backpatched, unfortunately. The idea to use double quotes as escaped characters comes from Tom Lane. Reported-by: Martin Schlossarek Author: Michael Paquier Discussion: https://postgr.es/m/18057-62712cad01bd202c@postgresql.org	2023-09-20 12:29:36 +09:00
Bruce Momjian	c8e1ba736b	Update copyright for 2023 Backpatch-through: 11	2023-01-02 15:00:37 -05:00
Bruce Momjian	27b77ecf9f	Update copyright for 2022 Backpatch-through: 10	2022-01-07 19:04:57 -05:00
Bruce Momjian	ca3b37487b	Update copyright for 2021 Backpatch-through: 9.5	2021-01-02 13:06:25 -05:00
Bruce Momjian	7559d8ebfa	Update copyrights for 2020 Backpatch-through: update all files in master, backpatch legal files through 9.4	2020-01-01 12:21:45 -05:00
Bruce Momjian	97c39498e5	Update copyright for 2019 Backpatch-through: certain files through 9.4	2019-01-02 12:44:25 -05:00
Andres Freund	578b229718	Remove WITH OIDS support, change oid catalog column visibility. Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de	2018-11-20 16:00:17 -08:00
Tom Lane	a5322ca10f	Make contrib/unaccent's unaccent() function work when not in search path. Since the fixes for CVE-2018-1058, we've advised people to schema-qualify function references in order to fix failures in code that executes under a minimal search_path setting. However, that's insufficient to make the single-argument form of unaccent() work, because it looks up the "unaccent" text search dictionary using the search path. The most expedient answer seems to be to remove the search_path dependency by making it look in the same schema that the unaccent() function itself is declared in. This will definitely work for the normal usage of this function with the unaccent dictionary provided by the extension. It's barely possible that there are people who were relying on the search-path-dependent behavior to select other dictionaries with the same name; but if there are any such people at all, they can still get that behavior by writing unaccent('unaccent', ...), or possibly unaccent('unaccent'::text::regdictionary, ...) if the lookup has to be postponed to runtime. Per complaint from Gunnlaugur Thor Briem. Back-patch to all supported branches. Discussion: https://postgr.es/m/CAPs+M8LCex6d=DeneofdsoJVijaG59m9V0ggbb3pOH7hZO4+cQ@mail.gmail.com	2018-09-06 10:49:45 -04:00
Tom Lane	fb8697b31a	Avoid unnecessary use of pg_strcasecmp for already-downcased identifiers. We have a lot of code in which option names, which from the user's viewpoint are logically keywords, are passed through the grammar as plain identifiers, and then matched to string literals during command execution. This approach avoids making words into lexer keywords unnecessarily. Some places matched these strings using plain strcmp, some using pg_strcasecmp. But the latter should be unnecessary since identifiers would have been downcased on their way through the parser. Aside from any efficiency concerns (probably not a big factor), the lack of consistency in this area creates a hazard of subtle bugs due to different places coming to different conclusions about whether two option names are the same or different. Hence, standardize on using strcmp() to match any option names that are expected to have been fed through the parser. This does create a user-visible behavioral change, which is that while formerly all of these would work: alter table foo set (fillfactor = 50); alter table foo set (FillFactor = 50); alter table foo set ("fillfactor" = 50); alter table foo set ("FillFactor" = 50); now the last case will fail because that double-quoted identifier is different from the others. However, none of our documentation says that you can use a quoted identifier in such contexts at all, and we should discourage doing so since it would break if we ever decide to parse such constructs as true lexer keywords rather than poor man's substitutes. So this shouldn't create a significant compatibility issue for users. Daniel Gustafsson, reviewed by Michael Paquier, small changes by me Discussion: https://postgr.es/m/29405B24-564E-476B-98C0-677A29805B84@yesql.se	2018-01-26 18:25:14 -05:00
Bruce Momjian	9d4649ca49	Update copyright for 2018 Backpatch-through: certain files through 9.3	2018-01-02 23:30:12 -05:00
Peter Eisentraut	0e1539ba0d	Add some const decorations to prototypes Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>	2017-11-10 13:38:57 -05:00
Tom Lane	382ceffdf7	Phase 3 of pgindent updates. Don't move parenthesized lines to the left, even if that means they flow past the right margin. By default, BSD indent lines up statement continuation lines that are within parentheses so that they start just to the right of the preceding left parenthesis. However, traditionally, if that resulted in the continuation line extending to the right of the desired right margin, then indent would push it left just far enough to not overrun the margin, if it could do so without making the continuation line start to the left of the current statement indent. That makes for a weird mix of indentations unless one has been completely rigid about never violating the 80-column limit. This behavior has been pretty universally panned by Postgres developers. Hence, disable it with indent's new -lpl switch, so that parenthesized lines are always lined up with the preceding left paren. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us	2017-06-21 15:35:54 -04:00
Noah Misch	3a0d473192	Use wrappers of PG_DETOAST_DATUM_PACKED() more. This makes almost all core code follow the policy introduced in the previous commit. Specific decisions: - Text search support functions with char* and length arguments, such as prsstart and lexize, may receive unaligned strings. I doubt maintainers of non-core text search code will notice. - Use plain VARDATA() on values detoasted or synthesized earlier in the same function. Use VARDATA_ANY() on varlenas sourced outside the function, even if they happen to always have four-byte headers. As an exception, retain the universal practice of using VARDATA() on return values of SendFunctionCall(). - Retain PG_GETARG_BYTEA_P() in pageinspect. (Page images are too large for a one-byte header, so this misses no optimization.) Sites that do not call get_page_from_raw() typically need the four-byte alignment. - For now, do not change btree_gist. Its use of four-byte headers in memory is partly entangled with storage of 4-byte headers inside GBT_VARKEY, on disk. - For now, do not change gtrgm_consistent() or gtrgm_distance(). They incorporate the varlena header into a cache, and there are multiple credible implementation strategies to consider.	2017-03-12 19:35:34 -04:00
Peter Eisentraut	f21a563d25	Move some things from builtins.h to new header files This avoids that builtins.h has to include additional header files.	2017-01-20 20:29:53 -05:00
Bruce Momjian	1d25779284	Update copyright via script for 2017	2017-01-03 13:48:53 -05:00
Bruce Momjian	ee94300446	Update copyright for 2016 Backpatch certain files through 9.1	2016-01-02 13:33:40 -05:00
Bruce Momjian	4baaf863ec	Update copyright for 2015 Backpatch certain files through 9.0	2015-01-06 11:43:47 -05:00
Tom Lane	5a421a47eb	Fix inadequately-sized output buffer in contrib/unaccent. The output buffer size in unaccent_lexize() was calculated as input string length times pg_database_encoding_max_length(), which effectively assumes that replacement strings aren't more than one character. While that was all that we previously documented it to support, the code actually has always allowed replacement strings of arbitrary length; so if you tried to make use of longer strings, you were at risk of buffer overrun. To fix, use an expansible StringInfo buffer instead of trying to determine the maximum space needed a-priori. This would be a security issue if unaccent rules files could be installed by unprivileged users; but fortunately they can't, so in the back branches the problem can be labeled as improper configuration by a superuser. Nonetheless, a memory stomp isn't a nice way of reacting to improper configuration, so let's back-patch the fix.	2014-07-01 11:23:21 -04:00
Tom Lane	03a25cec8d	Issue a WARNING about invalid rule file format in contrib/unaccent. We were already issuing a WARNING, albeit only elog not ereport, for duplicate source strings; so warning rather than just being stoically silent seems like the best thing to do here. Arguably both of these complaints should be upgraded to ERRORs, but that might be more behavioral change than people want. Note: the faulty line is already printed via an errcontext hook, so there's no need for more information than these messages provide.	2014-06-30 22:03:37 -04:00
Tom Lane	1b2488731c	Allow multi-character source strings in contrib/unaccent. This could be useful in languages where diacritic signs are represented as separate characters; more generally it supports using unaccent dictionaries for substring substitutions beyond narrowly conceived "diacritic removal". In any case, since the rule-file parser doesn't complain about multi-character source strings, it behooves us to do something unsurprising with them.	2014-06-30 21:46:29 -04:00
Tom Lane	97c40ce614	Allow empty replacement strings in contrib/unaccent. This is useful in languages where diacritic signs are represented as separate characters; it's also one step towards letting unaccent be used for arbitrary substring substitutions. In passing, improve the user documentation for unaccent, which was sadly vague about some important details. Mohammad Alhashash, reviewed by Abhijit Menon-Sen	2014-06-30 20:51:30 -04:00
Peter Eisentraut	e7128e8dbb	Create function prototype as part of PG_FUNCTION_INFO_V1 macro Because of gcc -Wmissing-prototypes, all functions in dynamically loadable modules must have a separate prototype declaration. This is meant to detect global functions that are not declared in header files, but in cases where the function is called via dfmgr, this is redundant. Besides filling up space with boilerplate, this is a frequent source of compiler warnings in extension modules. We can fix that by creating the function prototype as part of the PG_FUNCTION_INFO_V1 macro, which such modules have to use anyway. That makes the code of modules cleaner, because there is one less place where the entry points have to be listed, and creates an additional check that functions have the right prototype. Remove now redundant prototypes from contrib and other modules.	2014-04-18 00:03:19 -04:00
Bruce Momjian	7e04792a1c	Update copyright for 2014 Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.	2014-01-07 16:05:30 -05:00
Bruce Momjian	9af4159fce	pgindent run for release 9.3 This is the first run of the Perl-based pgindent script. Also update pgindent instructions.	2013-05-29 16:58:43 -04:00
Heikki Linnakangas	4b06c1820a	The data structure used in unaccent is a trie, not suffix tree. Fix the term used in variable and struct names, and comments. Alexander Korotkov	2013-05-08 20:58:50 +03:00
Bruce Momjian	bd61a623ac	Update copyrights for 2013 Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.	2013-01-01 17:15:01 -05:00
Peter Eisentraut	48658a1b81	Fix some typos Josh Kupershmidt	2012-04-22 19:23:47 +03:00
Bruce Momjian	e126958c2e	Update copyright notices for year 2012.	2012-01-01 18:01:58 -05:00
Tom Lane	ced3a93ccb	Fix assorted bugs in contrib/unaccent's configuration file parsing. Make it use t_isspace() to identify whitespace, rather than relying on sscanf which is known to get it wrong on some platform/locale combinations. Get rid of fixed-size buffers. Make it actually continue to parse the file after ignoring a line with untranslatable characters, as was obviously intended. The first of these issues is per gripe from J Smith, though not exactly either of his proposed patches.	2011-11-07 11:50:18 -05:00
Bruce Momjian	6416a82a62	Remove unnecessary #include references, per pgrminclude script.	2011-09-01 10:04:27 -04:00
Bruce Momjian	5d950e3b0c	Stamp copyrights for year 2011.	2011-01-01 13:18:15 -05:00
Magnus Hagander	9f2e211386	Remove cvs keywords from all files.	2010-09-20 22:08:53 +02:00
Robert Haas	fd1843ff89	Standardize get_whatever_oid functions for other object types. - Rename TSParserGetPrsid to get_ts_parser_oid. - Rename TSDictionaryGetDictid to get_ts_dict_oid. - Rename TSTemplateGetTmplid to get_ts_template_oid. - Rename TSConfigGetCfgid to get_ts_config_oid. - Rename FindConversionByName to get_conversion_oid. - Rename GetConstraintName to get_constraint_oid. - Add new functions get_opclass_oid, get_opfamily_oid, get_rewrite_oid, get_rewrite_oid_without_relid, get_trigger_oid, and get_cast_oid. The name of each function matches the corresponding catalog. Thanks to KaiGai Kohei for the review.	2010-08-05 15:25:36 +00:00
Bruce Momjian	65e806cba1	pgindent run for 9.0	2010-02-26 02:01:40 +00:00
Bruce Momjian	0239800893	Update copyright for the year 2010.	2010-01-02 16:58:17 +00:00
Tom Lane	13845d261c	Fix some other compiler warnings from a different gcc version.	2009-08-18 15:57:26 +00:00
Tom Lane	ec8128465f	Suppress compiler warnings about uninitialized variables.	2009-08-18 15:37:04 +00:00
Teodor Sigaev	92e05bc6a5	Unaccent dictionary.	2009-08-18 10:34:39 +00:00

40 Commits