postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2024-10-02 03:01:14 +02:00

Author	SHA1	Message	Date
Teodor Sigaev	1ec4c7c055	Restore original tsquery operation numbering. As noticed by Tom Lane, changing the operations' numbers in commit `bb140506df` causes an on-disk format incompatibility. Revert to the previous numbering, which is the reason for adding a special array to store the priorities of operations. This also reverts the ordering of tsquery to the previous one. Author: Dmitry Ivanov	2016-04-08 20:11:30 +03:00
Teodor Sigaev	a7ace3b6d9	Make testing of phraseto_tsquery independent of the value of the default_text_search_config variable. Per buildfarm member skink.	2016-04-07 19:33:23 +03:00
Teodor Sigaev	bb140506df	Phrase full text search. The patch introduces a new text search operator (<-> or <DISTANCE>) into tsquery. The on-disk and binary in/out formats of tsquery are backward compatible. It has two side effects: 1) it changes the ordering of tsquery, so users who have a btree index over tsquery should reindex it; 2) tsquery output contains fewer parentheses, so it becomes more readable. Authors: Teodor Sigaev, Oleg Bartunov, Dmitry Ivanov Reviewers: Alexander Korotkov, Artur Zakirov	2016-04-07 18:44:18 +03:00
Teodor Sigaev	61d66c44f1	Fix support of digits in email/hostnames. When tsearch was implemented I made several mistakes in the hostname/email definition rules: 1) allowing underscore in hostnames, which is prohibited by the RFC; 2) forgetting to allow leading digits separated by a hyphen (like 123-x.com) in hostnames; 3) not allowing underscore/hyphen after leading digits in the localpart of an email. Artur's patch resolves the last two issues, but along the way allows host names like 123_x.com together with 123-x.com. The RFC forbids underscores in hostnames, but pg has allowed them since the initial tsearch version in core, although only for non-digits. The patch syncs support for digits and non-digits in both hostname and email. Forbidding underscore in hostnames may break existing usage of tsearch and, anyhow, should be done by a separate patch. Author: Artur Zakirov BUG: #13964	2016-03-29 18:28:49 +03:00
Bruce Momjian	1420f3a982	Fix ts_rank_cd() to ignore stripped lexemes. Previously, stripped lexemes got a default location and could be considered if mixed with non-stripped lexemes. BACKWARD INCOMPATIBILITY CHANGE	2014-03-24 14:37:16 -04:00
Tom Lane	1db5af2794	Fix gincostestimate to handle ScalarArrayOpExpr reasonably. The original coding of this function overlooked the possibility that it could be passed anything except simple OpExpr indexquals. But ScalarArrayOpExpr is possible too, and the code would probably crash (and surely give ridiculous answers) in such a case. Add logic to try to estimate sanely for such cases. In passing, fix the treatment of inner-indexscan cost estimation: it was failing to scale up properly for multiple iterations of a nestloop. (I think somebody might've thought that index_pages_fetched() is linear, but of course it's not.) Report, diagnosis, and preliminary patch by Marti Raudsepp; I refactored it a bit and fixed the cost estimation. Back-patch into 9.1 where the bogus code was introduced.	2011-12-20 19:57:34 -05:00
Peter Eisentraut	fc946c39ae	Remove useless whitespace at end of lines	2010-11-23 22:34:55 +02:00
Tom Lane	2c265adea3	Modify the built-in text search parser to handle URLs more nearly according to RFC 3986. In particular, these characters now terminate the path part of a URL: '"', '<', '>', '\', '^', '`', '{', '|', '}'. The previous behavior was inconsistent and depended on whether a "?" was present in the path. Per gripe from Donald Fraser and spec research by Kevin Grittner. This is a pre-existing bug, but not back-patching since the risks of breaking existing applications seem to outweigh the benefits.	2010-04-28 02:04:16 +00:00
Tom Lane	1753337cf5	Improve psql's tabular display of wrapped-around data by inserting markers in the formerly-always-blank columns just to left and right of the data. Different marking is used for a line break caused by a newline in the data than for a straight wraparound. A newline break is signaled by a "+" in the right margin column in ASCII mode, or a carriage return arrow in UNICODE mode. Wraparound is signaled by a dot in the right margin as well as the following left margin in ASCII mode, or an ellipsis symbol in the same places in UNICODE mode. "\pset linestyle old-ascii" is added to make the previous behavior available if anyone really wants it. In passing, this commit also cleans up a few regression test files that had unintended spacing differences from the current actual output. Roger Leigh, reviewed by Gabrielle Roth and other members of PDXPUG.	2009-11-22 05:20:41 +00:00
Tom Lane	7280fab717	Fix bug #4814 (wrong subscript in consistent-function call), and add some minimal regression test coverage for matchPartialInPendingList().	2009-05-19 02:48:26 +00:00
Teodor Sigaev	2a0083ede8	Improve headline generation. Now a headline can contain several fragments, a la Google. Sushant Sinha <sushant354@gmail.com>	2008-10-17 18:05:19 +00:00
Tom Lane	e6dbcb72fa	Extend GIN to support partial-match searches, and extend tsquery to support prefix matching using this facility. Teodor Sigaev and Oleg Bartunov	2008-05-16 16:31:02 +00:00
Tom Lane	689d02a2e9	Fix a regression test that fails if default_text_search_config isn't 'english'.	2008-01-13 21:17:46 +00:00
Tom Lane	82ca4d0210	Fix attribution for Rime of the Ancient Mariner (obviously it's been too long since freshman English :-()	2007-12-10 00:12:31 +00:00
Tom Lane	71e90b0df2	The E. J. Pratt verse used as a tsearch test case is unfortunately still under copyright in the US and many other places. Substitute a little something from a poet who's more safely dead. Per gripe from Bjorn Munch.	2007-12-09 21:01:18 +00:00
Andrew Dunstan	3de1f0daac	Fix XML tag namespace change inadvertently missed from previous fix. Add regression test for XML names and numeric entities.	2007-11-25 15:37:11 +00:00
Andrew Dunstan	1157f3cc81	Change descriptions of entity and tag objects to "XML entity" and "XML tag". Allow tag and entity names that follow XML rules. Provide for hexadecimal as well as decimal numeric entities. Adjust code names to coincide with new descriptions.	2007-11-20 02:25:22 +00:00
Tom Lane	73e6f9d3b6	Change text search parsing rules for hyphenated words so that digit strings containing decimal points aren't considered part of a hyphenated word. Sync the hyphenated-word lookahead states with the subsequent part-by-part reparsing states so that we don't get different answers about how much text is part of the hyphenated word. Per my gripe of a few days ago.	2007-10-27 19:03:45 +00:00
Tom Lane	d015d08b43	Rename default text search parser's "uri" token type to "url_path", per recommendation from Alvaro. This doesn't force initdb since the numeric token type in the catalogs doesn't change; but note that the expected regression test output changed.	2007-10-27 16:01:09 +00:00
Tom Lane	592c88a0d2	Remove the aggregate form of ts_rewrite(), since it doesn't work as desired if there are zero rows to aggregate over, and the API seems both conceptually and notationally ugly anyway. We should look for something that improves on the tsquery-and-text-SELECT version (which is also pretty ugly but at least it works...), but it seems that will take query infrastructure that doesn't exist today. (Hm, I wonder if there's anything in or near SQL2003 window functions that would help?) Per discussion.	2007-10-24 02:24:49 +00:00
Tom Lane	dbaec70c15	Rename and slightly redefine the default text search parser's "word" categories, as per discussion. asciiword (formerly lword) is still ASCII-letters-only, and numword (formerly word) is still the most general mixed-alpha-and-digits case. But word (formerly nlword) is now any-group-of-letters-with-at-least-one-non-ASCII, rather than all-non-ASCII as before. This is no worse than before for parsing mixed Russian/English text, which seems to have been the design center for the original coding; and it should simplify matters for parsing most European languages. In particular it will not be necessary for any language to accept strings containing digits as being regular "words". The hyphenated-word categories are adjusted similarly.	2007-10-23 20:46:12 +00:00
Tom Lane	12f25e70a6	Fix two-argument form of ts_rewrite() so it actually works for cases where a later rewrite rule should change a subtree modified by an earlier one. Per my gripe of a few days ago.	2007-10-23 01:44:40 +00:00
Tom Lane	93eab9312f	Rename built-in Snowball stemmer dictionaries to be english_stem, russian_stem, etc. Per discussion.	2007-08-25 01:06:25 +00:00
Bruce Momjian	1c36de33b0	Uppercase keywords in regression tsearch test scripts.	2007-08-21 15:41:13 +00:00
Tom Lane	140d4ebcb4	Tsearch2 functionality migrates to core. The bulk of this work is by Oleg Bartunov and Teodor Sigaev, but I did a lot of editorializing, so anything that's broken is probably my fault. Documentation is nonexistent as yet, but let's land the patch so we can get some portability testing done.	2007-08-21 01:11:32 +00:00

25 Commits