Fix ts_headline() edge cases for empty query and empty search text.

tsquery's GETQUERY() macro is only safe to apply to a tsquery
that is known non-empty; otherwise it gives a pointer to garbage.
Before commit 5a617d75d, ts_headline() avoided this pitfall, but
only in a very indirect, nonobvious way.  (hlCover could not reach
its TS_execute call, because if the query contains no lexemes
then hlFirstIndex would surely return -1.)  After that commit,
it fell into the trap, resulting in weird errors such as
"unrecognized operator" and/or valgrind complaints.  In HEAD,
fix this by not calling TS_execute_locations() at all for an
empty query.  In the back branches, add a defensive check to
hlCover() --- that's not fixing any live bug, but I judge the
code a bit too fragile as-is.
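
A minimal sketch of the defensive check described above, using simplified stand-in types rather than PostgreSQL's real definitions. ToyQuery, toy_get_items() and toy_cover() are hypothetical names invented for illustration; only the "size <= 0 means empty, so return false before touching the items" pattern is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tsquery: 'size' items follow the header. */
typedef struct
{
	int			size;		/* number of query items; 0 means empty query */
	int			items[4];	/* only the first 'size' entries are meaningful */
} ToyQuery;

/* Hypothetical analogue of GETQUERY(): result is only usable if size > 0. */
static const int *
toy_get_items(const ToyQuery *q)
{
	return q->items;
}

/* Returns true if any query item equals 'word'. */
static bool
toy_cover(const ToyQuery *q, int word)
{
	assert(q != NULL);

	if (q->size <= 0)
		return false;		/* empty query matches nothing */

	/* Safe to fetch the item pointer only after the emptiness check. */
	const int  *items = toy_get_items(q);

	for (int i = 0; i < q->size; i++)
	{
		if (items[i] == word)
			return true;
	}
	return false;
}
```

The point of placing the check first is that no later code path can reach the item pointer for an empty query, so correctness no longer depends on an unrelated function happening to bail out earlier.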

Also, both mark_hl_fragments() and mark_hl_words() were careless
about the possibility of empty search text: in the cases where
no match has been found, they'd end up telling mark_fragment() to
mark from word indexes 0 to 0 inclusive, even when there is no
word 0.  This is harmless since we over-allocated the prs->words
array, but it does annoy valgrind.  Fix so that the end index is -1
and thus mark_fragment() will do nothing in such cases.
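
The inclusive-range issue can be sketched as follows, again with hypothetical stand-ins (ToyText, toy_mark_fragment(), toy_fallback_end()) rather than the real headline code, and with the simplifying assumption that every token counts as a word. With an inclusive end index, starting the end at 0 always marks word 0; starting it at -1 makes the range 0..-1 empty when there are no words.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_WORDS 8

/* Hypothetical stand-in for the parsed text. */
typedef struct
{
	int			curwords;	/* number of valid words */
	bool		marked[TOY_MAX_WORDS];
} ToyText;

/* Mark words startpos..endpos inclusive; returns how many were marked. */
static int
toy_mark_fragment(ToyText *t, int startpos, int endpos)
{
	int			n = 0;

	assert(t != NULL && startpos >= 0 && endpos < TOY_MAX_WORDS);

	for (int i = startpos; i <= endpos; i++)
	{
		t->marked[i] = true;
		n++;
	}
	return n;
}

/* Fallback when nothing matched: end index of the first min_words words. */
static int
toy_fallback_end(const ToyText *t, int min_words)
{
	int			endpos = -1;	/* -1 so an empty text gives an empty range */

	for (int i = 0; i < t->curwords && i < min_words; i++)
		endpos = i;
	return endpos;
}
```

For an empty text the loop in toy_fallback_end() never runs, so the end index stays at -1 and toy_mark_fragment() touches no array element.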

Bottom line is that this fixes a live bug in HEAD, but in the
back branches it's only getting rid of a valgrind nitpick.
Back-patch anyway.

Per report from Alexander Lakhin.

Discussion: https://postgr.es/m/c27f642d-020b-01ff-ae61-086af287c4fd@gmail.com
Tom Lane 2023-04-06 15:52:37 -04:00
parent eac34f7eb3
commit bc428b12ac
3 changed files with 33 additions and 2 deletions


@@ -2046,6 +2046,9 @@ hlCover(HeadlineParsedText *prs, TSQuery query, int max_cover,
nextpmax;
hlCheck ch;
+if (query->size <= 0)
+return false; /* empty query matches nothing */
/*
* We look for the earliest, shortest substring of prs->words that
* satisfies the query. Both the pmin and pmax indices must be words
@@ -2350,7 +2353,8 @@ mark_hl_fragments(HeadlineParsedText *prs, TSQuery query, bool highlightall,
/* show the first min_words words if we have not marked anything */
if (num_f <= 0)
{
-startpos = endpos = curlen = 0;
+startpos = curlen = 0;
+endpos = -1;
for (i = 0; i < prs->curwords && curlen < min_words; i++)
{
if (!NONWORDTOKEN(prs->words[i].type))
@@ -2505,7 +2509,7 @@ mark_hl_words(HeadlineParsedText *prs, TSQuery query, bool highlightall,
if (bestlen < 0)
{
curlen = 0;
-pose = 0;
+pose = -1;
for (i = 0; i < prs->curwords && curlen < min_words; i++)
{
if (!NONWORDTOKEN(prs->words[i].type))


@@ -1515,6 +1515,27 @@ to_tsquery('english','Lorem') && phraseto_tsquery('english','ullamcorper urna'),
<b>Lorem</b> ipsum <b>urna</b>. Nullam nullam <b>ullamcorper</b> <b>urna</b>
(1 row)
+-- Edge cases with empty query
+SELECT ts_headline('english',
+'', ''::tsquery);
+NOTICE: text-search query doesn't contain lexemes: ""
+LINE 2: '', ''::tsquery);
+^
+ts_headline
+-------------
+(1 row)
+SELECT ts_headline('english',
+'foo bar', ''::tsquery);
+NOTICE: text-search query doesn't contain lexemes: ""
+LINE 2: 'foo bar', ''::tsquery);
+^
+ts_headline
+-------------
+foo bar
+(1 row)
--Rewrite sub system
CREATE TABLE test_tsquery (txtkeyword TEXT, txtsample TEXT);
\set ECHO none


@@ -451,6 +451,12 @@ SELECT ts_headline('english',
to_tsquery('english','Lorem') && phraseto_tsquery('english','ullamcorper urna'),
'MaxFragments=100, MaxWords=100, MinWords=1');
+-- Edge cases with empty query
+SELECT ts_headline('english',
+'', ''::tsquery);
+SELECT ts_headline('english',
+'foo bar', ''::tsquery);
--Rewrite sub system
CREATE TABLE test_tsquery (txtkeyword TEXT, txtsample TEXT);