Doc: improve documentation about ts_headline() function.

Now that I've had my nose in that code, I thought the docs about
it left something to be desired.
Tom Lane 2020-04-09 15:11:08 -04:00
parent 91be1d1906
commit 7627f64ba2


@@ -1301,64 +1301,75 @@ ts_headline(<optional> <replaceable class="parameter">config</replaceable> <type
 <itemizedlist spacing="compact" mark="bullet">
 <listitem>
 <para>
-<literal>StartSel</literal>, <literal>StopSel</literal>: the strings with
-which to delimit query words appearing in the document, to distinguish
-them from other excerpted words. You must double-quote these strings
-if they contain spaces or commas.
+<literal>MaxWords</literal>, <literal>MinWords</literal> (integers):
+these numbers determine the longest and shortest headlines to output.
+The default values are 35 and 15.
 </para>
 </listitem>
 <listitem>
 <para>
-<literal>MaxWords</literal>, <literal>MinWords</literal>: these numbers
-determine the longest and shortest headlines to output.
+<literal>ShortWord</literal> (integer): words of this length or less
+will be dropped at the start and end of a headline, unless they are
+query terms. The default value of three eliminates common English
+articles.
 </para>
 </listitem>
 <listitem>
 <para>
-<literal>ShortWord</literal>: words of this length or less will be
-dropped at the start and end of a headline. The default
-value of three eliminates common English articles.
-</para>
-</listitem>
-<listitem>
-<para>
-<literal>HighlightAll</literal>: Boolean flag; if
+<literal>HighlightAll</literal> (boolean): if
 <literal>true</literal> the whole document will be used as the
-headline, ignoring the preceding three parameters.
+headline, ignoring the preceding three parameters. The default
+is <literal>false</literal>.
 </para>
 </listitem>
 <listitem>
 <para>
-<literal>MaxFragments</literal>: maximum number of text excerpts
-or fragments to display. The default value of zero selects a
-non-fragment-oriented headline generation method. A value greater than
-zero selects fragment-based headline generation. This method
-finds text fragments with as many query words as possible and
-stretches those fragments around the query words. As a result
-query words are close to the middle of each fragment and have words on
-each side. Each fragment will be of at most <literal>MaxWords</literal> and
-words of length <literal>ShortWord</literal> or less are dropped at the start
-and end of each fragment. If not all query words are found in the
-document, then a single fragment of the first <literal>MinWords</literal>
-in the document will be displayed.
+<literal>MaxFragments</literal> (integer): maximum number of text
+fragments to display. The default value of zero selects a
+non-fragment-based headline generation method. A value greater
+than zero selects fragment-based headline generation (see below).
 </para>
 </listitem>
 <listitem>
 <para>
-<literal>FragmentDelimiter</literal>: When more than one fragment is
-displayed, the fragments will be separated by this string.
+<literal>StartSel</literal>, <literal>StopSel</literal> (strings):
+the strings with which to delimit query words appearing in the
+document, to distinguish them from other excerpted words. The
+default values are <quote><literal>&lt;b&gt;</literal></quote> and
+<quote><literal>&lt;/b&gt;</literal></quote>, which can be suitable
+for HTML output.
+</para>
+</listitem>
+<listitem>
+<para>
+<literal>FragmentDelimiter</literal> (string): When more than one
+fragment is displayed, the fragments will be separated by this string.
+The default is <quote><literal> ... </literal></quote>.
 </para>
 </listitem>
 </itemizedlist>
 These option names are recognized case-insensitively.
-Any unspecified options receive these defaults:
+You must double-quote string values if they contain spaces or commas.
+</para>
-<programlisting>
-StartSel=&lt;b&gt;, StopSel=&lt;/b&gt;,
-MaxWords=35, MinWords=15, ShortWord=3, HighlightAll=FALSE,
-MaxFragments=0, FragmentDelimiter=" ... "
-</programlisting>
+<para>
+In non-fragment-based headline
+generation, <function>ts_headline</function> locates matches for the
+given <replaceable class="parameter">query</replaceable> and chooses a
+single one to display, preferring matches that have more query words
+within the allowed headline length.
+In fragment-based headline generation, <function>ts_headline</function>
+locates the query matches and splits each match
+into <quote>fragments</quote> of no more than <literal>MaxWords</literal>
+words each, preferring fragments with more query words, and when
+possible <quote>stretching</quote> fragments to include surrounding
+words. The fragment-based mode is thus more useful when the query
+matches span large sections of the document, or when it's desirable to
+display multiple matches.
+In either mode, if no query matches can be identified, then a single
+fragment of the first <literal>MinWords</literal> words in the document
+will be displayed.
 </para>
 <para>
@@ -1370,25 +1381,24 @@ SELECT ts_headline('english',
 is to find all documents containing given query terms
 and return them in order of their similarity to the
 query.',
-to_tsquery('query &amp; similarity'));
+to_tsquery('english', 'query &amp; similarity'));
 ts_headline
 ------------------------------------------------------------
-containing given &lt;b&gt;query&lt;/b&gt; terms
-and return them in order of their &lt;b&gt;similarity&lt;/b&gt; to the
+containing given &lt;b&gt;query&lt;/b&gt; terms +
+and return them in order of their &lt;b&gt;similarity&lt;/b&gt; to the+
 &lt;b&gt;query&lt;/b&gt;.
 SELECT ts_headline('english',
-'The most common type of search
-is to find all documents containing given query terms
-and return them in order of their similarity to the
-query.',
-to_tsquery('query &amp; similarity'),
-'StartSel = &lt;, StopSel = &gt;');
+'Search terms may occur
+many times in a document,
+requiring ranking of the search matches to decide which
+occurrences to display in the result.',
+to_tsquery('english', 'search &amp; term'),
+'MaxFragments=10, MaxWords=7, MinWords=3, StartSel=&lt;&lt;, StopSel=&gt;&gt;');
 ts_headline
--------------------------------------------------------
+------------------------------------------------------------
-containing given &lt;query&gt; terms
-and return them in order of their &lt;similarity&gt; to the
-&lt;query&gt;.
+&lt;&lt;Search&gt;&gt; &lt;&lt;terms&gt;&gt; may occur +
+many times ... ranking of the &lt;&lt;search&gt;&gt; matches to decide
 </screen>
 </para>