53 Commits

Author SHA1 Message Date
Teodor Sigaev
abd8c94ff9 Add prefix support for synonym dictionary 2009-08-14 14:53:20 +00:00
Tom Lane
c30446b9c9 Proofreading for Bruce's recent round of documentation proofreading.
Most of those changes were good, but some not so good ...
2009-06-17 21:58:49 +00:00
Bruce Momjian
ba36c48e39 Proofreading adjustments for first two parts of documentation (Tutorial
and SQL).
2009-04-27 16:27:36 +00:00
Tom Lane
c1c40e580a Fix textsearch documentation examples to not recommend concatenating separate
fields without putting a space between.  Per gripe from Rick Schumeyer.
2009-04-19 20:36:06 +00:00
Tom Lane
bda9dc7e19 Do some copy-editing on description of ts_headline(). 2009-04-14 00:49:56 +00:00
Tom Lane
ff301d6e69 Implement "fastupdate" support for GIN indexes, in which we try to accumulate
multiple index entries in a holding area before adding them to the main index
structure.  This helps because bulk insert is (usually) significantly faster
than retail insert for GIN.

This patch also removes GIN support for amgettuple-style index scans.  The
API defined for amgettuple is difficult to support with fastupdate, and
the previously committed partial-match feature didn't really work with
it either.  We might eventually figure a way to put back amgettuple
support, but it won't happen for 8.4.

catversion bumped because of change in GIN's pg_am entry, and because
the format of GIN indexes changed on-disk (there's a metapage now,
and possibly a pending list).

Teodor Sigaev
2009-03-24 20:17:18 +00:00
Tom Lane
445ce15702 Create a third option named "partition" for constraint_exclusion, and make it
the default.  This setting enables constraint exclusion checks only for
appendrel members (ie, inheritance children and UNION ALL arms), which are
the cases in which constraint exclusion is most likely to be useful.  Avoiding
the overhead for simple queries that are unlikely to benefit should bring
the cost down to the point where this is a reasonable default setting.
Per today's discussion.
2009-01-07 22:40:49 +00:00
Teodor Sigaev
2a0083ede8 Improve headline generation. Now a headline can contain
several fragments, a la Google.

Sushant Sinha <sushant354@gmail.com>
2008-10-17 18:05:19 +00:00
Heikki Linnakangas
61d9674988 Make LC_COLLATE and LC_CTYPE database-level settings. Collation and
ctype are now more like encoding, stored in new datcollate and datctype
columns in pg_database.

This is a stripped-down version of Radek Strnad's patch, with further
changes by me.
2008-09-23 09:20:39 +00:00
Tom Lane
e6dbcb72fa Extend GIN to support partial-match searches, and extend tsquery to support
prefix matching using this facility.

Teodor Sigaev and Oleg Bartunov
2008-05-16 16:31:02 +00:00
Tom Lane
9b5c8d45f6 Push index operator lossiness determination down to GIST/GIN opclass
"consistent" functions, and remove pg_amop.opreqcheck, as per recent
discussion.  The main immediate benefit of this is that we no longer need
8.3's ugly hack of requiring @@@ rather than @@ to test weight-using tsquery
searches on GIN indexes.  In future it should be possible to optimize some
other queries better than is done now, by detecting at runtime whether the
index match is exact or not.

Tom Lane, after an idea of Heikki's, and with some help from Teodor.
2008-04-14 17:05:34 +00:00
Tom Lane
7953fdcd9e Add a CaseSensitive parameter to synonym dictionaries.
Simon Riggs
2008-03-10 03:01:28 +00:00
Bruce Momjian
271205223a Show example of ts_headline() using a configuration name. 2008-03-04 03:17:18 +00:00
Tom Lane
e211470b19 Change a couple of examples to say ALTER MAPPING instead of ADD MAPPING,
per Oleg.
2007-12-13 06:32:47 +00:00
Peter Eisentraut
9293425819 spell checker run 2007-11-28 15:42:31 +00:00
Tom Lane
e66d0c6299 Fix some missed usages of 'HTML tag' and 'HTML entity'. 2007-11-20 15:58:52 +00:00
Andrew Dunstan
1157f3cc81 Change descriptions of entity and tag objects to "XML entity" and "XML tag".
Allow tag and entity names that follow XML rules. Provide for hexadecimal
as well as decimal numeric entities. Adjust code names to coincide with
new descriptions.
2007-11-20 02:25:22 +00:00
Tom Lane
fb8b38e4bf Add a couple of notes pointing out that GIN index build time is very
sensitive to maintenance_work_mem (something I just learned the hard
way).
2007-11-16 03:23:07 +00:00
Tom Lane
a1715ac8f7 Adjust example to reduce confusion between a tsvector column and
an index, per Simon.
2007-11-14 23:48:55 +00:00
Tom Lane
866bad9543 Add a rank/(rank+1) normalization option to ts_rank(). While the usefulness
of this seems a bit marginal, if it's useful enough to be shown in the manual
then we probably ought to support doing it without double evaluation of the
ts_rank function.  Per my proposal earlier today.
2007-11-14 23:43:27 +00:00
Tom Lane
ca450a07ee Add an Accept parameter to "simple" dictionaries. The default of true
gives the old behavior; selecting false allows the dictionary to be used
as a filter ahead of other dictionaries, because it will pass on rather
than accept words that aren't in its stopword list.
Jan Urbanski
2007-11-14 18:36:37 +00:00
Tom Lane
de085820bf Update discussion of tsearch2 migration. I'm not entirely sure about
the division of material between here and the tsearch2 contrib page,
but at least it's not obviously unfinished any more.
2007-11-14 03:26:24 +00:00
Bruce Momjian
d009992ba3 Have text search thesaurus files use "?" for stop words.
Throw an error for actual stop words, rather than a warning.  This fixes
problems with cache reloading causing warning messages.

Re-enable stop words in regression tests;  was disabled by Tom.

Document "?" as API change.
2007-11-10 15:39:34 +00:00
Magnus Hagander
f5f375330e Fix typos.
Guillaume Lelarge
2007-11-05 15:55:53 +00:00
Tom Lane
f9e83a5588 Remove claim that ts_headline knows how to generate multiple ellipsis-separated
excerpts of a document.  That's clearly desirable, but the functionality
is not there yet.
2007-10-29 01:55:11 +00:00
Tom Lane
d015d08b43 Rename default text search parser's "uri" token type to "url_path",
per recommendation from Alvaro.  This doesn't force initdb since the
numeric token type in the catalogs doesn't change; but note that
the expected regression test output changed.
2007-10-27 16:01:09 +00:00
Tom Lane
2aac6f10f6 Minor wording improvements per suggestion from Jeff Davis. Also tweak
hyphenated-word parser examples per earlier discussion with Alvaro.
2007-10-27 00:19:45 +00:00
Alvaro Herrera
0e3ddc8dd5 Use more real-world examples in the text search parser documentation. 2007-10-25 13:06:35 +00:00
Tom Lane
592c88a0d2 Remove the aggregate form of ts_rewrite(), since it doesn't work as desired
if there are zero rows to aggregate over, and the API seems both conceptually
and notationally ugly anyway.  We should look for something that improves
on the tsquery-and-text-SELECT version (which is also pretty ugly but at
least it works...), but it seems that will take query infrastructure that
doesn't exist today.  (Hm, I wonder if there's anything in or near SQL2003
window functions that would help?)  Per discussion.
2007-10-24 02:24:49 +00:00
Tom Lane
dbaec70c15 Rename and slightly redefine the default text search parser's "word"
categories, as per discussion.  asciiword (formerly lword) is still
ASCII-letters-only, and numword (formerly word) is still the most general
mixed-alpha-and-digits case.  But word (formerly nlword) is now
any-group-of-letters-with-at-least-one-non-ASCII, rather than all-non-ASCII as
before.  This is no worse than before for parsing mixed Russian/English text,
which seems to have been the design center for the original coding; and it
should simplify matters for parsing most European languages.  In particular
it will not be necessary for any language to accept strings containing digits
as being regular "words".  The hyphenated-word categories are adjusted
similarly.
2007-10-23 20:46:12 +00:00
Tom Lane
3e17ef1cfa Adjust ts_debug's output as per my proposal of yesterday: show the
active dictionary and its output lexemes as separate columns, instead
of smashing them into one text column, and lowercase the column names.
Also, define the output rowtype using OUT parameters instead of a
composite type, to be consistent with the other built-in functions.
2007-10-22 20:13:37 +00:00
Tom Lane
6088bfb8b6 Create a quick-and-dirty list of known migration issues for pre-8.3
users of tsearch.  This isn't meant to be permanent documentation,
but to call out the areas that need either fixing or real documentation.
2007-10-22 03:37:04 +00:00
Tom Lane
dfc6f130b4 Editorial overhaul for text search documentation. Organize the info
more clearly, improve a lot of unclear descriptions, add some missing
material.  We still need a migration guide though.
2007-10-21 20:04:37 +00:00
Tom Lane
6efae5bf2a Another round of editorialization on the text search documentation.
Notably, standardize on using "token" for the strings output by a parser,
while "lexeme" is reserved for the normalized strings produced by a
dictionary.
2007-10-17 01:01:28 +00:00
Tom Lane
4b21d1f09b Remove obsolete examples of add-on parsers and dictionary templates;
these are more easily and usefully maintained as contrib modules.
Various other wordsmithing, markup improvement, etc.
2007-10-15 21:39:57 +00:00
Neil Conway
f83a9303a6 Minor correction for full-text search limitations docs.
Heikki Linnakangas.
2007-10-10 21:48:22 +00:00
Bruce Momjian
ae36e0d589 Update tsearch include location in example.
Oleg.
2007-09-14 13:21:30 +00:00
Tom Lane
fcc6756341 Sync examples of psql \dF output with current CVS HEAD behavior.
Random other wordsmithing.
2007-09-04 03:46:36 +00:00
Bruce Momjian
45ebcbcc1f Make GIN/GiST text search tertiary index entries in the documentation. 2007-08-31 20:55:57 +00:00
Bruce Momjian
a8b5d6dc26 Place GiST and GIN text search indexes as secondary items under the main
"index" entries for GIN/GiST.
2007-08-31 16:33:36 +00:00
Bruce Momjian
9907b2a74c Again improve text search index entries. 2007-08-31 05:04:03 +00:00
Bruce Momjian
99a01bfd1e In text search docs, properly use indexterm _zone_ only when we want an
entire section, per Peter.
2007-08-31 04:52:29 +00:00
Bruce Momjian
6e832b059e Fix docs so indexes can be built by commenting out GiST/GIN index
entries in textsearch.sgml.
2007-08-31 03:26:27 +00:00
Bruce Momjian
24cba4ee5c Make more logical index sections for text search. 2007-08-30 20:37:26 +00:00
Tatsuo Ishii
c9bfabe24d Fix broken markup. 2007-08-30 01:29:52 +00:00
Bruce Momjian
25188c4f7d Update tsearch documentation wording. 2007-08-29 23:25:47 +00:00
Bruce Momjian
09c29cc57b Text search documentation word improvements; move configuration section
to be more logical.
2007-08-29 21:51:45 +00:00
Bruce Momjian
bb8f629c7a Move full text search operators, functions, and data type sections into
the main documentation, out of its own text search chapter.
2007-08-29 20:37:14 +00:00
Bruce Momjian
f145de27c3 Properly indent SGML in textsearch.sgml. 2007-08-29 02:37:04 +00:00
Bruce Momjian
baf3a134d9 Mention configurations early in text search documentation so the
table/index section makes a little more sense.
2007-08-28 03:10:45 +00:00