Commit Graph

99 Commits

Author SHA1 Message Date
Alexander Korotkov f2e403803f Support for INCLUDE attributes in GiST indexes
Similarly to B-tree, GiST index access method gets support of INCLUDE
attributes.  These attributes aren't used for tree navigation and aren't
present in non-leaf pages.  But they are present in leaf pages and can be
fetched during index-only scan.

The point of having INCLUDE attributes in GiST indexes is slightly different
from the point of having them in B-tree.  The main point of INCLUDE attributes
in B-tree is to define UNIQUE constraint over part of attributes enabled for
index-only scan.  In GiST the main point of INCLUDE attributes is to use
index-only scan for attributes, whose data types don't have GiST opclasses.

Discussion: https://postgr.es/m/73A1A452-AD5F-40D4-BD61-978622FF75C1%40yandex-team.ru
Author: Andrey Borodin, with small changes by me
Reviewed-by: Andreas Karlsson
2019-03-10 11:37:17 +03:00
Tom Lane fd582317e1 Sync our Snowball stemmer dictionaries with current upstream.
We haven't touched these since text search functionality landed in core
in 2007 :-(.  While the upstream project isn't a beehive of activity,
they do make additions and bug fixes from time to time.  Update our
copies of these files.

Also update our documentation about how to keep things in sync, since
they're not making distribution tarballs these days.  Fortunately,
their source code turns out to be a breeze to build.

Notable changes:

* The non-UTF8 version of the hungarian stemmer now works in LATIN2
not LATIN1.

* New stemmers have appeared for arabic, indonesian, irish, lithuanian,
nepali, and tamil.  These all work in UTF8, and the indonesian and
irish ones also work in LATIN1.

(There are some new stemmers that I did not incorporate, mainly because
their names don't match the underlying languages, suggesting that they're
not to be considered mainstream.)

Worth noting: the upstream Nepali dictionary was contributed by
Arthur Zakirov.

initdb forced because the contents of snowball_create.sql have
changed.

Still TODO: see about updating the stopword lists.

Arthur Zakirov, minor mods and doc work by me

Discussion: https://postgr.es/m/20180626122025.GA12647@zakirov.localdomain
Discussion: https://postgr.es/m/20180219140849.GA9050@zakirov.localdomain
2018-09-24 17:29:38 -04:00
Peter Eisentraut 0a63f996e0 Change PROCEDURE to FUNCTION in CREATE TRIGGER syntax
Since procedures are now a different thing from functions, change the
CREATE TRIGGER and CREATE EVENT TRIGGER syntax to use FUNCTION in the
clause that specifies the function.  PROCEDURE is still accepted for
compatibility.

pg_dump and ruleutils.c output is not changed yet, because that would
require a change in information_schema.sql and thus a catversion change.

Reported-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Jonathan S. Katz <jonathan.katz@excoventures.com>
2018-08-22 14:44:49 +02:00
Peter Eisentraut a06e56b247 doc: Update redirecting links
Update links that resulted in redirects.  Most are changes from http to
https, but there are also some other minor edits.  (There are still some
redirects where the target URL looks less elegant than the one we
currently have.  I have left those as is.)
2018-07-16 10:48:05 +02:00
Peter Eisentraut 17485552ec doc: Fix some whitespace issues 2018-05-21 11:42:43 -04:00
Teodor Sigaev 1664ae1978 Add websearch_to_tsquery
Error-tolerant conversion function with web-like syntax for search query,
it simplifies  constraining search engine with close to habitual interface for
users.

Bump catalog version

Authors: Victor Drobny, Dmitry Ivanov with editorization by me
Reviewed by: Aleksander Alekseev, Tomas Vondra, Thomas Munro, Aleksandr Parfenov
Discussion: https://www.postgresql.org/message-id/flat/fe931111ff7e9ad79196486ada79e268@postgrespro.ru
2018-04-05 19:55:11 +03:00
Tom Lane fb8697b31a Avoid unnecessary use of pg_strcasecmp for already-downcased identifiers.
We have a lot of code in which option names, which from the user's
viewpoint are logically keywords, are passed through the grammar as plain
identifiers, and then matched to string literals during command execution.
This approach avoids making words into lexer keywords unnecessarily.  Some
places matched these strings using plain strcmp, some using pg_strcasecmp.
But the latter should be unnecessary since identifiers would have been
downcased on their way through the parser.  Aside from any efficiency
concerns (probably not a big factor), the lack of consistency in this area
creates a hazard of subtle bugs due to different places coming to different
conclusions about whether two option names are the same or different.
Hence, standardize on using strcmp() to match any option names that are
expected to have been fed through the parser.

This does create a user-visible behavioral change, which is that while
formerly all of these would work:
	alter table foo set (fillfactor = 50);
	alter table foo set (FillFactor = 50);
	alter table foo set ("fillfactor" = 50);
	alter table foo set ("FillFactor" = 50);
now the last case will fail because that double-quoted identifier is
different from the others.  However, none of our documentation says that
you can use a quoted identifier in such contexts at all, and we should
discourage doing so since it would break if we ever decide to parse such
constructs as true lexer keywords rather than poor man's substitutes.
So this shouldn't create a significant compatibility issue for users.

Daniel Gustafsson, reviewed by Michael Paquier, small changes by me

Discussion: https://postgr.es/m/29405B24-564E-476B-98C0-677A29805B84@yesql.se
2018-01-26 18:25:14 -05:00
Bruce Momjian 255f14183a docs: replace dblink() mention with foreign data mention
Reported-by: steven.winfield@cantabcapital.com

Discussion: https://postgr.es/m/20171031105039.17183.850@wrigleys.postgresql.org
2018-01-12 16:53:33 -05:00
Peter Eisentraut 3c49c6facb Convert documentation to DocBook XML
Since some preparation work had already been done, the only source
changes left were changing empty-element tags like <xref linkend="foo">
to <xref linkend="foo"/>, and changing the DOCTYPE.

The source files are still named *.sgml, but they are actually XML files
now.  Renaming could be considered later.

In the build system, the intermediate step to convert from SGML to XML
is removed.  Everything is build straight from the source files again.
The OpenSP (or the old SP) package is no longer needed.

The documentation toolchain instructions are updated and are much
simpler now.

Peter Eisentraut, Alexander Lakhin, Jürgen Purtz
2017-11-23 09:44:28 -05:00
Peter Eisentraut c29c578908 Don't use SGML empty tags
For DocBook XML compatibility, don't use SGML empty tags (</>) anymore,
replace by the full tag name.  Add a warning option to catch future
occurrences.

Alexander Lakhin, Jürgen Purtz
2017-10-17 15:10:33 -04:00
Peter Eisentraut 44b3230e82 Use lower-case SGML attribute values
for DocBook XML compatibility
2017-10-10 10:15:57 -04:00
Robert Haas 7ada2d31f4 Remove contrib/tsearch2.
This module was intended to ease migrations of applications that used
the pre-8.3 version of text search to the in-core version introduced
in that release.  However, since all pre-8.3 releases of the database
have been out of support for more than 5 years at this point, we
expect that few people are depending on it at this point.  If some
people still need it, nothing prevents it from being maintained as a
separate extension, outside of core.

Discussion: http://postgr.es/m/CA+Tgmob5R8aDHiFRTQsSJbT1oreKg2FOSBrC=2f4tqEH3dOMAg@mail.gmail.com
2017-02-13 11:06:11 -05:00
Tom Lane 89fcea1ace Fix strange behavior (and possible crashes) in full text phrase search.
In an attempt to simplify the tsquery matching engine, the original
phrase search patch invented rewrite rules that would rearrange a
tsquery so that no AND/OR/NOT operator appeared below a PHRASE operator.
But this approach had numerous problems.  The rearrangement step was
missed by ts_rewrite (and perhaps other places), allowing tsqueries
to be created that would cause Assert failures or perhaps crashes at
execution, as reported by Andreas Seltenreich.  The rewrite rules
effectively defined semantics for operators underneath PHRASE that were
buggy, or at least unintuitive.  And because rewriting was done in
tsqueryin() rather than at execution, the rearrangement was user-visible,
which is not very desirable --- for example, it might cause unexpected
matches or failures to match in ts_rewrite.

As a somewhat independent problem, the behavior of nested PHRASE operators
was only sane for left-deep trees; queries like "x <-> (y <-> z)" did not
behave intuitively at all.

To fix, get rid of the rewrite logic altogether, and instead teach the
tsquery execution engine to manage AND/OR/NOT below a PHRASE operator
by explicitly computing the match location(s) and match widths for these
operators.

This requires introducing some additional fields into the publicly visible
ExecPhraseData struct; but since there's no way for third-party code to
pass such a struct to TS_phrase_execute, it shouldn't create an ABI problem
as long as we don't move the offsets of the existing fields.

Another related problem was that index searches supposed that "!x <-> y"
could be lossily approximated as "!x & y", which isn't correct because
the latter will reject, say, "x q y" which the query itself accepts.
This required some tweaking in TS_execute_ternary along with the main
tsquery engine.

Back-patch to 9.6 where phrase operators were introduced.  While this
could be argued to change behavior more than we'd like in a stable branch,
we have to do something about the crash hazards and index-vs-seqscan
inconsistency, and it doesn't seem desirable to let the unintuitive
behaviors induced by the rewriting implementation stand as precedent.

Discussion: https://postgr.es/m/28215.1481999808@sss.pgh.pa.us
Discussion: https://postgr.es/m/26706.1482087250@sss.pgh.pa.us
2016-12-21 15:18:39 -05:00
Tom Lane d5d8a0b7e5 Doc: remove obsolete example.
The documentation for ts_headline() recommends using a sub-select to
avoid extra evaluations of ts_headline() in a query with ORDER BY+LIMIT.
Since commit 9118d03a8 this contortionism is unnecessary, so remove the
recommendation.  Noted by Oleg Bartunov.

Discussion: <CAF4Au4w6rrH_j1bvVhzpOsRiHCog7sGJ3LSX0tY8ZdwhHT88LQ@mail.gmail.com>
2016-11-13 13:12:35 -05:00
Bruce Momjian ca9cb940d2 doc: more replacement of <literal> with something better
Reported-by: Alexander Law

Author: Alexander Law

Backpatch-through: 9.6
2016-08-24 21:11:44 -04:00
Peter Eisentraut 5676da2d01 Documentation spell checking and markup improvements 2016-07-28 22:46:15 -04:00
Tom Lane 4242a715c3 Adjust text search documentation for recent commits.
Fix some now-obsolete statements that were overlooked in commits
6734a1cac, 3dbbd0f02, 028350f61.  Document the behavior of <0>.
Also do a little bit of rearranging and copy-editing for clarity.
2016-06-29 15:00:33 -04:00
Teodor Sigaev 73e6bea603 Document precedence of FTS operators in tsquery
Oleg Bartunov
2016-06-29 17:59:36 +03:00
Teodor Sigaev 028350f619 Make exact distance match for FTS phrase operator
Phrase operator now requires exact distance betweens lexems instead of
less-or-equal.

Per discussion c19fcfec308e6ccd952cdde9e648b505@mail.gmail.com
2016-06-27 20:41:00 +03:00
Tom Lane 6581e930a8 Polish the documentation concerning phrase text search.
Fix grammar, improve examples, etc.

I did not attempt to document the current behavior concerning distance-zero
matches, because I think that's broken and needs to change, so I'm not
going to use up brain cells figuring out how to explain how it works now.
One way or the other, there's still more to write here.
2016-06-09 00:30:59 -04:00
Tom Lane 0b9a234432 Rename tsvector delete() to ts_delete(), and filter() to ts_filter().
The similarity of the original names to SQL keywords seems like a bad
idea.  Rename them before we're stuck with 'em forever.

In passing, minor code and docs cleanup.

Discussion: <4875.1462210058@sss.pgh.pa.us>
2016-05-05 19:43:32 -04:00
Teodor Sigaev f1e3c76066 Fix tsearch docs
Remove mention of setweight(tsquery) which wasn't included in 9.6. Also
replace old forgotten phrase operator to new one.

Dmitry Ivanov
2016-04-26 20:26:26 +03:00
Teodor Sigaev bb140506df Phrase full text search.
Patch introduces new text search operator (<-> or <DISTANCE>) into tsquery.
On-disk and binary in/out format of tsquery are backward compatible.
It has two side effect:
- change order for tsquery, so, users, who has a btree index over tsquery,
  should reindex it
- less number of parenthesis in tsquery output, and tsquery becomes more
  readable

Authors: Teodor Sigaev, Oleg Bartunov, Dmitry Ivanov
Reviewers: Alexander Korotkov, Artur Zakirov
2016-04-07 18:44:18 +03:00
Teodor Sigaev 6943a946c7 Tsvector editing functions
Adds several tsvector editting function: convert tsvector to/from text array,
set weight for given lexemes, delete lexeme(s), unnest, filter lexemes
with given weights

Author: Stas Kelvich with some editorization by me
Reviewers: Tomas Vondram, Teodor Sigaev
2016-03-11 19:22:36 +03:00
Teodor Sigaev d78a7d9c7f Improve support of Hunspell in ispell dictionary.
Now it's possible to load recent version of Hunspell for several languages.
To handle these dictionaries Hunspell patch adds support for:
* FLAG long - sets the double extended ASCII character flag type
* FLAG num - sets the decimal number flag type (from 1 to 65535)
* AF parameter - alias for flag's set

Also it moves test dictionaries into separate directory.

Author: Artur Zakirov with editorization by me
2016-03-04 20:08:47 +03:00
Bruce Momjian 6d8b2aa83a docs: update guidelines on when to use GIN and GiST indexes
Report by Tomas Vondra

Backpatch through 9.5
2015-10-05 13:38:36 -04:00
Teodor Sigaev a1c44e1af6 Update site address of Snowball project 2015-09-07 15:20:45 +03:00
Bruce Momjian f6d65f0c70 docs: consistently uppercase index method and add spacing
Consistently uppercase index method names, e.g. GIN, and add space after
the index method name and the parentheses enclosing the column names.
2015-05-15 11:42:34 -04:00
Kevin Grittner 05258761bf doc: Various typo/grammar fixes
Errors detected using Topy (https://github.com/intgr/topy), all
changes verified by hand and some manual tweaks added.

Marti Raudsepp

Individual changes backpatched, where applicable, as far as 9.0.
2014-08-30 10:52:36 -05:00
Peter Eisentraut 3a9d430af5 doc: Fix DocBook XML validity
The main problem is that DocBook SGML allows indexterm elements just
about everywhere, but DocBook XML is stricter.  For example, this common
pattern

    <varlistentry>
     <indexterm>...</indexterm>
     <term>...</term>
     ...
    </varlistentry>

needs to be changed to something like

    <varlistentry>
     <term>...<indexterm>...</indexterm></term>
     ...
    </varlistentry>

See also bb4eefe7bf.

There is currently nothing in the build system that enforces that things
stay valid, because that requires additional tools and will receive
separate consideration.
2014-05-06 21:28:58 -04:00
Bruce Momjian 0b5c0f3bc7 docs: Add short "cover density" description
Also, previous commit 1420f3a982 to fix
ts_rank_cd() for stripped lexemes was from a patch created by Alex Hill.
2014-03-24 15:46:59 -04:00
Bruce Momjian 1420f3a982 Fix ts_rank_cd() to ignore stripped lexemes
Previously, stripped lexemes got a default location and could be
considered if mixed with non-stripped lexemes.

BACKWARD INCOMPATIBILITY CHANGE
2014-03-24 14:37:16 -04:00
Peter Eisentraut c99d5d1bcc doc: Fix <synopsis> in <term> markup
Although the DTD technically allows this, the resulting HTML is invalid
because it puts block elements inside inline elements.  DocBook 5.0 also
doesn't allow it anymore, so it's fair to assume that this was never
really intended to work.  Replace <synopsis> with <literal>, which is
the markup used elsewhere in the documentation in similar cases.
2013-06-07 22:00:59 -04:00
Kevin Grittner 4bc0d2e2cf Fix typo: lexemes misspelled in full text search docs.
Dan Scott
2012-09-11 19:46:17 -05:00
Peter Eisentraut 5baf6da717 Documentation spell and markup checking 2012-06-08 00:06:20 +03:00
Bruce Momjian fb4340c5ea Update documentation about ts_rank(). 2011-10-13 14:17:20 -04:00
Peter Eisentraut aeabbccea0 Some markup cleanup to deconfuse the find_gt_lt tool
Josh Kupershmidt
2011-08-30 20:32:49 +03:00
Peter Eisentraut a3b681f0bc Link some tables into the surrounding text by their id 2011-05-04 20:24:07 +03:00
Bruce Momjian 159e3d8629 Update contrib documention mentions to point to actual documentation
sections, rather than just calling it "/contrib/module_name".

Also update pg_test_fsync build instructions now that it is in /contrib.
2011-01-26 09:22:21 -05:00
Magnus Hagander 9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Tom Lane 9389ac8928 Document filtering dictionaries in textsearch.sgml.
While at it, copy-edit the description of prefix-match marker support in
synonym dictionaries, and clarify the description of the default unaccent
dictionary a bit more.
2010-08-25 21:42:55 +00:00
Tom Lane 5344945810 Avoid saying "random" when randomness is not actually meant.
Per Thom Brown.
2010-08-20 13:59:45 +00:00
Peter Eisentraut 66424a2848 Fix indentation of verbatim block elements
Block elements with verbatim formatting (literallayout, programlisting,
screen, synopsis) should be aligned at column 0 independent of the surrounding
SGML, because whitespace is significant, and indenting them creates erratic
whitespace in the output.  The CSS stylesheets already take care of indenting
the output.

Assorted markup improvements to go along with it.
2010-07-29 19:34:41 +00:00
Peter Eisentraut 6dcce3985b Remove unnecessary xref endterm attributes and title ids
The endterm attribute is mainly useful when the toolchain does not support
automatic link target text generation for a particular situation.  In  the
past, this was required by the man page tools for all reference page links,
but that is no longer the case, and it now actually gets in the way of
proper automatic link text generation.  The only remaining use cases are
currently xrefs to refsects.
2010-04-03 07:23:02 +00:00
Peter Eisentraut a95e51962d Update broken and permanently moved links 2010-03-17 17:12:31 +00:00
Bruce Momjian 5473df9eb7 Document what user name email symbols are supported by tsearch. 2010-03-13 03:09:04 +00:00
Teodor Sigaev abd8c94ff9 Add prefix support for synonym dictionary 2009-08-14 14:53:20 +00:00
Tom Lane c30446b9c9 Proofreading for Bruce's recent round of documentation proofreading.
Most of those changes were good, but some not so good ...
2009-06-17 21:58:49 +00:00
Bruce Momjian ba36c48e39 Proofreading adjustments for first two parts of documentation (Tutorial
and SQL).
2009-04-27 16:27:36 +00:00
Tom Lane c1c40e580a Fix textsearch documentation examples to not recommend concatenating separate
fields without putting a space between.  Per gripe from Rick Schumeyer.
2009-04-19 20:36:06 +00:00