Commit Graph

61 Commits

Author SHA1 Message Date
Tom Lane f10eab73df Make array_to_tsvector() sort and de-duplicate the given strings.
This is required for the result to be a legal tsvector value.
Noted while fooling with Andreas Seltenreich's ts_delete() crash.

Discussion: <87invhoj6e.fsf@credativ.de>
2016-08-05 16:09:06 -04:00
Tom Lane c50d192ce3 Fix ts_delete(tsvector, text[]) to cope with duplicate array entries.
Such cases either failed an Assert, or produced a corrupt tsvector in
non-Assert builds, as reported by Andreas Seltenreich.  The reason is
that tsvector_delete_by_indices() just assumed that its input array had
no duplicates.  Fix by explicitly de-duping.

In passing, improve some comments, and fix a number of tests for null
values to use ERRCODE_NULL_VALUE_NOT_ALLOWED not
ERRCODE_INVALID_PARAMETER_VALUE.

Discussion: <87invhoj6e.fsf@credativ.de>
2016-08-05 15:14:19 -04:00
Tom Lane 33fe7360af Re-pgindent tsvector_op.c.
Messed up by recent commits --- this is annoying me while trying to fix
some bugs here.
2016-08-05 15:14:19 -04:00
Teodor Sigaev 3dbbd0f02a Do not fallback to AND for FTS phrase operator.
If there is no positional information of lexemes then phrase operator will not
fallback to AND operator. This change makes needing to modify TS_execute()
interface, because somewhere (in indexes, for example) positional information
is unaccesible and in this cases we need to force fallback to AND.

Per discussion c19fcfec308e6ccd952cdde9e648b505@mail.gmail.com
2016-06-27 20:47:32 +03:00
Teodor Sigaev 028350f619 Make exact distance match for FTS phrase operator
Phrase operator now requires exact distance betweens lexems instead of
less-or-equal.

Per discussion c19fcfec308e6ccd952cdde9e648b505@mail.gmail.com
2016-06-27 20:41:00 +03:00
Robert Haas 4bc424b968 pgindent run for 9.6 2016-06-09 18:02:36 -04:00
Tom Lane 0b9a234432 Rename tsvector delete() to ts_delete(), and filter() to ts_filter().
The similarity of the original names to SQL keywords seems like a bad
idea.  Rename them before we're stuck with 'em forever.

In passing, minor code and docs cleanup.

Discussion: <4875.1462210058@sss.pgh.pa.us>
2016-05-05 19:43:32 -04:00
Teodor Sigaev 4bbc1a7ea3 Fix crash of filter(tsvector)
Variable storing a position of lexeme, had a wrong type: char, it's
obviously not enough to store 2^14 possible positions.

Stas Kelvich
2016-05-04 17:58:08 +03:00
Robert Haas 8826d85078 Tweak a few more things in preparation for upcoming pgindent run.
These adjustments adjust code and comments in minor ways to prevent
pgindent from mangling them.  Among other things, I tried to avoid
situations where pgindent would emit "a +b" instead of "a + b", and I
tried to avoid having it break up inline comments across multiple
lines.
2016-05-03 10:52:25 -04:00
Teodor Sigaev 4e55b3f033 Rename comparePos() to compareWordEntryPos()
Rename comparePos() to compareWordEntryPos() to prevent export of too
generic name.

Per gripe from Tom Lane.
2016-04-08 12:04:15 +03:00
Teodor Sigaev bb140506df Phrase full text search.
Patch introduces new text search operator (<-> or <DISTANCE>) into tsquery.
On-disk and binary in/out format of tsquery are backward compatible.
It has two side effect:
- change order for tsquery, so, users, who has a btree index over tsquery,
  should reindex it
- less number of parenthesis in tsquery output, and tsquery becomes more
  readable

Authors: Teodor Sigaev, Oleg Bartunov, Dmitry Ivanov
Reviewers: Alexander Korotkov, Artur Zakirov
2016-04-07 18:44:18 +03:00
Tom Lane 23a27b039d Widen query numbers-of-tuples-processed counters to uint64.
This patch widens SPI_processed, EState's es_processed field, PortalData's
portalPos field, FuncCallContext's call_cntr and max_calls fields,
ExecutorRun's count argument, PortalRunFetch's result, and the max number
of rows in a SPITupleTable to uint64, and deals with (I hope) all the
ensuing fallout.  Some of these values were declared uint32 before, and
others "long".

I also removed PortalData's posOverflow field, since that logic seems
pretty useless given that portalPos is now always 64 bits.

The user-visible results are that command tags for SELECT etc will
correctly report tuple counts larger than 4G, as will plpgsql's GET
GET DIAGNOSTICS ... ROW_COUNT command.  Queries processing more tuples
than that are still not exactly the norm, but they're becoming more
common.

Most values associated with FETCH/MOVE distances, such as PortalRun's count
argument and the count argument of most SPI functions that have one, remain
declared as "long".  It's not clear whether it would be worth promoting
those to int64; but it would definitely be a large dollop of additional
API churn on top of this, and it would only help 32-bit platforms which
seem relatively less likely to see any benefit.

Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
2016-03-12 16:05:29 -05:00
Teodor Sigaev b1fdc727c3 Fix Windows build broken in 6943a946c7
Also it fixes dynamic array allocation disallowed by ANSI-C.

Author: Stas Kelvich
2016-03-11 20:10:20 +03:00
Teodor Sigaev 6943a946c7 Tsvector editing functions
Adds several tsvector editting function: convert tsvector to/from text array,
set weight for given lexemes, delete lexeme(s), unnest, filter lexemes
with given weights

Author: Stas Kelvich with some editorization by me
Reviewers: Tomas Vondram, Teodor Sigaev
2016-03-11 19:22:36 +03:00
Bruce Momjian ee94300446 Update copyright for 2016
Backpatch certain files through 9.1
2016-01-02 13:33:40 -05:00
Teodor Sigaev d63a1720fa Add header forgotten in 213335c145
Report from Peter Eisentraut
2015-09-18 14:32:09 +03:00
Teodor Sigaev 9acb9007de Fix oversight in tsearch type check
Use IsBinaryCoercible() method instead of custom
is_expected_type/is_text_type functions which was introduced when tsearch2
was moved into core.

Per report by David E. Wheeler
Analysis by Tom Lane
Patch by me
2015-09-17 19:50:51 +03:00
Heikki Linnakangas 4fc72cc7bb Collection of typo fixes.
Use "a" and "an" correctly, mostly in comments. Two error messages were
also fixed (they were just elogs, so no translation work required). Two
function comments in pg_proc.h were also fixed. Etsuro Fujita reported one
of these, but I found a lot more with grep.

Also fix a few other typos spotted while grepping for the a/an typos.
For example, "consists out of ..." -> "consists of ...". Plus a "though"/
"through" mixup reported by Euler Taveira.

Many of these typos were in old code, which would be nice to backpatch to
make future backpatching easier. But much of the code was new, and I didn't
feel like crafting separate patches for each branch. So no backpatching.
2015-05-20 16:56:22 +03:00
Tom Lane 2e211211a7 Use FLEXIBLE_ARRAY_MEMBER in a number of other places.
I think we're about done with this...
2015-02-21 16:12:14 -05:00
Bruce Momjian 4baaf863ec Update copyright for 2015
Backpatch certain files through 9.0
2015-01-06 11:43:47 -05:00
Bruce Momjian 7e04792a1c Update copyright for 2014
Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.
2014-01-07 16:05:30 -05:00
Bruce Momjian bd61a623ac Update copyrights for 2013
Fully update git head, and update back branches in ./COPYRIGHT and
legal.sgml files.
2013-01-01 17:15:01 -05:00
Peter Eisentraut b8b2e3b2de Replace int2/int4 in C code with int16/int32
The latter was already the dominant use, and it's preferable because
in C the convention is that intXX means XX bits.  Therefore, allowing
mixed use of int2, int4, int8, int16, int32 is obviously confusing.

Remove the typedefs for int2 and int4 for now.  They don't seem to be
widely used outside of the PostgreSQL source tree, and the few uses
can probably be cleaned up by the time this ships.
2012-06-25 01:51:46 +03:00
Bruce Momjian 927d61eeff Run pgindent on 9.2 source tree in preparation for first 9.3
commit-fest.
2012-06-10 15:20:04 -04:00
Bruce Momjian e126958c2e Update copyright notices for year 2012. 2012-01-01 18:01:58 -05:00
Bruce Momjian 6416a82a62 Remove unnecessary #include references, per pgrminclude script. 2011-09-01 10:04:27 -04:00
Tom Lane 00eb036c11 Fix potential memory clobber in tsvector_concat().
tsvector_concat() allocated its result workspace using the "conservative"
estimate of the sum of the two input tsvectors' sizes.  Unfortunately that
wasn't so conservative as all that, because it supposed that the number of
pad bytes required could not grow.  Which it can, as per test case from
Jesper Krogh, if there's a mix of lexemes with positions and lexemes
without them in the input data.  The fix is to assume that we might add
a not-previously-present pad byte for each and every lexeme in the two
inputs; which really is conservative, but it doesn't seem worthwhile to
try to be more precise.

This is an aboriginal bug in tsvector_concat, so back-patch to all
versions containing it.
2011-08-26 16:51:34 -04:00
Alvaro Herrera b93f5a5673 Move Trigger and TriggerDesc structs out of rel.h into a new reltrigger.h
This lets us stop including rel.h into execnodes.h, which is a widely
used header.
2011-07-04 14:35:58 -04:00
Bruce Momjian bf50caf105 pgindent run before PG 9.1 beta 1. 2011-04-10 11:42:00 -04:00
Tom Lane 52fd2d65a3 Fix up core tsquery GIN support for new extractQuery API.
No need for the empty-prefix-match kluge to force a full scan anymore.
2011-01-09 14:34:50 -05:00
Bruce Momjian 5d950e3b0c Stamp copyrights for year 2011. 2011-01-01 13:18:15 -05:00
Robert Haas 5aa446c961 Cleanup various comparisons with the constant "true".
Itagaki Takahiro, with slight modifications.
2010-11-14 21:03:48 -05:00
Tom Lane caaf2e8469 Fix sloppy usage of TRIGGER_FIRED_BEFORE/TRIGGER_FIRED_AFTER.
Various places were testing TRIGGER_FIRED_BEFORE() where what they really
meant was !TRIGGER_FIRED_AFTER(), or vice versa.  This needs to be cleaned
up because there are about to be more than two possible states.

We might want to note this in the 9.1 release notes as something for
trigger authors to double-check.

For consistency's sake I also changed some places that assumed that
TRIGGER_FIRED_FOR_ROW and TRIGGER_FIRED_FOR_STATEMENT are necessarily
mutually exclusive; that's not in immediate danger of breaking, but
it's still sloppier than it should be.

Extracted from Dean Rasheed's patch for triggers on views.  I'm committing
this separately since it's an identifiable separate issue, and is the
only reason for the patch to touch most of these particular files.
2010-10-08 13:27:31 -04:00
Magnus Hagander 9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Robert Haas fd1843ff89 Standardize get_whatever_oid functions for other object types.
- Rename TSParserGetPrsid to get_ts_parser_oid.
- Rename TSDictionaryGetDictid to get_ts_dict_oid.
- Rename TSTemplateGetTmplid to get_ts_template_oid.
- Rename TSConfigGetCfgid to get_ts_config_oid.
- Rename FindConversionByName to get_conversion_oid.
- Rename GetConstraintName to get_constraint_oid.
- Add new functions get_opclass_oid, get_opfamily_oid, get_rewrite_oid,
  get_rewrite_oid_without_relid, get_trigger_oid, and get_cast_oid.

The name of each function matches the corresponding catalog.

Thanks to KaiGai Kohei for the review.
2010-08-05 15:25:36 +00:00
Bruce Momjian 0239800893 Update copyright for the year 2010. 2010-01-02 16:58:17 +00:00
Tom Lane b140711643 Fix ts_stat's failure on empty tsvector.
Also insert a couple of Asserts that check for stack overflow.
Bogus coding appears to be new in 8.4 --- older releases had a much
simpler algorithm here.  Per bug #5111.
2009-10-13 14:33:14 +00:00
Peter Eisentraut de160e2c00 Make backend header files C++ safe
This alters various incidental uses of C++ key words to use other similar
identifiers, so that a C++ compiler won't choke outright.  You still
(probably) need extern "C" { }; around the inclusion of backend headers.

based on a patch by Kurt Harriman <harriman@acm.org>

Also add a script cpluspluscheck to check for C++ compatibility in the
future.  As of right now, this passes without error for me.
2009-07-16 06:33:46 +00:00
Bruce Momjian d747140279 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list
provided by Andrew.
2009-06-11 14:49:15 +00:00
Peter Eisentraut 9873db6646 Fix compiler warnings on Sun Studio of the sort
"tsquery_op.c", line 193: warning: syntax error:  empty declaration

Zdenek Kotala
2009-05-27 19:41:58 +00:00
Tom Lane 1cfd9e8834 Fix executor/spi.h to follow our usual conventions for include files, ie,
not include postgres.h nor anything else it doesn't directly need.  Add
#includes to calling files as needed to compensate.  Per my proposal of
yesterday.

This should be noted as a source code change in the 8.4 release notes,
since it's likely to require changes in add-on modules.
2009-01-07 13:44:37 +00:00
Bruce Momjian 511db38ace Update copyright for 2009. 2009-01-01 17:24:05 +00:00
Teodor Sigaev 26e6c896c9 Fix compiler warning "res may be used uninitialized in this function".
Actually, it can't but some compilers are not smart enough.
Per Peter Eisentraut gripe.
2008-11-19 10:23:21 +00:00
Teodor Sigaev 25ca5a9a54 Replace plain-memory ordered array by binary tree in ts_stat() function.
Performance is increased from 50% up to 10^3 times depending on data.
2008-11-17 12:17:09 +00:00
Alvaro Herrera a44564b4f8 Fix a case of string building. 2008-11-10 21:49:16 +00:00
Tom Lane e6dbcb72fa Extend GIN to support partial-match searches, and extend tsquery to support
prefix matching using this facility.

Teodor Sigaev and Oleg Bartunov
2008-05-16 16:31:02 +00:00
Tom Lane 635aaab278 Fix tsvector_update_trigger() to be domain-friendly: it needs to allow all
the columns it works with to be domains over the expected type, not just
exactly the expected type.  In passing, fix ts_stat() the same way.
Per report from Markus Wollny.
2008-04-08 18:20:29 +00:00
Tom Lane 220db7ccd8 Simplify and standardize conversions between TEXT datums and ordinary C
strings.  This patch introduces four support functions cstring_to_text,
cstring_to_text_with_len, text_to_cstring, and text_to_cstring_buffer, and
two macros CStringGetTextDatum and TextDatumGetCString.  A number of
existing macros that provided variants on these themes were removed.

Most of the places that need to make such conversions now require just one
function or macro call, in place of the multiple notational layers that used
to be needed.  There are no longer any direct calls of textout or textin,
and we got most of the places that were using handmade conversions via
memcpy (there may be a few still lurking, though).

This commit doesn't make any serious effort to eliminate transient memory
leaks caused by detoasting toasted text objects before they reach
text_to_cstring.  We changed PG_GETARG_TEXT_P to PG_GETARG_TEXT_PP in a few
places where it was easy, but much more could be done.

Brendan Jurd and Tom Lane
2008-03-25 22:42:46 +00:00
Bruce Momjian 910bc51862 When text search string is too long, in error message report actual and
maximum number of bytes allowed.
2008-03-05 15:50:37 +00:00
Bruce Momjian 9098ab9e32 Update copyrights in source tree to 2008. 2008-01-01 19:46:01 +00:00