Commit Graph

11 Commits

Author SHA1 Message Date
Neil Conway 1ac9f0e9f7 The attached patch implements the soundex difference function which
compares two strings' soundex values for similarity, from Kris Jurka.
Also mark the text_soundex() function as STRICT, to avoid crashing
on NULL input.
2005-01-26 08:04:04 +00:00
Bruce Momjian 2daed8c5b3 Update copyrights that were missed. 2005-01-01 05:43:09 +00:00
Bruce Momjian da9a8649d8 Update copyright to 2004. 2004-08-29 04:13:13 +00:00
Tom Lane 2f9c859ea1 Fix some copyright notices that weren't updated. Improve copyright tool
so it won't miss 'em again.
2003-08-04 23:59:41 +00:00
Bruce Momjian 7b1f6ffaab Jim C. Nasby wrote:
> Second argument to metaphone is suposed to set the limit on the
> number of characters to return, but it breaks on some phrases:
>
> usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
> (select 'Hello world'::varchar AS a) a;
> HLW       | HLWR      | HLWRLT
>
> usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
> (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
  > AKM       | AKMKS     | AKMKSMMRL
>
> In every case I've found that does this, the 4th and 5th letters are
> always 'KS'.

Nice catch.

There was a bug in the original metaphone algorithm from CPAN. Patch
attached (while I was at it I updated my email address, changed the
copyright to PGDG, and removed an unnecessary palloc). Here's how it
looks now:

regression=# select metaphone(a,4) from (select 'A A COMEAUX
MEMORIAL'::varchar AS a) a;
   metaphone
-----------
   AKMK
(1 row)

regression=# select metaphone(a,5) from (select 'A A COMEAUX
MEMORIAL'::varchar AS a) a;
   metaphone
-----------
   AKMKS
(1 row)

Joe Conway
2003-06-24 22:59:46 +00:00
Tom Lane 52c9d25933 Be careful to include postgres.h *before* any system headers, to ensure
that the right flavors of largefile-related definitions are seen.
Most of these changes are probably unnecessary, but better safe than
sorry.
2002-09-05 00:43:07 +00:00
Tom Lane ee051baeac Make sure that all <ctype.h> routines are called with unsigned char
values; it's not portable to call them with signed chars.  I recall doing
this for the last release, but a few more uncasted calls have snuck in.
2001-12-30 23:09:42 +00:00
Bruce Momjian ea08e6cd55 New pgindent run with fixes suggested by Tom. Patch manually reviewed,
initdb/regression tests pass.
2001-11-05 17:46:40 +00:00
Bruce Momjian b81844b173 pgindent run on all C files. Java run to follow. initdb/regression
tests pass.
2001-10-25 05:50:21 +00:00
Bruce Momjian cdd02cdf00 Sorry - I should have gotten to this sooner. Here's a patch which you should
be able to apply against what you just committed. It rolls soundex into
fuzzystrmatch.

Remove soundex/metaphone and merge into fuzzystrmatch.

Joe Conway
2001-08-07 18:16:01 +00:00
Bruce Momjian d8783c512e Per this discussion, here's a patch to implement both levenshtein() and
metaphone() in a contrib. There seem to be a fair number of different
approaches to both of these algorithms. I used the simplest case for
levenshtein which has a cost  of 1 for any character insertion, deletion, or
substitution. For metaphone, I adapted the same code from CPAN that the PHP
folks did.

A couple of questions:
1. Does it make sense to fold the soundex contrib together with this one?

2. I was debating trying to add multibyte support to levenshtein (it would
make no sense at all for metaphone), but a quick search through the contrib
directory found no hits on the word MULTIBYTE. Should worry about adding
multibyte support to levenshtein()?

Joe Conway
2001-08-07 16:47:43 +00:00