Commit Graph

14 Commits

Author SHA1 Message Date
Bruce Momjian
b538215d5d Remove a few baby-C macros in fuzzystrmatch. Add a few missing includes. 2006-07-10 18:40:16 +00:00
Bruce Momjian
f3d99d160d Add CVS tag lines to files that were lacking them. 2006-03-11 04:38:42 +00:00
Bruce Momjian
f2f5b05655 Update copyright for 2006. Update scripts. 2006-03-05 15:59:11 +00:00
Neil Conway
1ac9f0e9f7 The attached patch implements the soundex difference function which
compares two strings' soundex values for similarity, from Kris Jurka.
Also mark the text_soundex() function as STRICT, to avoid crashing
on NULL input.
2005-01-26 08:04:04 +00:00
Bruce Momjian
2daed8c5b3 Update copyrights that were missed. 2005-01-01 05:43:09 +00:00
Bruce Momjian
da9a8649d8 Update copyright to 2004. 2004-08-29 04:13:13 +00:00
Tom Lane
2f9c859ea1 Fix some copyright notices that weren't updated. Improve copyright tool
so it won't miss 'em again.
2003-08-04 23:59:41 +00:00
Bruce Momjian
7b1f6ffaab Jim C. Nasby wrote:
> Second argument to metaphone is suposed to set the limit on the
> number of characters to return, but it breaks on some phrases:
>
> usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
> (select 'Hello world'::varchar AS a) a;
> HLW       | HLWR      | HLWRLT
>
> usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
> (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
  > AKM       | AKMKS     | AKMKSMMRL
>
> In every case I've found that does this, the 4th and 5th letters are
> always 'KS'.

Nice catch.

There was a bug in the original metaphone algorithm from CPAN. Patch
attached (while I was at it I updated my email address, changed the
copyright to PGDG, and removed an unnecessary palloc). Here's how it
looks now:

regression=# select metaphone(a,4) from (select 'A A COMEAUX
MEMORIAL'::varchar AS a) a;
   metaphone
-----------
   AKMK
(1 row)

regression=# select metaphone(a,5) from (select 'A A COMEAUX
MEMORIAL'::varchar AS a) a;
   metaphone
-----------
   AKMKS
(1 row)

Joe Conway
2003-06-24 22:59:46 +00:00
Tom Lane
52c9d25933 Be careful to include postgres.h *before* any system headers, to ensure
that the right flavors of largefile-related definitions are seen.
Most of these changes are probably unnecessary, but better safe than
sorry.
2002-09-05 00:43:07 +00:00
Tom Lane
ee051baeac Make sure that all <ctype.h> routines are called with unsigned char
values; it's not portable to call them with signed chars.  I recall doing
this for the last release, but a few more uncasted calls have snuck in.
2001-12-30 23:09:42 +00:00
Bruce Momjian
ea08e6cd55 New pgindent run with fixes suggested by Tom. Patch manually reviewed,
initdb/regression tests pass.
2001-11-05 17:46:40 +00:00
Bruce Momjian
b81844b173 pgindent run on all C files. Java run to follow. initdb/regression
tests pass.
2001-10-25 05:50:21 +00:00
Bruce Momjian
cdd02cdf00 Sorry - I should have gotten to this sooner. Here's a patch which you should
be able to apply against what you just committed. It rolls soundex into
fuzzystrmatch.

Remove soundex/metaphone and merge into fuzzystrmatch.

Joe Conway
2001-08-07 18:16:01 +00:00
Bruce Momjian
d8783c512e Per this discussion, here's a patch to implement both levenshtein() and
metaphone() in a contrib. There seem to be a fair number of different
approaches to both of these algorithms. I used the simplest case for
levenshtein which has a cost  of 1 for any character insertion, deletion, or
substitution. For metaphone, I adapted the same code from CPAN that the PHP
folks did.

A couple of questions:
1. Does it make sense to fold the soundex contrib together with this one?

2. I was debating trying to add multibyte support to levenshtein (it would
make no sense at all for metaphone), but a quick search through the contrib
directory found no hits on the word MULTIBYTE. Should worry about adding
multibyte support to levenshtein()?

Joe Conway
2001-08-07 16:47:43 +00:00