postgresql/contrib/fuzzystrmatch
Bruce Momjian 7b1f6ffaab Jim C. Nasby wrote:
> Second argument to metaphone is supposed to set the limit on the
> number of characters to return, but it breaks on some phrases:
>
> usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
> (select 'Hello world'::varchar AS a) a;
> HLW       | HLWR      | HLWRLT
>
> usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
> (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
> AKM       | AKMKS     | AKMKSMMRL
>
> In every case I've found that does this, the 4th and 5th letters are
> always 'KS'.

Nice catch.

There was a bug in the original metaphone algorithm from CPAN. Patch
attached (while I was at it I updated my email address, changed the
copyright to PGDG, and removed an unnecessary palloc). Here's how it
looks now:

regression=# select metaphone(a,4) from
(select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
   metaphone
-----------
   AKMK
(1 row)

regression=# select metaphone(a,5) from
(select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
   metaphone
-----------
   AKMKS
(1 row)

Joe Conway
2003-06-24 22:59:46 +00:00
Makefile To fix the perpetually broken makefiles in the contrib tree, I have 2001-09-06 10:49:30 +00:00
README.fuzzystrmatch Jim C. Nasby wrote: 2003-06-24 22:59:46 +00:00
README.soundex Sorry - I should have gotten to this sooner. Here's a patch which you should 2001-08-07 18:16:01 +00:00
fuzzystrmatch.c Jim C. Nasby wrote: 2003-06-24 22:59:46 +00:00
fuzzystrmatch.h Jim C. Nasby wrote: 2003-06-24 22:59:46 +00:00
fuzzystrmatch.sql.in Backend support for autocommit removed, per recent discussions. The 2003-05-14 03:26:03 +00:00

README.soundex

NOTE: Modified August 07, 2001 by Joe Conway. Updated for accuracy
	after combining soundex code into the fuzzystrmatch contrib
---------------------------------------------------------------------
The Soundex system is a method of matching similar-sounding names
(or any words) to the same code.  It was initially used by the
United States Census in 1880, 1900, and 1910, but it has little use
beyond English names (or the English pronunciation of names), and
it is not a linguistic tool.
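
To make the idea of mapping similar-sounding names to one code concrete,
here is a minimal Python sketch of a simplified Soundex. It is an
illustration only, not the C code in fuzzystrmatch.c. The names
soundex_sketch and CODES, the default length of 4, and the rule that any
uncoded letter (vowels, H, W, Y) separates repeated codes are assumptions
of this sketch, so results for some names may differ from the contrib
function.

```python
# Simplified Soundex sketch (assumption: not the fuzzystrmatch C code).
# Letters that sound alike share a digit; all other letters carry no code.
CODES = {}
for group, digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for letter in group:
        CODES[letter] = digit


def soundex_sketch(word, length=4):
    """Return a fixed-length code: first letter plus up to length-1 digits."""
    # Keep only ASCII letters, upper-cased; punctuation and spaces are dropped.
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""

    out = [letters[0]]                      # the first letter is kept as is
    prev = CODES.get(letters[0], "0")       # "0" marks an uncoded letter
    for ch in letters[1:]:
        code = CODES.get(ch, "0")
        # Emit a digit only for coded letters that differ from the previous code.
        if code != "0" and code != prev:
            out.append(code)
            if len(out) == length:
                break
        prev = code

    # Pad short codes with zeros so every result has the same length.
    return "".join(out).ljust(length, "0")


if __name__ == "__main__":
    for name in ("john", "joan", "wobbly", "hello world!"):
        print(name, soundex_sketch(name))
```

With this sketch, 'john' and 'joan' both map to J500 while 'wobbly' maps
to W140, which is why a comparison on the codes treats the first two
names as a match.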

The following are some usage examples:

SELECT soundex('hello world!');

CREATE TABLE s (nm text)\g

insert into s values ('john')\g
insert into s values ('joan')\g
insert into s values ('wobbly')\g

select * from s
where soundex(nm) = soundex('john')\g

select a.nm, b.nm from s a, s b
where soundex(a.nm) = soundex(b.nm)
and a.oid <> b.oid\g

CREATE FUNCTION text_sx_eq(text, text) RETURNS bool AS
'select soundex($1) = soundex($2)'
LANGUAGE 'sql'\g

CREATE FUNCTION text_sx_lt(text,text) RETURNS bool AS
'select soundex($1) < soundex($2)'
LANGUAGE 'sql'\g

CREATE FUNCTION text_sx_gt(text,text) RETURNS bool AS
'select soundex($1) > soundex($2)'
LANGUAGE 'sql';

CREATE FUNCTION text_sx_le(text,text) RETURNS bool AS
'select soundex($1) <= soundex($2)'
LANGUAGE 'sql';

CREATE FUNCTION text_sx_ge(text,text) RETURNS bool AS
'select soundex($1) >= soundex($2)'
LANGUAGE 'sql';

CREATE FUNCTION text_sx_ne(text,text) RETURNS bool AS
'select soundex($1) <> soundex($2)'
LANGUAGE 'sql';

DROP OPERATOR #= (text,text)\g

CREATE OPERATOR #= (leftarg=text, rightarg=text, procedure=text_sx_eq,
commutator = #=)\g

SELECT *
FROM s
WHERE text_sx_eq(nm,'john')\g

SELECT *
from s
where s.nm #= 'john';