Jim C. Nasby wrote:

> Second argument to metaphone is supposed to set the limit on the
> number of characters to return, but it breaks on some phrases:
>
> usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
> (select 'Hello world'::varchar AS a) a;
> HLW       | HLWR      | HLWRLT
>
> usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
> (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
> AKM       | AKMKS     | AKMKSMMRL
>
> In every case I've found that does this, the 4th and 5th letters are
> always 'KS'.

Nice catch.

There was a bug in the original metaphone algorithm from CPAN. Patch
attached (while I was at it I updated my email address, changed the
copyright to PGDG, and removed an unnecessary palloc). Here's how it
looks now:

regression=# select metaphone(a,4) from
(select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
   metaphone
-----------
   AKMK
(1 row)

regression=# select metaphone(a,5) from
(select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
   metaphone
-----------
   AKMKS
(1 row)
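
The overrun can be reproduced outside the backend with a small sketch.
This is not the contrib code: phonize(), emit_x() and encode() are
hypothetical stand-ins for the Phonize macro and the 'X' branch of
_metaphone, the input is assumed to be already reduced to phoneme codes
except for 'X', and the limit check at the top of the loop is an
assumption about how the surrounding loop behaves. The point is only
that 'X' emits two codes per letter, so a check made once per letter
can be overshot by one unless the second code is guarded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the Phonize() macro: append one code. */
static void phonize(char *out, size_t *len, char c)
{
    out[(*len)++] = c;
    out[*len] = '\0';
}

/*
 * Hypothetical helper for the letter 'X', which maps to "KS".
 * guarded == 0 models the old branch, guarded != 0 the patched one.
 * max_phonemes == 0 means "no limit".
 */
static void emit_x(char *out, size_t *len, size_t max_phonemes, int guarded)
{
    phonize(out, len, 'K');
    if (!guarded || max_phonemes == 0 || *len < max_phonemes)
        phonize(out, len, 'S');
}

/*
 * Toy encoder: every character except 'X' maps to itself. The limit is
 * checked once per input letter (assumed loop shape). The caller must
 * size out for max_phonemes + 2 bytes, since the unguarded branch can
 * write one code past the limit.
 */
static size_t encode(const char *in, char *out, size_t max_phonemes, int guarded)
{
    size_t len = 0;

    assert(in != NULL && out != NULL);
    out[0] = '\0';
    for (; *in != '\0' && (max_phonemes == 0 || len < max_phonemes); in++)
    {
        if (*in == 'X')
            emit_x(out, &len, max_phonemes, guarded);
        else
            phonize(out, &len, *in);
    }
    return len;
}
```

With the input "AKMXMMRL" and a limit of 4, the unguarded branch yields
AKMKS (the overrun from the report) and the guarded branch yields AKMK.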

Joe Conway
Bruce Momjian 2003-06-24 22:59:46 +00:00
parent 4b1fe23153
commit 7b1f6ffaab
3 changed files with 14 additions and 7 deletions


@@ -3,7 +3,10 @@
  *
  * Functions for "fuzzy" comparison of strings
  *
- * Copyright (c) Joseph Conway <joseph.conway@home.com>, 2001;
+ * Joe Conway <mail@joeconway.com>
+ *
+ * Copyright (c) 2001, 2002, 2003 by PostgreSQL Global Development Group
+ * ALL RIGHTS RESERVED;
  *
  * levenshtein()
  * -------------


@@ -3,7 +3,10 @@
  *
  * Functions for "fuzzy" comparison of strings
  *
- * Copyright (c) Joseph Conway <joseph.conway@home.com>, 2001;
+ * Joe Conway <mail@joeconway.com>
+ *
+ * Copyright (c) 2001, 2002, 2003 by PostgreSQL Global Development Group
+ * ALL RIGHTS RESERVED;
  *
  * levenshtein()
  * -------------
@@ -221,9 +224,6 @@ metaphone(PG_FUNCTION_ARGS)
 	if (!(reqlen > 0))
 		elog(ERROR, "metaphone: Requested Metaphone output length must be > 0");
-	metaph = palloc(reqlen);
-	memset(metaph, '\0', reqlen);
 	retval = _metaphone(str_i, reqlen, &metaph);
 	if (retval == META_SUCCESS)
 	{
@@ -629,7 +629,8 @@ _metaphone(
 		/* KS */
 	case 'X':
 		Phonize('K');
-		Phonize('S');
+		if (max_phonemes == 0 || Phone_Len < max_phonemes)
+			Phonize('S');
 		break;
 		/* Y if followed by a vowel */
 	case 'Y':


@@ -3,7 +3,10 @@
  *
  * Functions for "fuzzy" comparison of strings
  *
- * Copyright (c) Joseph Conway <joseph.conway@home.com>, 2001;
+ * Joe Conway <mail@joeconway.com>
+ *
+ * Copyright (c) 2001, 2002, 2003 by PostgreSQL Global Development Group
+ * ALL RIGHTS RESERVED;
  *
  * levenshtein()
  * -------------