Message from discussion Comparing 2 similar strings?


From: John Machin <sjmac...@lexicon.net>
Newsgroups: comp.unix.shell,comp.lang.awk,comp.lang.python
Subject: Re: Comparing 2 similar strings?
Date: Thu, 19 May 2005 06:38:59 +1000
Message-ID: <3d9n815cpmavos1fl6ts712h9qogdv3fur@4ax.com>
References: <24a6d$428b9c10$d1b717f8$2300@PRIMUS.CA> <428BA05D.1010600@lsupcaemnt.com>
X-Newsreader: Forte Free Agent 2.0/32.652
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
NNTP-Posting-Host: 203.123.88.202
X-Trace: news.eftel.com 1116448737 203.123.88.202 (19 May 2005 04:38:57 +0800)
Lines: 41
Path: g2news1.google.com!news3.google.com!news.glorb.com!nntp.waia.asn.au!203.24.100.2.MISMATCH!news.eftel.com!not-for-mail

On Wed, 18 May 2005 15:06:53 -0500, Ed Morton <mor...@lsupcaemnt.com>
wrote:

>
>
>William Park wrote:
>
>> How do you compare 2 strings, and determine how "close" they are to
>> each other?  E.g.
>>     aqwerty
>>     qwertyb
>> are similar to each other, except for first/last char.  But, how do I
>> quantify that?
>> 
>> I guess you can say for the above 2 strings that
>>     - at max, 6 chars out of 7 are same sequence --> 85% max
>> 
>> But, for
>>     qawerty
>>     qwerbty
>> max correlation is
>>     - 3 chars out of 7 are the same sequence --> 42% max
>> 
>> (Crossposted to 3 of my favourite newsgroups.)
>>
>
>"However you like" is probably the right answer, but one way might be to 
>compare their soundex encoding 
>(http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?soundex) and figure out 
>percentage difference based on comparing the numeric part.
>

Fantastic suggestion. Here's a tiny piece of real-life test data:

compare the surnames "Mousaferiadis" and "McPherson".
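
A minimal sketch of why this pair is awkward test data for the Soundex suggestion, assuming the common American Soundex rules (first letter kept, three digits, H and W do not separate equal codes). The function names `soundex` and `longest_common_run` are illustrative and not from the thread; the second helper is one possible reading of the "same sequence" measure in the original question, built on the standard library's `difflib`.

```python
from difflib import SequenceMatcher

# Assumed American Soundex digit groups: 1=BFPV, 2=CGJKQSXZ, 3=DT, 4=L, 5=MN, 6=R.
_CODES = {}
for _digit, _letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for _ch in _letters:
        _CODES[_ch] = str(_digit)


def soundex(name):
    """Return a 4-character Soundex code, or '' if name has no ASCII letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code:
            if code != prev:
                result += code
            prev = code
        elif ch not in "HW":
            prev = ""  # a vowel (or Y) separates two letters with the same code
        if len(result) == 4:
            break
    return (result + "000")[:4]


def longest_common_run(a, b):
    """Length of the longest contiguous run of characters shared by a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


if __name__ == "__main__":
    for surname in ("Mousaferiadis", "McPherson"):
        print(surname, soundex(surname))
    for a, b in (("aqwerty", "qwertyb"), ("qawerty", "qwerbty")):
        print(a, b, longest_common_run(a, b), "of", max(len(a), len(b)))
```

Under these rules both surnames encode to M216, so a comparison based only on the Soundex digits would report them as identical. The longest-run helper gives 6 of 7 and 3 of 7 for the two pairs in the original question.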






