Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Percentage Match

1,103 views
Skip to first unread message

Macroman

unread,
Aug 16, 2001, 5:07:47 PM8/16/01
to
THis may be a toughie for the MVP's I'm hoping somebody has already
requested this and a solution has been found.

Here goes

Is anyone able to write a formula to compare 2 values and the result is how
close they match.

Example

Value 1 = "John Smith, High Street, London"
Value 2 = "J Smith , London"

I would like to compare Value 2 with that of Value 1 and have a percentage
return of the match. Such as Value 2 compared to Value 1 is a 60% match. I
know I can do this using VBA and writing a VBA function , but it's slow,
really need a cell formula.

I keep coming up with them dont I !!

thanks

Macroman


David Hager

unread,
Aug 16, 2001, 5:07:21 PM8/16/01
to
Maybe you can first explain how your example is a 60% match.

David Hager
Excel MVP

Macroman wrote in message <9lhca6$2l0$1...@neptunium.btinternet.com>...

Harlan Grove

unread,
Aug 16, 2001, 5:32:29 PM8/16/01
to
David Hager <10407...@compuserve.com> wrote...

>Maybe you can first explain how your example is a 60% match.
...

>Macroman wrote in message <9lhca6$2l0$1...@neptunium.btinternet.com>...
>>Value 1 = "John Smith, High Street, London"
>>Value 2 = "J Smith , London"
>>
>>I would like to compare Value 2 with that of Value 1 and have a percentage
>>return of the match. Such as Value 2 compared to Value 1 is a 60% match.
I
...

Indeed. The only thing I can figure out is that single letters are treated
as abbreviations for any word beginning with that letter, so the 60% above
would be due to 3 tokens out of 5 (in the longer Value 1) matching up in
both.

No matter. This would be much simpler in VBA. It'd take a monster of an
array formula in a worksheet cell. It should be obvious how to proceed.


Jimmy L. Day

unread,
Aug 16, 2001, 5:23:41 PM8/16/01
to
I believe this would be a "toughie" for anyone, and my hat is off to you for
your brilliant deduction that your example is 60% matching. How do you know?

Otto Moehrbach

unread,
Aug 16, 2001, 5:29:59 PM8/16/01
to
Ditto. Otto
"Jimmy L. Day" <gjb...@pnoyrbar.argf> wrote in message
news:3B7C39DC...@pnoyrbar.argf...

Macroman

unread,
Aug 18, 2001, 1:48:36 PM8/18/01
to
The 60% is just a figure off the top of my head , the formula would be
something along the lines of how many characters in the first value are in
the second value and are they in the same order. So if comparing Liver with
Pigs Liver Thing I would get 100% match , yet if comparing River with Pigs
Liver Thing I'd get 4 out of 5 which is 80% .

Hope this makes sense now. It does appear that VBA is the way to go based on
replies to date , thanks to those whom have replied. I still await in
anticipation of an answer from a real boffin mathmetician/VBA programmer.

thanks

Macroman
"Harlan Grove" <hrl...@aol.com> wrote in message
news:NZWe7.12127$Ki1.9...@bgtnsc06-news.ops.worldnet.att.net...

Dick Kusleika

unread,
Aug 18, 2001, 2:15:27 PM8/18/01
to
Macroman

This function does not account for duplicate characters, does not ignore
spaces and does not account for the order of the characters, but it may get
you started.

Function MatchPcnt(Str1 As String, Str2 As String) As Double
Dim i As Integer
Dim MCount As Integer

For i = 1 To Len(Str1)
If InStr(Str2, Mid(Str1, i, 1)) > 0 Then
MCount = MCount + 1
End If
Next i

MatchPcnt = MCount / Len(Str1)

End Function

HTH
Dick K.

Macroman <macr...@bigfoot.com> wrote in message
news:9lm9cj$43v$1...@uranium.btinternet.com...

Harlan Grove

unread,
Aug 18, 2001, 11:33:01 PM8/18/01
to
Macroman <macr...@bigfoot.com> wrote...

>The 60% is just a figure off the top of my head , the formula would be
>something along the lines of how many characters in the first value are in
>the second value and are they in the same order. So if comparing Liver
with
>Pigs Liver Thing I would get 100% match , yet if comparing River with Pigs
>Liver Thing I'd get 4 out of 5 which is 80% .
>
>Hope this makes sense now. It does appear that VBA is the way to go based
on
>replies to date , thanks to those whom have replied. I still await in
>anticipation of an answer from a real boffin mathmetician/VBA programmer.
...

This is a problem of specification. 'River' (s1) compared to 'Pigs Liver'
(s2) finding the same characters in the same order - so the initial 'R' in
s1 wouldn't match anything in s2, the 'i' in s1 would match the 'i' in
'Pigs' in s2, and 'ver' at the end of s1 matches 'ver' at the end of s2?

The problem here is how to handle multiple possible matches vs maximal
matching. Finding the maximum number of ordered single character matches
involves backtracking. Implementation isn't trivial. It involves finding
each character in s1 in the progressively smaller right substring of s2
after the most recent character match. For s1 having n characters, it
involves checking s1, then substrings of s1 with 1, 2, 3, . . ., n-1
characters removed from s1 and stopping when an exhaustive match is
achieved.

Example: for s1 = 'abcd' first check if all chars in s1 match chars in s2 in
order, and if not check 'abc', 'abd', 'acd', 'bcd'; if no match yet, check
'ab', 'ac', 'ad', 'bc', 'bd', 'cd'; if mo match yet, check 'a', 'b', 'c',
'd'. In order to handle arbitrarily general cases, this level of
exhaustiveness is necessary.

So do you want maximal ordered matching?


Harlan Grove

unread,
Aug 19, 2001, 3:18:13 AM8/19/01
to
Harlan Grove <Hrl...@aol.com> wrote...
...

> So do you want maximal ordered matching?

If you do, I couldn't resist seeing if I remembered how. This approach
consists of an 'interface' function called from the worksheet which
preprocesses the original arguments, then calls a recursive function to find
the largest number of ordered matches.


'---- begin VBA ----
'call this fcn from the worksheet
'
Function mp(s1 As String, s2 As String) As Double
Dim t As String
Dim i As Long, n As Long

'ensure s1 is shorter string - remove if undesired
If Len(s1) > Len(s2) Then
t = s1
s1 = s2
s2 = t
End If

n = Len(s1)
i = 1
t = s1

'eliminate chars from s1 that aren't in s2
Do While i <= n
If InStr(s2, Mid(t, i, 1)) > 0 Then
i = i + 1
Else
t = Left(t, i - 1) & Mid(t, i + 1)
n = n - 1
End If
Loop

mp = mpr(t, s2) / CDbl(Len(s1))
End Function


'call this only from mp() and recursively to give the
'number of chars in s1 matching chars in order in s2
'
Private Function mpr(ByVal s1 As String, s2 As String) As Long
Dim i As Long, n1 As Long, p As Long, q As Long, m As Long

n1 = Len(s1)
q = 1

'find each char from s1 in substrings of s2 to
'the right of the previously matched char in s2
For i = 1 To n1
p = InStr(q, s2, Mid(s1, i, 1))
If p > 0 Then q = p + 1 Else Exit For
Next i

If i > n1 Then
'if the For-loop above completed, i = n1 + 1 AND all
'chars in s1 were found in s2 in order, so all done
mpr = n1
Else
'no joy, time for recursion with substrings of s1
For i = 1 To n1
m = mpr(Left(s1, i - 1) & Mid(s1, i + 1), s2)
If m > mpr Then mpr = m 'store longest match found
If mpr = n1 - 1 Then Exit For 'if longest substring matches,
all done
Next i
End If
End Function
'---- end VBA ----

Macroman

unread,
Aug 20, 2001, 4:41:11 PM8/20/01
to
This code sometimes freezes, seems to be stuck in a loop.

However I have managed to write my own function which strips out all words
in first string and checks each of these words against that in second
string, and it works a treat. I will e-mail it to you.

thanks

Macroman
"Harlan Grove" <Hrl...@aol.com> wrote in message
news:eRlSt2HKBHA.1592@tkmsftngp04...

Harlan Grove

unread,
Aug 21, 2001, 1:00:36 AM8/21/01
to
Macroman <macr...@bigfoot.com> wrote...

>This code sometimes freezes, seems to be stuck in a loop.

How long are your strings? I suspect Excel/VBA may have an internal limit of
function call stack space which would limit the number of recursive function
calls.

>However I have managed to write my own function which strips out all words
>in first string and checks each of these words against that in second
>string, and it works a treat. I will e-mail it to you.

I don't open attachments, no matter how well intended.


Macroman

unread,
Aug 21, 2001, 2:59:49 PM8/21/01
to
Length of characters in each string are not that long , the purpose of the
function was for address details.

I have received e-mail with thanks will take a look and compare the speed to
that I have wrote. In anticipation that your will work faster than
mine(chances are it will , cos I'm still learning) I'd like to thank you in
advance.

kind regards

Macroman
"Harlan Grove" <Hrl...@aol.com> wrote in message

news:eRGLHzfKBHA.1944@tkmsftngp02...

Lawrence Coomber

unread,
Aug 23, 2001, 8:07:02 PM8/23/01
to
When I RANK() my students scores, some ranking values are missing if there
are duplicate scores in the Range being ranked. For example scores: 17,19,
23,17,26,23 will RANK as 5,4,2,5,1,2 The 3 is missing. A better function for
my application would RANK these scores as 4,3,2,4,1,2 I can handle equal
rankings but not missing ranking positions, as I use this ranking as a
parameter in other function calls and it is distorted under the current
Excel RANK function.


Harlan Grove

unread,
Aug 24, 2001, 3:09:51 AM8/24/01
to
Lawrence Coomber <flow...@watercard.com.au> wrote...

I could question the pedagogic value of ranking students, but I'll just
address the problem.

You could do this with an array formula. If the scores were in consecutive
rows in a single column and that range were named scores, then try the
following array formula

=INDEX(ROUND(MMULT(
N(ROW(scores)-CELL("Row",scores)+1
<=TRANSPOSE(ROW(scores)-CELL("Row",scores)+1)),
1/COUNTIF(scores,SMALL(scores,
ROW(scores)-CELL("Row",scores)+1))),0),
MATCH(x,SMALL(scores,
ROW(scores)-CELL("Row",scores)+1)))

where x represents the address of a score in the range.


0 new messages