Here goes
Is anyone able to write a formula to compare 2 values and the result is how
close they match.
Example
Value 1 = "John Smith, High Street, London"
Value 2 = "J Smith , London"
I would like to compare Value 2 with that of Value 1 and have a percentage
return of the match. Such as Value 2 compared to Value 1 is a 60% match. I
know I can do this using VBA and writing a VBA function , but it's slow,
really need a cell formula.
I keep coming up with them dont I !!
thanks
Macroman
David Hager
Excel MVP
Macroman wrote in message <9lhca6$2l0$1...@neptunium.btinternet.com>...
Indeed. The only thing I can figure out is that single letters are treated
as abbreviations for any word beginning with that letter, so the 60% above
would be due to 3 tokens out of 5 (in the longer Value 1) matching up in
both.
No matter. This would be much simpler in VBA. It'd take a monster of an
array formula in a worksheet cell. It should be obvious how to proceed.
Hope this makes sense now. It does appear that VBA is the way to go based on
replies to date , thanks to those whom have replied. I still await in
anticipation of an answer from a real boffin mathmetician/VBA programmer.
thanks
Macroman
"Harlan Grove" <hrl...@aol.com> wrote in message
news:NZWe7.12127$Ki1.9...@bgtnsc06-news.ops.worldnet.att.net...
This function does not account for duplicate characters, does not ignore
spaces and does not account for the order of the characters, but it may get
you started.
Function MatchPcnt(Str1 As String, Str2 As String) As Double
Dim i As Integer
Dim MCount As Integer
For i = 1 To Len(Str1)
If InStr(Str2, Mid(Str1, i, 1)) > 0 Then
MCount = MCount + 1
End If
Next i
MatchPcnt = MCount / Len(Str1)
End Function
HTH
Dick K.
Macroman <macr...@bigfoot.com> wrote in message
news:9lm9cj$43v$1...@uranium.btinternet.com...
This is a problem of specification. 'River' (s1) compared to 'Pigs Liver'
(s2) finding the same characters in the same order - so the initial 'R' in
s1 wouldn't match anything in s2, the 'i' in s1 would match the 'i' in
'Pigs' in s2, and 'ver' at the end of s1 matches 'ver' at the end of s2?
The problem here is how to handle multiple possible matches vs maximal
matching. Finding the maximum number of ordered single character matches
involves backtracking. Implementation isn't trivial. It involves finding
each character in s1 in the progressively smaller right substring of s2
after the most recent character match. For s1 having n characters, it
involves checking s1, then substrings of s1 with 1, 2, 3, . . ., n-1
characters removed from s1 and stopping when an exhaustive match is
achieved.
Example: for s1 = 'abcd' first check if all chars in s1 match chars in s2 in
order, and if not check 'abc', 'abd', 'acd', 'bcd'; if no match yet, check
'ab', 'ac', 'ad', 'bc', 'bd', 'cd'; if mo match yet, check 'a', 'b', 'c',
'd'. In order to handle arbitrarily general cases, this level of
exhaustiveness is necessary.
So do you want maximal ordered matching?
If you do, I couldn't resist seeing if I remembered how. This approach
consists of an 'interface' function called from the worksheet which
preprocesses the original arguments, then calls a recursive function to find
the largest number of ordered matches.
'---- begin VBA ----
'call this fcn from the worksheet
'
Function mp(s1 As String, s2 As String) As Double
Dim t As String
Dim i As Long, n As Long
'ensure s1 is shorter string - remove if undesired
If Len(s1) > Len(s2) Then
t = s1
s1 = s2
s2 = t
End If
n = Len(s1)
i = 1
t = s1
'eliminate chars from s1 that aren't in s2
Do While i <= n
If InStr(s2, Mid(t, i, 1)) > 0 Then
i = i + 1
Else
t = Left(t, i - 1) & Mid(t, i + 1)
n = n - 1
End If
Loop
mp = mpr(t, s2) / CDbl(Len(s1))
End Function
'call this only from mp() and recursively to give the
'number of chars in s1 matching chars in order in s2
'
Private Function mpr(ByVal s1 As String, s2 As String) As Long
Dim i As Long, n1 As Long, p As Long, q As Long, m As Long
n1 = Len(s1)
q = 1
'find each char from s1 in substrings of s2 to
'the right of the previously matched char in s2
For i = 1 To n1
p = InStr(q, s2, Mid(s1, i, 1))
If p > 0 Then q = p + 1 Else Exit For
Next i
If i > n1 Then
'if the For-loop above completed, i = n1 + 1 AND all
'chars in s1 were found in s2 in order, so all done
mpr = n1
Else
'no joy, time for recursion with substrings of s1
For i = 1 To n1
m = mpr(Left(s1, i - 1) & Mid(s1, i + 1), s2)
If m > mpr Then mpr = m 'store longest match found
If mpr = n1 - 1 Then Exit For 'if longest substring matches,
all done
Next i
End If
End Function
'---- end VBA ----
However I have managed to write my own function which strips out all words
in first string and checks each of these words against that in second
string, and it works a treat. I will e-mail it to you.
thanks
Macroman
"Harlan Grove" <Hrl...@aol.com> wrote in message
news:eRlSt2HKBHA.1592@tkmsftngp04...
How long are your strings? I suspect Excel/VBA may have an internal limit of
function call stack space which would limit the number of recursive function
calls.
>However I have managed to write my own function which strips out all words
>in first string and checks each of these words against that in second
>string, and it works a treat. I will e-mail it to you.
I don't open attachments, no matter how well intended.
I have received e-mail with thanks will take a look and compare the speed to
that I have wrote. In anticipation that your will work faster than
mine(chances are it will , cos I'm still learning) I'd like to thank you in
advance.
kind regards
Macroman
"Harlan Grove" <Hrl...@aol.com> wrote in message
news:eRGLHzfKBHA.1944@tkmsftngp02...
I could question the pedagogic value of ranking students, but I'll just
address the problem.
You could do this with an array formula. If the scores were in consecutive
rows in a single column and that range were named scores, then try the
following array formula
=INDEX(ROUND(MMULT(
N(ROW(scores)-CELL("Row",scores)+1
<=TRANSPOSE(ROW(scores)-CELL("Row",scores)+1)),
1/COUNTIF(scores,SMALL(scores,
ROW(scores)-CELL("Row",scores)+1))),0),
MATCH(x,SMALL(scores,
ROW(scores)-CELL("Row",scores)+1)))
where x represents the address of a score in the range.