Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Find Unique Records; Must Use Fuzzy Logic

180 views
Skip to first unread message

ryguy7272

unread,
Oct 2, 2007, 9:06:00 AM10/2/07
to
My supervisor and I are trying to come up with a way of finding and
eliminating, all duplicates from a large Excel worksheet which came from BCM
in Outlook. I am contemplating a few different ways of doing this. I found
a free utility on www.downloads.com that allows a user to do a ‘complex
filter’ to eliminate duplicates, but it’s not really giving me the results I
am looking for. I sorted by Company (Column F) and then by last name (Column
D) and then by first name (Column B). I am now wondering if there is a way
to use some kind of fuzzy logic to do a search for values that are almost
unique and hide the remainder of the rows, or almost duplicate and hide the
remainder of the rows. The issue is that Excel identifies lots of ‘unique’
records because it identifies two people with two different office addresses
as two different records, but for our purposes this is one contact.
Similarly, a contact’s name could be spelled Freuh, in the personal contacts
part of BCM, and the name could also be spelled in Frueh, in the public
contacts part of BCM. Again, these are two ‘unique’ records, but again, for
our purposes this is one contact. All company names are listed in Column F,
all last names are in Column D, and all first names are in Column B. I would
like to copy/paste all data on all rows with unique records (F, D, & B) to a
new sheet, or hide all rows with dupes. Any ideas? Would Access be able to
handle this?

Thanks, as always!
Ryan---


--
RyGuy

Tom Ogilvy

unread,
Oct 2, 2007, 9:45:00 AM10/2/07
to
While your waiting for a good answer, as an excercise -
Mark the uniques and for the duplicates, list which unique it duplicates:

Frehu
Freuh
Erueh
Fruit
Freh
Rfeuh
Ereuh
Feurh
Fruhe
Furth
Feruh
Ferhu
Fruh
Frueh
Fureh

Now right down the rules you used to do it - define "our purposes". then
maybe someone will have some ideas.

--
Regards,
Tom Ogilvy

Zone

unread,
Oct 2, 2007, 12:03:03 PM10/2/07
to
Harlan Grove sent me this function years ago when I was trying to compare
two strings to see how similar they were. I also included a simple sub to
run it. It returns a number from 0 to 1, depending on how similar it thinks
the strings are. Works good. The three arguments are the strings to be
compared and a boolean that tells it whether to match the whole string or
any part of it. I haven't used it in a while now, so I'd advise just
setting the 3rd argument to True for now. Hope this helps! James

Sub TrySimilar()
MsgBox StrSimilar("Frued", "Freud", True)
End Sub

Function StrSimilar(s1 As String, s2 As String, FindWhole As Boolean) As
Double
'function returns a numerical grade for 2 similar strings, 1.00 being
perfect
'by H Grove
Dim I As Long, j As Long, k As Long, n(2) As Long
Dim c1 As String, c2 As String
Const alphanum As String = "1234567890abcdefghijklmnopqrstuvwxyz "
s1 = LCase(s1)
For I = 1 To Len(s1)
If Not InStr(alphanum, Mid(s1, I, 1)) > 0 Then Mid(s1, I, 1) = " "
Next I
s1 = Application.WorksheetFunction.Trim(s1)
s2 = LCase(s2)
For j = 1 To Len(s2)
If Not InStr(alphanum, Mid(s2, j, 1)) > 0 Then Mid(s2, j, 1) = " "
Next j
s2 = Application.WorksheetFunction.Trim(s2)
j = 1
n(1) = 0
For I = 1 To Len(s1)
c1 = LCase(Mid(s1, I, 1))
k = 0
Do
c2 = LCase(Mid(s2, j + k, 1))
k = k + 1
Loop Until j + k > Len(s2) Or c1 = c2
If c1 = c2 Then
n(1) = n(1) + 1
If j < Len(s2) Then j = j + 1 Else Exit For
End If
Next I
I = 1
n(2) = 0
For j = 1 To Len(s2)
c2 = LCase(Mid(s2, j, 1))
k = 0
Do
c1 = LCase(Mid(s1, I + k, 1))
k = k + 1
Loop Until I + k > Len(s1) Or c1 = c2
If c1 = c2 Then
n(2) = n(2) + 1
If I < Len(s1) Then I = I + 1 Else Exit For
End If
Next j
If FindWhole Then
StrSimilar = CDbl(Application.WorksheetFunction.Min(n(1), n(2))) _
/ CDbl(Application.WorksheetFunction.Max(Len(s1), Len(s2)))
Else
StrSimilar = CDbl(n(2)) / Len(s2)
End If
End Function


"Tom Ogilvy" <TomO...@discussions.microsoft.com> wrote in message
news:F1F17B99-974C-457C...@microsoft.com...

Tom Ogilvy

unread,
Oct 2, 2007, 12:36:03 PM10/2/07
to
MsgBox StrSimilar("Freud", "Frdue", True)
gives a 0.8 as well.

--
Regards,
Tom Ogilvy

Zone

unread,
Oct 2, 2007, 1:19:18 PM10/2/07
to
Tom, so you test-drove it, huh? Four matching letters out of five would be
80%, so that seems reasonable. James

"Tom Ogilvy" <TomO...@discussions.microsoft.com> wrote in message
news:CA1C29B7-2105-4E26...@microsoft.com...

ryguy7272

unread,
Oct 2, 2007, 3:41:00 PM10/2/07
to
Thanks Tom, and thanks Zone, sweet function. However, I think I need to do
this thing (somewhat the old fashioned way). I have had ASAP utilities for
quite some time, but haven't used it much. I am using it now, and I am
seeing that it is quite helpful for this current project, but it will take
several iterations of applying filters, and doing sorting, and some other
things. Anyway, here is a link for others who want to obtain this valuable
utility:
http://www.asap-utilities.com/download-asap-utilities.php


Regards,
Ryan--

--
RyGuy

Zone

unread,
Oct 2, 2007, 3:49:36 PM10/2/07
to
You're welcome. Good luck with that. James
"ryguy7272" <rygu...@discussions.microsoft.com> wrote in message
news:0A1E00EF-D66E-465D...@microsoft.com...

ryguy7272

unread,
Oct 2, 2007, 5:28:00 PM10/2/07
to
One more thing...for anyone who stumbles upon this post sometime in the future.
I used this utility (below), along with the other one (above), and got
pretty darn close to where I wanted to be.

http://members.iinet.net.au/~brettdj/#Partial_Column_Match

0 new messages