Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Using Soundex to identify possible duplicates

528 views
Skip to first unread message

Robin9876

unread,
Jun 11, 2007, 10:53:29 AM6/11/07
to
In a table that has person forename and surname, is it possible to use
soundex to identify for every row in the table what similar matches
there are in the same table?

Russell Fields

unread,
Jun 11, 2007, 1:22:54 PM6/11/07
to
Robin,

You can try something like:

SELECT DISTINCT n.SurName, SOUNDEX(n.SurName) soundslike
FROM MyTable n
JOIN (SELECT SOUNDEX(SurName) Sounds FROM MyTable
GROUP BY SOUNDEX(SurName)
HAVING COUNT(*) > 1) AS x
ON SOUNDEX(n.SurName) = x.Sounds
ORDER BY SoundsLike, n.SurName

You could also rewrite the ON clause to use the DIFFERENCE operator, if you
want some fuzzier measures. If your name database is of any size at all,
you should expect a lot of matches.

RLF

"Robin9876" <robi...@hotmail.com> wrote in message
news:1181573609....@g4g2000hsf.googlegroups.com...

--CELKO--

unread,
Jun 11, 2007, 1:41:15 PM6/11/07
to
>> is it possible to use Soundex to identify for every row in the table what similar matches there are in the same table? <<

Don't do it. Name handling is an ugly problem and if you have to do
this on a regular basis get a package designed for this kind of work.
Some companies are Group 1 Software, SSA (used to have a great booklet
on this topic), Melissa Data Corporation and Promark Software Inc.

Their software handles mailing lists and you can get a review copy
from Melissa Data. They do not choke on names like "John Paul van der
Poon" and worse.

0 new messages