|Name matching||Bala||12/29/10 11:03 AM|
For my NLP project, I need to write a program to match two sets of names.
For example, I need to identify the matches below:
Barack Obama - Democrat Barack Obama
Barack Obama - Mr. Obama
Barack Obama - Senator Barack Obama
I thought of first stripping off prefixes like "Democrat", "Senator", etc., and then using some variant of an edit distance algorithm.
Is there a way NLTK can tell me that "Democrat", "Senator", etc. are not part of the name, so that I can strip them off?
PS: Is there any good open-source name-matching software available?
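To make the strip-then-compare idea above concrete, here is a minimal sketch in plain Python. It is only an illustration, not an NLTK feature: the TITLES set, strip_titles and same_person are hypothetical names made up for this example, and the title list is a hand-written assumption you would have to extend for real data.

```python
import re

# Hypothetical, hand-maintained list of titles (an assumption, not from NLTK).
TITLES = {"mr", "mrs", "ms", "dr", "senator", "president",
          "democrat", "republican"}


def strip_titles(name):
    """Lower-case the name, drop punctuation, and remove known titles."""
    # Note: this pattern only keeps ASCII letters.
    tokens = re.findall(r"[a-z]+", name.lower())
    return [t for t in tokens if t not in TITLES]


def same_person(a, b):
    """Treat two names as a match if one token set contains the other."""
    ta, tb = set(strip_titles(a)), set(strip_titles(b))
    if not ta or not tb:
        return False
    return ta <= tb or tb <= ta


for other in ["Democrat Barack Obama", "Mr. Obama", "Senator Barack Obama"]:
    print(other, "->", same_person("Barack Obama", other))
```

The subset test is deliberately loose so that "Mr. Obama" still matches "Barack Obama"; the cost is that any two people sharing a surname would also match, so it probably needs tightening for a real data set.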
|Re: Name matching||James Smith||12/30/10 9:04 AM|
You might try named entity recognition. nltk.ne_chunk should do the trick.
You'll need to tokenize and tag the sentence that you'll be chunking.
The chunker will then return a Tree which will look something like
(PERSON Barack/NNP Obama/NNP)
(PERSON George/NNP Bush/NNP)
You can now work on the named entities without their titles and use
NLTK's distance module to compare them.
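For reference, the kind of comparison meant here can be sketched without any dependencies. The code below is a plain-Python Levenshtein distance written for illustration; the similarity helper and its length normalisation are assumptions of this example, not part of NLTK. As far as I know, nltk.edit_distance returns the same counts when called with its default arguments.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-character insertions,
    deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Hypothetical helper: scale the distance into the range 0.0 to 1.0."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


print(edit_distance("Barack Obama", "Barak Obama"))
print(similarity("Barack Obama", "George Bush"))
```

A threshold on the similarity score (whatever value works on your data) then decides whether two stripped names count as the same person.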
Hope this helps.
|Re: Name matching||Bala||12/30/10 12:09 PM|
Thanks for the reply.
I agree with you that NE recognition would work, but I don't have a sentence, just a list of entities. Are there at least some heuristics that can be applied?
|Re: [nltk-users] Name matching||Ewan Klein||12/30/10 4:09 AM|
I haven't kept up to date on this topic, but Olga Uryupina wrote a nice paper comparing various methods for this task; see