Name matching


Bala

Dec 29, 2010, 2:03:40 PM
to nltk-...@googlegroups.com
Hi,

For my NLP project, I need to write a program to match two sets of names. 

For example, I need to identify the matches below:
Barack Obama - Democrat Barack Obama
Barack Obama - Mr. Obama
Barack Obama - Senator Barack Obama

I thought of first stripping off prefixes like "Democrat", "Senator", etc. and then using some variant of the edit distance algorithm.
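
A minimal sketch of that plan, assuming a hand-made title list (`TITLES` is hypothetical, not something NLTK provides) and a plain Levenshtein distance written out in pure Python:

```python
# Hypothetical title list for illustration; extend it for your own data.
TITLES = {"mr", "mrs", "ms", "dr", "senator", "president",
          "democrat", "republican"}


def strip_titles(name):
    """Drop leading tokens found in TITLES (case-insensitive, ignoring a trailing period)."""
    tokens = name.split()
    # Always keep at least one token so a bare title is not reduced to nothing.
    while len(tokens) > 1 and tokens[0].rstrip(".").lower() in TITLES:
        tokens.pop(0)
    return " ".join(tokens)


def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


for variant in ["Democrat Barack Obama", "Mr. Obama", "Senator Barack Obama"]:
    core = strip_titles(variant)
    print(variant, "->", core, "| distance:", edit_distance("Barack Obama", core))
```

Note that "Mr. Obama" reduces to "Obama", which is still far from "Barack Obama" by raw edit distance, so a token-level comparison may be needed on top of this.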

Is there a way NLTK can tell me that "Democrat", "Senator", etc. are not part of the name, so that I can strip them off?

Thank you
Bala

PS: Is there any good open-source name matching software available?

James Smith

Dec 30, 2010, 12:04:40 PM
to nltk-users
Hi Bala,

You might try named entity recognition. nltk.ne_chunk should do the
trick.

You'll need to tokenize and tag the sentence that you'll be chunking.
The chunker will then return a Tree which will look something like
this:
(S
  Senator/NNP
  (PERSON Barack/NNP Obama/NNP)
  has/VBZ
  beaten/VBN
  President/NNP
  (PERSON George/NNP Bush/NNP)
  in/IN
  the/DT
  recent/JJ
  elections/NNS
  ./.)

You can now work on the named entities without their titles and compare
them using NLTK's distance module (nltk.metrics.distance).
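
A rough sketch of that last step, assuming NLTK 3 is installed. To keep it runnable without downloading any models, it re-parses the chunker output quoted above with `Tree.fromstring` instead of calling the chunker. On real text the tree would come from `nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sentence)))`, which needs the tokenizer, tagger and chunker data packages. `person_names` and `read_leaf` are illustrative helper names, not part of NLTK.

```python
from nltk import Tree
from nltk.metrics.distance import edit_distance

# Chunker output from the example above, used here as a stand-in for a
# real nltk.ne_chunk(...) result.
CHUNKED = """
(S
  Senator/NNP
  (PERSON Barack/NNP Obama/NNP)
  has/VBZ beaten/VBN President/NNP
  (PERSON George/NNP Bush/NNP)
  in/IN the/DT recent/JJ elections/NNS ./.)
"""


def read_leaf(text):
    """Turn 'Barack/NNP' into ('Barack', 'NNP'), the leaf shape ne_chunk produces."""
    word, tag = text.rsplit("/", 1)
    return (word, tag)


def person_names(tree):
    """Collect the words under every PERSON subtree, one string per entity."""
    return [" ".join(word for word, _tag in subtree.leaves())
            for subtree in tree.subtrees(lambda t: t.label() == "PERSON")]


tree = Tree.fromstring(CHUNKED, read_leaf=read_leaf)
names = person_names(tree)
print(names)

# Compare an extracted entity against a reference name.
print(edit_distance(names[0], "Barack Obama"))
```

The titles "Senator" and "President" sit outside the PERSON subtrees, so they drop out without any explicit title list.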

Hope this helps.

James.

Bala

Dec 30, 2010, 3:09:56 PM
to nltk-...@googlegroups.com
Thanks for the reply.

I agree with you that NER will work, but I don't have a sentence. I have just a list of entities. Are there at least some heuristics that can be applied?
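
One heuristic that needs no sentence context, sketched here under the assumption that a small hand-made honorific list is acceptable (`HONORIFICS`, `core_tokens` and `same_person` are illustrative names, not NLTK APIs): treat two entries as the same person when the core tokens of one are contained in the other.

```python
import string

# Hypothetical honorific list for illustration; it would need tuning per data set.
HONORIFICS = {"mr", "mrs", "ms", "dr", "senator", "president",
              "democrat", "republican"}


def core_tokens(name):
    """Lowercase the tokens, trim surrounding punctuation, and drop honorifics."""
    tokens = [t.strip(string.punctuation).lower() for t in name.split()]
    return [t for t in tokens if t and t not in HONORIFICS]


def same_person(a, b):
    """Match when one name's core tokens are a subset of the other's."""
    ta, tb = set(core_tokens(a)), set(core_tokens(b))
    if not ta or not tb:
        return False
    return ta <= tb or tb <= ta


reference = "Barack Obama"
for candidate in ["Democrat Barack Obama", "Mr. Obama",
                  "Senator Barack Obama", "George Bush"]:
    print(candidate, "->", same_person(reference, candidate))
```

This is deliberately loose: "Mr. Obama" would also match any other entry whose surname is Obama, so it is probably best used to generate candidate pairs rather than final decisions.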

-- Bala

Ewan Klein

Dec 30, 2010, 7:09:44 AM
to nltk-...@googlegroups.com
Hi Bala,

I haven't kept up to date on this topic, but Olga Uryupina wrote a nice paper comparing various methods for this task; see

http://www.coli.uni-saarland.de/~ourioupi/ury_lrec_fin.ps


Regards,

Ewan
