Name matching

Name matching Bala 12/29/10 11:03 AM
Hi,

For my NLP project, I need to write a program to match two sets of names. 

For example, I need to identify the matches below:
Barack Obama - Democrat Barack Obama
Barack Obama - Mr. Obama
Barack Obama - Senator Barack Obama

I thought of first stripping off prefixes such as "Democrat" and "Senator", and then using some variant of an edit-distance algorithm.

Is there a way NLTK can tell me that "Democrat", "Senator", etc. are not part of the name, so that I can strip them off?

Thank you
Bala

PS: Is there any good open-source name-matching software available?

Re: Name matching James Smith 12/30/10 9:04 AM
Hi Bala,

You might try named entity recognition. nltk.ne_chunk should do the
trick.

You'll need to tokenize and tag the sentence that you'll be chunking.
The chunker will then return a Tree which will look something like
this:
(S
  Senator/NNP
  (PERSON Barack/NNP Obama/NNP)
  has/VBZ
  beaten/VBN
  President/NNP
  (PERSON George/NNP Bush/NNP)
  in/IN
  the/DT
  recent/JJ
  elections/NNS
  ./.)

You can now work on the named entities without their titles and use
NLTK's distance module.
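
To illustrate the distance step, here is a minimal pure-Python sketch of
Levenshtein edit distance, the kind of measure the distance module provides
(NLTK exposes it as nltk.edit_distance). The function names levenshtein and
name_similarity are illustrative, not part of NLTK, and the normalization to
a 0..1 score is just one possible choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    # previous[j] = distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; 1.0 means identical.
    (Assumed normalization: divide by the longer string's length.)"""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

A pair of extracted names could then be accepted as a match when
name_similarity exceeds some threshold that you tune on your own data.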

Hope this helps.

James.
Re: Name matching Bala 12/30/10 12:09 PM
Thanks for the reply.

I agree with you that NER will work, but I don't have sentences; I have just a list of entities. Are there at least some heuristics that can be applied?
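
One heuristic that needs no sentence context, sketched here under assumptions: drop words found in a hand-maintained title list, then treat two entities as the same person when one's remaining name tokens are all contained in the other's. The TITLES set and the function names are hypothetical and would need to be extended for real data.

```python
import re

# Hypothetical, hand-maintained list of titles and descriptors (an assumption,
# not something NLTK supplies); extend it for your own data.
TITLES = {"mr", "mrs", "ms", "dr", "senator", "president",
          "democrat", "republican"}


def name_tokens(entity: str) -> frozenset:
    """Lowercase the entity, split it into runs of letters, drop known titles."""
    words = re.findall(r"[^\W\d_]+", entity.lower())
    return frozenset(w for w in words if w not in TITLES)


def same_person(a: str, b: str) -> bool:
    """Match when one entity's name tokens are a subset of the other's."""
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or not tb:
        return False  # nothing left after stripping titles
    return ta <= tb or tb <= ta
```

With these definitions, same_person("Barack Obama", "Mr. Obama") is true and same_person("Barack Obama", "George Bush") is false. The subset rule is deliberately loose: "Mr. Obama" would also match "Michelle Obama", so surname-only matches may need extra evidence, and exact token comparison could be relaxed with an edit-distance check to tolerate misspellings.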

-- Bala
Re: [nltk-users] Name matching Ewan Klein 12/30/10 4:09 AM
Hi Bala,

I haven't kept up to date on this topic, but Olga Uryupina wrote a nice paper comparing various methods for this task; see

http://www.coli.uni-saarland.de/~ourioupi/ury_lrec_fin.ps


Regards,

Ewan