Hi Samith,
I'm currently doing some work that depends on it too.
My approach was to use already-labeled sentences to retrain my own ne_chunk() replacement (using NLTK's built-in classifiers). Under good conditions you get the best classifier you could hope for for your purposes, although the retraining process takes a long time.
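For what it's worth, here is a minimal sketch of the kind of retraining I mean, in the style of the classifier-based chunker from the NLTK book (chapter 7). The training sentences, feature set, and labels below are toy examples I made up for illustration; in practice you would load your own IOB-labeled corpus and use a much richer feature extractor.

```python
import nltk

# Toy IOB-labeled training data: (word, POS, IOB tag) triples.
# These sentences are invented purely for illustration -- substitute
# your own labeled corpus here.
train_sents = [
    [("John", "NNP", "B-PER"), ("lives", "VBZ", "O"),
     ("in", "IN", "O"), ("London", "NNP", "B-LOC"), (".", ".", "O")],
    [("Mary", "NNP", "B-PER"), ("visited", "VBD", "O"),
     ("Paris", "NNP", "B-LOC"), (".", ".", "O")],
]

def features(sent, i):
    """Simple per-token features; real NER features would add word shape,
    gazetteers, surrounding words, etc."""
    word, pos = sent[i][0], sent[i][1]
    return {
        "word": word.lower(),
        "pos": pos,
        "is_title": word.istitle(),  # capitalization is an important clue
        "prev_pos": sent[i - 1][1] if i > 0 else "<START>",
    }

# Flatten the sentences into (feature dict, label) training instances.
train_set = [
    (features(sent, i), iob)
    for sent in train_sents
    for i, (_word, _pos, iob) in enumerate(sent)
]

# Train one of NLTK's built-in classifiers on the labeled tokens.
classifier = nltk.NaiveBayesClassifier.train(train_set)

# Tag a new (already POS-tagged) sentence token by token.
test_sent = [("John", "NNP"), ("visited", "VBD"), ("London", "NNP")]
tags = [classifier.classify(features(test_sent, i))
        for i in range(len(test_sent))]
print(list(zip([w for w, _ in test_sent], tags)))
```

A real setup would also decode the per-token IOB tags back into chunks (e.g. with nltk.chunk.conlltags2tree) and use a sequence-aware tagger rather than classifying each token independently, but the shape of the training loop is the same.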
If you go this route, don't forget that part-of-speech tagging output is very different when your input is lowercase. It also helps a lot if your training corpus is full of sentences similar to the ones you want to apply the classifier to; I achieved good results that way.
But since my training data is considerably noisy, my final classifier isn't good enough (the most powerful training methods are too sensitive to noise). Have you managed to overcome these problems, or gotten good results over the last two months? If so, I'd really appreciate it if you could shed some light on a different approach :)