Hi,
I am working on a project where I need to recognize universities in the education section of a resume. I plan to use NER for this, treating every named entity tagged as an Organization as an educational institute. However, the built-in NER implementation in NLTK isn't giving me the desired results.
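A minimal sketch of the approach described above, for reference. It assumes the standard NLTK tokenizer, POS tagger and NE chunker models are installed; the function names extract_organizations and find_institutes are illustrative only.

```python
def extract_organizations(tree):
    """Return the text of every ORGANIZATION chunk in an NE-chunked sentence."""
    names = []
    for node in tree:
        # Chunked entities are labelled subtrees; plain tokens are (word, tag) tuples.
        if hasattr(node, "label") and node.label() == "ORGANIZATION":
            names.append(" ".join(word for word, _tag in node.leaves()))
    return names


def find_institutes(text):
    """Run NLTK's built-in NE chunker on one line of resume text."""
    import nltk  # imported lazily; requires the NLTK models to be downloaded

    tokens = nltk.word_tokenize(text)
    tagged = nltk.pos_tag(tokens)
    # ne_chunk returns a tree whose labelled subtrees are the named entities.
    return extract_organizations(nltk.ne_chunk(tagged))
```

Calling find_institutes on each line of the education section gives the list of candidate institute names; anything the chunker labels as PERSON or GPE instead of ORGANIZATION is missed by this filter.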
I have decided to build my own corpus and train the NER on it to improve accuracy. However, I am not able to find any documentation that shows where the training data should be kept, what format it should be in, or how to train the NER on that corpus.
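To make the format question concrete, here is a sketch of one layout such a corpus could take: CoNLL-style IOB lines (word, POS tag, chunk tag), with a blank line between sentences. This layout is an assumption for illustration, not something the NLTK documentation prescribes for its built-in NE chunker. The sample text and the read_iob_sentences helper are made up.

```python
# Made-up sample: one token per line as "word POS IOB-tag", blank line between sentences.
SAMPLE = """\
Graduated VBD O
from IN O
Example NNP B-ORGANIZATION
University NNP I-ORGANIZATION

Completed VBD O
in IN O
2012 CD O
"""


def read_iob_sentences(text):
    """Parse IOB-formatted text into a list of sentences of (word, pos, iob) triples."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        word, pos, iob = line.split()
        current.append((word, pos, iob))
    if current:
        sentences.append(current)
    return sentences


sentences = read_iob_sentences(SAMPLE)
```

Triples in this shape can be turned into NLTK trees with nltk.chunk.conlltags2tree, which is the form a custom chunker built on nltk.chunk.ChunkParserI would typically be trained and evaluated on.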
I have gone through the nltk/chunk/named_entity.py code. Where it defines train_paths, it refers to a corpora/ace_data folder, which I'm not able to find. The code also isn't commented well enough for me to understand what to change in order to add my own training data.
I was hoping that you guys could help me out with this problem, or point me to some blog post which could be of help. Thanks in advance.
Regards,
Abhilash Dighe