I downloaded the POS tagger library from Stanford NLP:
http://nlp.stanford.edu/software/tagger.shtml
It has built-in tools for tagging, and it's pretty simple to tag words
using those tools.
The main decision we have to make is exactly which words should
count as "keywords".
The library uses the Penn Treebank tag set for tagging:
http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CQP-HTMLDemo/PennTreebankTS.html
The automatic tagger works in the following manner:
INPUT:
A passenger plane has crashed shortly after take-off from Kyrgyzstan's
capital,
OUTPUT:
A_DT passenger_NN plane_NN has_VBZ crashed_VBN shortly_RB after_IN
take-off_NN from_IN Kyrgyzstan_NNP 's_POS capital_NN ,_,
Using the Penn Treebank tags we can tell each word's part of speech
(for example, NN is a noun and VBZ is a verb) and select the keywords
accordingly.
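
As a rough illustration of that selection step, here is a minimal Python sketch that splits the tagger's word_TAG output and keeps only some tags. It is not part of the Stanford library; the function names and the choice of tag prefixes (NN and JJ, i.e. nouns and adjectives) are my assumptions, since we have not yet decided what counts as a keyword.

```python
# Sketch: split the tagger's "word_TAG" output and keep selected tags.
# KEYWORD_TAG_PREFIXES is an assumption, not a decision made in this thread.
KEYWORD_TAG_PREFIXES = ("NN", "JJ")  # nouns and adjectives


def parse_tagged(text):
    """Turn 'word_TAG word_TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # Split on the last underscore so words containing '_' stay intact.
        word, _, tag = token.rpartition("_")
        pairs.append((word, tag))
    return pairs


def select_keywords(pairs, prefixes=KEYWORD_TAG_PREFIXES):
    """Keep words whose Penn Treebank tag starts with one of the prefixes."""
    return [word for word, tag in pairs if tag.startswith(prefixes)]


tagged = ("A_DT passenger_NN plane_NN has_VBZ crashed_VBN shortly_RB after_IN "
          "take-off_NN from_IN Kyrgyzstan_NNP 's_POS capital_NN ,_,")
keywords = select_keywords(parse_tagged(tagged))
print(keywords)
```

With these prefixes the example sentence yields passenger, plane, take-off, Kyrgyzstan and capital. Changing the tuple (say, adding "VB" for verbs) is all it takes to try a different definition of keyword.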
Keywords
---------------
Please advise on which parts of speech should count as keywords:
nouns, adjectives, ...?
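
To help with that decision, here is a small Python sketch comparing the stop-word approach from the quoted mail with a noun/adjective filter on the same sentence. The stop-word list is a tiny hypothetical one I made up for the example, and the tokenizer is a plain whitespace split, so treat the result as indicative only.

```python
# Sketch: stop-word baseline vs. a noun/adjective filter on one sentence.
# STOP_WORDS is a tiny hypothetical list, far shorter than a real one.
STOP_WORDS = {"a", "an", "the", "has", "have", "after", "from", "of", "in"}

sentence = ("A passenger plane has crashed shortly after take-off "
            "from Kyrgyzstan's capital,")

# Baseline: every token that is not a stop word counts as a keyword.
tokens = [t.strip(",.") for t in sentence.split()]
baseline = [t for t in tokens if t and t.lower() not in STOP_WORDS]

# POS-based choice: nouns only here (the sentence has no adjectives),
# copied by hand from the NN and NNP entries in the tagger output above.
pos_based = ["passenger", "plane", "take-off", "Kyrgyzstan", "capital"]

# Words the baseline keeps that the POS filter would drop.
only_in_baseline = [t for t in baseline if t not in pos_based]
print(baseline)
print(only_in_baseline)
```

On this sentence the baseline also keeps "crashed" and "shortly", which a noun/adjective filter drops. "Kyrgyzstan's" shows up as a difference only because the naive split does not separate the possessive the way the tagger does.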
On Jul 27, 10:51 pm, Dheeraj Rajagopal <dheeraj.go...@gmail.com> wrote:
> This is what we have done so far.
>
> Given a question, I have coded some basic steps (we shall improve this
> further):
>
> 1. Removed the stop words (http://en.wikipedia.org/wiki/Stop_words).
> 2. For now, I am assuming that the remaining words, excluding the stop
> words, are the keywords.
> 3. I have done POS tagging for the sentence and extracted the POS for the
> keywords.
>
> I want someone to implement the same using the Stanford NLP library. We need
> to compare the results.
>
> The knowledge base we will use is Freebase <http://www.freebase.com/>.
>
> I am attaching the Freebase paper along with this mail.
>
> Now, we have to do the following.
>
> We need to compare the keywords with the help of their POS tags, search
> the knowledge base, and come up with something that is in the database.
> It doesn't matter whether it is right or wrong. We need to evaluate what we
> are getting, and then we shall decide what to do next.
>
> Anyone who would like to do this may reply to this mail. I will help you
> with the steps.
>
> --
> Regards
>
> Dheeraj
>
> Attachment: freebase.pdf (443K)