perplexity score nltk

Shengrong Liu

Jul 23, 2013, 8:50:10 PM
to nltk-...@googlegroups.com
Dear All,

I wrote a unigram model without the nltk package, and it took nearly 100 lines. A bigram or trigram model would be much more complicated to write by hand. I have heard that NLTK can calculate a perplexity score.

This is what I have so far:

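import nltk
from nltk.model.ngram import NgramModel  # nltk.model as shipped with NLTK 2.x
from nltk.probability import LidstoneProbDist

# Read the whole training text; the with-block closes the file afterwards
with open(r'C:\Python27\A.txt') as f_in:
    ln = f_in.read()

words = nltk.word_tokenize(ln)
my_bigrams = nltk.bigrams(words)
my_trigrams = nltk.trigrams(words)

# Lidstone (add-0.2) smoothing applied to a frequency distribution
estimator = lambda fdist, bins: LidstoneProbDist(fdist, 0.2)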


What should the next step be?
What is the estimator?


Is there any step I need to do before calling .perplexity()?
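
To make the question concrete, here is a minimal sketch of what a bigram perplexity computation involves when written by hand in plain Python, without NLTK. The function names (train_bigram_counts, lidstone_prob, perplexity) and the toy sentences are hypothetical and are not NLTK API; the gamma of 0.2 mirrors the Lidstone parameter in the estimator above, and taking the vocabulary to be the set of training words is an assumption.

```python
import math
from collections import Counter


def train_bigram_counts(tokens):
    """Count bigrams and the contexts (first words) they start from."""
    context_counts = Counter(tokens[:-1])
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    return context_counts, bigram_counts


def lidstone_prob(prev, word, context_counts, bigram_counts, vocab_size, gamma=0.2):
    """P(word | prev) with Lidstone (add-gamma) smoothing."""
    numerator = bigram_counts[(prev, word)] + gamma
    denominator = context_counts[prev] + gamma * vocab_size
    return numerator / denominator


def perplexity(tokens, context_counts, bigram_counts, vocab_size, gamma=0.2):
    """2 ** (average negative log2 probability over the bigrams in tokens)."""
    pairs = list(zip(tokens, tokens[1:]))
    log_sum = sum(
        math.log2(lidstone_prob(p, w, context_counts, bigram_counts, vocab_size, gamma))
        for p, w in pairs
    )
    return 2 ** (-log_sum / len(pairs))


# Toy data (hypothetical); in practice the tokens would come from a tokenizer.
train_tokens = "a b a b a b".split()
vocab = set(train_tokens)  # assumption: vocabulary = words seen in training
context_counts, bigram_counts = train_bigram_counts(train_tokens)

seen_pp = perplexity("a b a b".split(), context_counts, bigram_counts, len(vocab))
unseen_pp = perplexity("b b a a".split(), context_counts, bigram_counts, len(vocab))
print(seen_pp, unseen_pp)
```

The smoothing step is what the estimator is for: it gives unseen bigrams a small non-zero probability so the logarithm is always defined. Text that resembles the training data should come out with a lower perplexity than text that does not.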

Thanks
