perplexity score nltk

Shengrong Liu

Jul 23, 2013, 8:50:10 PM

to nltk-...@googlegroups.com

Dear All,

I wrote a unigram model without the nltk package, and it took nearly 100 lines. Doing the same for a bigram or trigram model would be a lot more complicated. I heard that NLTK can calculate a perplexity score.

This is what I have so far:

import nltk

from nltk.model.ngram import NgramModel

from nltk.probability import LidstoneProbDist

with open(r'C:\Python27\A.txt') as f_in:

    ln = f_in.read()

words = nltk.word_tokenize(ln)

my_bigrams = nltk.bigrams(words)

my_trigrams = nltk.trigrams(words)

estimator = lambda fdist, bins: LidstoneProbDist(fdist, 0.2)
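
For context, here is a minimal sketch of what Lidstone smoothing with gamma = 0.2 computes, written without NLTK so the arithmetic is visible. The function name lidstone_prob, the toy sentence, and the choice of one extra bin for unseen words are assumptions made for illustration, not part of the NLTK API.

```python
from collections import Counter

def lidstone_prob(counts, word, gamma, bins):
    """Lidstone estimate: (count + gamma) / (N + gamma * bins)."""
    total = sum(counts.values())  # N, the number of observed tokens
    return (counts[word] + gamma) / (total + gamma * bins)

# Toy data (hypothetical), standing in for the tokens read from the file.
counts = Counter("the cat sat on the mat".split())

# Assumption: reserve one extra bin so that unseen words get some mass.
bins = len(counts) + 1

p_the = lidstone_prob(counts, "the", 0.2, bins)     # seen twice
p_unseen = lidstone_prob(counts, "dog", 0.2, bins)  # never seen
```

Every word gets gamma added to its count, so an unseen word ends up with a small nonzero probability instead of zero. That matters for perplexity, because a single zero probability would make it infinite.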

What should the next step be?

What is the estimator?

Is there any step I need to do before using .perplexity()?
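
To make the quantity concrete, here is a rough pure-Python sketch of bigram perplexity with Lidstone smoothing, using the usual definition 2 ** (-(1/N) * sum of log2 P(w_i | w_{i-1})). It does not use NLTK; the helper names (train_bigram_counts, bigram_prob, perplexity), the toy sentences, and the vocabulary size with one extra slot for unseen words are all assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

def train_bigram_counts(tokens):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for prev, word in zip(tokens, tokens[1:]):
        counts[prev][word] += 1
    return counts

def bigram_prob(counts, prev, word, gamma, vocab_size):
    """Lidstone-smoothed estimate of P(word | prev)."""
    context = counts.get(prev, Counter())
    total = sum(context.values())
    return (context[word] + gamma) / (total + gamma * vocab_size)

def perplexity(counts, tokens, gamma, vocab_size):
    """2 ** (average negative log2 probability per bigram)."""
    pairs = list(zip(tokens, tokens[1:]))
    log_sum = sum(
        math.log2(bigram_prob(counts, prev, word, gamma, vocab_size))
        for prev, word in pairs
    )
    return 2 ** (-log_sum / len(pairs))

# Toy data (hypothetical); real text would come from the tokenized file.
train = "the cat sat on the mat".split()
test = "the cat sat on the hat".split()

# Assumption: vocabulary is the training types plus one slot for unseen words.
vocab_size = len(set(train)) + 1

counts = train_bigram_counts(train)
ppl_train = perplexity(counts, train, 0.2, vocab_size)
ppl_test = perplexity(counts, test, 0.2, vocab_size)
```

The test sentence contains one bigram that never occurred in training, so its perplexity comes out higher than that of the training sentence. Lower perplexity means the model finds the text less surprising.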

Thanks
