Bi-gram with Katz-backoff to calculate cross-entropy

Clem Niem

Mar 1, 2017, 3:20:02 PM
to nltk-users
Hello everybody,

I would like to implement a bi-gram language model with Katz backoff smoothing, like they did in the paper "No country for old members".

They use a bi-gram language model with Katz backoff and, for the unigram step, Laplace smoothing with a constant of 0.2. They build a language model for every month and compare user posts to the corresponding "snapshot language models" by calculating the cross-entropy of each post's bi-grams under those snapshot models.
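Concretely (in my own notation), what I want to compute for a post with N bi-grams under the snapshot model P_m of month m is the per-bi-gram cross-entropy

H(post) = -\frac{1}{N} \sum_{i=1}^{N} \log_2 P_m(w_i \mid w_{i-1})

i.e. the average negative log-probability that month m's model assigns to the post's bi-grams.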

Is this currently possible in NLTK? From what I have read, the n-gram model package has been under construction since 2013. Or can I use an older version of NLTK where this was still possible?

I would love a pure-Python solution; I have looked at KenLM and SRILM, but neither is as handy as NLTK would be.

I am grateful for any push in the right direction.
Thanks in advance!
Clem

Denzil Correa

Mar 3, 2017, 7:17:09 PM
to nltk-...@googlegroups.com
Hi Clem,

There seems to be a way [0], though I am not sure the solution there is correct [1]. That said, the Katz backoff model should be easy to implement from scratch yourself. Is there anything in particular you are stuck on?
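As a rough starting point, a from-scratch sketch could look like the following. This is only an illustration, not the exact method from the paper: it backs off from discounted bi-gram estimates to an add-0.2 unigram, but it uses a fixed absolute discount d instead of the Good-Turing discounting that the full Katz method prescribes, and all names (KatzBigramLM, k, d) are mine.

import math
from collections import Counter

class KatzBigramLM:
    # Bigram model with Katz-style backoff to an add-k (Laplace) smoothed
    # unigram model. NOTE: a fixed absolute discount d stands in for the
    # Good-Turing discounting of the full Katz method.
    def __init__(self, sentences, k=0.2, d=0.5):
        self.k, self.d = k, d
        self.unigrams, self.bigrams = Counter(), Counter()
        for sent in sentences:
            tokens = ["<s>"] + sent + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab = len(self.unigrams)
        self.total = sum(self.unigrams.values())

    def p_unigram(self, w):
        # Laplace-smoothed unigram step (k = 0.2, as in the paper)
        return (self.unigrams[w] + self.k) / (self.total + self.k * self.vocab)

    def _alpha(self, prev):
        # Back-off weight: the mass freed by discounting the seen bigrams
        # starting with `prev`, renormalised over unseen continuations.
        seen = [w for (p, w) in self.bigrams if p == prev]
        if not seen:
            return 1.0
        left = 1.0 - sum((self.bigrams[(prev, w)] - self.d) / self.unigrams[prev]
                         for w in seen)
        unseen_mass = 1.0 - sum(self.p_unigram(w) for w in seen)
        return left / unseen_mass if unseen_mass > 0 else 0.0

    def p(self, prev, w):
        c = self.bigrams[(prev, w)]
        if c > 0:
            return (c - self.d) / self.unigrams[prev]   # discounted bigram
        return self._alpha(prev) * self.p_unigram(w)    # back off to unigram

    def cross_entropy(self, sentence):
        # Per-bigram cross-entropy (in bits) of one post under this model
        tokens = ["<s>"] + sentence + ["</s>"]
        pairs = list(zip(tokens, tokens[1:]))
        return -sum(math.log2(self.p(a, b)) for a, b in pairs) / len(pairs)

# Toy usage: build one snapshot model from a month's posts, then score a post
snapshot = KatzBigramLM([["good", "post"], ["nice", "post"]])
print(snapshot.cross_entropy(["good", "post"]))

For the setup in the paper you would train one such model per month and score each post against the snapshot of the month it was written in.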


