Hello everybody,
They use a bigram language model with Katz backoff and, for the unigram step, Laplace smoothing with a factor of 0.2. They build a language model for every month and compare user posts to the corresponding "snapshot language models" by computing the cross-entropy of the posts' bigrams under each snapshot model.
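To make the approach concrete, here is a minimal pure-Python sketch of the cross-entropy computation I have in mind. It uses simple add-gamma (Lidstone) smoothing with gamma = 0.2 everywhere and omits the Katz backoff step for brevity, so it is a simplification of the paper's method, not a faithful reimplementation:

```python
import math
from collections import Counter

def train_bigram_lm(tokens):
    """Count unigrams and bigrams from one 'snapshot' corpus (e.g. one month)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(w1, w2, unigrams, bigrams, vocab_size, gamma=0.2):
    """P(w2 | w1) with add-gamma (Lidstone) smoothing.
    NOTE: the paper uses Katz backoff; that step is omitted here."""
    return (bigrams[(w1, w2)] + gamma) / (unigrams[w1] + gamma * vocab_size)

def cross_entropy(post_tokens, unigrams, bigrams, vocab_size, gamma=0.2):
    """Average negative log2 probability of the post's bigrams under the model."""
    pairs = list(zip(post_tokens, post_tokens[1:]))
    logp = sum(
        math.log2(bigram_prob(w1, w2, unigrams, bigrams, vocab_size, gamma))
        for w1, w2 in pairs
    )
    return -logp / len(pairs)

# Toy example: one snapshot model, one user post
snapshot = "the cat sat on the mat the cat ran".split()
uni, bi = train_bigram_lm(snapshot)
post = "the cat sat".split()
print(cross_entropy(post, uni, bi, vocab_size=len(uni)))
```

A lower cross-entropy would mean the post's language is closer to that month's snapshot. I would like to do essentially this, but with proper Katz backoff, inside NLTK.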
Is this currently possible in NLTK? From what I have read, the n-gram model package has been under construction since 2013. Alternatively, could I use an older version of NLTK where this was still possible?
I would love an in-Python solution; I have looked at KenLM and SRILM, but neither is quite as handy as NLTK would be.
I am grateful for any push in the right direction,
Thanks in advance!
Clem