How to train a new Ngram Model from raw text


Mohammad Sadegh Rasooli

Jul 25, 2014, 11:18:08 AM
to berkeleyl...@googlegroups.com
Hi,

Thanks for your useful tool.

I looked at your Java test samples but could not find one that trains a new LM on raw text. I assume I have to create an ARPA file if I want to get the likelihood of my sentences based on training data. So my question is: how can I train a new language model from raw text?

Sample Java code (or a code fragment) would be highly appreciated.

Thanks

Adam Pauls

Jul 25, 2014, 12:12:41 PM
to berkeleyl...@googlegroups.com
https://code.google.com/p/berkeleylm/source/browse/trunk/examples/make-kneserney-arpa-from-raw-text.sh

Warning: it can take a lot of memory to build an LM from lots of text, even if the resulting model is small.
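
For readers who want a feel for what "training from raw text" involves before running the linked script, here is a minimal, self-contained Java sketch of the counting step behind an n-gram model. It is not BerkeleyLM code and does not use its API: the class name NgramCountSketch, the method names, and the whitespace tokenization are all assumptions made for illustration. It also uses plain unsmoothed relative frequencies for bigrams, whereas the script name above suggests it writes a Kneser-Ney-smoothed ARPA file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a toy bigram counter, not part of BerkeleyLM.
public class NgramCountSketch {
    // Sentence-boundary markers chosen for this sketch (an assumption).
    static final String START = "<s>";
    static final String END = "</s>";

    private final Map<String, Integer> unigramCounts = new HashMap<>();
    private final Map<String, Integer> bigramCounts = new HashMap<>();

    /** Tokenize one raw sentence on whitespace and update the counts. */
    void addSentence(String sentence) {
        List<String> tokens = new ArrayList<>();
        tokens.add(START);
        for (String t : sentence.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        tokens.add(END);
        for (int i = 0; i < tokens.size(); i++) {
            unigramCounts.merge(tokens.get(i), 1, Integer::sum);
            if (i + 1 < tokens.size()) {
                String bigram = tokens.get(i) + " " + tokens.get(i + 1);
                bigramCounts.merge(bigram, 1, Integer::sum);
            }
        }
    }

    /** Unsmoothed estimate: count(prev word) / count(prev); 0 if prev is unseen. */
    double bigramProbability(String prev, String word) {
        int context = unigramCounts.getOrDefault(prev, 0);
        if (context == 0) {
            return 0.0;
        }
        return bigramCounts.getOrDefault(prev + " " + word, 0) / (double) context;
    }

    public static void main(String[] args) {
        NgramCountSketch lm = new NgramCountSketch();
        lm.addSentence("the cat sat");
        lm.addSentence("the dog sat");
        System.out.println(lm.bigramProbability("the", "cat"));
    }
}
```

Even this toy version keeps every distinct n-gram in memory while counting, which is one reason building a model from a large corpus can need far more memory than the finished model does. A real model would also need smoothing so that unseen n-grams do not get zero probability.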

