--
You received this message because you are subscribed to the Google Groups "berkeleylm-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to berkeleylm-disc...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Unfortunately, building a KN LM is very memory intensive in Berkeley LM. The numbers you give seem high, but not all that high. However, the final model after training should be compact.
One option is to build with SRILM (which uses disk instead of memory) and see if the counts are similar.
Sorry, I'm sure that's not the most satisfactory answer.
On Mon, Jul 21, 2014 at 5:39 AM, Oren Melamud wrote:
Update:
I tried this with 3-grams and managed to build the LM using about 60 GB of RAM.
These are the n-gram counts that I got, which seem pretty high relative to the counts you report for WMT2010 in your paper.
ngram 1=100004
ngram 2=50032413
ngram 3=363244566
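For a rough sense of scale, the counts above can be summed and set against the RAM figure. This is only a back-of-the-envelope sketch, not a measurement from Berkeley LM: it assumes "about 60 GB" means 60 * 10**9 bytes and attributes the entire peak heap to n-gram storage, so the per-n-gram figure is an upper bound.

```python
# Back-of-the-envelope figures from the 3-gram run reported above.
ngram_counts = {1: 100_004, 2: 50_032_413, 3: 363_244_566}

# Assumption: "about 60 GB" is taken as 60 * 10**9 bytes of peak heap.
peak_ram_bytes = 60 * 10**9

total_ngrams = sum(ngram_counts.values())
# Upper bound only: this attributes the whole heap to n-gram storage.
bytes_per_ngram = peak_ram_bytes / total_ngrams

print(f"total n-grams: {total_ngrams:,}")
print(f"peak bytes per n-gram (rough): {bytes_per_ngram:.0f}")
```

That comes to roughly 413 million n-grams, or on the order of 145 bytes each at peak during training.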
Any ideas how to build this LM with 4-grams or 5-grams without increasing RAM requirements?
On Friday, July 18, 2014 9:45:49 PM UTC+3, Oren Melamud wrote:
Hi Adam,
I've used your LM toolkit in the past and it was very helpful. Thanks for sharing this!
This time I'm trying to train a 5-gram Kneser-Ney LM on a larger corpus, which includes over 2 billion words.
I'm running this on a Linux machine with 48 GB of memory allocated for this task, as follows:
java -ea -Xmx48000m -server -cp berkeleylm.jar edu.berkeley.nlp.lm.io.MakeKneserNeyArpaFromText 5 ukwac.5gram.arpa ukwac.txt
Unfortunately, I get a memory exception after reading only about 10% of the lines in the corpus:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
Any ideas what I could be doing wrong?
Thanks,
Oren.