I created a Kneser-Ney binary file with LM order = 2 without any problems. Then I got access to a computer with lots of RAM and tried to extend the LM order to 4.
To do that, I used the following code in MakeLmBinaryFromGoogle.main:
final StringWordIndexer wordIndexer = new StringWordIndexer();
GoogleLmReader.addToIndexer(wordIndexer, googleDir+"/1gms/vocab_cs.gz");
final ArrayEncodedNgramLanguageModel<String> lm = LmReaders.readLmFromGoogleNgramDir(googleDir, false, true, wordIndexer, new ConfigOptions());
During the computation I got the following error:
Exception in thread "main" java.lang.AssertionError
at edu.berkeley.nlp.lm.io.KneserNeyLmReaderCallback.getLowerOrderBackoff(KneserNeyLmReaderCallback.java:211)
at edu.berkeley.nlp.lm.io.KneserNeyLmReaderCallback.getProbBackoff(KneserNeyLmReaderCallback.java:342)
at edu.berkeley.nlp.lm.io.KneserNeyLmReaderCallback.parse(KneserNeyLmReaderCallback.java:306)
at edu.berkeley.nlp.lm.io.KneserNeyLmReaderCallback.parse(KneserNeyLmReaderCallback.java:37)
at edu.berkeley.nlp.lm.io.LmReaders.firstPassCommon(LmReaders.java:553)
at edu.berkeley.nlp.lm.io.LmReaders.firstPassArpa(LmReaders.java:530)
at edu.berkeley.nlp.lm.io.LmReaders.readArrayEncodedLmFromArpa(LmReaders.java:171)
at edu.berkeley.nlp.lm.io.LmReaders.readLmFromGoogleNgramDir(LmReaders.java:224)
at edu.berkeley.nlp.lm.io.MakeLmBinaryFromGoogle.main(MakeLmBinaryFromGoogle.java:99)
In the output stream I found the following:
...
Counting values {
Writing Kneser-Ney probabilities {
Counting counts for order 0 {
} [0s]
Counting counts for order 1 {
} [7s]
Counting counts for order 2 {
} [8s]
Counting counts for order 3 {
} [4s]
On order 1
Writting line 1
...
On order 2
Writting line ...
...
However, I had no problems while generating a Stupid Backoff binary file (LM order = 4) from the same <n>gms folders.
Do you have any idea what is causing this problem?