Hi
I was trying to load a 12 GB LM (3.8 billion tokens) in ARPA format using readArrayEncodedLmFromArpa(file, true). However, I ran out of memory even with 40 GB of memory available.
I noticed in the documentation that there is a CompressedNGramMap, which is supposedly a compressed representation. I have a couple of questions:
1. Does CompressedNGramMap use less memory than the model loaded by readArrayEncodedLmFromArpa(file, true)?
2. Can CompressedNGramMap be read from ARPA format, and if so, how?
Thank you
Yik Jiun