IRSTLM prune-lm takes a very long time on a 4 GB language model.


4151...@qq.com

Oct 26, 2016, 5:04:11 AM
to kaldi-help
Hi, everyone!
    I have a 3-gram language model, 4 GB in total, in iARPA format.
I want to prune it with IRSTLM's prune-lm tool, but it is very slow: the call to lmt.load(...) in prune-lm.cpp spends most of its time loading the 3-grams, while the 1-grams and 2-grams are read quickly.
It takes about 10 minutes to read 500,000 lines of 3-grams.

My machine has 6 cores (12 threads) and 64GB memory. The code only runs on a single core.
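One way to see why the 3-gram pass dominates the load time is to look at the n-gram counts declared in the model's \data\ header. A minimal sketch (not part of IRSTLM; the parsing helper and sample counts below are illustrative assumptions):

```python
# Parse 'ngram N=count' entries from the \data\ header of an
# ARPA/iARPA-format LM to see how the orders compare in size.
# Hypothetical helper for illustration only.

def arpa_ngram_counts(lines):
    """Return {order: count} from the \\data\\ section of an ARPA file."""
    counts = {}
    in_data = False
    for line in lines:
        line = line.strip()
        if line == "\\data\\":
            in_data = True
            continue
        if in_data:
            if line.startswith("ngram "):
                order, n = line[len("ngram "):].split("=")
                counts[int(order)] = int(n)
            elif line:  # first non-empty, non-ngram line ends the header
                break
    return counts

if __name__ == "__main__":
    # Sample header with made-up counts; in a real 4 GB 3-gram model
    # the highest order typically accounts for most of the file.
    sample = """\\data\\
ngram 1=100000
ngram 2=20000000
ngram 3=150000000

\\1-grams:
""".splitlines()
    print(arpa_ngram_counts(sample))
```

Since the 3-gram section usually holds the overwhelming majority of entries, most of the single-threaded load time is spent there.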

Can you give me some suggestions?

I have also trained an 80 GB 3-gram model, and pruning it with IRSTLM is even harder.


Jan Trmal

Oct 26, 2016, 11:36:51 AM
to kaldi-help
Yes, loading huge models can take some time, as can any sufficiently complex operation on them.
You can try SRILM or perhaps KenLM, but I don't think you will see huge gains.
y.
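For reference, a hedged sketch of the two pruning invocations being compared (the threshold values and file names here are illustrative placeholders, not recommendations):

```shell
# IRSTLM entropy pruning: --threshold takes one value per n-gram
# order starting at 2-grams. File names are placeholders.
prune-lm --threshold=1e-7,1e-7 big.3gram.ilm.gz pruned.3gram.lm

# SRILM equivalent: entropy-based pruning of an ARPA-format model
# with the ngram tool. Threshold is a placeholder.
ngram -order 3 -lm big.3gram.arpa -prune 1e-8 -write-lm pruned.3gram.arpa
```

Both tools are single-threaded for this operation, so load time scales with the size of the highest-order n-gram section regardless of which one is used.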

