IRSTLM prune-lm takes a very long time on a 4 GB language model.


4151...@qq.com

Oct 26, 2016, 5:04:11 AM
to kaldi-help
Hi, everyone!
    I have a 3-gram language model, 4 GB in total, in iARPA format.
I want to prune it with IRSTLM's prune-lm tool, but it is very slow: the call to lmt.load(...) in prune-lm.cpp spends most of its time loading the 3-grams, while the 1-grams and 2-grams are read quickly.
It takes about 10 minutes to read 500,000 lines of 3-grams.

My machine has 6 cores (12 threads) and 64GB memory. The code only runs on a single core.
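One way to see why the 3-gram pass dominates the load time is to look at the n-gram counts declared in the model's \data\ header. A minimal sketch (not part of IRSTLM; the parsing helper and sample counts below are illustrative assumptions):

```python
# Parse 'ngram N=count' entries from the \data\ header of an
# ARPA/iARPA-format LM to see how the orders compare in size.
# Hypothetical helper for illustration only.

def arpa_ngram_counts(lines):
    """Return {order: count} from the \\data\\ section of an ARPA file."""
    counts = {}
    in_data = False
    for line in lines:
        line = line.strip()
        if line == "\\data\\":
            in_data = True
            continue
        if in_data:
            if line.startswith("ngram "):
                order, n = line[len("ngram "):].split("=")
                counts[int(order)] = int(n)
            elif line:  # first non-empty, non-ngram line ends the header
                break
    return counts

if __name__ == "__main__":
    # Sample header with made-up counts; in a real 4 GB 3-gram model
    # the highest order typically accounts for most of the file.
    sample = """\\data\\
ngram 1=100000
ngram 2=20000000
ngram 3=150000000

\\1-grams:
""".splitlines()
    print(arpa_ngram_counts(sample))
```

Since the 3-gram section usually holds the overwhelming majority of entries, most of the single-threaded load time is spent there.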

Can you give me some suggestions?

I have also trained an 80 GB 3-gram model, and pruning it with IRSTLM is even harder.


Jan Trmal

Oct 26, 2016, 11:36:51 AM
to kaldi-help
Yes, loading huge models can take some time, as can any sufficiently complex operation on them.
You can try SRILM or perhaps KenLM, but I don't think you will see huge gains.
y.
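For reference, a hedged sketch of the two pruning invocations being compared (the threshold values and file names here are illustrative placeholders, not recommendations):

```shell
# IRSTLM entropy pruning: --threshold takes one value per n-gram
# order starting at 2-grams. File names are placeholders.
prune-lm --threshold=1e-7,1e-7 big.3gram.ilm.gz pruned.3gram.lm

# SRILM equivalent: entropy-based pruning of an ARPA-format model
# with the ngram tool. Threshold is a placeholder.
ngram -order 3 -lm big.3gram.arpa -prune 1e-8 -write-lm pruned.3gram.arpa
```

Both tools are single-threaded for this operation, so load time scales with the size of the highest-order n-gram section regardless of which one is used.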

