Decoding memory consumption


New Kaldi User

Dec 28, 2015, 9:36:05 AM
to kaldi-help
Dear All,

I have built an ASR system using Kaldi and it is working OK, but decoding consumes a lot of memory. I'm using an HCLG.fst of around 6 GB and an HCLGa.fst of 3.7 GB. On a machine with 128 GB of RAM, I can't run more than 2 decoding jobs. Is it normal for the ASR to consume the vast majority of the RAM to decode 49 wav files with an overall length of 13 minutes while using only two decoding jobs? Any suggestions for getting a more RAM-friendly decoding process?

Your feedback is always highly appreciated and thank you very much in advance.
 

Tony Robinson

Dec 28, 2015, 10:04:07 AM
to kaldi...@googlegroups.com

No, it is not normal to consume nearly 128 GB RAM.  Are you running a standard recipe or one of your own?   Have you changed the beams or acoustic scale?


Tony

--
Speechmatics is a trading name of Cantab Research Limited
We are hiring: www.speechmatics.com/careers
Dr A J Robinson, Founder, Cantab Research Ltd
Phone direct: 01223 794096, office: 01223 794497
Company reg no GB 05697423, VAT reg no 925606030
51 Canterbury Street, Cambridge, CB4 3QG, UK

New Kaldi User

Dec 28, 2015, 1:16:53 PM
to kaldi-help
Hello Tony,

Thanks for your fast reply.

I have run a new trial and here are the values used:

acwt  = 0.08
beam = 13
lattice_beam = 6
min_active = 200
max_active = 7000
number of decode jobs = 1
size of HCLG.fst = 11.9 GB
size of final.nnet = 101.4 MB
size of final.mdl = 480 KB

Memory consumed = 52 GB
Time consumed = 1661 seconds

I'm using the decode_nnet.sh file almost in its raw form. It uses the latgen-faster-mapped decoder.
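For reference, with my settings the decode boils down to a pipeline roughly like the one below (just a sketch of what the script runs; the feature rspecifier, prior-counts file, words.txt and lattice output names are placeholders, not my exact paths):

  nnet-forward --no-softmax=true --class-frame-counts=ali_train_pdf.counts \
      final.nnet "$feats" ark:- | \
  latgen-faster-mapped --min-active=200 --max-active=7000 --beam=13 \
      --lattice-beam=6 --acoustic-scale=0.08 --allow-partial=true \
      --word-symbol-table=words.txt \
      final.mdl HCLG.fst ark:- "ark:|gzip -c > lat.1.gz"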
Are there any general suggestions? Can I use another decoder?

Thanks,

Tony Robinson

Dec 28, 2015, 1:30:28 PM
to kaldi...@googlegroups.com
All those values look fair enough to me. What do you mean by "run a new trial" - are you running a standard recipe or one of your own? It's really important to run a recipe (e.g. tedlium) first to get a feel for everything. If you introduce your own data and end up with a poor acoustic model or a poor language model, then you can use a lot more memory in the search.


Tony

Daniel Povey

Dec 28, 2015, 3:38:04 PM
to kaldi-help
I don't see how it could use that much memory in the search--the max-active provides a limit.
If you have a single long utterance, it's possible that it's using a lot of memory in the neural net evaluation. I assume this is nnet1; that may be the issue. In this case it would be 'nnet-forward' that was using up a lot of memory. I'm not sure if there is a solution to this - Karel would know. The issue doesn't exist in nnet2 and nnet3.
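If you're not sure which binary is actually holding the memory, something like this while a decode job is running would show it (plain Linux tooling, nothing Kaldi-specific; just a suggestion):

  # list the processes with the largest resident memory (RSS) first
  ps aux --sort=-rss | head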
Dan

a.m.m...@ieee.org

Dec 29, 2015, 3:07:24 PM
to kaldi-help, dpo...@gmail.com
Thank you Tony and Dan,

@Tony By "run a new trial" I mean that I run a decode. I have run the Voxforge recipe before and got the expected good results. I have also run my own Arabic data, with around 90 hours of training and 13 minutes of testing, and reached an accuracy of 79%. I'm using the gale_arabic recipe as a reference.

@Dan What is the expected amount of memory that the reported decoding process should take? Also, how long does an utterance have to be to count as a long one? Up to what duration is Kaldi insensitive to utterance length, in both training and testing? And to switch to nnet2 or nnet3, which steps would I have to redo?

Thank you very much again for your valuable comments.

Daniel Povey

Dec 29, 2015, 3:26:53 PM
to a.m.m...@ieee.org, kaldi-help
I believe 'nnet-forward' will use an amount of memory that grows linearly with the utterance duration. You can figure out a lower bound from the number of frames times the number of output labels (pdfs) in your system times the size of a BaseFloat; then maybe double it to take the hidden layers into account, and double again to account for the output being held in a couple of programs in the pipe. You still haven't told us which program was using up memory, so we are having to guess.
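As a rough worked example of that bound (the frame count assumes the whole 13 minutes were a single utterance at 100 frames per second, and the pdf count is made up, so treat the numbers as purely illustrative):

  # 13 minutes ~= 78000 frames; suppose 5000 pdfs and a 4-byte BaseFloat
  frames=78000; pdfs=5000; bytes=4
  echo $(( frames * pdfs * bytes * 2 * 2 ))   # about 6.2e9 bytes, i.e. roughly 6 GB

If the number you get that way is nowhere near what you actually observe, the memory is probably going somewhere other than the nnet output matrix.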

Regarding nnet2 and nnet3: it would probably be better for you to run an nnet2 recipe at this point. E.g. you could start with egs/wsj/s5/local/online/run_nnet2.sh. However, I anticipate that you will run into problems, and I can't spend much time helping you.

Dan

a.m.m...@ieee.org

Dec 29, 2015, 3:48:00 PM
to kaldi-help, a.m.m...@ieee.org, dpo...@gmail.com
Thanks, Dr. Dan, for your fast response.

I have used egs/gale_arabic/s5/local/nnet/run_dnn.sh
Thanks,