Is it possible to use full power of ATLAS BLAS in nnet training?


JurPasha

Aug 2, 2018, 4:43:39 AM
to kaldi-help
Hello,
I'm experimenting with nnet module on VoxForge-ru dataset.
I know that CUDA is the best choice for training neural networks, but unfortunately I don't have a GPU at the moment.
It is possible to turn CUDA off using the option skip_cuda_check=true.
But I was disappointed when I discovered that with ATLAS BLAS the NNET module runs on only a single CPU core.
Is it possible to use full power of ATLAS BLAS in nnet training?
For example, ancient QuickNet can do this.
Maybe I did not find the special keys to enable this option?
Can somebody help me with that?

Thanks and best regards... :)

Daniel Povey

Aug 2, 2018, 3:19:26 PM
to kaldi-help
Depending on the BLAS version you are using, you can sometimes enable multi-threaded BLAS with environment variables: for example,
MKL_NUM_THREADS=4
if you are using MKL.
I think with ATLAS, you need to explicitly compile the multi-threaded version.
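
[A minimal sketch of the env-var approach, using NumPy as a stand-in for any BLAS-linked program (Kaldi binaries would pick these up the same way, through whatever BLAS they link against). Which variable actually takes effect depends on the BLAS build: MKL_NUM_THREADS for MKL, OPENBLAS_NUM_THREADS for OpenBLAS, OMP_NUM_THREADS as a generic OpenMP fallback. The values here are illustrative.]

```python
import os

# Thread-count variables must be set BEFORE the BLAS library is loaded;
# most implementations read them once at initialization.
os.environ["MKL_NUM_THREADS"] = "4"       # honored by MKL builds
os.environ["OPENBLAS_NUM_THREADS"] = "4"  # honored by OpenBLAS builds
os.environ["OMP_NUM_THREADS"] = "4"       # generic OpenMP fallback

import numpy as np  # loads whatever BLAS NumPy was built against

a = np.random.rand(1000, 1000).astype(np.float32)
b = np.random.rand(1000, 1000).astype(np.float32)
c = a @ b  # an sgemm call; a threaded BLAS can use up to 4 cores here
print(c.shape)
```

For a stock ATLAS, as noted above, this alone does nothing: you would need the threaded ATLAS libraries (the ones prefixed with "pt") at link time.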
Anyway, there isn't much point in doing it. My experience has been that while it will use multiple threads, the speedup is very small (less than 2x even using 8 threads). I think this is because the matrix multiplications are not big enough.
In nnet2, we did support training with multiple CPU cores, but it was based on separate processes, not based on multi-threaded BLAS (which I found didn't give much speedup).  I removed that with nnet3 since even with tons of CPUs, it's much slower than even a single GPU, so it rarely got used.

Dan


--
Go to http://kaldi-asr.org/forums.html to find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/c675660a-6b0b-493b-909a-ee0637bdae10%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

JurPasha

Aug 9, 2018, 7:43:12 AM
to kaldi-help
Thank you, Dan.
Now I am experimenting with Azure :)
But training with the recompiled ATLAS is still going on.
It is a strange and painful agony :(
I have used Intel MKL in my own NNet training tool and got a huge speedup when all CPU cores were used.
Moreover, CPU core load was 100% on an Intel Core i7.
I have found that matrix-matrix multiplication (sgemm) gives better performance than matrix-vector multiplication (sgemv).
Which function did you use in the nnet tool?
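
[A hedged illustration of the sgemm-vs-sgemv point, again using NumPy as a stand-in for a BLAS-linked trainer; the sizes are arbitrary. Batching a minibatch of input vectors into one matrix-matrix product lets BLAS use cache blocking, instead of re-streaming the weight matrix once per vector.]

```python
import time
import numpy as np

n, cols = 512, 128
A = np.random.rand(n, n).astype(np.float32)      # "weight matrix"
X = np.random.rand(n, cols).astype(np.float32)   # minibatch of 128 input vectors

# One batched matrix-matrix product (a single sgemm over the minibatch)
t0 = time.perf_counter()
Y_gemm = A @ X
t_gemm = time.perf_counter() - t0

# The same arithmetic as 128 separate matrix-vector products (one sgemv each)
t0 = time.perf_counter()
Y_gemv = np.column_stack([A @ X[:, i] for i in range(cols)])
t_gemv = time.perf_counter() - t0

print(f"gemm: {t_gemm:.5f}s  gemv loop: {t_gemv:.5f}s")
```

The results agree up to float32 rounding (the summation order differs); on typical hardware the single gemm call is noticeably faster than the gemv loop.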

Daniel Povey

Aug 9, 2018, 2:04:07 PM
to kaldi-help
The speedup will depend on a lot of things, including minibatch sizes,
model topologies, etc.
If you are using cloud services why don't you just get a machine with
GPUs at AWS and use a GPU?