Latest gaming GPUs vs K-80 & Tesla V100 for training chain models

Charl van Heerden

Jun 3, 2019, 12:33:39 PM
to kaldi-help
Hi,

I have two questions for fellow Kaldi users who know something about GPUs, and specifically about training time of chain models on different GPU models.

(1) I'm trying to determine whether it's worthwhile investing in NVIDIA's latest gaming GPU (GeForce RTX 2080 Ti) for training chain models. I currently have a 3-year-old Tesla K-80, and chain model training takes 10 days. Does anyone have experience with both cards, and any idea of the possible speed-up when using the RTX 2080 Ti vs the K-80? The reviews for general deep learning seem good (https://lambdalabs.com/blog/best-gpu-tensorflow-2080-ti-vs-v100-vs-titan-v-vs-1080-ti-benchmark/), and a general "CUDA" score also looks promising, although it's not clear how it's calculated or whether it's relevant to training speed: https://browser.geekbench.com/cuda-benchmarks

(2) I would also be interested to know if anyone has experience training chain models on AWS using K-80s vs Tesla V100s, and how much faster the V100s are at training chain models than the same number of K-80s.

Charl

Daniel Povey

Jun 3, 2019, 1:01:59 PM
to kaldi-help
I think the RTX 2080 would be faster -- maybe twice as fast or more -- but I don't have specific numbers.

I don't have experience using V100s, but I would imagine they would be fast.

Jan Trmal

Jun 3, 2019, 1:06:11 PM
to kaldi-help
I did some very preliminary and basic tests, and I think the AWS pricing correlates very well with the performance. That is, the V100/P100 (I forget the name of the card) was twice as fast but also twice as expensive compared to the K80, IIRC.
Probably you should run some benchmarks yourself -- results might depend on the DNN sizes, data volume, and so on.
y.

Charl van Heerden

Jun 3, 2019, 1:08:44 PM
to kaldi...@googlegroups.com
Thank you very much. I will buy a 2080 and report back here on exactly how much faster it is (double would be very useful!).

Rudolf A. Braun

Jun 3, 2019, 5:27:32 PM
to kaldi-help
A 1080 Ti is significantly faster than a K80 (at least 2x), and a V100 is significantly faster than a 1080 Ti. It would be good to know whether float16 training will be incorporated into Kaldi, because that's when the 2080 Ti's price really becomes worth it vs the 1080 Ti (otherwise not really, IMO).

And to answer your question, the 2080 Ti should be way faster than a K80 (at least 4x).


Daniel Povey

Jun 3, 2019, 6:52:57 PM
to kaldi-help
It may be a while, e.g. at least 6 months, before we support float16, unless someone volunteers.


Truong Do

Jun 8, 2019, 12:21:31 AM
to kaldi-help
The 2080 Ti card is 3 times faster than the K80. In my experiment, one epoch on the K80 took 9 minutes, while the 2080 Ti took only 3 minutes.

Justin Luitjens

Jun 8, 2019, 2:03:22 AM
to kaldi...@googlegroups.com
We already added fp16 the easy way: there is a device option to enable tensor cores. However, in training your mileage may vary.
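For readers wondering what "the easy way" amounts to under the hood, here is a minimal standalone cuBLAS sketch (not Kaldi code; the matrix setup and dimensions are illustrative assumptions). The matrices stay in FP32, and opting in to CUBLAS_TENSOR_OP_MATH lets cuBLAS use tensor cores internally for the GEMM where the hardware supports them; the corresponding Kaldi option lives in cu-device.cc (see the snippet Dan posts later in this thread).

// Minimal standalone sketch (not Kaldi code): an FP32 GEMM with
// tensor-core math enabled via cuBLAS. Error checking omitted for brevity.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

int main() {
  const int n = 1024;  // a multiple of 8, so tensor cores can engage
  std::vector<float> host(n * n, 1.0f);
  float *a, *b, *c;
  cudaMalloc(&a, sizeof(float) * n * n);
  cudaMalloc(&b, sizeof(float) * n * n);
  cudaMalloc(&c, sizeof(float) * n * n);
  cudaMemcpy(a, host.data(), sizeof(float) * n * n, cudaMemcpyHostToDevice);
  cudaMemcpy(b, host.data(), sizeof(float) * n * n, cudaMemcpyHostToDevice);

  cublasHandle_t handle;
  cublasCreate(&handle);
  // Opt in to tensor-core math; on hardware without tensor cores this
  // silently falls back to the normal FP32 path.
  cublasSetMathMode(handle, CUBLAS_TENSOR_OP_MATH);

  const float alpha = 1.0f, beta = 0.0f;
  cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
              &alpha, a, n, b, n, &beta, c, n);
  cudaDeviceSynchronize();
  std::printf("GEMM done\n");

  cublasDestroy(handle);
  cudaFree(a);
  cudaFree(b);
  cudaFree(c);
  return 0;
}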

Charl van Heerden

Sep 12, 2019, 4:05:11 AM
to kaldi-help
Thanks for all the assistance and suggestions. We bought two 2080 Tis, and as promised, here is the rough comparison for Zulu chain model training (180 iterations; same training set, but I haven't accounted for different NVIDIA drivers, CPUs, HDDs, etc.):
K80 (essentially 2 x K40): 10h23m21s
2 x 2080 Ti: 2h59m05s
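For reference, a quick back-of-the-envelope check of the speedup implied by those two timings (illustrative only):

// Speedup implied by the wall-clock times reported above.
#include <cstdio>

int main() {
  const double k80_seconds = 10 * 3600 + 23 * 60 + 21;  // 10h23m21s
  const double rtx_seconds = 2 * 3600 + 59 * 60 + 5;    // 2h59m05s
  std::printf("speedup = %.2fx\n", k80_seconds / rtx_seconds);  // ~3.48x
  return 0;
}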

François Hernandez

Sep 13, 2019, 9:37:31 AM
to kaldi-help
@Justin, could you elaborate a bit on that? What do you mean by "device option to enable tensor cores"?
We have a bunch of 2080 Tis that we already use for mixed-precision training with apex on PyTorch (OpenNMT-py). Is this "device option" enabled by default, or do we need to do something to benefit from tensor cores in chain training?
Thanks!

Daniel Povey

Sep 14, 2019, 12:39:43 AM
to kaldi-help
See this code in cu-device.cc:


#if CUDA_VERSION >= 9000
  if (device_options_.use_tensor_cores) {
    // Enable tensor cores in CUBLAS
    // Note if the device does not support tensor cores this will
    // fall back to normal math mode
    CUBLAS_SAFE_CALL(cublasSetMathMode(cublas_handle_,
                                       CUBLAS_TENSOR_OP_MATH));
  }
#endif

I think the tensor cores may use lower precision. Also, the matrix args need to have dimensions that are multiples of 4, or maybe 8 or 16, I forget. That's why I have started making dimensions divisible by large powers of 2.
Dan
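To make that last point concrete, here is a hypothetical helper (not taken from the Kaldi source; the multiple of 8 is an assumption based on the message above) showing the kind of rounding involved when choosing layer dimensions so the GEMMs can go through the tensor-core path:

// Hypothetical illustration: round a requested layer dimension up to a
// multiple that the tensor-core GEMM path prefers (8 is assumed here).
#include <cstdio>

static int RoundUpToMultiple(int dim, int multiple) {
  return ((dim + multiple - 1) / multiple) * multiple;
}

int main() {
  const int requested_dim = 625;  // e.g. some bottleneck dimension
  const int padded_dim = RoundUpToMultiple(requested_dim, 8);
  std::printf("%d -> %d\n", requested_dim, padded_dim);  // prints 625 -> 632
  return 0;
}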

sxk

Sep 14, 2019, 1:57:53 AM
to kaldi-help
I think it would be best to create a reference benchmark post to help others. Google Cloud and AWS do not offer RTX 2080 Tis because of licensing issues. In my experience, training the multi_en chain example on 8 x NVIDIA K80s completed in three days, while the same training on 8 x NVIDIA V100s completed in one day. In terms of pricing, since these GPUs are billed hourly, the complete training on eight V100s for one day came out about $100 higher than eight K80s running for three days. If you factor in the time saved, there's value at that price point as well.
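A rough sketch of that cost comparison follows; the instance types and hourly on-demand rates below are assumptions (they vary by region and over time), so treat the numbers as illustrative and plug in your own.

// Illustrative cost comparison; the hourly rates are assumed values for
// 8-GPU instances (e.g. p2.8xlarge vs p3.16xlarge) and will differ by
// region and over time.
#include <cstdio>

int main() {
  const double k80_rate_per_hour = 7.20;    // assumed 8 x K80 instance rate
  const double v100_rate_per_hour = 24.48;  // assumed 8 x V100 instance rate
  const double k80_hours = 3 * 24;          // training took ~3 days
  const double v100_hours = 1 * 24;         // training took ~1 day

  const double k80_cost = k80_rate_per_hour * k80_hours;
  const double v100_cost = v100_rate_per_hour * v100_hours;
  std::printf("K80:  $%.2f\nV100: $%.2f\ndiff: $%.2f\n",
              k80_cost, v100_cost, v100_cost - k80_cost);
  return 0;
}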

Daniel Povey

Sep 14, 2019, 5:21:00 AM
to kaldi-help
Good idea, but I personally don't have time.
Bear in mind that some people might be limited by file system access.

Kirill Katsnelson

Sep 14, 2019, 6:57:05 PM
to kaldi-help
Just from experience.

GPU-wise, the P100 is the same as the GTX 1080 Ti, and the V100 is the same as the RTX 2080 Ti (except I *think* the V100 has double the RAM? I do not remember). On the consumer cards, two features are disabled: (1) ECC GDDR support and (2) double-precision floating point. Regarding (1), just do not overclock it :), and as for (2), for Kaldi you do not care.

1. P100 vs 1080 Ti: training on a server with a Skylake-X Xeon and a P100 *seemed* faster to me than a Skylake-X i9 with a 1080 Ti, and GPU utilization from nvidia-smi stayed higher. The first thing that comes to mind is the bus, but it's PCIe Gen3 in both cases. I did not run the same model, however, so I cannot compare apples to apples. So if you ask me, going up from a 1080 Ti to a P100 will probably give you between a 0% and 20% performance boost. Count the Bens in your pocket and decide. :)

2. V100 vs P100, same NNET model size (large TDNN, 29M parameters): steadily 30–35% faster. I am not getting the single-precision ratio of 14/9.3 TFLOPS from the two cards' specs with Kaldi (that would be 50% faster), but it's still noticeable (see the quick arithmetic check after this message).

2a. So I'd expect the RTX 2080 Ti to be about 30–35% faster than my 1080 Ti. If you are going to train on 1 to 3 GPUs in your own machine, that's the best bet.

3. I do not know much about the K80, but I wouldn't be surprised if it's 2x slower than a 2080, or even more.

TensorFlow uses the hardware differently than Kaldi does, so general comparisons would not be very applicable.

 -kkm
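Following up on item 2 above, a quick arithmetic check of theoretical versus observed speedup, using the FP32 TFLOPS figures quoted in that item:

// Theoretical FP32 ratio (from the spec numbers quoted above) vs the
// speedup actually observed with Kaldi.
#include <cstdio>

int main() {
  const double v100_fp32_tflops = 14.0;  // figure quoted above
  const double p100_fp32_tflops = 9.3;   // figure quoted above
  const double theoretical = v100_fp32_tflops / p100_fp32_tflops;  // ~1.51
  const double observed_low = 1.30, observed_high = 1.35;          // measured

  std::printf("theoretical: +%.0f%%  observed: +%.0f%% to +%.0f%%\n",
              (theoretical - 1.0) * 100.0,
              (observed_low - 1.0) * 100.0,
              (observed_high - 1.0) * 100.0);
  return 0;
}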

Kirill Katsnelson

Sep 14, 2019, 7:01:27 PM
to kaldi-help
Oh. Nearly 3.5x faster. Nice. I responded too fast :)