GTX 980 beats K40?


Charles Shang

Nov 18, 2014, 4:54:23 AM
to caffe...@googlegroups.com
We measured the performance of a GTX 980 following the settings of performance_hardware (i.e. 20 iterations of 256 images each).
The results:
 Training: 9.1 secs / 20 iterations (5,120 images)
 Testing: 41.9 secs / validation set (50,000 images)

This beats the K40 (19.2 secs for training on 5,120 images and 60.7 secs for testing on 50k images).

In more detail, we did the measurement by setting
 input_dim: 256
in deploy.prototxt, then calling
 ./caffe time --model=../../models/bvlc_reference_caffenet/deploy.prototxt --iterations=20 --gpu 0

For the experiment on 50k images, we changed the iterations to 196 (196 x 256 = 50,176 images). We got the following results.

################### 5,120 images
I1118 17:34:10.008749  4848 caffe.cpp:246] Forward pass: 4269.52 milliseconds.
I1118 17:34:14.825181  4848 caffe.cpp:260] Backward pass: 4816.36 milliseconds.
I1118 17:34:14.825232  4848 caffe.cpp:262] Total Time: 9085.97 milliseconds.

################### 50k images
I1118 17:27:04.613462  4801 caffe.cpp:246] Forward pass: 41851.1 milliseconds.
I1118 17:27:52.440292  4801 caffe.cpp:260] Backward pass: 47826.4 milliseconds.
I1118 17:27:52.440318  4801 caffe.cpp:262] Total Time: 89677.6 milliseconds.
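The iteration count and the throughput implied by these logs can be checked with a short Python sketch. The helper name `parse_caffe_time` and its regular expression are illustrative assumptions, not part of Caffe; the log lines are copied from the 5,120-image run above.

```python
import math
import re

BATCH_SIZE = 256  # input_dim set in deploy.prototxt

# Log lines copied from the 20-iteration (5,120 image) run above.
LOG_5120 = """\
I1118 17:34:10.008749  4848 caffe.cpp:246] Forward pass: 4269.52 milliseconds.
I1118 17:34:14.825181  4848 caffe.cpp:260] Backward pass: 4816.36 milliseconds.
I1118 17:34:14.825232  4848 caffe.cpp:262] Total Time: 9085.97 milliseconds.
"""


def parse_caffe_time(log_text):
    """Hypothetical helper: map each reported pass name to its time in ms."""
    pattern = re.compile(
        r"\] (Forward pass|Backward pass|Total Time): ([\d.]+) milliseconds"
    )
    return {name: float(ms) for name, ms in pattern.findall(log_text)}


timings = parse_caffe_time(LOG_5120)

# Iterations needed to cover the 50,000-image validation set at batch size 256.
iterations_50k = math.ceil(50_000 / BATCH_SIZE)

# Forward + backward throughput over the 20 timed iterations.
images = 20 * BATCH_SIZE
images_per_sec = images / (timings["Total Time"] / 1000.0)

print("iterations for 50k images:", iterations_50k)
print("images per second (forward + backward):", round(images_per_sec, 1))
```

Note that the 9.1 secs training figure is the forward + backward total, while the 41.9 secs testing figure is the forward pass alone over 196 iterations.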

Thanks to Bartosz Ludwiczuk.

Jason Yosinski

Nov 18, 2014, 4:02:02 PM
to Charles Shang, caffe...@googlegroups.com
Hi Charles,

Thanks for posting!

Are these results with cuDNN or without? And with or without ECC?

jason


---------------------------
Jason Yosinski, Cornell Computer Science Ph.D. student
http://yosinski.com/ +1.719.440.1357

Charles Shang

Nov 18, 2014, 8:25:21 PM
to caffe...@googlegroups.com, shangc...@gmail.com
It's with cuDNN. And the GTX 980 has no ECC feature.

On Wednesday, November 19, 2014 at 5:02:02 AM UTC+8, Jason Yosinski wrote:

Bartosz Ludwiczuk

Nov 19, 2014, 3:01:27 AM
to caffe...@googlegroups.com, shangc...@gmail.com
I have run the test without cuDNN too:
Total Time: 15715.2 milliseconds. (without cuDNN)
So cuDNN saves about 6.6 seconds (15.7 s vs. 9.1 s).
And even without cuDNN, the GTX 980 beats the other GPUs.

Can anybody confirm this result on their own GTX 980, or confirm that "caffe time" is a valid timing procedure?
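As a rough check of the cuDNN gain, here is a minimal Python sketch that uses only the two Total Time values posted in this thread. The variable names are made up for illustration.

```python
# Total times for 20 iterations of 256 images, copied from the posts above.
total_without_cudnn_ms = 15715.2
total_with_cudnn_ms = 9085.97

# Absolute saving in seconds and the relative speedup factor.
saved_s = (total_without_cudnn_ms - total_with_cudnn_ms) / 1000.0
speedup = total_without_cudnn_ms / total_with_cudnn_ms

print(f"saved {saved_s:.2f} s, speedup {speedup:.2f}x")
```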

evancompu...@gmail.com

Nov 20, 2014, 12:51:19 AM
to caffe...@googlegroups.com
I found many variants of the GTX 980. Which one do you have?

Charles Shang

Nov 20, 2014, 3:29:42 AM
to caffe...@googlegroups.com
No idea which brand.
Since all of them have the same cores, I don't think the brand makes any real difference.
For your reference, our GTX 980 has 4 GB of memory.

On Thursday, November 20, 2014 at 1:51:19 PM UTC+8, evancompu...@gmail.com wrote:

fengyanchao

Dec 1, 2014, 11:46:51 PM
to caffe...@googlegroups.com
Hi, have you trained a net at this speed with normal behavior? If the performance is stable, I am eager to try it!

On Thursday, November 20, 2014 at 4:29:42 PM UTC+8, Charles Shang wrote:

Sergio Guadarrama

Dec 3, 2014, 9:50:48 AM
to caffe...@googlegroups.com
Typically the backward pass takes much more time than the forward pass, since it involves a lot more computation. In your timings they are pretty similar, so double-check your timings and prototxt.
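To put a number on "pretty similar", the backward/forward ratio can be computed from the timings in the first post. This is only a sketch; the dictionary layout is an arbitrary choice for illustration.

```python
# Forward and backward times in milliseconds, copied from the first post's logs.
runs = {
    "5,120 images": {"forward": 4269.52, "backward": 4816.36},
    "50k images": {"forward": 41851.1, "backward": 47826.4},
}

# A ratio close to 1 means the backward pass is barely slower than the forward pass.
ratios = {name: t["backward"] / t["forward"] for name, t in runs.items()}

for name, ratio in ratios.items():
    print(f"{name}: backward/forward = {ratio:.2f}")
```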

Bartosz Ludwiczuk

Dec 16, 2014, 5:48:26 AM
to caffe...@googlegroups.com
I got the ImageNet database, and I must say that running
 ./caffe time --model=../../models/bvlc_reference_caffenet/deploy.prototxt --iterations=20 --gpu 0
does not give the right timing result.
I got this log while training the "Caffe_Reference" net on a GTX 980 with cuDNN:
I1216 11:41:06.724963 22263 solver.cpp:403] Iteration 13280, lr = 0.01
I1216 11:41:21.619262 22263 solver.cpp:191] Iteration 13300, loss = 3.75351
I1216 11:41:21.619304 22263 solver.cpp:206]     Train net output #0: loss = 3.75351 (* 1 = 3.75351 loss)
I1216 11:41:21.619314 22263 solver.cpp:403] Iteration 13300, lr = 0.01
I1216 11:41:36.899164 22263 solver.cpp:191] Iteration 13320, loss = 3.69631
I1216 11:41:36.899279 22263 solver.cpp:206]     Train net output #0: loss = 3.69631 (* 1 = 3.69631 loss)
I1216 11:41:36.899289 22263 solver.cpp:403] Iteration 13320, lr = 0.01
I1216 11:41:52.055275 22263 solver.cpp:191] Iteration 13340, loss = 3.61981
I1216 11:41:52.055341 22263 solver.cpp:206]     Train net output #0: loss = 3.61981 (* 1 = 3.61981 loss)
I1216 11:41:52.055351 22263 solver.cpp:403] Iteration 13340, lr = 0.01
I1216 11:42:07.049682 22263 solver.cpp:191] Iteration 13360, loss = 3.63568
I1216 11:42:07.049794 22263 solver.cpp:206]     Train net output #0: loss = 3.63568 (* 1 = 3.63568 loss)
I1216 11:42:07.049805 22263 solver.cpp:403] Iteration 13360, lr = 0.01
I1216 11:42:22.101342 22263 solver.cpp:191] Iteration 13380, loss = 3.81524
I1216 11:42:22.101380 22263 solver.cpp:206]     Train net output #0: loss = 3.81524 (* 1 = 3.81524 loss)
I1216 11:42:22.101389 22263 solver.cpp:403] Iteration 13380, lr = 0.01
I1216 11:42:37.453555 22263 solver.cpp:191] Iteration 13400, loss = 3.65223
I1216 11:42:37.453733 22263 solver.cpp:206]     Train net output #0: loss = 3.65223 (* 1 = 3.65223 loss)
I1216 11:42:37.453757 22263 solver.cpp:403] Iteration 13400, lr = 0.01
I1216 11:42:53.139688 22263 solver.cpp:191] Iteration 13420, loss = 3.78719
I1216 11:42:53.139725 22263 solver.cpp:206]     Train net output #0: loss = 3.78719 (* 1 = 3.78719 loss)
I1216 11:42:53.139734 22263 solver.cpp:403] Iteration 13420, lr = 0.01
I1216 11:43:08.337144 22263 solver.cpp:191] Iteration 13440, loss = 3.58952


As we can see, 20 iterations (5,120 images) of training take ~15.1 seconds.
The GTX 980 is still the fastest GPU on Caffe! And this is confirmed by actual training on ImageNet!
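The time per 20 iterations can be read off the log timestamps mechanically. Below is a minimal Python sketch that parses the "Iteration N, loss" lines copied from the log above. The regular expression and the helper name `seconds_per_interval` are assumptions made for illustration, and the sketch assumes the log does not cross midnight.

```python
import re
from datetime import datetime

# "Iteration N, loss" lines copied from the training log above.
LOG = """\
I1216 11:41:21.619262 22263 solver.cpp:191] Iteration 13300, loss = 3.75351
I1216 11:41:36.899164 22263 solver.cpp:191] Iteration 13320, loss = 3.69631
I1216 11:41:52.055275 22263 solver.cpp:191] Iteration 13340, loss = 3.61981
I1216 11:42:07.049682 22263 solver.cpp:191] Iteration 13360, loss = 3.63568
I1216 11:42:22.101342 22263 solver.cpp:191] Iteration 13380, loss = 3.81524
I1216 11:42:37.453555 22263 solver.cpp:191] Iteration 13400, loss = 3.65223
I1216 11:42:53.139688 22263 solver.cpp:191] Iteration 13420, loss = 3.78719
I1216 11:43:08.337144 22263 solver.cpp:191] Iteration 13440, loss = 3.58952
"""

# Captures the time of day and the iteration number from each loss line.
LINE_RE = re.compile(
    r"^I\d{4} (\d{2}:\d{2}:\d{2}\.\d{6}) .*\] Iteration (\d+), loss", re.M
)


def seconds_per_interval(log_text):
    """Return (iteration, seconds since the previous logged iteration) pairs."""
    stamps = [
        (int(iteration), datetime.strptime(ts, "%H:%M:%S.%f"))
        for ts, iteration in LINE_RE.findall(log_text)
    ]
    return [
        (iteration, (t - prev_t).total_seconds())
        for (_, prev_t), (iteration, t) in zip(stamps, stamps[1:])
    ]


intervals = seconds_per_interval(LOG)
mean_s = sum(seconds for _, seconds in intervals) / len(intervals)

for iteration, seconds in intervals:
    print(f"up to iteration {iteration}: {seconds:.2f} s")
print(f"mean over {len(intervals)} intervals: {mean_s:.2f} s")
```

Each interval covers 20 iterations, i.e. 5,120 images at batch size 256, and includes data loading and solver overhead that `caffe time` does not measure.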

fengyanchao

Dec 19, 2014, 1:39:08 AM
to caffe...@googlegroups.com
@Bartosz Ludwiczuk This is amazing. The log file should give the correct timing. If you get the bvlc_reference_caffenet training result after 350,000 (35w) iterations, please let me know here. Thank you!

On Tuesday, December 16, 2014 at 6:48:26 PM UTC+8, Bartosz Ludwiczuk wrote:

Sergio Guadarrama

Dec 22, 2014, 8:17:22 PM
to caffe...@googlegroups.com
To get a better estimate, run ./caffe time with train_val.prototxt instead of deploy.prototxt.

Bartosz Ludwiczuk

Mar 26, 2015, 6:40:42 AM
to caffe...@googlegroups.com
Since cuDNN v2 has been released, I have tested the GTX 980 with the new version of Caffe. Here are the results:
Setup for training:
batchsize: 256
iterations: 20
model: bvlc_reference_train_val

I0326 11:24:54.453117 22194 caffe.cpp:271] Average Forward pass: 212.236 ms.
I0326 11:24:54.453129 22194 caffe.cpp:273] Average Backward pass: 395.987 ms.
I0326 11:24:54.453137 22194 caffe.cpp:275] Average Forward-Backward: 608.327 ms.
I0326 11:24:54.453146 22194 caffe.cpp:277] Total Time: 12166.5 ms.
So training takes 12.2 secs per 5,120 images. That is ~3 seconds faster than with cuDNN v1 (relative speedup: 20%).
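The relative speedup can be reproduced with a few lines of Python. This is a sketch only: the cuDNN v1 figure is the ~15.1 s per 20 iterations read from the earlier training log, while the cuDNN v2 figure comes from `caffe time`, so the comparison mixes two measurement methods.

```python
# Seconds per 20 iterations (5,120 images).
time_cudnn_v1_s = 15.1            # approximate, from the earlier training log
time_cudnn_v2_s = 12166.5 / 1000  # Total Time reported by `caffe time`

# Absolute saving and saving relative to the cuDNN v1 time.
saved_s = time_cudnn_v1_s - time_cudnn_v2_s
relative_speedup = saved_s / time_cudnn_v1_s

print(f"saved {saved_s:.1f} s ({relative_speedup:.0%} of the cuDNN v1 time)")
```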

Setup for testing:
batchsize: 256
iterations: 196
model: bvlc_reference_train_val

I0326 11:30:08.478739 22701 caffe.cpp:271] Average Forward pass: 211.325 ms.
I0326 11:30:08.478744 22701 caffe.cpp:273] Average Backward pass: 396.768 ms.
I0326 11:30:08.478750 22701 caffe.cpp:275] Average Forward-Backward: 608.169 ms.
I0326 11:30:08.478755 22701 caffe.cpp:277] Total Time: 119201 ms.
To test 50k images, we need 211.325 ms (average forward pass) * 196 = 41,419.7 ms, or about 41.4 s (roughly 100M images per day).
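The extrapolation from the average forward pass to a daily throughput can be sketched as follows. It assumes the GPU runs forward passes back to back for 24 hours and ignores data loading; the variable names are illustrative.

```python
avg_forward_ms = 211.325  # Average Forward pass from the log above
iterations = 196
batch_size = 256

# Forward-only time for the whole validation run, in seconds.
total_forward_s = avg_forward_ms * iterations / 1000.0

# 196 batches of 256 cover 50,176 images, slightly more than the 50,000-image set.
images = iterations * batch_size
images_per_day = images / total_forward_s * 86_400  # seconds per day

print(f"forward time: {total_forward_s:.1f} s for {images} images")
print(f"about {images_per_day / 1e6:.0f}M images per day")
```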

It is pretty incredible how much cuDNN can speed up the training process. Thanks for it, guys! I cannot imagine how fast the TITAN X is. That GPU should take about 66% of the time measured on the GTX 980.

Leslie N. Smith

Apr 2, 2015, 9:57:17 AM
to caffe...@googlegroups.com
Does anyone know how Nvidia's K80 compares to the GTX 980 or new Titan X?

Leslie