GTX 980 beats K40?

Charles Shang

Nov 18, 2014, 4:54:23 AM

to caffe...@googlegroups.com

We measured the performance of a GTX 980 following the settings of performance_hardware (i.e. 20 iterations of 256 images).

As a result,

 Training is 9.1 secs/20 iterations (5,120 images)
 Testing is 41.9 secs/validation set (50,000 images)

which beats the K40 (training on 5,120 images takes 19.2 secs and testing on 50k images takes 60.7 secs).
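
As a rough check on the comparison above, the speed-up can be worked out from the four numbers quoted in this post. This is only a sketch: the dictionary and function names are made up for illustration, and the K40 figures are simply the ones cited here, not re-measured.

```python
# Timings quoted in the post, in seconds.
gtx980 = {"train_5120": 9.1, "test_50k": 41.9}
k40 = {"train_5120": 19.2, "test_50k": 60.7}


def speedup(baseline_s, candidate_s):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


for key in gtx980:
    factor = speedup(k40[key], gtx980[key])
    print(f"{key}: GTX 980 is {factor:.2f}x faster than the K40")
```

On these numbers the GTX 980 comes out roughly 2.1x faster for the 5,120-image run and roughly 1.4x faster for the 50k-image run.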

For more details,

We took the measurement by setting

input_dim: 256

and then calling

./caffe time --model=../../models/bvlc_reference_caffenet/deploy.prototxt --iterations=20 --gpu 0

For the experiment on 50k images we changed the number of iterations to 196. We got the following results.
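
The figure of 196 iterations follows from the batch size. A minimal sketch of that arithmetic, assuming a validation set of exactly 50,000 images and a batch of 256 (the constant names are illustrative):

```python
import math

BATCH_SIZE = 256      # input_dim set in the prototxt
VAL_IMAGES = 50_000   # size of the validation set quoted in the post

# Round up so the last, partially filled batch is still counted.
iterations = math.ceil(VAL_IMAGES / BATCH_SIZE)
images_processed = iterations * BATCH_SIZE

print(iterations, images_processed)
```

Note that 196 full batches cover slightly more than 50,000 images, so the timing is for 50,176 forward passes.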

################### 5,120 images

I1118 17:34:10.008749 4848 caffe.cpp:246] Forward pass: 4269.52 milliseconds.

I1118 17:34:14.825181 4848 caffe.cpp:260] Backward pass: 4816.36 milliseconds.

I1118 17:34:14.825232 4848 caffe.cpp:262] Total Time: 9085.97 milliseconds.

################### 50k images

I1118 17:27:04.613462 4801 caffe.cpp:246] Forward pass: 41851.1 milliseconds.

I1118 17:27:52.440292 4801 caffe.cpp:260] Backward pass: 47826.4 milliseconds.

I1118 17:27:52.440318 4801 caffe.cpp:262] Total Time: 89677.6 milliseconds.
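
For anyone who wants to compare such runs without copying numbers by hand, the three summary lines can be pulled out with a regular expression. This is a hedged sketch, not part of Caffe: the helper name and pattern are assumptions, and the pattern only matches the "milliseconds." wording shown in the logs above.

```python
import re

# The 5,120-image log lines from above, copied verbatim.
LOG = """\
I1118 17:34:10.008749 4848 caffe.cpp:246] Forward pass: 4269.52 milliseconds.
I1118 17:34:14.825181 4848 caffe.cpp:260] Backward pass: 4816.36 milliseconds.
I1118 17:34:14.825232 4848 caffe.cpp:262] Total Time: 9085.97 milliseconds.
"""

PATTERN = re.compile(
    r"\] (Forward pass|Backward pass|Total Time): ([\d.]+) milliseconds"
)


def parse_time_log(text):
    """Map each summary label to its time in milliseconds."""
    return {name: float(ms) for name, ms in PATTERN.findall(text)}


timings = parse_time_log(LOG)
ratio = timings["Backward pass"] / timings["Forward pass"]
print(timings)
print(f"backward/forward ratio: {ratio:.2f}")
```

The backward/forward ratio is a quick sanity check on whether the run looks like a realistic training workload.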

Thanks to Bartosz Ludwiczuk.

Jason Yosinski

Nov 18, 2014, 4:02:02 PM

to Charles Shang, caffe...@googlegroups.com

Hi Charles,

Thanks for posting!

Are these results with cuDNN or without? And with or without ECC?

jason

---------------------------
Jason Yosinski, Cornell Computer Science Ph.D. student
http://yosinski.com/ +1.719.440.1357


Charles Shang

Nov 18, 2014, 8:25:21 PM

to caffe...@googlegroups.com, shangc...@gmail.com

It's with cuDNN, and the GTX 980 has no ECC feature.

On Wednesday, November 19, 2014 at 5:02:02 AM UTC+8, Jason Yosinski wrote:

Bartosz Ludwiczuk

Nov 19, 2014, 3:01:27 AM

to caffe...@googlegroups.com, shangc...@gmail.com

I have also run the test without cuDNN:

Total Time: 15715.2 milliseconds (without cuDNN).

So cuDNN saves about 6.6 seconds (15.7 s vs. 9.1 s).

Even without cuDNN, the GTX 980 still beats the other GPUs.

Can anybody confirm this result on their own GTX 980, or confirm that this "time" procedure is sound?

evancompu...@gmail.com

Nov 20, 2014, 12:51:19 AM

to caffe...@googlegroups.com

I found many kinds of GTX 980. Which of these is the one you have?

http://www.newegg.com/Product/ProductList.aspx?N=100007709%20600536050&IsNodeId=1&Submit=ENE

Charles Shang

Nov 20, 2014, 3:29:42 AM

to caffe...@googlegroups.com

No idea which brand.

Since all of these have the same cores, I don't think the brand makes any real difference.

For your reference, we have 4 GB of memory on our GTX 980.

On Thursday, November 20, 2014 at 1:51:19 PM UTC+8, evancompu...@gmail.com wrote:

fengyanchao

Dec 1, 2014, 11:46:51 PM

to caffe...@googlegroups.com

Hi, have you trained a net at this speed, and does it behave normally? If the performance is stable, I am eager to try it!

On Thursday, November 20, 2014 at 4:29:42 PM UTC+8, Charles Shang wrote:

Sergio Guadarrama

Dec 3, 2014, 9:50:48 AM

to caffe...@googlegroups.com

Typically the backward pass takes much more time than the forward pass, since it involves a lot more computation. In your timings the two look pretty similar, so double-check your timings and prototxt.

Bartosz Ludwiczuk

Dec 16, 2014, 5:48:26 AM

to caffe...@googlegroups.com

I got the ImageNet database, and I must say that running

./caffe time --model=../../models/bvlc_reference_caffenet/deploy.prototxt --iterations=20 --gpu 0

does not give the right timing result.

I got this log while training the "Caffe_Reference" net on a GTX 980 with cuDNN:

I1216 11:41:06.724963 22263 solver.cpp:403] Iteration 13280, lr = 0.01
I1216 11:41:21.619262 22263 solver.cpp:191] Iteration 13300, loss = 3.75351
I1216 11:41:21.619304 22263 solver.cpp:206]     Train net output #0: loss = 3.75351 (* 1 = 3.75351 loss)
I1216 11:41:21.619314 22263 solver.cpp:403] Iteration 13300, lr = 0.01
I1216 11:41:36.899164 22263 solver.cpp:191] Iteration 13320, loss = 3.69631
I1216 11:41:36.899279 22263 solver.cpp:206]     Train net output #0: loss = 3.69631 (* 1 = 3.69631 loss)
I1216 11:41:36.899289 22263 solver.cpp:403] Iteration 13320, lr = 0.01
I1216 11:41:52.055275 22263 solver.cpp:191] Iteration 13340, loss = 3.61981
I1216 11:41:52.055341 22263 solver.cpp:206]     Train net output #0: loss = 3.61981 (* 1 = 3.61981 loss)
I1216 11:41:52.055351 22263 solver.cpp:403] Iteration 13340, lr = 0.01
I1216 11:42:07.049682 22263 solver.cpp:191] Iteration 13360, loss = 3.63568
I1216 11:42:07.049794 22263 solver.cpp:206]     Train net output #0: loss = 3.63568 (* 1 = 3.63568 loss)
I1216 11:42:07.049805 22263 solver.cpp:403] Iteration 13360, lr = 0.01
I1216 11:42:22.101342 22263 solver.cpp:191] Iteration 13380, loss = 3.81524
I1216 11:42:22.101380 22263 solver.cpp:206]     Train net output #0: loss = 3.81524 (* 1 = 3.81524 loss)
I1216 11:42:22.101389 22263 solver.cpp:403] Iteration 13380, lr = 0.01
I1216 11:42:37.453555 22263 solver.cpp:191] Iteration 13400, loss = 3.65223
I1216 11:42:37.453733 22263 solver.cpp:206]     Train net output #0: loss = 3.65223 (* 1 = 3.65223 loss)
I1216 11:42:37.453757 22263 solver.cpp:403] Iteration 13400, lr = 0.01
I1216 11:42:53.139688 22263 solver.cpp:191] Iteration 13420, loss = 3.78719
I1216 11:42:53.139725 22263 solver.cpp:206]     Train net output #0: loss = 3.78719 (* 1 = 3.78719 loss)
I1216 11:42:53.139734 22263 solver.cpp:403] Iteration 13420, lr = 0.01
I1216 11:43:08.337144 22263 solver.cpp:191] Iteration 13440, loss = 3.58952

As we can see, 20 iterations (5,120 images) of the learning process take ~15.1 seconds.
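
The per-interval time can also be read off the glog timestamps programmatically. The following is a sketch under stated assumptions: the helper names and regular expression are illustrative, only the "Iteration N, loss" lines from the log above are used, and the run is assumed not to cross midnight (the log prefix carries no year and the code ignores the date).

```python
import re
from datetime import datetime
from statistics import mean

# The "Iteration N, loss" lines from the training log above.
LOG = """\
I1216 11:41:21.619262 22263 solver.cpp:191] Iteration 13300, loss = 3.75351
I1216 11:41:36.899164 22263 solver.cpp:191] Iteration 13320, loss = 3.69631
I1216 11:41:52.055275 22263 solver.cpp:191] Iteration 13340, loss = 3.61981
I1216 11:42:07.049682 22263 solver.cpp:191] Iteration 13360, loss = 3.63568
I1216 11:42:22.101342 22263 solver.cpp:191] Iteration 13380, loss = 3.81524
I1216 11:42:37.453555 22263 solver.cpp:191] Iteration 13400, loss = 3.65223
I1216 11:42:53.139688 22263 solver.cpp:191] Iteration 13420, loss = 3.78719
I1216 11:43:08.337144 22263 solver.cpp:191] Iteration 13440, loss = 3.58952
"""

LINE = re.compile(
    r"^I\d{4} (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ \S+\] Iteration (\d+), loss = "
)


def loss_timestamps(text):
    """Return (iteration, time-of-day) pairs for every loss line."""
    stamps = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            when = datetime.strptime(match.group(1), "%H:%M:%S.%f")
            stamps.append((int(match.group(2)), when))
    return stamps


def seconds_between(stamps):
    """Seconds elapsed between consecutive loss lines."""
    return [
        (later - earlier).total_seconds()
        for (_, earlier), (_, later) in zip(stamps, stamps[1:])
    ]


intervals = seconds_between(loss_timestamps(LOG))
avg_s = mean(intervals)
print(f"{len(intervals)} intervals, mean {avg_s:.2f} s per 20 iterations")
print(f"about {20 * 256 / avg_s:.0f} images/s")
```

Over the seven intervals shown, the mean comes out at roughly 15.2 s, in line with the estimate of about 15 seconds per 20 iterations.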

The GTX 980 is still the fastest GPU on Caffe!!! And this is confirmed by an actual training run on ImageNet!

fengyanchao

Dec 19, 2014, 1:39:08 AM

to caffe...@googlegroups.com

@Bartosz Ludwiczuk This is amazing. The log file should give the right timing. If you get the bvlc_reference_caffenet training result after 350,000 (35w) iterations, let me know here. Thank you!

On Tuesday, December 16, 2014 at 6:48:26 PM UTC+8, Bartosz Ludwiczuk wrote:

Sergio Guadarrama

Dec 22, 2014, 8:17:22 PM

to caffe...@googlegroups.com

To get a better estimate, run ./caffe time with train_val.prototxt instead of deploy.prototxt.

Bartosz Ludwiczuk

Mar 26, 2015, 6:40:42 AM

to caffe...@googlegroups.com

Since cuDNN v2 has been released, I have tested the GTX 980 with the new version of Caffe. Here are the results:

Setup for training:
batchsize: 256
iterations: 20
model: bvlc_reference_train_val

I0326 11:24:54.453117 22194 caffe.cpp:271] Average Forward pass: 212.236 ms.
I0326 11:24:54.453129 22194 caffe.cpp:273] Average Backward pass: 395.987 ms.
I0326 11:24:54.453137 22194 caffe.cpp:275] Average Forward-Backward: 608.327 ms.
I0326 11:24:54.453146 22194 caffe.cpp:277] Total Time: 12166.5 ms.

So training takes 12.1 s per 5,120 images. That is ~3 seconds faster than with cuDNN v1 (a relative speed-up of about 20%).

Setup for testing:
batchsize: 256
iterations: 196
model: bvlc_reference_train_val

I0326 11:30:08.478739 22701 caffe.cpp:271] Average Forward pass: 211.325 ms.
I0326 11:30:08.478744 22701 caffe.cpp:273] Average Backward pass: 396.768 ms.
I0326 11:30:08.478750 22701 caffe.cpp:275] Average Forward-Backward: 608.169 ms.
I0326 11:30:08.478755 22701 caffe.cpp:277] Total Time: 119201 ms

To test 50k images, we need 211.325 ms (average forward pass) * 196 = 41419 ms = 41.4 s, which works out to roughly 100M images per day.
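
The "100M images per day" estimate can be reproduced from the average forward-pass time. A minimal sketch, assuming forward passes only and a GPU that runs continuously at the measured rate (the constant names are illustrative):

```python
AVG_FORWARD_MS = 211.325   # average forward pass from the log above
BATCH_SIZE = 256
ITERATIONS = 196
SECONDS_PER_DAY = 24 * 60 * 60

total_s = AVG_FORWARD_MS * ITERATIONS / 1000.0
images = BATCH_SIZE * ITERATIONS
images_per_s = images / total_s
images_per_day = images_per_s * SECONDS_PER_DAY

print(f"{total_s:.1f} s for {images} images")
print(f"{images_per_s:.0f} images/s, {images_per_day / 1e6:.0f}M images/day")
```

This ignores data loading and any host-side overhead, so it is best read as an upper bound on sustained throughput.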

It is pretty incredible how much cuDNN can speed up the learning process. Thanks for it, guys!! I cannot imagine how fast the TITAN X is. That GPU should need only about 66% of the time measured on the GTX 980.

Leslie N. Smith

Apr 2, 2015, 9:57:17 AM

to caffe...@googlegroups.com

Does anyone know how Nvidia's K80 compares to the GTX 980 or new Titan X?

Leslie
