Is cuDNN 5105 supported?


Ragav Venkatesan

Oct 22, 2016, 5:54:00 PM
to theano-users
I updated and I'm getting some weird errors. With the CUDA backend, convolutions only run on the CPU, and with the libgpuarray backend the GPU only runs at about 35% utilization.


Michael Klachko

Oct 24, 2016, 12:33:33 PM
to theano-users
Yes, it's supported, I'm using it right now (CUDA 8.0 on Ubuntu 14.04):

>>> import theano
Using gpu device 0: TITAN X (Pascal) (CNMeM is enabled with initial size: 30.0% of memory, cuDNN 5105)
>>> print theano.__version__
0.9.0dev3.dev-20fd30a38d34687e9d944140042762ca9fca6276

Frédéric Bastien

Oct 24, 2016, 12:38:17 PM
to theano-users
What errors do you have? Delete your Theano cache just in case, and be sure to use the Theano dev version. The last release doesn't support it, I think.
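For reference, a minimal sketch of how to check both things (theano-cache purge is the usual command to clear the cache; config.compiledir is where the cache lives):

import theano

# A dev build reports a version string like '0.9.0dev...'.
print(theano.__version__)

# The compilation cache lives under config.compiledir; clearing it
# (e.g. with the `theano-cache purge` command) forces a fresh rebuild.
print(theano.config.compiledir)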

Fred


Ragav Venkatesan

Oct 31, 2016, 11:06:33 PM
to theano-users
Fred, 

Thanks, the bleeding edge worked. I guess I was on a slightly older version.

Thanks.

Ragav Venkatesan

Nov 8, 2016, 1:56:42 PM
to theano-users
OK, here is a problem I'm running into, and I'm not sure how to solve it. If I use the libgpuarray backend on the cnn_tutorial, I get 98% GPU utilization with cuDNN 5105. If I use the CUDA backend, I only get about 35% utilization.
Any idea why this might be?


Frédéric Bastien

Nov 9, 2016, 11:48:40 AM
to theano-users
It could be that the new back-end (libgpuarray) is faster and more efficient in such cases. So just use that back-end :)

The speed difference between the two back-ends isn't constant, but on average the new back-end should be a little faster.

We have found a few speed regressions in the new back-end, but they were fixed. If you find one, just tell us and we'll fix it. But the probability of hitting a slowdown in the new back-end is still low.

We just merged one such fix related to indexing. Make sure to update and recompile libgpuarray if you want to be sure you have the fastest version.

Fred


Michael Klachko

Nov 9, 2016, 2:52:21 PM
to theano-users
I'm a little confused about the two backends. Is there any recommendation as to when we should use one or the other? I installed both on my Ubuntu machine, and I don't even know which one I'm using for my convnets. How can I tell? And how can I configure it to use one or the other?


Frédéric Bastien

Nov 9, 2016, 3:44:59 PM
to theano-users
If you use the flag:

device=gpu*

you use the old back-end.

If you use the flag:

device=cuda*

you use the new back-end (libgpuarray).

If the new back-end works for you, use it. If not, tell us! But we are pretty confident that it should work. We have ported practically all operations to it now; the ones still missing are used very infrequently, so you probably won't miss them.

Here is a link to help you convert to the new back-end. It has some information that should help you:

https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29
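To make the choice concrete, here is a minimal sketch; the flag values are the ones described above, and THEANO_FLAGS must be set before Theano is imported:

import os

# 'device=gpu0' selects the old CUDA back-end;
# 'device=cuda0' selects the new libgpuarray back-end.
os.environ['THEANO_FLAGS'] = 'device=cuda0,floatX=float32'

import theano  # the start-up banner names the device it mapped to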

Ragav Venkatesan

Nov 9, 2016, 7:09:54 PM
to theano-users
After investigating further, I don't think this is a speed issue. I think the newer CUDA/cuDNN versions are not using the GPU fully under the cuda backend. The older CUDA/cuDNN versions (7.5/5103) produce 98% GPU utilization, but the same code on the latest versions (8.0/5105) doesn't. The code, by the way, is the LeNet tutorial from Theano, so it's not some weird coding error either. Using the libgpuarray backend, I am able to get 98% utilization even with CUDA/cuDNN 8.0/5105.

Michael Klachko

Nov 9, 2016, 7:36:14 PM
to theano-users
Ragav, so when the GPU is 98% utilized, is the training faster than when it's 35% utilized? Have you timed it?


Ragav Venkatesan

Nov 9, 2016, 7:37:22 PM
to theano...@googlegroups.com
Good question. No, I haven't. I will do this by tonight if I can.
--
Ragav

Ragav Venkatesan

Nov 10, 2016, 6:26:24 PM
to theano-users
I'm writing code to test this, but why do you ask? Is there a case where nvidia-smi might show 35% utilization while the GPU is actually running the code as fast as it can?

Michael Klachko

Nov 10, 2016, 6:47:38 PM
to theano-users
Yes. It depends on the size of your network/input: the smaller it is, the harder it is to keep 3k cores busy all the time.
Regarding timing, you don't need to write much code:

import time

start_time = time.time()
# ... your code here ...
print "Code ran for {:.1f} minutes".format((time.time() - start_time) / 60)

Ragav Venkatesan

Nov 11, 2016, 5:20:03 PM
to theano-users
Running on a GTX 1080, device=cuda0 runs for 1.69 minutes at 98% utilization; device=gpu0 runs for 5.12 minutes at 34%. Both run the same code, the cnn_tutorial from the Theano tutorials, completely unmodified. Flags: floatX=float32, mode=FAST_RUN, nvcc.fastmath=True, and nvcc.allowgc=True.
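For reproducibility, a sketch of that flag set (an assumption here: 'nvcc.allowgc' above is taken to mean Theano's global allow_gc flag, which is the standard spelling):

import os

# Same flags for both runs; only the device changes (cuda0 vs. gpu0).
os.environ['THEANO_FLAGS'] = ('device=cuda0,floatX=float32,mode=FAST_RUN,'
                              'nvcc.fastmath=True,allow_gc=True')

import theano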

Michael Klachko

Nov 12, 2016, 12:00:39 AM
to theano-users
Do both versions use cuDNN? If the gpu0 version didn't use it, that would explain the difference. Also, look at the CPU usage for the gpu0 version; it could be that some ops are running on the CPU instead of the GPU.


Ragav Venkatesan

Nov 12, 2016, 9:19:17 PM
to theano-users
Both are using cuDNN. If some ops are running on the CPU, how do I find that out?

Michael Klachko

Nov 12, 2016, 9:43:16 PM
to theano-users
I'm not sure, but just by looking at the CPU usage (the top command on Linux) you should be able to see the difference.
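A more direct check than top is Theano's own profiler; a minimal sketch (the profile flag and debugprint are standard Theano tools; the toy graph below is just an illustration):

import os
os.environ['THEANO_FLAGS'] = 'device=cuda0,profile=True'

import numpy as np
import theano
import theano.tensor as T

x = T.matrix('x')
f = theano.function([x], T.nnet.sigmoid(x).sum())
f(np.random.rand(64, 64).astype('float32'))

# The profiler prints a per-op timing table at interpreter exit.
# debugprint lists the compiled ops; ops that run on the GPU show
# up with a 'Gpu' prefix in their class names.
theano.printing.debugprint(f)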


Ragav Venkatesan

Nov 12, 2016, 9:45:22 PM
to theano-users
In htop I usually have one CPU core running at 100% in both cases.

Ragav Venkatesan

Nov 13, 2016, 12:36:58 AM
to theano-users
While debugging, I also discovered that if I use ignore_border=False for pooling, it doesn't run on the GPU with the libgpuarray backend; ignore_border=True does. Is there anything to this?
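For anyone hitting the same thing, a minimal sketch of the two variants (pool_2d from theano.tensor.signal.pool):

import theano.tensor as T
from theano.tensor.signal.pool import pool_2d

x = T.tensor4('x')

# Reported above to stay on the GPU under the libgpuarray back-end:
pooled_gpu = pool_2d(x, (2, 2), ignore_border=True)

# Reported above to fall back to a CPU implementation:
pooled_cpu = pool_2d(x, (2, 2), ignore_border=False)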

Ragav Venkatesan

Nov 13, 2016, 12:38:41 AM
to theano-users
I'll create a new thread for this.

Michael Klachko

Nov 14, 2016, 12:15:26 PM
to theano-users
I will try testing it on a Pascal TITAN X card when I have time tomorrow, and will report back.


Frédéric Bastien

Nov 15, 2016, 9:09:52 AM
to theano-users

With the device=gpu flag, also add the flag lib.cnmem=1.

This will speed up that code. The new backend does something similar by default.
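Concretely, a minimal sketch (lib.cnmem takes the fraction of GPU memory to reserve up front; 1 means essentially all of it):

import os

# CNMeM pre-allocates GPU memory, avoiding per-allocation overhead.
# This flag only applies to the old (device=gpu*) back-end.
os.environ['THEANO_FLAGS'] = 'device=gpu0,lib.cnmem=1,floatX=float32'

import theano  # banner reports "CNMeM is enabled with initial size: ..."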

Fred



Ragav Venkatesan

Nov 20, 2016, 5:51:40 PM
to theano-users
It speeds things up a little, but still not 100% utilization.
