Caffe running very slowly on 1080 Ti


Sean Richardson

Dec 2, 2017, 12:18:11 AM
to Caffe Users
I'm mystified by some very slow Caffe performance... looking for any ideas.

I just custom-built a new machine learning computer that I expected to be ~20-100 times faster at Caffe training than the MacBook Pro I'd been experimenting on. 

Using a Python program I wrote with a simplified version of AlexNet's convolution/pool/fully-connected network to classify images, I was shocked to find that my new computer was 4 times slower per training iteration.
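For context, the net described above was along these lines — a hypothetical sketch only, with placeholder layer names and sizes, not the actual prototxt from the program:

```protobuf
# Hypothetical simplified AlexNet-style net (conv -> pool -> fully-connected);
# names, channel counts, and kernel sizes are illustrative placeholders.
layer { name: "conv1" type: "Convolution" bottom: "data" top: "conv1"
        convolution_param { num_output: 96 kernel_size: 11 stride: 4 } }
layer { name: "pool1" type: "Pooling" bottom: "conv1" top: "pool1"
        pooling_param { pool: MAX kernel_size: 3 stride: 2 } }
layer { name: "fc8" type: "InnerProduct" bottom: "pool1" top: "fc8"
        inner_product_param { num_output: 10 } }
```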

Additionally, though I confirmed (by running "nvidia-smi") that the Caffe training program was running as a compute job on the 1080 Ti, the training was just as slow as when I ran it on the CPU. So, it's as if the GPU was offering no help on network training.

After Googling, I thought it might be a 1080 Ti driver incompatibility with Caffe; I have 384.98 installed, but I also tried 375.82 with the same results. I also attempted to install 367.35, but it was incompatible with my Linux kernel and the installation failed, which is the edge of my knowledge on drivers and kernels. Can anyone say what drivers they've successfully used for a 1080 Ti running Caffe?

More Googling indicated that there may be a problem with FP16 running very slowly on the 1080 Ti. I couldn't find out whether Caffe uses FP16, or a way to force it to use FP32 to see if that's the problem. Does anyone know how I can ensure I'm training with FP32 so that I can rule that out?

My setup is:

Processor: Intel i7 6850K
Motherboard: Asus X99-E
RAM: Ballistix Sport LT 16GB (2x 8GB) DDR4 2666MHz SR
Graphics Card: EVGA GTX 1080 Ti FTW3
SSD: Samsung 850 EVO 500GB SATA
OS: Ubuntu 17.04
CUDA: 9.0 (I also have CUDA 8.0 installed, which I think was automatically installed with TensorFlow)
cuDNN: 7.0.4
Graphics Card Driver: 384.98

Running "top", my CPU hovers around 100% usage when the training is running; RAM is only at 6% usage. "nvidia-smi" says the job only takes 189MiB of my graphics card's available 11171MiB.

I installed Caffe via "sudo apt install caffe-cuda", and in my program, I set "s.solver_mode = caffe_pb2.SolverParameter.GPU", "caffe.set_device(0)", and confirmed that the solver prototxt showed GPU. 

When I confirmed that the training was just as slow on the CPU, I had run "sudo apt purge caffe-cuda", then "sudo apt install caffe-cpu". I also set "s.solver_mode = caffe_pb2.SolverParameter.CPU" and confirmed with "nvidia-smi" that the job did not show up on the graphics card. Of note, it doesn't seem like the solver_mode line has any effect on whether the job goes to the GPU; all that mattered was whether I had installed caffe-cuda or caffe-cpu.

After installing CUDA 9.0, I ran their recommended deviceQuery and bandwidthTest samples; both passed.

After cuDNN installation, I ran their recommended MNIST sample test, which passed. 

Any ideas? I'm happy to post any outputs you think could help diagnose this for me and the next guy.

Sean Richardson

Dec 2, 2017, 8:09:09 PM
to Caffe Users
Update: I tried simply running the net forward. It's still ~4 times slower than my MacBook Pro. So, it's not simply a back-propagation problem (as some Google search results suggest for similar problems).

Sage Lee

Dec 3, 2017, 2:31:09 PM
to Caffe Users
Hi, I am new to this and don't know if I know what I'm talking about here. That being said, I had my own difficulties (Ubuntu 16.04) and upon searching, found out that apparently cuDNN 5.1 is the last version people are saying to get for installing Caffe. (Also, if you are using Ubuntu 16.04, they say at the very bottom of the Caffe Berkeley site that you need to use CUDA 8.0 and not 9.0.)

My own (completely different, fwiw) problems were solved by uninstalling cuDNN 7 and installing 5.1 instead.

Anyway, good luck with better answers. Just letting you know because everywhere I turn it seems people are saying Caffe users need cuDNN 5.1.

Sean Richardson

Dec 3, 2017, 3:00:38 PM
to Caffe Users
Thanks for the reply, Sage. I’ll give that a shot.

Przemek D

Dec 4, 2017, 5:42:37 AM
to Caffe Users
By the looks of it, none of the work is done by the GPU. GPU memory usage of ~200MiB looks more like the CUDA/cuBLAS/cuDNN context memory rather than your model - something that is always allocated when you run Caffe compiled with CUDA, no matter whether you train on CPU or GPU. The 100% CPU usage only confirms it - normally it should be only data loading and preprocessing, but here it looks like the CPU does all the work.

In the script that you're running, did you start with caffe.set_mode_gpu() before setting device and loading your net and solver?

Sean Richardson

Dec 4, 2017, 11:34:00 AM
to Caffe Users
Thanks Przemek! That solved it! My new desktop is now 40x faster training this network than my MacBook Pro (which was CPU-only). And the desktop's GPU training speed is 160x faster than the desktop's CPU training speed. I'm very excited...

I had mistakenly thought that setting "GPU" in the solver prototxt made "caffe.set_mode_gpu()" unnecessary. Interestingly, "GPU" vs "CPU" in the solver prototxt seems to be irrelevant... GPU vs CPU execution is governed only by the line caffe.set_mode_gpu() or caffe.set_mode_cpu().
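For illustration, a minimal solver prototxt sketch — the net path and hyperparameter values here are placeholders, not my actual settings:

```protobuf
# Hypothetical minimal solver.prototxt; path and values are placeholders.
net: "train_val.prototxt"
base_lr: 0.01
momentum: 0.9
max_iter: 10000
solver_mode: GPU  # present but ignored by pycaffe; caffe.set_mode_gpu() is what matters
```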

For the next person who may have this problem, the 3 relevant lines in my program, in order, are:

caffe.set_mode_gpu()
caffe.set_device(0)  # unnecessary if you have only one graphics card
solver = caffe.get_solver(solver_prototxt_path)

Other notes...

Looking at "top", my CPU is still running at or above 100%. But "nvidia-smi" shows that the Volatile GPU-Util is now 96% (vice 0% without the caffe.set_mode_gpu() line) and GPU Memory Usage is now 949MiB (vice 189MiB without the caffe.set_mode_gpu() line). 

Regarding Sage's thoughts on the cuDNN version, it turns out that Caffe is running great with Ubuntu 17.04, CUDA 9.0, cuDNN 7.0.4, and NVIDIA driver 384.98. Thanks for the effort, though!