I'm mystified by some very slow Caffe performance... looking for any ideas.
I just custom-built a new machine learning computer that I expected to be ~20-100 times faster at Caffe training than the MacBook Pro I'd been experimenting on.
Using a Python program I wrote that trains a simplified version of AlexNet's convolution/pooling/fully-connected architecture to classify images, I was shocked to find that my new computer was 4 times slower per training iteration.
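For context, the network definition is built with caffe.NetSpec, roughly along these lines (a heavily simplified sketch, not my exact code; the LMDB path and layer sizes are placeholders):

    import caffe
    from caffe import layers as L, params as P

    def simple_alexnet(lmdb_path, batch_size):
        # Simplified AlexNet-style stack: conv -> relu -> pool -> fc -> loss.
        n = caffe.NetSpec()
        n.data, n.label = L.Data(source=lmdb_path, backend=P.Data.LMDB,
                                 batch_size=batch_size, ntop=2,
                                 transform_param=dict(scale=1.0 / 255))
        n.conv1 = L.Convolution(n.data, kernel_size=11, stride=4, num_output=96)
        n.relu1 = L.ReLU(n.conv1, in_place=True)
        n.pool1 = L.Pooling(n.relu1, kernel_size=3, stride=2, pool=P.Pooling.MAX)
        n.fc1 = L.InnerProduct(n.pool1, num_output=1000)
        n.loss = L.SoftmaxWithLoss(n.fc1, n.label)
        return n.to_proto()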
Additionally, though I confirmed (by running "nvidia-smi") that the Caffe training program was running as a compute job on the 1080 Ti, the training was just as slow as when I ran it on the CPU. So, it's as if the GPU was offering no help on network training.
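In case anyone wants to reproduce the timing, this is roughly the loop I'm measuring (a sketch, not my exact code; 'train.prototxt' is a placeholder for my generated net):

    import time
    import caffe

    caffe.set_mode_gpu()   # swap for caffe.set_mode_cpu() to compare
    caffe.set_device(0)
    net = caffe.Net('train.prototxt', caffe.TRAIN)  # placeholder path

    start = time.time()
    for _ in range(50):
        net.forward()
        net.backward()
    print('avg per iteration: %.3f s' % ((time.time() - start) / 50))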
After some Googling, I suspected a 1080 Ti driver incompatibility with Caffe; I have 384.98 installed, but I also tried 375.82 with the same results. I attempted 367.35 as well, but it was incompatible with my Linux kernel and failed to install, which is the edge of my knowledge of drivers and kernels. Can anyone say what drivers they've successfully used with a 1080 Ti running Caffe?
More Googling suggested there may be a problem with FP16 running very slowly on the 1080 Ti. I couldn't find a definitive answer on whether Caffe uses FP16, or a way to force FP32 so I can rule that out. Does anyone know how I can ensure I'm training in FP32?
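The best check I've come up with so far is inspecting the dtype that pycaffe reports for blobs, which I believe reflects the precision Caffe computes in; if someone can confirm this is a valid test, that would help ('train.prototxt' and the layer name 'conv1' are placeholders):

    import caffe

    net = caffe.Net('train.prototxt', caffe.TRAIN)  # placeholder path
    # Stock BVLC Caffe allocates blobs as 32-bit floats, so I'd expect float32 here.
    print(net.blobs['data'].data.dtype)
    print(net.params['conv1'][0].data.dtype)  # weights of a layer named conv1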
My setup is:
Processor: Intel i7 6850K
Motherboard: Asus X99-E
RAM: Ballistix Sport LT 16GB (2x 8GB) DDR4 2666MHz SR
Graphics Card: EVGA GTX 1080 Ti FTW3
SSD: Samsung 850 EVO 500GB SATA
OS: Ubuntu 17.04
CUDA: 9.0 (I also have CUDA 8.0 installed, which I think came along with TensorFlow; see the library check just after this list)
cuDNN: 7.0.4
Graphics Card Driver: 384.98
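Since I have both CUDA toolkits installed, I believe a check like this would show which CUDA runtime the apt-packaged pycaffe actually links against (look for libcudart.so.8.0 vs. libcudart.so.9.0 in the output), though I haven't dug into it yet:

    import subprocess
    import caffe._caffe

    # List the shared libraries the compiled pycaffe module links against.
    out = subprocess.check_output(['ldd', caffe._caffe.__file__])
    print(out.decode())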
Running "top", my CPU hovers around 100% usage when the training is running; RAM is only at 6% usage. "nvidia-smi" says the job only takes 189MiB of my graphics card's available 11171MiB.
I installed Caffe via "sudo apt install caffe-cuda", and in my program, I set "s.solver_mode = caffe_pb2.SolverParameter.GPU", "caffe.set_device(0)", and confirmed that the solver prototxt showed GPU.
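For reference, the relevant part of my setup looks roughly like this (a simplified sketch with placeholder paths; note the caffe.set_mode_gpu() line — I've seen suggestions that pycaffe needs that explicit call and may ignore solver_mode, which I haven't verified):

    import caffe
    from caffe.proto import caffe_pb2

    # Write out a minimal solver definition.
    s = caffe_pb2.SolverParameter()
    s.train_net = 'train.prototxt'   # placeholder path
    s.base_lr = 0.01
    s.lr_policy = 'fixed'
    s.solver_mode = caffe_pb2.SolverParameter.GPU
    with open('solver.prototxt', 'w') as f:
        f.write(str(s))

    caffe.set_device(0)
    caffe.set_mode_gpu()  # possibly required in addition to solver_mode
    solver = caffe.get_solver('solver.prototxt')
    solver.step(100)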
When I confirmed that the training was just as slow on the CPU, I had run "sudo apt purge caffe-cuda", then "sudo apt install caffe-cpu". I also set "s.solver_mode = caffe_pb2.SolverParameter.CPU" and confirmed with "nvidia-smi" that the job did not show up on the graphics card. Of note, the solver_mode line doesn't seem to have any effect on whether the job goes to the GPU; all that mattered was whether I had installed caffe-cuda or caffe-cpu.
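(A side question: do I even need to swap apt packages to test CPU mode? My understanding is that a CUDA-enabled build can still run on the CPU with something like the below, but I may be wrong:)

    import caffe

    caffe.set_mode_cpu()  # should force CPU execution even in a CUDA-enabled build
    solver = caffe.get_solver('solver.prototxt')  # placeholder path
    solver.step(10)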
After installing CUDA 9.0, I ran the recommended deviceQuery and bandwidthTest samples; both passed.
After cuDNN installation, I ran their recommended MNIST sample test, which passed.
Any ideas? I'm happy to post any outputs you think could help diagnose this for me and the next guy.