The Caffe docs say: "cuDNN Caffe: for fastest operation Caffe is accelerated by drop-in integration of NVIDIA cuDNN. To speed up your Caffe models, install cuDNN then uncomment the USE_CUDNN := 1 flag in Makefile.config when installing Caffe." However, when I tried this, I found that Caffe (PyCaffe specifically) actually ran slower with cuDNN than without it. Has anyone else experienced this? Or does anyone see anything suspect in my builds?
Below are per-operation times from running the same feature extraction code in a loop, with the same Caffe model, on servers with different Caffe RC3 builds.
ATLAS with CUDA 7.5, without cuDNN
optime: 0:00:00.141123
optime: 0:00:00.141044
optime: 0:00:00.140796
optime: 0:00:00.140881
optime: 0:00:00.140706
optime: 0:00:00.141275
optime: 0:00:00.141032
optime: 0:00:00.141049
optime: 0:00:00.140871
ATLAS with CUDA 7.0 and cuDNN v4
optime: 0:00:00.157828
optime: 0:00:00.157653
optime: 0:00:00.157314
optime: 0:00:00.156893
optime: 0:00:00.155795
optime: 0:00:00.157192
optime: 0:00:00.155587
optime: 0:00:00.155364
optime: 0:00:00.155914
OpenBLAS with CUDA 7.5, without cuDNN
optime: 0:00:00.150775
optime: 0:00:00.152572
optime: 0:00:00.152900
optime: 0:00:00.154615
optime: 0:00:00.151565
optime: 0:00:00.153476
optime: 0:00:00.151332
optime: 0:00:00.151705
optime: 0:00:00.153208
OpenBLAS with CUDA 7.0 and cuDNN v4
optime: 0:00:00.162633
optime: 0:00:00.161286
optime: 0:00:00.161574
optime: 0:00:00.162428
optime: 0:00:00.159456
optime: 0:00:00.160352
optime: 0:00:00.161450
optime: 0:00:00.162175
optime: 0:00:00.160275
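For reference, the optime lines above come from a timing loop like the following minimal sketch. The actual feature-extraction call is replaced by a placeholder here; in the real script it would be the PyCaffe forward pass (e.g. `net.forward(...)`), and the variable names are illustrative, not from my actual code:

```python
import datetime

def time_op(op, n_iters=9):
    """Run `op` n_iters times, printing wall-clock time per call
    in the same 'optime:' format as the logs above."""
    times = []
    for _ in range(n_iters):
        start = datetime.datetime.now()
        op()
        elapsed = datetime.datetime.now() - start
        times.append(elapsed)
        print("optime:", elapsed)
    return times

# In the real benchmark, `op` wraps the PyCaffe forward pass, e.g.:
#   times = time_op(lambda: net.forward(data=batch))
# Here a no-op stands in so the sketch is self-contained:
times = time_op(lambda: None, n_iters=3)
```

Note that this measures wall-clock time around the whole Python call; since CUDA launches are asynchronous, any timing that matters should happen around a call that blocks until the GPU result is ready (PyCaffe's `forward` returns the output blobs, so it does synchronize).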