System:
OS: Ubuntu 16.04
GPU: GTX 1080
CUDA: 8.0.61
cuDNN: 6.0.21
As of this afternoon, my Caffe installation, which had been working fine for two weeks, stopped working. I didn't consciously change anything on my system. Caffe now exits with:
I1025 15:01:10.109141 1036 net.cpp:380] convolution_1 -> convolution_1
E1025 15:01:10.180095 1042 common.cpp:114] Cannot create Cublas handle. Cublas won't be available.
F1025 15:01:10.238710 1036 cudnn_conv_layer.cpp:53] Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR
*** Check failure stack trace: ***
@ 0x7fadb236f5cd google::LogMessage::Fail()
@ 0x7fadb2371433 google::LogMessage::SendToLog()
@ 0x7fadb236f15b google::LogMessage::Flush()
@ 0x7fadb2371e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7fadb292f0bb caffe::CuDNNConvolutionLayer<>::LayerSetUp()
@ 0x7fadb2a65ddc caffe::Net<>::Init()
@ 0x7fadb2a6862e caffe::Net<>::Net()
@ 0x7fadb2a1afc5 caffe::Solver<>::InitTrainNet()
@ 0x7fadb2a1c435 caffe::Solver<>::Init()
@ 0x7fadb2a1c74f caffe::Solver<>::Solver()
@ 0x7fadb2a46e31 caffe::Creator_SGDSolver<>()
@ 0x40bd33 train()
@ 0x408450 main
@ 0x7fadb105b830 __libc_start_main
@ 0x408c79 _start
@ (nil) (unknown)
This problem is not new, but in earlier reports it was either a memory issue or never resolved at all. For test purposes I'm training LeNet on the MNIST dataset, so memory should not be a problem here.
To resolve this, I completely reinstalled CUDA, cuDNN, and Caffe. The CUDA samples and the Caffe runtest both pass, but the error is still there.
The only strange thing I found is the error message right before the fatal error:
E1025 15:01:10.180095 1042 common.cpp:114] Cannot create Cublas handle. Cublas won't be available.
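To rule out Caffe itself, I also tried reproducing just that step in isolation with a minimal program that does nothing but create a cuBLAS handle (a sketch of my test; the file name and build line are my own setup, compiled with `nvcc minimal_cublas.cu -lcublas -o minimal_cublas`):

```cuda
// minimal_cublas.cu -- reproduces only the handle creation that
// fails in caffe's common.cpp, with no Caffe code involved.
#include <cstdio>
#include <cublas_v2.h>

int main() {
    cublasHandle_t handle;
    cublasStatus_t status = cublasCreate(&handle);
    if (status != CUBLAS_STATUS_SUCCESS) {
        // A failure here (e.g. CUBLAS_STATUS_NOT_INITIALIZED) would point
        // at the driver/runtime rather than at Caffe.
        printf("cublasCreate failed with status %d\n", static_cast<int>(status));
        return 1;
    }
    printf("cublasCreate succeeded\n");
    cublasDestroy(handle);
    return 0;
}
```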
Can someone shed some light on this?