CUDNN_STATUS_ARCH_MISMATCH


Leslie N. Smith

Oct 14, 2014, 12:16:20 PM
to caffe...@googlegroups.com
Last week I downloaded caffe-master and successfully installed it without cuDNN.  But the cuDNN speedup is tempting, so I downloaded it.  I copied cudnn.h to /usr/include and all the lib* files to /usr/lib on my server.  In the Caffe Makefile.config I uncommented "USE_CUDNN := 1".
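For reference, the exact steps were roughly the following (assuming the cuDNN tarball had been extracted into the current directory; library file names may differ between cuDNN releases):

# copy the cuDNN header and libraries into system paths (my choice of location, not a requirement)
sudo cp cudnn.h /usr/include/
sudo cp libcudnn* /usr/lib/
# then uncomment this line in Makefile.config:
# USE_CUDNN := 1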

However, when I build Caffe (e.g. "make clean && make all && make test && make runtest"), I get an error message:

[----------] 6 tests from CuDNNConvolutionLayerTest/1, where TypeParam = double
[ RUN      ] CuDNNConvolutionLayerTest/1.TestSimpleConvolutionGroupCuDNN
F1014 08:55:30.083176 23568 cudnn_conv_layer.cpp:30] Check failed: status == CUDNN_STATUS_SUCCESS (6 vs. 0)  CUDNN_STATUS_ARCH_MISMATCH
*** Check failure stack trace: ***
    @     0x2b082d0a8daa  (unknown)
    @     0x2b082d0a8ce4  (unknown)
    @     0x2b082d0a86e6  (unknown)
    @     0x2b082d0ab687  (unknown)
    @           0x739689  caffe::CuDNNConvolutionLayer<>::LayerSetUp()
    @           0x42c1f0  caffe::Layer<>::SetUp()
    @           0x49d776  caffe::CuDNNConvolutionLayerTest_TestSimpleConvolutionGroupCuDNN_Test<>::TestBody()
    @           0x68d613  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @           0x6840b7  testing::Test::Run()
    @           0x68415e  testing::TestInfo::Run()
    @           0x684265  testing::TestCase::Run()
    @           0x6875a8  testing::internal::UnitTestImpl::RunAllTests()
    @           0x687837  testing::UnitTest::Run()
    @           0x41e9d0  main
    @     0x2b0831cf1ec5  (unknown)
    @           0x4261c7  (unknown)
    @              (nil)  (unknown)
make: *** [runtest] Aborted (core dumped)

(A nearly identical stack trace also appears for caffe::NetTest_TestReshape_Test<>::TestBody(), followed by the same "make: *** [runtest] Aborted (core dumped)" message.)

I haven't been able to get around this.  Does anyone know how to fix this problem?  

I would appreciate whatever help you can provide.

Cliff Woolley

Oct 14, 2014, 1:21:52 PM
to caffe...@googlegroups.com
cuDNN requires a GPU with CUDA compute capability 3.x or higher -- i.e., Kepler- or Maxwell-generation GPUs.  Fermi and earlier GPUs are not supported.
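If you're not sure which compute capability your GPU has, the deviceQuery sample that ships with the CUDA toolkit reports it; something like the following should work (the path assumes the default CUDA 6.5 samples location, and you may need to copy the samples somewhere writable before building):

cd /usr/local/cuda-6.5/samples/1_Utilities/deviceQuery
make
./deviceQuery | grep "CUDA Capability"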
 
Thanks,
Cliff


Leslie N. Smith

Oct 14, 2014, 2:54:32 PM
to caffe...@googlegroups.com
I am running on a Tesla K40.  I also have access to a server with Tesla K20s and could try that.  These are recent NVIDIA GPUs, so I expect they should work.

Are you saying that "CUDNN_STATUS_ARCH_MISMATCH" implies a problem with the hardware?

Thanks,
Leslie

Leslie N. Smith

Oct 14, 2014, 3:29:34 PM
to caffe...@googlegroups.com
I just tried using cuDNN on another server with Tesla K20s and received the following error message:

[----------] 8 tests from CuDNNNeuronLayerTest/1, where TypeParam = double
[ RUN      ] CuDNNNeuronLayerTest/1.TestTanHCuDNN
F1014 15:25:36.048496  8779 cudnn_tanh_layer.cpp:15] Check failed: status == CUDNN_STATUS_SUCCESS (1 vs. 0)  CUDNN_STATUS_NOT_INITIALIZED


Thoughts?

Cliff Woolley

Oct 14, 2014, 3:32:46 PM
to Leslie N. Smith, caffe...@googlegroups.com
Tesla K20 and K40 are both of the Kepler generation (compute capability 3.5, to be specific), so you should be fine in that regard.  CUDNN_STATUS_NOT_INITIALIZED would usually mean that your CUDA driver isn't new enough -- it needs to support CUDA 6.5 (hence driver version 340.xx as a minimum).
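To check which driver is currently loaded on a Linux machine, either of these should report the version (a rough sketch; the output format varies a bit by driver release):

cat /proc/driver/nvidia/version
nvidia-smi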
 
Please let me know if upgrading your CUDA driver doesn't resolve this for you.
 
Thanks,
Cliff

Leslie N. Smith

Oct 14, 2014, 3:57:20 PM
to caffe...@googlegroups.com, iphys...@gmail.com
Thank you Cliff.  You are right about both issues.

1.  The server with the K40 also has an older GPU, and my job pointed to the older GPU (a way to pin the job to the K40 is sketched below).
2.  The server with K20s doesn't have CUDA 6.5.
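For the first issue, restricting the visible devices should pin the run to the K40, e.g. (the device index here is a guess; deviceQuery or nvidia-smi will show the correct one):

CUDA_VISIBLE_DEVICES=1 make runtest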

I apologize for my ignorance and appreciate your patience.

Regards,
Leslie

Cliff Woolley

Oct 14, 2014, 4:59:29 PM
to Leslie N. Smith, caffe...@googlegroups.com
 
Glad you got it sorted out!  Happy I could help.
 
--Cliff

Víctor Ponce López

Nov 3, 2014, 9:55:19 AM
to caffe...@googlegroups.com, iphys...@gmail.com
Hi,

I have a GeForce GT 520M GPU, which was detected as a CUDA-capable GPU when I ran the deviceQuery sample from the CUDA 6.5 installation. However, I also got similar error messages when running the "make runtest" command:

CuDNNSoftmaxLayerTest/0.TestGradientCuDNN
F1103 14:33:00.619683 27624 cudnn_softmax_layer.cpp:19] Check failed: status == CUDNN_STATUS_SUCCESS (6 vs. 0)  CUDNN_STATUS_ARCH_MISMATCH
*** Check failure stack trace: ***
    @     0x2ad072599dbd  google::LogMessage::Fail()
    @     0x2ad07259bc5d  google::LogMessage::SendToLog()
    @     0x2ad0725999ac  google::LogMessage::Flush()
    @     0x2ad07259c57e  google::LogMessageFatal::~LogMessageFatal()
    @           0x72a94a  caffe::CuDNNSoftmaxLayer<>::LayerSetUp()
    @           0x42f840  caffe::Layer<>::SetUp()
    @           0x43182f  caffe::GradientChecker<>::CheckGradientExhaustive()
    @           0x43813a  caffe::CuDNNSoftmaxLayerTest_TestGradientCuDNN_Test<>::TestBody()
    @           0x68dac3  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @           0x684567  testing::Test::Run()
    @           0x68460e  testing::TestInfo::Run()
    @           0x684715  testing::TestCase::Run()
    @           0x687a58  testing::internal::UnitTestImpl::RunAllTests()
    @           0x687ce7  testing::UnitTest::Run()
    @           0x420110  main
    @     0x2ad077a13ec5  (unknown)
    @           0x427907  (unknown)
make: *** [runtest] Aborted (core dumped)

I also got this error when trying to execute the ./examples/mnist/train_lenet.sh sample:

cudnn_conv_layer.cpp:30] Check failed: status == CUDNN_STATUS_SUCCESS (6 vs. 0)  CUDNN_STATUS_ARCH_MISMATCH
*** Check failure stack trace: ***
    @     0x7f9268420dbd  google::LogMessage::Fail()
    @     0x7f9268422c5d  google::LogMessage::SendToLog()
    @     0x7f92684209ac  google::LogMessage::Flush()
    @     0x7f926842357e  google::LogMessageFatal::~LogMessageFatal()
    @           0x501899  caffe::CuDNNConvolutionLayer<>::LayerSetUp()
    @           0x479f41  caffe::Net<>::Init()
    @           0x47b84e  caffe::Net<>::Net()
    @           0x4597a0  caffe::Solver<>::InitTrainNet()
    @           0x45aa56  caffe::Solver<>::Init()
    @           0x45abb6  caffe::Solver<>::Solver()
    @           0x41b6d0  caffe::GetSolver<>()
    @           0x417af4  train()
    @           0x4124e1  main
    @     0x7f9262e3cec5  (unknown)
    @           0x416647  (unknown)
Aborted (core dumped)

I'm not really sure, but I suspect it could be a problem with the LD_LIBRARY_PATH variable, since I'm adding the paths directly in my .bashrc file and I don't know whether I'm doing something wrong. I had to include these lines so that Caffe could find some libraries and "make clean && make all && make test && make runtest" would succeed. These are the last lines of my .bashrc file:

# added by Anaconda 2.1.0 installer
export PATH="$HOME/anaconda/bin:$PATH"

# added to get fast access to CAFFE libraries
CAFFE=$HOME/anaconda/lib/python2.7/site-packages/caffe

# added to compile CAFFE libraries
export PATH="/usr/local/cuda-6.5/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-6.5/lib64:$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH="/usr/lib/:$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH="/usr/local/lib:$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH="$HOME/anaconda/lib:$LD_LIBRARY_PATH"
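In case it helps to diagnose, I assume something like the following would show which CUDA/cuDNN libraries the binary actually resolves (the binary path assumes the default Caffe build output):

ldd ./build/tools/caffe | grep -i -E "cuda|cudnn"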

Any help will be appreciated.
 
 
 


On Tuesday, October 14, 2014 at 10:59:29 PM UTC+2, Cliff Woolley wrote:
 
Glad you got it sorted out!  Happy I could help.
 
--Cliff

Cliff Woolley

Nov 3, 2014, 10:15:30 AM
to Víctor Ponce López, caffe...@googlegroups.com

GeForce GT 520M has compute capability 2.1 (see http://developer.nvidia.com/cuda-gpus ), which is not supported by cuDNN.  You'll have to either stick with the default (cuBLAS-based) engine in Caffe or else upgrade to a machine with a newer GPU.
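Falling back to the default engine should just be a matter of disabling cuDNN in Makefile.config and rebuilding; a minimal sketch:

# in Makefile.config, comment the cuDNN switch back out:
# USE_CUDNN := 1
make clean && make all && make test && make runtest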

-Cliff

Víctor Ponce López

Nov 3, 2014, 11:24:55 AM
to caffe...@googlegroups.com, v88p...@gmail.com
Thank you Cliff, 

I solved it by commenting out both the cuDNN and CPU-only options at the beginning of the Makefile.config file so that the build uses plain CUDA, and then rebuilding the library with the "make clean && make all && make test && make runtest" commands. "make test" reports that two test samples are skipped, since those are the cuDNN-based samples I am no longer using. After that, I was able to run the mnist/train_lenet.sh example.

Best,

Víctor