Check failed: result == ncclSuccess (1 vs. 0) unhandled cuda error


Jeffery Kow

Dec 6, 2017, 10:55:35 AM
to Caffe Users

Hi, I am trying to install nvidia/caffe with CMake. The configuration step went through without any issues and showed all the correct parameters. If you need to see the Caffe configuration summary, let me know.


CUDA Version: CUDA 8.0
NCCL Version: NCCL 2.1
GPU: 4x GTX Titan X 12GB (Maxwell)
OS: Ubuntu 16.04


This is the error message I get while running make runtest:


[----------] 8 tests from RMSPropSolverTest/2, where TypeParam = caffe::GPUDevice
[ RUN ] RMSPropSolverTest/2.TestRMSPropLeastSquaresUpdateWithRmsDecay
F1206 07:37:19.964359 10589 parallel.cpp:156] Check failed: result == ncclSuccess (1 vs. 0) unhandled cuda error
*** Check failure stack trace: ***
F1206 07:37:19.964366 10588 parallel.cpp:156] Check failed: result == ncclSuccess (1 vs. 0) unhandled cuda error
*** Check failure stack trace: ***
@ 0x7f55d9a0f5cd google::LogMessage::Fail()
@ 0x7f55d9a0f5cd google::LogMessage::Fail()
@ 0x7f55d9a11433 google::LogMessage::SendToLog()
@ 0x7f55d9a11433 google::LogMessage::SendToLog()
@ 0x7f55d9a0f15b google::LogMessage::Flush()
@ 0x7f55d9a0f15b google::LogMessage::Flush()
@ 0x7f55d9a11e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f55d9a11e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f55dabb1da3 caffe::P2PSync::InternalThreadEntry()
@ 0x7f55dabb1da3 caffe::P2PSync::InternalThreadEntry()
@ 0x7f55dab70d3b caffe::InternalThread::entry()
@ 0x7f55dab70d3b caffe::InternalThread::entry()
@ 0x7f55dab72ceb boost::detail::thread_data<>::run()
@ 0x7f55dab72ceb boost::detail::thread_data<>::run()
@ 0x7f55da0795d5 (unknown)
@ 0x7f55da0795d5 (unknown)
@ 0x7f55d9c3a6ba start_thread
@ 0x7f55d9c3a6ba start_thread
@ 0x7f55d3a0b3dd clone
@ 0x7f55d3a0b3dd clone
@ (nil) (unknown)
Aborted (core dumped)
src/caffe/test/CMakeFiles/runtest.dir/build.make:57: recipe for target 'src/caffe/test/CMakeFiles/runtest' failed
make[3]: *** [src/caffe/test/CMakeFiles/runtest] Error 134
CMakeFiles/Makefile2:328: recipe for target 'src/caffe/test/CMakeFiles/runtest.dir/all' failed
make[2]: *** [src/caffe/test/CMakeFiles/runtest.dir/all] Error 2
CMakeFiles/Makefile2:335: recipe for target 'src/caffe/test/CMakeFiles/runtest.dir/rule' failed
make[1]: *** [src/caffe/test/CMakeFiles/runtest.dir/rule] Error 2
Makefile:240: recipe for target 'runtest' failed
make: *** [runtest] Error 2

Bhargava Narendra

Feb 6, 2020, 2:25:09 AM
to Caffe Users
I faced the same error. Check whether your NCCL build is compatible with your CUDA version. (I don't know how to check the NCCL version either, but I had CUDA 10.0 installed together with an NCCL build meant for CUDA 10.1.) Try reinstalling the NCCL build that matches your CUDA version. I reinstalled NCCL for CUDA 10.0, and now everything works fine.
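For anyone else wondering how to check the versions, here is a rough Python sketch. It rests on several assumptions you may need to adjust: NCCL was installed from NVIDIA's Debian packages (package name libnccl2, whose version string carries a "+cudaX.Y" suffix naming the CUDA release it was built for), nccl.h lives in /usr/include, and the CUDA toolkit is under /usr/local/cuda. It reads the NCCL_MAJOR/NCCL_MINOR/NCCL_PATCH defines from nccl.h and CUDA_VERSION from cuda.h, then compares the package's CUDA suffix against the installed toolkit. For tarball or custom installs, change the paths and skip the dpkg part.

```python
"""Sketch: compare the installed NCCL build against the installed CUDA toolkit.

Assumptions (adjust for your system): NCCL came from NVIDIA's Debian
packages (package name libnccl2), nccl.h lives in /usr/include, and the
CUDA toolkit is installed under /usr/local/cuda.
"""
import re
import subprocess
from pathlib import Path

NCCL_HEADER = Path("/usr/include/nccl.h")            # assumed location
CUDA_HEADER = Path("/usr/local/cuda/include/cuda.h")  # assumed location


def nccl_version_from_header(text):
    """Return (major, minor, patch) from the NCCL_MAJOR/MINOR/PATCH defines, or None."""
    parts = []
    for name in ("NCCL_MAJOR", "NCCL_MINOR", "NCCL_PATCH"):
        m = re.search(rf"#define\s+{name}\s+(\d+)", text)
        if m is None:
            return None
        parts.append(int(m.group(1)))
    return tuple(parts)


def cuda_version_from_header(text):
    """Return (major, minor) from CUDA_VERSION in cuda.h, e.g. 10000 -> (10, 0)."""
    m = re.search(r"#define\s+CUDA_VERSION\s+(\d+)", text)
    if m is None:
        return None
    code = int(m.group(1))
    return code // 1000, (code % 1000) // 10


def cuda_target_from_package(version):
    """Return the (major, minor) CUDA target from a '+cudaX.Y' package suffix, or None."""
    m = re.search(r"\+cuda(\d+)\.(\d+)", version)
    return (int(m.group(1)), int(m.group(2))) if m else None


def installed_nccl_package_version():
    """Ask dpkg for the libnccl2 version string; None if dpkg or the package is missing."""
    try:
        out = subprocess.run(
            ["dpkg-query", "-W", "--showformat=${Version}", "libnccl2"],
            capture_output=True, text=True, check=False,
        )
    except OSError:  # dpkg-query not available (non-Debian system)
        return None
    return out.stdout.strip() or None


def main():
    nccl = cuda = None
    if NCCL_HEADER.exists():
        nccl = nccl_version_from_header(NCCL_HEADER.read_text(errors="replace"))
    if CUDA_HEADER.exists():
        cuda = cuda_version_from_header(CUDA_HEADER.read_text(errors="replace"))
    pkg = installed_nccl_package_version()
    target = cuda_target_from_package(pkg) if pkg else None

    print("NCCL header version:", nccl)
    print("CUDA toolkit version:", cuda)
    print("libnccl2 package:", pkg, "built for CUDA", target)
    # Flag the situation described above: NCCL built for a different CUDA release.
    if cuda and target and cuda != target:
        print("Mismatch: NCCL was built for CUDA %d.%d but the toolkit is %d.%d"
              % (target + cuda))


if __name__ == "__main__":
    main()
```

If the script reports a mismatch, reinstall the libnccl2/libnccl-dev packages whose "+cuda" suffix matches your toolkit, then rebuild Caffe so it links against the new library.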