
Farhan Abdul

Jan 29, 2018, 5:34:25 PM
to Caffe Users

Is this a bug?
I am training with NCCL on two GPUs (a 1080 and a 1060) using a LevelDB Data layer, and the run aborts with the error below.

With a single GPU, this error does not occur.

Naively, it looks to me as if the LevelDB database is being opened twice.

I0129 13:13:42.833976 25110 net.cpp:213] pool1 needs backward computation.
I0129 13:13:42.833979 25110 net.cpp:213] relu1 needs backward computation.
I0129 13:13:42.833981 25110 net.cpp:213] conv1 needs backward computation.
I0129 13:13:42.833984 25110 net.cpp:215] data does not need backward computation.
I0129 13:13:42.833986 25110 net.cpp:257] This network produces output loss
I0129 13:13:42.833997 25110 net.cpp:270] Network initialization done.
I0129 13:13:42.834034 25110 solver.cpp:56] Solver scaffolding done.
I0129 13:13:42.834445 25110 caffe.cpp:248] Starting Optimization
F0129 13:13:43.096998 25119 db_leveldb.cpp:16] Check failed: status.ok() Failed to open leveldb /home/farhan/intl910-200a/Training_1
IO error: lock /home/farhan/intl910-200a/Training_1/LOCK: already held by process
*** Check failure stack trace: ***
@ 0x7f4419d835cd google::LogMessage::Fail()
@ 0x7f4419d85433 google::LogMessage::SendToLog()
@ 0x7f4419d8315b google::LogMessage::Flush()
@ 0x7f4419d85e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f441a46fd8b caffe::db::LevelDB::Open()
@ 0x7f441a3e27ff caffe::DataLayer<>::DataLayer()
@ 0x7f441a3e29c2 caffe::Creator_DataLayer<>()
@ 0x7f441a48d3e0 caffe::Net<>::Init()
@ 0x7f441a4903fe caffe::Net<>::Net()
@ 0x7f441a2d9405 caffe::Solver<>::InitTrainNet()
@ 0x7f441a2da875 caffe::Solver<>::Init()
@ 0x7f441a2dab8f caffe::Solver<>::Solver()
@ 0x7f441a49c941 caffe::Creator_SGDSolver<>()
@ 0x416e0c caffe::SolverRegistry<>::CreateSolver()
@ 0x7f441a4c5ecb caffe::Worker<>::InternalThreadEntry()
@ 0x7f441a4afba5 caffe::InternalThread::entry()
@ 0x7f441a4b0ace boost::detail::thread_data<>::run()
@ 0x7f4418a545d5 (unknown)
@ 0x7f441882d6ba start_thread
@ 0x7f4418d703dd clone
@ (nil) (unknown)
Aborted (core dumped)
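The stack trace is consistent with that guess: the failing caffe::db::LevelDB::Open call happens inside caffe::Worker<>::InternalThreadEntry, i.e. while an extra GPU's worker thread builds its own solver and therefore its own Data layer. LevelDB allows a database to be opened by only one process at a time, and the "already held by process" message indicates the lock was already taken within this same process. The Python below is only a simplified model of that situation, not Caffe or LevelDB code: InProcessLockTable, open_source and simulate_multi_gpu are hypothetical names, and the assumption that the root solver (rank 0) opens the source first on the main thread is mine.

```python
import threading


class LockHeldError(RuntimeError):
    """Raised when a second open of the same database is attempted."""


class InProcessLockTable:
    """Hypothetical stand-in for LevelDB's exclusive LOCK file:
    at most one open handle per database directory (NOT LevelDB's real code)."""

    def __init__(self):
        self._held = set()
        self._guard = threading.Lock()

    def acquire(self, db_path):
        lock_name = db_path.rstrip("/") + "/LOCK"
        with self._guard:
            if lock_name in self._held:
                raise LockHeldError(f"lock {lock_name}: already held by process")
            self._held.add(lock_name)


def open_source(rank, db_path, locks, results):
    # Each solver builds its own Data layer, which opens the source itself.
    try:
        locks.acquire(db_path)
        results[rank] = "opened"
    except LockHeldError as exc:
        results[rank] = str(exc)


def simulate_multi_gpu(num_gpus, db_path):
    locks = InProcessLockTable()
    results = {}
    # Assumed: the root solver (rank 0) is built first, on the main thread.
    open_source(0, db_path, locks, results)
    # Each additional GPU gets a worker thread that builds another solver.
    workers = [
        threading.Thread(target=open_source, args=(r, db_path, locks, results))
        for r in range(1, num_gpus)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    # Hypothetical path; any LevelDB directory behaves the same in this model.
    print(simulate_multi_gpu(2, "/data/train_leveldb"))  # rank 1 hits the lock
    print(simulate_multi_gpu(1, "/data/train_leveldb"))  # single GPU is fine
```

In this model the single-GPU case succeeds and the two-GPU case fails on the second open, which matches what I am seeing.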
