Is this a bug? I am training with NCCL on 2 GPUs (a 1080 and a 1060) and a LevelDB data layer; with a single GPU this does not happen. It naively appears to me that the LevelDB is being opened twice.

I0129 13:13:42.833976 25110 net.cpp:213] pool1 needs backward computation.
I0129 13:13:42.833979 25110 net.cpp:213] relu1 needs backward computation.
I0129 13:13:42.833981 25110 net.cpp:213] conv1 needs backward computation.
I0129 13:13:42.833984 25110 net.cpp:215] data does not need backward computation.
I0129 13:13:42.833986 25110 net.cpp:257] This network produces output loss
I0129 13:13:42.833997 25110 net.cpp:270] Network initialization done.
I0129 13:13:42.834034 25110 solver.cpp:56] Solver scaffolding done.
I0129 13:13:42.834445 25110 caffe.cpp:248] Starting Optimization
F0129 13:13:43.096998 25119 db_leveldb.cpp:16] Check failed: status.ok() Failed to open leveldb /home/farhan/intl910-200a/Training_1
IO error: lock /home/farhan/intl910-200a/Training_1/LOCK: already held by process
*** Check failure stack trace: ***
    @     0x7f4419d835cd  google::LogMessage::Fail()
    @     0x7f4419d85433  google::LogMessage::SendToLog()
    @     0x7f4419d8315b  google::LogMessage::Flush()
    @     0x7f4419d85e1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f441a46fd8b  caffe::db::LevelDB::Open()
    @     0x7f441a3e27ff  caffe::DataLayer<>::DataLayer()
    @     0x7f441a3e29c2  caffe::Creator_DataLayer<>()
    @     0x7f441a48d3e0  caffe::Net<>::Init()
    @     0x7f441a4903fe  caffe::Net<>::Net()
    @     0x7f441a2d9405  caffe::Solver<>::InitTrainNet()
    @     0x7f441a2da875  caffe::Solver<>::Init()
    @     0x7f441a2dab8f  caffe::Solver<>::Solver()
    @     0x7f441a49c941  caffe::Creator_SGDSolver<>()
    @           0x416e0c  caffe::SolverRegistry<>::CreateSolver()
    @     0x7f441a4c5ecb  caffe::Worker<>::InternalThreadEntry()
    @     0x7f441a4afba5  caffe::InternalThread::entry()
    @     0x7f441a4b0ace  boost::detail::thread_data<>::run()
    @     0x7f4418a545d5  (unknown)
    @     0x7f441882d6ba  start_thread
    @     0x7f4418d703dd  clone
    @               (nil) (unknown)
Aborted (core dumped)
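For context on the error itself: leveldb guards each database directory with a LOCK file and refuses a second open while that exclusive lock is held, which matches the stack trace showing a second `caffe::DataLayer<>` (one per GPU worker thread) calling `caffe::db::LevelDB::Open()` on the same directory. The following is a minimal sketch of that exclusive-lock pattern, using `flock` on a throwaway lock file rather than leveldb's actual implementation, just to illustrate why the second opener fails immediately (the path and variable names are hypothetical):

```python
import fcntl
import os
import tempfile

# Hypothetical stand-in for the database's LOCK file.
lock_path = os.path.join(tempfile.gettempdir(), "demo_LOCK")

# First "data layer": opens the lock file and takes an exclusive lock.
fd1 = os.open(lock_path, os.O_CREAT | os.O_RDWR)
fcntl.flock(fd1, fcntl.LOCK_EX | fcntl.LOCK_NB)  # succeeds

# Second "data layer": a second open file description on the same file.
# The non-blocking exclusive lock fails at once, analogous to
# "IO error: lock .../LOCK: already held by process".
fd2 = os.open(lock_path, os.O_CREAT | os.O_RDWR)
try:
    fcntl.flock(fd2, fcntl.LOCK_EX | fcntl.LOCK_NB)
    second_ok = True
except BlockingIOError:
    second_ok = False

print("second lock acquired:", second_ok)  # on Linux: second lock acquired: False

fcntl.flock(fd1, fcntl.LOCK_UN)
os.close(fd1)
os.close(fd2)
```

This is only an analogy for the failure mode, not Caffe's code; the usual workarounds are giving each GPU its own copy of the database or switching to a backend/layer setup that supports a single shared reader.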