DeepLab v2, a prominent image-labeling network, uses its own modified version of Caffe (I'm hoping some users here are familiar with it).
I'm trying to get the code working on an Ubuntu 16 desktop where I've successfully run stock Caffe many times. When I run 'make runtest', however, one test fails, and then towards the end the whole test run crashes.
The failed test is:
'[ FAILED ] BatchNormLayerTest/2.TestGradient, where TypeParam = caffe::GPUDevice<float> (5066 ms)'
The crash, when I run 'make V=1 runtest', is:
'F0807 15:32:53.907034 3963 blob.cpp:163] Check failed: data_
*** Check failure stack trace: ***
@ 0x7fd0d29e05cd google::LogMessage::Fail()
@ 0x7fd0d29e2433 google::LogMessage::SendToLog()
@ 0x7fd0d29e015b google::LogMessage::Flush()
@ 0x7fd0d29e2e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7fd0d03a214b caffe::Blob<>::mutable_cpu_data()
@ 0x7fd0d047dbf7 caffe::BatchNormLayer<>::Forward_cpu()
@ 0x472852 caffe::Layer<>::Forward()
@ 0x62add7 caffe::BatchNormLayerTest_TestForwardInplace_Test<>::TestBody()
@ 0x8b7b13 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x8b112a testing::Test::Run()
@ 0x8b1278 testing::TestInfo::Run()
@ 0x8b1355 testing::TestCase::Run()
@ 0x8b262f testing::internal::UnitTestImpl::RunAllTests()
@ 0x8b2953 testing::UnitTest::Run()
@ 0x46649d main
@ 0x7fd0cf710830 __libc_start_main
@ 0x46d829 _start
@ (nil) (unknown)
Makefile:526: recipe for target 'runtest' failed
make: *** [runtest] Aborted (core dumped)'
From what I've been able to find, the failed check on data_ seems to mean that a layer's blob has no properly allocated data when the layer tries to write to it. But I've been unable to find out which network this test runs, or what data it runs on.
Details: I'm running CUDA 8.0 without cuDNN, because I only have a Tesla C2050, which cannot run cuDNN. I've gotten the error above virtually every time I've run the tests, but one time three tests failed as follows:
[ FAILED ] BatchNormLayerTest/3.TestGradient, where TypeParam = caffe::GPUDevice<double> (4656 ms)
[ FAILED ] BatchNormLayerTest/3.TestForwardInplace, where TypeParam = caffe::GPUDevice<double> (3 ms)
[ FAILED ] BatchNormLayerTest/3.TestForward, where TypeParam = caffe::GPUDevice<double> (2 ms)
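To avoid sitting through the whole suite each time, the failing tests can be rerun on their own with googletest's --gtest_filter flag, which takes ':'-separated test names. Here is a rough sketch of building that flag from the '[ FAILED ]' lines. The helper names failed_tests and gtest_filter_arg are my own, and the regex only assumes the line layout shown above.

```python
import re

# Captures the full test name, e.g. "BatchNormLayerTest/3.TestGradient",
# from a googletest "[ FAILED ] ..." summary line.
FAILED_RE = re.compile(r"\[\s*FAILED\s*\]\s+([\w/]+\.\w+)")


def failed_tests(log):
    """Return the failed test names in order of appearance, without repeats."""
    names = []
    for line in log.splitlines():
        match = FAILED_RE.search(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


def gtest_filter_arg(names):
    """Join test names with ':' as --gtest_filter expects."""
    return "--gtest_filter=" + ":".join(names)


if __name__ == "__main__":
    log = (
        "[ FAILED ] BatchNormLayerTest/3.TestGradient, where TypeParam = "
        "caffe::GPUDevice<double> (4656 ms)\n"
        "[ FAILED ] BatchNormLayerTest/3.TestForward, where TypeParam = "
        "caffe::GPUDevice<double> (2 ms)\n"
    )
    print(gtest_filter_arg(failed_tests(log)))
```

The resulting argument would be passed to the test binary directly. In stock Caffe I believe that is build/test/test_all.testbin, but I have not checked whether the DeepLab fork keeps the same path.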
I include a more detailed copy of the error messages below.