Make runtest fails for PowerPC (ppc64le)

Rudra H

Jan 4, 2016, 3:14:50 AM
to Caffe Users
Hi,

I am working on a PowerPC (ppc64le) machine running Ubuntu 14.04 with 4 GPUs (all Tesla K80s).

make runtest fails with the following log:

    @     0x100000040478 (unknown)
    @     0x1000043f6c10 (unknown)
    @     0x1000046daffc std::vector<>::erase()
    @     0x1000046d95f0 caffe::DevicePair::compute()
    @     0x1000046e0530 caffe::P2PSync<>::run()
    @         0x1022b8ec caffe::GradientBasedSolverTest<>::RunLeastSquaresSolver()
    @         0x10234154 caffe::GradientBasedSolverTest<>::TestLeastSquaresUpdate()
    @         0x10234560 caffe::RMSPropSolverTest_TestRMSPropLeastSquaresUpdateWithWeightDecay_Test<>::TestBody()
    @         0x105323a8 testing::internal::HandleExceptionsInMethodIfSupported<>()
    @         0x10524940 testing::Test::Run()
    @         0x10524a7c testing::TestInfo::Run()
    @         0x10524c64 testing::TestCase::Run()
    @         0x10529100 testing::internal::UnitTestImpl::RunAllTests()
    @         0x105294a0 testing::UnitTest::Run()
    @         0x1005bc58 main
    @     0x100004d44d00 (unknown)
    @     0x100004d44ef8 (unknown)
    @                0x0 (unknown)
make: *** [runtest] Segmentation fault (core dumped)


I tried running "test_all.testbin" under GDB and got the output below:

Program received signal SIGSEGV, Segmentation fault.

__memcpy_ppc () at ../sysdeps/powerpc/powerpc64/memcpy.S:364

364     ../sysdeps/powerpc/powerpc64/memcpy.S: No such file or directory.


I could not find the path given above. When I searched on Google, most hits indicated that this path belongs to the glibc source tree. On Ubuntu 14.04 the library is packaged as libc6 rather than glibc. I searched the system for this path but did not find anything. I also searched for memcpy and found many hits in the caffe scripts directory.

I found one file, "cpp_lint.py", which lists alternatives to the memcpy functions, as shown below:

caffe_alt_function_list = (
    ('memset', ['caffe_set', 'caffe_memset']),
    ('cudaMemset', ['caffe_gpu_set', 'caffe_gpu_memset']),
    ('memcpy', ['caffe_copy', 'caffe_memcpy']),
    ('cudaMemcpy', ['caffe_copy', 'caffe_gpu_memcpy']),
    )

Can we use the alternative function "caffe_gpu_memcpy" from this list? I am asking about this function in particular because I am using 4 Tesla K80 GPUs.

Any input from the community will be greatly appreciated.
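
For context, cpp_lint.py is a style checker, so this table most likely drives a lint warning that recommends the caffe_* wrappers in Caffe source code. It probably does not select a memcpy implementation at run time. The sketch below shows how such a check could use the table. The helper name suggest_alternatives is hypothetical, and this is an illustration under that assumption, not the actual cpp_lint.py logic.

```python
# Table copied from the post above (found in cpp_lint.py).
caffe_alt_function_list = (
    ('memset', ['caffe_set', 'caffe_memset']),
    ('cudaMemset', ['caffe_gpu_set', 'caffe_gpu_memset']),
    ('memcpy', ['caffe_copy', 'caffe_memcpy']),
    ('cudaMemcpy', ['caffe_copy', 'caffe_gpu_memcpy']),
)


def suggest_alternatives(line):
    """Hypothetical helper: list (function, alternatives) for raw calls in a source line."""
    hits = []
    for function, alternatives in caffe_alt_function_list:
        ix = line.find(function + '(')
        if ix < 0:
            continue
        # Skip matches that are the tail of a longer identifier or a member
        # call, e.g. "caffe_memcpy(" or "obj.memcpy(".
        if ix > 0 and (line[ix - 1].isalnum() or line[ix - 1] in '_.>'):
            continue
        hits.append((function, alternatives))
    return hits


if __name__ == '__main__':
    for source_line in ('memcpy(dst, src, n);', 'caffe_memcpy(n, src, dst);'):
        print(source_line, '->', suggest_alternatives(source_line))
```

A check like this only inspects Caffe's own source lines. The memmove in the crash is called from inside std::vector<>::erase(), so swapping in caffe_gpu_memcpy at a Caffe call site would not obviously change that code path.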

Thanks in advance

Anup Halarnkar

Rudra H

Jan 11, 2016, 12:45:14 AM
to Caffe Users
Hi,

I collected a backtrace with gdb. Here are the results.

According to the lshw command on Linux, the GPU memories are allocated on the PCI bus at addresses starting with 0x3xxx xxxx xxxx.
The addresses at which the functions are called are listed below on the left-hand side:
0x00003fffb35019f0 -> caffe::P2PSync<float>::run,
0x00003fffb34fb2b0 -> caffe::DevicePair::compute,
0x00003fffb34fccbc -> std::vector<int, std::allocator<int> >::erase
0x00003fffb30a1068 -> __GI_memmove (dest=0x153ea198, src=<optimized out>, len=<optimized out>)

In the last function, __GI_memmove, I suspect that the destination address above is an offset within the GPU memory range. For instance, the final computed address could be 0x00003fff00000000 + 0x153ea198 = 0x00003fff153ea198.
However, I am unable to relate the address prefix 0x3fff to any of the GPU cards on the PCI bus.
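
One way to test this guess is to compare the addresses against the process memory map, which can be read from /proc/<pid>/maps or printed with gdb's "info proc mappings". The sketch below redoes the addition and looks each address up in a map. The sample map lines and the helper names parse_maps and find_region are made up for illustration; real ranges have to come from the crashing process, and the lshw ranges could be checked the same way.

```python
def parse_maps(text):
    """Parse /proc/<pid>/maps style lines into (start, end, name) tuples."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        start_s, end_s = fields[0].split('-')
        name = fields[5] if len(fields) > 5 else ''
        regions.append((int(start_s, 16), int(end_s, 16), name))
    return regions


def find_region(addr, regions):
    """Return the name of the region containing addr, or None if unmapped."""
    for start, end, name in regions:
        if start <= addr < end:
            return name
    return None


# Hypothetical sample, NOT taken from the real process.
SAMPLE_MAPS = """\
10000000-10a00000 r-xp 00000000 08:02 1234 /hypothetical/test_all.testbin
15000000-16000000 rw-p 00000000 00:00 0 [heap]
3fffb3000000-3fffb3200000 r-xp 00000000 08:02 5678 /hypothetical/libc.so.6
"""

if __name__ == '__main__':
    base, offset = 0x00003fff00000000, 0x153ea198
    guessed = base + offset
    print(hex(guessed))  # 0x3fff153ea198
    regions = parse_maps(SAMPLE_MAPS)
    for addr in (offset, guessed, 0x00003fffb30a1068):
        print(hex(addr), '->', find_region(addr, regions))
```

If the real map shows dest=0x153ea198 inside an ordinary region such as the heap, the pointer is a plain host address and no GPU base needs to be added to it.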

The full gdb backtrace is pasted below for your reference.

(gdb) bt
#0  __memcpy_ppc () at ../sysdeps/powerpc/powerpc64/memcpy.S:364
#1  0x00003fffb30a1068 in __GI_memmove (dest=0x153ea198, src=<optimized out>, len=<optimized out>) at ../sysdeps/powerpc/memmove.c:54
#2  0x00003fffb34fccbc in std::vector<int, std::allocator<int> >::erase(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) ()
   from /root/anup/caffe/.build_release/test/../lib/libcaffe.so
#3  0x00003fffb34fb2b0 in caffe::DevicePair::compute(std::vector<int, std::allocator<int> >, std::vector<caffe::DevicePair, std::allocator<caffe::DevicePair> >*) ()
   from /root/anup/caffe/.build_release/test/../lib/libcaffe.so
#4  0x00003fffb35019f0 in caffe::P2PSync<float>::run(std::vector<int, std::allocator<int> > const&) ()
   from /root/anup/caffe/.build_release/test/../lib/libcaffe.so
#5  0x0000000010235074 in caffe::GradientBasedSolverTest<caffe::GPUDevice<float> >::RunLeastSquaresSolver(float, float, float, int, int, int, bool, char const*) ()
#6  0x0000000010247414 in caffe::GradientBasedSolverTest<caffe::GPUDevice<float> >::TestLeastSquaresUpdate(float, float, float, int) ()
#7  0x00000000102488bc in caffe::SGDSolverTest_TestLeastSquaresUpdate_Test<caffe::GPUDevice<float> >::TestBody() ()
#8  0x000000001053ce68 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ()
#9  0x000000001052f400 in testing::Test::Run() ()
#10 0x000000001052f53c in testing::TestInfo::Run() ()
#11 0x000000001052f724 in testing::TestCase::Run() ()
#12 0x0000000010533bc0 in testing::internal::UnitTestImpl::RunAllTests() ()
#13 0x0000000010533f60 in testing::UnitTest::Run() ()
#14 0x000000001005c038 in main ()


Any input will be greatly appreciated.

Thanks in advance,
Anup Halarnkar
lshw.txt
gdb_r_test_all.testbin.txt