Many "make runtest" seg faults after compiling opencl caffe

wog...@gmail.com

Jun 27, 2018, 5:39:12 PM

to Caffe Users

Hi,

I'm trying to compile the Caffe opencl branch on Ubuntu 18.04 with kernel 4.17.2. The GPU is a Vega 56 with the amdgpu-pro 18.20.606296 driver.

So I do:

cd build
cmake ..
make -j16
make runtest

Many tests succeed, but eventually one segfaults. Which one fails depends on the order in which runtest randomizes them.

For instance:



[ RUN      ] BlobMathTest/1.TestSumOfSquares
F0627 06:10:22.237754 20654 ocl_device_kernel.cpp:67] Check failed: error == 0 (-52 vs. 0)  CL_INVALID_KERNEL_ARGS (libdnn_dot)
*** Check failure stack trace: ***
    @     0x7efdcb6710cd  google::LogMessage::Fail()
    @     0x7efdcb672f33  google::LogMessage::SendToLog()
    @     0x7efdcb670c28  google::LogMessage::Flush()
    @     0x7efdcb673999  google::LogMessageFatal::~LogMessageFatal()
    @     0x7efdcc5ea0f4  caffe::OclDeviceKernel::Execute()
    @     0x7efdccad2d9e  caffe::LibDNNBlas<>::dot()
    @     0x7efdcc5700ed  caffe::Device::dot_half()
    @     0x7efdcc5f688e  caffe::OclDevice::dot_half()
    @     0x7efdcc561bed  caffe::Device::dot<>()
    @     0x7efdcc6cdaf6  caffe::Blob<>::sumsq_data()
    @     0x55fbfe4e30d1  (unknown)
    @     0x55fbfea223da  (unknown)
    @     0x55fbfea1a83a  (unknown)
    @     0x55fbfea1a91c  (unknown)
    @     0x55fbfea1aa84  (unknown)
    @     0x55fbfea1b1e4  (unknown)
    @     0x55fbfea1b327  (unknown)
    @     0x55fbfe44847a  (unknown)
    @     0x7efdc91d8b97  __libc_start_main
    @     0x55fbfe452d7a  (unknown)
Aborted (core dumped)
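
For anyone comparing failures across runs, here is a minimal Python sketch that pulls the file, line, and OpenCL error code out of a glog check-failure line like the one above. The function name `parse_check_failure` and the regular expression are my own illustration, not part of Caffe or glog, and the table covers only a few codes copied from `CL/cl.h`. Per the OpenCL specification, `clEnqueueNDRangeKernel` returns `CL_INVALID_KERNEL_ARGS` (-52) when kernel argument values have not been specified.

```python
import re

# A small subset of OpenCL error codes as defined in CL/cl.h.
CL_ERRORS = {
    0: "CL_SUCCESS",
    -48: "CL_INVALID_KERNEL",
    -49: "CL_INVALID_ARG_INDEX",
    -50: "CL_INVALID_ARG_VALUE",
    -51: "CL_INVALID_ARG_SIZE",
    -52: "CL_INVALID_KERNEL_ARGS",
    -53: "CL_INVALID_WORK_DIMENSION",
    -54: "CL_INVALID_WORK_GROUP_SIZE",
    -55: "CL_INVALID_WORK_ITEM_SIZE",
}

# Assumed shape of a glog fatal line, modelled on the log above:
# F<mmdd> <time> <thread id> <file>:<line>] Check failed: <expr> (<a> vs. <b>)  <message>
CHECK_RE = re.compile(
    r"^F\d{4} [\d:.]+\s+\d+ (?P<file>[^:\s]+):(?P<line>\d+)\] "
    r"Check failed: .*?\((?P<code>-?\d+) vs\. -?\d+\)\s*(?P<msg>.*)$"
)


def parse_check_failure(log_line):
    """Return a dict describing the failed check, or None if the line does not match."""
    match = CHECK_RE.match(log_line.strip())
    if match is None:
        return None
    code = int(match.group("code"))
    return {
        "file": match.group("file"),
        "line": int(match.group("line")),
        "code": code,
        "name": CL_ERRORS.get(code, "unknown"),
        "message": match.group("msg"),
    }


if __name__ == "__main__":
    sample = (
        "F0627 06:10:22.237754 20654 ocl_device_kernel.cpp:67] "
        "Check failed: error == 0 (-52 vs. 0)  CL_INVALID_KERNEL_ARGS (libdnn_dot)"
    )
    print(parse_check_failure(sample))
```

Running this over the output of several `make runtest` attempts would show whether every crash is the same -52 check in `ocl_device_kernel.cpp` or whether different kernels fail in different ways.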

Build configuration:


-- ******************* Caffe Configuration Summary *******************
-- General:
--   Version           :   1.0.0
--   Git               :   unknown
--   System            :   Linux
--   C++ compiler      :   /usr/bin/c++
--   Release CXX flags :   -O3 -DNDEBUG -fPIC -Wall -std=c++11 -DCMAKE_BUILD -march=native -Wno-sign-compare -Wno-uninitialized
--   Debug CXX flags   :   -g -fPIC -Wall -std=c++11 -DCMAKE_BUILD -march=native -Wno-sign-compare -Wno-uninitialized
--   Build type        :   Release
-- 
--   BUILD_SHARED_LIBS :   ON
--   BUILD_python      :   ON
--   BUILD_matlab      :   OFF
--   BUILD_docs        :   ON
--   CPU_ONLY          :   OFF
--   USE_OPENCV        :   ON
--   USE_FFT           :   OFF
--   USE_LEVELDB       :   ON
--   USE_LMDB          :   ON
--   USE_NCCL          :   OFF
--   ALLOW_LMDB_NOLOCK :   OFF
--   USE_HDF5          :   ON
-- 
-- Dependencies:
--   BLAS              :   Yes (Atlas)
--   Boost             :   Yes (ver. 1.65)
--   glog              :   Yes
--   gflags            :   Yes
--   protobuf          :   Yes (ver. 3.0.0)
--   lmdb              :   Yes (ver. 0.9.21)
--   LevelDB           :   Yes (ver. 1.20)
--   Snappy            :   Yes (ver. ..)
--   OpenCV            :   Yes (ver. 3.2.0)
--   CUDA              :   No
-- 
-- Python:
--   Interpreter       :   /usr/bin/python3 (ver. 3.6.5)
--   Libraries         :   /usr/lib/x86_64-linux-gnu/libpython3.6m.so (ver 3.6.5)
--   NumPy             :   /usr/lib/python3/dist-packages/numpy/core/include (ver 1.13.3)
-- 
-- Documentaion:
--   Doxygen           :   /usr/bin/doxygen (1.8.13)
--   config_file       :   /home/walter/Documents/caffe-opencl/.Doxyfile
-- 
-- Install:
--   Install path      :   /home/walter/Documents/caffe-opencl/build/install
-- 
-- Configuring done

clinfo says:

Number of platforms:                 1
  Platform Profile:                 FULL_PROFILE
  Platform Version:                 OpenCL 2.1 AMD-APP (2639.3)
  Platform Name:                 AMD Accelerated Parallel Processing
  Platform Vendor:                 Advanced Micro Devices, Inc.
  Platform Extensions:                 cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 


  Platform Name:                 AMD Accelerated Parallel Processing
Number of devices:                 1
  Device Type:                     CL_DEVICE_TYPE_GPU
  Vendor ID:                     1002h
  Board name:                     Radeon RX Vega
  Device Topology:                 PCI[ B#40, D#0, F#0 ]
  Max compute units:                 56
  Max work items dimensions:             3
    Max work items[0]:                 1024
    Max work items[1]:                 1024
    Max work items[2]:                 1024
  Max work group size:                 256
  Preferred vector width char:             4
  Preferred vector width short:             2
  Preferred vector width int:             1
  Preferred vector width long:             1
  Preferred vector width float:             1
  Preferred vector width double:         1
  Native vector width char:             4
  Native vector width short:             2
  Native vector width int:             1
  Native vector width long:             1
  Native vector width float:             1
  Native vector width double:             1
  Max clock frequency:                 1590Mhz
  Address bits:                     64
  Max memory allocation:             4244635648
  Image support:                 Yes
  Max number of images read arguments:         128
  Max number of images write arguments:         8
  Max image 2D width:                 16384
  Max image 2D height:                 16384
  Max image 3D width:                 2048
  Max image 3D height:                 2048
  Max image 3D depth:                 2048
  Max samplers within kernel:             16
  Max size of kernel argument:             1024
  Alignment (bits) of base address:         2048
  Minimum alignment (bytes) for any datatype:     128
  Single precision floating point capability
    Denorms:                     No
    Quiet NaNs:                     Yes
    Round to nearest even:             Yes
    Round to zero:                 Yes
    Round to +ve and infinity:             Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                     Read/Write
  Cache line size:                 64
  Cache size:                     16384
  Global memory size:                 8573157376
  Constant buffer size:                 4244635648
  Max number of constant args:             8
  Local memory type:                 Scratchpad
  Local memory size:                 32768
  Max pipe arguments:                 0
  Max pipe active reservations:             0
  Max pipe packet size:                 0
  Max global variable size:             0
  Max global variable preferred total size:     0
  Max read/write image args:             0
  Max on device events:                 0
  Queue on device max size:             0
  Max on device queues:                 0
  Queue on device preferred size:         0
  SVM capabilities:                 
    Coarse grain buffer:             No
    Fine grain buffer:                 No
    Fine grain system:                 No
    Atomics:                     No
  Preferred platform atomic alignment:         0
  Preferred global atomic alignment:         0
  Preferred local atomic alignment:         0
  Kernel Preferred work group size multiple:     64
  Error correction support:             0
  Unified memory for Host and Device:         0
  Profiling timer resolution:             1
  Device endianess:                 Little
  Available:                     Yes
  Compiler available:                 Yes
  Execution capabilities:                 
    Execute OpenCL kernels:             Yes
    Execute native function:             No
  Queue on Host properties:                 
    Out-of-Order:                 No
    Profiling :                     Yes
  Queue on Device properties:                 
    Out-of-Order:                 No
    Profiling :                     No
  Platform ID:                     0x7f99b742ba70
  Name:                         gfx900
  Vendor:                     Advanced Micro Devices, Inc.
  Device OpenCL C version:             OpenCL C 1.2 
  Driver version:                 2639.3 (PAL,HSAIL)
  Profile:                     FULL_PROFILE
  Version:                     OpenCL 1.2 AMD-APP (2639.3)
  Extensions:                     cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
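
Since the stack trace goes through `dot_half`, it may be worth confirming that the device advertises half-precision support. Below is a small Python sketch that reads saved clinfo output and reports which extensions are missing from the device-level `Extensions:` line. The helper names and the default list of required extensions are my own assumptions, not something Caffe documents. In the output above both `cl_khr_fp16` and `cl_khr_fp64` are present, so this check alone does not explain the failure.

```python
def device_extensions(clinfo_text):
    """Return the set of names on the first device-level 'Extensions:' line."""
    for line in clinfo_text.splitlines():
        stripped = line.strip()
        # The platform-level line begins with "Platform Extensions:", so testing
        # the stripped line for the prefix "Extensions:" skips it.
        if stripped.startswith("Extensions:"):
            return set(stripped[len("Extensions:"):].split())
    return set()


def missing_extensions(clinfo_text, required=("cl_khr_fp16", "cl_khr_fp64")):
    """Return the required extensions that the device does not list."""
    available = device_extensions(clinfo_text)
    return [ext for ext in required if ext not in available]


if __name__ == "__main__":
    # Assumes clinfo output was saved beforehand, e.g. with: clinfo > clinfo.txt
    with open("clinfo.txt") as handle:
        print(missing_extensions(handle.read()))
```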
