Hi,
I'm trying to compile the Caffe opencl branch on Ubuntu 18.04 (kernel 4.17.2). The GPU is a Vega 56 running amdgpu-pro 18.20.606296.
So I do:
cd build
cmake ..
make -j16
make runtest
Many tests pass, but eventually one crashes. Which test it is depends on the order in which runtest shuffles them.
For instance:
[ RUN ] BlobMathTest/1.TestSumOfSquares
F0627 06:10:22.237754 20654 ocl_device_kernel.cpp:67] Check failed: error == 0 (-52 vs. 0) CL_INVALID_KERNEL_ARGS (libdnn_dot)
*** Check failure stack trace: ***
@ 0x7efdcb6710cd google::LogMessage::Fail()
@ 0x7efdcb672f33 google::LogMessage::SendToLog()
@ 0x7efdcb670c28 google::LogMessage::Flush()
@ 0x7efdcb673999 google::LogMessageFatal::~LogMessageFatal()
@ 0x7efdcc5ea0f4 caffe::OclDeviceKernel::Execute()
@ 0x7efdccad2d9e caffe::LibDNNBlas<>::dot()
@ 0x7efdcc5700ed caffe::Device::dot_half()
@ 0x7efdcc5f688e caffe::OclDevice::dot_half()
@ 0x7efdcc561bed caffe::Device::dot<>()
@ 0x7efdcc6cdaf6 caffe::Blob<>::sumsq_data()
@ 0x55fbfe4e30d1 (unknown)
@ 0x55fbfea223da (unknown)
@ 0x55fbfea1a83a (unknown)
@ 0x55fbfea1a91c (unknown)
@ 0x55fbfea1aa84 (unknown)
@ 0x55fbfea1b1e4 (unknown)
@ 0x55fbfea1b327 (unknown)
@ 0x55fbfe44847a (unknown)
@ 0x7efdc91d8b97 __libc_start_main
@ 0x55fbfe452d7a (unknown)
Aborted (core dumped)
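Because the crashing test changes with the shuffle order, running each test in its own process can show which tests fail on their own. The following is a minimal sketch, not part of Caffe. It assumes the CMake build produces a Google Test binary (assumed here to be test/test.testbin under the build directory) that accepts the standard --gtest_list_tests and --gtest_filter flags. The helper names parse_gtest_list and run_each are hypothetical.

```python
import subprocess
import sys


def parse_gtest_list(output):
    """Turn `--gtest_list_tests` output into full names like 'Suite/1.Test'."""
    names = []
    suite = ""
    for raw in output.splitlines():
        # Drop trailing comments such as '# TypeParam = ...' or '# GetParam() = ...'.
        line = raw.split("#", 1)[0].rstrip()
        if not line.strip():
            continue
        if line[0].isspace():
            # Indented line: a test belonging to the most recent suite.
            if suite:
                names.append(suite + line.strip())
        elif line.endswith("."):
            # Unindented line ending in '.': a suite name; other output is ignored.
            suite = line
    return names


def run_each(binary):
    """Run every test in a separate process; return (name, exit code) for failures."""
    listing = subprocess.run([binary, "--gtest_list_tests"],
                             stdout=subprocess.PIPE, universal_newlines=True,
                             check=True).stdout
    failures = []
    for name in parse_gtest_list(listing):
        result = subprocess.run([binary, "--gtest_filter=" + name],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            # A negative code means the process was killed by a signal (-6 is SIGABRT).
            failures.append((name, result.returncode))
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    for test_name, code in run_each(sys.argv[1]):
        print("{}: exit code {}".format(test_name, code))
```

Usage, from the build directory: `python3 run_each.py test/test.testbin`. Running every test separately is slow. If BlobMathTest/1.TestSumOfSquares passes in isolation, that would suggest that state left behind by an earlier test plays a role.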
Build configuration:
-- ******************* Caffe Configuration Summary *******************
-- General:
-- Version : 1.0.0
-- Git : unknown
-- System : Linux
-- C++ compiler : /usr/bin/c++
-- Release CXX flags : -O3 -DNDEBUG -fPIC -Wall -std=c++11 -DCMAKE_BUILD -march=native -Wno-sign-compare -Wno-uninitialized
-- Debug CXX flags : -g -fPIC -Wall -std=c++11 -DCMAKE_BUILD -march=native -Wno-sign-compare -Wno-uninitialized
-- Build type : Release
--
-- BUILD_SHARED_LIBS : ON
-- BUILD_python : ON
-- BUILD_matlab : OFF
-- BUILD_docs : ON
-- CPU_ONLY : OFF
-- USE_OPENCV : ON
-- USE_FFT : OFF
-- USE_LEVELDB : ON
-- USE_LMDB : ON
-- USE_NCCL : OFF
-- ALLOW_LMDB_NOLOCK : OFF
-- USE_HDF5 : ON
--
-- Dependencies:
-- BLAS : Yes (Atlas)
-- Boost : Yes (ver. 1.65)
-- glog : Yes
-- gflags : Yes
-- protobuf : Yes (ver. 3.0.0)
-- lmdb : Yes (ver. 0.9.21)
-- LevelDB : Yes (ver. 1.20)
-- Snappy : Yes (ver. ..)
-- OpenCV : Yes (ver. 3.2.0)
-- CUDA : No
--
-- Python:
-- Interpreter : /usr/bin/python3 (ver. 3.6.5)
-- Libraries : /usr/lib/x86_64-linux-gnu/libpython3.6m.so (ver 3.6.5)
-- NumPy : /usr/lib/python3/dist-packages/numpy/core/include (ver 1.13.3)
--
-- Documentaion:
-- Doxygen : /usr/bin/doxygen (1.8.13)
-- config_file : /home/walter/Documents/caffe-opencl/.Doxyfile
--
-- Install:
-- Install path : /home/walter/Documents/caffe-opencl/build/install
--
-- Configuring done
clinfo says:
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.1 AMD-APP (2639.3)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: Radeon RX Vega
Device Topology: PCI[ B#40, D#0, F#0 ]
Max compute units: 56
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 1590Mhz
Address bits: 64
Max memory allocation: 4244635648
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 2048
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 8573157376
Constant buffer size: 4244635648
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Max pipe arguments: 0
Max pipe active reservations: 0
Max pipe packet size: 0
Max global variable size: 0
Max global variable preferred total size: 0
Max read/write image args: 0
Max on device events: 0
Queue on device max size: 0
Max on device queues: 0
Queue on device preferred size: 0
SVM capabilities:
Coarse grain buffer: No
Fine grain buffer: No
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: No
Profiling : No
Platform ID: 0x7f99b742ba70
Name: gfx900
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 1.2
Driver version: 2639.3 (PAL,HSAIL)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (2639.3)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event