Kernels not compiling with ViennaCL for OpenCL Intel build on CentOS 7

Krzysztof Nienartowicz

Feb 16, 2018, 10:20:24 AM
to Caffe Users
Hello,
We have a cluster of nodes with Intel GT4e/GT3e GPUs for astronomical research. So far we have been using our home-built classification frameworks, but we would really like to put the tens of GPUs on our cluster to use.

We have had a hard time getting anything to work, though.

We mostly followed the steps for the OpenCL (Intel) build from the BVLC repo, as the Intel fork itself did not compile no matter what we tried.

We installed MKL, the Intel OpenCL drivers (v5), ViennaCL and ISAAC, seemingly successfully. We deployed the patched 4.7 kernel for Intel, but it was panicking, so we bumped the kernel to 4.15 and the GPU devices are now present:


clinfo

finds two devices (GPU and Xeon) as expected:

Number of platforms                               1
  Platform Name                                   Intel(R) OpenCL
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 2.0
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir
  Platform Extensions function suffix             INTEL

  Platform Name                                   Intel(R) OpenCL
Number of devices                                 2
  Device Name                                     Intel(R) HD Graphics
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 2.0
  Driver Version                                  r5.0.63503
  Device OpenCL C Version                         OpenCL C 2.0
  Device Type                                     GPU
  Device Available                                Yes
  Device Profile                                  FULL_PROFILE
  Max compute units                               72
  Max clock frequency                             0MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     by <unknown> (0x7F6D00000000)
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Compiler Available                              Yes
  Linker Available                                Yes
  Preferred work group size multiple              32
  Sub-group sizes (Intel)                         8x16x32
  Preferred / native vector sizes
    char                                                16 / 16
    short                                                8 / 8
    int                                                  4 / 4
    long                                                 1 / 1
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                1 / 1
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
...
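
Since the build log further down complains about an undefined `atom_cmpxchg`, it may be worth checking which atomics extensions are advertised. The following is only a small sketch: it checks the Platform Extensions line pasted above for a given name. The name `cl_khr_int64_base_atomics` is our assumption about what a 64-bit `atom_cmpxchg` would need, and the device-level extension list is cut off by the "..." above, so this covers the platform line only.

```python
# Platform Extensions value copied from the clinfo output above.
PLATFORM_EXTENSIONS = (
    "cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_depth_images "
    "cl_khr_fp64 cl_khr_global_int32_base_atomics "
    "cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer "
    "cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics "
    "cl_khr_spir"
)


def has_extension(extension_string, name):
    """Return True if `name` is one of the whitespace-separated extensions."""
    return name in extension_string.split()


# cl_khr_int64_base_atomics is an assumption on our side, see the note above.
for ext in ("cl_khr_fp64", "cl_khr_int64_base_atomics"):
    print(ext, has_extension(PLATFORM_EXTENSIONS, ext))
```

On the line pasted above, only the 32-bit atomics extensions are listed; the full device extension list from `clinfo` would need the same check.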

We configure it with:

cmake .. -DUSE_GREENTEA=ON -DUSE_CUDA=OFF -DUSE_INTEL_SPATIAL=ON -DBUILD_docs=0 -DUSE_ISAAC=ON -DISAAC_HOME=$ISAAC_HOME/ -DISAAC_INCLUDE_DIR=$ISSAC_HOME/include -DViennaCL_INCLUDE_DIR=$HOME/local/include -DBLAS=mkl -DOPENCL_LIBRARIES=/opt/intel/opencl/libOpenCL.so -DOPENCL_INCLUDE_DIRS=/opt/intel/opencl/include -DMKL_INCLUDE_DIR=$MKL_ROOT/compilers_and_libraries/linux/mkl/include -DMKL_RTL_LIBRARY=$MKL_ROOT/compilers_and_libraries/linux/mkl
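
The command above depends on several shell variables, and it references both `$ISAAC_HOME` and `$ISSAC_HOME`. A variable that is not set expands to an empty string in the shell, so a misspelled name would silently turn `$ISSAC_HOME/include` into `/include`. A minimal sketch of a check for this, with the helper names `referenced_variables` and `unset_variables` being hypothetical and the argument string abbreviated to the parts that use variables:

```python
import os
import re

# Arguments from the cmake command above that reference shell variables.
CMAKE_ARGS = (
    "-DISAAC_HOME=$ISAAC_HOME/ -DISAAC_INCLUDE_DIR=$ISSAC_HOME/include "
    "-DViennaCL_INCLUDE_DIR=$HOME/local/include "
    "-DMKL_INCLUDE_DIR=$MKL_ROOT/compilers_and_libraries/linux/mkl/include "
    "-DMKL_RTL_LIBRARY=$MKL_ROOT/compilers_and_libraries/linux/mkl"
)


def referenced_variables(command):
    """Return the distinct $NAME references in `command`, in order of first use."""
    seen = []
    for name in re.findall(r"\$([A-Za-z_][A-Za-z0-9_]*)", command):
        if name not in seen:
            seen.append(name)
    return seen


def unset_variables(command, environ=os.environ):
    """Return the referenced variables that are missing or empty in `environ`."""
    return [v for v in referenced_variables(command) if not environ.get(v)]


if __name__ == "__main__":
    print("referenced:", referenced_variables(CMAKE_ARGS))
    print("unset or empty:", unset_variables(CMAKE_ARGS))
```

Run in the same shell session as the cmake call, after exporting the variables, this lists any name that would expand to nothing.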

Then it compiled with some OpenCL deprecation warnings, but in the end we see OpenCL is detected:

./tools/caffe device_query -gpu all

I0216 16:02:53.756531  2328 common.cpp:438] Total devices: 2
I0216 16:02:53.757238  2328 common.cpp:439] CUDA devices: 0
I0216 16:02:53.757244  2328 common.cpp:440] OpenCL devices: 2
I0216 16:02:53.757247  2328 common.cpp:464] Device id:                     0
I0216 16:02:53.757252  2328 common.cpp:466] Device backend:                OpenCL
I0216 16:02:53.757266  2328 common.cpp:468] Backend details:               Intel(R) Corporation: OpenCL 2.0
I0216 16:02:53.757274  2328 common.cpp:470] Device vendor:                 Intel(R) Corporation
I0216 16:02:53.757279  2328 common.cpp:472] Name:                          Intel(R) HD Graphics
I0216 16:02:53.757283  2328 common.cpp:474] Total global memory:           53925871616
I0216 16:02:53.757288  2328 common.cpp:464] Device id:                     1
I0216 16:02:53.757292  2328 common.cpp:466] Device backend:                OpenCL
I0216 16:02:53.757299  2328 common.cpp:468] Backend details:               Intel(R) Corporation: OpenCL 2.0
I0216 16:02:53.757308  2328 common.cpp:470] Device vendor:                 Intel(R) Corporation
I0216 16:02:53.757313  2328 common.cpp:472] Name:                          Intel(R) Xeon(R) CPU E3-1585L v5 @ 3.00GHz
I0216 16:02:53.757431  2328 common.cpp:474] Total global memory:           67417825280



But when we try to run any test, it fails, for example:

./tools/caffe time -model ../models/bvlc_alexnet/deploy.prototxt -gpu 0

It seems to fail at the beginning, before the kernels are compiled, hence the later "Kernel not found" errors:

I0216 16:08:38.541231  2390 caffe.cpp:391] Use GPU with device ID 0
I0216 16:08:38.554819  2390 device.cpp:62] CL_DEVICE_HOST_UNIFIED_MEMORY: 1
Build Status = -2 ( Err = -11 )
Log: 1:37:26: warning: OpenCL extension 'cl_khr_fp64' is core feature or supported optional core feature - ignoring
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
                         ^
1:59:26: warning: OpenCL extension 'cl_khr_global_int32_base_atomics' is core feature or supported optional core feature - ignoring
#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
                         ^
fcl build 1 succeeded.

error: undefined reference to `_Z12atom_cmpxchgPVU3AS1mmm()'

error: backend compiler failed build.

Sources: #define ENABLE_DOUBLE_SUPPORT
#ifndef __OPENCL_VERSION__
#define __kernel

Then we can see in the log:
#endif  // DOUBLE_SUPPORT_AVAILABLE

I0216 16:09:13.134887  2390 common.cpp:542] OpenCL platform: Intel(R) Corporation: OpenCL 2.0  does not work correctly.
I0216 16:09:13.141176  2390 net.cpp:57] Initializing net from parameters:
name: "AlexNet"

and finally:


I0216 16:09:13.645362  2390 caffe.cpp:406] Performing Forward
ViennaCL: FATAL ERROR: Could not find kernel 'im2col_float' from program ''
Number of kernels in program: 0
terminate called after throwing an instance of 'viennacl::ocl::kernel_not_found'
  what():  Kernel not found
*** Aborted at 1518793753 (unix time) try "date -d @1518793753" if you are using GNU date ***
PC: @     0x7f8f137bf1f7 __GI_raise
*** SIGABRT (@0x145200000956) received by PID 2390 (TID 0x7f8f187daa40) from PID 2390; stack trace: ***
    @     0x7f8f16ef45e0 (unknown)
    @     0x7f8f137bf1f7 __GI_raise
    @     0x7f8f137c08e8 __GI_abort
    @     0x7f8f140c5ac5 (unknown)
    @     0x7f8f140c3a36 (unknown)
    @     0x7f8f140c3a63 (unknown)
    @     0x7f8f140c3c83 (unknown)
    @     0x7f8f1804d1d2 caffe::greentea_im2col_gpu<>()
    @     0x7f8f180ecebc caffe::BaseConvolutionLayer<>::greentea_conv_im2col_gpu()
    @     0x7f8f180ed04e caffe::BaseConvolutionLayer<>::forward_gpu_gemm()
    @     0x7f8f18113e17 caffe::ConvolutionLayerSpatial<>::Forward_gpu()
    @     0x7f8f181f3d3c caffe::Net<>::ForwardFromTo()
    @     0x7f8f181f4137 caffe::Net<>::Forward()
    @           0x41305a time()
    @           0x40fcfc main
    @     0x7f8f137abc05 __libc_start_main
    @           0x410639 (unknown)
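
For what it is worth, the undefined symbol `_Z12atom_cmpxchgPVU3AS1mmm` from the build log can be decoded by hand. The sketch below is a tiny decoder, not a full demangler: it assumes Itanium C++ ABI style mangling and the address-space numbering commonly used by Clang for OpenCL (AS1 = global); the `decode` helper and both lookup tables are our own and only cover what this one symbol needs.

```python
# Builtin type codes from the Itanium C++ ABI mangling, mapped to OpenCL C
# names (OpenCL C defines `long` as 64 bits, so 'm' becomes ulong).
BUILTINS = {"i": "int", "j": "uint", "l": "long", "m": "ulong",
            "f": "float", "d": "double"}
# Address-space numbers as commonly used by Clang for OpenCL (assumption).
ADDRESS_SPACES = {"AS1": "__global", "AS2": "__constant", "AS3": "__local"}


def read_length_prefixed(s, pos):
    """Read '<decimal length><text>' starting at pos; return (text, new_pos)."""
    end = pos
    while s[end].isdigit():
        end += 1
    length = int(s[pos:end])
    return s[end:end + length], end + length


def decode(symbol):
    """Decode a simple mangled function name into 'name(arg, ...)'."""
    if not symbol.startswith("_Z"):
        raise ValueError("not a mangled name")
    name, pos = read_length_prefixed(symbol, 2)
    args = []
    while pos < len(symbol):
        pointer, qualifiers = False, []
        while symbol[pos] in "PVKU":
            code = symbol[pos]
            pos += 1
            if code == "P":
                pointer = True
            elif code == "V":
                qualifiers.append("volatile")
            elif code == "K":
                qualifiers.append("const")
            else:  # 'U' introduces a length-prefixed vendor qualifier
                vendor, pos = read_length_prefixed(symbol, pos)
                qualifiers.append(ADDRESS_SPACES.get(vendor, vendor))
        base = BUILTINS[symbol[pos]]
        pos += 1
        args.append(" ".join(qualifiers + [base]) + ("*" if pointer else ""))
    return f"{name}({', '.join(args)})"


print(decode("_Z12atom_cmpxchgPVU3AS1mmm"))
# prints: atom_cmpxchg(volatile __global ulong*, ulong, ulong)
```

If this reading is right, the backend cannot resolve the 64-bit (`ulong`) overload of `atom_cmpxchg`. In OpenCL C the 64-bit `atom_*` functions belong to the `cl_khr_int64_base_atomics` extension, which does not appear in the platform extension list pasted earlier. We have not confirmed which kernel requests it.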


I attach the full log from ./tools/caffe time -model ../models/bvlc_alexnet/deploy.prototxt -gpu 0

Could you please point us to where the problem could be coming from?
Any ideas?

Thanks,
Chris.




alexnetViennaCLProblem.txt