Hi,
I'm trying to use reikna on a Tinker Board with a Mali-T760 GPU.
I started with this very simple code:
import numpy
import reikna.cluda as cluda
from reikna.fft import FFT
api = cluda.ocl_api()
for platform in api.get_platforms():
    for device in platform.get_devices():
        if device.type == api.cl.device_type.GPU:
            gpu_device = device
thr = api.Thread(gpu_device)
arr = numpy.random.normal(size=2048).astype(numpy.complex64)
fft = FFT(arr)
cfft = fft.compile(thr)
arr_dev = thr.to_device(arr)
res_dev = thr.array(arr.shape, numpy.complex64)
cfft(res_dev, arr_dev)
result = res_dev.get()
reference = numpy.fft.fft(arr)
print('Error:',numpy.linalg.norm(result - reference) / numpy.linalg.norm(reference))
Compiling the FFT for the Mali GPU fails with the following error:
ERROR:root:Failed to compile:
Traceback (most recent call last):
File "reikna-fft-simple.py", line 27, in <module>
cfft = fft.compile(thr)
File "/home/linaro/.local/lib/python3.5/site-packages/reikna/core/computation.py", line 207, in compile
self._tr_tree, translator, thread, fast_math, compiler_options, keep).finalize()
File "/home/linaro/.local/lib/python3.5/site-packages/reikna/core/computation.py", line 192, in _get_plan
return self._build_plan(plan_factory, thread.device_params, *args)
File "/home/linaro/.local/lib/python3.5/site-packages/reikna/fft/fft.py", line 581, in _build_plan
plan_factory, device_params, local_kernel_limit, output, input_, inverse)
File "/home/linaro/.local/lib/python3.5/site-packages/reikna/fft/fft.py", line 553, in _build_limited_plan
global_size=gsize, local_size=lsize, render_kwds=kwds)
File "/home/linaro/.local/lib/python3.5/site-packages/reikna/core/computation.py", line 473, in kernel_call
keep=self._keep)
File "/home/linaro/.local/lib/python3.5/site-packages/reikna/cluda/api.py", line 535, in compile_static
constant_arrays=constant_arrays, keep=keep)
File "/home/linaro/.local/lib/python3.5/site-packages/reikna/cluda/api.py", line 755, in __init__
constant_arrays=constant_arrays, keep=keep)
File "/home/linaro/.local/lib/python3.5/site-packages/reikna/cluda/api.py", line 624, in __init__
self.source, fast_math=fast_math, compiler_options=compiler_options, keep=keep)
File "/home/linaro/.local/lib/python3.5/site-packages/reikna/cluda/api.py", line 473, in _create_program
src, fast_math=fast_math, compiler_options=compiler_options, keep=keep)
File "/home/linaro/.local/lib/python3.5/site-packages/reikna/cluda/ocl.py", line 145, in _compile
return cl.Program(self._context, src).build(options=options, cache_dir=temp_dir)
File "/home/linaro/.local/lib/python3.5/site-packages/pyopencl/__init__.py", line 510, in build
options_bytes=options_bytes, source=self._source)
File "/home/linaro/.local/lib/python3.5/site-packages/pyopencl/__init__.py", line 554, in _build_and_catch_errors
raise err
pyopencl._cl.RuntimeError: clBuildProgram failed: BUILD_PROGRAM_FAILURE - clBuildProgram failed: BUILD_PROGRAM_FAILURE - clBuildProgram failed: BUILD_PROGRAM_FAILURE
Build on <pyopencl.Device 'Mali-T760' on 'ARM Platform' at 0x-4abe06c8>:
<source>:878:5: error: casting to void is not allowed
VIRTUAL_SKIP_THREADS;
^
<source>:148:30: note: expanded from here
#define VIRTUAL_SKIP_THREADS MARK_VIRTUAL_FUNCTIONS_AS_USED; if(virtual_skip_local_threads() || virtual_skip_groups() || virtual_skip_global_threads()) return
^
<source>:143:46: note: expanded from here
#define MARK_VIRTUAL_FUNCTIONS_AS_USED (void)(virtual_num_groups(0)); (void)(virtual_global_flat_id()); (void)(virtual_global_flat_size())
^
<source>:878:5: error: casting to void is not allowed
<source>:148:30: note: expanded from here
#define VIRTUAL_SKIP_THREADS MARK_VIRTUAL_FUNCTIONS_AS_USED; if(virtual_skip_local_threads() || virtual_skip_groups() || virtual_skip_global_threads()) return
^
<source>:143:77: note: expanded from here
#define MARK_VIRTUAL_FUNCTIONS_AS_USED (void)(virtual_num_groups(0)); (void)(virtual_global_flat_id()); (void)(virtual_global_flat_size())
^
<source>:878:5: error: casting to void is not allowed
<source>:148:30: note: expanded from here
#define VIRTUAL_SKIP_THREADS MARK_VIRTUAL_FUNCTIONS_AS_USED; if(virtual_skip_local_threads() || virtual_skip_groups() || virtual_skip_global_threads()) return
^
<source>:143:111: note: expanded from here
#define MARK_VIRTUAL_FUNCTIONS_AS_USED (void)(virtual_num_groups(0)); (void)(virtual_global_flat_id()); (void)(virtual_global_flat_size())
^
error: Compiler frontend failed (error code 59)
(options: -I /home/linaro/.local/lib/python3.5/site-packages/pyopencl/cl)
Here is the clinfo output for my board:
linaro@tinkerboard:~/fft_gpu$ clinfo
Number of platforms 1
Platform Name ARM Platform
Platform Vendor ARM
Platform Version OpenCL 1.2 v1.r9p0-05rel0-git(f980191).e4ba9e4c6ff8005348d0332aae160089
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_gl_sharing cl_khr_icd cl_khr_egl_event cl_khr_egl_image cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory
Platform Extensions function suffix ARM
Platform Name ARM Platform
Number of devices 1
Device Name Mali-T760
Device Vendor ARM
Device Vendor ID 0x7500001
Device Version OpenCL 1.2 v1.r9p0-05rel0-git(f980191).e4ba9e4c6ff8005348d0332aae160089
Driver Version 1.2
Device OpenCL C Version OpenCL C 1.2 v1.r9p0-05rel0-git(f980191).e4ba9e4c6ff8005348d0332aae160089
Device Type GPU
Device Profile FULL_PROFILE
Max compute units 4
Max clock frequency 99MHz
Device Partition (core)
Max number of sub-devices 0
Supported partition types None
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Preferred work group size multiple 4
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 2 / 2
half 8 / 8 (cl_khr_fp16)
float 4 / 4
double 2 / 2 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Error Correction support No
Max memory allocation 527471616 (503MiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type Read/Write
Global Memory cache size <printDeviceInfo:89: get CL_DEVICE_GLOBAL_MEM_CACHE_SIZE : error -30>
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 65536 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 65536x65536 pixels
Max 3D image size 65536x65536x65536 pixels
Max number of read image args 128
Max number of write image args 8
Local memory type Global
Local memory size 32768 (32KiB)
Max constant buffer size 65536 (64KiB)
Max number of constant args 8
Max size of kernel argument 1024
Queue properties
Out-of-order execution Yes
Profiling Yes
Prefer user sync for interop No
Profiling timer resolution 1000ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
printf() buffer size 1048576 (1024KiB)
Built-in kernels
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_gl_sharing cl_khr_icd cl_khr_egl_event cl_khr_egl_image cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) ARM Platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [ARM]
clCreateContext(NULL, ...) [default] Success [ARM]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name ARM Platform
Device Name Mali-T760
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name ARM Platform
Device Name Mali-T760
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.11
ICD loader Profile OpenCL 2.1
Can someone suggest how to start troubleshooting this?
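One possible first step would be to check whether the Mali compiler rejects a `(void)` cast on its own, outside of reikna. The sketch below is a hypothetical probe, not part of reikna: the kernel, the `helper` function and the names `kernel_source`, `try_build` and `run_probe` are made up for illustration. It assumes that `pyopencl.create_some_context(interactive=False)` picks the Mali device, which seems likely since clinfo lists it as the only device.

```python
# Hypothetical probe, not part of reikna: builds two tiny kernels that differ
# only in whether a function result is discarded through a (void) cast.
KERNEL_TEMPLATE = """
int helper(int x) { return x + 1; }

__kernel void probe(__global int *out)
{
    %s
    out[get_global_id(0)] = 1;
}
"""

# Statement variants to drop into the kernel body.
VARIANTS = {
    "void cast": "(void)(helper(0));",  # same pattern as MARK_VIRTUAL_FUNCTIONS_AS_USED
    "plain call": "helper(0);",         # same call without the cast
}


def kernel_source(statement):
    """Return the OpenCL C source with the given statement inserted."""
    return KERNEL_TEMPLATE % statement


def try_build(source):
    """Try to build the source; return (ok, message)."""
    import pyopencl as cl  # imported lazily so the sources can be inspected without OpenCL

    # Assumption: the non-interactive default context is the Mali GPU.
    ctx = cl.create_some_context(interactive=False)
    try:
        cl.Program(ctx, source).build()
        return True, ""
    except cl.Error as exc:
        return False, str(exc)


def run_probe():
    """Build every variant and print whether the compiler accepted it."""
    for name, statement in VARIANTS.items():
        ok, message = try_build(kernel_source(statement))
        print(name, "->", "builds" if ok else "fails")
        if not ok:
            print(message)
```

Calling `run_probe()` on the board would show whether the cast alone triggers the error. If only the "void cast" variant fails, the problem would appear to lie in how the driver's compiler handles that cast rather than in the FFT code itself.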
Thanks,
Igor