Problem with slow GPU data transfers with OpenCL on Intel

Guillaume Schmid

Oct 5, 2022, 7:53:07 AM

to ArrayFire Users

Hello!

I am using ArrayFire in real-time video processing software for biomedical research.

I store multiple images on the GPU upon acquisition (3-channel 8-bit RGB or 1-channel 32-bit float), and I perform operations such as taking differences, applying a LUT, and so on. I also perform a histogram calculation.

I noticed that, even on the weakest machines I tested, the image processing functions run very fast, taking 1 or 2 milliseconds. But as soon as I transfer the memory to the host, the time goes up to 30 ms.

This happens for the histogram, where the calculation itself takes 1 ms, but transferring the result to host memory to use it takes 30 ms.

I use the simple array.host(pre_allocated_buffer) function and it is quite slow.

I ran some tests using alloc/free_pinned, but the results were no better.

At some point, I need to retrieve the images from GPU memory to display them, and these operations are too slow to do in real time (which is a pity, since the image processing itself is very fast).

I tested the code on multiple machines. My development computer is an AMD + NVIDIA machine, but the target machines are ThinkPads with Intel graphics.

ArrayFire uses the OpenCL backend on all computers.

It is version 3.8.2, using oneAPI and the Intel OpenCL ICD on Ubuntu 20.04.

I don't use custom kernels, only the functions provided by ArrayFire for image processing.

I use the Rust library.

ArrayFire configuration: ArrayFire v3.8.2 (OpenCL, 64-bit Linux, build default)
[0] INTEL: Intel(R) Gen9 HD Graphics NEO, 6130 MB

Is there any trick to make those transfers faster?

Yours,

Guillaume.

John Melonakos

Nov 7, 2022, 2:30:43 PM

to ArrayFire Users

Due to ArrayFire's lazy evaluation, you will need to benchmark individual parts by adding "af::eval(output);" followed by "af::sync()" to get accurate timings of each code block.

If you compile ArrayFire in debug mode, af::sync() is added after each call except for JIT operations. For JIT operations you will still need to add af::eval() and af::sync() explicitly.
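
To make the timing pitfall concrete, here is a minimal toy model of lazy evaluation in plain Rust, using only the standard library. It is a sketch under stated assumptions, not ArrayFire code: `LazyArray`, `lazy_histogram`, and `timed` are hypothetical names invented for this illustration, and the "device" work is an ordinary CPU loop. The point it shows is that when work is only queued, its cost is charged to whichever later call forces it, which in a naive benchmark is the host copy.

```rust
use std::cell::Cell;
use std::time::{Duration, Instant};

/// Toy stand-in for a lazily evaluated device array.
/// Hypothetical type for illustration only; not ArrayFire's implementation.
struct LazyArray<'a> {
    pending: Option<Box<dyn FnOnce() -> Vec<u32> + 'a>>,
    data: Vec<u32>,
}

impl<'a> LazyArray<'a> {
    /// Queue `work` without running it.
    fn new(work: impl FnOnce() -> Vec<u32> + 'a) -> Self {
        LazyArray { pending: Some(Box::new(work)), data: Vec::new() }
    }

    /// Run any queued work now (plays the role of eval followed by sync).
    fn eval(&mut self) {
        if let Some(work) = self.pending.take() {
            self.data = work();
        }
    }

    /// Copy the result into a preallocated host buffer.
    /// Queued work must run first, so its cost lands here if eval was skipped.
    fn host(&mut self, out: &mut [u32]) {
        self.eval();
        out.copy_from_slice(&self.data);
    }
}

/// Queue a 256-bin histogram of 8-bit pixels.
/// `runs` counts how many times the work has actually executed.
fn lazy_histogram<'a>(pixels: &'a [u8], runs: &'a Cell<u32>) -> LazyArray<'a> {
    LazyArray::new(move || {
        runs.set(runs.get() + 1);
        let mut bins = vec![0u32; 256];
        for &p in pixels {
            bins[p as usize] += 1;
        }
        bins
    })
}

/// Run `f` and return its result together with the elapsed wall-clock time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

fn main() {
    let pixels: Vec<u8> = (0..4_000_000u32).map(|i| (i % 256) as u8).collect();
    let mut out = vec![0u32; 256];

    // Naive timing: the "compute" stage only queues the work.
    let runs = Cell::new(0);
    let (mut hist, t_compute) = timed(|| lazy_histogram(&pixels, &runs));
    assert_eq!(runs.get(), 0); // nothing has executed yet
    let (_, t_host) = timed(|| hist.host(&mut out));
    println!("naive:     compute {:?}, host {:?}", t_compute, t_host);

    // Corrected timing: force evaluation inside the compute stage.
    let runs = Cell::new(0);
    let (mut hist, t_compute) = timed(|| {
        let mut h = lazy_histogram(&pixels, &runs);
        h.eval();
        h
    });
    assert_eq!(runs.get(), 1); // the work ran before the timer stopped
    let (_, t_host) = timed(|| hist.host(&mut out));
    println!("corrected: compute {:?}, host {:?}", t_compute, t_host);
}
```

In the naive run the histogram cost is reported under `host`; in the corrected run it is reported under `compute`, and what remains under `host` is the copy alone. Applying the same idea to real ArrayFire code means forcing evaluation and synchronizing (the Rust counterparts of `af::eval` and `af::sync`) before stopping the timer on each stage. If the transfer still dominates after that correction, the copy itself is the likely bottleneck.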
