Hello!
I am using ArrayFire in real-time video processing software for biomedical research.
I store multiple images on the GPU upon acquisition (3-channel 8-bit RGB or 1-channel 32-bit float), and I run operations on them such as difference, LUT application, and so on. I also perform a histogram calculation.
I noticed, even on the weakest machines I tested, that the image processing functions run very fast, in 1 or 2 milliseconds. But as soon as I transfer the memory to the host, the time goes up to 30 ms.
It happens for the histogram, where the calculation itself takes 1ms, but transferring the result to the host memory to use it takes 30ms.
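For a sense of scale, here is a rough back-of-the-envelope check of what 30 ms would mean as a pure copy cost. It assumes a 256-bin histogram of 32-bit counts (the bin count is an assumption for illustration, it is not stated above), and `effective_mb_per_s` is a hypothetical helper name.

```rust
/// Effective throughput in MB/s implied by moving `bytes` in `millis` milliseconds.
fn effective_mb_per_s(bytes: usize, millis: f64) -> f64 {
    (bytes as f64 / 1.0e6) / (millis / 1.0e3)
}

fn main() {
    // Assumption for illustration: 256 bins, each a u32 count (1024 bytes total).
    let histogram_bytes = 256 * std::mem::size_of::<u32>();
    let rate = effective_mb_per_s(histogram_bytes, 30.0);
    println!("{} bytes in 30 ms is about {:.3} MB/s", histogram_bytes, rate);
}
```

Under that assumption the implied rate is a small fraction of 1 MB/s, far below what any device-to-host copy should manage, which suggests the 30 ms may not be spent on the copy itself.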
I use the simple array.host(pre_allocated_buffer) function and it is quite slow.
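One thing that may be worth checking with these measurements: ArrayFire evaluates lazily and launches kernels asynchronously, so a timer around the processing calls alone may only capture the cost of queuing the work, with the wait for the actual computation ending up inside the blocking `host()` call. Below is a minimal sketch of timing the two phases separately. `time_phases` is a hypothetical helper, and the closures in `main` are stand-ins so the sketch runs without a GPU. With ArrayFire, the first closure would presumably call `eval()` on the result and then `arrayfire::sync(device)`, and the second would be the `host()` copy.

```rust
use std::time::{Duration, Instant};

/// Runs `compute`, then `transfer`, and times each phase on its own.
/// `compute` is expected to block until the device work has really finished;
/// `transfer` is the copy of the result back to host memory.
fn time_phases<C, T>(compute: C, transfer: T) -> (Duration, Duration)
where
    C: FnOnce(),
    T: FnOnce(),
{
    let start = Instant::now();
    compute();
    let compute_time = start.elapsed();

    let start = Instant::now();
    transfer();
    let transfer_time = start.elapsed();

    (compute_time, transfer_time)
}

fn main() {
    // Stand-ins for illustration only: a sleep plays the role of the
    // forced-and-synchronized GPU work, and a slice copy plays the role
    // of the device-to-host transfer into a pre-allocated buffer.
    let device_result: Vec<u32> = (0..256).collect();
    let mut host_buffer = vec![0u32; 256];

    let (compute_time, transfer_time) = time_phases(
        || std::thread::sleep(Duration::from_millis(5)),
        || host_buffer.copy_from_slice(&device_result),
    );

    println!("compute: {:?}, transfer: {:?}", compute_time, transfer_time);
}
```

If the compute phase measured this way absorbs most of the 30 ms, the transfer itself is probably not the bottleneck.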
I ran some tests using alloc/free_pinned, but the results were no better.
At some point, I need to retrieve the images from GPU memory to display them, and these operations are too slow to do in real time (which is a pity, since the image processing itself is very fast).
I test the code on multiple machines. My development computer is AMD + NVIDIA, but the target machines are ThinkPads with Intel.
ArrayFire uses the OpenCL backend on all computers.
It is version 3.8.2, using oneAPI and the Intel OpenCL ICD on Ubuntu 20.04.
I don't use custom kernels, only the image processing functions provided by ArrayFire.
I use the Rust library.
ArrayFire configuration:
ArrayFire v3.8.2 (OpenCL, 64-bit Linux, build default)
[0] INTEL: Intel(R) Gen9 HD Graphics NEO, 6130 MB
Is there any trick to make those transfers faster?
Yours,
Guillaume.