What is the correct (or best) way to use ArrayFire (AF) arrays in a CUDA kernel compiled with NVRTC, and how can AF arrays be created from NVRTC data?
So far I've used AF arrays in CUDA kernels by taking the device pointer:
CUdeviceptr* cudaArray = AFarray.device<CUdeviceptr>();
However, this behaves inconsistently: it works fine in a MEX file in MATLAB, but causes illegal memory accesses in a MEX file in Octave.
For the second case, using NVRTC data in AF computations, I've simply copied the data to the host first and then created an AF array from the host copy. This is obviously very inefficient, though.