Hi,
I'm playing with Numba to write some code that runs on a GPU. I have two kernels that I want to execute on the same input data, one after the other, so I'd like to avoid copying the same input data to the device twice (which, as I understand it, is what happens now if I call two vectorized functions with target="gpu" one after the other on the same input).
Is there a way I can control this without writing CUDA-specific code? I want the code to work on the CPU as well, since I'm switching targets based on input size.
Thanks,
Claudio