Hi all,
I have a question regarding the GPU implementation. I am not following a matrix-free implementation, but looking at step-64 the documentation gives this example:
unsigned int size = 10;
LinearAlgebra::ReadWriteVector<double> rw_vector(size);
...do something with the rw_vector...
// Move the data to the device:
LinearAlgebra::CUDAWrappers::Vector<double> vector_dev(size);
vector_dev.import(rw_vector, VectorOperation::insert);
...do some computations on the device...
// Move the data back to the host:
rw_vector.import(vector_dev, VectorOperation::insert);
It is not clear to me when I need to use __device__. Does it belong in the function that moves data to and from the device, or in the functions that do the computations on the device after the data has been moved?
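To make my confusion concrete, here is a minimal standalone CUDA sketch of my current understanding (plain CUDA, not deal.II; all names here are my own invention). My guess is that __device__ marks functions that run on the GPU and are called from other GPU code, while the host-side code that moves the data is ordinary C++:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Runs on the GPU and is called from the kernel below:
// this is where I assume __device__ is needed.
__device__ double square(const double x) { return x * x; }

// Kernel launched from the host: marked __global__, not __device__.
__global__ void square_kernel(double *data, const unsigned int size)
{
  const unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < size)
    data[i] = square(data[i]);
}

int main()
{
  const unsigned int size = 10;
  double host[size];
  for (unsigned int i = 0; i < size; ++i)
    host[i] = i;

  // Host code that moves data to the device: plain C++, no __device__.
  double *dev = nullptr;
  cudaMalloc(&dev, size * sizeof(double));
  cudaMemcpy(dev, host, size * sizeof(double), cudaMemcpyHostToDevice);

  square_kernel<<<1, size>>>(dev, size);

  // Move the data back to the host, again from plain host code.
  cudaMemcpy(host, dev, size * sizeof(double), cudaMemcpyDeviceToHost);
  cudaFree(dev);

  std::printf("host[3] = %g\n", host[3]);
  return 0;
}
```

Is this split between __device__, __global__, and plain host code the right mental model for the deal.II wrappers as well?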
Is there any example of a GPU implementation without matrix-free that shows how the data is moved and managed?
Thank you