I was looking into runtime.h and saw ("only") the synchronous CUDA
library calls, so I searched for the asynchronous counterparts for
memory transfers (including streams).
Did you or anyone else think of implementing them, or is there a reason,
which I did not spot in the documentation/diploma thesis, why this
cannot be implemented?
(package: cupp_v0.1.4_rc.tar.gz)
Thanks,
Patrick
Yes, it is true, they are currently not used (actually they were not
available at the time I started developing CuPP). The current
implementation relies on blocking memory copies, as the memory
copies are done only when data located in global memory is accessed by
the CPU. In that situation, there is no use for asynchronous
memory transfers. Async memory transfers could be used for some
sort of prefetching, say something like
kernel (input, output);
output.prefetch();
// do some calculations without output
write_to_file (output);
But this is -- as said -- currently not possible.
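
To make the prefetch idea above concrete, here is a minimal host-only sketch. It is not CuPP code: `device_buffer`, `prefetchable_output` and `run_prefetch_demo` are hypothetical names, and `std::async` merely stands in for an asynchronous device-to-host copy so that the example runs without a GPU. With the CUDA runtime, the copy would presumably be issued with `cudaMemcpyAsync` on a stream into page-locked host memory and waited on with `cudaStreamSynchronize`.

```cpp
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a device-side buffer; in real CUDA code this
// would be memory obtained from cudaMalloc.
struct device_buffer {
    std::vector<float> data;
};

// Hypothetical host-side handle whose contents are copied back lazily.
class prefetchable_output {
public:
    explicit prefetchable_output(const device_buffer& dev) : dev_(&dev) {}

    // Start the device-to-host copy in the background and return at once.
    // std::async plays the role an asynchronous copy on a stream would play.
    void prefetch() {
        if (!pending_.valid() && !host_valid_) {
            pending_ = std::async(std::launch::async,
                                  [this] { return dev_->data; });
        }
    }

    // Access from the CPU: block only if the data has not arrived yet.
    const std::vector<float>& get() {
        if (!host_valid_) {
            prefetch();               // no prefetch issued yet: start the copy now
            host_ = pending_.get();   // wait for the copy, like a stream synchronize
            host_valid_ = true;
        }
        return host_;
    }

private:
    const device_buffer* dev_;
    std::future<std::vector<float>> pending_;
    std::vector<float> host_;
    bool host_valid_ = false;
};

// Mirrors the snippet above: kernel, prefetch, unrelated work, then use.
float run_prefetch_demo() {
    device_buffer dev{{1.0f, 2.0f, 3.0f, 4.0f}};  // pretend a kernel wrote this
    prefetchable_output output(dev);
    output.prefetch();

    float unrelated = 0.0f;                        // work that does not touch output
    for (int i = 0; i < 1000; ++i) unrelated += 0.001f;
    (void)unrelated;

    const std::vector<float>& result = output.get();
    return std::accumulate(result.begin(), result.end(), 0.0f);
}
```

If `prefetch()` is never called, `get()` simply behaves like the blocking copy described above, so existing code would keep working. Some toolchains need `-pthread` when linking code that uses `std::async`.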
-Jens
On Feb 17, 12:26 pm, Patrick Kirsch <p-kir...@gmx.de> wrote:
> Jens Breitbart wrote:
> > yes, it is true, they are currently not used (actually they were not
> > available at the time I started developing CuPP). The way it is
> > currently implemented relies on blocking memory copies, as the memory
> > copies are done only when data located in global memory is accessed by
> > the CPU. In that situation, there is no use for any asynchronous
> > memory transfers.
>
> I thought the low-level CUDA driver would implicitly do the necessary
> synchronisation (if not every block of the requested memory range had
> been copied).
I haven't looked at this part of the documentation for a while. Are you
talking about global memory of the host counterpart? I was talking about
normal host memory, meaning that data will be transferred back to
host memory as soon as the host accesses an element of, e.g., a
vector. I am a little puzzled how my read on the CPU side should
block.
> Say, if I were to add (at least for me) asynchronous behaviour to CuPP, do
> you have any hints for me (regarding caveats in CuPP)?
Not in particular. I guess the best way is to design your own data
structure. Take the class sample as a starting point and try to
implement a DS that uses async. memory transfers from there using the
transform() and dirty() functions. Let me know what kind of problems you
encounter or if you need a callback function that is called at
another time. I may be able to help you or at least point you to the
correct part, where you should add the needed functionality.
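
As a rough illustration of such a data structure, the following host-only sketch starts the copy back to the host as soon as the device data is reported as modified, instead of waiting for the first CPU access. Everything here is an assumption made for illustration: `async_vector`, `fake_kernel` and `run_async_vector_demo` are hypothetical, the member functions only borrow the names transform() and dirty() and do not reproduce CuPP's real signatures, and the device copy is simulated with a second `std::vector` plus `std::async`.

```cpp
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Hypothetical host-side container; NOT part of CuPP. The device copy is
// simulated with a second std::vector so the sketch runs without a GPU.
class async_vector {
public:
    explicit async_vector(std::vector<int> init) : host_(std::move(init)) {}

    // Called before a kernel launch: hand out the device-side data.
    // (Loosely modelled on the role of transform(); the signature is invented.)
    std::vector<int>& transform() {
        wait();            // finish any copy that is still in flight
        device_ = host_;   // host -> device upload, done synchronously here
        return device_;
    }

    // Called after a kernel launch that may have modified the device data.
    // Rather than copying on first access, start the download right away.
    void dirty() {
        wait();
        pending_ = std::async(std::launch::async, [this] { return device_; });
    }

    // CPU-side read: blocks only if the background copy is still running.
    int operator[](std::size_t i) {
        wait();
        return host_[i];
    }

private:
    void wait() {
        if (pending_.valid()) host_ = pending_.get();
    }

    std::vector<int> host_;
    std::vector<int> device_;
    std::future<std::vector<int>> pending_;  // declared last, destroyed first
};

// Stand-in for a kernel: doubles every element of the "device" data.
void fake_kernel(std::vector<int>& device_data) {
    for (int& x : device_data) x *= 2;
}

int run_async_vector_demo() {
    async_vector v(std::vector<int>{1, 2, 3});
    fake_kernel(v.transform());  // upload, then "launch"
    v.dirty();                   // download starts here, without blocking
    return v[2];                 // waits for the download only if necessary
}
```

The one design point worth noting is that every accessor funnels through `wait()`, so the CPU never observes a half-finished transfer. A real implementation would also have to decide which stream each copy is issued on and when that stream is synchronized.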
-Jens