asynchronous/streaming transfer

Patrick Kirsch

Feb 16, 2010, 3:37:18 AM
to CuPP
Hey,

I was looking into runtime.h and saw only the synchronous CUDA
library calls. I searched for their asynchronous counterparts
(asynchronous memory transfers, including streams) but found none.
Has anyone thought of implementing them, or is there a reason, which I
did not recognize in the documentation or the diploma thesis, why this
cannot be implemented?
(package: cupp_v0.1.4_rc.tar.gz)

Thanks,
Patrick

Jens Breitbart

Feb 17, 2010, 4:57:46 AM
to CuPP
Hi,

Yes, it is true, they are currently not used (actually, they were not
available at the time I started developing CuPP). The current
implementation relies on blocking memory copies, because the copies
are done only when data located in global memory is accessed by
the CPU. In that situation, asynchronous memory transfers are of no
use. They could be used for some sort of prefetching, say something
like

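kernel (input, output);   // launch the kernel; output is written on the device
output.prefetch();        // hypothetical: start an asynchronous copy back to the host
// do some calculations that do not touch output
write_to_file (output);   // the host read waits only if the copy is unfinished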

But this is -- as said -- currently not possible.

-Jens
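
To make the prefetch idea above concrete, here is a minimal sketch in plain C++. It is not CuPP code: the class name prefetch_vector and its members are hypothetical, device memory is simulated with a std::vector, and std::async stands in for what a real version would presumably do with cudaMemcpyAsync on a stream. It only shows the control flow: a host read does a blocking copy unless a prefetch was started earlier, in which case it waits for that copy instead.

```cpp
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CuPP-style container; not part of CuPP.
// "Device memory" is simulated with an ordinary std::vector so the sketch
// runs without a GPU.
class prefetch_vector {
public:
    explicit prefetch_vector(std::vector<int> device_data)
        : device_(std::move(device_data)) {}

    // Start copying the device data to the host in the background.
    void prefetch() {
        if (host_valid_ || pending_.valid()) return;  // nothing to do
        pending_ = std::async(std::launch::async,
                              [this] { return device_; });  // simulated transfer
    }

    // Host-side read: waits for a pending prefetch if there is one,
    // otherwise falls back to a lazy blocking copy on first access.
    int at(std::size_t i) {
        sync_to_host();
        return host_.at(i);
    }

    bool prefetch_pending() const { return pending_.valid(); }

private:
    void sync_to_host() {
        if (host_valid_) return;
        if (pending_.valid()) {
            host_ = pending_.get();  // blocks only until the async copy ends
        } else {
            host_ = device_;         // blocking copy, as in the current design
        }
        host_valid_ = true;
    }

    std::vector<int> device_;
    std::vector<int> host_;
    bool host_valid_ = false;
    std::future<std::vector<int>> pending_;  // declared last: destroyed first
};
```

Calling prefetch() right after the kernel and at() only when the result is needed gives the background copy time to finish while the CPU does unrelated work.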

Patrick Kirsch

Feb 17, 2010, 6:26:25 AM
to cu...@googlegroups.com
Jens Breitbart wrote:

> yes, it is true, they are currently not used (actually they were not
> available at the time I started developing CuPP). The way it is
> currently implemented relies on blocking memory copies, as the memory
> copies are done only when data located in global memory is accessed by
> the CPU. In that situation, there is no use for any asynchronous
> memory transfers.
I thought the low-level CUDA driver would implicitly perform the
necessary synchronization (if not every block of the requested memory
range has been copied yet).
If I were to add asynchronous behaviour to CuPP (at least for my own
use), do you have any hints for me regarding caveats in CuPP?

>
> But this is -- as said -- currently not possible.
>
> -Jens
Patrick

Jens Breitbart

Feb 17, 2010, 7:28:13 AM
to CuPP

On Feb 17, 12:26 pm, Patrick Kirsch <p-kir...@gmx.de> wrote:
> Jens Breitbart wrote:
> > yes, it is true, they are currently not used (actually they were not
> > available at the time I started developing CuPP). The way it is
> > currently implemented relies on blocking memory copies, as the memory
> > copies are done only when data located in global memory is accessed by
> > the CPU. In that situation, there is no use for any asynchronous
> > memory transfers.
>
> I thought the low-level CUDA driver would implicitly perform the
> necessary synchronization (if not every block of the requested memory
> range has been copied yet).

I haven't looked at this part of the documentation for a while. Are you
talking about global memory of the host counterpart? I was talking about
normal host memory, meaning that data is transferred back to host
memory as soon as the host accesses an element of, e.g., a
vector. I am a little puzzled how my read on the CPU side should
block.

> If I were to add asynchronous behaviour to CuPP (at least for my own
> use), do you have any hints for me regarding caveats in CuPP?

Not in particular. I guess the best way is to design your own data
structure. Take the class sample as a starting point and, from there,
try to implement a data structure that uses asynchronous memory
transfers via the transform() and dirty() functions. Let me know what
kind of problems you encounter, or if you need a callback function that
is called at another time. I may be able to help you, or at least point
you to the part where the needed functionality should be added.

-Jens
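
As a rough illustration of that suggestion, a data structure built around the two hooks might look like the sketch below. Everything here is an assumption made for illustration: the class async_buffer, the helper fake_kernel, and in particular the signatures of transform() and dirty() are simplified guesses and are not taken from the CuPP headers. Device memory is again simulated with a std::vector, and std::async stands in for an asynchronous CUDA copy. The idea is that dirty() starts the download as soon as the kernel may have written to the data, so a later host read only waits if that copy is still running.

```cpp
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Hypothetical data structure; the hook names mirror the transform() and
// dirty() functions mentioned above, but their signatures are assumed.
class async_buffer {
public:
    using device_type = std::vector<float>*;  // stand-in for a device handle

    explicit async_buffer(std::vector<float> host) : host_(std::move(host)) {}

    // Assumed to run before a kernel launch: make the device copy current.
    device_type transform() {
        wait();                   // finish any download still in flight
        if (!device_valid_) {
            device_ = host_;      // simulated host-to-device upload
            device_valid_ = true;
        }
        return &device_;
    }

    // Assumed to run after a kernel that may have modified the data:
    // start the device-to-host copy now instead of on the first host read.
    void dirty() {
        wait();
        download_ = std::async(std::launch::async,
                               [this] { host_ = device_; });
    }

    // Host access blocks only if the download has not finished yet.
    float read(std::size_t i) {
        wait();
        return host_.at(i);
    }

private:
    void wait() {
        if (download_.valid()) download_.get();
    }

    std::vector<float> host_;
    std::vector<float> device_;
    bool device_valid_ = false;
    std::future<void> download_;  // declared last: destroyed first
};

// Simulated kernel: doubles every element through the device handle.
inline void fake_kernel(async_buffer::device_type d) {
    for (float& x : *d) x *= 2.0f;
}
```

A caller would pass the result of transform() to the kernel, call dirty() afterwards, and use read() whenever the host needs a value.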
