Zero-Copy on Jetson TX1 / TX2

Shobhit Srivastava

May 19, 2017, 12:01:49 PM

to ArrayFire Users

Hi,

The Jetson TX1 / TX2 share RAM between the GPU and the CPU, which should ideally reduce data transfer time between the two. However, it also seems that this memory is uncached, which would hurt performance in some applications.

I wanted to know whether ArrayFire exploits the zero-copy feature available on the TXs or not (given its pros and cons). Or is there something more intelligent happening under the hood (probably depending on memory-access patterns)?

FYI, I am on v3.4.2.

Thanks,

Shobhit

Pradeep Garigipati

Jun 5, 2017, 2:25:49 AM

to Shobhit Srivastava, ArrayFire Users

Hello Shobhit,

At the moment (v3.4.2, or the soon-to-be-released v3.5.0), ArrayFire's af::array doesn't take advantage of zero-copy automatically. However, ArrayFire implements its own memory manager, which caches any memory allocated in the past and reuses it when a later request asks for a similar size. In appropriate scenarios, this should reduce the latency of allocating memory each time a new af::array object is created.
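
To make the caching idea concrete, here is a toy sketch in plain C++. It is not ArrayFire's actual memory manager: CachingPool and its methods are hypothetical names, it uses malloc instead of device allocations, and it only reuses buffers of exactly the same size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <vector>

// Toy caching allocator (hypothetical, for illustration only).
// Released buffers are kept in per-size free lists and handed back
// on the next request of the same size instead of being freed.
class CachingPool {
public:
    ~CachingPool() {
        // Only cached (released) buffers are freed here; buffers that
        // were acquired and never released remain the caller's problem.
        for (auto& entry : free_lists_)
            for (void* p : entry.second) std::free(p);
    }

    // Return a cached buffer of exactly `bytes` if one exists,
    // otherwise fall back to a fresh allocation.
    void* acquire(std::size_t bytes) {
        auto it = free_lists_.find(bytes);
        if (it != free_lists_.end() && !it->second.empty()) {
            void* p = it->second.back();
            it->second.pop_back();
            ++hits_;
            return p;
        }
        ++misses_;
        return std::malloc(bytes);
    }

    // Keep the buffer for later reuse rather than freeing it.
    void release(void* p, std::size_t bytes) {
        assert(p != nullptr);
        free_lists_[bytes].push_back(p);
    }

    int hits() const { return hits_; }
    int misses() const { return misses_; }

private:
    std::map<std::size_t, std::vector<void*>> free_lists_;
    int hits_ = 0;
    int misses_ = 0;
};
```

Acquiring 1024 bytes, releasing them, and acquiring 1024 bytes again returns the same pointer without touching the underlying allocator the second time, which is where the latency saving comes from.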

That said, I believe you can still use zero-copy in your application with the following steps, at least in theory. Please note that I haven't tried them myself.

1. Allocate the memory yourself using cudaHostAlloc, then obtain the matching device pointer with cudaHostGetDevicePointer.
2. Construct the af::array from that device pointer and mark the pointer as a device pointer. For example: af::array a(dim4(10, 10), device_ptr_from_cudaHostGetDevicePointer_call, afDevice). Note the last parameter, which usually defaults to afHost. You also have to call array::lock() immediately after constructing the af::array, because this memory must be released with cudaFreeHost(hostPointer) instead of cudaFree(devicePointer).
3. Use the af::array created in this manner with ArrayFire functions.
4. Towards the end of your program, call cudaFreeHost on the host pointers you allocated at the beginning of the program.

Note: DO NOT call array::unlock() to match the array::lock() we did after array creation. Calling unlock would result in calling cudaFree on the device pointer stored by ArrayFire memory manager during resource cleanup at program exit.
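
Put together, the steps might look roughly like the sketch below. Like the steps themselves, it is untested. Two details are assumptions on my part rather than anything confirmed for the TX1 / TX2: passing the cudaHostAllocMapped flag to cudaHostAlloc, and calling cudaSetDeviceFlags(cudaDeviceMapHost) before anything creates the CUDA context. The check helper is a hypothetical convenience function, and the program assumes the CUDA backend of ArrayFire.

```cpp
#include <arrayfire.h>
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

int main() {
    const int rows = 10, cols = 10;
    const std::size_t count = static_cast<std::size_t>(rows) * cols;

    // Assumption: request mapped pinned memory support before the
    // CUDA context exists (i.e. before the first ArrayFire call).
    check(cudaSetDeviceFlags(cudaDeviceMapHost), "cudaSetDeviceFlags");

    // Step 1: the user allocates mapped host memory ...
    float* host_ptr = nullptr;
    check(cudaHostAlloc(reinterpret_cast<void**>(&host_ptr),
                        count * sizeof(float), cudaHostAllocMapped),
          "cudaHostAlloc");
    for (std::size_t i = 0; i < count; ++i) host_ptr[i] = static_cast<float>(i);

    // ... and asks for the device-side alias of the same memory.
    float* dev_ptr = nullptr;
    check(cudaHostGetDevicePointer(reinterpret_cast<void**>(&dev_ptr),
                                   host_ptr, 0),
          "cudaHostGetDevicePointer");

    {
        // Step 2: wrap the device pointer (note afDevice, not afHost)
        // and lock it right away so ArrayFire never tries to free it.
        af::array a(af::dim4(rows, cols), dev_ptr, afDevice);
        a.lock();

        // Step 3: use the array with ordinary ArrayFire functions.
        float total = af::sum<float>(a);
        std::printf("sum = %f\n", total);

        // Deliberately no a.unlock() here, as per the note above.
    }

    // Step 4: release the memory through the host pointer.
    check(cudaFreeHost(host_ptr), "cudaFreeHost");
    return 0;
}
```

Since the array reads directly from the mapped host buffer, no explicit cudaMemcpy appears anywhere. Whether this is actually faster on the TX1 / TX2 will depend on the access pattern, given the caching concern raised in the original question.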

Hope it helps.

Regards,

Pradeep.

Shobhit Srivastava

Jun 7, 2017, 11:56:07 AM

to ArrayFire Users, shobh...@gmail.com

Pradeep,

Thanks for your reply and the suggested work-around. I will give it a try.

Regards,

Shobhit
