Transfer complex data from host to preallocated device array

Iulian Popescu

Apr 28, 2015, 12:35:30 PM
to arrayfi...@googlegroups.com
Hi,

I'm new to the ArrayFire APIs and I was trying to transfer single precision complex data from a host buffer in row major order into a 2D array already allocated on the device.
Is there a more elegant mechanism other than just issuing a cudaMemcpy call using the device and host pointers?

Thanks,
Iulian

Shehzan Mohammed

Apr 28, 2015, 12:52:08 PM
to arrayfi...@googlegroups.com
Hi

For transferring complex data from host to device, see this post.

The array constructors let you copy data from host to device without calling cudaMemcpy explicitly. The getting started page shows good examples of creating ArrayFire arrays from different types of data.

-Shehzan
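
Editor's note: the question mentions a host buffer in row-major order, which the replies do not address. ArrayFire arrays are stored in column-major order, so a row-major buffer handed straight to a constructor would be read in the wrong element order. Below is a minimal sketch of reordering on the host first. It uses std::complex<float> as a stand-in for af::cfloat (an assumption: both hold a real and an imaginary float), and the helper name rowMajorToColumnMajor is hypothetical, not part of any library.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Reorder a rows x cols matrix from row-major to column-major storage.
// Hypothetical helper for illustration; not an ArrayFire function.
std::vector<std::complex<float>> rowMajorToColumnMajor(
    const std::vector<std::complex<float>>& src,
    std::size_t rows, std::size_t cols)
{
    assert(src.size() == rows * cols);
    std::vector<std::complex<float>> dst(src.size());
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            // Row-major index is r * cols + c; column-major index is c * rows + r.
            dst[c * rows + r] = src[r * cols + c];
        }
    }
    return dst;
}
```

Another option, probably cheaper for large inputs, would be to construct the array with the two dimensions swapped and transpose it on the device, since a row-major rows x cols buffer has the same memory layout as a column-major cols x rows matrix.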

Iulian Popescu

Apr 28, 2015, 4:54:37 PM
to arrayfi...@googlegroups.com
Thank you very much! That helped.

Iulian Popescu

May 1, 2015, 3:40:51 PM
to arrayfi...@googlegroups.com
Hi Shehzan,

I just want to reiterate and make sure my understanding is correct. If you instantiate an array such as:

af::array d_myArray(dim1, dim2, c32);

It will automatically allocate a device buffer of the specified size using cudaMalloc, and when the object is destroyed the device buffer will be freed by calling cudaFree.
To transfer the memory from a host buffer into the device buffer, one can use the following statement:

d_myArray = af::array(dim1, dim2, (af::cfloat*)h_myData);

This will NOT allocate another device buffer first but it will just call cudaMemcpy to transfer the memory from the h_myData host buffer into the device buffer d_myArray.

Thanks,
Iulian





Shehzan Mohammed

May 1, 2015, 3:46:44 PM
to arrayfi...@googlegroups.com
af::array d_myArray(dim1, dim2, c32); -> Creates memory.

d_myArray = af::array(dim1, dim2, (af::cfloat*)h_myData); -> Creates memory space and copies host to device

d_myArray = af::array(dim1, dim2, (af::cfloat*)d_myData, afDevice); -> Creates memory space and copies device to device

Iulian Popescu

May 1, 2015, 4:13:40 PM
to arrayfi...@googlegroups.com
What about if I just want to copy host to device? So first I do:


af::array d_myArray(dim1, dim2, c32);

to create the memory on the device, for instance in the constructor of my computation object and then I have a run() function where I want just to copy from host to the device? How would I do that? I don't want to double allocate the memory on the device.

Thanks,
Iulian

Shehzan Mohammed

May 1, 2015, 4:15:57 PM
to arrayfi...@googlegroups.com
Just do

array myarray(dim0, dim1, (cfloat *)h_ptr, afHost);

This will allocate memory and copy. 

Iulian Popescu

May 1, 2015, 4:44:38 PM
to arrayfi...@googlegroups.com
I'm sorry if I wasn't clear again, but I need to first allocate and then in the run() function copy. I need to separate those two because I need to reuse the device buffer. I don't want the overhead of creating a device buffer every time I get new data.

Thanks,
Iulian

Shehzan Mohammed

May 1, 2015, 4:51:34 PM
to arrayfi...@googlegroups.com
1. When you pass a device pointer to the constructor and mark it as afDevice, no memcpy happens (I probably missed this in my previous answer). ArrayFire simply takes control of the device pointer.
2. ArrayFire uses a memory manager, so it won't actually free CUDA memory unless there is a need to. It simply reuses memory that has gone out of scope, so you don't have to worry about allocating memory every time. A simple example is:
for (int i = 0; i < 10; i++) {
    array a = randu(10);
    af_print(a);
}

Here cudaMalloc will only be called once. As a goes out of scope, the memory is marked as free but not deleted. The next time it is simply given to the 2nd a.
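
Editor's note: the reuse described here can be sketched with a toy caching allocator. This is only an illustration of the idea, not ArrayFire's implementation: it uses host malloc/free in place of cudaMalloc/cudaFree, matches buffers by exact size, and every name in it (CachingPool, allocationsForLoop) is hypothetical.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>
#include <vector>

// Toy caching allocator standing in for a device memory manager.
// Buffers that are acquired but never released are not tracked (sketch only).
class CachingPool {
public:
    CachingPool() = default;
    CachingPool(const CachingPool&) = delete;
    CachingPool& operator=(const CachingPool&) = delete;
    ~CachingPool() { releaseFree(); }

    // Hand out a cached buffer of this exact size, or allocate a new one.
    void* acquire(std::size_t bytes) {
        std::vector<void*>& bucket = free_[bytes];
        if (!bucket.empty()) {
            void* p = bucket.back();
            bucket.pop_back();
            return p;
        }
        ++realAllocations_;
        return std::malloc(bytes);
    }

    // Mark a buffer as free without returning it to the system.
    void release(void* p, std::size_t bytes) { free_[bytes].push_back(p); }

    // Return every cached (free) buffer to the system at once.
    void releaseFree() {
        for (auto& entry : free_)
            for (void* p : entry.second) std::free(p);
        free_.clear();
    }

    int realAllocations() const { return realAllocations_; }

    std::size_t freeBuffers() const {
        std::size_t n = 0;
        for (const auto& entry : free_) n += entry.second.size();
        return n;
    }

private:
    std::map<std::size_t, std::vector<void*>> free_;
    int realAllocations_ = 0;
};

// Mirrors the loop above: acquire and release a same-sized buffer repeatedly.
int allocationsForLoop(CachingPool& pool, int iterations, std::size_t bytes) {
    for (int i = 0; i < iterations; ++i) {
        void* a = pool.acquire(bytes);  // "array a = randu(10);"
        pool.release(a, bytes);         // "a" goes out of scope
    }
    return pool.realAllocations();
}
```

With ten iterations over a same-sized buffer, only the first acquire reaches the underlying allocator; the remaining nine are served from the free list.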

Iulian Popescu

May 1, 2015, 5:36:04 PM
to arrayfi...@googlegroups.com
1. I don't have device pointer yet, I want ArrayFire to allocate device memory for me. Let's consider the following code:

classA::classA() :
   d_myArray(dim1, dim2, c32)
{
}

void classA::run(float* h_myData)
{

  d_myArray = af::array(dim1, dim2, (af::cfloat*)h_myData);
}

So the constructor allocates a device buffer managed by d_myArray, run() just copies the memory from the host.
Now my question is: every time run() is called, does a new device buffer get allocated, the host memory get copied to it, and then copied again into the current d_myArray device buffer allocated at construction?

2. I see, but how about if one wants to have control over how much device memory gets allocated at every moment in time? I wanna be able to control how much memory gets allocated on the device and when to free it whereas it seems that's not the case, I'm at the discretion of the Memory manager.

Thanks,
Iulian

Shehzan Mohammed

May 1, 2015, 5:45:11 PM
to arrayfi...@googlegroups.com
classA::classA() :
   d_myArray(dim1, dim2, c32) // Allocates memory
{
}

void classA::run(float* h_myData)
{
  // Uses "free" memory if available, otherwise allocates, then copies host to device.
  // The old pointer is marked free and can be reused. The old array object is destructed.
  // No device-to-device copy.
  d_myArray = af::array(dim1, dim2, (af::cfloat*)h_myData);
}

So the constructor allocates a device buffer managed by d_myArray, run() just copies the memory from the host.
Now my question is: every time run() is called, does a new device buffer get allocated, the host memory get copied to it, and then copied again into the current d_myArray device buffer allocated at construction?
No. Only host to device. No device to device.

2. I see, but how about if one wants to have control over how much device memory gets allocated at every moment in time? I wanna be able to control how much memory gets allocated on the device and when to free it whereas it seems that's not the case, I'm at the discretion of the Memory manager.

You can only delete all the "free" memory at once. You cannot control which memory gets freed (except for buffers that are not marked free, i.e. are still in scope). See https://github.com/arrayfire/arrayfire/pull/539/files for API.
You can allocate any amount of memory by using the constructors.

Iulian Popescu

May 1, 2015, 7:23:08 PM
to arrayfi...@googlegroups.com
I understand. So the first call to run() will allocate another buffer on the device before marking the old one as free, and subsequent calls will reuse what is free. In that case there is no point in doing the allocation in the constructor of classA, since there will always be the overhead of allocating a device buffer on the first call to run(). Is there anything I can do in run(), other than a cudaMemcpy, so that I only copy to the pre-allocated device buffer?
My fundamental problem is that I would like to have all device buffers allocated at construction time and then, each time run() is called, only copy the new data from the host to the device. I believe there is no way to achieve that with ArrayFire because of the memory manager.

Thanks,
Iulian
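
Editor's note: the allocation pattern described in this post can be sketched with a simplified model. It assumes a pool of equally sized host buffers standing in for the device memory manager; Pool and Worker are hypothetical names, and none of this is ArrayFire code. The point it illustrates is that in `d_myArray = af::array(...)` the new array's buffer is obtained before the old one is released, so the first run() needs a second buffer and later calls alternate between the two.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

// Toy pool of equally sized buffers; a stand-in for a caching memory manager.
struct Pool {
    explicit Pool(std::size_t n) : bytes(n) {}

    std::unique_ptr<char[]> acquire() {
        if (!freeList.empty()) {
            std::unique_ptr<char[]> p = std::move(freeList.back());
            freeList.pop_back();
            return p;
        }
        ++realAllocations;  // only counted when the free list is empty
        return std::unique_ptr<char[]>(new char[bytes]);
    }

    void release(std::unique_ptr<char[]> p) { freeList.push_back(std::move(p)); }

    std::size_t bytes;
    int realAllocations = 0;
    std::vector<std::unique_ptr<char[]>> freeList;
};

// Mirrors classA: a buffer is allocated at construction and replaced in run().
class Worker {
public:
    explicit Worker(Pool& pool) : pool_(pool), buffer_(pool.acquire()) {}

    void run(const char* hostData) {
        // The replacement buffer is obtained first (the temporary array)...
        std::unique_ptr<char[]> fresh = pool_.acquire();
        // ...filled with a single copy from the host...
        std::memcpy(fresh.get(), hostData, pool_.bytes);
        // ...and only then is the old buffer handed back to the free list.
        pool_.release(std::move(buffer_));
        buffer_ = std::move(fresh);
    }

    const char* data() const { return buffer_.get(); }

private:
    Pool& pool_;
    std::unique_ptr<char[]> buffer_;
};
```

In this model the allocation count settles at two however many times run() is called: one buffer from the constructor and one from the first run().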

Shehzan Mohammed

May 1, 2015, 7:30:13 PM
to arrayfi...@googlegroups.com
That is correct.
Let us discuss this internally. I think we can make a call that would simply copy rather than have a constructor.

-Shehzan

Shehzan Mohammed

May 1, 2015, 7:32:38 PM
to arrayfi...@googlegroups.com
So one thing you could do is call the alloc functions in your constructor.

That way, it will create the memory but not arrays. So when you do your run, it will simply pick an optimal sized buffer and assign it to the array.

Iulian Popescu

May 4, 2015, 11:34:27 AM
to arrayfi...@googlegroups.com
Hi Shehzan,

I understand this solution will work, however I think it spoils the elegance of the library, which is using only the array object. I would prefer to call a copy member function to transfer memory.

Iulian

Pavan Yalamanchili

May 4, 2015, 12:44:25 PM
to Iulian Popescu, arrayfi...@googlegroups.com
Hi Lulian,

You cannot have some benefits of the array class and not the others. Ideally you would create arrays as necessary and not worry about the allocations. ArrayFire does the memory management automatically without adding overhead. There are very rare instances in which the use case you are asking for would be beneficial compared to using ArrayFire's memory manager.

Having said that, you can do something like this to pre-allocate memory and use it repeatedly.

void run(array &a, other_params)
{
    // Do stuff

    // Copy the host data into the array's existing device buffer
    cudaMemcpy(a.device<float>(), h_ptr, bytes, cudaMemcpyHostToDevice);

    // do other stuff
}

int main()
{
    //
    array a(rows, cols, f32);
    run(a, other_params);
    //
}

The downside of this is that you are losing out on portability and are being tied down to a particular backend.


Iulian Popescu

May 4, 2015, 2:25:28 PM
to arrayfi...@googlegroups.com, julian....@gmail.com
Hi Pavan,

Your last remark says it all, and that is exactly what I was trying to avoid when I first posted the question: having to use cudaMemcpy. I also understand the use cases that dictated the current design of ArrayFire, however I don't necessarily see what I'm asking for as a corner case. I think Shehzan already suggested that this can be solved in an elegant way by providing an API for just transferring from host to device, and this is something I would be looking for if I continue to use the library.

Thanks,
Iulian

P.S. Just so you know, my name is Iulian not Lulian but no worries with certain fonts an "I" looks like an "L".

Pavan Yalamanchili

May 4, 2015, 2:36:39 PM
to Iulian Popescu, arrayfi...@googlegroups.com
Hi Iulian,

Sorry about the repeated mistakes regarding your name. Shehzan is implementing the function as suggested. Once implemented you can do something like the following:

arr.write(h_ptr); // it will replace the data already present in arr

Iulian Popescu

May 4, 2015, 2:42:29 PM
to arrayfi...@googlegroups.com, julian....@gmail.com
Hi Pavan,

No worries about the misspelling. That is very good news, it is exactly what I was envisioning. Any time frame for when it would be available?

Thanks for letting me know,
Iulian

Shehzan Mohammed

May 5, 2015, 2:01:18 PM
to arrayfi...@googlegroups.com, julian....@gmail.com
This has been completed via PR #643. See test/write for usage example.

Iulian Popescu

May 6, 2015, 11:10:22 AM
to arrayfi...@googlegroups.com, julian....@gmail.com
Hi Shehzan,

Is it already in the build? I grabbed the latest build last night and I can't see any write() API in array.h, however if I grab the sources it is there. I would rather not build the library myself but use one that is already built.

Thanks,
Iulian

Shehzan Mohammed

May 6, 2015, 11:15:06 AM
to arrayfi...@googlegroups.com, julian....@gmail.com
The installers get updated after midnight.
So grab the latest one and that should have it.

-Shehzan