boost::copy vs queue.enqueue_read

david....@eigendynamics.com

Jul 5, 2017, 10:01:23 AM

to boost-compute

Hi,

We are comparing boost::copy with queue.enqueue_read_buffer, and I'm finding the queue.enqueue_read_buffer approach to be consistently slower for low-volume data. Is there something we are missing? Will it perform the same with higher volumes of data?

I find queue.enqueue_read_buffer the most desirable way to copy, since it allows me to pass __read_only flags. Is it possible to pass __read_only to boost::copy?

This would be the case with enqueue_read_buffer https://pastebin.com/hGaH7eyq

and this would be with boost::copy. https://pastebin.com/h8BTpbLV

There is almost a 40% difference in time, and I don't really know why. Is boost::copy using something better under the hood?

Thanks

Jakub Szuppe

Jul 5, 2017, 2:26:48 PM

to boost-compute

Hi,

For such small vectors (4 elements?) I'd guess that function-call overhead takes more time than the copying itself. I guess that by low-volume data you actually mean arrays of around 1k - 10k elements.
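
One way to check whether call overhead dominates for tiny transfers is to time many repetitions and compare the mean cost per call. The following is a minimal sketch, not the code from the pastebins: `mean_microseconds` and `copy_small_matrix` are hypothetical helpers, and the host-to-host `memcpy` is only a stand-in for the real transfer.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of Boost.Compute): runs `op` `iterations`
// times and returns the mean wall-clock time per call in microseconds.
// Returns 0.0 when `iterations` is zero.
template <class Op>
double mean_microseconds(Op op, std::size_t iterations)
{
    if (iterations == 0) {
        return 0.0;
    }
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        op();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}

// Stand-in for a tiny transfer: copies 16 floats from one host array to
// another. In a real measurement the body would issue the device transfer
// under test and wait for the queue to finish before returning.
inline void copy_small_matrix(const float (&src)[16], float (&dst)[16])
{
    std::memcpy(dst, src, sizeof(src));
}
```

Calling `mean_microseconds` once per transfer method, with a few thousand iterations each and the same element count, should give numbers that are easier to compare than a single timed call.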

1. In https://pastebin.com/h8BTpbLV you have an error; it should be:

compute::copy(a.begin(), a.begin() + 4, a_d.begin(), queue);
compute::copy(b.begin(), b.begin() + 4, b_d.begin(), queue);
compute::copy(c.begin(), c.begin() + 4, c_d.begin(), queue);

2. __read_only etc. (see https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/accessQualifiers.html) are reserved for images IIRC.

I think that using CL_MEM_READ_ONLY (boost::compute::buffer::read_only in Boost.Compute) should be enough (plus const in the kernel to indicate that it's read-only memory).

3. enqueue_read_buffer uses clEnqueueReadBuffer. boost::copy, however, takes a different path for small vectors (see https://github.com/boostorg/compute/blob/master/include/boost/compute/algorithm/detail/copy_to_host.hpp#L55): we map the buffer into host memory, copy to the host, and unmap.

david....@eigendynamics.com

Jul 6, 2017, 4:31:15 AM

to boost-compute

Thanks a lot as always!

I am using it to update the camera's point of view in a ray tracing app, so I need to pass something like a 4x4 matrix each time. I think we will use boost::compute::copy, but can you pass boost::compute::buffer::read_only to boost::compute::copy? Would you recommend anything different?

Jakub Szuppe

Jul 6, 2017, 3:10:28 PM

to boost-compute

You can try using constant memory (constant_buffer_iterator) instead of plain global memory; a 4x4 matrix should easily fit into constant memory on any GPU. Also read more about using CL_MEM_USE_HOST_PTR and/or CL_MEM_ALLOC_HOST_PTR (and, of course, always benchmark).

Kyle Lutz

Jul 6, 2017, 3:16:25 PM

to Jakub Szuppe, boost-compute

You could also try passing the 4x4 matrix as a float16/double16 kernel argument, which may be more efficient depending on the OpenCL implementation in use.
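
A float16 argument is 16 contiguous floats, so the host-side matrix has to be flattened first. The sketch below shows only that packing step; `pack_matrix` is a hypothetical helper, and the row-major order is an assumption that the kernel would have to match.

```cpp
#include <array>
#include <cstddef>

// Flattens a row-major 4x4 matrix into 16 contiguous floats, so that
// element (row, col) ends up at index row * 4 + col. An OpenCL float16
// stores its components s0..sF in the same consecutive order.
inline std::array<float, 16> pack_matrix(const float (&m)[4][4])
{
    std::array<float, 16> packed{};
    for (std::size_t row = 0; row < 4; ++row) {
        for (std::size_t col = 0; col < 4; ++col) {
            packed[row * 4 + col] = m[row][col];
        }
    }
    return packed;
}
```

The packed values could then be copied into the 16-component vector type used on the host side and set as a kernel argument by value, which avoids a separate buffer transfer for each camera update.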

-kyle

david....@eigendynamics.com

Jul 7, 2017, 6:48:37 AM

to boost-compute

Hi Jakub,

I will try the constant_buffer and the host_ptr; both will surely be useful. (The POV has to be updated each time, so HOST_PTR, but the position of the pixels relative to the POV is uploaded only once, so constant_buffer is perfect.) I might try the map/unmap approach as well, and Kyle's suggestion.

Again, thanks a lot to both!
