Here's a better example of the problem I'm having.
Compute::Accumulate is dreadfully slow. Over 5 million values, it takes about 8 seconds, versus under a second for the standard C++ accumulate.
I'm surprised Compute::Accumulate does the sum serially. That's when I thought maybe I could write my own sum algorithm.
I have a vector.
What I do is divide the vector in half (copying each half into a new vector) and add the two halves together element by element to create a third vector. For an odd-sized vector, the leftover element gets appended to the new vector.
Example
vector of 101 elements
Two new vectors of 50 elements each are added together (this is where OpenCL would come in handy) to create a new 50-element vector; add the odd element back in for 51.
repeat
25, 25 = 25 elements, + 1 odd = 26
13, 13 = 13 elements
6, 6 = 6 elements, + 1 odd = 7
3, 3 = 3 elements, + 1 odd = 4
2, 2 = 2 elements
1, 1 = 1 element
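This reaches the sum in 7 rounds for 101 elements (roughly log2(n) rounds in general), and the additions within each round are independent of each other, so I would hope it is less expensive than a serial add.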
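To make the rounds above concrete, here is a minimal host-only sketch in plain C++. It is not Boost.Compute code and nothing in it runs on an OpenCL device; the function names halving_sum and round_sizes are made up for this illustration. Instead of copying each half into a new vector, it folds the upper half onto the lower half in place, which is an assumption about how the same idea could avoid the extra copies.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Boost.Compute): sums a vector by
// repeatedly adding the upper half onto the lower half.
long long halving_sum(std::vector<long long> data) {
    if (data.empty()) return 0;
    std::size_t n = data.size();          // number of live elements
    while (n > 1) {
        std::size_t half = n / 2;
        // These additions are independent of each other, so on a device
        // each one could be handled by its own work-item.
        for (std::size_t i = 0; i < half; ++i)
            data[i] += data[i + half];
        if (n % 2 != 0) {
            // Odd size: carry the leftover element into the next round.
            data[half] = data[n - 1];
            n = half + 1;
        } else {
            n = half;
        }
    }
    return data[0];
}

// Hypothetical helper: the vector size at the start of each round,
// ending with the final size of 1.
std::vector<std::size_t> round_sizes(std::size_t n) {
    std::vector<std::size_t> sizes;
    while (n > 1) {
        sizes.push_back(n);
        n = n / 2 + n % 2;
    }
    sizes.push_back(n);
    return sizes;
}
```

For 101 elements, round_sizes gives 101, 51, 26, 13, 7, 4, 2, 1, matching the example. On a device, the inner loop would presumably become one kernel launch per round over a buffer that stays on the device, so only the final value has to be copied back to the host.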
How can I accomplish this in Boost? Do I just pass in a vector, then create the new vectors inside, say, a custom function which calls another custom function (modifying the code below, of course)?
My hope is to not split up the data outside the OpenCL device (i.e. on the host) because of the overhead of copying data back and forth.

BOOST_COMPUTE_FUNCTION(int, add_four, (int x),
{
    return x + 4;
});