work groups and work items

Joshua Laferriere

Jun 28, 2016, 11:10:12 AM

to boost-compute

I have a hankering to try my hand at neural networks. I have a standard version coded.

I'm hoping to move the feed-forward layers to OpenCL.

Would each neuron then become a work group?

And each weighted sum calculation a work item?

I'm kind of lost, to be honest. I was wondering if there is a weighted sum calculation that could be given as an example of work groups and work items as implemented in Boost.Compute.
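The mapping asked about above (one work group per neuron, one work item per weight/input product) can be sketched on the host in plain C++. This is only an emulation of the index arithmetic, not Boost.Compute or OpenCL code: the loops stand in for what a device would run in parallel, and the function name `layer_weighted_sums` and the row-major weight layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side emulation only (hypothetical names): the outer loop plays the
// role of the work groups, the inner loop plays the role of the work items.
// weights is assumed to be row-major with shape neurons x inputs.size().
std::vector<float> layer_weighted_sums(const std::vector<float>& weights,
                                       const std::vector<float>& inputs,
                                       std::size_t neurons)
{
    const std::size_t n_in = inputs.size();
    assert(weights.size() == neurons * n_in);

    std::vector<float> sums(neurons, 0.0f);
    for (std::size_t group = 0; group < neurons; ++group) {   // work group = neuron
        std::vector<float> local(n_in);                        // stands in for local memory
        for (std::size_t item = 0; item < n_in; ++item) {      // work item = one product
            local[item] = weights[group * n_in + item] * inputs[item];
        }
        // On a device, a barrier would go here before the group
        // combines its partial products into one weighted sum.
        float acc = 0.0f;
        for (float p : local) {
            acc += p;
        }
        sums[group] = acc;
    }
    return sums;
}
```

For two neurons with weights {1, 2, 3} and {4, 5, 6} and inputs {1, 1, 1}, the call `layer_weighted_sums(w, x, 2)` returns the sums 6 and 15.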

Joshua Laferriere

Jun 30, 2016, 10:14:37 AM

to boost-compute

Here's a better example of the problem I'm having.

boost::compute::accumulate is dreadfully slow. Over 5 million values, it takes about 8 seconds, versus under a second for the standard C++ std::accumulate.

I'm surprised boost::compute::accumulate works serially. That's when I thought maybe I could write my own sum algorithm.

I have a vector.

I divide the vector in half, copying each half into a new vector, and add the two halves element-wise to create a third vector. If the original vector has an odd number of elements, the leftover element is appended to the new vector.

Example

vector of 101 elements
50 + 50 = 50 elements (this element-wise addition is where OpenCL would come in handy), append the odd element = 51

repeat
25 + 25 = 25 elements, append the odd element = 26
13 + 13 = 13 elements
6 + 6 = 6 elements, append the odd element = 7
3 + 3 = 3 elements, append the odd element = 4
2 + 2 = 2 elements
1 + 1 = 1 element

That gets me the sum at what I hope is a lower cost than a serial add.
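The halving scheme above can be written as a short plain C++ sketch. It is host-side code, not Boost.Compute; the name `halving_sum` is hypothetical. Instead of copying the halves into new vectors, it folds the second half onto the first in place, which is one possible way to avoid the extra copies. Each iteration of the inner loop is independent of the others, so that loop is the part a device could run as parallel work items.

```cpp
#include <cstddef>
#include <vector>

// Plain C++ sketch of the halving scheme (hypothetical name).
// Takes the vector by value so the caller's data is left untouched.
float halving_sum(std::vector<float> v)
{
    if (v.empty()) {
        return 0.0f;
    }
    std::size_t n = v.size();              // number of live elements
    while (n > 1) {
        const std::size_t half = n / 2;
        for (std::size_t i = 0; i < half; ++i) {
            v[i] += v[i + half];           // add second half onto first half
        }
        if (n % 2 != 0) {
            v[half] = v[n - 1];            // carry the odd leftover element along
            n = half + 1;
        } else {
            n = half;
        }
    }
    return v[0];
}
```

For 101 elements the live size shrinks as 101, 51, 26, 13, 7, 4, 2, 1, matching the worked example. Note that this changes the order of the floating-point additions compared with a serial sum, so the result can differ slightly in the low-order bits.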

How can I accomplish this in Boost.Compute? Do I just pass in a vector, then create the new vectors inside, say, a custom function that calls another custom function (modifying the code below, of course)?

My hope is to not split up the data outside the OpenCL device (i.e. on the host) because of the overhead of copying data back and forth.

BOOST_COMPUTE_FUNCTION(int, add_four, (int x),
{
    return x + 4;
});

Jakub Szuppe

Jun 30, 2016, 11:06:20 AM

to boost-compute

Well, I think you should use the reduce operation.

Do you use floating point numbers?
Do you use GPU or CPU?

Joshua Laferriere

Jun 30, 2016, 12:35:11 PM

to boost-compute

Yes, floating point.

The goal is to use the GPU.

I had an idea that I could create all my vectors beforehand using the method mentioned above and then, once they are on the card, do the additions and assignments there.

Jakub Szuppe

Jun 30, 2016, 12:42:56 PM

to boost-compute

OK, so for floating-point numbers accumulate is indeed slow (see http://www.boost.org/doc/libs/1_61_0/libs/compute/doc/html/boost/compute/accumulate.html).

You should use reduce. If a loss in precision is not acceptable, you can create your own function based on the reduce implementation and the Kahan summation algorithm.

Joshua Laferriere

Jun 30, 2016, 1:05:47 PM

to boost-compute

I appreciate the advice for this exercise, but I am still curious how to do the work on the GPU itself.

Say I have a structure with different types of vectors that each need their own parallel processing.

How would one normally go about that?

Step 1 I assume would be to send the structure.

Step 2? Break up the structure into separate vectors by creating new vectors and copying? Will the copying be done from GPU memory to GPU memory? My hope is to avoid host-to-GPU copying.

Joshua Laferriere

Jun 30, 2016, 3:47:43 PM

to boost-compute

I think what I'm trying to accomplish, in addition to splitting a structure into specific work sections, is creating multiple command queues to process these work sections independently of each other.

Joshua Laferriere

Jul 1, 2016, 9:02:35 AM

to boost-compute

http://stackoverflow.com/questions/26113527/how-to-launch-multiple-kernel-in-opencl-inside-the-program#
