How to Use the Box Blur Example

Royi

Aug 17, 2018, 6:15:45 AM

to Intel SPMD Program Compiler Users

Hello,

I'm trying to take my first steps with ISPC.

I understand the concept behind ISPC: you write the program from the point of view of a single element.

That is easy to understand when there is an array as input, an array as output, and the ISPC program is the whole program.

What I don't get is how the Box Blur example should be used by the host program.

Let's say we have a single-channel image, mI.

Should we iterate over it like this:

    for (ii = 0; ii < numRows; ii++) {
        for (jj = 0; jj < numCols; jj++) {
            mO[ii][jj] = box3x3(mI, jj, ii);
        }
    }

But then it doesn't make sense, since it seems a "gang" would update only a single value.

Could someone show a simple example of how to use ISPC for image filtering?

Thank You.

Dmitry Babokin

Aug 17, 2018, 3:01:26 PM

to ispc-...@googlegroups.com

box3x3(uniform float image[32][32], int x, int y) takes varying x and y parameters and exploits parallelism coming from these vectors. For SSE4 it will handle 4 pixels at a time. So you need to organise outer loops to supply these pixels in chunks of 4 (or 8, or 16, depending on your target).

This loop will do the job:

    for (uniform int ii = 0; ii < numRows; ii++) {   // note uniform counter
        for (int jj = 0; jj < numCols; jj++) {       // note varying counter
            // On the first iteration it will handle pixels (0,0), (1,0), (2,0), (3,0),
            // i.e. jj is (0,1,2,3) and ii is uniform int 0, which is cast to varying int (0,0,0,0).
            mO[ii][jj] = box3x3(mI, jj, ii);
        }
    }

If both counters are varying, it will handle only the 4 diagonal points out of the 16 in a 4x4 area: (0,0), (1,1), (2,2), (3,3).
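One way to picture which pixels a single gang step touches is to emulate the lanes in plain C. The sketch below is not ISPC code: it assumes a gang width of 4, and `GANG_WIDTH`, `Pixel`, `gang_row`, and `gang_diagonal` are hypothetical names chosen for illustration. In ISPC itself, per-lane offsets such as (0,1,2,3) would typically come from the built-in `programIndex`, because a varying counter declared as `int jj = 0` starts at 0 in every lane.

```c
#include <assert.h>
#include <stddef.h>

#define GANG_WIDTH 4  /* assumed gang size, e.g. an SSE4 target */

typedef struct { int x, y; } Pixel;

/* x differs per lane, y is the same (uniform) for the whole gang:
   the gang covers GANG_WIDTH neighbouring pixels in one row. */
static void gang_row(int x_base, int y, Pixel out[GANG_WIDTH]) {
    assert(out != NULL);
    for (int lane = 0; lane < GANG_WIDTH; lane++) {
        out[lane].x = x_base + lane;
        out[lane].y = y;
    }
}

/* Both x and y differ per lane:
   the gang covers only the diagonal of a GANG_WIDTH x GANG_WIDTH area. */
static void gang_diagonal(int x_base, int y_base, Pixel out[GANG_WIDTH]) {
    assert(out != NULL);
    for (int lane = 0; lane < GANG_WIDTH; lane++) {
        out[lane].x = x_base + lane;
        out[lane].y = y_base + lane;
    }
}
```

Calling `gang_row(0, 0, p)` fills `p` with (0,0), (1,0), (2,0), (3,0), while `gang_diagonal(0, 0, p)` fills it with (0,0), (1,1), (2,2), (3,3).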

--
You received this message because you are subscribed to the Google Groups "Intel SPMD Program Compiler Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ispc-users+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Royi

Aug 18, 2018, 5:02:53 AM

to Intel SPMD Program Compiler Users

Hi Dmitry,

If I understand you correctly, you're suggesting an ISPC wrapper around the function from the example.

The host program is something like:

    for (ii = 0; ii < numRows; ii++) {
        for (jj = 0; jj < numCols; jj += 4) {
            mO[ii][jj] = IspcBoxBlurWrapper(mI, jj, ii);
        }
    }

Where IspcBoxBlurWrapper is something like:

    for (ii = 0; ii < 4; ii++) {
        mO[ii][jj] = box3x3(mI, jj, ii);
    }

And box3x3(mI, jj, ii) is as in the documentation.

Is it efficient?

Otherwise, I didn't understand your code:

    for (uniform int ii = 0; ii < numRows; ii++) {   // note uniform counter
        for (int jj = 0; jj < numCols; jj++) {       // note varying counter
            // On the first iteration it will handle pixels (0,0), (1,0), (2,0), (3,0),
            // i.e. jj is (0,1,2,3) and ii is uniform int 0, which is cast to varying int (0,0,0,0).
            mO[ii][jj] = box3x3(mI, jj, ii);
        }
    }

Which seems to require the whole image as input.

Dmitry Babokin

Aug 20, 2018, 7:29:45 PM

to ispc-...@googlegroups.com

No, box3x3() is an ISPC function, so the function that calls it also has to be an ISPC function (because box3x3 accepts varying parameters).

It will look like this:

export void computeImage(uniform float * uniform ml, uniform int numRows, uniform int numCols) {

Royi Avital

Aug 21, 2018, 12:07:02 AM

to ispc-...@googlegroups.com

Hi Dmitry,

I don't understand why I must have two levels of ISPC functions instead of one.

Wouldn't one be more efficient?

I will create a C wrapper that works on chunks of 4-8 pixels.

It passes each chunk to an ISPC function, which updates an output buffer chunk that is passed in as well.

Would that be more efficient, or is it better to have two levels of ISPC functions?

Dmitry Babokin

Aug 21, 2018, 2:35:42 AM

to ispc-...@googlegroups.com

Hi Royi,

There's no benefit to having two levels of ISPC functions. You started from the box3x3 example, and I explained how to use it from another ISPC function to cover the whole image. When writing such functions in practice, you would use just a single function. Nevertheless, I would be surprised if box3x3 were not inlined, which would eliminate the overhead of having two functions.

Dmitry.

Royi

Aug 26, 2018, 7:48:39 AM

to Intel SPMD Program Compiler Users

Hi Dmitry,

What about the case where you have no constraints?

What's the best way to apply a 2D convolution using ISPC?

Thank You.

Dmitry Babokin

Aug 26, 2018, 6:12:13 PM

to ispc-...@googlegroups.com

Please disregard my comments about uniform and varying variables in the for statement; they were misleading.

This code will do the job for you.

    export void foo(uniform float image[32][32]) {
        foreach (y = 1 ... 31, x = 1 ... 31) {
            float sum = 0;
            for (uniform int dy = -1; dy <= 1; ++dy) {
                for (uniform int dx = -1; dx <= 1; ++dx) {
                    // print("x=%; y=%, dx=%, dy=%\n", x, y, dx, dy);
                    sum += image[y+dy][x+dx];
                }
            }
            image[y][x] = sum / 9.;
        }
    }
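For checking the output of an ISPC kernel like this one from the host side, a scalar C reference can be handy. The following is a minimal sketch under stated assumptions, not code from this thread: `box_blur_3x3_ref` is a hypothetical name, the image is assumed to be a row-major float array, border pixels are simply copied, and results go to a separate output buffer. The separate buffer is a deliberate difference from the ISPC snippet, which writes back into `image`, so later pixels there read neighbours that may already have been blurred.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar 3x3 box blur over the interior of a row-major rows x cols image.
   Border pixels are copied unchanged. `in` and `out` must not overlap. */
static void box_blur_3x3_ref(const float *in, float *out, int rows, int cols) {
    assert(in != NULL && out != NULL && in != out);
    for (int y = 0; y < rows; y++) {
        for (int x = 0; x < cols; x++) {
            /* Border: no full 3x3 neighbourhood, so copy the input value. */
            if (y == 0 || y == rows - 1 || x == 0 || x == cols - 1) {
                out[y * cols + x] = in[y * cols + x];
                continue;
            }
            /* Interior: average the 3x3 neighbourhood read from `in` only. */
            float sum = 0.0f;
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    sum += in[(y + dy) * cols + (x + dx)];
                }
            }
            out[y * cols + x] = sum / 9.0f;
        }
    }
}
```

A host program could run this alongside the exported ISPC function on the same input and compare the interior pixels within a small tolerance.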
