Coalescing double loads?

Philip Thong

May 23, 2017, 12:35:59 PM
to Intel SPMD Program Compiler Users
Hi,

I'm trying to compile a simple ISPC program:

export void normalizeSOA(uniform double array[], uniform int count,
                         uniform double zeros[]) {
   double l2 = 0;
   foreach (i = 0 ... count/4) {
      double x = array[4*i];
      double y = array[4*i+1];
      double z = array[4*i+2];
      double w = array[4*i+3];

     l2 += x*x + y*y + z*z + w*w;
   }

   zeros[0] = reduce_add(l2);
}

I always get performance warnings about the gather when the data type is double.

I would have expected the loads to be coalesced.


Dmitry Babokin

May 23, 2017, 7:01:40 PM
to ispc-...@googlegroups.com
<I replied to this email on the ispc-dev mailing list, but ispc-users is a better fit for it, so I'm copy-pasting my reply here.>

Philip,

Note that i is a varying int, not a uniform one. That is, on the first iteration it has the value (0,1,2,3) if you are compiling for avx1-i32x4 (or another 4-wide target).

Hence 4*i has the value (0,4,8,12), which means array[4*i] is not a contiguous load; it has to be a gather.
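To make the index arithmetic concrete, here is a small scalar model in plain C (not ISPC). The helper name lane_indices is hypothetical and only for illustration: it computes the array index each program instance would touch for an access of the form array[stride*i + offset], and reports whether those indices are consecutive.

```c
/* Hypothetical helper, not part of ISPC: a scalar model of one gang.
   Fills out[] with the index each program instance reads for
   array[stride * i + offset], where the gang covers
   i = base, base+1, ..., base+width-1.
   Returns 1 if the indices are consecutive (a single vector load would do),
   0 if they are spread out (a gather is required). */
int lane_indices(int base, int width, int stride, int offset, int out[])
{
    int contiguous = 1;
    for (int lane = 0; lane < width; ++lane) {
        int i = base + lane;              /* varying i: one value per lane */
        out[lane] = stride * i + offset;
        if (lane > 0 && out[lane] != out[lane - 1] + 1)
            contiguous = 0;
    }
    return contiguous;
}
```

With base 0, width 4 and stride 4 this yields the indices 0, 4, 8, 12 and reports a non-contiguous access, matching the gather warning; with stride 1 the indices are 0, 1, 2, 3 and the access is contiguous.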

So I assume your data layout is x, y, z, w, x, y, z, w, ... In that case you would need to use the aos_to_soa4() function, but it supports only float and int32. The code would look like this:

export void normalizeSOA(uniform float array[], uniform int count,
                         uniform float zeros[]) {
   float l2 = 0;
   for (uniform int i = 0; i < count; i += programCount*4) {
     float x, y, z, w;
     aos_to_soa4(&array[i], &x, &y, &z, &w);

     l2 += x*x + y*y + z*z + w*w;
   }

   zeros[0] = reduce_add(l2);
}
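For readers unfamiliar with aos_to_soa4(), the following plain C sketch models what the call above is assumed to do on a 4-wide target: it reads 4*programCount consecutive floats and hands lane k the k-th x, y, z and w. The name aos_to_soa4_model and the fixed width MODEL_WIDTH are made up for this illustration; the real function is implemented with vector loads and shuffles rather than a scalar loop.

```c
/* Assumed gang width for this model (e.g. a 4-wide target). */
enum { MODEL_WIDTH = 4 };

/* Hypothetical scalar model of aos_to_soa4() for one gang.
   a[] holds x0,y0,z0,w0, x1,y1,z1,w1, ... (4 * MODEL_WIDTH floats).
   After the call, x[lane], y[lane], z[lane], w[lane] hold the four
   components of element number `lane`. */
void aos_to_soa4_model(const float a[],
                       float x[MODEL_WIDTH], float y[MODEL_WIDTH],
                       float z[MODEL_WIDTH], float w[MODEL_WIDTH])
{
    for (int lane = 0; lane < MODEL_WIDTH; ++lane) {
        x[lane] = a[4 * lane + 0];
        y[lane] = a[4 * lane + 1];
        z[lane] = a[4 * lane + 2];
        w[lane] = a[4 * lane + 3];
    }
}
```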

A better solution would be an x, x, x, x, y, y, y, y, z, z, z, z, w, w, w, w, x, x, x, ... data layout (blocks of programCount values per component); with it you'll be able to work with the data much more efficiently. The code would look like this:

export void normalizeSOA(uniform double array[], uniform int count,
                         uniform double zeros[]) {
   double l2 = 0;
   for (uniform int i = 0; i < count; i += programCount*4) {
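     // i already advances by programCount*4 (one x/y/z/w block per iteration),
     // so each component sits at a fixed offset from i.
     double x = array[i + programIndex];
     double y = array[i + programCount + programIndex];
     double z = array[i + 2*programCount + programIndex];
     double w = array[i + 3*programCount + programIndex];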

     l2 += x*x + y*y + z*z + w*w;
   }

   zeros[0] = reduce_add(l2);
}
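If the data currently lives in x, y, z, w, x, y, z, w, ... order, it has to be repacked once on the host before calling the kernel. A minimal sketch in plain C follows; the function name repack_blocked and its parameters are hypothetical, width must equal the programCount of the ISPC target being compiled for, and nvec (the number of 4-component elements) is assumed to be a multiple of width.

```c
#include <stddef.h>

/* Hypothetical host-side helper: converts nvec elements stored as
   x,y,z,w, x,y,z,w, ... (aos) into blocks of
   `width` x values, `width` y values, `width` z values, `width` w values
   (blocked). Both arrays hold 4 * nvec doubles.
   Assumption: nvec is a multiple of width. */
void repack_blocked(const double *aos, double *blocked,
                    size_t nvec, size_t width)
{
    for (size_t v = 0; v < nvec; ++v) {
        size_t block = v / width;   /* which x/y/z/w block the element is in */
        size_t lane  = v % width;   /* position inside that block */
        for (size_t c = 0; c < 4; ++c)   /* c = 0,1,2,3 for x,y,z,w */
            blocked[block * 4 * width + c * width + lane] = aos[4 * v + c];
    }
}
```

Each block then starts at offset block * 4 * width, and every component within it is a run of width consecutive doubles, so the loads in the kernel are plain vector loads rather than gathers.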

Hope it helps.

Dmitry.
