coalescing double types?

Philip Thong

May 23, 2017, 12:36:13 PM
to Intel SPMD Program Compiler Developers
On an AVX target, I was surprised that the following always generates gather warnings:

export void normalizeSOA(uniform double array[], uniform int count,
                         uniform double zeros[]) {
   double l2 = 0;
   foreach (i = 0 ... count/4) {
      double x = array[4*i];
      double y = array[4*i+1];
      double z = array[4*i+2];
      double w = array[4*i+3];

      l2 += x*x + y*y + z*z + w*w;
   }

   zeros[0] = reduce_add(l2);
}


What needs to change in the ispc code to get it to generate an efficient set of AVX loads and multiplies?

Dmitry Babokin

May 23, 2017, 6:59:45 PM
to ispc...@googlegroups.com
Philip,

Note that i is a varying int, not uniform. On the first iteration it has the value (0,1,2,3) if you are compiling for avx1-i32x4 (or another 4-wide target).

Hence 4*i has the value (0,4,8,12), which means array[4*i] is not a contiguous load; it has to be a gather.
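
To see this concretely, a small sketch like the following prints the per-lane indices. The function name showIndices is hypothetical and exists only for illustration; it relies on ispc's built-in print(), where each % is replaced by the value of the next argument.

```ispc
// Illustration only: showIndices is a hypothetical helper, not part of the
// original kernel. It prints the index each program instance would use.
export void showIndices() {
    foreach (i = 0 ... 8) {
        // Unit stride: neighbouring lanes touch neighbouring elements.
        print("i     = %\n", i);
        // Stride 4: neighbouring lanes are four elements apart, so the
        // access array[4*i] cannot be served by a single vector load.
        print("4 * i = %\n", 4 * i);
    }
}
```

On a 4-wide target the first iteration should report i as (0,1,2,3) and 4*i as (0,4,8,12). Only the first pattern maps to one vector load.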

So I assume your data layout is x, y, z, w, x, y, z, w,... In this case you would need to use the aos_to_soa4() function, but it supports only float and int32. The code would look like this:

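export void normalizeSOA(uniform float array[], uniform int count,
                         uniform float zeros[]) {
   float l2 = 0;
   // Each iteration consumes 4*programCount consecutive floats (count is
   // assumed to be a multiple of that).
   for (uniform int i = 0; i < count; i += programCount*4) {
     float x, y, z, w;
     // Deinterleave x,y,z,w,x,y,z,w,... into one varying value per component.
     aos_to_soa4(&array[i], &x, &y, &z, &w);

     l2 += x*x + y*y + z*z + w*w;
   }

   zeros[0] = reduce_add(l2);
}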

A better solution would be an x, x, x, x, y, y, y, y, z, z, z, z, w, w, w, w, x, x, x,... data layout (shown here for a 4-wide target); in this case you'll be able to work with it much more efficiently. The code would look like:

export void normalizeSOA(uniform double array[], uniform int count,
                         uniform double zeros[]) {
   double l2 = 0;
   // i is the element offset of each x..x y..y z..z w..w block;
   // count is assumed to be a multiple of 4*programCount.
   for (uniform int i = 0; i < count; i += programCount*4) {
     double x = array[i + programIndex];
     double y = array[i + programCount + programIndex];
     double z = array[i + 2*programCount + programIndex];
     double w = array[i + 3*programCount + programIndex];

     l2 += x*x + y*y + z*z + w*w;
   }

   zeros[0] = reduce_add(l2);
}
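
If changing the data layout is not an option, one more possibility is worth sketching for this particular kernel. The result is just the sum of the squares of every element, so it does not matter which component a given value belongs to. Under that assumption the original x, y, z, w, x, y, z, w,... array can be walked with unit stride in double precision, which should let the compiler emit plain vector loads instead of gathers. The name sumOfSquares is hypothetical, and the shortcut stops working as soon as x, y, z and w need different treatment.

```ispc
// Sketch under the assumption that all four components are treated
// identically; sumOfSquares is a hypothetical name, not from the thread.
export void sumOfSquares(uniform double array[], uniform int count,
                         uniform double zeros[]) {
    double l2 = 0;
    // i takes consecutive values across the program instances, so array[i]
    // is a unit-stride access; foreach masks off any partial last iteration.
    foreach (i = 0 ... count) {
        double v = array[i];
        l2 += v * v;
    }

    // Combine the per-lane partial sums into one uniform result.
    zeros[0] = reduce_add(l2);
}
```

The additions happen in a different order than in the original loop, so the last bits of the result may differ.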

Hope it helps.

Dmitry.

