Philip,
Note that i is a varying int, not a uniform one: if you are compiling for avx1-i32x4 (or another 4-wide target), on the first iteration it has the value (0,1,2,3).
Hence 4*i has the value (0,4,8,12), which means array[4*i] is not a contiguous load; it has to be a gather.
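To make the indexing concrete, here is a small plain C model (not ISPC, and not how the compiler works internally) of which element each program instance reads for array[stride*i] when i is varying and starts at (0,1,2,3). The helper names lane_indices and is_contiguous are hypothetical, made up for this illustration.

```c
#include <assert.h>

/* Hypothetical helper: idx[lane] = element read by program instance `lane`
 * for array[stride*i], where the varying i equals base + lane. */
static void lane_indices(int base, int stride, int width, int idx[]) {
    assert(width > 0);
    for (int lane = 0; lane < width; ++lane)
        idx[lane] = stride * (base + lane);
}

/* Returns 1 if the lanes touch consecutive elements (one plain vector load
 * can serve them), 0 if they are spread out (a gather is needed). */
static int is_contiguous(const int idx[], int width) {
    for (int lane = 1; lane < width; ++lane)
        if (idx[lane] != idx[lane - 1] + 1)
            return 0;
    return 1;
}
```

With stride 4 the four lanes touch elements 0, 4, 8 and 12, so no single vector load covers them; with stride 1 they touch 0, 1, 2 and 3.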
So I assume your data layout is x, y, z, w, x, y, z, w, ... In this case you would need to use the aos_to_soa4() function, but it supports only float and int32. The code would look like this:
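export void normalizeSOA(uniform float array[], uniform int count,
                         uniform float zeros[]) {
    float l2 = 0;
    // count is the number of floats; assumed to be a multiple of 4*programCount.
    for (uniform int i = 0; i < count; i += programCount*4) {
        float x, y, z, w;
        // Reads 4*programCount consecutive floats starting at array[i] and
        // transposes them: program instance k gets the k-th (x, y, z, w).
        aos_to_soa4(&array[i], &x, &y, &z, &w);
        // ... work with x, y, z, w here ...
    }
}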
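For reference, the transpose that aos_to_soa4() performs on one block can be modeled in plain C as below. This is a scalar sketch, not ISPC's actual implementation; aos_to_soa4_model is a hypothetical name and width stands in for programCount. It also shows why the loop advances by programCount*4: each call consumes 4*programCount floats.

```c
#include <assert.h>

/* Scalar model (plain C, not ISPC) of one aos_to_soa4() call:
 * 4*width floats laid out x,y,z,w,x,y,z,w,... are split so that
 * "lane" j receives the j-th vector. width plays the role of programCount. */
static void aos_to_soa4_model(const float a[], int width,
                              float x[], float y[], float z[], float w[]) {
    assert(width > 0);
    for (int lane = 0; lane < width; ++lane) {
        x[lane] = a[4 * lane + 0];
        y[lane] = a[4 * lane + 1];
        z[lane] = a[4 * lane + 2];
        w[lane] = a[4 * lane + 3];
    }
}
```

So lane j's x comes from a[4*j]: the same stride-4 pattern as array[4*i], except that the library routine gathers it with a load-and-shuffle sequence instead of a general gather.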
A better solution would be an x, x, x, x, y, y, y, y, z, z, z, z, w, w, w, w, x, x, x, ... data layout (groups of programCount values per component; 4 on a 4-wide target). With it every load is contiguous, so you'll be able to work with the data much more efficiently. The code would look like this:
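// array is assumed to be declared as uniform double array[] here.
for (uniform int i = 0; i < count; i += programCount*4) {
    // Each block of 4*programCount values holds programCount x's, then
    // programCount y's, z's and w's, so all four loads are contiguous.
    double x = array[i + programIndex];
    double y = array[i + programCount + programIndex];
    double z = array[i + 2*programCount + programIndex];
    double w = array[i + 3*programCount + programIndex];
    // ... work with x, y, z, w here ...
}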
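If the data currently arrives in x, y, z, w, x, y, z, w, ... order, the host side can repack it once into this blocked layout (sometimes called AoSoA). Below is a minimal sketch in plain C, assuming n 4-vectors of doubles, n a multiple of width, and width equal to the target's programCount. aos_to_blocked is a hypothetical helper name, not part of ISPC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side helper (plain C): repacks n 4-vectors stored as
 * x,y,z,w,x,y,z,w,... into blocks of `width` vectors laid out as
 * width x's, width y's, width z's, width w's. */
static void aos_to_blocked(const double *aos, double *blocked,
                           size_t n, size_t width) {
    assert(width > 0 && n % width == 0);
    for (size_t v = 0; v < n; ++v) {
        size_t block = v / width;   /* which group of width vectors */
        size_t lane  = v % width;   /* position inside that group   */
        double *dst = blocked + block * 4 * width;
        for (size_t c = 0; c < 4; ++c)      /* c = 0..3 -> x, y, z, w */
            dst[c * width + lane] = aos[4 * v + c];
    }
}
```

In the repacked buffer, component c (0..3 for x, y, z, w) of the vector in lane j of the block starting at element offset i sits at index i + c*width + j, so for a fixed c the width lanes read consecutive elements and no gather is needed.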