complex numbers


rossinel...@gmail.com

Jan 12, 2022, 4:33:43 AM
to Intel SPMD Program Compiler Users
Dear ISPC Community,

Is there any support for arrays of complex numbers in ISPC?

std::complex and C complex types are poorly suited to SIMD because of their AoS layout (interleaved real and imaginary parts),
although the ISA can partly compensate for this (see vaddsubps, for example).

Seamless ISPC support for std::complex would be ideal.
My guess is that it would be somewhat incompatible with ISPC's masking model?
(For complex<float>, I would mask i32x8 or i64x4 depending on the operation at hand.)

It would be great to have at least some built-in functions for such arrays:
to "unzip" them into separate real and imaginary component arrays (SoA), to zip them back together, and to compute z = f(x, y), where each call consumes 4 input scalars (the real and imaginary parts of x and y) and produces 2 output scalars.
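
As a rough illustration of the unzip / zip / compute pattern requested above, here is a minimal sketch in plain C rather than ISPC. The names complex_unzip, complex_zip and complex_mul_soa are hypothetical (they are not ISPC builtins), and multiplication is used only as one example of f.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration; not part of ISPC or any library. */

/* "Unzip": split n interleaved (re, im) pairs into two arrays (AoS -> SoA). */
static void complex_unzip(const float *aos, float *re, float *im, size_t n)
{
    assert(aos && re && im);
    for (size_t i = 0; i < n; ++i) {
        re[i] = aos[2 * i];
        im[i] = aos[2 * i + 1];
    }
}

/* "Zip": interleave two arrays back into (re, im) pairs (SoA -> AoS). */
static void complex_zip(const float *re, const float *im, float *aos, size_t n)
{
    assert(aos && re && im);
    for (size_t i = 0; i < n; ++i) {
        aos[2 * i] = re[i];
        aos[2 * i + 1] = im[i];
    }
}

/* z = x * y on SoA data: 4 input scalars and 2 output scalars per element.
   Every access is unit-stride, which is the form a vectorizer prefers. */
static void complex_mul_soa(const float *xr, const float *xi,
                            const float *yr, const float *yi,
                            float *zr, float *zi, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        zr[i] = xr[i] * yr[i] - xi[i] * yi[i];
        zi[i] = xr[i] * yi[i] + xi[i] * yr[i];
    }
}
```

The cost of this approach is the extra pass (or extra shuffles) for the unzip and zip steps, so it pays off mainly when several operations are applied between them.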

What are your thoughts on this?

Best wishes,

Diego

Steve Hill

Jan 12, 2022, 4:58:16 AM
to Intel SPMD Program Compiler Users
I have exactly this problem. It is reasonably easy to convert to SoA using something like this:

struct tComplexFloat
{
   float Real;
   float Imag;
};

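// Reads one complex value per program instance from an AoS array.
// Each (Real, Imag) pair is loaded as a single 64-bit word and then split.
// Assumes a little-endian target, so Real sits in the low 32 bits.
static inline tComplexFloat ReadFromArrayOfComplexFloat(
   uniform const tComplexFloat Input[], int32 Index
)
{
    // One 64-bit load per element instead of two separate 32-bit loads
    const uint64 Raw = ((uniform uint64 *)Input )[Index];
    // High half is the imaginary part, low half is the real part
    float Imag = floatbits((uint32)(Raw >> 32));
    float Real = floatbits((uint32)(Raw & 0xFFFFFFFF));
    const tComplexFloat Value = {Real, Imag};
    return Value;
}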

...

foreach(i = 0...N)
{
   tComplexFloat Value = ReadFromArrayOfComplexFloat(InputData, i);
   // Operate on Value
}

However, while this might be more efficient than using a gather, it is still significantly less efficient than hand-coded C with intrinsics (using FMADDSUB, in our case).

This is the main reason why we are still using intrinsics, instead of ISPC, for a lot of our code. However, I don't see how ISPC could make use of the ADDSUB instructions, given its programming model. You could, of course, manually swap values between program instances and work on them that way, but this would be complicated and would likely result in code just as difficult to write and understand as intrinsics.
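
To make the ADDSUB idea concrete, here is a portable C sketch that emulates the lane pattern instead of calling real intrinsics. It relies on the ADDSUBPS behaviour of subtracting in even lanes and adding in odd lanes; addsub4 and complex_mul_aos are hypothetical names, and this is not the hand-coded FMADDSUB routine mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar emulation of ADDSUBPS on one 4-float vector:
   even lanes compute a - b, odd lanes compute a + b. */
static void addsub4(const float a[4], const float b[4], float out[4])
{
    for (int lane = 0; lane < 4; ++lane)
        out[lane] = (lane % 2 == 0) ? a[lane] - b[lane] : a[lane] + b[lane];
}

/* Multiply n interleaved complex numbers, z = x * y, without converting
   to SoA. Each 4-float vector holds two (re, im) pairs, so this sketch
   requires n to be even. */
static void complex_mul_aos(const float *x, const float *y, float *z, size_t n)
{
    assert(n % 2 == 0);
    for (size_t k = 0; k < 2 * n; k += 4) {
        float t1[4], t2[4];
        for (int lane = 0; lane < 4; lane += 2) {
            float xr = x[k + lane], xi = x[k + lane + 1];
            float yr = y[k + lane], yi = y[k + lane + 1];
            /* t1 = x * (yr, yr): y's real part duplicated into both lanes */
            t1[lane]     = xr * yr;
            t1[lane + 1] = xi * yr;
            /* t2 = swap(x) * (yi, yi): x with re/im swapped, y's imaginary
               part duplicated. In real SIMD code these are shuffles. */
            t2[lane]     = xi * yi;
            t2[lane + 1] = xr * yi;
        }
        /* even lane: xr*yr - xi*yi (real), odd lane: xi*yr + xr*yi (imag) */
        addsub4(t1, t2, z + k);
    }
}
```

The point of the pattern is that neighbouring lanes do different work (one subtracts, one adds), which is exactly what the SPMD model, where every program instance runs the same operation, does not express directly.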

I would also be very interested to learn if there is a way around this.

Regards,

S.

Pete Brubaker

May 31, 2022, 5:59:23 AM
to ispc-...@googlegroups.com
Apologies for the late reply; apparently this mailing list isn't used much anymore. We mostly take questions on the GitHub issues page and on Twitter.

Have you all looked into aos_to_soa2() and soa_to_aos2()? Granted, this adds some swizzling to the loads/stores, but it does make more efficient use of the lanes.

Another option for more optimal SIMD use would be to slightly reorganize your memory layout.  You could define your struct as:

#define MAX_WIDTH 16
struct Complex
{
    float Real[MAX_WIDTH];
    float Imaginary[MAX_WIDTH];
};

Then you'll have to handle the iteration yourself in uniform for() loops and use programCount/programIndex to address the data. In return, you get vector loads instead of gathers.
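
For illustration, the same blocked layout can be sketched in plain C (not ISPC). The helpers block_of, lane_of and complex_mul_blocks are hypothetical names; in ISPC the inner lane loop would instead be covered by the program instances, and the sketch assumes the element count is padded to a multiple of MAX_WIDTH.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WIDTH 16

/* One block holds MAX_WIDTH complex values, parts stored separately. */
struct Complex {
    float Real[MAX_WIDTH];
    float Imaginary[MAX_WIDTH];
};

/* Hypothetical helpers: map a flat element index to (block, lane). */
static size_t block_of(size_t i) { return i / MAX_WIDTH; }
static size_t lane_of(size_t i)  { return i % MAX_WIDTH; }

/* z = x * y for every lane of every block. The inner loop walks
   contiguous floats, so each array access is unit-stride. */
static void complex_mul_blocks(const struct Complex *x, const struct Complex *y,
                               struct Complex *z, size_t num_blocks)
{
    assert(x && y && z);
    for (size_t b = 0; b < num_blocks; ++b) {
        for (size_t lane = 0; lane < MAX_WIDTH; ++lane) {
            float xr = x[b].Real[lane], xi = x[b].Imaginary[lane];
            float yr = y[b].Real[lane], yi = y[b].Imaginary[lane];
            z[b].Real[lane]      = xr * yr - xi * yi;
            z[b].Imaginary[lane] = xr * yi + xi * yr;
        }
    }
}
```

The trade-off is that the data is no longer layout-compatible with std::complex arrays, so callers that exchange data with such code still need a conversion step at the boundary.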

Cheers,

Pete


