Good way to gather all 16 even bits in a 32 bit integer into the lower 16 bits of the destination?

Morten Mikkelsen

Nov 18, 2017, 3:17:11 PM

to Intel SPMD Program Compiler Developers

I am looking for a good way to gather all 16 even bits in a 32-bit integer into the lower 16 bits of the destination (i.e. {bit0, bit2, bit4, ..., bit30} --> {bit0...bit15}).
Ideally in a way that will work with the ispc compiler as well. The only thing I've come across so far to do it with is _pext_u32() and it does not appear to exist in ispc going by the user's guide.

Thanks,

Morten

Dmitry Babokin

Nov 19, 2017, 2:28:16 AM

to ispc...@googlegroups.com

_pext_u32() seems to be what you need. Adding it to ispc isn't really difficult, but I doubt that it will be very popular.

The easiest is probably calling it as an external C function and implementing this external function using intrinsics.
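A minimal sketch of that approach, assuming a hypothetical wrapper named gather_even_bits() compiled in a separate C file. The mask 0x55555555 selects the even bits. When the file is not built with BMI2 enabled (for example without -mbmi2 on GCC or Clang), the sketch falls back to a plain loop that also serves as a reference definition of the operation:

```c
#include <stdint.h>
#if defined(__BMI2__)
#include <immintrin.h>
#endif

/* Hypothetical wrapper name, not part of ispc or any library.
 * Packs bits 0, 2, 4, ..., 30 of x into bits 0..15 of the result. */
uint32_t gather_even_bits(uint32_t x)
{
#if defined(__BMI2__)
    /* 0x55555555 has every even bit set, so pext extracts exactly those. */
    return _pext_u32(x, 0x55555555u);
#else
    /* Reference fallback: move bit 2*i of x to bit i of the result. */
    uint32_t result = 0;
    for (int i = 0; i < 16; ++i)
        result |= ((x >> (2 * i)) & 1u) << i;
    return result;
#endif
}
```

On the ispc side, such a function would presumably be declared with extern "C" and uniform parameter and return types, then called like any other function.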

Dmitry.

--
You received this message because you are subscribed to the Google Groups "Intel SPMD Program Compiler Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ispc-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Morten Mikkelsen

Nov 19, 2017, 9:35:10 AM

to Intel SPMD Program Compiler Developers

>The easiest is probably calling it as an external C function

Wouldn't this be quite inefficient? That forces a function call rather than inlining.

Dmitry Babokin

Nov 20, 2017, 12:32:53 PM

to ispc...@googlegroups.com

True, but it still may be more efficient than bit logic.

Though if you do the bit manipulation on the whole vector (in ispc), this may be comparable to element-wise _pext_u32(). By "comparable" I mean "just" 2-3x slower.
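For reference, the bit logic in question can be sketched as the usual shift-and-mask compaction, shown here in scalar C under the hypothetical name compact_even_bits(). Each step halves the gaps between the surviving bits. This is one common formulation rather than necessarily the one meant above, and the same shifts and masks should carry over to varying integers in ispc:

```c
#include <stdint.h>

/* Hypothetical helper name. Packs bits 0, 2, 4, ..., 30 of x into
 * bits 0..15 of the result using only shifts, ORs and ANDs. */
uint32_t compact_even_bits(uint32_t x)
{
    x &= 0x55555555u;                   /* keep even bits only            */
    x = (x | (x >> 1)) & 0x33333333u;   /* pair them up: 2 bits per 4     */
    x = (x | (x >> 2)) & 0x0F0F0F0Fu;   /* 4 bits per byte                */
    x = (x | (x >> 4)) & 0x00FF00FFu;   /* 8 bits per 16-bit half         */
    x = (x | (x >> 8)) & 0x0000FFFFu;   /* all 16 bits in the low half    */
    return x;
}
```

Because every step is an ordinary integer operation, nothing here depends on BMI2 being available on the target.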

Dmitry.

Morten Mikkelsen

Nov 20, 2017, 1:22:00 PM

to Intel SPMD Program Compiler Developers

To be honest, even just the uniform version as an intrinsic for ispc would be super helpful in my scenario, since I can do the lane-specific part before entering the loop.

Dmitry Babokin

Nov 20, 2017, 1:28:49 PM

to ispc...@googlegroups.com

I'll have a look at it with low priority (I have too much other work at the moment).

But if someone is willing to contribute, pull requests are welcome. I can also help with advice on how to implement it.

Dmitry.

Morten Mikkelsen

Dec 19, 2017, 4:36:22 PM

to Intel SPMD Program Compiler Developers

This may be common knowledge for many on this forum, but I thought it was interesting enough to mention anyway. Apparently Ice Lake will have VBMI2 support, which I'd expect to include a SIMD version of the pext instruction we're discussing, since the scalar pext is part of BMI2?

https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512
