Hi,
I've been working on a few implementations of a simple ray tracer, one written with intrinsics and one with ISPC. Long story short, the intrinsics version outperforms the ISPC one, and the ray/AABB intersection test appears to be the slower part. Comparing the two in Compiler Explorer, one difference I noticed is that ISPC emits vcmpleps + vblendvps instead of vminps/vmaxps (see https://godbolt.org/z/ZD7Vpr for details). I have not yet confirmed that this difference is what causes the performance gap, but it seems a reasonable candidate, given that the intrinsics version ends up with fewer instructions than the ISPC one.
1. Is there a specific reason why ISPC emits vcmpleps + vblendvps instead of vminps/vmaxps?
2. Is there something I can do, whether passing a specific compiler flag or writing the intersection routine differently, to make ISPC emit vminps/vmaxps?
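For context, the pattern in question is the usual slab test. Below is a simplified scalar C++ sketch of the two formulations. It is not the exact code behind the Compiler Explorer link, and the struct and function names (Ray, Aabb, hit_minmax, hit_select) are made up for illustration. The first variant uses min/max, which is the shape I would expect to map to vminps/vmaxps; the second spells the same logic as a compare plus select, which is the shape that seems to correspond to vcmpleps + vblendvps.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical types for illustration; inv_dir holds 1/direction per axis.
struct Ray  { float origin[3]; float inv_dir[3]; };
struct Aabb { float lo[3];     float hi[3];      };

// Variant A: slab test written with min/max.
inline bool hit_minmax(const Ray& r, const Aabb& b, float t_max) {
    float t_near = 0.0f, t_far = t_max;
    for (int i = 0; i < 3; ++i) {
        float t0 = (b.lo[i] - r.origin[i]) * r.inv_dir[i];
        float t1 = (b.hi[i] - r.origin[i]) * r.inv_dir[i];
        t_near = std::max(t_near, std::min(t0, t1));
        t_far  = std::min(t_far,  std::max(t0, t1));
    }
    return t_near <= t_far;
}

// Variant B: the same test with min/max spelled as compare + select.
inline float select_min(float a, float b) { return a <= b ? a : b; }
inline float select_max(float a, float b) { return a <= b ? b : a; }

inline bool hit_select(const Ray& r, const Aabb& b, float t_max) {
    float t_near = 0.0f, t_far = t_max;
    for (int i = 0; i < 3; ++i) {
        float t0 = (b.lo[i] - r.origin[i]) * r.inv_dir[i];
        float t1 = (b.hi[i] - r.origin[i]) * r.inv_dir[i];
        t_near = select_max(t_near, select_min(t0, t1));
        t_far  = select_min(t_far,  select_max(t0, t1));
    }
    return t_near <= t_far;
}

// Checks that both variants agree on one hit and one miss (finite inputs only).
inline void sanity_check() {
    const Aabb box{{0.0f, 0.0f, 0.0f}, {1.0f, 1.0f, 1.0f}};
    // Direction (1, 0.1, 0.1), so inv_dir = (1, 10, 10).
    const Ray hit {{-1.0f, 0.5f, 0.5f}, {1.0f, 10.0f, 10.0f}};
    const Ray miss{{-1.0f, 5.0f, 0.5f}, {1.0f, 10.0f, 10.0f}};
    assert(hit_minmax(hit, box, 100.0f)   && hit_select(hit, box, 100.0f));
    assert(!hit_minmax(miss, box, 100.0f) && !hit_select(miss, box, 100.0f));
}
```

Both variants give the same answer for finite inputs; whether they behave identically for NaN or signed-zero inputs is something I have not checked, and that may be relevant to which instructions a compiler is allowed to pick.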
Thanks
Michael