Multiple ISPC compile targets seemingly break struct alignment generation in C


Matt Boudreaux

Jul 2, 2018, 12:29:02 PM
to Intel SPMD Program Compiler Users
Hey all,

I was running into a maddening issue where I was getting what looked like junk memory / stomps, but turned out to be a basic alignment mismatch between structures shared between C and ISPC. I finally tracked this down to multiple targets breaking the header generation:

So if I have an ISPC file that does a trivial short-vector typedef:

// Types.ispc
typedef float<4> MyVector;

export void PrintTypes()
{
   print("MyVector Size %", sizeof(uniform MyVector));
}

////////////////////////////////


And compile that with multiple targets:

ispc Types.ispc -o "Types_ispc.obj" -h "Types_ispc.h" --target=sse4,avx2

You'll see that the generated header uses 16-byte alignment (corresponding to SSE4), yet when you call "PrintTypes" it prints 32, because the compiler chooses AVX2 under the hood (seemingly after the header has been generated). Removing the SSE4 target (or moving AVX2 to be the first target) fixes the issue.
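To make the mismatch concrete, here is a minimal C sketch of the two views of the same data. The struct names are hypothetical and are not what ispc emits; they only model a 4-float vector aligned to 16 bytes (the header's view) versus 32 bytes (the view assumed for the AVX2 code), embedded in a struct shared between C and ISPC.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical stand-ins for the same ISPC type under two targets. */
typedef struct { alignas(16) float v[4]; } MyVector_sse4; /* header's view */
typedef struct { alignas(32) float v[4]; } MyVector_avx2; /* AVX2 code's view */

/* The same shared struct, laid out under each assumption. */
typedef struct { float scale; MyVector_sse4 dir; } Shared_as_header;
typedef struct { float scale; MyVector_avx2 dir; } Shared_as_avx2;

/* Bytes by which the two layouts disagree about where `dir` starts. */
size_t dir_offset_mismatch(void) {
    return offsetof(Shared_as_avx2, dir) - offsetof(Shared_as_header, dir);
}
```

With these definitions the 32-byte-aligned vector occupies 32 bytes rather than 16, and every field after it shifts, which would look exactly like junk memory from the C side.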

Perhaps the header file could be modified to support all of the provided targets by using the common preprocessor defines (__SSE4_1__, __AVX2__, etc.), so that the final output matches?
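A rough sketch of what such a header could look like follows. The MYVECTOR_ALIGN macro and the struct wrapper are hypothetical, not something ispc generates today, and the approach assumes the C code is compiled for the same instruction set that the ISPC code ends up running with, which is not guaranteed when the target is picked at run time.

```c
#include <assert.h>
#include <stdalign.h>

/* Hypothetical: pick the alignment from the C compiler's own ISA macros. */
#if defined(__AVX2__)
#  define MYVECTOR_ALIGN 32   /* assumed natural alignment for AVX2 */
#else
#  define MYVECTOR_ALIGN 16   /* assumed natural alignment for SSE4 */
#endif

/* C-side declaration of the ISPC `float<4>` typedef. */
typedef struct { alignas(MYVECTOR_ALIGN) float v[4]; } MyVector;

/* Fail the build if the layout does not match the chosen alignment. */
static_assert(sizeof(MyVector) % MYVECTOR_ALIGN == 0,
              "MyVector size must be a multiple of its alignment");
```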

Cheers,

- Matt

Dmitry Babokin

Jul 2, 2018, 12:44:41 PM
to ispc-...@googlegroups.com
Matt,

Does this happen only with short vectors?

What platform are you working on (Linux/Win/Mac)? We have a recent fix which may help. I can build trunk for you to try out.

Dmitry.


Matt Boudreaux

Jul 2, 2018, 2:37:37 PM
to ispc-...@googlegroups.com

I haven’t tried other data types, but I wouldn’t be shocked if they are affected as well. I imagine the header is being generated before the actual target is selected (from the available options).

If you want to give me a version to test (Windows), I’ll happily try out a few test cases and report back. 

Cheers,
- Matt

Dmitry Babokin

Jul 2, 2018, 2:43:37 PM
to ispc-...@googlegroups.com
The Windows version takes a little longer for me to build. I'll try to reproduce the problem, and if it looks fixed to me, I'll give you a Windows build.

Dmitry Babokin

Jul 6, 2018, 2:34:10 AM
to ispc-...@googlegroups.com
This behaviour turns out to be documented in http://ispc.github.io/ispc.html#data-layout:

>There is one subtlety related to data layout to be aware of: ispc stores uniform short-vector types in memory with their first element at the machine's natural vector alignment (i.e. 16 bytes for a target that is using Intel® SSE, and so forth.) This implies that these types will have different layout on different compilation targets. As such, applications should in general avoid accessing uniform short vector types from C/C++ application code if possible.

Still, it's hard to argue that this isn't quite an ugly solution. I tried to dig into the reasoning behind it, and all I was able to find is a passage in the source code that justifies it by the need to map short-vector computations to vector instructions. But modern LLVM does a pretty good job of mapping non-natural-size vectors to vector instructions; at least simple examples work well. Also, with the newer AVX2 and AVX512 targets, this decision doesn't look that good anymore (e.g. mapping float3 to a 512-bit vector is probably a bit too much).

Looking around at other short-vector solutions, I see that OpenCL rounds the number of vector elements up to the next power of 2. This looks reasonable, and I think we should stick to this strategy as well.

Bottom line: the rules for short-vector size and alignment should be target-agnostic. The specific rules need more careful performance analysis (there should probably be a smallest alignment of 16 bytes), but rounding up to the next power of two looks like a good first cut.
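As a rough illustration of that first cut (not the actual ispc implementation), the rule could be sketched in C as below. All function names are hypothetical, and treating the 16 bytes as a lower bound passed in as `min_align` is one possible reading of the idea above.

```c
#include <assert.h>
#include <stddef.h>

/* Round n (>= 1, small) up to the next power of two: 3 -> 4, 4 -> 4, 5 -> 8. */
size_t next_pow2(size_t n) {
    size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

/* Storage in bytes for a short vector with `count` elements of `elem_size`
   bytes each, after padding the element count to a power of two. */
size_t short_vec_bytes(size_t elem_size, size_t count) {
    return elem_size * next_pow2(count);
}

/* Target-independent alignment: the padded size, but never below `min_align`
   (assumed to be 16 here; the real floor would need performance analysis). */
size_t short_vec_align(size_t elem_size, size_t count, size_t min_align) {
    size_t bytes = short_vec_bytes(elem_size, count);
    return bytes < min_align ? min_align : bytes;
}
```

Under this rule a float<3> would be stored like a float<4> (16 bytes, 16-byte aligned) on every target, so a single generated header could describe it for SSE4 and AVX2 alike.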

Implementing this will take some time, but it should be done by the next release. Feel free to open a new issue for that.

Dmitry.
