Where are the AVX-512 speedup results for the ISPC example programs?


bb3141

Sep 25, 2019, 6:42:16 PM
to Intel SPMD Program Compiler Users
Hi,

The speedup of ISPC-vectorized code is very impressive: for SSE and AVX, many of the examples show near-ideal 4x and 8x scaling with the SIMD width.

So I'm very interested in results for the 16-wide AVX-512 targets on the common ISPC examples (like aobench and the ray tracer).
In theory, AVX-512 should be the ideal hardware for ISPC, with a potential sixteen-fold speedup (the only limiting factor being memory bandwidth).
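
For concreteness, the kind of trivially data-parallel kernel I have in mind looks something like this (my own toy example, not one of the shipped examples; the file/function names and the compile line are just for illustration):

    // Each program instance handles one array element; on a 16-wide AVX-512
    // target the compiler packs 16 instances into each vector instruction,
    // which is where the theoretical 16x over scalar code would come from.
    // Compiled with e.g.: ispc --target=avx512skx-i32x16 scale.ispc -o scale.o
    export void scale(uniform float vout[], const uniform float vin[],
                      uniform float factor, uniform int count) {
        foreach (i = 0 ... count) {
            vout[i] = vin[i] * factor;
        }
    }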

Am I missing something, or are the AVX-512 results simply not reported in the "performance" paper?

Does anyone have numbers that compare the speedup (especially relative to 8-wide AVX2), or has anyone built and run the examples on an AVX-512-capable machine?

Thanks & Regards,
bb3141
 
Dmitry Babokin

Sep 25, 2019, 6:56:23 PM
to ispc-...@googlegroups.com
Hi,

We haven't updated the performance numbers for a while; thanks for pointing this out.

I'll make measurements on the machine I have and post the results here. We'll update the "official" numbers a bit later.

AVX-512 is indeed an ideal target for ISPC, though a few factors need to be taken into account. AVX-512 triggers lower CPU frequencies than AVX2, which contributes to a less-than-expected scaling factor when going to 16 lanes.

We also have an 8-wide AVX-512 target, which doesn't trigger the frequency problem.
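
If you want to double-check which gang size a given target actually gives you, a trivial kernel like this reports it via the built-in programCount (the function name is just for illustration):

    // programCount is the gang size the file was compiled for:
    // 8 for --target=avx512skx-i32x8, 16 for --target=avx512skx-i32x16.
    export uniform int gang_size() {
        return programCount;
    }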

Right now the most common platform with AVX-512 is the Skylake server platform (Purley).

AVX-512 is also (since recently) available on the client in Ice Lake chips, but unfortunately I don't have such a machine within my reach. It would be very interesting to experiment with their performance. But they are *client* chips, which means they have fewer AVX execution units than the server parts.

I'll run the tests on a Skylake server and post the results here.

Dmitry.


Dmitry Babokin

Sep 25, 2019, 9:41:24 PM
to ispc-...@googlegroups.com
I'm attaching perf measurements on an SKX machine for the avx2-i32x8, avx2-i32x16, avx512skx-i32x8, and avx512skx-i32x16 targets.

Note that some benchmarks behave better with "double-pumped" targets (i.e. avx2-i32x16) than with targets matching the native architecture width. So it makes sense to have a closer look at individual benchmarks rather than just looking at the geomean.

Also, don't pay attention to the "ISPC+tasks" numbers, as the examples are not really tuned for such a wide machine (mine has 2x28 cores with HT enabled, i.e. 112 virtual cores). These runs simply don't have enough work for that many cores.
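
For reference, the tasked variants split their work roughly along these lines (a simplified sketch with made-up names, not the actual example code; it needs a task system such as the examples' tasksys.cpp linked in). With only a handful of tasks launched, most of the 112 virtual cores simply have nothing to do:

    // One task processes a contiguous block of rows; taskIndex and taskCount
    // are ISPC built-ins identifying this task within the launch.
    task void shade_rows(uniform float out[], uniform int width,
                         uniform int height) {
        uniform int rows_per_task = (height + taskCount - 1) / taskCount;
        uniform int y0 = taskIndex * rows_per_task;
        uniform int y1 = min(y0 + rows_per_task, height);
        for (uniform int y = y0; y < y1; y++)
            foreach (x = 0 ... width)
                out[y * width + x] = x * y;   // placeholder work
    }

    export void shade(uniform float out[], uniform int width,
                      uniform int height, uniform int num_tasks) {
        // If num_tasks is small relative to the number of hardware threads,
        // the task system cannot keep all the cores busy.
        launch[num_tasks] shade_rows(out, width, height);
    }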

The raw speedup geomeans (against the clang-8 compiler) are:
avx2-i32x8: 6.85
avx2-i32x16: 7.33
avx512skx-i32x8: 7.13
avx512skx-i32x16: 9.18

Dmitry.
perf-avx2-i32x8.txt
perf-avx2-i32x16.txt
perf-avx512-i32x16.txt
perf-avx512-i32x8.txt

Brian Green

Oct 4, 2019, 5:48:50 PM
to Intel SPMD Program Compiler Users
Have you found hyper-threading to have a positive impact on vector-width scalability? Among other things, I'm wondering if it can help speed up gathers. I could test this myself; it just isn't the easiest thing to turn on and off on production systems.

Cheers,
-Brian

Dmitry Babokin

Oct 4, 2019, 7:32:14 PM
to ispc-...@googlegroups.com
I haven't studied the effect of HT, but I assume that workloads with sparse memory access should benefit from it, i.e. you get 2x more workers to wait on memory. For non-sparse memory access, I would expect the HW prefetcher to kick in and solve the problem.
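
As a concrete example, a toy kernel like this (names made up) turns the varying-indexed load into a hardware gather on AVX-512 targets; with sparse indices it is largely latency-bound, which is exactly where a second hyper-thread could help hide the stalls:

    // src[idx[i]] uses a varying index, so it compiles to a gather;
    // out[i] is a plain contiguous store.
    export void gather_copy(uniform float out[], const uniform float src[],
                            const uniform int idx[], uniform int count) {
        foreach (i = 0 ... count) {
            out[i] = src[idx[i]];
        }
    }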

Of course, this positive effect is limited by bandwidth. Once you hit the bandwidth limit, nothing will help.

Dmitry.
