A few questions about compute shaders workgroup size...

Mark Sibly

Apr 23, 2023, 2:29:51 AM
to Dawn Graphics
Hi,

The WebGPU docs seem to suggest that workgroup x and y can be at most 256, and z at most 64 (https://gpuweb.github.io/gpuweb/#limits), but the compute boids demo does Dispatch(1000). Does this mean that 4 lots of 256 workgroups will be created, so workgroupID.x is always < 256?

Also, isn't 1000 workgroups actually 1000 * some number of HW 'threads'? If so, the boids demo seems to be launching many more threads than it needs, unless I've misread the code. Or are the params to Dispatch() actually threads? If not, how do I find out how many threads are in a workgroup, so I can work out how many workgroups to start if I want N threads?
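The limits linked above constrain two different things: the workgroup size declared with @workgroup_size in the shader, and the number of workgroups passed to the dispatch call. A minimal TypeScript sketch that checks both, assuming the spec's default limit values (a device created with higher requiredLimits may allow more). DEFAULT_LIMITS, workgroupSizeOk and dispatchCountOk are hypothetical helpers, not part of the WebGPU API.

```typescript
// Default limits from the WebGPU spec's limits table (what a device gets
// when no higher limits are requested). Hypothetical constant name.
const DEFAULT_LIMITS = {
  maxComputeWorkgroupSizeX: 256,
  maxComputeWorkgroupSizeY: 256,
  maxComputeWorkgroupSizeZ: 64,
  maxComputeInvocationsPerWorkgroup: 256,
  maxComputeWorkgroupsPerDimension: 65535,
} as const;

type Vec3 = [number, number, number];

// Checks a size declared in WGSL as @workgroup_size(x, y, z):
// each dimension has its own cap, and so does the product x * y * z.
function workgroupSizeOk([x, y, z]: Vec3): boolean {
  return (
    x >= 1 && y >= 1 && z >= 1 &&
    x <= DEFAULT_LIMITS.maxComputeWorkgroupSizeX &&
    y <= DEFAULT_LIMITS.maxComputeWorkgroupSizeY &&
    z <= DEFAULT_LIMITS.maxComputeWorkgroupSizeZ &&
    x * y * z <= DEFAULT_LIMITS.maxComputeInvocationsPerWorkgroup
  );
}

// Checks the workgroup counts passed to dispatchWorkgroups(x, y, z).
// This is a separate, much larger limit than the workgroup size.
function dispatchCountOk(counts: Vec3): boolean {
  return counts.every(
    (c) => Math.floor(c) === c && c >= 0 &&
      c <= DEFAULT_LIMITS.maxComputeWorkgroupsPerDimension,
  );
}

console.log(workgroupSizeOk([64, 1, 1]));   // true
console.log(workgroupSizeOk([256, 256, 1])); // false: 65536 invocations > 256
console.log(dispatchCountOk([1000, 1, 1]));  // true: 1000 workgroups is fine
```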

Bye!
Mark


Corentin Wallez

Apr 24, 2023, 4:29:53 AM
to Mark Sibly, Dawn Graphics
Hey Mark,

The size of the workgroup is annotated on the compute entrypoint. It is set to 64x1x1 for the compute boids, so dispatchWorkgroups(1000) will cause 64*1000 invocations to be created. This confusion is why Dispatch() was renamed to DispatchWorkgroups().
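To make the arithmetic concrete: with a 64x1x1 workgroup and dispatchWorkgroups(1000), each invocation gets global_invocation_id = workgroup_id * 64 + local_invocation_id, and workgroup_id.x runs from 0 to 999 rather than being capped at 256. The sketch below enumerates those values on the CPU. It only models the WGSL builtins for a 1-D dispatch (the GPU runs invocations in no guaranteed order), and enumerateInvocations is a hypothetical name.

```typescript
// CPU model of the 1-D WGSL builtins; names are illustrative only.
interface Invocation {
  workgroupId: number;        // workgroup_id.x: 0 .. workgroupCount - 1
  localInvocationId: number;  // local_invocation_id.x: 0 .. workgroupSize - 1
  globalInvocationId: number; // global_invocation_id.x
}

function enumerateInvocations(workgroupSize: number, workgroupCount: number): Invocation[] {
  const out: Invocation[] = [];
  for (let wg = 0; wg < workgroupCount; wg++) {
    for (let local = 0; local < workgroupSize; local++) {
      out.push({
        workgroupId: wg,
        localInvocationId: local,
        // Same relation WGSL uses: workgroup_id * workgroup_size + local_invocation_id
        globalInvocationId: wg * workgroupSize + local,
      });
    }
  }
  return out;
}

// @workgroup_size(64) with dispatchWorkgroups(1000)
const invs = enumerateInvocations(64, 1000);
console.log(invs.length); // 64000
// last entry: workgroupId 999, localInvocationId 63, globalInvocationId 63999
console.log(invs[invs.length - 1]);
```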

Hope this helps,

Corentin

--
You received this message because you are subscribed to the Google Groups "Dawn Graphics" group.

Kai Ninomiya

Apr 24, 2023, 1:44:03 PM
to Corentin Wallez, Mark Sibly, Dawn Graphics
Hey Mark,

Which compute boids demo are you referring to?
The webgpu-samples one uses dispatchWorkgroups(Math.ceil(numParticles / 64)) with @workgroup_size(64)
... though this seems to be missing a check in the shader to skip if index is past the max particle index?
The Dawn one uses DispatchWorkgroups(kNumParticles) with @workgroup_size(1)
... which we should optimize to match webgpu-samples.
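The webgpu-samples pattern of rounding the workgroup count up, plus the bounds check Kai suggests, can be modelled on the CPU like this. A sketch assuming a 1-D dispatch and @workgroup_size(64); the helper names and the numParticles guard are illustrative, not taken from either sample.

```typescript
const WORKGROUP_SIZE = 64; // must match @workgroup_size(64) in the shader

// Host side: round up so every particle gets an invocation.
// Equivalent integer form: Math.floor((n + size - 1) / size).
function workgroupCountFor(numParticles: number, workgroupSize: number = WORKGROUP_SIZE): number {
  return Math.ceil(numParticles / workgroupSize);
}

// Shader side, emulated: rounding up launches a few extra invocations,
// so the kernel needs an early-out such as (hypothetical WGSL)
//   if (index >= numParticles) { return; }
function simulateDispatch(numParticles: number, workgroupSize: number = WORKGROUP_SIZE) {
  const count = workgroupCountFor(numParticles, workgroupSize);
  let updated = 0;
  let skipped = 0;
  for (let index = 0; index < count * workgroupSize; index++) {
    if (index >= numParticles) {
      skipped++; // past the last particle: do nothing
      continue;
    }
    updated++; // a real kernel would update particle[index] here
  }
  return { count, updated, skipped };
}

// count 16 (1024 invocations), updated 1000, skipped 24
console.log(simulateDispatch(1000));
```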

-Kai (he/they)


Mark Sibly

Apr 24, 2023, 4:21:47 PM
to Kai Ninomiya, Corentin Wallez, Dawn Graphics
Hi,

> The size of the workgroup is annotated on the compute entrypoint.

OK, I had managed to completely miss this somehow!

> Which compute boids demo are you referring to?

The Dawn one, with a workgroup size of 1, so it was a bit confusing, but the other one looks more informative...

Still, why 64 for workgroup_size? Does 1 mean no parallelism? Why not 256, then? I did see this magic number mentioned elsewhere; I'm just curious where it comes from...

Anyway, I'll wait until I write some actual compute shader code before asking any more stupid questions - I've got a ton!

Bye,
Mark




Kai Ninomiya

Apr 24, 2023, 6:43:13 PM
to Mark Sibly, Corentin Wallez, Dawn Graphics
Hardware executes invocations in fixed-width "subgroups" - it's the number of SIMD lanes in the SIMT execution. The size depends on the vendor and architecture, but I'm told it's typically 32 or 64. There are also details that make things more complicated than that but that's a first approximation.

I'm not an expert in this, so take what I say with a grain of salt (and someone can correct me if I'm wrong). Anecdotally, if your workgroup is smaller than the subgroup size on a given piece of hardware, drivers usually just leave the other lanes empty, instead of combining multiple workgroups into one subgroup. On the other hand, if the workgroup (say 64) is larger than a subgroup (say 32), then it will just de-parallelize (run 32 up to the first control flow barrier, then switch to run the other 32, and so on until it's done). Hence for reasonable performance, we want to make sure we have workgroups which are a multiple of, and at least as large as, the largest common subgroup size (which is 64) without being so large it defeats parallelization. The simple choice of 64 works well overall, even though tweaking workgroup sizes can improve performance depending on the hardware and exact shader logic.
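A simplified model of the lane-occupancy argument above, assuming a workgroup of W invocations is split into ceil(W / S) subgroups of S lanes and leftover lanes sit idle. Real drivers can differ (for example by choosing the subgroup size per pipeline), so treat this as a back-of-the-envelope sketch; laneUtilization is a hypothetical helper.

```typescript
// Rough model: W invocations packed into subgroups of S lanes each.
function laneUtilization(workgroupSize: number, subgroupSize: number) {
  const subgroups = Math.ceil(workgroupSize / subgroupSize);
  const lanes = subgroups * subgroupSize;
  return {
    subgroups,
    idleLanes: lanes - workgroupSize,       // lanes launched but doing no work
    utilization: workgroupSize / lanes,     // fraction of lanes doing work
  };
}

// Compare a few workgroup sizes on the two common subgroup sizes.
for (const s of [32, 64]) {
  for (const w of [1, 32, 64, 96]) {
    const u = laneUtilization(w, s);
    console.log(`subgroup ${s}, workgroup ${w}: ${u.subgroups} subgroup(s), ${u.idleLanes} idle lanes`);
  }
}
```

Under this model, a workgroup of 64 leaves no idle lanes with either S = 32 or S = 64, while a workgroup of 1 wastes 31 or 63 lanes, which is consistent with 64 as a reasonable default.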

I haven't read all of this, but this article from Faith Ekstrand gets into some interesting nitty-gritty about the tradeoffs of different hardware subgroup sizes (not user workgroup sizes). In the 4th paragraph there's mention of the subgroup sizes on different vendors, which is how I stumbled upon it.
https://www.gfxstrand.net/faith/blog/2020/10/does-subgroup-wave-size-matter/

-Kai (he/they)

Mark Sibly

Apr 24, 2023, 8:18:28 PM
to Kai Ninomiya, Corentin Wallez, Dawn Graphics
Hi Kai,

Great beginner's article, thanks!