Hi!
I just wanted to give some more information on the issue we're seeing. The sample file I'm using to benchmark each GPU is here:
In the test, I measure the time between requestAnimationFrame (RAF) callbacks and use that to determine the user-perceived FPS (first two lines in the program). I also measure the CPU time spent in the RAF callback (wall-clock end - wall-clock start; third line in the program). Finally, I measure the GPU time spent per frame using a WebGL 2.0 timer query on every 10th frame and use that to determine the GPU-based FPS (fourth line in the program). The test has an option to add CPU-only spinning to simulate a heavy CPU load versus a heavy GPU load; to increase the GPU load, I simply issue more draws. Ideally, we want to be able to determine whether a user is bottlenecked on the CPU or the GPU using these timers.
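For reference, here's a rough sketch of how I take the three measurements inside the RAF callback. This is a simplified reconstruction, not the actual benchmark file; the function and variable names are my own, and it assumes a WebGL2 context `gl` with the `EXT_disjoint_timer_query_webgl2` extension available:

```javascript
// Pure helper: convert a frame duration in milliseconds to an FPS figure.
const fpsFromMs = (ms) => 1000 / ms;

// Sketch of the per-frame measurement loop (names are my own, not the
// actual benchmark's). Pass a WebGL2 context; call the returned function
// from requestAnimationFrame.
function makeFrameTimer(gl) {
  const ext = gl.getExtension('EXT_disjoint_timer_query_webgl2');
  let lastRafTs = null; // previous RAF timestamp
  let query = null;     // outstanding GPU timer query, if any
  let frame = 0;

  return function onRaf(rafTs) {
    // 1) Perceived FPS: delta between successive RAF timestamps.
    if (lastRafTs !== null) {
      console.log('perceived FPS:', fpsFromMs(rafTs - lastRafTs));
    }
    lastRafTs = rafTs;

    // 2) CPU time in RAF: wall-clock around the frame's work.
    const cpuStart = performance.now();

    // 3) GPU time: wrap every 10th frame's draws in a timer query.
    const timeThisFrame = ext && frame % 10 === 0 && query === null;
    if (timeThisFrame) {
      query = gl.createQuery();
      gl.beginQuery(ext.TIME_ELAPSED_EXT, query);
    }

    // ... issue draw calls here ...

    if (timeThisFrame) gl.endQuery(ext.TIME_ELAPSED_EXT);

    // Poll the outstanding query; the result arrives a few frames later.
    if (query !== null &&
        gl.getQueryParameter(query, gl.QUERY_RESULT_AVAILABLE) &&
        !gl.getParameter(ext.GPU_DISJOINT_EXT)) {
      const gpuNs = gl.getQueryParameter(query, gl.QUERY_RESULT); // nanoseconds
      console.log('GPU FPS:', fpsFromMs(gpuNs / 1e6));
      gl.deleteQuery(query);
      query = null;
    }

    console.log('CPU-time FPS:', fpsFromMs(performance.now() - cpuStart));
    frame++;
  };
}
```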
Testing on my Apple-silicon Mac, I can see that the GPU query timer is generally accurate (i.e., if the GPU is the bottleneck, perceived FPS ~= GPU-query-timer-based FPS). Using the program above, I can therefore bucket my experience into four categories:
1) If neither CPU nor GPU are constrained, Perceived FPS ~= Refresh Rate (60FPS) AND WebGL2 FPS >> Perceived FPS. No Bottlenecks.
2) If the CPU is heavily constrained (50 ms sleep), we can see Perceived FPS < Refresh Rate AND WebGL2 FPS >> Perceived FPS. This implies the CPU is the bottleneck.
3) If the GPU is heavily constrained (lots of draws per frame), we can see Perceived FPS < Refresh Rate AND WebGL2 FPS ~= Perceived FPS AND FPS derived just from CPU time spent in RAF >> WebGL2 FPS. This implies the GPU is the bottleneck.
4) If the CPU and GPU are equally and heavily constrained, we can see Perceived FPS < Refresh Rate AND WebGL2 FPS ~= Perceived FPS AND FPS derived just from CPU time in RAF ~= WebGL2 FPS. This implies both GPU and CPU are equally bottlenecking. This is validated by checking the Chrome profiler and seeing that the GPU is almost fully utilized.
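The four buckets above can be sketched as a small decision function. This is a hypothetical illustration: the `approx` tolerance and the "much greater" ratio are my own placeholder thresholds, not values from the benchmark.

```javascript
// Placeholder thresholds -- my own assumptions, not measured values.
const approx = (a, b, tol = 0.15) => Math.abs(a - b) / b < tol; // "~="
const muchGreater = (a, b) => a > 2 * b;                        // ">>"

// Classify a frame sample into one of the four buckets described above.
function classifyBottleneck({ perceivedFps, webgl2Fps, cpuTimeFps, refreshRate }) {
  // 1) Hitting refresh rate: nothing is constrained.
  if (approx(perceivedFps, refreshRate)) return 'no bottleneck';
  // 2) GPU could go much faster than we're presenting: CPU bound.
  if (muchGreater(webgl2Fps, perceivedFps)) return 'CPU bound';
  // From here on, WebGL2 FPS ~= perceived FPS.
  // 3) CPU-time FPS far exceeds GPU FPS: GPU bound.
  if (muchGreater(cpuTimeFps, webgl2Fps)) return 'GPU bound';
  // 4) CPU-time FPS ~= GPU FPS: both equally bottlenecking.
  if (approx(cpuTimeFps, webgl2Fps)) return 'CPU and GPU bound';
  return 'indeterminate';
}
```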
The only case that isn't distinguishable is when the CPU issues too much GPU work and starts blocking on GPU calls, in which case both the CPU and GPU appear to be bottlenecking. This could conflate which bucket we're in, so I'm ignoring it for now.
What I've noticed testing across several devices is that the two Intel-based Windows laptops and the one Intel-based Mac laptop I've tested on do not consistently report scenario number 3 (and by extension, scenario number 4).
What I'm curious about is whether my implementation is incorrect, or whether I'm overlooking a reason it wouldn't work on Intel-based Windows/Mac computers. Open to any suggestions!
Thank you!
Yassir Solomah