Re: [chromium-dev] Why might Major GC cause a CommandBuffer flush and cause more work for the GPU?

8 views
Skip to first unread message

Ken Russell

unread,
Apr 26, 2022, 4:44:17 PM4/26/22
to kallyn...@yeticgi.com, graphics-dev, Chromium-dev
Adding graphics-dev@ though in general it's best to not cross-post. Since this issue involves GC, folks on the chromium-dev@ side may have more advice.

I'd suggest you use the heap profiler and try to figure out what kinds of objects are being collected during that full GC. It's entirely possible that if your app, or Three.js, are allocating and later unreferencing WebGL objects at the JavaScript level, that their finalizers invoked during a full GC are enqueuing GPU work that's causing your application to stutter.

It's for this reason that it's strongly recommended to manage the lifetimes of graphics resources yourself, and not rely on the garbage collector to do so. This means reusing objects like buffers and textures, and deleting the resources via the WebGL API when they're no longer used.

Hope this helps, and that you can get to the bottom of what's going on. The smaller a test case you can provide, the better - the one offered here still looks pretty large from the standpoint of diagnosing browser behavior with it.

-Ken



On Tue, Apr 26, 2022 at 9:23 AM Kallyn Gowdy <kallyn...@yeticgi.com> wrote:
Hey all!

I'm having an issue on a website where intermittent stuttering is occurring, but it is not exactly clear why it is happening or how to prevent it.
As you will see below, it appears that GC has something to do with the problem, but the frame stuttering is actually caused by the GPU doing more work as a result of the GC pass instead of the GC pass causing the main thread to not dispatch rendering commands in time.

What is clear is that this happens on Chrome/Edgium 99 and 100 and on Windows (Intel) and MacOS (M1 Max), so it is likely a behavior of Chrome itself.
Notably, this stuttering doesn't occur on Firefox or Safari.

For context, this website uses requestAnimationFrame() to create a rendering loop for a Three.js based 3D experience. Normally the loop runs at a solid ~160 FPS (to match my monitor refresh rate), but occasionally it will pause for a fraction of a second.

You can find a reproduction at this link: https://alpha.ab1.bot/?autoLoad=stutteringIssue01
Additionally, you can find the source here: https://github.com/casual-simulation/casualos

To make the issue more apparent, you can drag the solid purple block onto one of the square wireframes to cause it to animate along the path.
The stuttering happens regardless of if the block is animating, but it is easier to see when it is.

After profiling, it appears that these stutters occur after Major GC passes, but are not directly related to the GC pass.
You can see in the included profile that before each group of dropped frames there is a Major GC pause, but oddly the GC doesn't last long enough to actually cause frames to be dropped on the main thread.
Instead, we see a couple more "animation frame fired" events and then suddenly we're dropping frames. Looking at the GPU, we can see that it is very busy for those dropped frames.
Where a normal frame takes around ~1ms, the combined GPU activity after GC takes a whopping ~44ms.

So I decided that maybe I need to get a more detailed look at things, and pulled out https://www.ui.perfetto.dev/ with Edgium.
After verifying that the issue still occurs, I recorded a trace and started inspecting work around the dropped frames.
And this is the confusing part, because it appears that the GC pass is triggering extra work on the GPU.

In normal operation, the following sequence of steps happen in order according to the trace and flow events:

1. The GpuVSyncThread posts a task to the VizCompositorThread at a set interval. (~160 Times per second)
2. The VizCompositorThread sends a message to the Renderer Process Compositor which then sends the BeginMainFrame event to the CrRendererMain thread.
3. VizCompositorThread then sends a message to CrGpuMain to tell it to swap the framebuffers.
4. CrRendererMain calls into V8 to trigger the requestAnimationFrame() callback, which then triggers a CommandBufferHelper::Flush which triggers CommandBufferStub::OnAsyncFlush on CrGpuMain indirectly via the GPU Process ChildIOThread. (Technically there is also some additional Compositor work that executes after the JS function call which triggers a second CommandBufferStub::OnAsyncFlush call as well, but I don't want to be too detailed)
5. CrGpuMain then executes all the command buffer commands and the frame rendering is done.

However when a stutter occurs, it looks like this:

1. The GpuVSyncThread posts a task to the VizCompositorThread at a set interval. (~160 Times per second)
2. The VizCompositorThread sends a message to the Renderer Process Compositor which then sends the BeginMainFrame event to the CrRendererMain thread.
3. VizCompositorThread then sends a message to CrGpuMain to tell it to swap the framebuffers.
4. CrRendererMain calls into V8 to trigger the requestAnimationFrame() callback, which then triggers a CommandBufferHelper::Flush which triggers CommandBufferStub::OnAsyncFlush on CrGpuMain indirectly via the GPU Process ChildIOThread. V8 also triggers a Minor GC pass.
5. Minor GC Pass executes and triggers several incremental GC passes.
6. Incremental GC schedules Major GC.
7. Another requestAnimationFrame() callback executes and properly flushes the command buffer and schedules the draw calls. (Just like the normal sequence)
8. Major GC starts, marks objects and then for some reason triggers two additional CommandBufferHelper::Flush calls during CppGC.SweepInvokePreFinalizers. One of these triggers a CommandBufferStub::OnAsyncFlush that takes ~18ms to execute.
9. A third requestAnimationFrame() callback executes and it flushes the command buffer like normal, however this time it takes the GPU ~35ms to execute it this time instead of the usual ~1ms.
10. Finally, the swap buffers task runs for the next real frame and that takes ~41ms instead of the usual ~1ms.

I'm a little at a loss for why GC would create additional work for the GPU.

For now, my best guess is that GC is incorrectly collecting the command buffers and triggers the destructor for CommandBufferHelper which then frees some of the associated memory resources.
Then on the next frame, everything needs to be setup again which causes the next two flushes to take much longer than they otherwise would. Part of my thinking for this is validated by the fact that CommandBufferHelper::~CommandBufferHelper() calls FreeRingBuffer() (in gpu/command_buffer/client/cmd_buffer_helper.cc), but I'm not really sure how the lost state would be reconciled - though I guess Chromium probably has a mechanism for freeing and resuming GPU resources in unused/idle/suspended tabs.

Does anyone have ideas on this?

I'm including both the profile and trace that I've recorded. Hopefully they're useful for debugging.

Thanks for helping!

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
---
You received this message because you are subscribed to the Google Groups "Chromium-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/chromium-dev/0d05262c-8aaf-4b3b-a4ff-f56ca5434af2n%40chromium.org.

Kallyn Gowdy

unread,
Apr 27, 2022, 4:54:20 PM4/27/22
to Michael Lippautz, k...@chromium.org, graphics-dev, Chromium-dev
Ok, so I'm back with an update. It appears that this is indeed an application issue, and Chrome/Edge simply manifest it in a different manner than Firefox/Safari.

The simple version is that the app is allocating new WebGLBuffer objects every frame, and that is likely what is causing the stuttering via GC.

The part that I wasn't thinking about was that since these objects only get cleaned up every ~10 seconds, that is a lot of buffers that need to be released when they're created at ~160fps.
Additionally, the library that was allocating these buffers had a subtle bug where it appeared that it would reuse them if the number of elements in the buffer didn't change, but the check would always fail because it was comparing the number of numbers to the number of elements. (e.g. count !== length)

So, I was expecting that the app did not allocate WebGL resources every frame when in fact it did. I should've known that it would be the thing you first think of when you see GC and dropped frames close together haha.

Thanks for the help, especially with telling me to look at the heap. I was able to catch the issue pretty quickly once I actually looked at it.
Additionally, I got kinda lucky because we recently replaced the problem library for unrelated reasons and I just noticed that the stuttering problem was pretty much gone.
There's still some unnecessary allocations done by the replacement, but it is better by about a factor of 6.

Thanks again!

~ Kal

On Wed, Apr 27, 2022 at 8:28 AM Michael Lippautz <mlip...@chromium.org> wrote:


On Tue, Apr 26, 2022 at 10:44 PM Ken Russell <k...@chromium.org> wrote:
Adding graphics-dev@ though in general it's best to not cross-post. Since this issue involves GC, folks on the chromium-dev@ side may have more advice.

I'd suggest you use the heap profiler and try to figure out what kinds of objects are being collected during that full GC. It's entirely possible that if your app, or Three.js, are allocating and later unreferencing WebGL objects at the JavaScript level, that their finalizers invoked during a full GC are enqueuing GPU work that's causing your application to stutter.

+1 here as this is likely the reason, see also below.
GC may happen because of general memory pressure through JS allocations, so that's expected.

As for why GC creates additional work: Pre-finalizers allow user objects to specify code to be run synchronously on the main render thread before the object is reclaimed later. There's some object type in Blink somewhere that specifies a pre-finalizer which ends up calling `CommandBufferHelper::Flush()`. I think it may be interesting to dig into which object type is being reclaimed and why that requires issuing a Flush() operation. 
 

For now, my best guess is that GC is incorrectly collecting the command buffers and triggers the destructor for CommandBufferHelper which then frees some of the associated memory resources.

Which buffers are you thinking about? The pre-finalizer in the trace really suggests having a Blink object (in third_party/blink/renderer/) that specifies via some proxy that this flush should happen when it is being reclaimed. It's generally not the destructor but an explicit Flush call we are after because destructor invocations would happen later (in e.g. concurrent sweeping).
 
-Michael

Michael Lippautz

unread,
Apr 27, 2022, 4:54:39 PM4/27/22
to k...@chromium.org, kallyn...@yeticgi.com, graphics-dev, Chromium-dev
On Tue, Apr 26, 2022 at 10:44 PM Ken Russell <k...@chromium.org> wrote:
Adding graphics-dev@ though in general it's best to not cross-post. Since this issue involves GC, folks on the chromium-dev@ side may have more advice.

I'd suggest you use the heap profiler and try to figure out what kinds of objects are being collected during that full GC. It's entirely possible that if your app, or Three.js, are allocating and later unreferencing WebGL objects at the JavaScript level, that their finalizers invoked during a full GC are enqueuing GPU work that's causing your application to stutter.

+1 here as this is likely the reason, see also below.
 
GC may happen because of general memory pressure through JS allocations, so that's expected.

As for why GC creates additional work: Pre-finalizers allow user objects to specify code to be run synchronously on the main render thread before the object is reclaimed later. There's some object type in Blink somewhere that specifies a pre-finalizer which ends up calling `CommandBufferHelper::Flush()`. I think it may be interesting to dig into which object type is being reclaimed and why that requires issuing a Flush() operation. 
 
For now, my best guess is that GC is incorrectly collecting the command buffers and triggers the destructor for CommandBufferHelper which then frees some of the associated memory resources.

Ken Russell

unread,
Apr 27, 2022, 4:56:35 PM4/27/22
to Kallyn Gowdy, Michael Lippautz, graphics-dev, Chromium-dev
You're welcome Kal! Glad you figured out the problem, and that it was solvable at the app / library level!

-Ken


Reply all
Reply to author
Forward
0 new messages