GPU process crashy in M125 on Windows, possibly SwANGLE related

Justin Novosad

May 17, 2024, 10:00:07 AM
to Graphics-dev, geof...@chromium.org, mat...@chromium.org
Hi Graphics devs,

We released Arc 1.2 on Windows yesterday, which uses Chromium 125, and we're seeing a spike in GPU process crashes. Unfortunately, it's crashing in an obscure way that escapes our crash reporting, so we don't have GPU process stack traces. What we are seeing, however, is the browser process crashing here due to running out of GPU fallback modes:

This means the GPU process is in a crash loop while in DISPLAY_COMPOSITOR mode.  This issue is new in M125 and specific to Windows. It affects a small number of users, but hits them a lot.

I believe it's probably a regression in SwANGLE for two reasons:
1) It's happening after falling back to having all GPU features disabled.
2) Several SwANGLE-related crashes have mysteriously stopped coming in through our crash reporting infrastructure, which I find suspicious, and which might explain why we're not getting GPU process crash reports that correlate with these browser process crashes.

I noticed this issue, which suggests removing the SwiftShader fallback: https://crbug.com/40277080

Unfortunately the plan referenced in that bug is an internal go link. Are there more details you're willing to share with the Chromium community?

Thanks, Justin

Vasiliy Telezhnikov

May 17, 2024, 10:53:25 AM
to Justin Novosad, Graphics-dev, geof...@chromium.org, mat...@chromium.org
SwANGLE fallback (and its removal) affects only WebGL. In DISPLAY_COMPOSITOR mode, WebGL is already unavailable, and the GPU process will not initialize the GPU or SwiftShader. (SwANGLE is used in SWIFTSHADER mode, and that mode is also disabled if we crash too many times with it.)

SoftwareRenderer is still there and not going away, so we need GPU stack traces (or at least more details about when it crashes and on what hardware/OS) to figure out why the GPU process crashes.

- Vasiliy
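
For readers less familiar with the fallback behaviour discussed above, here is a minimal Python sketch of the idea: each GPU mode tolerates a limited number of GPU process crashes, then the next mode is tried, and running out of modes corresponds to the intentional browser crash. This is a toy model, not Chromium's implementation. The mode name `HARDWARE`, the class names, and the `crash_limit` parameter are assumptions made for illustration; only `SWIFTSHADER` and `DISPLAY_COMPOSITOR` are named in the thread.

```python
from enum import Enum


class GpuMode(Enum):
    # HARDWARE is an assumed name; the thread names only the other two.
    HARDWARE = 1
    SWIFTSHADER = 2
    DISPLAY_COMPOSITOR = 3


class UnusableGpuProcess(RuntimeError):
    """Stands in for the intentional browser crash described in the thread."""


class GpuFallbackModel:
    """Toy model: drop the current mode after too many GPU process crashes."""

    def __init__(self, crash_limit=3):
        # crash_limit is a hypothetical knob, not a value taken from Chromium.
        self.crash_limit = crash_limit
        self.remaining = [
            GpuMode.HARDWARE,
            GpuMode.SWIFTSHADER,
            GpuMode.DISPLAY_COMPOSITOR,
        ]
        self.crashes_in_mode = 0

    @property
    def mode(self):
        # The mode currently in use is the first one still on the list.
        return self.remaining[0]

    def on_gpu_process_crash(self):
        """Record one crash and return the mode to use for the restart."""
        self.crashes_in_mode += 1
        if self.crashes_in_mode < self.crash_limit:
            return self.mode
        # Too many crashes in this mode: fall back to the next one.
        self.remaining.pop(0)
        self.crashes_in_mode = 0
        if not self.remaining:
            raise UnusableGpuProcess("ran out of GPU fallback modes")
        return self.mode
```

In this model, a crash loop that persists in `DISPLAY_COMPOSITOR` mode is the only path that raises `UnusableGpuProcess`, which matches the symptom reported at the top of the thread.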

Justin Novosad

May 17, 2024, 3:42:23 PM
to Vasiliy Telezhnikov, Graphics-dev, geof...@chromium.org, mat...@chromium.org
Thanks. I'll share more info when I can. Are you not seeing a similar crash uptick in Chrome 125? Look for browser process stack traces with "IntentionallyCrashBrowserForUnusableGpuProcess" on Windows.

Marshall Greenblatt

May 17, 2024, 7:06:25 PM
to Justin Novosad, Vasiliy Telezhnikov, Graphics-dev, geof...@chromium.org, mat...@chromium.org
I'm also seeing some new/unusual GPU process exits with M125 (log messages below). Setting a breakpoint in GpuServiceImpl::MaybeExitOnContextLost shows |context_lost_reason=kUnknown| and the following call stack:

> libcef.dll!viz::GpuServiceImpl::MaybeExitOnContextLost(bool) Line 1119 C++
  libcef.dll!gpu::GpuChannelManager::OnContextLost(int context_lost_count, bool synthetic_loss, gpu::error::ContextLostReason context_lost_reason) Line 1109 C++
  libcef.dll!gpu::CommandBufferStub::CheckContextLost() Line 707 C++
  libcef.dll!gpu::CommandBufferStub::OnParseError() Line 404 C++
  libcef.dll!gpu::CommandBufferService::SetParseError(gpu::error::Error error) Line 362 C++
  libcef.dll!gpu::CommandBufferStub::MarkContextLost() Line 729 C++
  libcef.dll!gpu::GpuChannel::MarkAllContextsLost() Line 856 C++
  libcef.dll!gpu::GpuChannelManager::LoseAllContexts() Line 641 C++
  libcef.dll!viz::GpuServiceImpl::LoseAllContexts() Line 997 C++
  libcef.dll!gpu::GpuChannelManager::OnContextLost(int context_lost_count, bool synthetic_loss, gpu::error::ContextLostReason context_lost_reason) Line 1105 C++
  libcef.dll!gpu::CommandBufferStub::CheckContextLost() Line 707 C++
  libcef.dll!gpu::CommandBufferStub::OnParseError() Line 404 C++
  libcef.dll!gpu::CommandBufferService::SetParseError(gpu::error::Error error) Line 362 C++
  libcef.dll!gpu::CommandBufferStub::MarkContextLost() Line 729 C++
  libcef.dll!gpu::GpuChannel::MarkAllContextsLost() Line 856 C++
  libcef.dll!gpu::GpuChannelManager::LoseAllContexts() Line 641 C++
  libcef.dll!viz::GpuServiceImpl::LoseAllContexts() Line 997 C++
  libcef.dll!gpu::GpuChannelManager::OnContextLost(int context_lost_count, bool synthetic_loss, gpu::error::ContextLostReason context_lost_reason) Line 1105 C++
  libcef.dll!gpu::CommandBufferStub::CheckContextLost() Line 707 C++
  libcef.dll!gpu::CommandBufferStub::OnParseError() Line 404 C++
  libcef.dll!gpu::CommandBufferService::SetParseError(gpu::error::Error error) Line 362 C++
  libcef.dll!gpu::CommandBufferStub::MarkContextLost() Line 729 C++
  libcef.dll!gpu::GpuChannel::MarkAllContextsLost() Line 856 C++
  libcef.dll!gpu::GpuChannelManager::LoseAllContexts() Line 641 C++
  libcef.dll!viz::GpuServiceImpl::LoseAllContexts() Line 997 C++
  libcef.dll!gpu::GpuChannelManager::OnContextLost(int context_lost_count, bool synthetic_loss, gpu::error::ContextLostReason context_lost_reason) Line 1105 C++
  libcef.dll!gpu::CommandBufferStub::CheckContextLost() Line 707 C++
  libcef.dll!gpu::CommandBufferStub::OnParseError() Line 404 C++
  libcef.dll!gpu::CommandBufferService::SetParseError(gpu::error::Error error) Line 362 C++
  libcef.dll!gpu::CommandBufferStub::MarkContextLost() Line 729 C++
  libcef.dll!gpu::GpuChannel::MarkAllContextsLost() Line 856 C++
  libcef.dll!gpu::GpuChannelManager::LoseAllContexts() Line 641 C++
  libcef.dll!viz::GpuServiceImpl::LoseAllContexts() Line 997 C++
  libcef.dll!gpu::GpuChannelManager::OnContextLost(int context_lost_count, bool synthetic_loss, gpu::error::ContextLostReason context_lost_reason) Line 1105 C++
  libcef.dll!gpu::CommandBufferStub::CheckContextLost() Line 707 C++
  libcef.dll!gpu::CommandBufferStub::OnParseError() Line 404 C++
  libcef.dll!gpu::CommandBufferService::SetParseError(gpu::error::Error error) Line 362 C++
  libcef.dll!gpu::CommandBufferStub::MarkContextLost() Line 729 C++
  libcef.dll!gpu::GpuChannel::MarkAllContextsLost() Line 856 C++
  libcef.dll!gpu::GpuChannelManager::LoseAllContexts() Line 641 C++
  libcef.dll!viz::GpuServiceImpl::LoseAllContexts() Line 997 C++
  libcef.dll!gpu::GpuChannelManager::OnContextLost(int context_lost_count, bool synthetic_loss, gpu::error::ContextLostReason context_lost_reason) Line 1105 C++
  libcef.dll!base::OnceCallback<void (bool, unsigned int)>::Run(bool args, unsigned int args) Line 156 C++
  libcef.dll!gpu::SharedContextState::MarkContextLost(gpu::error::ContextLostReason reason) Line 882 C++
  libcef.dll!viz::SkiaOutputSurfaceImplOnGpu::MarkContextLost(viz::ContextLostReason reason) Line 2683 C++
  libcef.dll!viz::SkiaOutputSurfaceImplOnGpu::Reshape(const SkImageInfo & image_info, const gfx::ColorSpace & color_space, int sample_count, float device_scale_factor, gfx::OverlayTransform transform) Line 443 C++

[0517/185204.520:ERROR:gpu_service_impl.cc(1119)] Exiting GPU process because some drivers can't recover from errors. GPU process will restart shortly.
[0517/185204.552:ERROR:gpu_process_host.cc(999)] GPU process exited unexpectedly: exit_code=34
[0517/185204.552:WARNING:gpu_process_host.cc(1433)] The GPU process has crashed 1 time(s)
[0517/185204.770:WARNING:gpu_process_host.cc(1021)] Reinitialized the GPU process after a crash. The reported initialization time was 146 ms
[0517/185205.116:ERROR:shared_image_manager.cc(225)] SharedImageManager::ProduceSkia: Trying to Produce a Skia representation from a non-existent mailbox.
[0517/185205.116:ERROR:image_context_impl.cc(362)] Failed to fulfill the promise texture - SharedImage mailbox not found in SharedImageManager.
[0517/185212.393:ERROR:gpu_service_impl.cc(1119)] Exiting GPU process because some drivers can't recover from errors. GPU process will restart shortly.
[0517/185212.424:ERROR:gpu_process_host.cc(999)] GPU process exited unexpectedly: exit_code=34
[0517/185212.424:WARNING:gpu_process_host.cc(1433)] The GPU process has crashed 2 time(s)
[0517/185212.679:WARNING:gpu_process_host.cc(1021)] Reinitialized the GPU process after a crash. The reported initialization time was 141 ms
[0517/185212.991:ERROR:shared_image_manager.cc(225)] SharedImageManager::ProduceSkia: Trying to Produce a Skia representation from a non-existent mailbox.
[0517/185212.991:ERROR:image_context_impl.cc(362)] Failed to fulfill the promise texture - SharedImage mailbox not found in SharedImageManager.
[0517/185213.077:ERROR:gpu_service_impl.cc(1119)] Exiting GPU process because some drivers can't recover from errors. GPU process will restart shortly.
[0517/185213.116:ERROR:gpu_process_host.cc(999)] GPU process exited unexpectedly: exit_code=34
[0517/185213.116:WARNING:gpu_process_host.cc(1433)] The GPU process has crashed 3 time(s)
[0517/185213.191:ERROR:command_buffer_proxy_impl.cc(131)] ContextResult::kTransientFailure: Failed to send GpuControl.CreateCommandBuffer.
[0517/185213.191:ERROR:context_provider_command_buffer.cc(157)] GpuChannelHost failed to create command buffer.
[0517/185213.300:WARNING:gpu_process_host.cc(1021)] Reinitialized the GPU process after a crash. The reported initialization time was 130 ms
[0517/185213.316:ERROR:command_buffer_proxy_impl.cc(131)] ContextResult::kTransientFailure: Failed to send GpuControl.CreateCommandBuffer.
[0517/185213.316:ERROR:context_provider_command_buffer.cc(157)] GpuChannelHost failed to create command buffer.
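
For anyone triaging similar logs, the following is a minimal Python sketch that pulls the GPU process exit codes, exit times, and the highest reported crash count out of log lines in the format shown above. The function name `summarize_gpu_exits` and the returned dictionary layout are made up for this example, and the regular expressions assume the exact message wording seen in this log, which may differ in other builds.

```python
import re

# Patterns assume the message wording shown in the log above.
EXIT_RE = re.compile(r"GPU process exited unexpectedly: exit_code=(\d+)")
COUNT_RE = re.compile(r"The GPU process has crashed (\d+) time\(s\)")
# Matches the "[MMDD/HHMMSS.mmm:" prefix and captures "HHMMSS.mmm".
STAMP_RE = re.compile(r"^\[\d{4}/(\d{6}\.\d{3}):")


def summarize_gpu_exits(lines):
    """Collect exit codes, exit times, and the highest crash count seen."""
    exit_codes = []
    exit_times = []
    crash_count = 0
    for line in lines:
        exit_match = EXIT_RE.search(line)
        if exit_match:
            exit_codes.append(int(exit_match.group(1)))
            stamp = STAMP_RE.match(line)
            if stamp:
                exit_times.append(stamp.group(1))
        count_match = COUNT_RE.search(line)
        if count_match:
            crash_count = max(crash_count, int(count_match.group(1)))
    return {
        "exit_codes": exit_codes,
        "exit_times": exit_times,
        "crash_count": crash_count,
    }


# Lines copied from the log above.
SAMPLE = """\
[0517/185204.552:ERROR:gpu_process_host.cc(999)] GPU process exited unexpectedly: exit_code=34
[0517/185204.552:WARNING:gpu_process_host.cc(1433)] The GPU process has crashed 1 time(s)
[0517/185212.424:ERROR:gpu_process_host.cc(999)] GPU process exited unexpectedly: exit_code=34
[0517/185212.424:WARNING:gpu_process_host.cc(1433)] The GPU process has crashed 2 time(s)
[0517/185213.116:ERROR:gpu_process_host.cc(999)] GPU process exited unexpectedly: exit_code=34
[0517/185213.116:WARNING:gpu_process_host.cc(1433)] The GPU process has crashed 3 time(s)
""".splitlines()

summary = summarize_gpu_exits(SAMPLE)
print(summary["exit_codes"], summary["crash_count"])  # prints: [34, 34, 34] 3
```

Run against the full log, this makes it easy to see that all three exits share exit_code=34 and occur within roughly nine seconds of each other.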


Victor Miura

May 18, 2024, 11:33:25 AM
to Marshall Greenblatt, Justin Novosad, Vasiliy Telezhnikov, Graphics-dev, geof...@chromium.org, mat...@chromium.org
I'm not currently seeing an increased trend in IntentionallyCrashBrowserForUnusableGpuProcess crashes in M125; however, I do see that there was a >10x increase in M120 and an elevated trend since then. Perhaps this was an experiment that has now rolled out by default?

Victor Miura

May 20, 2024, 11:46:21 AM
to Marshall Greenblatt, Justin Novosad, Vasiliy Telezhnikov, Graphics-dev, geof...@chromium.org, mat...@chromium.org
Just to follow up, I think the change I noted in IntentionallyCrashBrowserForUnusableGpuProcess in M120 wasn't really there.

In summary though, we're not seeing an increase in IntentionallyCrashBrowserForUnusableGpuProcess crashes in M125.

Mathias Bynens

May 21, 2024, 10:23:14 AM
to Justin Novosad, Graphics-dev, geof...@chromium.org
https://issues.chromium.org/issues/40277080#comment21 captures the outcome: “[T]he agreed-upon plan is to add --enable-unsafe-swiftshader as an explicit opt-in for all products (Chrome, new Headless, chrome-headless-shell, Chrome for Testing).”

Justin Novosad

May 21, 2024, 10:23:23 AM
to Victor Miura, Marshall Greenblatt, Vasiliy Telezhnikov, Graphics-dev, geof...@chromium.org, mat...@chromium.org
Hmmm... Are there GPU feature flags that are in the launched state but not yet enabled by default?