browser process not notified when GPU process dies - only on cros

Joel Hockey

Jan 11, 2024, 8:33:50 PM
to Chromium-dev, Chromium OS discuss
There was a recent bug in GPU code where a DCHECK behind an unreleased feature was crashing the GPU process, leaving my dev ChromeOS device in a bad state with a blank screen until I rebooted.

I only noticed this because I compiled with `dcheck_always_on = true` and also turned on `--enable-field-trial-config`. The code is now fixed, but I am still concerned about how the system handled this.

GpuProcessHost::RecordProcessCrash() keeps a count of GPU process crashes and will eventually crash Chrome if the GPU process crashes too often. The ChromeOS session_manager also tracks Chrome crashes and will eventually restart Chrome in safe mode, where the --enable-field-trial-config switch is ignored, so the device should remain usable.

The problem on my cros eve device was that the browser process was not detecting that the GPU process had exited, so it never called RecordProcessCrash().

I have created crrev.com/c/5191725 to repro the crash scenario.

When I run this patch on linux-chrome, I can see that RecordProcessCrash() is called as expected, and after 6 crashes it deliberately crashes Chrome, also as expected:
...
[826521:826521:0111/163016.433689:ERROR:gpu_process_host.cc(991)] GPU process exited unexpectedly: exit_code=5
[826521:826521:0111/163016.433724:ERROR:gpu_process_host.cc(1355)] XXX RecordProcessCrash
[826521:826521:0111/163016.433757:FATAL:gpu_data_manager_impl_private.cc(448)] GPU process isn't usable. Goodbye.

Does anyone have ideas why this code is not working correctly on ChromeOS? On ChromeOS the GPU process crashes, but nothing in BrowserChildProcessHostImpl::OnChildDisconnected() notices it.


Joel Hockey

Jan 11, 2024, 9:17:17 PM
to Peter McNeeley, Erik Chen, Thomas Lukaszewicz, Chromium-dev, Chromium OS discuss
The fallback works OK on Linux, but not on ChromeOS (or linux-chromeos). So I think something may be different about how ChromeOS handles the browser's child processes and gets notified when they exit. I don't believe this is a bug in any GPU-related code, but rather in how //content handles child processes.

On Fri, Jan 12, 2024 at 11:42 AM Peter McNeeley <peterm...@chromium.org> wrote:
I currently own this bug and was able to reproduce this issue. Yes, this is a problem and should be fixed, but I don't think it is a regression (from existing behavior).
My current theory is that the crash happens too early to trigger the fallback, but I have not investigated it enough.

On Thu, Jan 11, 2024 at 8:36 PM Erik Chen <erik...@chromium.org> wrote:

Joel Hockey

Jan 12, 2024, 12:21:01 AM
to Peter McNeeley, Erik Chen, Thomas Lukaszewicz, Chromium-dev, Chromium OS discuss
I've found the cause and have a fix in crrev.com/c/5192042.

Exo is calling EstablishGpuChannelSync() during PreProfileInit(). It gets stuck in a loop calling this forever, so we never reach RunMainMessageLoop() to receive the IPC::ChannelMojo::OnPipeError() that indicates the GPU process has crashed.
https://source.chromium.org/chromium/chromium/src/+/main:content/browser/compositor/viz_process_transport_factory.cc;l=243-249;drc=bb0e760cd971edfd0c334760ac396d3b9d7917a2