browser process not notified when GPU process dies - only on cros

Joel Hockey

Jan 11, 2024, 8:35:12 PM
to Chromium-dev, Chromium OS discuss
A recent bug in GPU code had a DCHECK behind an unreleased feature crashing the GPU process, which left my dev ChromeOS device stuck on a blank screen until I rebooted.

I only noticed this because I compiled with `dcheck_always_on = true` and I also turned on `--enable-field-trial-config`.  The code is now fixed, but I am still concerned about how the system handled this.

GpuProcessHost::RecordProcessCrash() keeps a count of GPU process crashes and eventually crashes Chrome if the GPU process crashes too often.  The ChromeOS session_manager likewise tracks Chrome crashes and eventually restarts Chrome in safe mode, where the --enable-field-trial-config switch is ignored, so the device should remain usable.

The problem on my cros eve device was that the browser process did not detect that the GPU process had exited, and so never called RecordProcessCrash().
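
As a rough illustration of the crash accounting described above, here is a minimal sketch. It is not the real GpuProcessHost code: `GpuCrashTracker` and its limit parameter are hypothetical names, and the limit of 6 used in the usage note is taken only from the Linux observation later in this message.

```cpp
#include <cassert>

// Hypothetical stand-in for the crash accounting described in the thread.
// This is an illustration only, not the Chromium implementation.
class GpuCrashTracker {
 public:
  explicit GpuCrashTracker(int crash_limit) : crash_limit_(crash_limit) {
    assert(crash_limit > 0);
  }

  // Records one GPU process crash. Returns true once the count has reached
  // the limit, i.e. the point where the browser would give up and exit.
  bool RecordProcessCrash() {
    ++crash_count_;
    return crash_count_ >= crash_limit_;
  }

  int crash_count() const { return crash_count_; }

 private:
  const int crash_limit_;
  int crash_count_ = 0;
};
```

With `GpuCrashTracker tracker(6)`, the sixth call to `RecordProcessCrash()` is the first to return true. The counter only advances when something calls it, so if the browser never learns that the GPU process exited, the limit is never reached and the fallback never starts.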

I have created crrev.com/c/5191725 to repro the crash scenario.

When I run this patch on linux-chrome, I can see that RecordProcessCrash() gets called as expected, and after 6 crashes, it crashes chrome as expected:
...
[826521:826521:0111/163016.433689:ERROR:gpu_process_host.cc(991)] GPU process exited unexpectedly: exit_code=5
[826521:826521:0111/163016.433724:ERROR:gpu_process_host.cc(1355)] XXX RecordProcessCrash
[826521:826521:0111/163016.433757:FATAL:gpu_data_manager_impl_private.cc(448)] GPU process isn't usable. Goodbye.

Does anyone have ideas why this code is not working correctly on ChromeOS?  There, the GPU process crashes, but the browser never notices via BrowserChildProcessHostImpl::OnChildDisconnected().


Erik Chen

Jan 11, 2024, 8:37:21 PM
to joelh...@chromium.org, Peter McNeeley, Thomas Lukaszewicz, Chromium-dev, Chromium OS discuss

Joel Hockey

Jan 11, 2024, 9:18:43 PM
to Peter McNeeley, Erik Chen, Thomas Lukaszewicz, Chromium-dev, Chromium OS discuss
The fallback works OK on Linux, but not on ChromeOS (or linux-chromeos).  So I think ChromeOS may differ in how the browser handles its child processes and gets notified when they exit.  I don't believe this is a bug in any GPU-related code; it looks more like an issue in how //content handles child processes.

On Fri, Jan 12, 2024 at 11:42 AM Peter McNeeley <peterm...@chromium.org> wrote:
I currently own this bug and was able to reproduce this issue. Yes, this is a problem and should be fixed, but I don't think it is a regression from existing behavior.
My current theory is that the crash happens too early to trigger the fallback, but I have not investigated it enough.

Joel Hockey

Jan 12, 2024, 12:22:25 AM
to Peter McNeeley, Erik Chen, Thomas Lukaszewicz, Chromium-dev, Chromium OS discuss
I've found the cause and have a fix in crrev.com/c/5192042.

Exo calls EstablishGpuChannelSync() during PreProfileInit().  It gets stuck retrying that call forever, so we never reach RunMainMessageLoop() and never receive the IPC::ChannelMojo::OnPipeError() that signals the GPU process has crashed.
https://source.chromium.org/chromium/chromium/src/+/main:content/browser/compositor/viz_process_transport_factory.cc;l=243-249;drc=bb0e760cd971edfd0c334760ac396d3b9d7917a2
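
The failure mode described above can be modelled with a toy single-threaded task queue. This is only a sketch under stated assumptions: `ToyMessageLoop`, `ToyGpuHost`, `EstablishChannelSync`, `SimulateGpuCrash` and `RetryEstablish` are hypothetical names, not Chromium APIs, and the retry bound exists only so the toy terminates. The thread does not say what the fix in crrev.com/c/5192042 actually changes.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Toy FIFO task queue standing in for the browser's main message loop.
class ToyMessageLoop {
 public:
  void PostTask(std::function<void()> task) {
    tasks_.push_back(std::move(task));
  }

  // Runs every queued task in order until the queue is empty.
  void RunUntilIdle() {
    while (!tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop_front();
      task();
    }
  }

 private:
  std::deque<std::function<void()>> tasks_;
};

struct ToyGpuHost {
  bool process_alive = true;
  bool crash_recorded = false;
};

// Simulates a GPU crash: the process is gone, and the disconnect
// notification is queued as a task that only runs when the loop is pumped.
void SimulateGpuCrash(ToyGpuHost& host, ToyMessageLoop& loop) {
  host.process_alive = false;
  loop.PostTask([&host] { host.crash_recorded = true; });
}

// Hypothetical synchronous call: fails for as long as the process is dead.
bool EstablishChannelSync(const ToyGpuHost& host) {
  return host.process_alive;
}

// Retries synchronously without ever pumping the loop. Returns the number
// of attempts made; the bound is only here so the sketch terminates.
int RetryEstablish(const ToyGpuHost& host, int max_attempts) {
  assert(max_attempts > 0);
  int attempts = 0;
  while (attempts < max_attempts) {
    ++attempts;
    if (EstablishChannelSync(host)) break;
  }
  return attempts;
}
```

After `SimulateGpuCrash(host, loop)`, any number of `RetryEstablish` attempts leaves `host.crash_recorded` false, because the queued notification is never run. It only becomes true once `loop.RunUntilIdle()` executes, which corresponds to the browser finally reaching its main message loop.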

sujata Dutta

Jan 12, 2024, 1:37:03 PM
to Chromium-dev, Joel Hockey, Erik Chen, Thomas Lukaszewicz, Chromium-dev, Chromium OS discuss, Peter McNeeley
I see a similar call stack when the browser process is killed from Task Manager and the renderer process is left dangling, since we are using in-process-gpu.
Any suggestions for this?
