Hewro Hewei
Mar 31, 2026, 6:52:45 AM
to Chromium-dev, Joe Mason, Chromium-dev, scheduler-dev, ihe...@gmail.com
Thanks for the reply.
Our detector is heartbeat-based: a periodic timer on the UI thread updates a timestamp, and a monitor thread reports a stall if that heartbeat is overdue.
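For readers following along, the detector described above can be sketched roughly as follows. This is a minimal illustration, not our production code and not a Chromium API: `HeartbeatMonitor`, `Beat`, and `IsOverdue` are hypothetical names, and the clock is passed in explicitly so the logic can be exercised without real threads or timers.

```cpp
#include <atomic>
#include <chrono>

// Hypothetical sketch; none of these names are Chromium APIs.
using Clock = std::chrono::steady_clock;

class HeartbeatMonitor {
 public:
  HeartbeatMonitor(Clock::duration threshold, Clock::time_point start)
      : threshold_(threshold), last_beat_(ToTicks(start)) {}

  // UI thread: called by the periodic heartbeat timer.
  void Beat(Clock::time_point now) {
    last_beat_.store(ToTicks(now), std::memory_order_relaxed);
  }

  // Monitor thread: true if the last heartbeat is older than the threshold.
  bool IsOverdue(Clock::time_point now) const {
    const Clock::rep last = last_beat_.load(std::memory_order_relaxed);
    return ToTicks(now) - last > threshold_.count();
  }

 private:
  static Clock::rep ToTicks(Clock::time_point t) {
    return t.time_since_epoch().count();
  }

  const Clock::duration threshold_;
  std::atomic<Clock::rep> last_beat_;
};
```

With a 6-second threshold, `IsOverdue` stays false until more than 6 seconds have passed since the last `Beat`. Note that it says nothing about which task, if any, was running during that window.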
I looked through Chromium’s existing mechanisms (`HangWatcher`, `GpuWatchdogThread`, `JankMonitorImpl`, responsiveness watcher), and my understanding is that they mostly reason about task/event/message-pump work boundaries rather than a heartbeat signal.
So for a heartbeat-based detector, it seems expected that some reports will capture only the normal message-loop wait path: the signal reflects overall UI-thread responsiveness, not necessarily the runtime of one specific task/event/message. By the time we sample, the UI thread may already have returned to the run loop.
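One heuristic we are considering for this case, sketched under assumptions: read a heartbeat counter when the stall is detected and again right after the stack is captured. If the counter moved in between, the UI thread recovered before or during sampling, so the stack is probably not the hang site. `SampleValidity` and `ClassifySample` are hypothetical names, and the counter is assumed to be incremented once per heartbeat.

```cpp
#include <cstdint>

// Hypothetical sketch; not a Chromium API.
enum class SampleValidity {
  kStillStalled,           // No heartbeat arrived while the stack was captured.
  kRecoveredBeforeSample,  // Heartbeat advanced, so the stack is likely stale.
};

// Compares the heartbeat counter read at stall detection with the value read
// immediately after the stack capture finished.
inline SampleValidity ClassifySample(std::uint64_t beats_at_detection,
                                     std::uint64_t beats_after_capture) {
  return beats_after_capture != beats_at_detection
             ? SampleValidity::kRecoveredBeforeSample
             : SampleValidity::kStillStalled;
}
```

A report tagged `kRecoveredBeforeSample` could then be kept for counting stalls but excluded from stack-based triage.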
The idea of discounting reports when the monitor thread itself appears delayed is also very helpful for us.
Separately, in production we sometimes see very shallow Crashpad stacks for the UI thread, e.g. only 2 frames on macOS:
```text
_mach_msg2_trap
_mach_msg2_internal
```
and on Windows sometimes similarly shallow stacks such as:
```text
NtUserCalcMenuBar
NcGetFrameMetrics
```
Has the Chromium team seen this kind of very shallow stack in apparent UI-hang reports? If so, any guidance on common causes or ways to improve diagnostic quality would be much appreciated.
Thanks again!
On Friday, March 27, 2026 at 3:03:29 AM UTC+8 Joe Mason wrote:
Hi Chromium folks,
I’m working on responsiveness monitoring for a Chromium-based application. We report a UI stall when the browser main thread has no heartbeat for more than 6 seconds. The heartbeat is application-defined: the UI thread periodically posts to a watchdog thread.
A noticeable portion of these reports capture a stack that is only in the normal message-loop wait path, with no product/business logic above it.
On Windows, stacks often look like:
- `ZwUserMsgWaitForMultipleObjectsEx`
- `user32!RealMsgWaitForMultipleObjectsEx`
- `base::MessagePumpForUI::WaitForWork`
- `base::MessagePumpForUI::DoRunLoop`
On macOS, they often look like:
- `_mach_msg`
- `CFRunLoopRunSpecific`
- `-[NSApplication run]`
- `base::MessagePumpNSApplication::DoRun`
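To separate these reports from ones with real work on the stack, a simple classifier over symbolized frame names may be enough. The sketch below is an assumption about how that could look: `kWaitPathFrames` and `IsWaitPathOnly` are hypothetical names, the list contains only the frames quoted above, and it relies on exact string matches, which real symbolizer output may not give.

```cpp
#include <algorithm>
#include <array>
#include <string_view>
#include <vector>

// Frame names copied from the stacks quoted in this thread. A real list would
// need to be extended and kept in sync with symbolizer output.
constexpr std::array<std::string_view, 8> kWaitPathFrames = {
    "ZwUserMsgWaitForMultipleObjectsEx",
    "user32!RealMsgWaitForMultipleObjectsEx",
    "base::MessagePumpForUI::WaitForWork",
    "base::MessagePumpForUI::DoRunLoop",
    "_mach_msg",
    "CFRunLoopRunSpecific",
    "-[NSApplication run]",
    "base::MessagePumpNSApplication::DoRun",
};

// Returns true if the stack is non-empty and every frame is a known
// message-loop wait frame, i.e. there is no product logic on the stack.
inline bool IsWaitPathOnly(const std::vector<std::string_view>& frames) {
  if (frames.empty()) return false;
  return std::all_of(frames.begin(), frames.end(), [](std::string_view frame) {
    return std::find(kWaitPathFrames.begin(), kWaitPathFrames.end(), frame) !=
           kWaitPathFrames.end();
  });
}
```

Reports for which `IsWaitPathOnly` returns true can be bucketed separately instead of being mixed with stacks that point at an actual hang site.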
Our current interpretation is mainly:
1. the UI thread/process was not scheduled for a long time due to system-level reasons (high load, suspension, etc.), or
2. the sampled stack is not the actual hang site, and the thread had already returned to the message loop by the time we captured it.
We also account for possible watchdog-thread delay on our side, with separate telemetry to detect that.
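As a rough illustration of that check: if the gap between two consecutive watchdog wakeups is much larger than its poll interval, the watchdog itself was delayed (system suspend, heavy load), so an overdue heartbeat is discounted rather than reported. The names `MonitorWasDelayed`, `StallVerdict`, and `Classify`, and the tolerance parameter, are hypothetical and only meant to show the shape of the logic.

```cpp
#include <chrono>

// Hypothetical sketch; not a Chromium API.
using Clock = std::chrono::steady_clock;

// True if the watchdog woke up noticeably later than its poll interval allows.
inline bool MonitorWasDelayed(Clock::time_point previous_wakeup,
                              Clock::time_point current_wakeup,
                              Clock::duration poll_interval,
                              Clock::duration tolerance) {
  return current_wakeup - previous_wakeup > poll_interval + tolerance;
}

enum class StallVerdict { kNoStall, kReport, kDiscounted };

// An overdue heartbeat only counts as a UI stall if the watchdog itself was
// running on schedule during the same window.
inline StallVerdict Classify(bool heartbeat_overdue, bool monitor_delayed) {
  if (!heartbeat_overdue) return StallVerdict::kNoStall;
  return monitor_delayed ? StallVerdict::kDiscounted : StallVerdict::kReport;
}
```

Keeping `kDiscounted` as its own verdict, rather than dropping the report, preserves the telemetry for how often the watchdog is delayed.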
For this kind of pure wait-path stack, is that the right way to think about it from the Chromium team’s perspective? Are there other common explanations we should consider?
Also, are there any recommended diagnostics, heuristics, or practical mitigation tricks for investigating and reducing this kind of report?
Thanks!
--
Chromium Developers mailing list: chromi...@chromium.org