MV3 service worker killed by `StartTimeoutTimer` on 6W TDP devices; all extensions fail uniformly, restart loop never self-recovers

Pruthvi Kumar

2:57 AM
to Chromium Extensions

hi all,

i'm working on an MV3 extension that uses `webRequestBlocking` (deployed in managed education environments). we've hit a reproducible issue on thermally-constrained devices, specifically the Intel Pentium Silver N6000 (6W TDP, 4C/4T) with 4-8GB RAM running Windows 11 Education and Chrome 145 Stable (145.0.7632.117).

the issue in short: on these low-TDP devices, the MV3 service worker consistently exceeds Chromium's 60-second `StartTimeoutTimer` budget during browser startup. Chromium kills the SW, but because our extension declares `webRequestBlocking`, Chrome immediately queues a restart, which also fails. the result is an infinite timeout -> kill -> restart loop that never self-recovers.

i've done Perfetto trace analysis on this and have some specific findings and questions. hoping someone from the extensions or service worker team can help clarify a few things.

what the traces show

i captured a trace via `chrome://tracing` on an affected N6000 device, then filtered for `ServiceWorkerVersion::StartWorker` and `ServiceWorkerVersion::StopWorker` events, using `EXTRACT_ARG(arg_set_id, 'debug.Script')` to isolate them by script URL.

finding 1: all extensions fail identically. 

this device has 5 extensions installed (ours + 4 others from different vendors). all 5 hit the 60s timeout in the first cycle, all within 45ms of each other:

Screenshot 2026-03-08 at 1.03.36 pm.png

different extensions from 5 different vendors. all start within 45ms of each other. all killed at exactly 60s. this tells me the cause isn't any one extension's code, it's the environment. the device simply cannot service any SW startup within the 60s budget during the browser init window.

finding 2: the restart loop degrades, never recovers.

Screenshot 2026-03-08 at 1.06.32 pm.png

cycle 2 (immediate restart): all 5 SWs restart and run for ~64.76s (uniformly 4.76s worse than cycle 1), which suggests the system is degrading, not recovering.

cycle 3: all restart again with dur = -1e-9 (i.e. still running when trace was captured). the user observed the browser in this state for 10+ minutes.

the gap between cycle 1 stop and cycle 2 start is just 0.255 seconds, so Chromium restarts almost immediately. there's no visible cooldown or backoff in the trace.

finding 3: `CrBrowserMain` is saturated during the startup window.

Screenshot 2026-03-08 at 1.10.22 pm.png

Screenshot 2026-03-08 at 1.09.15 pm.png

filtered for `CrBrowserMain` thread during the first 60 seconds:

  • 190,344 slice events
  • ~34 seconds cumulative work volume
  • top consumers are all browser internals: `ProfileManager::GetProfile` (2739ms), `chrome_prefs::CreateProfilePrefs` (1514ms), `ProfileManager::DoFinalInit` (1111ms)
  • the extension system's largest event (`ChromeExtensionSystem::InitForRegularProfile`) is 1104ms, but this is a child span nested inside `ProfileManager`'s 2739ms init chain, not additive work

afaict, the main thread is spending 34 out of 60 seconds on browser-internal work during startup. that's more than half the SW's timeout budget consumed before extension code meaningfully progresses.

finding 4: `debug.Restart = true` on `StopWorker` events.

Screenshot 2026-03-08 at 1.14.38 pm.png

the `StopWorker` events for our extension carry these debug args:

```
debug.Restart = true
debug.Version Status = activated
debug.Script = chrome-extension://[id]/background/background.bundle.js
```

---------------------

questions for the chromium team

q1: what exactly triggers `debug.Restart = true`?

i've been reading through the service worker lifecycle code but haven't been able to pin down the exact condition that sets `Restart = true` on a `StopWorker` event. is this flag set specifically when:

  • (a) the SW had pending `webRequestBlocking` callbacks at time of kill?
  • (b) there are queued events that require the SW to be alive?
  • (c) some other condition?

understanding the trigger helps us reason about whether the restart is avoidable or if it's a fundamental consequence of declaring `webRequestBlocking`.

q2: is there any backoff mechanism on the restart loop?

from the trace data, cycle 1 -> cycle 2 gap is 255ms. cycle 2 -> cycle 3 is similar. there doesn't appear to be any exponential backoff or retry limit. on these constrained devices, the result is an infinite loop where each cycle is worse than the last (the system never gets quieter because the restarts themselves add load).

is this by design? is there a max retry count i'm not seeing in the traces?
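
to make q2 concrete, here's a rough sketch of the kind of policy i mean: exponential backoff with a cap and a retry limit. this is not Chromium code. `RestartPolicy`, `nextRestartDelayMs` and the numbers are all hypothetical, purely to illustrate the behaviour i'm asking about.

```typescript
// hypothetical restart policy, NOT taken from Chromium.
interface RestartPolicy {
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number;  // cap on any single delay
  maxRetries: number;  // stop retrying after this many failed starts
}

// returns the delay before retry `attempt` (1-based),
// or null once the retry limit has been exceeded.
function nextRestartDelayMs(attempt: number, policy: RestartPolicy): number | null {
  if (attempt > policy.maxRetries) {
    return null;
  }
  const delay = policy.baseDelayMs * Math.pow(2, attempt - 1);
  return Math.min(delay, policy.maxDelayMs);
}

// illustrative values only.
const policy: RestartPolicy = { baseDelayMs: 1000, maxDelayMs: 30000, maxRetries: 5 };
for (let attempt = 1; attempt <= 6; attempt++) {
  console.log(attempt, nextRestartDelayMs(attempt, policy));
}
```

with these made-up values the delays would be 1s, 2s, 4s, 8s, 16s and then no further retry, versus the ~255ms gap the trace shows on every cycle.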

q3: does `StartTimeoutTimer` account for main thread saturation?

the 60-second budget appears to be wall-clock time from `StartWorker` to when the SW reports ready. but on these devices the main thread is, afaict, doing 30+ seconds of browser-internal work during that same window. the SW essentially gets less than 30 seconds of actual execution time within a 60-second wall-clock budget.

has there been any discussion about making the timeout adaptive? e.g. basing it on actual SW execution time rather than wall clock, or pausing the timer while the main thread is saturated with higher-priority browser work?
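
a minimal sketch of what i mean by "adaptive", using the finding 3 numbers. the idea is to charge the start budget only for time the main thread was not busy with browser-internal work. `Interval`, `chargedMs` and `wouldTimeOut` are hypothetical names, and this is an illustration of the accounting, not how Chromium's timer works today. it also treats the ~34s cumulative figure as if it were non-overlapping busy time, which is an assumption.

```typescript
// hypothetical accounting model, not Chromium's implementation.
interface Interval {
  wallMs: number;        // wall-clock length of the interval
  browserBusyMs: number; // portion spent on browser-internal main-thread work
}

// time actually charged against the SW start budget.
function chargedMs(intervals: Interval[]): number {
  return intervals.reduce(
    (sum, i) => sum + Math.max(0, i.wallMs - i.browserBusyMs),
    0
  );
}

function wouldTimeOut(intervals: Interval[], budgetMs: number): boolean {
  return chargedMs(intervals) >= budgetMs;
}

// finding 3: 60s of wall clock, ~34s of it browser-internal work.
const startupWindow: Interval[] = [{ wallMs: 60000, browserBusyMs: 34000 }];
console.log(chargedMs(startupWindow));           // 26000
console.log(wouldTimeOut(startupWindow, 60000)); // false
```

under this model the first cycle would have used roughly 26s of a 60s budget instead of all of it.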

q4: cross-extension contention: is startup serialised or parallel?

the 5 SWs start within 45ms of each other and all fail at 60s. are they sharing the same thread pool / IPC channel for `chrome.storage.local` and other extension APIs during startup? if so, 5 extensions hammering storage concurrently would explain why ALL fail even though individually each might be fine.

is there any mechanism to stagger SW startups on resource-constrained devices?
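
by "stagger" i mean something as simple as limiting how many SWs start at once. a toy sketch, where `planStartWaves` is a hypothetical helper and the extension ids are placeholders (i'm not aware of anything like this in Chromium):

```typescript
// hypothetical helper: group worker ids into start "waves" of at most
// `maxConcurrent` workers, so later waves start only after earlier ones.
function planStartWaves(workerIds: string[], maxConcurrent: number): string[][] {
  if (maxConcurrent < 1) {
    throw new Error("maxConcurrent must be >= 1");
  }
  const waves: string[][] = [];
  for (let i = 0; i < workerIds.length; i += maxConcurrent) {
    waves.push(workerIds.slice(i, i + maxConcurrent));
  }
  return waves;
}

// 5 extensions (placeholder ids), at most 2 service workers starting at once.
const waves = planStartWaves(["ext-a", "ext-b", "ext-c", "ext-d", "ext-e"], 2);
console.log(JSON.stringify(waves));
```

that gives three waves (2 + 2 + 1) instead of all 5 SWs competing for the same startup window.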

-------------

what we've tried / observed

  • Chrome 147 Dev (147.0.7703.2): tested the `cannot_dispatch_callback` fix (CL 7596563). browser hangs dropped from 60% to 0%, so the patch works as intended. but the SW still times out in 60% of runs (6/10). this confirms the timeout and the hang are orthogonal issues.
  • same extension on 15W TDP hardware (i7-1065G7, i7-8565U): 10/10 starts, 0 deadlocks, initModules takes ~110ms vs ~5,400ms on N6000. same code, ~50x performance delta. afaict, RAM doesn't matter (8GB vs 16GB gives identical results). TDP is the variable.
  • setTimeout(100) inflation: on N6000, a setTimeout(100) call fires after 4,000ms–23,700ms (i.e. 40x–237x inflation). on i7, same call fires at 340ms–780ms. the event loop on these devices is so contended that any async coordination mechanism becomes unreliable.
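
for anyone who wants to reproduce the timer inflation numbers, this is roughly the shape of the measurement. it's a simplified sketch rather than our exact instrumentation, and `inflationFactor` / `measureTimerInflation` are illustrative names:

```typescript
// ratio of observed delay to requested delay.
function inflationFactor(requestedMs: number, observedMs: number): number {
  return observedMs / requestedMs;
}

// schedules a timer and reports how late it actually fired.
function measureTimerInflation(
  requestedMs: number,
  report: (observedMs: number, factor: number) => void
): void {
  const start = Date.now();
  setTimeout(() => {
    const observedMs = Date.now() - start;
    report(observedMs, inflationFactor(requestedMs, observedMs));
  }, requestedMs);
}

measureTimerInflation(100, (observedMs, factor) => {
  console.log(`setTimeout(100) fired after ${observedMs}ms (${factor.toFixed(1)}x)`);
});
```

the 40x–237x range above is just `inflationFactor(100, 4000)` to `inflationFactor(100, 23700)`.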
--------------------

what i'm hoping for

mainly clarity on Q1-Q4 above. but also, if there's any appetite on the chromium side for:

  1. an adaptive `StartTimeoutTimer` that accounts for main thread load (or at minimum, a configurable timeout via enterprise policy for managed environments)
  2. SW startup staggering for multi-extension environments on constrained hardware
  3. a retry backoff/limit on the restart loop to prevent infinite degradation

these education devices (chromebooks, low-cost windows laptops) are a significant deployment surface for managed extensions. 6W TDP processors are standard in this segment. the 60-second wall-clock timeout was probably reasonable when MV3 launched, but the intersection of multiple extensions + constrained hardware + browser startup contention creates a scenario where the timeout is effectively unachievable.

happy to share Perfetto traces if that's useful. they're from a managed device but i can sanitise them.

thanks for reading this far.

Pruthvi
