Hi Thomas,
I’ve been testing session management in the Prompt API (Chrome’s on-device model). Even if I create a session and leave it untouched, the model still unloads after some idle time. This happens even though I keep a reference to the session object and never call destroy() on it.
From the documentation, the recommendation is to “keep an empty session alive” so the model stays loaded. However, in practice it seems that an idle session does not always count as a “living session,” and the model is unloaded anyway after a timeout.
Before I assume this is expected, I wanted to confirm:
1. Is there currently any supported way to keep the model loaded indefinitely (e.g. holding a session open that does not get timed out or GC’d)?
2. Are idle sessions intentionally treated as “dead” after some period, causing an unload?
3. Is there any official guidance or upcoming API for more explicit warm-up/keep-alive behavior, such as a warmup() or “persistent session” mode?
My use case needs the model to be instantly available for short, unpredictable bursts of work. Reloading the model each time adds noticeable latency, so I’m trying to understand the correct approach based on current implementation and future plans.
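For reference, this is roughly how I compare a warm call with a call after an idle gap. It is a minimal sketch, not a benchmark: it assumes the session exposes prompt() and destroy() as in the Prompt API docs, and the helper names (timePrompt, compareWarmAndIdle) are my own. The clock and sleep functions are parameters only so the helpers can be exercised without a real model.

```typescript
// Minimal shape of a Prompt API session that this sketch relies on (assumption).
interface SessionLike {
  prompt(input: string): Promise<string>;
  destroy(): void;
}

// Hypothetical helper: time a single prompt() call in milliseconds.
async function timePrompt(
  session: SessionLike,
  input: string,
  now: () => number = () => Date.now(),
): Promise<number> {
  const start = now();
  await session.prompt(input);
  return now() - start;
}

// Hypothetical helper: prompt while warm, stay idle, then prompt again
// on the same, never-destroyed session.
async function compareWarmAndIdle(
  session: SessionLike,
  idleMs: number,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<{ warmMs: number; afterIdleMs: number }> {
  await session.prompt("ping"); // first call absorbs any initial load cost
  const warmMs = await timePrompt(session, "ping");
  await sleep(idleMs); // the session object stays referenced the whole time
  const afterIdleMs = await timePrompt(session, "ping");
  return { warmMs, afterIdleMs };
}

// In the browser I would call it along these lines (assumed entry point):
//   const session = await LanguageModel.create();
//   console.log(await compareWarmAndIdle(session, 5 * 60_000));
```

A large gap between warmMs and afterIdleMs is what I am reading as the model having been unloaded in between, although the sketch cannot distinguish that from other causes of slowness.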
Thanks a lot for your time — appreciate any clarification!
Best regards,
Wilson
--
You received this message because you are subscribed to the Google Groups "Chrome Built-in AI Early Preview Program Discussions" group.
To view this discussion visit https://groups.google.com/a/chromium.org/d/msgid/chrome-ai-dev-preview-discuss/06816057-4992-4a34-aa72-acf14a33c50cn%40chromium.org.
Thanks Tom — that helps a lot.
To give a bit more background on the experience I’m trying to support:
The model is used in short, intermittent bursts. A user may trigger an AI action, pause for a while, then resume with another quick interaction. These interactions are unpredictable and typically require the model to respond immediately, without noticeable warm-up time. When the model unloads during idle gaps, the reload latency becomes very visible to the user.
This is why I initially explored keeping a minimal session alive — not for preserving context, but purely to keep the model ready. Since idle sessions may be discarded, I’m looking for the most reliable way to avoid surprising latency spikes.
Your suggestion about proactively creating a cloned session makes sense. I can potentially tie that to early UI signals (opening a panel, focusing an input field, etc.). Hover is a nice optimization, though not always guaranteed (touch devices, keyboard users, etc.), so I’m considering broader cues.
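To make that concrete, here is a minimal sketch of what I have in mind. It assumes the session exposes clone() as in the Prompt API docs; SessionPrewarmer and prewarmOn are hypothetical names of my own, and the session factory is passed in so the same code can run against a fake session. The idea is that any early cue prepares a clone from one base session, and the actual AI action simply takes whichever clone is ready.

```typescript
// Minimal shape of a Prompt API session that this sketch relies on (assumption).
interface SessionLike {
  prompt(input: string): Promise<string>;
  clone(): Promise<SessionLike>;
  destroy(): void;
}

// Anything that can register listeners; DOM elements fit this shape.
interface CueTarget {
  addEventListener(type: string, listener: () => void): void;
}

// Hypothetical helper: keeps one base session and at most one spare clone.
class SessionPrewarmer {
  private readonly createSession: () => Promise<SessionLike>;
  private base: Promise<SessionLike> | null = null;
  private spare: Promise<SessionLike> | null = null;

  constructor(createSession: () => Promise<SessionLike>) {
    this.createSession = createSession;
  }

  // Call on any early intent signal; a no-op while a spare already exists.
  prewarm(): void {
    if (this.spare) return;
    if (!this.base) this.base = this.createSession();
    const spare = this.base.then((session) => session.clone());
    this.spare = spare;
    // If preparation fails, forget it so the next cue starts from scratch.
    spare.catch(() => {
      if (this.spare === spare) this.spare = null;
      this.base = null;
    });
  }

  // Hands out the prepared clone, or prepares one now if no cue fired first.
  take(): Promise<SessionLike> {
    this.prewarm();
    const session = this.spare as Promise<SessionLike>;
    this.spare = null;
    return session;
  }
}

// Hypothetical helper: prewarm on several cues, not only hover.
function prewarmOn(target: CueTarget, cues: string[], prewarmer: SessionPrewarmer): void {
  for (const cue of cues) {
    target.addEventListener(cue, () => prewarmer.prewarm());
  }
}

// In the browser I would wire it up along these lines (assumed entry point):
//   const prewarmer = new SessionPrewarmer(() => LanguageModel.create());
//   prewarmOn(panelElement, ["focusin", "pointerdown", "mouseenter"], prewarmer);
//   const session = await prewarmer.take(); // when the user triggers the action
```

Listening for focusin and pointerdown alongside mouseenter is meant to cover keyboard and touch users. The caller still owns each session returned by take() and should destroy() it when done. Whether the base session itself survives long idle gaps is exactly the part I am unsure about.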
If session startup becomes significantly faster in the future, that would essentially resolve the experience issue altogether. In the meantime, any recommended best practices for managing these short-lived, high-responsiveness interactions would be very helpful.
Thanks again — really appreciate the insight.
Best,
Wilson
Hi François,
Here’s what I observed in my tests: