Thanks for the great feedback! Running the preference benchmark and isolating your blockers (task support and language) is exactly the right approach.
For others on the list, we highly encourage you to follow Younes's lead: use the current English model, with its current capabilities, as a performance proxy. Run the benchmark to compare it against the full model, and show us whether and how this specific speed/size delta actually unlocks your use case (e.g., the latency delta you measured and details about the use case). Feel free to reach out directly if sharing all the details would be easier that way. At the same time, use the full model in your target language to establish your quality baseline. The full model generally represents the quality ceiling; if it can't achieve the accuracy your feature requires today, a smaller expert model will likely struggle too.
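If it helps when reporting numbers, here is a minimal sketch of one way to turn raw timings into a latency delta. It is not how the benchmark demo computes its results. All names (`SummarizeFn`, `timeRuns`, `compareLatency`) are hypothetical, and the summarize functions are passed in by the caller, so the sketch makes no assumptions about how the full or speed-oriented summarizer is created.

```typescript
// Hypothetical helper names; nothing here is part of the Summarizer API.
// A SummarizeFn is whatever wrapper you write around the model under test.
type SummarizeFn = (text: string) => Promise<string>;

interface LatencyReport {
  fullMedianMs: number;
  fastMedianMs: number;
  deltaMs: number; // full minus fast; positive means the fast model is quicker
  speedup: number; // full divided by fast
}

// Median is less sensitive than the mean to one-off slow runs (e.g. warm-up).
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("need at least one sample");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Compare two sets of per-request timings, both in milliseconds.
function compareLatency(fullMs: number[], fastMs: number[]): LatencyReport {
  const fullMedianMs = median(fullMs);
  const fastMedianMs = median(fastMs);
  return {
    fullMedianMs,
    fastMedianMs,
    deltaMs: fullMedianMs - fastMedianMs,
    speedup: fullMedianMs / fastMedianMs,
  };
}

// Time one summarize call per input, sequentially, and return the durations.
async function timeRuns(
  summarize: SummarizeFn,
  inputs: string[],
): Promise<number[]> {
  const timings: number[] = [];
  for (const text of inputs) {
    const start = Date.now();
    await summarize(text);
    timings.push(Date.now() - start);
  }
  return timings;
}
```

To use it, you would call `timeRuns` once per model on the same inputs and pass both timing arrays to `compareLatency`. The median is a choice made for this sketch; if tail latency matters for your feature, a high percentile may be the more useful number to share.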
See inline for your other questions:
Hi all,
Thanks for all the work on the built-in AI APIs. It's been great to follow along.
I tried the summarizer preference benchmark demo (https://googlechrome.github.io/samples/summarizer-preference-benchmark/) and was really encouraged by the results. Model size and speed have been the main things keeping me from adopting the Summarizer API, so seeing this kind of quality at a smaller footprint is exactly what I was hoping for.
It's not quite usable for my own project yet, though. My use case is headline generation, which the speed version of the API doesn't support yet, and my target language (Danish) isn't supported either.
So what I'd really like is more detail on how the new, faster model works. The I/O 2026 recap (https://developer.chrome.com/blog/chrome-at-io26) mentions a Gemma 197M "expert model" that can transparently power task-specific APIs like the summarizer, but I couldn't find anything more on it, and as far as I can tell it isn't public.
A few questions:
1. Are there plans to publish more about the Gemma 197M model?
2. How is it fine-tuned for the summarization task? Anything on the training data, distillation approach, or setup would help a lot.
3. Mainly: is there enough out there (or planned) for developers to reproduce this performance for an unsupported language like Danish? I'd be glad to do the fine-tuning myself if the recipe were documented.
I realize some of this might not be shareable yet, but even a pointer in the right direction would be appreciated.
While I can't promise that you can reproduce a model with similar characteristics, if you want to explore tuning the open Gemma models yourself for external workflows, please check out the Gemma Tuning Guide.