Highest Quality vs Fastest Inference dropdown?


Die4Ever2005

Aug 12, 2025, 1:36:45 AM
to Chrome Built-in AI Early Preview Program Discussions
What does this dropdown in chrome://on-device-internals/ do? Highest Quality vs Fastest Inference. Are we able to choose this in the API? Which is the default?
[screenshot: the Highest Quality vs Fastest Inference dropdown in chrome://on-device-internals]

Thomas Steiner

Aug 12, 2025, 1:50:07 AM
to Die4Ever2005, Chrome Built-in AI Early Preview Program Discussions
Hi there,

This is a feature control for manually loaded models. It's not exposed to the API.

Cheers,
Tom

On Tue, Aug 12, 2025 at 1:36 PM Die4Ever2005 <die4ev...@gmail.com> wrote:
What does this dropdown in chrome://on-device-internals/ do? Highest Quality vs Fastest Inference. Are we able to choose this in the API? Which is the default?
[screenshot attached]



--
Thomas Steiner, PhD — Developer Relations Engineer (blog.tomayac.com | toot.cafe/@tomayac)

Google Spain, S.L.U.
Torre Picasso, Pl. Pablo Ruiz Picasso, 1, Tetuán, 28020 Madrid, Spain

CIF: B63272603
Inscrita en el Registro Mercantil de Madrid, sección 8, Hoja M-435397, Tomo 24227, Folio 25

----- BEGIN PGP SIGNATURE -----
Version: GnuPG v2.4.8 (GNU/Linux)

iFy0uwAntT0bE3xtRa5AfeCheCkthAtTh3reSabiGbl0ck
0fjumBl3DCharaCTersAttH3b0ttom.xKcd.cOm/1181.
----- END PGP SIGNATURE -----

Die4Ever2005

Aug 12, 2025, 10:14:44 PM
to Chrome Built-in AI Early Preview Program Discussions, Thomas Steiner, Chrome Built-in AI Early Preview Program Discussions, Die4Ever2005
So which option is used by the JavaScript API?

Thomas Steiner

Aug 14, 2025, 1:15:36 PM
to Die4Ever2005, Clark Duvall, Chrome Built-in AI Early Preview Program Discussions, Thomas Steiner
@Clark Duvall, can you make any statement about automatic selection of highest quality vs. fastest inference (as can be seen in the on-device-internals page) and the Prompt API? Is it based on the device performance class?

Thanks!


Clark Duvall

Aug 18, 2025, 3:33:02 PM
to Thomas Steiner, Die4Ever2005, Chrome Built-in AI Early Preview Program Discussions
We use the highest quality model that the device is capable of running. In practice, devices with a High or Very High performance class (as reported by chrome://on-device-internals) use Highest Quality, and devices with Low or Medium use Fastest Inference.
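The mapping Clark describes can be sketched as a small helper. To be clear, this is illustrative only: the real selection happens inside Chrome and is not scriptable, and the function name and string values here are hypothetical.

```javascript
// Hypothetical sketch of the selection rule described above.
// Chrome performs this internally; none of these names are real API.
function modelVariantFor(performanceClass) {
  switch (performanceClass) {
    case "High":
    case "Very High":
      return "Highest Quality";
    case "Low":
    case "Medium":
      return "Fastest Inference";
    default:
      // e.g. "Very Low" — the device may not qualify for on-device AI at all
      return null;
  }
}

console.log(modelVariantFor("Very High")); // "Highest Quality"
console.log(modelVariantFor("Medium"));    // "Fastest Inference"
```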

Thomas Steiner

Aug 19, 2025, 5:55:06 AM
to Clark Duvall, Thomas Steiner, Die4Ever2005, Chrome Built-in AI Early Preview Program Discussions
Perfect, that's what I thought. Thanks for confirming, Clark! Die4Ever2005, note that this is an implementation detail of how it works today. It can change at any time, for example, should we enable CPU inference (right now, we require a GPU). 

Bobo Zhou

Aug 19, 2025, 6:52:33 AM
to Chrome Built-in AI Early Preview Program Discussions, Thomas Steiner, Die4Ever2005, Chrome Built-in AI Early Preview Program Discussions, Clark Duvall
Could this model directory support asynchronously downloading models from an Ollama model address? Or could it pull models directly from LLM Studio? That seems like a more convenient way to work.
[screenshot attached]

Die4Ever2005

Sep 5, 2025, 1:50:28 AM
to Chrome Built-in AI Early Preview Program Discussions, Bobo Zhou, Thomas Steiner, Die4Ever2005, Chrome Built-in AI Early Preview Program Discussions, Clark Duvall
> We use the highest quality model that the device is capable of running. This translates to devices with High/Very High performance class as reported by chrome://on-device-internals using Highest Quality, and devices with Low/Medium using Fastest Inference.

My laptop's performance class is "Very Low", and I cannot use multimodal on it. However, chrome://on-device-internals/ lets me load the Highest Quality model, attach an image, and run multimodal prompts there, so I know the hardware is capable — I just can't do the same from my own JavaScript.

This is pretty punishing when I'm trying to develop on the go; I can't even test my code.

I've noticed the JSON output is FAR less reliable on my laptop than on my desktop. I had to make my JSON prompts auto-retry just for JSON.parse to succeed, yet I don't think I've ever seen it fail on my desktop. I imagine failing and retrying is slower than running the higher quality model in the first place, not to mention the garbage data it outputs, like JSON strings starting with symbols instead of the desired text.
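A retry wrapper like the one described might look like the sketch below. Here `promptFn` is a stand-in for whatever calls `session.prompt()` in your code; the helper itself is hypothetical, not part of the Prompt API.

```javascript
// Retry a prompt until its output parses as JSON, up to maxAttempts.
// promptFn is a placeholder for your own async call into session.prompt().
async function promptForJson(promptFn, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await promptFn();
    try {
      return JSON.parse(raw);
    } catch (err) {
      // Model emitted leading symbols or malformed JSON; try again.
      lastError = err;
    }
  }
  throw lastError;
}
```

A cheaper mitigation for the "leading symbols" failure mode is to slice the raw output from the first `{` to the last `}` before parsing, so one bad token doesn't force a whole re-run.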

Die4Ever2005

Sep 5, 2025, 1:31:50 PM
to Chrome Built-in AI Early Preview Program Discussions, Die4Ever2005, Bobo Zhou, Thomas Steiner, Chrome Built-in AI Early Preview Program Discussions, Clark Duvall
Does this coincide with multimodal support? At least that would give me some way to detect the issue.
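One guarded way to probe this from script — assuming `LanguageModel.availability()` accepts an `expectedInputs` option, as in the current Prompt API explainer — would be:

```javascript
// Guarded feature detection for image input support in the Prompt API.
// Falls back gracefully when the API isn't exposed (e.g. outside Chrome).
async function detectImagePromptSupport() {
  if (typeof LanguageModel === "undefined") {
    return "unavailable"; // API not present in this environment
  }
  // expectedInputs is taken from the Prompt API explainer; treat as an assumption.
  return await LanguageModel.availability({
    expectedInputs: [{ type: "image" }],
  });
}

detectImagePromptSupport().then((status) => console.log(status));
```

Whether availability of image inputs actually tracks the performance-class cutoff is exactly the open question in this thread, so treat a non-"available" result as a hint, not a definitive signal.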