Highest Quality vs Fastest Inference dropdown?

Die4Ever2005

Aug 12, 2025, 1:36:45 AM
to Chrome Built-in AI Early Preview Program Discussions
What does this dropdown in chrome://on-device-internals/ do? It offers Highest Quality vs. Fastest Inference. Can we choose between them in the API? Which is the default?
[screenshot attached]

Thomas Steiner

Aug 12, 2025, 1:50:07 AM
to Die4Ever2005, Chrome Built-in AI Early Preview Program Discussions
Hi there,

This is a feature control for manually loaded models. It's not exposed to the API.

Cheers,
Tom


Die4Ever2005

Aug 12, 2025, 10:14:44 PM
to Chrome Built-in AI Early Preview Program Discussions, Thomas Steiner, Die4Ever2005
So which option is used by the JavaScript API?

Thomas Steiner

Aug 14, 2025, 1:15:36 PM
to Die4Ever2005, Clark Duvall, Chrome Built-in AI Early Preview Program Discussions, Thomas Steiner
@Clark Duvall, can you make any statement about the automatic selection of highest quality vs. fastest inference (as seen on the chrome://on-device-internals page) and the Prompt API? Is it based on the device performance class?

Thanks!


Clark Duvall

Aug 18, 2025, 3:33:02 PM
to Thomas Steiner, Die4Ever2005, Chrome Built-in AI Early Preview Program Discussions
We use the highest quality model that the device is capable of running. In practice, devices with a High or Very High performance class (as reported by chrome://on-device-internals) use Highest Quality, and devices with Low or Medium use Fastest Inference.

Thomas Steiner

Aug 19, 2025, 5:55:06 AM
to Clark Duvall, Thomas Steiner, Die4Ever2005, Chrome Built-in AI Early Preview Program Discussions
Perfect, that's what I thought. Thanks for confirming, Clark! Die4Ever2005, note that this is an implementation detail of how it works today. It can change at any time, for example, should we enable CPU inference (right now, we require a GPU). 

Bobo Zhou

Aug 19, 2025, 6:52:33 AM
to Chrome Built-in AI Early Preview Program Discussions, Thomas Steiner, Die4Ever2005, Clark Duvall
Can this model directory support asynchronously downloading models from Ollama model URLs? Or can it pull models directly from LM Studio? That would be a more convenient way to work.
[screenshot attached]

Die4Ever2005

Sep 5, 2025, 1:50:28 AM
to Chrome Built-in AI Early Preview Program Discussions, Bobo Zhou, Thomas Steiner, Die4Ever2005, Clark Duvall
> We use the highest quality model that the device is capable of running. In practice, devices with a High or Very High performance class (as reported by chrome://on-device-internals) use Highest Quality, and devices with Low or Medium use Fastest Inference.

My laptop is "Very Low", and I cannot use multimodal input on it. However, chrome://on-device-internals/ lets me run the Highest Quality model, and I can even attach an image and run multimodal prompts there, so I know the hardware is capable; I just can't do this from my own JavaScript.

This is pretty punishing when I'm trying to develop on the go; I can't even test my code.

I've also noticed that JSON output is far less reliable on my laptop than on my desktop. I had to make my JSON prompts auto-retry just so JSON.parse would succeed, and I don't think I've ever seen it fail on my desktop. I imagine failing and retrying is slower than running the higher quality model in the first place, not to mention the garbage it outputs, like JSON strings starting with symbols instead of the desired text.
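
The auto-retry I mean is roughly this (a sketch; promptJson and maxRetries are illustrative names, not my exact code and not part of the Prompt API):

// Retry the prompt until JSON.parse succeeds or we give up.
async function promptJson(session, text, maxRetries = 3) {
  let lastError;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const output = await session.prompt(text);
    try {
      return JSON.parse(output); // Throws when the model emits malformed JSON.
    } catch (e) {
      lastError = e; // Smaller models emit malformed JSON more often; retry.
    }
  }
  throw lastError;
}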

Die4Ever2005

Sep 5, 2025, 1:31:50 PM
to Chrome Built-in AI Early Preview Program Discussions, Die4Ever2005, Bobo Zhou, Thomas Steiner, Clark Duvall
Does this coincide with multimodal support? At least that would give me some way to detect the issue.

Thomas Steiner

Sep 8, 2025, 4:56:31 AM
to Die4Ever2005, Mike Wasserman, Chrome Built-in AI Early Preview Program Discussions, Bobo Zhou, Thomas Steiner
Clark has since left Google, but let me loop in @Mike Wasserman. Mike, there's a question about a device being categorized as "Very Low": the Prompt API refuses to do multimodal prompts via the JavaScript API, but the same multimodal prompts work via the chrome://on-device-internals page (likely very slowly). The same user is wondering about a possible connection to JSON output being less reliable. More context below. Is this something you have more background on?

Mike Wasserman

Sep 10, 2025, 5:49:03 PM
to Chrome Built-in AI Early Preview Program Discussions, Thomas Steiner, Bobo Zhou, Die4Ever2005, Mike Wasserman
(Replying from @chromium.org so this hopefully doesn't bounce)

I'd be a little surprised if the performance modes didn't share the same constrained-decoding implementation for structured output. Minimized repro instructions and samples would help with following up; please file an issue!

I don't recall whether the lower performance modes support image and audio input in the GPU models. Are the APIs returning 'unavailable'? Again, filing an issue would help with repro and follow-up, but that might unfortunately be working as intended for now. (CPU models coming soon!)

Die4Ever2005

Sep 10, 2025, 9:15:52 PM
to Chrome Built-in AI Early Preview Program Discussions, Mike Wasserman, Thomas Steiner, Bobo Zhou, Die4Ever2005
Thanks. I hadn't seen JSON output fail before (not since I added `'x-guidance': {whitespace_flexible: false}`), so I was surprised to see it and thought maybe it was just my laptop. The regex constraint still works perfectly. I tested on my friend's RTX 5090 and saw the same failures, so it seems to be my new prompt causing the issue. I will investigate further.
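
For context, my structured-output calls look roughly like this (a sketch using the Prompt API's responseConstraint option; the schema fields are just an example, and I'm assuming the 'x-guidance' extension goes in the schema as a key, as in my message above):

// Constrain output to a JSON schema; 'x-guidance' carries the hint mentioned above.
const schema = {
  type: 'object',
  properties: {
    title: { type: 'string' },
    score: { type: 'number' },
  },
  required: ['title', 'score'],
  'x-guidance': { whitespace_flexible: false },
};
const session = await LanguageModel.create();
const result = JSON.parse(
  await session.prompt('Summarize the page as JSON.', {
    responseConstraint: schema,
  })
);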

I'm still interested in whether there's a way to determine which mode, low quality or high quality, is being used.

This is what my laptop reports for availability. I thought it couldn't do images because I assumed image and audio support would go together, but it turns out I was wrong and wasn't being granular enough:

await LanguageModel.availability({ expectedInputs: [{ type: 'text' }, { type: 'image' }] });
// → 'available'
await LanguageModel.availability({ expectedInputs: [{ type: 'text' }, { type: 'image' }, { type: 'audio' }] });
// console: Model capability is not available.
// → 'unavailable'
await LanguageModel.availability({ expectedInputs: [{ type: 'audio' }] });
// console: Model capability is not available.
// → 'unavailable'
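
A small helper like this makes the per-modality picture easy to see (a sketch; the function name and modality list are mine, not part of the API):

// Probe each input type separately, since support is per-modality:
// image can be available while audio is not, as shown above.
async function probeModalities() {
  const results = {};
  for (const type of ['text', 'image', 'audio']) {
    results[type] = await LanguageModel.availability({ expectedInputs: [{ type }] });
  }
  return results; // e.g. { text: 'available', image: 'available', audio: 'unavailable' }
}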

Thomas Steiner

Sep 12, 2025, 7:20:28 AM
to Die4Ever2005, Chrome Built-in AI Early Preview Program Discussions, Mike Wasserman, Thomas Steiner, Bobo Zhou
Hi again,

Yes, you always need to call the `availability()` function with the same set of options that you're going to use in your prompt, which is easy to forget (I'm no exception here).

We had a case at Google I/O Connect where we were given MacBook Pro devices from 2019 that supported image, but not audio, input in the prompt, so this is a real-world situation you will definitely encounter in production.
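
In other words, the options passed to `availability()` should mirror what you later pass to `create()`, roughly like this (a sketch; the particular expectedInputs are just an example):

// Check availability with exactly the inputs you intend to prompt with.
const expectedInputs = [{ type: 'text' }, { type: 'image' }];
if (await LanguageModel.availability({ expectedInputs }) !== 'unavailable') {
  const session = await LanguageModel.create({ expectedInputs });
  // This session can now safely accept text + image prompts.
}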

I've also just opened a CL that adds a warning to our docs (this will likely still be word-smithed by our tech writers). 

[screenshot attached]

Cheers,
Tom

Die4Ever2005

Sep 13, 2025, 4:10:19 PM
to Chrome Built-in AI Early Preview Program Discussions, Thomas Steiner, Mike Wasserman, Bobo Zhou, Die4Ever2005
So is audio support tied to the higher-quality inference mode? How can I detect Highest Quality vs. Fastest Inference? Being able to force one mode would also be useful for development.