Intent to Ship: Web Speech API: On-Device Recognition Quality


Chromestatus

May 5, 2026, 6:48:15 PM
to blin...@chromium.org, ev...@google.com
Contact emails
ev...@google.com

Explainer
https://github.com/WebAudio/web-speech-api/blob/main/explainers/quality-levels.md

Specification
https://webaudio.github.io/web-speech-api

Summary
Extends the SpeechRecognition interface by adding a quality property to SpeechRecognitionOptions. This allows developers to specify the semantic capability required for on-device recognition (via processLocally: true). The proposed quality enum supports three levels ('command', 'dictation', and 'conversation') that map to increasing task complexity and hardware requirements. Developers can then determine whether the local device can handle high-stakes use cases (like meeting transcription) or whether they should fall back to cloud services, which addresses the current "black box" nature of on-device model capabilities.
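
As a rough illustration of the flow described in this summary, the TypeScript sketch below asks whether on-device recognition is available at a required quality level and falls back to the cloud otherwise. Only the three level names and processLocally come from the intent. The option shape, the availability strings, and the names RecognitionOptions, AvailabilityCheck, chooseBackend, and fakeDevice are assumptions made for illustration. The availability check is injected as a parameter so the sketch runs outside a browser.

```typescript
// The three level names come from the intent's summary; everything else
// in this sketch (type names, option shape, status strings) is hypothetical.
type Quality = "command" | "dictation" | "conversation";

// Levels in increasing order of task complexity, as the summary describes.
const LEVELS: Quality[] = ["command", "dictation", "conversation"];

// Assumed shape of the recognition options with the proposed `quality` member
// (kept required here only to keep the sketch simple).
interface RecognitionOptions {
  langs: string[];
  processLocally: boolean;
  quality: Quality;
}

// Assumed result values of an availability query.
type Availability = "unavailable" | "downloadable" | "downloading" | "available";

// Stand-in for the browser's availability check.
type AvailabilityCheck = (options: RecognitionOptions) => Promise<Availability>;

type Backend = "on-device" | "cloud";

// Use local recognition only if the device reports the required level as
// available; otherwise fall back to a cloud service.
async function chooseBackend(
  check: AvailabilityCheck,
  langs: string[],
  required: Quality,
): Promise<Backend> {
  const status = await check({ langs, processLocally: true, quality: required });
  return status === "available" ? "on-device" : "cloud";
}

// Test double: a device whose local model tops out at `max`.
function fakeDevice(max: Quality): AvailabilityCheck {
  return async (options) =>
    LEVELS.indexOf(options.quality) <= LEVELS.indexOf(max)
      ? "available"
      : "unavailable";
}
```

With fakeDevice("dictation"), a request for 'conversation' resolves to "cloud", while a request for 'command' resolves to "on-device". A real page would pass a function that wraps the browser's own availability query instead of the test double.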

Blink component
Blink>Speech

Web Feature ID
speech-recognition

Motivation
While the introduction of processLocally: true was a significant step for privacy and latency, it currently treats all on-device models as functionally equivalent. In reality, on-device capabilities are highly fragmented: a lightweight model optimized for simple voice commands (e.g., "turn on the lights") is often insufficient for high-stakes use cases like video conferencing transcription or accessibility captioning, which require handling continuous speech, multiple speakers, and background noise. Because developers currently have no way to verify the semantic capability of the local model, they must either trust the device blindly or default to cloud-based recognition to guarantee a minimum user experience. This lack of transparency pushes developers to bypass on-device capabilities for high-end use cases, effectively negating the privacy and bandwidth benefits of the API. Applications need a mechanism to define their required "floor" of utility (e.g., conversation-grade accuracy) so that they can use local processing with confidence.

Initial public proposal
https://github.com/WebAudio/web-speech-api/issues/182

TAG review
No information provided

TAG review status
Issues addressed

Goals for experimentation
None

Risks


Interoperability and Compatibility
No information provided

Gecko: Positive (https://github.com/mozilla/standards-positions/issues/1375) Neutral/positive. Firefox is launching on-device Web Speech using a single LLM, so they won't make use of the proposed quality hint, but they don't have strong objections to adding it.

WebKit: No signal (https://github.com/WebKit/standards-positions/issues/634) N/A. Apple doesn't have anyone actively working on the Web Speech API at the moment.

Web developers: No signals

Other signals:

WebView application risks

Does this intent deprecate or change behavior of existing APIs, such that it has potentially high risk for Android WebView-based applications?

No information provided


Debuggability
No information provided

Will this feature be supported on all six Blink platforms (Windows, Mac, Linux, ChromeOS, Android, and Android WebView)?
No

Is this feature fully tested by web-platform-tests?
No


Flag name on about://flags
No information provided

Finch feature name
OnDeviceWebSpeechQuality

Rollout plan
Will ship enabled for all users

Requires code in //chrome?
True

Tracking bug
https://g-issues.chromium.org/issues/476168420

Estimated milestones
Shipping on desktop: 150


Anticipated spec changes

Open questions about a feature may be a source of future web compat or interop issues. Please list open issues (e.g. links to known github issues in the project for the feature specification) whose resolution may introduce web compat/interop risk (e.g., changing to naming or structure of the API in a non-backward-compatible way).

No information provided

Link to entry on the Chrome Platform Status
https://chromestatus.com/feature/5136859632107520?gate=6594055733641216

Links to previous Intent discussions
Intent to Prototype: https://groups.google.com/a/chromium.org/d/msgid/blink-dev/69694ed0.050a0220.f8796.0337.GAE%40google.com


This intent message was generated by Chrome Platform Status.

Rick Byers

May 6, 2026, 11:02:34 AM
to Chromestatus, blin...@chromium.org, ev...@google.com
It looks like this is in this PR, right? Is there any reason we shouldn't wait for the PR to land before shipping?

Web developers: No signals

Do we know of any website that wants to use this API? In general we don't ship APIs that we don't have known customers for.
 

Will this feature be supported on all six Blink platforms (Windows, Mac, Linux, ChromeOS, Android, and Android WebView)?
No

Is this feature fully tested by web-platform-tests?
No

Why not? 



Evan Liu

May 6, 2026, 1:47:53 PM
to Rick Byers, Chromestatus, blin...@chromium.org
Thanks for the quick response, Rick!

Do we know of any website that wants to use this API? In general we don't ship APIs that we don't have known customers for.
Google Meet requires this feature to ensure that the on-device model used by the Web Speech API meets its strict quality requirements.

It looks like this is in this PR, right? Is there any reason we shouldn't wait for the PR to land before shipping?
Meet wants to use this feature in M150, but hopefully the PR will land before then anyway!

Why not?
Android & ChromeOS currently do not support on-device Web Speech, so this quality hint won't be available on those platforms. As for the lack of WPT coverage, the testing infrastructure currently lacks a standardized way to mock the hardware-dependent capabilities and subjective behaviors of different on-device machine learning models.

Thanks,
Evan