Contact emails
ev...@google.com

Explainer
https://github.com/WebAudio/web-speech-api/pull/122
Specification
https://webaudio.github.io/web-speech-api

Summary
This feature adds on-device speech recognition support to the Web Speech API, allowing websites to ensure that neither audio nor transcribed speech are sent to a third-party service for processing. Websites can query the availability of on-device speech recognition for specific languages, prompt users to install the necessary resources for on-device speech recognition, and choose between on-device or cloud-based speech recognition as needed.
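A rough sketch of how a site might combine these capabilities. The method and mode names used here (onDeviceWebSpeechAvailable, installOnDeviceSpeechRecognition, "ondevice-only") are the ones discussed later in this thread and may still change as the spec PR evolves; since the real API only exists in the browser, a small mock stands in for the SpeechRecognition surface:

```javascript
// Mock stand-in for the browser's speech recognition surface.
// In a real page these would hang off SpeechRecognition / webkitSpeechRecognition.
const installedPacks = new Set(["en-US"]);

const speechRecognition = {
  // Resolves to true if an on-device language pack is installed for `lang`.
  async onDeviceWebSpeechAvailable(lang) {
    return installedPacks.has(lang);
  },
  // In a real browser this would prompt the user and download the pack;
  // resolves to true on success.
  async installOnDeviceSpeechRecognition(lang) {
    installedPacks.add(lang);
    return true;
  },
};

async function chooseMode(lang) {
  // Prefer on-device recognition so neither audio nor transcripts leave
  // the machine; fall back to cloud-based recognition otherwise.
  if (await speechRecognition.onDeviceWebSpeechAvailable(lang)) {
    return "ondevice-only";
  }
  const installed = await speechRecognition.installOnDeviceSpeechRecognition(lang);
  return installed ? "ondevice-only" : "cloud-only";
}

chooseMode("de-DE").then((mode) => console.log(mode)); // "ondevice-only" with this mock
```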
Blink component
Blink>Speech

Search tags
speech, recognition, local, offline, on-device

TAG review
None

TAG review status
Pending

Risks
Interoperability and Compatibility
None
Gecko: Positive. Discussed at TPAC 2024 with representatives from Mozilla, including Paul Adenot.
WebKit: Positive. Discussed at TPAC 2024 with representatives from Apple, including Eric Carlson.
Web developers: Positive. Commonly requested feature. Examples:
https://webwewant.fyi/wants/55/
https://github.com/WebAudio/web-speech-api/issues/108
https://stackoverflow.com/questions/49473369/offline-speech-recognition-in-browser
https://www.reddit.com/r/html5/comments/8jtv3u/offline_voice_recognition_without_the_webspeech/
Other signals:

WebView application risks
Does this intent deprecate or change behavior of existing APIs, such that it has potentially high risk for Android WebView-based applications?
None
Debuggability
None
Will this feature be supported on all six Blink platforms (Windows, Mac, Linux, ChromeOS, Android, and Android WebView)?
No. Initially supported on Windows, Mac, and Linux, with ChromeOS support to follow.
Is this feature fully tested by web-platform-tests?
No
Flag name on about://flags
None

Finch feature name
InstallOnDeviceSpeechRecognition, OnDeviceWebSpeechAvailable, OnDeviceWebSpeech

Requires code in //chrome?
False

Estimated milestones
Shipping on desktop: 135

Anticipated spec changes
Open questions about a feature may be a source of future web compat or interop issues. Please list open issues (e.g. links to known github issues in the project for the feature specification) whose resolution may introduce web compat/interop risk (e.g., changing to naming or structure of the API in a non-backward-compatible way).
https://github.com/WebAudio/web-speech-api/pull/122

Link to entry on the Chrome Platform Status
https://chromestatus.com/feature/6090916291674112?gate=4683906480340992

This intent message was generated by Chrome Platform Status.
--
You received this message because you are subscribed to the Google Groups "blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to blink-dev+...@chromium.org.
To view this discussion visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/677c7f0e.2b0a0220.2e82a8.01f6.GAE%40google.com.
Adding to Yoav’s feedback about the spec:
I also wonder if this should have a TAG review, especially given the privacy/fingerprinting implications of websites being able to query which on-device models are available.
-- Dan Clark
> * Are the resources downloaded partitioned per top-level site? What should typical download sizes be?

This depends on the browser--for Chrome on Windows/Mac/Linux, there's only one instance of each on-device speech recognition language pack, and each language pack is ~60MB. The spec doesn't necessarily dictate how the downloads are handled, only that websites should be allowed to trigger a download (or request a download) of a language.

> Links to the minutes would be helpful. Filing official positions would be even better.

> Why not? Is it tested otherwise?

Oops, I forgot to check that box. This feature is testable by web-platform-tests.

> It’s implied that installOnDeviceSpeechRecognition() happens synchronously. Making this a blocking call seems problematic since it could involve a fetch and a download. I’d expect it to return a Promise (https://www.w3.org/TR/design-principles/#promises). And onDeviceWebSpeechAvailable should probably also be async since it could involve reading data from disk.

Totally agree--the implementation of those two APIs in Chrome returns Promises. I'll make sure the spec reflects this.

> The SpeechRecognitionMode "ondevice-only" value is only defined by a comment in the IDL stating that it “Returns an error if on-device speech recognition is not available”. What specifically returns an error? SpeechRecognition.start() doesn’t return any value, and in other error conditions the behavior is to fire a SpeechRecognitionErrorEvent. Also, what should the behavior be if SpeechRecognitionMode is changed after start() has already been called?

Ah yeah, I'll update that comment to clarify that it fires a SpeechRecognitionErrorEvent. Updating the SpeechRecognitionMode after start() has been called has no effect on the existing session. This is consistent with how other SpeechRecognition attributes work (e.g. lang, maxAlternatives, etc.). This isn't explicitly stated anywhere in the spec, so I'll file a spec issue to clarify this as well.

As for mitigating privacy and fingerprinting risks, we've been collaborating with the team building the Translator API feature, which also has the ability to download and detect language packs. Because the risks between these two features are nearly identical, on-device speech recognition language pack downloads will follow the same pattern and use the same permissions UI as on-device translation language packs. Here are some helpful links:

Privacy Design Doc
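Under the assumptions in this reply -- both new methods return Promises, and an "ondevice-only" failure fires a SpeechRecognitionErrorEvent rather than returning a value from start() -- the described behavior could be sketched with a minimal mock like this (names and error codes are illustrative, not normative):

```javascript
// Minimal stand-in for SpeechRecognition, modeling only the error-reporting
// behavior discussed above: start() returns no value, and failures surface
// asynchronously through an error handler, the way other recognition errors do.
class MockSpeechRecognition {
  constructor(availableLangs) {
    this.available = new Set(availableLangs); // installed on-device packs
    this.mode = "cloud-only";
    this.lang = "en-US";
    this.onerror = null;
    this.onstart = null;
  }
  start() {
    // Errors are reported asynchronously via an event, never thrown or returned.
    queueMicrotask(() => {
      if (this.mode === "ondevice-only" && !this.available.has(this.lang)) {
        this.onerror?.({ error: "language-not-supported" }); // hypothetical code
      } else {
        this.onstart?.();
      }
    });
  }
}

const rec = new MockSpeechRecognition(["en-US"]);
rec.mode = "ondevice-only";
rec.lang = "fr-FR"; // no on-device pack for this language in the mock
rec.onerror = (e) => console.log("error event:", e.error);
rec.start(); // fires the error handler asynchronously, not a thrown exception
```

Changing `rec.mode` after `start()` would, per the answer above, leave the in-flight session untouched, matching how `lang` and `maxAlternatives` behave today.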
On Tue, Jan 7, 2025 at 9:50 PM Evan Liu <ev...@google.com> wrote:

> * Are the resources downloaded partitioned per top-level site? What should typical download sizes be?
>
> This depends on the browser--for Chrome on Windows/Mac/Linux, there's only one instance of each on-device speech recognition language pack and each language pack is ~60MB. The spec doesn't necessarily dictate how the downloads are handled, only that websites should be allowed to trigger a download (or request a download) of a language.

This seems like it'd require at the very least some extra considerations as part of the Privacy & Security section of the spec. It would also be good to make that explicitly an implementation-defined decision.

+Domenic Denicola, who's been working on similar privacy models related to translations and can potentially advise you on the best path there.
On 1/7/25 3:49 PM, 'Evan Liu' via blink-dev wrote:
> As for mitigating privacy and fingerprinting risks, we've been collaborating with the team building the Translator API feature, which also has the ability to download and detect language packs. Because the risks between these two features are nearly identical, on-device speech recognition language pack downloads will follow the same pattern and use the same permissions UI as on-device translation language packs. Here are some helpful links:
>
> Privacy Design Doc
Should we update the Privacy considerations in the spec to describe these risks?
Adding to Yoav’s feedback about the spec:
- It’s implied that installOnDeviceSpeechRecognition() happens synchronously. Making this a blocking call seems problematic since it could involve a fetch and a download. I’d expect it to return a Promise (https://www.w3.org/TR/design-principles/#promises). And onDeviceWebSpeechAvailable should probably also be async since it could involve reading data from disk.
- The SpeechRecognitionMode "ondevice-only" value is only defined by a comment in the IDL stating that it “Returns an error if on-device speech recognition is not available”. What specifically returns an error? SpeechRecognition.start() doesn’t return any value, and in other error conditions the behavior is to fire SpeechRecognitionErrorEvent. Also, what should the behavior be if SpeechRecognitionMode is changed after start() has already been called?
I also wonder if this should have a TAG review, especially given the privacy/fingerprinting implications of websites being able to query which on-device models are available.
Have you written web platform tests for it? Have a link?
> Should we update the Privacy considerations in the spec to describe these risks?
This needs an async API, likely with a streams design.
> Privacy Design Doc
I don't think that's a link..
> I also wonder if this should have a TAG review, especially given the privacy/fingerprinting implications of websites being able to query which on-device models are available.
As a TAG member, I think a TAG review would probably result in useful feedback for this API. Please do send one.
So are you OK with adding unprefixing to this intent (or if you prefer, a new one that this is blocked on)?
It would be helpful if you wrote a short explainer.
We are looking for the spec and WPTs to match the implementation before approving.
One more question, it looks like the latest spec has not been published to the gh-pages branch yet. Can you please make sure that your changes are visible here?
It would be nice to speak with someone privately, as I may be able to add some additional insight.
> So are you OK with adding unprefixing to this intent (or if you prefer, a new one that this is blocked on)?

Yeah, I think that's a great idea! I'm also in favor of tracking usage of the prefixed version with the goal of possibly dropping it entirely in the future.

> It would be helpful if you wrote a short explainer.

I've sent out a PR adding an explainer for on-device speech recognition: https://github.com/WebAudio/web-speech-api/pull/133
> We are looking for the spec and WPTs to match the implementation before approving.

I've sent out a PR updating the spec to match the WPTs, which return Promises that resolve to booleans for the two new methods: https://github.com/WebAudio/web-speech-api/pull/132
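The actual WPTs live in the linked PR; purely as an illustration, a check that the two new methods return Promises resolving to booleans might look like the sketch below (a hypothetical mock stands in for the browser API, since these methods only exist in the browser):

```javascript
// Hypothetical mock of the two new methods; the real tests target the
// browser's SpeechRecognition implementation.
const mock = {
  onDeviceWebSpeechAvailable: async (lang) => lang === "en-US",
  installOnDeviceSpeechRecognition: async (_lang) => true,
};

// Asserts the WPT-style contract: the call returns a Promise,
// and that Promise resolves to a boolean.
async function checkReturnsBooleanPromise(fn, lang) {
  const result = fn(lang);
  if (!(result instanceof Promise)) throw new Error("expected a Promise");
  const value = await result;
  if (typeof value !== "boolean") throw new Error("expected a boolean");
  return value;
}

checkReturnsBooleanPromise(mock.onDeviceWebSpeechAvailable, "en-US")
  .then((v) => console.log("available:", v)); // available: true
```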
> One more question: it looks like the latest spec has not been published to the gh-pages branch yet. Can you please make sure that your changes are visible here?

Dominique Hazael-Massieux is currently working on this--the change should be auto-published once this PR is merged: https://github.com/WebAudio/web-speech-api/pull/129