
Intent to Deprecate: speechSynthesis.speak without user activation


Charles Harrison

2018/06/22 15:33:35
To: blink-dev, Dominic Mazzoni, David Benjamin

Contact emails

Explainer
N/A

Spec
Unofficial draft:

TAG review skipped, as this is a simple change to an existing API. I couldn’t find documentation for what changes require TAG review.

Summary

The SpeechSynthesis API is actively being abused on the web. We don’t have hard data on abuse, but since other autoplay avenues are starting to be closed, abuse is anecdotally moving to the Web Speech API, which doesn't follow autoplay rules.

After deprecation, the plan is to cause speechSynthesis.speak to immediately fire an error if specific autoplay rules are not satisfied. This will align it with other audio APIs in Chrome.
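For illustration, a page that wants to degrade gracefully could listen for the utterance's error event and offer a manual trigger. This is a minimal sketch, not the Chrome implementation: the interfaces and the `speakOrFallback` helper are hypothetical stand-ins for the DOM types, and the assumption that a blocked call reports the `not-allowed` error code is mine, not something stated in this thread.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface UtteranceLike {
  text: string;
  onerror: ((event: { error: string }) => void) | null;
}
interface SynthLike {
  speak(utterance: UtteranceLike): void;
}

// Hypothetical helper: speak, and call onBlocked if the browser refuses.
function speakOrFallback(
  synth: SynthLike,
  utterance: UtteranceLike,
  onBlocked: () => void,
): void {
  utterance.onerror = (event) => {
    // "not-allowed" is an error code defined by the Web Speech API draft;
    // that the autoplay block reports this particular code is an assumption.
    if (event.error === "not-allowed") {
      onBlocked();
    }
  };
  synth.speak(utterance);
}

// Stand-in synthesizer that behaves like a page without user activation.
const blockedSynth: SynthLike = {
  speak(utterance) {
    utterance.onerror?.({ error: "not-allowed" });
  },
};

const events: string[] = [];
speakOrFallback(blockedSynth, { text: "Hello", onerror: null }, () => {
  events.push("fallback"); // e.g. reveal a "Tap to listen" button
});
```

In a real page, `synth` would be `window.speechSynthesis` and the utterance a `SpeechSynthesisUtterance`; the fallback would typically re-issue `speak()` from a click handler.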

Demo link:

Is this feature supported on all six Blink platforms (Windows, Mac, Linux, Chrome OS, Android, and Android WebView)?

Yes (except Android WebView, which does not support Web Speech API)


Risks

Interoperability and Compatibility

Compat risk is medium, but we only recently added UseCounters which are in M69 dev right now, so take data with a grain of salt. Data is broken out by Android / ChromeOS / Other.

New UseCounters:

Android: .04% page visits impacted (.06% of page visits use speak())

ChromeOS: .0008% page visits impacted (.005% page visits use speak())

Other Desktop: .001% page visits impacted (.003% page visits use speak())

Use counts are most concerning on Android, but we also think that might be where a lot of the abuse is.

Existing UMA:

All data from the histogram TextToSpeech.Utterance.FromExtensionAPI which logs global utterances from the Web Speech API as well. Data is from all channels.

Android: 1.18% of unique users / day

ChromeOS: .24% of unique users / day

Other Desktop: .12% of unique users / day


HTTPArchive:

I ran a few HTTPArchive queries. The results show that the API is mostly used by a small-ish number of libraries (ads, youtube, googleapis, etc).

HTTPArchive Total Pages

SELECT page FROM [httparchive:har.2018_02_01_chrome_requests_bodies] WHERE body CONTAINS 'speechSynthesis.speak' GROUP BY page

This returned 38467 pages.

HTTPArchive Total Subresources

SELECT url FROM [httparchive:har.2018_02_01_chrome_requests_bodies] WHERE body CONTAINS 'speechSynthesis.speak' GROUP BY url

This returned 2210 subresource URLs.

HTTPArchive Total Subresource Domains

SELECT DOMAIN(url) as domain FROM [httparchive:har.2018_02_01_chrome_requests_bodies] WHERE body CONTAINS 'speechSynthesis.speak' GROUP BY domain

This returned 192 subresource domains.

Edge: No signals

Firefox: No signals

Safari: No signals, but this change will match Safari on iOS

Web developers: No signals

Is this feature fully tested by web-platform-tests? Link to test suite results from

The speech synthesis API could be tested better on WPT, but here is the dashboard. is the Chromium bug to upstream the remaining layout tests (and unbreak existing tests).

Entry on the feature dashboard

Requesting approval to ship?

No. Plan is to deprecate with a warning in M69/M70, to drive down usage and also collect more data. Ideally we could fully remove in M70/M71 once we have a clearer idea of usage, but I will leave that for another intent.


PhistucK

2018/06/22 16:51:18
To: Charles, blink-dev, Dominic Mazzoni, David Benjamin, Mounir Lamouri
If I remember correctly, when <video> and <audio> got the autoplay prevention treatment, their play() was made to return a promise.
Maybe this is a good change for this one, too?


You received this message because you are subscribed to the Google Groups "blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
To view this discussion on the web visit

Charles Harrison

2018/06/22 17:01:26
To: PhistucK, blink-dev, Dominic Mazzoni, David Benjamin
Hey PhistucK,
I'm not sure changing the API to return a promise improves ergonomics here. Developers must already handle Web Speech errors anyway, so I don't see a strong need to add another error mechanism.

Mounir Lamouri

2018/06/23 18:33:25
To: Charles Harrison, PhistucK, blink-dev, Dominic Mazzoni, David Benjamin
I do not know the specifics of the Web Speech API, but when we added the promise to `play()`, there were already solutions to discover whether playback was blocked or failed; the promise made this much simpler for developers.

-- Mounir
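To make the ergonomics argument concrete, here is a rough sketch of how a caller can already get promise-style handling by wrapping the existing utterance events. It is not the proposed API change itself: `speakAsPromise`, the two interfaces and the stand-in synthesizers are hypothetical names introduced for the example, and the `not-allowed` error code for a blocked call is an assumption.

```typescript
// Only the parts of the utterance and synthesizer this sketch needs.
interface EndableUtterance {
  text: string;
  onend: (() => void) | null;
  onerror: ((event: { error: string }) => void) | null;
}
interface EndableSynth {
  speak(utterance: EndableUtterance): void;
}

// Hypothetical wrapper: resolves when speech ends, rejects on any error event.
function speakAsPromise(synth: EndableSynth, utterance: EndableUtterance): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    utterance.onend = () => resolve();
    utterance.onerror = (event) => reject(new Error(event.error));
    synth.speak(utterance);
  });
}

// Stand-ins for a page with and without user activation.
const grantingSynth: EndableSynth = { speak: (u) => u.onend?.() };
const refusingSynth: EndableSynth = {
  speak: (u) => u.onerror?.({ error: "not-allowed" }),
};

async function demo(): Promise<string[]> {
  const outcomes: string[] = [];
  for (const synth of [grantingSynth, refusingSynth]) {
    try {
      await speakAsPromise(synth, { text: "Hello", onend: null, onerror: null });
      outcomes.push("spoken");
    } catch (err) {
      outcomes.push(`blocked: ${(err as Error).message}`);
    }
  }
  return outcomes;
}
```

The caller ends up with a single `try`/`catch` around `await`, which is the simplification the `play()` promise brought to media elements.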


Charles Harrison

2018/06/23 19:13:14
To: PhistucK, blink-dev, Dominic Mazzoni, David Benjamin
Thanks Mounir for the background. I'm definitely open to making this change if API owners (and owners of speech API) prefer it.

Reilly Grant

2018/06/25 11:19:44
Assuming the use counter statistics can be compared this way, it looks like 66% of all speechSynthesis.speak() calls on Android, 16% on Chrome OS and 33% on other desktop platforms will be affected by this change. Can we do an analysis of whether there are sites that will be completely broken by this change rather than degraded? I'm mostly concerned about those that are using it for accessibility purposes.
Reilly Grant | Software Engineer | Google Chrome
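The percentages above follow from dividing the "impacted" use counter by the "uses speak()" use counter for each platform. A small sketch of that arithmetic, using the figures from the original post (the variable and function names are made up for the example, and rounding to the nearest whole percent gives 67% for Android where the message quotes 66%):

```typescript
// Use-counter figures from the original post, in percent of page visits.
const counters = [
  { platform: "Android", impacted: 0.04, usingSpeak: 0.06 },
  { platform: "ChromeOS", impacted: 0.0008, usingSpeak: 0.005 },
  { platform: "Other Desktop", impacted: 0.001, usingSpeak: 0.003 },
];

// Share of speak()-using page visits that would be impacted, as a whole percent.
function impactedShare(impacted: number, usingSpeak: number): number {
  return Math.round((impacted / usingSpeak) * 100);
}

const shares = counters.map(
  (c) => `${c.platform}: ${impactedShare(c.impacted, c.usingSpeak)}%`,
);
// shares is ["Android: 67%", "ChromeOS: 16%", "Other Desktop: 33%"]
```

As the message notes, this only holds if the two counters are measured over the same population of page visits.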


Charles Harrison

2018/06/25 11:28:12
To: blink-dev, Dominic Mazzoni, David Benjamin
Hi Reilly,
I think the way to do this will be to log UKM metrics for these counters. I will go ahead and do that now so we can get per-URL metrics about breakage in M69.

Yoav Weiss

2018/06/29 5:09:04
To: Charles Harrison, blink-dev, Dominic Mazzoni, David Benjamin
Switching the API to return a promise (instead of the current void) SGTM.

Are there any other learnings from the `autoplay` issues we ran into when trying to block it on user activation?


Dominic Mazzoni

2018/06/29 11:37:11
To: Yoav Weiss, Charles Harrison, blink-dev, David Benjamin
I'm not sure I understand the purpose of returning a Promise.

Do you want the Promise to be fulfilled if the user does interact with the page later?

If not, why couldn't it just return a boolean true or false? We know synchronously whether the user has interacted with the page or not.

Also, SpeechSynthesis already has an error callback, so it seems confusing to add another mechanism.

Yoav Weiss

2018/07/04 2:00:18
To: Dominic Mazzoni, Charles Harrison, blink-dev, David Benjamin
Mounir - can you expand on why the promise returning mechanism had a positive user-ergonomics effect on the video autoplay case?

Charles Harrison

2018/07/06 13:11:34
To: Yoav Weiss, Dominic Mazzoni, blink-dev, David Benjamin
Hey folks,
We’ve added UKM metrics since version 69.0.3474.0. While data is still very new, the trends are definitely suggestive (see internal go/speech-autoplay-deprecation-data for query). For all the cases where speak would have been disallowed via autoplay policy (across all platforms):

- About half of the origins were not known by Google’s web crawlers and thus were not included in the analysis.
- Among the remaining origins, I haven’t yet encountered any that would be affected by this deprecation that serve anything other than ad content. Most origins do not have landing pages, and searching for these origins only yields what appear to be spam posts on Twitter, posting direct links to ads hosted there.

Given the above, I believe we should proceed with the deprecation notice in M69 if we can make branch cut.

Charles Harrison

2018/07/06 13:32:33
To: Yoav Weiss, Dominic Mazzoni, blink-dev, David Benjamin
To add a bit of detail, I've manually gone through the top ~10 of these origins that would be affected. None of them have landing pages but for most of them I was able to find a reference to an ad hosted on the origin using Google search. For the rest, many of the origins reference lotteries or winning prizes but I haven't manually checked all of them. I'll probably be able to do a more detailed analysis when we have more data though.

Malte Ubl

2018/07/09 18:50:27
To: Yoav Weiss, blink-dev
One thing that would be good to clarify:
Do future .speak() calls always succeed if a previous one succeeded (has a user action) for the window?
Video/audio elements that were ever allowed to play can play again (including with different sources) in the future. The SpeechSynthesis API doesn't really have an equivalent element to which the autoplay privilege could be attached.

Charles Harrison

2018/07/10 8:41:04
To: Malte Ubl, Yoav Weiss, Dominic Mazzoni, blink-dev, David Benjamin
Hi Malte, for a frame that calls speak(), its autoplay privileges are determined by whether the frame or any of its parent frames have ever received activation.

So if your frame has had activation and speak() succeeds, it should always succeed for that load. Autoplay is governed by the frame and not by any specific element in this case.
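The rule described here can be sketched as a walk up the frame tree. This is only a toy model for illustration: `FrameNode`, `hasEverBeenActivated` and `canSpeak` are hypothetical names, not Blink's actual data structures, and the other autoplay exceptions discussed elsewhere in the thread are left out.

```typescript
// Hypothetical model of a frame; not Blink's actual data structure.
interface FrameNode {
  name: string;
  parent: FrameNode | null;
  hasEverBeenActivated: boolean;
}

// speak() is allowed if this frame or any ancestor has ever received activation.
function canSpeak(frame: FrameNode): boolean {
  for (let f: FrameNode | null = frame; f !== null; f = f.parent) {
    if (f.hasEverBeenActivated) {
      return true;
    }
  }
  return false;
}

const topFrame: FrameNode = { name: "top", parent: null, hasEverBeenActivated: false };
const adFrame: FrameNode = { name: "ad-iframe", parent: topFrame, hasEverBeenActivated: false };

const beforeClick = canSpeak(adFrame); // false: no frame in the chain was activated
topFrame.hasEverBeenActivated = true;  // the user clicks in the top-level page
const afterClick = canSpeak(adFrame);  // true: inherited from the parent frame
```

Because the flag is never reset in this model, a frame that was allowed to speak once stays allowed for the rest of that load, which matches the answer to Malte's question.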

Philip Jägenstedt

2018/07/10 9:08:46
To: Charles Harrison, Malte Ubl, Yoav Weiss, Dominic Mazzoni, Reilly Grant, blink-dev, David Benjamin, Mounir Lamouri
Hi Charles,

Can you go ahead and file an issue on to make this change in the spec as well, or a PR directly?

Especially if speak() is made to return a promise, this is easily web observable and testable, so I think we should treat the spec+test situation similar to an Intent to Ship.

Like Reilly I also wonder about the potential breakage here, and I would not be surprised if we find that a large majority of usage would be broken entirely. If that is the case, then this change will require some care to get done.

Charles Harrison

2018/07/10 10:05:12
To: Philip Jägenstedt, Malte Ubl, Yoav Weiss, Dominic Mazzoni, blink-dev, David Benjamin
Thanks Philip, I filed to track this.

Do you mean we should track the removal as an intent to ship, or the deprecation (this intent)? I was hoping we could migrate the LayoutTests to WPT after landing the warning but before actually shipping.

Philip Jägenstedt

2018/07/10 10:24:47
To: Charles Harrison, Malte Ubl, Yoav Weiss, Dominic Mazzoni, Reilly Grant, blink-dev, David Benjamin, Mounir Lamouri
I didn't mean that this should be an Intent to Ship, just the "Is this feature fully tested by web-platform-tests? Link to test suite results from" bit of shipping makes sense in this case too.

Whether the tests are written/modified directly in LayoutTests/external/wpt/speech-api/ or some of LayoutTests/fast/speechsynthesis/ is modified and upstreamed, and when, whatever makes sense to you is fine :)

The most pressing question now I think is how this will affect web developers, as we ought not deprecate until we're fairly confident that our plan/timeline for removal will work out. The preliminary data you have from vs. suggests that something up to 2/3 of usage will be broken, even if the usage in absolute terms isn't huge.

From the httparchive search, were you able to extract out any typical example usages, to see how it will be broken? Or must we wait for UKM metrics?

Charles Harrison

2018/07/10 14:16:07
To: Philip Jägenstedt, Malte Ubl, Yoav Weiss, Dominic Mazzoni, blink-dev, David Benjamin
Thanks, that makes sense. I dug deeper into HTTP Archive to try to see typical usage patterns.
Because most of the usage is in ads script or more general "base" libraries, I grouped by resource domain first to get more diversity.

I manually went through the first ~20 entries that weren't obviously generic libraries, and collected the results in this sheet. The majority of cases used a play button to start speaking, which means the site would be fine with the autoplay restrictions. For many of the sites I couldn't find how they used the API, or I couldn't get the UI to actually issue a speak command. I couldn't get any site to speak() in a way that would break from this change.

  FORMAT("%T", NET.REG_DOMAIN(url)) as domain
  STRPOS(body, 'speechSynthesis.speak') > 0

Philip Jägenstedt

2018/07/10 14:44:23
To: Charles Harrison, Malte Ubl, Yoav Weiss, Dominic Mazzoni, blink-dev, David Benjamin
Hmm, failure to find cases that would be broken in httparchive, coupled with the preliminary use counter findings, suggests that either there's some large (in page visits) site using the API without a user gesture, which is good news in a way if we can figure out which it is...

(Or possibly there's some bug in the condition to trigger the use counter?)

Does Google search use the API by any chance? You mentioned YouTube also, how do they use it?

Dominic Mazzoni

2018/07/10 15:17:58
To: Philip Jägenstedt, Charles Harrison, Malte Ubl, Yoav Weiss, blink-dev, David Benjamin
Are ads always constrained to iframes? Is there any chance that some ad network is speaking, but the use counter is blaming the hosting site rather than the ad's origin?

What about a Chrome extension that injects code to speak using a content script? Note that there's already a separate extension API for TTS from an extension's background page, but it's certainly plausible that an extension could be adding code to a page that makes it speak.

Charles Harrison

2018/07/10 16:17:25
To: Dominic Mazzoni, Philip Jägenstedt, Malte Ubl, Yoav Weiss, blink-dev, David Benjamin
Philip: My hypothesis (and the UKM backs this up), is that the majority of the uses of the API are abuse. Perhaps none of these sites show up on HTTP Archive because they have no landing pages (e.g. they 403 if you don't know the exact path to the ad).

I thought Google Search uses the API if you use your microphone to search, but it looks like it's using some other mechanism. I haven't figured out how youtube embeds can access the API, I don't think there is public documentation. By far the most prevalent resource URL using the API (according to HTTP Archive) is .

Dominic: Ads are not always constrained to frames (the UKM data shows most of the top counts are top-level ads). UKM will also always report the top level, so yes ads on will report by design, even if the ads are constrained to an iframe.

A Chrome extension could be injecting code to speak using a content script. We have no practical way of telling the difference between a content script and a normal script at runtime though. 

Daniel Bratell

2018/07/17 10:22:03
To: Dominic Mazzoni, Charles Harrison, Philip Jägenstedt, Malte Ubl, Yoav Weiss, blink-dev, David Benjamin
I tried to see what YouTube is up to but all I found was some speech code in the TV version of YouTube that I couldn't figure out how to trigger. 
There is something called env_isTransliterationSpeakable (see
function(a){return Qp(a.speechSynthesis.speak)},

I guess that is the same that you found?


/* Opera Software, Linköping, Sweden: CEST (UTC+2) */

Charles Harrison

2018/07/17 15:33:16
To: Daniel Bratell, Dominic Mazzoni, Philip Jägenstedt, Malte Ubl, Yoav Weiss, blink-dev, David Benjamin
Hi Daniel,
Yeah the tv-player was the only thing I saw (besides the uses in base.js). I believe Youtube TV uses speech synthesis for accessibility / screen reading. However, the client on the actual TV isn't a Blink platform (it uses Cobalt iiuc).

2018/07/19 12:38:09
To: blink-dev
Hey all,

Discussed at today's API Owners meeting; I think I understand the situation but for the avoidance of doubt, can someone confirm that the new policy will adhere to the exact same behavior as media element autoplay?

That is to say, we grant autoplay by default with sound to installed PWAs. Will this API respect that? Other cases where autoplay with sound has been granted? The hope is that "things that can make noise without interaction" are all governed in exactly the same way (and that use of speech playback feeds into Media Engagement score, etc. etc.). Are they symmetric?

Similarly, the concern about feature detection and modeling as a permission came up again. The resolution of the previous discussion wasn't particularly satisfying:

We continue to lack an effective feature-detection mechanism for both; the only way to be notified of the autoplay situation changing in-page seems to be the `play` event:

Can we get this capability reflected to developers in a way that doesn't require con's-ing up an expensive to create element + weird side effects?


Mounir Lamouri

2018/07/19 21:05:46
On Thu, 19 Jul 2018 at 09:38 <> wrote:
> Hey all,
>
> Discussed at today's API Owners meeting; I think I understand the situation but for the avoidance of doubt, can someone confirm that the new policy will adhere to the exact same behavior as media element autoplay?
>
> That is to say, we grant autoplay by default with sound to installed PWAs. Will this API respect that? Other cases where autoplay with sound has been granted? The hope is that "things that can make noise without interaction" are all governed in exactly the same way (and that use of speech playback feeds into Media Engagement score, etc. etc.). Are they symmetric?

The code that I reviewed hooks into the autoplay policy, so the Speech API will use the policy that applies to any media element. If autoplay is granted or disabled one way or another (MEI, x-origin iframe, PWA), it will apply to this API.

> Similarly, the concern about feature detection and modeling as a permission came up again. The resolution of the previous discussion wasn't particularly satisfying:
>
> We continue to lack an effective feature-detection mechanism for both; the only way to be notified of the autoplay situation changing in-page seems to be the `play` event:
>
> Can we get this capability reflected to developers in a way that doesn't require con's-ing up an expensive to create element + weird side effects?

The play event is no longer the state of the art for detecting autoplay. Instead, websites are expected to use the play promise. It doesn't require setting up a full element. There are some discussions to expose an API but at the moment, it's specific to the media element.

-- Mounir
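As a rough illustration of detection via the play promise: `play()` on a media element returns a promise that rejects (with a `NotAllowedError` in browsers) when autoplay is blocked. The sketch below uses a hypothetical `PlayableLike` interface and stand-in objects instead of a real media element, and treats any rejection as "blocked", which is a simplification.

```typescript
// Only the part of a media element this sketch needs.
interface PlayableLike {
  play(): Promise<void>;
}

// Resolves to true if playback started, false if play() was rejected.
async function canAutoplay(media: PlayableLike): Promise<boolean> {
  try {
    await media.play();
    return true;
  } catch {
    // Simplification: any rejection is treated as an autoplay block.
    return false;
  }
}

// Stand-ins for an element that may autoplay and one that may not.
const allowedMedia: PlayableLike = { play: () => Promise.resolve() };
const blockedMedia: PlayableLike = {
  play: () => Promise.reject(new Error("NotAllowedError")),
};

const detection = Promise.all([canAutoplay(allowedMedia), canAutoplay(blockedMedia)]);
```

In a page, `media` would be an actual `<video>` or `<audio>` element, and a `false` result would typically lead to showing a play button instead.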

Philip Jägenstedt

2018/07/20 8:26:00
To: Mounir Lamouri, Alex Russell, blink-dev, Daniel Bratell, Dominic Mazzoni, Malte Ubl, Yoav Weiss, Reilly Grant, David Benjamin
Given that the relevant specs don't require the same policy to be used for all media-producing APIs, should we consider adding the equivalent of the proposed allowedToPlay to Web Speech (and Web Audio?), or should we collapse them into a single concept spec side and have a single way of detecting that making noise will or won't succeed?

Daniel Bratell

2018/07/26 12:11:35
To: Mounir Lamouri, Philip Jägenstedt, Alex Russell, blink-dev, Dominic Mazzoni, Malte Ubl, Yoav Weiss, Reilly Grant, David Benjamin
