Intent to Implement: Web Speech Recognition API on Android

Jan Keromnes

May 16, 2013, 10:50:08 AM
to blin...@chromium.org, pe...@chromium.org, mig...@chromium.org, prim...@chromium.org
TL;DR Web Speech Recognition has a vendor prefix on desktop. We are asking for feedback on whether to use the same vendor prefix on Android, or to remove the prefix across desktop and Android to move recognition behind a flag.

Emails

Jan Keromnes (intern) ja...@chromium.org
Peter Beverloo (host) pe...@chromium.org
Miguel García (TLM) mig...@chromium.org

Spec (mind: unofficial)


Summary

The Web Speech Recognition API’s goal is to enable web developers to programmatically incorporate speech recognition into their web apps. It is already implemented on desktop and has been exposed since M25 with a vendor prefix (“webkitSpeechRecognition”).

We intend to leverage Android’s excellent platform speech recognition capabilities by implementing the Web Speech API’s SpeechRecognition in Chrome for Android. This will expose the same JavaScript interfaces as on the desktop: webkitSpeechRecognition, webkitSpeechRecognitionEvent, webkitSpeechRecognitionError, webkitSpeechGrammar and webkitSpeechGrammarList.
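
For illustration, a minimal sketch of how a page might drive these interfaces. The helper name `startDictation` and its callback parameters are hypothetical, and the constructor is passed in by the caller as an assumption to keep the sketch self-contained. The members used (`continuous`, `interimResults`, `lang`, `onresult`, `onerror`, `start()`, `resultIndex`, `isFinal`, `transcript`) are the ones defined in the unofficial Web Speech API specification.

```javascript
// Hypothetical helper: wires up a recognizer and reports transcripts.
// RecognitionCtor is expected to be webkitSpeechRecognition (or an
// unprefixed equivalent); it is injected rather than read from `window`.
function startDictation(RecognitionCtor, onFinal, onInterim) {
  const recognition = new RecognitionCtor();
  recognition.continuous = true;      // keep listening after a pause
  recognition.interimResults = true;  // also deliver non-final hypotheses
  recognition.lang = "en-US";

  recognition.onresult = function (event) {
    // Only results from resultIndex onward changed in this event.
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      const transcript = result[0].transcript; // first (best) alternative
      if (result.isFinal) {
        onFinal(transcript);
      } else if (onInterim) {
        onInterim(transcript);
      }
    }
  };

  recognition.onerror = function (event) {
    console.error("Speech recognition error:", event.error);
  };

  recognition.start();
  return recognition;
}
```

In a browser that exposes the prefixed constructor, this could be called as `startDictation(window.webkitSpeechRecognition, function (text) { console.log(text); })`.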

Note that the specification also defines a SpeechSynthesis interface, which is out of the scope of this Intent to Implement as it was handled in a separate one. FYI its launch bug tracker is at http://crbug.com/239503.

Exposing “webkitSpeechRecognition” on mobile will further increase the incentive for developers to use the vendor prefix. Since Blink came with the promise not to use vendor prefixes for new features, because they are painful for developers, we are interested in feedback as to how best to deal with the existing prefix.

The two options we are considering are:
1) Keep the prefix on desktop and use the same prefix for the Android version.
2) Remove the prefix and hide both desktop and Android versions behind the same flag.

API test pages


Motivation

“Voice commands are going to be increasingly important.  It’s just much less hassle to talk than type!  So this quarter we launched Chrome support for web speech APIs.  Developers can now easily add voice recognition into their web apps.  We expect to see a lot of innovation there.”

Android’s platform voice recognition capabilities are excellent, and work both online and offline. Exposing them on mobile Chrome will allow a new generation of voice-driven web applications, unleashing innovation in domains like speech-to-text dictation, conversational apps and more.

Compatibility Risk

Low. The compatibility risk due to the unofficial status of the specification and the absence of other browsers exposing it is offset by the use of the vendor prefix. Once a consensus emerges among browsers, we will be able to remove the prefix (or flag) from both desktop and Android. A consensus on the Web Speech API seems likely: Safari has implemented the SpeechSynthesis interface, and Firefox has already implemented the whole API. However, Firefox shipped without an engine, so its API has no effect yet and it has not been mentioned in their release notes.

Firefox:
SpeechRecognition and SpeechGrammar are implemented unprefixed, but they lack a recognition backend https://bugzilla.mozilla.org/show_bug.cgi?id=650295
SpeechSynthesis is also implemented, unprefixed, and it also has no voice backend https://bugzilla.mozilla.org/show_bug.cgi?id=525444
Safari: Only SpeechSynthesis is implemented https://bugs.webkit.org/show_bug.cgi?id=106742
Opera: N/A
Internet Explorer: N/A

OWP launch tracking bug

Row on feature dashboard?
Yes

Requesting simultaneous permission to ship?
Our immediate goal is to expose the feature behind either vendor prefix or flag to perform testing across desktop and mobile, and surface spec issues to route back to the W3C.

TAMURA, Kent

May 19, 2013, 5:56:15 PM
to Jan Keromnes, blink-dev, Peter Beverloo, mig...@chromium.org, prim...@chromium.org
LGTM to implement.

> 1) Keep the prefix on desktop and use the same prefix for the Android version.
> 2) Remove the prefix and hide both desktop and Android versions behind the same flag.

I don't think 1 is reasonable.  It might increase the prefixed API usage.
If the current usage of the desktop prefixed API is very low, 2 is reasonable. Otherwise, we had better have both the prefixed version (desktop only) and the unprefixed version behind the flag.

--
TAMURA Kent
Software Engineer, Google


Tommy Widenflycht (ᛏᚮᛘᛘᚤ)

May 20, 2013, 7:08:46 AM
to TAMURA, Kent, Glen Shires, Jan Keromnes, blink-dev, Peter Beverloo, mig...@chromium.org, prim...@chromium.org
Adding Glen, the main speech recognition guy, to the discussion.

Speech recognition has significant usage on desktop, has been publicly announced, and can't be hidden behind a flag without consequences.

Jan Keromnes

May 20, 2013, 7:22:07 AM
to Tommy Widenflycht (ᛏᚮᛘᛘᚤ), TAMURA, Kent, Glen Shires, blink-dev, Peter Beverloo, Miguel García, Primiano Tucci
Do we have data on how much usage the Speech Recognition API gets?

I know that the API was announced by Glen in January on HTML5 Rocks (explicitly using the prefix and not checking for the unprefixed option; I submitted a pull request to fix that). The post was reshared and paraphrased several times, and GitHub has around 400 occurrences of "webkitSpeechRecognition" being used across all repositories. A significant portion of those repositories are forks of WebKit/Blink; the rest look like demos or test pages, and I rarely found checks for the unprefixed feature.
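
As a rough sketch of what a check for the unprefixed option could look like (the helper name `getSpeechRecognition` is hypothetical, and the global object is passed in as a parameter purely so the snippet is self-contained):

```javascript
// Returns the recognition constructor exposed by the given global object,
// preferring the unprefixed name so that a page keeps working if the
// vendor prefix is dropped later. Returns null when neither is present.
function getSpeechRecognition(globalObject) {
  return globalObject.SpeechRecognition ||
         globalObject.webkitSpeechRecognition ||
         null;
}
```

In a page this would be called as `getSpeechRecognition(window)`, with a `null` result meaning the feature is unavailable in that browser.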

Glen Shires

May 20, 2013, 10:25:29 AM
to Jan Keromnes, Tommy Widenflycht (ᛏᚮᛘᛘᚤ), TAMURA, Kent, blink-dev, Peter Beverloo, Miguel García, Primiano Tucci
Usage is significant, including its use on the new google.com homepage, announced at Google I/O http://www.cnn.com/2013/05/15/tech/web/google-search-update/index.html

It cannot be hidden behind a flag.





Jan Keromnes

May 20, 2013, 11:42:13 AM
to Glen Shires, Tommy Widenflycht (ᛏᚮᛘᛘᚤ), TAMURA, Kent, blink-dev, Peter Beverloo, Miguel García, Primiano Tucci
I believe the google.com homepage uses the Speech Input API, which is different from the Web Speech (Recognition) API: The searchbox is an input tag with `x-webkit-speech` that doesn't use `webkitSpeechRecognition` (at least in the version I can see).

However, the official Web Speech API demo page (first result for "Web Speech API") uses `webkitSpeechRecognition` without checking for an unprefixed option: http://www.google.com/intl/en/chrome/demos/speech.html

Glen Shires

May 20, 2013, 12:21:34 PM
to Jan Keromnes, Tommy Widenflycht (ᛏᚮᛘᛘᚤ), TAMURA, Kent, blink-dev, Peter Beverloo, Miguel García, Primiano Tucci
Jan and I talked directly.
Looks like the use of the Web Speech API on google.com is in the process of rolling out now: it's partially rolled out to some users and will soon be more widely available.

Hans Wennborg

May 20, 2013, 1:39:18 PM
to Jan Keromnes, blin...@chromium.org, pe...@chromium.org, mig...@chromium.org, Primiano Tucci, Glen Shires
On Thu, May 16, 2013 at 3:50 PM, Jan Keromnes <ja...@chromium.org> wrote:
> TL;DR Web Speech Recognition has a vendor prefix on desktop. We are asking
> for feedback on whether to use the same vendor prefix on Android, or to
> remove the prefix across desktop and Android to move recognition behind a
> flag.

Just to clarify: you're not suggesting moving it behind a flag on
desktop, right? That would essentially mean unlaunching the feature.

I think it should be developed behind a flag on mobile irrespective of
it having a prefix or not; that's how we usually land new code in the
tree.

> The two options we are considering are:
> 1) Keep the prefix on desktop and use the same prefix for the Android
> version.
> 2) Remove the prefix and hide both desktop and Android versions behind the
> same flag.

I think having a prefixed version on desktop and non-prefixed on
mobile would be confusing. I think Chrome should be exposing the
platform in the same way everywhere.

I'd vote for 1), and then pushing for dropping the prefix altogether
as soon as possible.

Just my $0.02 :)

- Hans

scott...@gmail.com

Jun 12, 2013, 3:38:05 PM
to blin...@chromium.org, pe...@chromium.org, mig...@chromium.org, prim...@chromium.org
Forgive me if this is off-topic. Please redirect it to a better place if so.

I'd like to write a mobile application to run on Android which did voice recognition to text and/or speech to text. 

I have worked some with the built-in Speech stuff in .NET on Windows (System.Speech.Recognition).  I'm thinking as a developer that the mobile phone is a platform, and running apps within Chrome, within a mobile phone, is somewhat, what is the word, restrictive/too many layers.

It seems much more natural to make these speech libraries be Android native libraries.  Otherwise if someone hasn't installed Chrome on their phone and is using say, Firefox, I won't be able to write a cool Android app making use of these libraries. 

If they have installed Chrome on their mobile, I could make an app run within Chrome.  But it won't run in say, Firefox currently (later I guess). 

Isn't this two steps back?  Doesn't it make sense to just make these libraries built-in Android APIs that are callable from apps, with or without Chrome being installed?

From a marketing perspective the excellence of the Google voice recognition would give a substantial win here vis a vis iPhone.

Just my two cents, any plan for making the libraries more generically callable for Android and not kept in the Chrome sandbox?  Or perhaps I misunderstand the architecture (please correct me).

thanks - Scott

Peter Beverloo

Jun 12, 2013, 6:18:34 PM
to scott...@gmail.com, blink-dev, mig...@chromium.org, prim...@chromium.org, ja...@chromium.org
The Android SDK contains a number of APIs related to both speech recognition and synthesis.  We have also used these APIs for developing the web-facing implementation, and you could use them in any Java-based Android application you'd like to develop.


Your question indeed is a bit off-topic for this mailing list.  I'd encourage you to search the internet for "speech recognition android" or similar terms, which should yield a number of useful resources.

Peter

Dominic Mazzoni

Jun 12, 2013, 6:42:28 PM
to scott...@gmail.com, blink-dev, Peter Beverloo, mig...@chromium.org, prim...@chromium.org
On Wed, Jun 12, 2013 at 12:38 PM, <scott...@gmail.com> wrote:
> I'd like to write a mobile application to run on Android which did voice recognition to text and/or speech to text.

To clarify Peter's answer, these Android APIs already exist and have existed for a long time. Chrome is simply providing access to those capabilities for web apps too. If you want to write native Android apps, ask on an Android mailing list.
 
- Dominic

scott...@gmail.com

Jun 12, 2013, 6:52:45 PM
to blin...@chromium.org, scott...@gmail.com, Peter Beverloo, mig...@chromium.org, prim...@chromium.org
Great thanks for great info.  I apologize for asking in the wrong place.  I did a google of this and the results for the new Web API stuff were somewhat overwhelming, hehe.  So it brought me here.  Thanks for being patient with my ignorance.

One last question, though, which may relate somewhat to the Chrome web APIs:

I'm just curious whether these (old, already existing) Android speech APIs hit the Google servers directly for the voice-to-text translation.  I believe that is how the new Chrome APIs do it, from what I read.  I am thinking that this is probably the best source of translation.  Do you know if the Android speech APIs use the same mechanism?  Do they hit the Google servers too, or just use some local logic?  I would think the quality of the translation would not be as good using local logic, as speech-to-text translation is a hard problem.

thanks again and sorry for dumb questions hehe,

Scott

Dominic Mazzoni

Jun 12, 2013, 7:00:53 PM
to scott...@gmail.com, blink-dev, Peter Beverloo, mig...@chromium.org, prim...@chromium.org
On Wed, Jun 12, 2013 at 3:52 PM, <scott...@gmail.com> wrote:
> I'm just curious whether these (old, already existing) Android speech APIs hit the Google servers directly for the voice-to-text translation.  I believe that is how the new Chrome APIs do it, from what I read.  I am thinking that this is probably the best source of translation.  Do you know if the Android speech APIs use the same mechanism?  Do they hit the Google servers too, or just use some local logic?  I would think the quality of the translation would not be as good using local logic, as speech-to-text translation is a hard problem.

The APIs do not specify which is used - it may be local, it may be remote. The client application shouldn't know or care, and the backend implementation may change at any time. The web APIs will hopefully be implemented by other browsers, and what they do may be different.

Currently if you have the latest Android, I believe it supports both server-based speech recognition (better quality) and local (works offline). Currently if you have the latest version of Chrome on other platforms, it only uses server-based speech recognition - but again, that may change at any time.

- Dominic

b.ke...@samsung.com

Mar 3, 2014, 4:54:46 PM
to blin...@chromium.org, scott...@gmail.com, Peter Beverloo, mig...@chromium.org, prim...@chromium.org
Digging through the code, I was not able to find a local implementation. There is a GoogleOneShotRemoteEngine and a GoogleStreamingRemoteEngine, both using web services. Could you point me to the code that uses local APIs?

Thanks,
Balazs
 

binary...@gmail.com

Mar 3, 2014, 5:33:54 PM
to blin...@chromium.org, scott...@gmail.com, Peter Beverloo, mig...@chromium.org, prim...@chromium.org, b.ke...@samsung.com
Never mind, I realized that I should look at the Java files, i.e. SpeechRecognition.java.

jasonv...@gmail.com

Sep 17, 2015, 10:44:53 AM
to blink-dev, pe...@chromium.org, mig...@chromium.org, prim...@chromium.org
Hello,

I am developing an HTML5 based Android application that uses the webkitSpeechRecognition (SpeechRecognition). The Android application will use Crosswalk to embed the Chromium engine to run the app. My question is whether I can use the webkitSpeechRecognition offline without the user having to be connected to the web?  Is there a way to include (embed) this technology directly into the application somehow? ...and, if so, possibly add additional language support.

Thanks,
Mr. Villmer

Primiano Tucci

Sep 17, 2015, 12:40:50 PM
to jasonv...@gmail.com, blink-dev, Peter Beverloo, mig...@chromium.org
> My question is whether I can use the webkitSpeechRecognition offline without the user having to be connected to the web?  
Look at content/public/android/java/src/org/chromium/content/browser/SpeechRecognition.java: webkitSpeechRecognition on Android is backed by Android's android.speech API.
To be completely accurate, it is down to the backend (on the Android side) which implements android.speech whether offline recognition is supported or not. IIRC it is, in the general case where android.speech is backed by Google Now, but there is no strong guarantee that this is always the case.

So the logical model is
(Chrome webkitSpeechRecognition) --> (Android android.speech API) --> (Provider implementing android.speech, e.g. Google Now)
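
Since offline support depends on the provider, a page probably should not assume it. A minimal sketch of reacting to recognition errors follows; the helper name `describeRecognitionError` and the message wording are hypothetical, while the error codes themselves (`network`, `not-allowed`, `service-not-allowed`, `no-speech`) come from the Web Speech API specification.

```javascript
// Hypothetical helper: maps the `error` attribute of a recognition error
// event to a user-facing message. The message text is illustrative only.
function describeRecognitionError(errorCode) {
  switch (errorCode) {
    case "network":
      // Network communication required to complete the recognition failed,
      // e.g. the provider has no offline support and the device is offline.
      return "Recognition needs a network connection that is not available.";
    case "not-allowed":
    case "service-not-allowed":
      return "Speech recognition is not permitted.";
    case "no-speech":
      return "No speech was detected.";
    default:
      return "Speech recognition failed: " + errorCode;
  }
}
```

A page could hook this up with something like `recognition.onerror = function (event) { alert(describeRecognitionError(event.error)); };`.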

r.rabi...@gmail.com

Mar 6, 2016, 4:43:14 AM
to blink-dev, pe...@chromium.org, mig...@chromium.org, prim...@chromium.org
Hi, 
I have implemented the speech recognition on Chrome. Works great on PC. 
But, on Android, it shows all the intermediate results, as if they are final results. 
Are you familiar with that? 
Thanks.

PhistucK

Mar 6, 2016, 5:58:34 AM
to r.rabi...@gmail.com, blink-dev, Peter Beverloo, Miguel Garcia, Primiano Tucci
No need to hijack old intent threads with bug reports.
You can search crbug.com for an existing issue and star it. If you cannot find one, file a new issue using the "New issue" link on the same page.
Please, do not add a "+1" or "Me too" or "Confirmed" (or similar) comment. It just wastes the time of Chrome engineers and sends unnecessary e-mails to all of the people who starred the issue.

You can reply with a link to the found or created issue, and it might get triaged (and fixed) faster.

Thank you.



PhistucK
