Web Speech API limit of 60 seconds?


Brandon Nicholls

Jan 16, 2013, 11:06:45 PM
to chromiu...@chromium.org
Using the Web Speech API, I am only able to process speech for 60 seconds at a time.  After the 60 seconds has passed the onspeechend event is fired.  Is there a way to increase that period of time?

Irving Zamora

Feb 13, 2013, 1:51:56 PM
to chromiu...@chromium.org
I'm having the same problem. I've been looking for a solution but still haven't found one.
If you find a way to increase the time limit, please let me know.

Brandon Nicholls

Feb 19, 2013, 12:12:19 PM
to chromiu...@chromium.org
I ended up redesigning my app to be based around the 60 second limit instead of trying to get more time.  


I imagine that as other browsers implement the API the time limit will probably go away.

Patrick Brady

Feb 27, 2013, 2:14:37 PM
to chromiu...@chromium.org
Has anyone figured this out? This is driving me absolutely nuts!

Patrick Brady

Feb 27, 2013, 2:17:20 PM
to chromiu...@chromium.org
It even says on the API demo page that you can "speak for as long as you like" http://www.google.com/intl/en/chrome/demos/speech.html. It would be nice if they would mention the 60 second limit somewhere/how to change it.

gsh...@chromium.org

Feb 28, 2013, 9:17:20 AM
to chromiu...@chromium.org
Yes, there is a 60 second timeout in this first release, and there is currently no way to increase that period of time. We may consider lengthening this in the future. Let us know what timeout period you think would be reasonable and why.

Thanks,
Glen Shires

PhistucK

Feb 28, 2013, 9:30:32 AM
to gsh...@chromium.org, chromiu...@chromium.org
For dictation applications, there is a need for continuous speaking. I am not familiar with the API yet, but does this limitation prevent this use case (without interrupting the speaker every minute)?

PhistucK


--
You received this message because you are subscribed to the Google Groups "Chromium HTML5" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chromium-html...@chromium.org.
To post to this group, send email to chromiu...@chromium.org.

Patrick Brady

Feb 28, 2013, 9:54:03 AM
to chromiu...@chromium.org
Thanks very much for the quick reply. I'm glad to hear that I'm not crazy in terms of not being able to extend the limit. haha. My main use for this type of technology would be to write blog posts using my voice since I tend to get quite a bit of pain in my hands as I type. I'm guessing it would probably take somewhere between 4 and 8 minutes to speak a 500-1000 word blog post. I realize speech to text takes a lot of processing power, so that might be wishful thinking.

The demo works quite well! Keep up the good work!

Ashley Smith

Mar 5, 2013, 3:49:24 AM
to chromiu...@chromium.org
Please, please implement the nospeechtimeout attribute ASAP!  The 60 seconds is by no means a reasonable setting for any number of applications. It doesn't matter what you decide is the default if we can change it!

gsh...@chromium.org

Mar 5, 2013, 8:17:11 PM
to chromiu...@chromium.org
Whether or not there is a "nospeechtimeout" attribute, there needs to be an upper limit on the maximum duration.  Let us know what max timeout period you think would be reasonable and what usage scenarios need long durations.

PhistucK

Mar 6, 2013, 1:34:52 AM
to gsh...@chromium.org, chromiu...@chromium.org
Typing a blob post using the API, for example. As long as you can immediately re-trigger the speech recording/processing without losing any information in between, that is fine.

PhistucK



Robert Jones

Mar 8, 2013, 4:36:56 PM
to chromiu...@chromium.org
On Tuesday, March 5, 2013 5:17:11 PM UTC-8, gsh...@chromium.org wrote:
Whether or not there is a "nospeechtimeout" attribute, there needs to be an upper limit on the maximum duration.  Let us know what max timeout period you think would be reasonable and what usage scenarios need long durations.


Think of a hands-free e-reader where you change pages with spoken 'forward' and 'back' - the interval between commands could be very long - many minutes

Perhaps you could use the Web Audio API to detect some relatively high level of sound that initiates a new SpeechRecognition session, which itself is short lived.

Perhaps even grab the soundbite in Web Audio and pass it on to Speech Recognition - although the Speech recognition API doesn't seem to have any linkage to Web Audio

The Google Glass folks must have dealt with this - from the demo videos it looks like you say 'OK Glass'  to get its attention and from then on it can handle speech recognition



PhistucK

Mar 8, 2013, 4:37:58 PM
to gsh...@chromium.org, chromiu...@chromium.org
Oops, *blog

PhistucK

Ashley Smith

Mar 29, 2013, 7:38:59 AM
to chromiu...@chromium.org
Ditto! Provided we can change it, it doesn't matter what you set it to!

Austin France

Mar 29, 2013, 9:10:14 AM
to chromiu...@chromium.org
nospeechtimeout would be a horrible property name; instead, just have a timeout property:
timeout = <ms>
with 0 meaning no timeout (or the maximum timeout allowed by the implementation).

Regards
Austin





Joe Wagner

Apr 13, 2013, 12:06:49 PM
to chromiu...@chromium.org
Glen, it's probably operator error on my part, but did you all increase the time limit from 60 seconds to some higher number?  It appears not to time out for several minutes, perhaps five.  The API is lots of fun. Thanks.  -Joe Wagner

Thomas Alisi

May 9, 2013, 6:46:32 AM
to chromiu...@chromium.org
hey,

I can confirm something has changed: I had a timer running for 5 min before the onend event was fired

Eduard Luca

Aug 25, 2013, 6:09:39 PM
to chromiu...@chromium.org
I'm using Chrome, but if you don't speak for like 7 seconds, the recording is automatically canceled and the onend event is fired.

What I did to fix this was:

// Restart recognition as soon as the current session ends.
recognizer.onend = function () {
    recognizer.start();
};

Works like a charm :)

Bruce

Sep 10, 2013, 3:42:15 PM
to chromiu...@chromium.org
Is there any simple way to decode a pre-recorded audio file?

I use the following bash script to do this, but it looks like it only works on audio files no longer than 8 seconds.

Any suggestions will be highly appreciated.

Thanks.

Bruce.

============


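#!/bin/sh
# Usage: <script> <input_audio> <output_txt> <language>
input_audio=$1
output_txt=$2
LANGUAGE=$3
# Convert the input to 16 kHz FLAC before posting it.
sox "$input_audio" -r 16000 -t flac "${output_txt}.flac"
# Post the FLAC file and save the raw response.
wget -q -U "Mozilla/5.0" --no-proxy --post-file "${output_txt}.flac" --header="Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=${LANGUAGE}&client=chromium" > "${output_txt}.tmp"
# Keep only the utterance text from the response.
sed 's/.*utterance":"//' "${output_txt}.tmp" | sed 's/","confidence.*//' > "${output_txt}"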

ewokm...@gmail.com

Sep 17, 2013, 11:32:43 PM
to chromiu...@chromium.org, brandon....@gmail.com
My use case is that I'm writing an extension to control the entire browser vocally! Which of course necessitates that the browser be ready to take a command at any time, even after periods of silence (like while a user reads something), a.k.a. no time limit at all. I tried to circumvent the time limit by starting a new request automatically when the present one ends or errors with 'no-speech,' and promptly got request-limited-- my speech-recognition sessions are now ending instantaneously.

It seems to me that there is no solution to this problem except for each user to have their own local speech-recognition engine built into the browser client. Otherwise, the vast majority of truly great applications for this technology will be impossible.
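
One way to soften the rate limiting described above might be to wait longer between restarts whenever a session ends almost immediately. The sketch below is only an illustration under assumptions: `keepAlive`, its option names, and the delay values are made up for this example, and nothing in this thread confirms that backing off avoids the limit. It expects an object shaped like a `SpeechRecognition` instance, with `start()`, `stop()`, and an assignable `onend`.

```javascript
// Sketch: restart recognition after each 'end' event, but back off when
// sessions end almost immediately (a possible sign of throttling).
// All names and default values here are illustrative assumptions.
function keepAlive(recognizer, {
  minDelayMs = 250,        // delay before a normal restart
  maxDelayMs = 30000,      // upper bound for the backoff
  shortSessionMs = 1000,   // sessions shorter than this count as "instant"
  now = () => Date.now(),
  schedule = (fn, ms) => setTimeout(fn, ms),
} = {}) {
  let delayMs = minDelayMs;
  let startedAt = 0;
  let stopped = false;

  function begin() {
    if (stopped) return;
    startedAt = now();
    recognizer.start();
  }

  recognizer.onend = () => {
    if (stopped) return;
    const sessionMs = now() - startedAt;
    // Double the delay after a very short session; otherwise reset it.
    delayMs = sessionMs < shortSessionMs
      ? Math.min(delayMs * 2, maxDelayMs)
      : minDelayMs;
    schedule(begin, delayMs);
  };

  begin();
  return {
    stop() { stopped = true; recognizer.stop(); },
    currentDelayMs() { return delayMs; },
  };
}
```

In Chrome this could be tried as `keepAlive(new webkitSpeechRecognition())`; calling the returned `stop()` ends the restart loop.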



Robert Jones

Sep 18, 2013, 11:29:05 AM
to ewokm...@gmail.com, chromiu...@chromium.org, brandon....@gmail.com
Have you seen this project by Tal Ater?



This is a way for the browser to respond to predefined spoken commands, and it seems to take the same approach of 'process one command, get the result back, start up another recognition session'. I've not had time to play with it much, but it seems to work OK: there is minimal feedback to the user, but it doesn't seem to time out in my tests.

The issue for Google is that a continuous recognition session is causing continuous load on their server and that isn't going to scale well at all.

I've thought about, but have not implemented, having the browser monitor audio input continuously and fire up a speech recognition session whenever the user utters a specific sound, say a whistle or 'ok'. I can see a way to build a simple audio recognizer in JavaScript that the user would train.

I think it will have to be an approach along those lines. Or at least use some trigger that does not involve a mouse click - a gesture in front of the camera? something like Leap Motion ?

On Google's end, I could see that they could offer continuous recognition for a fee - and so the API would need to handle a user's API key. I would be OK with that.

It is not an easy problem... that's why it's fun...

--Rob








ewokm...@gmail.com

Oct 14, 2013, 5:13:54 PM
to chromiu...@chromium.org
Is there a reason the recognition can't be done client-side? Would it massively increase browser footprint? Because that would resolve the bandwidth problem, am I right?

ewokm...@gmail.com

Oct 14, 2013, 5:16:03 PM
to chromiu...@chromium.org, ewokm...@gmail.com, brandon....@gmail.com
I spoke with Tal, actually, and we couldn't determine why my code times out and his doesn't. Our techniques are identical. He said he ran into my problem himself while writing Annyang, but it wasn't there for me when I tried using his library. Ultimately it was just confusing.

Robert Jones

Jan 24, 2014, 5:30:02 PM
to chromiu...@chromium.org, brandon....@gmail.com, gsh...@chromium.org


On Wednesday, January 16, 2013 8:06:45 PM UTC-8, Brandon Nicholls wrote:
Using the Web Speech API, I am only able to process speech for 60 seconds at a time.  After the 60 seconds has passed the onspeechend event is fired.  Is there a way to increase that period of time?

I've come up with a way to continuously monitor the audio input in the browser and then to trigger an event, such as a Speech Recognition, when the user creates a specific audio 'signature' - which in this case consists of two sharp 'taps' with the correct spacing, such as a finger or pen tapping on a table

#1 shows a div when the correct audio trigger is detected

#2 starts a Speech Recognition session when the audio trigger is detected

Try it out - #1 works in Firefox and Chrome, #2 only works in Chrome

Click Start and try tapping your desk - a light blue spike in the display shows you are tapping the 'right way' - two fast taps will trigger the action (between 200 and 500 ms spacing)
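
The timing rule described above (two taps spaced between 200 and 500 ms apart) can be sketched separately from the audio analysis. This is a minimal illustration, not the code from the demo: `createDoubleTapDetector` is a hypothetical name, and it assumes some other code has already detected each tap in the audio and supplies its time in milliseconds.

```javascript
// Sketch: decide whether a newly detected tap completes a "double tap".
// Tap detection itself (finding the spikes in the audio) is not shown.
function createDoubleTapDetector(minGapMs = 200, maxGapMs = 500) {
  let lastTapMs = null;
  return function onTap(timeMs) {
    const gap = lastTapMs === null ? null : timeMs - lastTapMs;
    if (gap !== null && gap >= minGapMs && gap <= maxGapMs) {
      lastTapMs = null; // consume both taps so a third tap starts fresh
      return true;
    }
    lastTapMs = timeMs; // too early, too late, or first tap: remember it
    return false;
  };
}
```

Each call to the returned function reports whether the trigger fired, so the caller could start a recognition session whenever it returns `true`.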



This, or something like it, might be a way to combine continuous audio monitoring with Speech Recognition

The code is pretty rough right now but I'll work it up into a small JS library. It might also have application as an assistive technology.

Thoughts, comments are welcome

--Rob Jones   jo...@craic.com


C

Apr 14, 2014, 11:55:47 AM
to chromiu...@chromium.org
With this solution, if you are not using https://, doesn't the browser ask for mic permission every time you call recognizer.start()?
This is the problem I am having. If you solved this, please advise.

Szymon Nowak

Aug 15, 2014, 3:46:22 PM
to chromiu...@chromium.org
I'm trying to use speech recognition in a video conference app, and the main issue for me is that it turns itself off after a few seconds without any input, which is quite a common case in a conversation. I've tried running recognition.start() in the 'onend' handler, but this crashes my browser every time I try to close a tab with my app (happens in Chrome Beta 37 and Canary 38).

Anyway, for those having issues with their app asking about mic permission every time recognition.start() is called - there's a free, open source and very simple tool that allows you to access your local server through HTTPS (and HTTP) - https://ngrok.com. It's also available via homebrew for those on OS X.

David Altmayer

Sep 11, 2014, 9:49:57 AM
to chromiu...@chromium.org
I'm also having a similar issue. I put recognizer.start() in onend(), but it seems like onend() is not actually firing consistently.

Karl Roberts

Dec 5, 2015, 9:19:29 AM
to Chromium HTML5, brandon....@gmail.com
The nospeechtimeout should apply to both standard and dictation modes. Currently it's a mess. For example, without continuous mode you have to speak at a reasonable speed to stop it cutting out in the browser. In IVR/speech applications this timeout is generally set to 5 s, though 2-3 seconds is usually a good baseline; it depends on the scenario.

1) Non-continuous mode needs a timeout property which cuts out after x seconds or milliseconds.

2) The continuous setting for dictation should also have a timeout, and a fixed 10 seconds is not really useful. So again this needs a timeout property. There should also be the ability to get the last "dictated" statement from the previous dictation start/end. The end might have been instigated by a timeout too.
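
Until something like the timeout property requested above exists, an application could approximate it on its own side. The sketch below is an assumption-laden illustration: `attachSilenceTimeout` is a hypothetical helper, not part of the Web Speech API, and it can only end a session earlier than the browser would; it cannot extend the browser's own limit. It relies on the recognizer being an event target with `start`, `result`, and `end` events and a `stop()` method.

```javascript
// Sketch: stop recognition if no result arrives within timeoutMs.
// The timers parameter exists only so the timer can be replaced in tests.
function attachSilenceTimeout(recognizer, timeoutMs, timers = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle),
}) {
  let handle = null;

  const disarm = () => {
    if (handle !== null) {
      timers.clear(handle);
      handle = null;
    }
  };

  const arm = () => {
    disarm();
    handle = timers.set(() => {
      handle = null;
      recognizer.stop(); // nothing was recognized in time: end the session
    }, timeoutMs);
  };

  recognizer.addEventListener('start', arm);  // begin counting when listening starts
  recognizer.addEventListener('result', arm); // any result counts as activity
  recognizer.addEventListener('end', disarm); // nothing left to time after the end
}
```

For example, `attachSilenceTimeout(recognition, 3000)` would aim for the 2-3 second baseline mentioned above.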

All this makes sense to me, having designed a lot of speech applications, and it will make Google speech better.

Google, lift your game: Microsoft is about to mount an assault on you, and they might well do it better.

Maxime Rinna

Feb 26, 2016, 5:34:13 AM
to Chromium HTML5

Hello,

I am deaf and have been using the API for a few months, capturing the audio output from my phone to transcribe what the person I am talking to says.

Being deaf, I make few calls, but recently the 60 s timer has made the speech API impossible to use for this. Why this change?

Paid or not, your API should work without stopping and without constraints, 24 hours a day, 7 days a week, 365 days a year.

In short, how can I use this API in batch mode on a VM without having to click the start button?

Applications must be simple: I go to the URL, I speak, and I see the text, with no button, and it has to work all the time without interruption.

This would be a great service.

Thank you in advance for your reply.

Szymon Nowak

Feb 26, 2016, 6:00:14 AM
to Chromium HTML5
Hi Maxime,

I've been working on a universal translator some time ago (https://webrtc-translate.herokuapp.com) where I had the same issue and had to add a button to turn speech recognition on, which is really annoying. However, there's a new speech recognition service from Microsoft (here's a demo https://www.projectoxford.ai/demo/speech#recognition). It works via websockets (you can check demo code for details) and allows you to recognize speech from any audio source (mic, remote audio stream, audio file etc.). I haven't yet had time to use it, so I don't know if they have any constraints similar to Chrome implementation of Web Speech API, but it might be worth trying out.

Cheers,
Szymon

Maxime Rinna

Feb 27, 2016, 2:24:36 AM
to Chromium HTML5
Hello Szymon,

Thank you for the quick reply.

Your project is very useful; it looks like this app: www.hellopal.com

I have registered and will test the www.projectoxford.ai speech API. I will report back once it works on my website.

As for the Chrome API, I have a solution that lets me stay in continuous mode, but it is very heavy and not really optimal.

It comes down to reloading the page on every error or final result. It is very ugly, but I have no other solution:

   recognition.onend = function() {  console.log("CAS onend");   window.location.reload(); };
   recognition.onerror = function() {  console.log("CAS onerror");   window.location.reload(); };
   recognition.onaudioend = function() {  console.log("CAS onaudioend");  window.location.reload(); };
   recognition.onnomatch = function() {  console.log("CAS onnomatch");   window.location.reload(); };
   recognition.onsoundend = function() {  console.log("CAS onsoundend");  window.location.reload();   };
   recognition.onspeechend = function() {  console.log("CAS onspeechend"); window.location.reload(); };
        

In the result loop:

        if (event.results[i].isFinal) {
          final_transcript = event.results[i][0].transcript;

          // Write this line to MySQL for this HTTP session here.

          console.log("CASE ok and reset api ... Line:" + final_transcript);
          window.location.reload();
        } else {
          interim_transcript += event.results[i][0].transcript;
        }


I'll add MySQL writing and reading back for persistence after the reload.

But why is it so complicated?
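
A lighter alternative to reloading the page might be to keep the transcript in a variable that outlives each recognition session and to restart with `recognition.start()` in `onend`, as other posts in this thread do. The sketch below shows only the accumulation step and is an assumption, not code from this post: `applyResultEvent` is a hypothetical helper that takes the running state and an event shaped like a `SpeechRecognitionEvent` (`resultIndex`, plus `results` whose items have `isFinal` and `[0].transcript`). Whether restarting without a reload is reliable enough here is not established in this thread.

```javascript
// Sketch: fold one 'result' event into a running transcript.
// Final text is appended permanently; interim text is rebuilt on each event.
function applyResultEvent(state, event) {
  let finalText = state.finalText;
  let interimText = '';
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    if (result.isFinal) {
      finalText += result[0].transcript;
    } else {
      interimText += result[0].transcript;
    }
  }
  return { finalText, interimText };
}
```

A handler could then be written as `recognition.onresult = (e) => { state = applyResultEvent(state, e); };`, with `state` declared once outside the handlers so that it survives restarts.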

José Manuel

Apr 13, 2016, 2:59:38 AM
to Chromium HTML5, brandon....@gmail.com
I know this thread is quite old, but I used this API and had the same problem.
What I did to solve it was:
In the onend function I call the function that initializes the recognizer again, and it worked! But the problem then was how to stop it, hehe, so I created a boolean variable to check whether the user has pressed a button to stop the recognizer, and in that case I stop the API entirely.
I don't know if I explained that well!

Ah, and another thing: sorry for my English, hehe! I'm trying to improve it.
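
The approach described above (restart in onend unless the user asked to stop) could look roughly like the following. This is a minimal sketch rather than the poster's actual code: `createRestartingRecognizer` and `userStopped` are names invented for this example, and the recognizer is assumed to have `start()`, `stop()`, and an assignable `onend`, as a `SpeechRecognition` instance does.

```javascript
// Sketch: restart on every 'end' unless the user has asked to stop.
function createRestartingRecognizer(recognizer) {
  let userStopped = false; // set when the stop button is pressed

  recognizer.onend = () => {
    if (!userStopped) {
      recognizer.start(); // session ended on its own: start a new one
    }
  };

  return {
    start() {
      userStopped = false;
      recognizer.start();
    },
    stop() {
      userStopped = true; // set the flag first so onend does not restart
      recognizer.stop();
    },
  };
}
```

A stop button would then call the returned `stop()` instead of calling `recognizer.stop()` directly.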


noeru244

Jun 26, 2016, 7:32:08 AM
to Chromium HTML5
Can anyone advise a novice how to apply this html fix?

Thanks.

Prateek Pant

Nov 18, 2016, 6:32:07 AM
to Chromium HTML5

@noeru244  

You can check the sample implementation in the file player.html on my GitHub page https://github.com/pantprateek/genieYT. The logic is as follows:

Create a timer which stops recognition every 10 seconds.

setInterval(resetVoiceRecog, 10000);

function resetVoiceRecog() {
    // Only stop when no speech is in progress (the inspeech flag is maintained elsewhere).
    if (inspeech == false) {
        recognition.stop();
    }
}

When recognition.stop() is called, it invokes onend, which then starts recognition again.

recognition.onend = function(event) {
    recognition.start();
};

This method works for me for hours even if I don't speak a word.

Joanne Lombardi

May 10, 2017, 4:37:25 PM
to Chromium HTML5, brandon....@gmail.com
Have there been any updates to extend the 60 second limit?  I would love to see it be able to translate for at least an hour.

Mark Rejhon

Jul 20, 2017, 9:29:31 AM
to Chromium HTML5, brandon....@gmail.com, jl04...@gmail.com
I am a deaf person who programs.  I can't use the phone without relay assistance.  Speech Recognizer API works with 98% accuracy on my spouse's voice for phone calls.  I would pay $10/hour.  Still much cheaper than a human transcriber, and would cut out a middleman in some of my phone calls.

Google uses Speech Recognizer for YouTube autocaptions (and it's finally starting to work surprisingly well for some good commentator voices).   So why not telephone calls?   I know it won't work on 100% of phone calls, but it works nearly perfectly with my spouse's voice -- the most important person.   I can't call 911 quick on my mobile phone where I live in Canada (Text-only 911 not deployed for mobiles), but I can call my spouse in emergencies.  I've had people email me from UK begging me for ways to make telephone calls, as their countries don't have Sprint WebCapTel or similar services, etc.

I'm surprised that I'm still not even able to pay Google (yet) for continuous cloud speech API on my mobile phone.  

Nudge, nudge -- Google!   
(cc: Google disability department too)

Mark Rejhon

Jul 20, 2017, 9:35:29 AM
to Chromium HTML5, brandon....@gmail.com, jl04...@gmail.com
Oh, and I think there's a market for apps where in-app purchases lets users buy continuous transcription cloud time, to make longer phone calls.  Disability departments at many companies often reimburse $100+/hr for Remote CART transcribers/captioners, and some companies may want to save money or eliminate caps on use (unlimited hours reimbursed by company) by switching to automated transcription.   

It might not be worth much to Google at this time, but look at all the Good Karma points available if the Google Speech Recognizer API were made available for continuous speech.   It could revolutionize phone calls for the deaf in many places, such as Europe, that do not have captioned telephone services like the United States does.   Google wants a better reputation in Europe (antitrust) and maybe this is one of the many easy ways to compensate.   (Just brainstorming incentives for Google, nudge, nudge)

Cheers
Mark Rejhon

PhistucK

Jul 20, 2017, 9:57:59 AM
to Mark Rejhon, Chromium HTML5, brandon....@gmail.com, jl04...@gmail.com
I think it is supported, but the developer has to jump through some hoops for it to work (restart the recognition after X seconds). It might just be a matter of developing such an application.
I think you can combine WebRTC and speech recognition by starting both of them in parallel, I guess, but the other side would have to use Chrome rather than a regular phone.


PhistucK


Glen Shires

Jul 20, 2017, 10:56:37 AM
to PhistucK, Mark Rejhon, Chromium HTML5, brandon....@gmail.com, jl04...@gmail.com
The Google Cloud Speech API supports long transcriptions and many additional features. https://cloud.google.com/speech

Szymon Nowak

Jul 20, 2017, 11:55:19 AM
to Chromium HTML5, brandon....@gmail.com, jl04...@gmail.com
Twilio does support speech recognition, but according to their docs only for 60 seconds (https://www.twilio.com/speech-recognition - "<Gather> with speech has a maximum duration of 60 seconds."). I'm not sure if it's possible to restart it after this time. However, if Twilio JS library gives access to the remote audio stream (and if it doesn't I guess one could modify the library to expose it and/or create an issue for it), it should be possible to use a streaming speech recognition service like Google (https://cloud.google.com/speech/) or Microsoft (https://azure.microsoft.com/en-us/services/cognitive-services/speech/) and create a web or native app that accepts incoming calls and provides real-time transcription. Cool idea for a project ;)

Mark Rejhon

Jul 20, 2017, 3:40:11 PM
to Chromium HTML5, mark...@gmail.com, brandon....@gmail.com, jl04...@gmail.com
I have thought of that idea:
(1) Purchase paid Google API time, billed at $/minute for transcription.
(2) Run two parallel streams to the server (offset by 30 seconds relative to each other), so that restarts don't interrupt the transcription.
(3) Use intelligent programming to stitch the transcriptions together in real time with low latency.   There will be context confusions that cause misrecognitions or imperfect splices of the real-time text output, but some intelligent client-side programming glue will erase the majority of that.

This workaround would probably work, but it would cost twice as much ($/hour) and use twice as many server resources.   The overlaps could be reduced to about 5 to 10 seconds (not too little, not too big) to pull off continuous speech out of a 1-minute-limited cloud speech recognizer.    It works in crude experiments, but I'd rather have a cleaner solution that isn't as much at risk of flouting API policies in a mass-market application.
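
The stitching in step (3) could start from something as simple as the sketch below, which joins two transcripts by finding the longest run of words that ends the first and begins the second. It is an illustration only: `stitchTranscripts` is a hypothetical helper, and it assumes both streams recognized the overlapping words identically, which is exactly what the context confusions mentioned above can break. A real implementation would likely need fuzzy matching or timestamps.

```javascript
// Sketch: merge two overlapping transcripts on their longest shared
// word sequence (end of `left` matching the start of `right`).
function stitchTranscripts(left, right) {
  const a = left.trim().split(/\s+/).filter(Boolean);
  const b = right.trim().split(/\s+/).filter(Boolean);
  // Compare words without case or punctuation differences.
  const norm = (w) => w.toLowerCase().replace(/[^\p{L}\p{N}']/gu, '');

  for (let n = Math.min(a.length, b.length); n > 0; n--) {
    let matches = true;
    for (let i = 0; i < n; i++) {
      if (norm(a[a.length - n + i]) !== norm(b[i])) {
        matches = false;
        break;
      }
    }
    if (matches) {
      return a.concat(b.slice(n)).join(' '); // drop the duplicated words
    }
  }
  return a.concat(b).join(' '); // no overlap found: plain concatenation
}
```

With a 5 to 10 second overlap, the shared run would typically be a dozen or so words, so the exact-match search above stays cheap.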

Mark Rejhon


