Children Speech Recognition


ozi samur

Mar 27, 2016, 6:01:10 PM
to kaldi-help
Hi everyone,

First of all, I would like to thank you for developing this open-source speech processing library.

I would like to build a real-time online speech recognition system. I have a background in C++, Node.js, Linux, WebSocket, etc., and a small background in speech processing. I just completed the "Kaldi for dummies" tutorial and got the expected 30% WER.

Anyway, I want to ask a simple question. Is it possible to develop a real-time web application that does speech recognition with 15% WER for a 1K vocabulary, with a maximum latency of 1 second, for children's voices? I know it depends on lots of things, but if you have enough training data (50 hours of speech, or how much should we have?), could you achieve that? And if you have a basic understanding of speech recognition and you are a hard-working developer, could you complete that type of requirement in 6 months?

For example, what are the next steps to improve the 30% WER in the "Kaldi for dummies" tutorial? Is it because of a lack of training data or an overly simple model? Which steps should we follow next to improve it? I know we should know speech processing in detail, but in summary, which steps would you follow?

Thanks for helping.



Daniel Povey

Mar 27, 2016, 6:04:43 PM
to kaldi-help
 

Anyway, I want to ask a simple question. Is it possible to develop a real-time web application that does speech recognition with 15% WER for a 1K vocabulary, with a maximum latency of 1 second, for children's voices? I know it depends on lots of things, but if you have enough training data (50 hours of speech, or how much should we have?), could you achieve that? And if you have a basic understanding of speech recognition and you are a hard-working developer, could you complete that type of requirement in 6 months?

That should be possible, although it would be a hassle if it were an application where detecting speech was also part of the problem. Push-to-talk and similar setups are easier, because you know there is speech and that it's directed at the device.
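To give a sense of what "detecting speech" involves when there is no push-to-talk button, here is a deliberately naive energy-based detector. It is only an illustrative sketch, not how Kaldi or any production system does it: the function names, the 400-sample frame (25 ms at an assumed 16 kHz sample rate), and the -35 dB threshold are all assumptions chosen for the example. Real children's speech in noisy rooms usually needs something considerably more robust, which is part of why this adds hassle.

```python
import math

def frame_energies_db(samples, frame_len=400):
    """Return log energy (dB) for each non-overlapping frame.

    frame_len=400 corresponds to 25 ms at an assumed 16 kHz sample rate.
    Samples are expected to be floats roughly in [-1.0, 1.0].
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        mean_sq = sum(x * x for x in frame) / frame_len
        # Small constant avoids log10(0) on digital silence.
        energies.append(10.0 * math.log10(mean_sq + 1e-12))
    return energies

def detect_speech(samples, frame_len=400, threshold_db=-35.0):
    """Mark each frame True if its energy exceeds a fixed threshold.

    The threshold is a hypothetical value; in practice it would have to be
    tuned (or adapted) to the microphone and background noise.
    """
    return [e > threshold_db for e in frame_energies_db(samples, frame_len)]

if __name__ == "__main__":
    silence = [0.0] * 800
    tone = [0.5 * math.sin(2 * math.pi * 200 * n / 16000) for n in range(800)]
    print(detect_speech(silence + tone))
```

With push-to-talk, none of this is needed: the button press tells the recognizer exactly when to start and stop listening.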

For example, what are the next steps to improve the 30% WER in the "Kaldi for dummies" tutorial? Is it because of a lack of training data or an overly simple model? Which steps should we follow next to improve it? I know we should know speech processing in detail, but in summary, which steps would you follow?

The amount of data there is tiny. You normally expect tens or hundreds of hours of data to train a reasonable system. And to recognize children's speech, you need data from children.
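For readers new to the metric quoted in this thread: WER (word error rate) is the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. The sketch below is a minimal illustration of that definition using a standard edit-distance computation. It is not Kaldi's scoring code, and the function name is made up for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (r != h)  # cost 0 if words match
            deletion = prev[j] + 1                 # reference word missing
            insertion = curr[j - 1] + 1            # extra hypothesis word
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)

if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

On this definition, going from 30% to 15% WER means halving the number of word errors per reference word, which is why the amount and type of training data matters so much.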

Dan


 
