Hi everyone,
First of all, I would like to thank you for developing this open-source speech processing library.
I would like to build a real-time online speech recognition system. I have a background in C++, Node.js, Linux, WebSocket, etc., but only a little background in speech processing. I just completed the "Kaldi for dummies" tutorial and got the expected 30% WER.
Anyway, I want to ask a simple question. Is it possible to develop a real-time web application that does speech recognition on children's voices with 15% WER, a 1K-word vocabulary, and at most 1 second of latency? I know it depends on a lot of things, but with enough training data (50 hours of speech, or how much would be needed?), could that be achieved? And could someone with a basic understanding of speech recognition who is a strong developer meet that kind of requirement in 6 months?
For example, what are the next steps to improve on the 30% WER in the "Kaldi for dummies" tutorial? Is it caused by a lack of training data or by the simple model? I know we should learn speech processing in detail, but in summary, which steps would you follow next?
Thanks for helping.