Building a speech model for a limited vocabulary, like an Alexa skill

gule-gulzar

Jun 4, 2019, 4:49:23 AM6/4/19
to kaldi-help
Hi,

I am developing a speech model for a limited-vocabulary virtual assistant. Please help me with the following design questions.
  1. Data collection: What should the approach to data collection be? Should I also take noise into account, since the production system is going to run inside moving vehicles? And how much data would be sufficient?
  2. LM: What should the approach to the LM be? Should I restrict it to a limited set of words and sentences?
  3. Decoding time: How can I reduce prediction time so that the response feels like real time?


Also, if these kinds of questions have been answered before, please let me know, and if there are any papers or write-ups available, please paste them here.

Daniel Povey

Jun 4, 2019, 10:43:40 AM6/4/19
to kaldi-help
I don't think you should be trying to design such an ambitious project without having prior experience in
speech recognition.  These things aren't trivial at all, and there's no way you can spec it out properly
unless you have a sense of how well ASR systems do in different circumstances.
That will take time.

gule-gulzar

Jun 6, 2019, 6:10:58 AM6/6/19
to kaldi-help
Thanks Dan for your suggestions.

Actually, we have developed a few ASR systems, in Indian languages as well as English (thanks for giving us such a wonderful tool in Kaldi).
So far we have mostly built LVCSR-based systems.

But this command-based speech system needs different treatment.

One suggestion for acoustic modeling was not to restrict the training data to any particular number of hours, but to collect as much data as possible so as to cover almost every acoustic signature. Then apply a domain-specific LM and lexicon, so that only the required commands can be generated; prediction time will also be small because of the limited graph.
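
As a rough, non-authoritative sketch of the domain-specific LM side, the Python snippet below expands a toy command grammar into the list of sentences one could feed to an LM toolkit. Every template, slot name, and slot value here is a made-up placeholder for illustration, not something from our actual system:

import itertools
import string

# Toy command grammar: templates with slots, and the values each slot can take.
# All of these are hypothetical placeholders.
TEMPLATES = [
    "turn {state} the {device}",
    "set temperature to {number} degrees",
    "call {contact}",
]

SLOTS = {
    "state": ["on", "off"],
    "device": ["radio", "air conditioner", "headlights"],
    "number": [str(n) for n in range(16, 31)],
    "contact": ["home", "office"],
}

def expand(template):
    # Yield every sentence the template can generate.
    names = [field for _, field, _, _ in string.Formatter().parse(template) if field]
    for values in itertools.product(*(SLOTS[name] for name in names)):
        yield template.format(**dict(zip(names, values)))

if __name__ == "__main__":
    with open("lm_train_text.txt", "w") as out:
        for template in TEMPLATES:
            for sentence in expand(template):
                out.write(sentence + "\n")

The resulting text file (and the word list it implies) would then go to whatever n-gram LM toolkit and lexicon preparation you already use, which is what keeps the decoding graph small.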

But with this approach the OOV problem will occur, which will be another thing to consider when the system is put into production.




Daniel Povey

Jun 6, 2019, 11:26:11 AM6/6/19
to kaldi-help
You would definitely need a way to handle things outside the grammar.  Possibly just putting a bunch of other words in your LM and handling it after the ASR (e.g. via rejection) would work.  Or you could train a system with a limited vocabulary and have other speech in your training data but just have it transcribed as if it were silence (i.e. not transcribed) or <unk> or something.
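
To make the "handle it after the ASR, e.g. via rejection" option concrete, here is a minimal Python sketch of a post-ASR filter; the command set, confidence threshold, and function name are illustrative assumptions, not anything that ships with Kaldi:

# Illustrative post-ASR rejection filter; none of this is Kaldi code.
# Assumes you already have the 1-best hypothesis text, a per-utterance
# confidence score from your decoder, and the set of valid commands
# (e.g. the sentences generated from the command grammar).

VALID_COMMANDS = {
    "turn on the radio",
    "turn off the radio",
    "call home",
}

CONFIDENCE_THRESHOLD = 0.7  # hypothetical value; tune it on held-out data

def accept(hypothesis, confidence):
    # Accept only hypotheses that look like in-grammar commands.
    text = hypothesis.strip().lower()
    if "<unk>" in text:                    # OOV token from a limited-vocabulary system
        return False
    if confidence < CONFIDENCE_THRESHOLD:  # low-confidence decode, likely out-of-grammar
        return False
    return text in VALID_COMMANDS

print(accept("turn on the radio", 0.92))       # True
print(accept("play some <unk> music", 0.95))   # False
print(accept("turn on the radio", 0.40))       # False

Anything rejected this way could trigger a "sorry, I didn't get that" response instead of executing a wrong command.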

Typically with those kinds of systems there is a wake-word too.  But if it's push-to-talk or whatever, that might not be needed.

Dan

