Wake word training


Иван Дмитриев

Dec 2, 2020, 8:28:00 AM
to kaldi-help
Hello.
I found wake word detection recipes, such as mobvoi, mobvoihotword and snips.
I have some questions.
1. I found that wav files should not be very short. Why? If my dataset consists of audio files one second long, the model does not train or has a very high WER. But if the files are 1.5-2 seconds long, the results are better.
2. Why do these recipes use a special word FREETEXT instead of the standard <UNK> for non-wake-word utterances?
3. If I use utterances containing words that are not in the lexicon.txt file, will these utterances influence training, and how?

Daniel Povey

Mar 1, 2021, 11:25:58 PM
to kaldi-help

> 1. I found that wav files should not be very short. Why? If my dataset consists of audio files one second long, the model does not train or has a very high WER. But if the files are 1.5-2 seconds long, the results are better.

Check the chunk size used in the recipes. If too many utterances were shorter than the chunk size, they may have been discarded in training.
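A quick way to check this is to count how many utterances have fewer frames than the chunk. The sketch below assumes Kaldi's default framing (25 ms window, 10 ms shift, snip-edges=true) on 16 kHz audio, and assumes that an utterance shorter than the chunk is simply dropped rather than padded. The chunk width of 140 frames is a hypothetical value; read the real one from the recipe's training script.

```python
def num_frames(num_samples, sample_rate=16000,
               frame_length_ms=25.0, frame_shift_ms=10.0):
    """Frame count with snip-edges=true framing (Kaldi's default)."""
    frame_length = int(sample_rate * frame_length_ms / 1000)  # 400 samples at 16 kHz
    frame_shift = int(sample_rate * frame_shift_ms / 1000)    # 160 samples at 16 kHz
    if num_samples < frame_length:
        return 0
    return 1 + (num_samples - frame_length) // frame_shift


def split_by_chunk_width(durations, chunk_width, sample_rate=16000):
    """Split utt_id -> duration (seconds) into (kept, dropped) lists.

    Assumption: utterances with fewer frames than `chunk_width` are discarded.
    """
    kept, dropped = [], []
    for utt, dur in durations.items():
        n = num_frames(int(round(dur * sample_rate)), sample_rate)
        (kept if n >= chunk_width else dropped).append(utt)
    return kept, dropped


if __name__ == "__main__":
    durations = {"utt_1s": 1.0, "utt_1.5s": 1.5, "utt_2s": 2.0}
    kept, dropped = split_by_chunk_width(durations, chunk_width=140)
    print("kept:", kept)        # kept: ['utt_1.5s', 'utt_2s']
    print("dropped:", dropped)  # dropped: ['utt_1s']
```

With these assumed settings a 1 s file yields 98 frames and a 1.5 s file yields 148, so only the 1 s files fall below a 140-frame chunk, which would match the pattern described in the question.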
Another issue: to avoid instability in training, it's important to ensure that the distribution of audio lengths is the same for the positive and negative examples.
Otherwise they end up in different minibatches, which is bad for training stability because the gradients for positive and negative examples can
be quite different.
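One way to check this before training is to bucket utterance durations per class and compare the normalized histograms. This is a minimal sketch; the 0.5 s bucket width and the 0.2 tolerance are arbitrary illustrative values, and the `labels` mapping (True for wake-word utterances) is a hypothetical input you would build from your own data.

```python
from collections import Counter


def length_histograms(durations, labels, bucket_sec=0.5):
    """Count utterances per duration bucket, separately for positives and negatives.

    durations: utt_id -> duration in seconds
    labels:    utt_id -> True for wake-word (positive) utterances, else False
    """
    pos, neg = Counter(), Counter()
    for utt, dur in durations.items():
        bucket = int(dur // bucket_sec)
        (pos if labels[utt] else neg)[bucket] += 1
    return pos, neg


def mismatched_buckets(pos, neg, tolerance=0.2):
    """Buckets whose share of positives and share of negatives differ by more than `tolerance`."""
    total_pos = sum(pos.values())
    total_neg = sum(neg.values())
    flagged = []
    for bucket in sorted(set(pos) | set(neg)):
        pos_share = pos[bucket] / total_pos if total_pos else 0.0
        neg_share = neg[bucket] / total_neg if total_neg else 0.0
        if abs(pos_share - neg_share) > tolerance:
            flagged.append(bucket)
    return flagged


if __name__ == "__main__":
    durations = {"pos1": 1.0, "pos2": 1.2, "neg1": 3.0, "neg2": 3.2}
    labels = {"pos1": True, "pos2": True, "neg1": False, "neg2": False}
    pos, neg = length_histograms(durations, labels)
    # Bucket 2 (1.0-1.5 s) holds only positives, bucket 6 (3.0-3.5 s) only negatives.
    print(mismatched_buckets(pos, neg))  # [2, 6]
```

An empty result means the two classes have similar length distributions at this bucket resolution; flagged buckets point at lengths where one class dominates.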

 
> 2. Why do these recipes use a special word FREETEXT instead of the standard <UNK> for non-wake-word utterances?

I suppose it has a similar effect to <UNK>. It's just a name.
Incidentally, Yiming found that in our recipe it was important to distinguish between freetext and silence, or at least to have
separate models for them.
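In Kaldi terms, one straightforward way to keep freetext and silence apart is to give them different phones in the dictionary directory, so each gets its own model. Below is a minimal sketch that writes such a dict directory using the standard file names `prepare_lang.sh` reads (`lexicon.txt`, `silence_phones.txt`, `optional_silence.txt`, `nonsilence_phones.txt`, `extra_questions.txt`). The wake word `HiXiaowen`, the word `<sil>` and the phone names `SIL`, `freetext` and `hixiaowen` are hypothetical placeholders, not copied from the mobvoi or snips recipes.

```python
import tempfile
from pathlib import Path


def write_dict_dir(dict_dir, wake_word="HiXiaowen", wake_phone="hixiaowen"):
    """Write a 3-phone dict dir where silence, freetext and the wake word each have their own phone.

    All word and phone names are illustrative placeholders.
    """
    d = Path(dict_dir)
    d.mkdir(parents=True, exist_ok=True)
    lexicon = {
        "<sil>": "SIL",          # silence word -> silence phone
        "FREETEXT": "freetext",  # any non-wake-word speech -> one "garbage" phone
        wake_word: wake_phone,   # the wake word -> its own phone
    }
    (d / "lexicon.txt").write_text("".join(f"{w} {p}\n" for w, p in lexicon.items()))
    (d / "silence_phones.txt").write_text("SIL\n")
    (d / "optional_silence.txt").write_text("SIL\n")
    # freetext is listed as a non-silence phone, so it is modeled separately from SIL.
    (d / "nonsilence_phones.txt").write_text(f"freetext\n{wake_phone}\n")
    (d / "extra_questions.txt").write_text("")  # left empty here
    return d


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = write_dict_dir(tmp)
        print((out / "lexicon.txt").read_text())
```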
 
> 3. If I use utterances containing words that are not in the lexicon.txt file, will these utterances influence training, and how?

I think the way that recipe is set up is that there are 3 phones: silence, wakeword and freetext, and I think there are supposed to
be two types of training utterances: those with the wakeword, and those with freetext (not sure whether silence utts are supported).
You're not really supposed to have any types of training utterances other than those two.
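Following that description, a data-prep step could map every transcript onto one of the two supported types before training. This is a minimal sketch over Kaldi's `text` file format (`utt-id transcript` per line). The wake word `HiXiaowen` is a hypothetical placeholder, and skipping utterances that mix the wake word with other speech, or that have an empty transcript, is an assumption based on the two-types description, not something the recipes are confirmed to do. Under this scheme, words missing from the lexicon (question 3) never reach training as themselves: the whole utterance just becomes FREETEXT.

```python
def relabel_text(lines, wake_word="HiXiaowen"):
    """Map Kaldi `text` lines onto the two supported utterance types.

    Returns (kept_lines, skipped_utt_ids).
    """
    kept, skipped = [], []
    for line in lines:
        utt_id, _, transcript = line.strip().partition(" ")
        words = transcript.split()
        if words == [wake_word]:
            kept.append(f"{utt_id} {wake_word}")  # positive: wake word only
        elif not words or wake_word in words:
            skipped.append(utt_id)                # empty or mixed: neither type
        else:
            kept.append(f"{utt_id} FREETEXT")     # negative: any other speech, OOV words included
    return kept, skipped


if __name__ == "__main__":
    text = ["u1 HiXiaowen", "u2 turn on the lights", "u3 HiXiaowen play music", "u4"]
    kept, skipped = relabel_text(text)
    print(kept)     # ['u1 HiXiaowen', 'u2 FREETEXT']
    print(skipped)  # ['u3', 'u4']
```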

Dan
 
