Hello.
I found wake word detection recipes, such as mobvoi, mobvoihotword and snips.
I have some questions.
1. I found that the wav files should not be very short. Why? If my dataset consists of audio files one second long, the model either does not train or has a very high WER. But if the files are 1.5-2 seconds long, the results are better.
2. Why do these recipes use a special word FREETEXT instead of the standard <UNK> for non-wake-word utterances?
3. If I use utterances containing words that are not in the lexicon.txt file, will these utterances influence training, and if so, how?