Thanks a lot to David for his patient help over e-mail!
The xvector recipe is amazing and helpful, and I want to study it and reference it in a future paper. However, I have run into some doubts while experimenting with it for language recognition. My training dataset is AP17_OLR, which contains many short utterances, and my main problems are at stage 4, where the nnet examples are created:
1. What is the relationship between the arguments --min-frames-per-chunk/--max-frames-per-chunk and min_len=500 (500 frames)? Can I skip stage 3 and run stage 4 directly?
2. I find it hard to follow the explanation of some of the arguments, such as chunk, archive, and --num-repeats. Could you please give more detail on each of them? How should I set them appropriately for my training dataset?
3. In the 'StatisticsPoolingComponent', do left-context=0 and right-context=10000 mean that we pool over an input segment starting at frame 0 and ending at frame 10000 or earlier? How was the value 10000 for the right context chosen?
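To check my understanding of question 3, here is a rough NumPy sketch (my own illustration, not Kaldi code) of what I believe the statistics pooling step computes: the per-dimension mean and standard deviation over all frames of a segment, concatenated into one vector.

```python
import numpy as np

def statistics_pooling(frames):
    """Pool a (num_frames, dim) matrix into a single (2*dim,) vector
    by concatenating the per-dimension mean and standard deviation."""
    mean = frames.mean(axis=0)
    std = frames.std(axis=0)
    return np.concatenate([mean, std])

# e.g. a 300-frame segment of 512-dim features pools to a 1024-dim vector
segment = np.random.randn(300, 512)
pooled = statistics_pooling(segment)
print(pooled.shape)  # (1024,)
```

If this is right, the segment length only has to be no larger than the right-context bound, which is why I would expect 10000 to act as an upper limit rather than a required length.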
4. I have tried several different values, but none of them worked. My error is 'ERROR (nnet3-compute-prob[5.2.124~1-70748]:CreateComputation():nnet-compile.cc:59) Not all outputs were computable, cannot create computation.'
I hope you can send me more information to help me understand the algorithm. By the way, if I wanted to extract embeddings by building a CNN instead of a TDNN in Kaldi, would that work better?
Thanks!
My argument settings are as follows (the original screenshot was lost):
if [[ $stage -le 1 && 1 -le $endstage ]]; then
  sid/nnet3/xvector/get_egs.sh --cmd "$train_cmd" \
    --nj 5 \
    --stage 0 \
    --frames-per-iter 4000000 \
    --frames-per-iter-diagnostic 100000 \
    --min-frames-per-chunk 10 \
    --max-frames-per-chunk 30 \
    --num-diagnostic-archives 3 \
    --num-repeats 2 \
    "$data" $egs_dir
fi
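To make question 2 concrete, here is a back-of-envelope sketch with my settings (my own rough model of how these options interact; Kaldi's actual chunking logic may differ):

```python
# Back-of-envelope arithmetic with my settings (my own rough model of
# the egs options; Kaldi's actual chunking logic may differ).
min_frames, max_frames = 10, 30     # --min/--max-frames-per-chunk
frames_per_iter = 4_000_000         # --frames-per-iter (frames per archive)
num_repeats = 2                     # --num-repeats (passes over the data)

# Chunk lengths are drawn between the min and the max, so on average:
avg_chunk_len = (min_frames + max_frames) / 2          # 20.0 frames
chunks_per_archive = frames_per_iter / avg_chunk_len   # examples per archive
print(int(chunks_per_archive))  # 200000
```

Is this roughly the right way to think about how --frames-per-iter, the chunk-length range, and --num-repeats determine the size and number of the example archives?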
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/4a3335e8-318d-4787-876b-d5335cb04ec1%40googlegroups.com.
--
Go to http://kaldi-asr.org/forums.html to find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.