k2 generalization vs Kaldi

aliiire...@gmail.com

Oct 7, 2022, 1:25:12 PM
to kaldi-help
Hi all,
In my experiments, E2E models generalize poorly and depend heavily on the training dataset, whereas Kaldi models generalize well. However, E2E models achieve a better WER than Kaldi on in-domain test sets.
In these cases, I prefer the Kaldi models, because their WER varies much less between in-domain and out-of-domain test sets.
For example, I trained models with wav2vec, ESPnet, and Kaldi on a telephone dataset such as Switchboard (SWBD). On an in-domain test set, wav2vec's WER is 5 percentage points lower (Kaldi 25%, wav2vec 20%), but on a very easy, clean out-of-domain test set, wav2vec's WER is 25% while Kaldi's is 9%.
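To make the in-domain vs. out-of-domain comparison concrete, here is a minimal Python sketch that uses only the WER figures quoted in this post (the dictionary layout and the helper names `absolute_gap` and `relative_gap` are illustrative, not part of any toolkit). It separates the absolute difference in percentage points from the relative difference:

```python
# WER figures (%) quoted in the post; keys are illustrative labels only.
wer = {
    "kaldi":   {"in_domain": 25.0, "out_domain": 9.0},
    "wav2vec": {"in_domain": 20.0, "out_domain": 25.0},
}

def absolute_gap(worse: float, better: float) -> float:
    """Difference in WER, in percentage points."""
    return worse - better

def relative_gap(worse: float, better: float) -> float:
    """Relative WER reduction of `better` with respect to `worse`, in percent."""
    return (worse - better) * 100.0 / worse

# In-domain: wav2vec is better than Kaldi.
in_abs = absolute_gap(wer["kaldi"]["in_domain"], wer["wav2vec"]["in_domain"])
in_rel = relative_gap(wer["kaldi"]["in_domain"], wer["wav2vec"]["in_domain"])

# Clean out-of-domain: Kaldi is better than wav2vec.
out_abs = absolute_gap(wer["wav2vec"]["out_domain"], wer["kaldi"]["out_domain"])
out_rel = relative_gap(wer["wav2vec"]["out_domain"], wer["kaldi"]["out_domain"])

print(f"in-domain:  wav2vec better by {in_abs} points ({in_rel}% relative)")
print(f"out-domain: Kaldi better by {out_abs} points ({out_rel}% relative)")
```

On these numbers, wav2vec is 5 points (20% relative) better in-domain, while Kaldi is 16 points (64% relative) better on the clean out-of-domain set.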

What about the generalization of k2 models?
 
Best regards

Daniel Povey

Oct 7, 2022, 11:41:44 PM
to kaldi...@googlegroups.com
Hm, which specific model were you using in k2?
Generalization is something we'll have to work on, but I don't think it will improve super fast.

Sage Khan

Oct 8, 2022, 12:47:40 AM
to kaldi-help
Hi. Did you try out CNN-TDNN for Switchboard? 
Also check out the experiments in these papers:

A.-L. Georgescu, H. Cucu, and C. Burileanu, "Kaldi-based DNN Architectures for Speech Recognition in Romanian," Speech and Dialogue Research Laboratory, University Politehnica of Bucharest, Bucharest, Romania.

A.-L. Georgescu et al., "Performance vs. Hardware Requirements in State-of-the-Art Automatic Speech Recognition," 2021.

"CNN-TDNN-based Architecture for Speech Recognition Using Grapheme Models in Bilingual Czech-Slovak Task," in Text, Speech, and Dialogue, K. Ekštein, F. Pártl, and M. Konopík, Eds. Cham: Springer International Publishing, 2021, pp. 523–533. [12] J. V. Psutka, A. Pražák,

  

I experimented with this CNN-TDNN script for code-switched Urdu on a mix of telephone and clean data: https://github.com/anish9208/gramvaani_hindi_asr/blob/main/kaldi/asr/Run_cnn-tdnn.sh

aliiire...@gmail.com

Oct 8, 2022, 1:37:25 AM
to kaldi-help
Thanks,
I just want to start k2 training. Which model do you suggest?