Transform matrix for utterance xxx has bad dimension 40X91 versus feat dim 112


Dusefu

Jun 21, 2018, 8:51:16 AM
to kaldi-help
Dear Dan et al.,

When I train a multilingual TDNN model using the babel_multilang scripts, I get a dimension mismatch during MFCC+pitch feature extraction in the run_common_lang.sh script, specifically when aligning the speed-perturbed data. I tried to fix it by changing the sampling frequency for all of the PLP, pitch and MFCC configs to 16 kHz and to 8 kHz, and I also tried increasing the number of cepstra for PLP, but the mismatch remained.

Could you please suggest a solution?

With best regards,

Daniel Povey

Jun 21, 2018, 3:32:21 PM
to kaldi-help
You're not really being specific enough here. What specific script
did you run, and did you change it, e.g. did you try to adapt it to
your own data or did you run it on the BABEL data in that directory?

I think you need to do some more background reading on speech
recognition, e.g. the HTK Book. It bothers me that you thought
changing the sampling frequency would resolve this.

Dan

Dusefu

Jun 21, 2018, 8:13:24 PM
to kaldi-help
Thank you, Dan.
I used the multilingual script "local/nnet3/run_tdnn_multilingual.sh" from the babel_multilang recipe and adapted it to my own data set.
I did not change the script itself. I ran run_tdnn_multilingual.sh, which calls the "local/nnet3/run_common_lang.sh" script; that script is mainly used to extract the MFCC+pitch input features for each language. It extracts PLP+pitch features for the speed-perturbed data, and then, when it tries to align the perturbed data using tri3b and tri3b_ali, this dimension mismatch happens.
With regards,

Daniel Povey

Jun 21, 2018, 8:15:13 PM
to kaldi-help
Likely that error has to do with a mix-up between a part of the system
that expects pitch and a part that does not.
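The numbers in the error message are consistent with this explanation. As a sketch, assume the tri3b system uses the usual LDA+MLLT setup with a splice context of 3 frames on each side (7 frames in total), 13 cepstra per frame, and that Kaldi pitch extraction appends 3 dimensions per frame. These values are assumptions, not read from the poster's setup. Under them, the 91 input columns of the transform and the 112-dimensional features decompose cleanly into "no pitch" and "with pitch":

```python
# Sketch: decompose the dimensions in
# "Transform matrix ... has bad dimension 40X91 versus feat dim 112".
# Assumptions (not taken from the poster's setup): splice context of
# 3 frames left + 3 right, 13 cepstra per frame, 3 pitch dims per frame.

LEFT_CONTEXT = 3
RIGHT_CONTEXT = 3
NUM_FRAMES = LEFT_CONTEXT + 1 + RIGHT_CONTEXT  # 7 spliced frames
NUM_CEPS = 13
PITCH_DIMS = 3


def spliced_dim(per_frame: int, num_frames: int = NUM_FRAMES) -> int:
    """Dimension after splicing num_frames copies of a per-frame vector."""
    return per_frame * num_frames


def per_frame_dim(spliced: int, num_frames: int = NUM_FRAMES) -> int:
    """Invert splicing; fail loudly if the dimension does not divide evenly."""
    if spliced % num_frames != 0:
        raise ValueError(f"{spliced} is not a multiple of {num_frames}")
    return spliced // num_frames


transform_input_dim = 91  # columns of the 40x91 LDA+MLLT matrix
feat_dim = 112            # dimension of the spliced features being aligned

expected = per_frame_dim(transform_input_dim)  # what tri3b was trained on
actual = per_frame_dim(feat_dim)               # what the new features provide

print(f"transform expects {expected} dims/frame, features have {actual}")
print(f"extra dims per frame: {actual - expected}")
print("matches no-pitch setup:", expected == NUM_CEPS)
print("matches pitch setup:   ", actual == NUM_CEPS + PITCH_DIMS)
```

Under these assumptions, the 3 extra dimensions per frame are exactly the pitch features: the alignment model was trained without pitch, while the perturbed data was extracted with it.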

mura...@gmail.com

Sep 16, 2020, 3:29:23 PM
to kaldi-help
I can confirm that this error ("40X91 versus feat dim 112") happens when you extract pitch features for the fisher_english dataset. If you extract features with make_plp.sh instead of make_plp_pitch.sh, the error goes away.
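A rough sketch of why dropping pitch removes the error. It assumes that, like Kaldi's transform-feats, the check accepts a matrix whose column count equals the feature dimension (a linear transform) or the feature dimension plus one (an affine transform whose last column is an offset), and rejects anything else. The 7-frame splice and the per-frame sizes (13 PLP, 16 with pitch) are illustrative assumptions, and `check_transform` is a hypothetical helper, not a Kaldi API:

```python
# Hypothetical emulation of the dimension check applied when a transform
# matrix is multiplied onto spliced features. Assumption: accepts
# cols == feat_dim (linear) or cols == feat_dim + 1 (affine), else errors.
# Splice width and per-frame sizes below are illustrative assumptions.

from typing import Tuple

SPLICE_FRAMES = 7


def check_transform(shape: Tuple[int, int], feat_dim: int) -> str:
    """Return 'linear', 'affine', or an error string for a (rows, cols) matrix."""
    rows, cols = shape
    if cols == feat_dim:
        return "linear"
    if cols == feat_dim + 1:
        return "affine"
    return f"bad dimension {rows}x{cols} versus feat dim {feat_dim}"


lda_mllt_shape = (40, 91)  # transform from the tri3b system in the error

for name, per_frame in [("plp only (make_plp.sh)", 13),
                        ("plp + pitch (make_plp_pitch.sh)", 16)]:
    result = check_transform(lda_mllt_shape, per_frame * SPLICE_FRAMES)
    print(f"{name:32s} -> {result}")
```

The alternative fix is the mirror image: keep pitch for the new data, but then the alignment model (tri3b and its transform) must also have been trained on pitch features, so both sides agree on the dimension.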