Transform matrix for utterance xxx has bad dimension 40X91 versus feat dim 112
Dusefu
Jun 21, 2018, 8:51:16 AM
to kaldi-help
Dear Dan et al.,
When I train a multilingual TDNN model using the babel multilingual scripts,
there is a dimension mismatch during the MFCC+pitch feature extraction in the run_common_lang.sh script, specifically when aligning the speed-perturbed data. I tried to fix it by changing the sampling frequency for all of the PLP, pitch and MFCC configs to 16 and 8 kHz, and I also tried increasing the number of cepstra for PLP, but the mismatch did not go away.
Could you please suggest a solution?
With best regards,
Daniel Povey
Jun 21, 2018, 3:32:21 PM
to kaldi-help
You're not really being specific enough here. What specific script
did you run, and did you change it, e.g. did you try to adapt it to
your own data or did you run it on the BABEL data in that directory?
I think you need to do some more background reading on speech
recognition, e.g. the HTK Book. It bothers me that you thought
changing the sampling frequency would resolve this.
to kaldi-help
Thank you, Dan,
I used the multilingual script "local/nnet3/run_tdnn_multilingual.sh" from the babel_multilang recipe and adapted it to my own data set.
I did not change the script itself. I ran run_tdnn_multilingual.sh, which calls "local/nnet3/run_common_lang.sh", the script that is essentially used to extract the MFCC+pitch input features for each language. It extracts PLP+pitch features for the speed-perturbed data; then, when it tried to align the perturbed data using tri3b and tri3b_ali, this dimension mismatch happened.
With regards,
Daniel Povey
Jun 21, 2018, 8:15:13 PM
to kaldi-help
Likely that error has to do with a mix-up between a part of the system
that expects pitch and a part that does not.
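The numbers in the error message are consistent with that diagnosis. Below is a minimal sketch of the arithmetic, assuming Kaldi's default LDA+MLLT splicing of 3 frames left and 3 right (7 frames total), 13 cepstral coefficients, and 3 extra dimensions appended by pitch extraction. These values are assumptions, not read from the poster's configs, and the helper names spliced_dim and diagnose are hypothetical. Under those assumptions 91 = 13 x 7 (no pitch) and 112 = 16 x 7 (with pitch), i.e. the tri3b transform was estimated on features without pitch while the perturbed data was extracted with pitch.

```python
# Hypothetical dimension check; all constants below are assumptions
# (Kaldi default splicing and typical PLP/MFCC + pitch dimensions).
LEFT_CONTEXT = 3
RIGHT_CONTEXT = 3
NUM_FRAMES = LEFT_CONTEXT + 1 + RIGHT_CONTEXT  # 7 spliced frames

NUM_CEPS = 13   # assumed cepstral dimension without pitch
PITCH_DIM = 3   # assumed number of dims appended by pitch extraction


def spliced_dim(base_dim: int) -> int:
    """Feature dimension after splicing NUM_FRAMES frames together."""
    return base_dim * NUM_FRAMES


def diagnose(transform_in_dim: int, feat_dim: int) -> str:
    """Explain a transform/feature mismatch in terms of pitch vs. no pitch."""
    no_pitch = spliced_dim(NUM_CEPS)
    with_pitch = spliced_dim(NUM_CEPS + PITCH_DIM)
    if transform_in_dim == feat_dim:
        return "dimensions match"
    if transform_in_dim == no_pitch and feat_dim == with_pitch:
        return "model trained without pitch, features extracted with pitch"
    if transform_in_dim == with_pitch and feat_dim == no_pitch:
        return "model trained with pitch, features extracted without pitch"
    return "mismatch not explained by pitch alone"


print(spliced_dim(NUM_CEPS))              # 91
print(spliced_dim(NUM_CEPS + PITCH_DIM))  # 112
print(diagnose(91, 112))
```

The practical consequence is that features for the data being aligned must be extracted the same way (with or without pitch) as the features the alignment model was trained on.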
to kaldi-help
I can confirm that this error ("40X91 versus feat dim 112") happens when you extract pitch features for the fisher_english data set. If you use make_plp.sh instead of make_plp_pitch.sh for feature extraction, the error goes away.