Custom data dimension mismatch

Nick Ellinas

Oct 5, 2017, 6:30:55 PM

to kaldi-help

Hello,

I have successfully trained a GMM-HMM model on the MFCC features extracted by Kaldi. My pipeline is mono -> deltas -> lda_mllt -> sat.

Now I want to train a DNN-HMM system on my own data, which are 41-dimensional LSF features with a different number of frames per utterance. First, though, I have to align the data.

Using deltas I get "Dim mismatch: data dim = 123 vs. model dim = 39"

Using lda I get "Transform matrix for utterance s001ah012 has bad dimension 40x91 versus feat dim 287"

So what can I do to avoid the dimension mismatch? Sorry if it's a trivial question, but I have searched a lot myself and haven't reached a conclusion.
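Editor's note: the numbers in both error messages can be reproduced with simple arithmetic. The sketch below assumes (the thread does not say so) that the baseline is 13-dimensional MFCC, that the delta stage appends first- and second-order deltas, and that the LDA stage splices 3 frames of context on each side. The function names are illustrative only.

```python
# Sketch of the dimension arithmetic behind the two error messages.
# Assumptions (not stated in the thread): 13-dim baseline MFCCs,
# delta order 2 (static + delta + delta-delta), and splicing of
# 3 frames on each side (7 frames in total) before LDA.

def delta_dim(base_dim, delta_order=2):
    """Dimension after appending deltas up to the given order."""
    return base_dim * (delta_order + 1)

def spliced_dim(base_dim, left_context=3, right_context=3):
    """Dimension after splicing neighbouring frames together."""
    return base_dim * (left_context + 1 + right_context)

MFCC_DIM = 13  # assumed baseline dimension
LSF_DIM = 41   # dimension given in the question

# "data dim = 123 vs. model dim = 39"
print(delta_dim(LSF_DIM), delta_dim(MFCC_DIM))

# "bad dimension 40x91 versus feat dim 287": the transform has 91 input columns
print(spliced_dim(LSF_DIM), spliced_dim(MFCC_DIM))
```

Under these assumptions the models were estimated on 39- and 91-dimensional inputs, so they cannot score 41-dimensional base features without retraining.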

Daniel Povey

Oct 5, 2017, 6:45:50 PM

to kaldi-help

You can start the process by running, for example, train_lda_mllt.sh
and give it alignments obtained using your existing system (with the
default features), but your new higher-dimensional data directory.
This will only work if the number of frames for at least the majority
of your files is the same as with your baseline features.

Dan

Nick Ellinas

Oct 5, 2017, 7:09:50 PM

to kaldi-help

I don't think that will work, because my higher-dimensional features have many more frames: I used a different window size and step. Could it work if I change the window and step of Kaldi's MFCC feature extraction to match mine? That way the number of frames should be the same.
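Editor's note: how the frame count depends on window and step can be sketched as follows. This assumes the behaviour Kaldi uses when edge snipping is enabled (only windows that fit entirely inside the signal are kept); the sample rate and the window and shift values are hypothetical, since the thread does not give them.

```python
# Sketch: number of frames as a function of window length and frame shift,
# assuming only complete windows are kept (no padding at the edges).
# All numeric values below are hypothetical examples.

def num_frames(num_samples, sample_rate_hz, frame_length_ms, frame_shift_ms):
    window = sample_rate_hz * frame_length_ms // 1000  # window size in samples
    shift = sample_rate_hz * frame_shift_ms // 1000    # frame shift in samples
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // shift

one_second = 16000  # samples in 1 s of audio at an assumed 16 kHz

# Commonly used 25 ms window with 10 ms shift
print(num_frames(one_second, 16000, 25, 10))

# Same window with a 5 ms shift: roughly twice as many frames
print(num_frames(one_second, 16000, 25, 5))
```

Two feature streams line up frame by frame only if both the window and the shift agree, so both values would need to be changed in the MFCC configuration (the `--frame-length` and `--frame-shift` options, in milliseconds).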

Daniel Povey

Oct 5, 2017, 7:13:12 PM

to kaldi-help

You could try that. Or you could start from scratch doing monophone
training with your features.
It's highly unlikely that whatever you are doing to modify the
features will improve the word error rate. Many have tried and
failed.

Nick Ellinas

Oct 5, 2017, 7:22:31 PM

to kaldi-help

(Well the idea is to then further augment the features and do lots of other stuff, so I hope that something will work)

Since you mentioned starting from scratch, I tried that, but for all the utterances I get "No alignment for utterance", along with messages such as "Processed 50 utterances; for utterance s001ah050 avg. like is 174.189 over 512 frames".

I wrote the data in the text ark format to a txt file, then converted it with copy-matrix (feat-to-dim, feat-to-len, etc. all work, so I guess the files are OK). Then I ran fix_data_dir -> compute_cmvn_stats -> fix_data_dir -> validate_data_dir, and they all ran without errors.
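Editor's note: a minimal sketch of writing one utterance in the text archive layout described above, assuming the usual form of an utterance ID followed by a bracketed matrix with one frame per row. The helper name `to_kaldi_text_ark` and the feature values are made up for illustration; only the utterance ID comes from the thread.

```python
# Sketch: format one utterance as a text-format archive entry.
# Assumed layout: "<utt-id>  [" then one row per frame, with " ]" closing
# the last row. The helper name and the sample values are hypothetical.

def to_kaldi_text_ark(utt_id, frames):
    if not frames:
        raise ValueError("utterance has no frames")
    dim = len(frames[0])
    if any(len(row) != dim for row in frames):
        raise ValueError("all frames must have the same dimension")
    lines = [f"{utt_id}  ["]
    for row in frames:
        lines.append("  " + " ".join(f"{x:.6g}" for x in row))
    lines[-1] += " ]"  # the closing bracket follows the last row
    return "\n".join(lines) + "\n"

# Two 3-dimensional frames stand in for real 41-dimensional LSF frames.
entry = to_kaldi_text_ark("s001ah012", [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
print(entry, end="")
```

Checking that every row has the same length before writing catches one common cause of unreadable archives, although the checks mentioned in the post (feat-to-dim, feat-to-len) suggest the files here were already well formed.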

Daniel Povey

Oct 5, 2017, 7:46:14 PM

to kaldi-help

I don't think you got "No alignment" for all utterances, otherwise it
wouldn't have said "for utterance xxx, likelihood is yyy". You could
continue running from that point.
But it's very unlikely you'll get good results. In particular if you
change the frame rate, the results may not be good.

Nick Ellinas

Oct 5, 2017, 7:52:34 PM

to kaldi-help

I traced the problem to the align-equal-compiled step, which outputs the following for most of the utterances:

EqualAlign: the randomly constructed paths lengths: 135,135,135,135,135,135,135,135,135,135

EqualAlign: utterance has too few frames 79 to align.

AlignEqual: did not align utterence

EqualAlign: the randomly constructed paths lengths: 405,405,405,405,405,405,405,405,405,405

EqualAlign: utterance has too few frames 325 to align.

AlignEqual: did not align utterence

It seems weird that it considers 325 frames too few. Maybe the FST paths are bad? I trained a bigram phone model (I split the data into phones to get the phone error rate).

Daniel Povey

Oct 5, 2017, 7:59:17 PM

to kaldi-help

If your frame shift is larger you may have to use a 1-state topology,
try adding the options
--num-sil-states 1 --num-nonsil-states 1
to prepare_lang.sh.
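
Editor's note: the effect of a 1-state topology on the log messages quoted earlier can be sketched as below. It assumes each phone is a left-to-right HMM that must consume at least one frame per state, and that every phone on the reported paths used the default 3-state non-silence topology (silence phones, which default to 5 states, are ignored here). Under those assumptions the path lengths 135 and 405 correspond to 45 and 135 phones.

```python
# Sketch: minimum number of frames needed to align an utterance, assuming
# a left-to-right HMM per phone with at least one frame per state.
# Path lengths and frame counts are the ones quoted in the thread.

DEFAULT_NONSIL_STATES = 3  # assumed topology behind the reported path lengths

def min_frames(num_phones, states_per_phone):
    return num_phones * states_per_phone

results = []
for path_length, frames_available in [(135, 79), (405, 325)]:
    num_phones = path_length // DEFAULT_NONSIL_STATES
    needed_1state = min_frames(num_phones, 1)
    fits = needed_1state <= frames_available
    results.append((num_phones, fits))
    print(f"{num_phones} phones: 3-state needs {path_length} frames, "
          f"1-state needs {needed_1state}, available {frames_available}, "
          f"fits with 1 state: {fits}")
```

With one state per phone, both example utterances have enough frames, which is consistent with the suggestion to pass `--num-sil-states 1 --num-nonsil-states 1`.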

Nick Ellinas

Oct 6, 2017, 8:22:55 AM

to kaldi-help

Well, I tried this and you were right: the errors went away this time (they remain on only a few utterances). But the monophone PER turned out to be 80%, so I guess something is still wrong with my features.

Anyway, thanks for the immediate and accurate help.

Daniel Povey

Oct 6, 2017, 12:29:08 PM

to kaldi-help

It may be better after the LDA+MLLT stage; if your features are too
correlated, GMMs won't work as well, but LDA will decorrelate them.

Nick Ellinas

Oct 6, 2017, 5:13:55 PM

to kaldi-help

I tried continuing training, but train_deltas told me to add the pdf-class-list=0 option to cluster-phones.

I did that, but I again got "no alignment for utterance" during delta training, and the PER increased to 95%, so it all went wrong.

I will try to find a way to use the features extracted by Kaldi in my other systems and see where it goes (the task is articulatory inversion, for your information).
