mfcc vs mfcc hires features


Eric W

Aug 30, 2017, 3:00:35 PM
to kaldi...@googlegroups.com
Hi Dan and all,

I am curious about the design decision of using different MFCC features between the GMM and nnet3 systems. Why not use MFCC hires features for the GMM system too (and thus have just one type of feature during training)? Can one still train a GMM system on MFCC hires features? I'd guess there would be no meaningful gain though...


Thanks
Eric

Daniel Povey

Aug 30, 2017, 3:05:28 PM
to kaldi-help
It's a standard thing that people do. I haven't experimented with the
hires features for the GMM system, but certainly at the beginning of
training it would make it about 3x slower for no WER gain. (After LDA
the speed would be about the same, but the disk I/O would still be
greater).
The use of 13-dim features for GMM systems was tuned decades ago.

Dan
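The ~3x figure follows from the feature dimensions alone: diagonal-covariance GMM likelihood evaluation scales linearly with the feature dimension, and 40-dim hires features are about three times wider than 13-dim MFCCs (splicing both with deltas multiplies each by the same factor, so the ratio holds). A back-of-envelope check:

```shell
# Per-frame GMM likelihood cost is O(num_gauss * dim), so swapping 13-dim
# MFCCs for 40-dim hires features slows it by roughly the dimension ratio.
awk 'BEGIN { printf "approx. slowdown: %.1fx\n", 40 / 13 }'
# prints: approx. slowdown: 3.1x
```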

Eric W

Aug 30, 2017, 3:53:14 PM
to kaldi-help, dpo...@gmail.com

Eric W

Aug 30, 2017, 4:10:21 PM
to kaldi-help, dpo...@gmail.com
Thanks, Dan and sorry about the empty reply (wrong gmail button).

FYI, I just tested hires features on a small monophone system (a subset of WSJ). Besides the larger feature data size, there were also quite a few alignment errors in the early stages, and I had to use a much bigger beam for alignment to pass. It would certainly make training slower...

Thanks
Eric

Daniel Povey

Aug 30, 2017, 4:18:42 PM
to Eric W, kaldi-help
BTW if you're concerned about the space, it would be possible to dump
the hires features and then select the first 13 cepstra using
utils/limit_feature_dim.sh, which references the original archive
without copying the data. This would reduce the space taken, but might
be less optimal from a linearity-of-disk-access point of view for the
GMM training. Also, the features wouldn't be exactly equivalent,
because the default configuration uses 23 mel bins instead of the 40
used for hires, but I doubt this would make a substantial difference
to the WERs.

Dan
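The space-saving trick described here relies on Kaldi scp entries accepting a column-range specifier, so limiting the feature dimension is a text rewrite of feats.scp rather than a copy of the archive. A minimal sketch with plain shell tools (the feats.scp paths are made up, and the `[:,0:12]` range syntax is what Kaldi rspecifiers are assumed to accept):

```shell
# Illustration only (not Kaldi code): limiting the feature dim by rewriting
# feats.scp to reference columns 0-12 of the original hires archives.
mkdir -p demo
cat > demo/feats.scp <<'EOF'
utt1 /data/mfcc_hires/raw_mfcc.1.ark:12
utt2 /data/mfcc_hires/raw_mfcc.1.ark:3456
EOF

# Append the range to every entry (roughly what limit_feature_dim.sh does);
# downstream tools then read only the first 13 columns of each matrix.
awk '{print $1, $2 "[:,0:12]"}' demo/feats.scp > demo/feats_13dim.scp
cat demo/feats_13dim.scp
```

No feature data is touched: the original hires archives stay on disk once, and both the GMM and nnet3 systems can read from them.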