80-dimensional log Mel feature with the additional pitch features


Selma KA

May 8, 2021, 7:01:57 AM
to kaldi-help
Hi everyone,
I'm trying to implement an end-to-end ASR model with the commonly used 80-dimensional log Mel features plus pitch features (83 dimensions per frame in total). I'm not very familiar with the feature extraction step, so I have two questions:
  • Why are there 3 features for pitch and not just one feature? 
  • How can I have a 40-dimensional log Mel feature instead of 80? 
Thanks a lot for helping,

nshm...@gmail.com

May 8, 2021, 12:23:00 PM
to kaldi-help
> Why are there 3 features for pitch and not just one feature?

Traditionally, Kaldi pitch features contain three values: a probability-of-voicing feature, a normalized log-pitch value, and a delta log-pitch value. You can read about it in the documentation:

https://kaldi-asr.org/doc/process-kaldi-pitch-feats_8cc.html

and in the research paper:

"A Pitch Extraction Algorithm Tuned for Automatic Speech Recognition", Pegah Ghahremani et al.
https://danielpovey.com/files/2014_icassp_pitch.pdf

You can keep only the pitch value if you want to experiment, but the three-valued feature should work better.
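For illustration, here is a minimal sketch of how an 83-dimensional frame is put together and how a single pitch column could be kept instead of all three. The column order (probability of voicing, normalized log pitch, delta log pitch) is an assumption based on the default output of process-kaldi-pitch-feats, the values are placeholders, and the helper name append_pitch is hypothetical:

```python
# Placeholder data: 5 frames of 80 log Mel values and 3 pitch values.
NUM_FRAMES, NUM_MEL, NUM_PITCH = 5, 80, 3
fbank = [[0.0] * NUM_MEL for _ in range(NUM_FRAMES)]
# Assumed column order: [POV, normalized log pitch, delta log pitch].
pitch = [[0.5, -0.1, 0.01] for _ in range(NUM_FRAMES)]


def append_pitch(fbank_frames, pitch_frames, keep_columns=(0, 1, 2)):
    """Append the selected pitch columns to each log Mel frame."""
    if len(fbank_frames) != len(pitch_frames):
        raise ValueError("fbank and pitch must have the same number of frames")
    return [
        mel + [p[c] for c in keep_columns]
        for mel, p in zip(fbank_frames, pitch_frames)
    ]


full = append_pitch(fbank, pitch)                           # 80 + 3 columns
pitch_only = append_pitch(fbank, pitch, keep_columns=(1,))  # 80 + 1 columns
print(len(full[0]), len(pitch_only[0]))
```

With the placeholder shapes above, the two frame widths are 83 and 81. In a real pipeline the matrices would come from the feature archives rather than from constants.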


> How can I have a 40-dimensional log Mel feature instead of 80

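Set --num-mel-bins=40 in the filterbank config (typically conf/fbank.conf). If you are computing MFCCs instead, set --num-ceps=40 --num-mel-bins=40 in the conf/mfcc.conf file.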
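As a rough check of the resulting frame size, the sketch below parses a Kaldi-style option file and adds the three pitch values to the number of Mel bins. It assumes the option is spelled --num-mel-bins as in Kaldi's filterbank options and that no energy column is appended. The config contents and the helper names parse_kaldi_conf and frame_dim are hypothetical:

```python
def parse_kaldi_conf(text):
    """Parse lines of the form --name=value into a dict, ignoring # comments."""
    options = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, _, value = line.lstrip("-").partition("=")
        options[name] = value
    return options


def frame_dim(options, num_pitch=3):
    """One log Mel value per Mel bin, plus the pitch values (no energy assumed)."""
    return int(options["num-mel-bins"]) + num_pitch


# Hypothetical contents of a filterbank config file.
conf = """
# 40 Mel bins instead of 80
--num-mel-bins=40
"""
opts = parse_kaldi_conf(conf)
print(frame_dim(opts))
```

With 40 Mel bins and the three pitch values, each frame has 43 dimensions, so the input layer of the model has to be sized accordingly.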

Selma KA

May 8, 2021, 4:35:48 PM
to kaldi-help
Thank you very much for your answer. 