SSL features instead of MFCC in Kaldi?

Max Lvov

Mar 18, 2024, 11:43:53 AM
to kaldi-developers
Following up on this paper on end-to-end models:
"An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition"

Has anyone tried using SSL pretrained models (like HuBERT) for extracting features, instead of MFCC, and then training a Hybrid model on top of them?

ondrej...@gmail.com

Mar 19, 2024, 4:47:37 AM
to kaldi-developers
Hi Max,

We trained a small TDNN-F model on top of features extracted with XLSR-53 in "Comparing Self-Supervised Pre-Training and Semi-Supervised Training for Speech Recognition in Languages with Weak Language Models", and the SSL features helped a lot. Another benefit of SSL features is that you can continue self-supervised pretraining on untranscribed data, which is useful when semi-supervised training doesn't work well because the language model is weak.

Best regards,

Ondrej
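
For readers who want to try the setup described above, here is a minimal sketch of dumping SSL encoder outputs in Kaldi's text archive format so they can stand in for MFCCs. It is not the code from the paper. The helper names, the choice of the last hidden layer, and the use of Hugging Face `transformers`, `torch`, and `soundfile` with the `facebook/wav2vec2-large-xlsr-53` checkpoint are all assumptions made for illustration.

```python
"""Sketch: write SSL features as a Kaldi text archive (assumptions noted inline)."""


def format_kaldi_text_matrix(utt_id, rows):
    """Format one utterance's (frames x dim) matrix as a Kaldi text-archive entry."""
    if not rows:
        raise ValueError("no frames for utterance " + utt_id)
    dim = len(rows[0])
    if any(len(row) != dim for row in rows):
        raise ValueError("all frames must have the same dimension")
    lines = ["  " + " ".join(f"{value:.6g}" for value in row) for row in rows]
    # Layout mirrors what Kaldi prints for text-mode matrices:
    # "<utt-id>  [" on the first line, one frame per line, "]" closing the last frame.
    return f"{utt_id}  [\n" + "\n".join(lines) + " ]\n"


def extract_ssl_features(wav_path, model_name="facebook/wav2vec2-large-xlsr-53"):
    """Return last-layer encoder outputs as a list of frames (hypothetical helper).

    Assumption: third-party packages are installed; they are imported lazily so
    the formatter above stays usable without them.
    """
    import soundfile as sf
    import torch
    from transformers import AutoModel, Wav2Vec2FeatureExtractor

    waveform, sample_rate = sf.read(wav_path, dtype="float32")
    if sample_rate != 16000:
        raise ValueError("expected 16 kHz audio, got %d Hz" % sample_rate)
    if waveform.ndim != 1:
        raise ValueError("expected mono audio")

    extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).eval()
    inputs = extractor(waveform, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape: (1, frames, dim)
    return hidden[0].tolist()


def write_text_archive(utt_to_wav, out_path):
    """Extract features for each utterance and append them to one text archive."""
    with open(out_path, "w") as out:
        for utt_id, wav_path in sorted(utt_to_wav.items()):
            out.write(format_kaldi_text_matrix(utt_id, extract_ssl_features(wav_path)))
```

The resulting file can be converted to the usual binary ark/scp pair with `copy-feats ark,t:feats.txt ark,scp:feats.ark,feats.scp`. Reloading the model for every utterance is wasteful; a real script would load it once and probably write binary archives directly.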

Max Lvov

Mar 19, 2024, 3:20:30 PM
to kaldi-developers
Thanks Ondrej!

Did you try any SSL pretrained models other than XLSR, such as HuBERT or WavLM?

ondrej...@gmail.com

Mar 20, 2024, 5:09:53 AM
to kaldi-developers
I tried XLS-R, XLSR-53, wav2vec 2.0, and HuBERT. They all worked better than MFCC features, but XLS-R worked best for low-resource languages.

Aditya Parikh

Apr 1, 2024, 6:51:20 AM
to kaldi-developers
Hi Ondrej,

Thanks for mentioning the paper. Is there a codebase available to replicate the methodology?
I am referring specifically to these lines:
"We trained a five-lingual (four Bantu languages + English) South African acoustic models which used either 40-dimensional MFCC features or 1024-dimensional XLSR-53 features as inputs. Both types of models were trained using Kaldi toolkit [34] and used the same alignments obtained with a standard GMM model."
I am trying to use this method to train a phoneme recognition model.

Thanks,
Aditya. 
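
One practical detail when reusing GMM alignments with SSL features is the frame rate. The quoted passage does not say how the two were reconciled, so the following is only one possible approach, not the paper's method. It assumes 16 kHz audio, the 400-sample receptive field and 320-sample (20 ms) stride of wav2vec 2.0-style encoders, and Kaldi's default MFCC framing (25 ms window, 10 ms shift, edges snipped). Under those assumptions each SSL frame can be repeated twice and the result trimmed or padded to the alignment length. Both function names are hypothetical.

```python
def num_frames(num_samples, window, shift):
    """Number of full analysis windows that fit into the signal (no edge padding)."""
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // shift


def match_frame_rate(ssl_rows, num_target_frames, repeat=2):
    """Repeat each SSL frame `repeat` times, then trim or pad to the target length.

    Padding repeats the last frame. The returned list reuses the input frame
    objects, so treat the result as read-only.
    """
    if not ssl_rows:
        raise ValueError("no SSL frames to upsample")
    upsampled = [row for row in ssl_rows for _ in range(repeat)]
    if len(upsampled) >= num_target_frames:
        return upsampled[:num_target_frames]
    padding = [upsampled[-1]] * (num_target_frames - len(upsampled))
    return upsampled + padding


# One second of 16 kHz audio under the assumptions stated above:
mfcc_frames = num_frames(16000, window=400, shift=160)  # 10 ms shift
ssl_frames = num_frames(16000, window=400, shift=320)   # 20 ms stride
```

The frame counts usually differ by at most one after doubling, which is why the trim-or-pad step is needed. Alternatives such as subsampling the alignments instead of upsampling the features are equally plausible.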