gender and accent recognition in Kaldi

Tuan Maivan

Aug 18, 2018, 4:59:05 AM
to kaldi-help
Hello,
Has anyone tried to do gender and accent recognition based on the Kaldi framework? Could you give me some advice?
Thanks a lot

David Snyder

Aug 18, 2018, 11:15:20 AM
to kaldi-help
For accent, I think the most common approach is to capture this information in i-vectors and use them as features for a separately trained classifier (e.g., an SVM or logistic regression). State-of-the-art systems now use DNNs either to classify accents directly or to generate embeddings that replace i-vectors: http://www.danielpovey.com/files/2018_odyssey_xvector_lid.pdf.

We have a few i-vector-based language recognition recipes in egs/lre07, but they aren't regularly updated. Still, the v1 recipe is probably your best starting point. Our DNN embedding-based recipes, which we call "x-vectors", are currently only available for speaker recognition (e.g., see egs/voxceleb/v2). If you want to go that route, you'd have to put in some effort adapting the recipe to language recognition.

I recommend starting off as simply as possible, with i-vectors trained on MFCC-based features, as the recipe in egs/lre07/v1 does. If you can train an ASR DNN, you'll almost certainly get an improvement by replacing the MFCC-based features with bottleneck features from your ASR DNN. The best ASR DNN for this purpose would be one trained on multiple languages; see the babel recipe for an example of this.
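As a rough illustration of the "i-vectors as features for a separate classifier" idea, here is a minimal sketch of a multiclass logistic regression trained on i-vectors, written in plain Python so it has no dependencies. It assumes you have already extracted one i-vector per utterance (e.g., with a recipe like egs/lre07/v1) and loaded them into Python lists together with integer accent labels; the loading step, the helper names (`train_accent_classifier` etc.), and the hyperparameters are hypothetical and not part of Kaldi. It also length-normalizes the i-vectors first, which is a common preprocessing step for i-vector back-ends but is an assumption here.

```python
import math


def length_normalize(v):
    """Scale an i-vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


class LogisticRegression:
    """Multiclass (softmax) logistic regression trained by full-batch gradient descent."""

    def __init__(self, dim, num_classes):
        self.W = [[0.0] * dim for _ in range(num_classes)]
        self.b = [0.0] * num_classes

    def predict_proba(self, x):
        scores = [sum(w * xi for w, xi in zip(row, x)) + bk
                  for row, bk in zip(self.W, self.b)]
        return softmax(scores)

    def predict(self, x):
        p = self.predict_proba(x)
        return max(range(len(p)), key=p.__getitem__)

    def fit(self, X, y, lr=1.0, epochs=300, l2=1e-3):
        """Minimize mean cross-entropy plus an L2 penalty on the weights."""
        n, dim, num_classes = len(X), len(X[0]), len(self.W)
        for _ in range(epochs):
            grad_W = [[0.0] * dim for _ in range(num_classes)]
            grad_b = [0.0] * num_classes
            for x, label in zip(X, y):
                p = self.predict_proba(x)
                for k in range(num_classes):
                    # d(cross-entropy)/d(score_k) = p_k - 1[k == label]
                    err = p[k] - (1.0 if k == label else 0.0)
                    grad_b[k] += err
                    for d in range(dim):
                        grad_W[k][d] += err * x[d]
            for k in range(num_classes):
                self.b[k] -= lr * grad_b[k] / n
                for d in range(dim):
                    self.W[k][d] -= lr * (grad_W[k][d] / n + l2 * self.W[k][d])
        return self


def train_accent_classifier(ivectors, labels, num_accents):
    """Hypothetical helper: length-normalize the i-vectors, then fit the classifier."""
    X = [length_normalize(v) for v in ivectors]
    return LogisticRegression(len(X[0]), num_accents).fit(X, labels)
```

To label a new utterance, length-normalize its i-vector the same way and call `predict`, which returns the index of the most likely accent. In practice a library implementation (e.g., scikit-learn's `LogisticRegression`) would be the more natural choice; the hand-written version only makes the training objective explicit.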

Gender ID is more straightforward. You could take an i-vector extractor (trained for speaker ID, not language ID) and train a separate system (e.g., logistic regression) to classify the gender. That should be pretty easy to do, but an even easier and more lightweight option is to use GMMs directly. You can see an example of the latter option here: https://github.com/kaldi-asr/kaldi/blob/master/egs/sre08/v1/run.sh#L127
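To show what "using GMMs directly" means, here is a minimal, dependency-free Python sketch: fit one diagonal-covariance GMM (with EM) to the frames of male training speech and another to female speech, then label a new utterance by whichever model gives its frames the higher average log-likelihood. This is a standalone illustration, not the code from the sre08 recipe linked above, which works with Kaldi's own GMM tools. The frame format (lists of floats, e.g. MFCC frames you have exported from Kaldi), the number of components, and the function names are assumptions; a real system would use many more components and far more data.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of frame x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def log_sum_exp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def train_diag_gmm(frames, num_comp=4, num_iters=10, var_floor=1e-3):
    """Fit a diagonal-covariance GMM to a list of frames with EM."""
    n, dim = len(frames), len(frames[0])
    # Initialize means from evenly spaced frames, all variances from the global variance.
    step = max(1, n // num_comp)
    means = [list(frames[(i * step) % n]) for i in range(num_comp)]
    mu = [sum(f[d] for f in frames) / n for d in range(dim)]
    global_var = [max(var_floor, sum((f[d] - mu[d]) ** 2 for f in frames) / n)
                  for d in range(dim)]
    variances = [list(global_var) for _ in range(num_comp)]
    weights = [1.0 / num_comp] * num_comp
    for _ in range(num_iters):
        # E-step: accumulate zeroth-, first- and second-order statistics.
        occ = [0.0] * num_comp
        s1 = [[0.0] * dim for _ in range(num_comp)]
        s2 = [[0.0] * dim for _ in range(num_comp)]
        for x in frames:
            logp = [math.log(w) + log_gauss_diag(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = log_sum_exp(logp)
            for k in range(num_comp):
                gamma = math.exp(logp[k] - total)  # posterior of component k
                occ[k] += gamma
                for d in range(dim):
                    s1[k][d] += gamma * x[d]
                    s2[k][d] += gamma * x[d] * x[d]
        # M-step: re-estimate weights, means and floored variances.
        floored = [max(o, 1e-10) for o in occ]
        z = sum(floored)
        weights = [o / z for o in floored]
        for k in range(num_comp):
            if occ[k] < 1e-10:
                continue  # leave a (near-)empty component's mean and variance unchanged
            means[k] = [s1[k][d] / occ[k] for d in range(dim)]
            variances[k] = [max(var_floor, s2[k][d] / occ[k] - means[k][d] ** 2)
                            for d in range(dim)]
    return weights, means, variances


def avg_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of an utterance under a GMM."""
    weights, means, variances = gmm
    return sum(log_sum_exp([math.log(w) + log_gauss_diag(x, m, v)
                            for w, m, v in zip(weights, means, variances)])
               for x in frames) / len(frames)


def classify_gender(frames, male_gmm, female_gmm):
    """Pick the gender whose GMM explains the utterance's frames better."""
    score = avg_log_likelihood(frames, male_gmm) - avg_log_likelihood(frames, female_gmm)
    return "male" if score > 0 else "female"
```

The decision is a log-likelihood ratio averaged over frames, so longer utterances give more reliable decisions. A threshold other than 0 could be tuned on held-out data if one gender is over- or under-detected.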