Hi Ondrej,
Thanks for mentioning the paper. Is there any codebase available to replicate the methodology?
Specifically, I am referring to these lines:
"We trained a five-lingual (four Bantu languages + English) South African acoustic models which used either 40-dimensional MFCC features or 1024-dimensional XLSR-53 features as inputs. Both types of models were trained using Kaldi toolkit [34] and used the same alignments obtained with a standard GMM model."
I am trying to use this method to train a phoneme recognition model.
Thanks,
Aditya.