AlSalamu Alykom everyone.
Continuing the work I've been doing on Automatic Speech Recognition (ASR) for the Sudanese dialect, I ran into the same problem many of us face in this region: a lack of suitable data.
Alhamdulillah, I have completed a new paper titled: "Doing More with Less: Data Augmentation for Sudanese Dialect ASR."
Since we didn't have enough labeled data, I explored how we could generate our own. I used a mix of synthetic speech generation (TTS) and self-training pipelines.
The result: a 45% relative improvement in recognizing Sudanese speech over the zero-shot baseline. Interestingly, the research showed that for our dialect, the Whisper "Medium" model actually performed better than "Large-V2".
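For anyone unsure what "relative improvement" means here, this is a minimal sketch of the usual calculation, assuming the metric is an error rate such as WER where lower is better. The function name and the numbers are hypothetical and chosen only to illustrate the arithmetic; they are not results from the paper.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Fraction by which the error rate dropped, relative to the baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates for illustration only, not figures from the paper.
zero_shot_error = 0.60   # assumed error rate of the zero-shot model
augmented_error = 0.33   # assumed error rate after training on augmented data

print(f"{relative_improvement(zero_shot_error, augmented_error):.0%}")
```

Note that a relative improvement is measured against the baseline error, so it is not the same as the absolute drop in percentage points.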
I am sharing this research, along with the models and datasets, hoping it makes things easier for any "Zool" (person) who wants to continue research or build applications for our community.
Your feedback and constructive criticism are always welcome.
PS: I'm planning to apply the same approach to a few other Arabic dialects (Libyan, Yemeni, Iraqi, Syrian, and Palestinian). Any help with resources or advice would be appreciated.
Regards.
Ayman Mansour