At Mozilla Common Voice, we are creating a dataset of text with the corresponding audio recordings.
The dataset can be downloaded here:
https://commonvoice.mozilla.org/ta/datasets
It currently has 14 hours of data for Tamil.
Is this enough to train a TTS or STT model for Tamil?
Try to build models with the current data and explore the results.
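
Before training, it may help to check how many hours of audio the download actually contains. Below is a minimal sketch in Python. It assumes the extracted release includes a tab-separated file with one row per clip and a duration column in milliseconds; the column name "duration[ms]" and the sample rows are assumptions for illustration, so adjust them to match the files in your download.

```python
import csv
import io


def total_hours(tsv_file, duration_column="duration[ms]"):
    """Sum a per-clip duration column (in milliseconds) and return hours.

    The column name is an assumption; change it to match your file.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    total_ms = sum(float(row[duration_column]) for row in reader)
    return total_ms / (1000 * 60 * 60)


# Tiny in-memory sample standing in for the real file (hypothetical clip names).
sample = io.StringIO(
    "clip\tduration[ms]\n"
    "clip_0001.mp3\t3600000\n"
    "clip_0002.mp3\t1800000\n"
)
hours = total_hours(sample)
print(f"{hours:.2f} hours")
```

For a real file, open it with open(path, newline="", encoding="utf-8") and pass the file object to total_hours in place of the sample.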
There is an existing speech-to-text demo built with this data here:
https://huggingface.co/Rajaram1996/wav2vec2-large-xlsr-53-tamil
Can we get a text-to-speech version similar to the one above?
I have created an issue here to track the progress:
https://github.com/KaniyamFoundation/ProjectIdeas/issues/164
Shrini