I know there are many good open-source forced alignment tools, such as Gentle and the Montreal Forced Aligner, but most of them target English and rely on models trained on English data. I have two questions:
1. How does forced alignment work in Kaldi at the word level?
For example, I have an hour of audio and the corresponding text, and I would like to get sentence-level alignments, where each sentence is made up of words.
2. How can I train my own Kaldi model for forced alignment in a new language with little data?
How much data would I need for this?
I have a limited amount of Hindi data and need to build a custom forced aligner as a basic system that I can improve later, once I have more data. Is Kaldi suitable for this?
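To make question 1 more concrete, here is a minimal Python sketch of the post-processing I have in mind. It assumes the aligner gives me word timings in CTM format (`<utt-id> <channel> <start> <duration> <word>`) and that the CTM words match the whitespace-separated words of my transcript exactly and in order. The function names `parse_ctm` and `sentence_spans` and the sample data are made up for illustration, and I am not sure this is the right approach.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (word, start_sec, end_sec)
Span = Tuple[str, float, float]  # (sentence, start_sec, end_sec)


def parse_ctm(lines: List[str]) -> List[Word]:
    """Parse CTM lines of the form: <utt-id> <channel> <start> <duration> <word>."""
    words = []
    for line in lines:
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        start = float(parts[2])
        duration = float(parts[3])
        words.append((parts[4], start, start + duration))
    return words


def sentence_spans(words: List[Word], sentences: List[str]) -> List[Span]:
    """Group word timings into sentence timings.

    Assumes the aligned words appear in the same order as the transcript
    and that each sentence splits into words on whitespace.
    """
    spans = []
    i = 0
    for sentence in sentences:
        tokens = sentence.split()
        if not tokens:
            continue
        chunk = words[i:i + len(tokens)]
        if [w for w, _, _ in chunk] != tokens:
            raise ValueError(f"alignment does not match sentence: {sentence!r}")
        # Sentence runs from the start of its first word to the end of its last.
        spans.append((sentence, chunk[0][1], chunk[-1][2]))
        i += len(tokens)
    return spans


# Made-up sample data (transliterated Hindi), not real aligner output.
example_ctm = [
    "utt1 1 0.00 0.50 mera",
    "utt1 1 0.50 0.25 naam",
    "utt1 1 0.75 0.50 ravi",
    "utt1 1 1.25 0.25 hai",
    "utt1 1 2.00 0.50 main",
    "utt1 1 2.50 0.50 dilli",
    "utt1 1 3.00 0.25 se",
    "utt1 1 3.25 0.25 hoon",
]
example_sentences = ["mera naam ravi hai", "main dilli se hoon"]

spans = sentence_spans(parse_ctm(example_ctm), example_sentences)
for sentence, start, end in spans:
    print(f"{start:.2f}\t{end:.2f}\t{sentence}")
```

So what I really need from Kaldi is the word-level timing; the sentence grouping on top of it looks straightforward as long as the transcript and the aligned words stay in sync.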
Thanks