Aligning audio to text

Shreevatsa R

Dec 15, 2018, 10:46:41 PM

to sanskrit-programmers

Over the last year or so, I have been listening to the Rāmāyaṇa off and on, mostly while driving. It has been a richly rewarding experience (despite my limited attention and understanding). As I near the end, it occurs to me that it would be a nice idea to combine the excellent audio with the Sanskrit text (stanza by stanza) and possibly an English translation, to produce essentially 50-odd hours of “video”.

I do not know how to do this. I imagine that, with the audio and text available separately (the audio is already split by sarga), heuristics such as looking for pauses and matching them to shloka boundaries will mostly work. If that's not good enough, machine learning may help. I looked up how this is done in general: the problem is called “forced alignment”, and there are some resources here: https://github.com/pettarin/forced-alignment-tools (most of the existing tools work only with specific languages, Sanskrit not included). Note that YouTube also has a feature where you can upload a video and a transcript, and it will try to create subtitles out of the transcript.

I don't know anything about machine learning, so I am probably not going to pursue this further right now, but if someone wants to do it, then having the audio annotated with timing information might be a nice project.

-Shreevatsa

(BTW, it's not even fully clear exactly what text was used. It's definitely not the “critical edition” of the Rāmāyaṇa used by e.g. Bibek Debroy for his translation, but it seems to match the popular text like that on https://www.valmikiramayan.net/ . Anyway, if someone is producing audio out of text, it's good to retain the exact text used; it may be useful later.)

Shreevatsa R

Dec 15, 2018, 11:16:52 PM

to sanskrit-programmers

An example of the kind of "video" I have in mind: https://youtu.be/Nc2_LaM6naU

(Except better :P)

विश्वासो वासुकिजः (Vishvas Vasuki)

Dec 16, 2018, 9:43:38 PM

to sanskrit-programmers

Does it match https://sanskritdocuments.org/sites/pssramanujaswamy/1.%20BAALA%20KAANDAM.pdf ?

--
You received this message because you are subscribed to the Google Groups "sanskrit-programmers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sanskrit-program...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
Vishvas /विश्वासः

Shreevatsa R

Dec 16, 2018, 10:28:18 PM

to sanskrit-programmers

Haven't checked; and that's the sort of effort I hope automation can help avoid.

But the titles of the sargas in the audio files seem to be taken from Desiraju Hanumantha Rao's work (valmikiramayan.net) and the contents seem to roughly match the titles.

I imagine the work of aligning audio and text will be a sort of easier case of speech-to-text, and (imagining it aligns to the syllable level) will highlight whatever discrepancies exist.

विश्वासो वासुकिजः (Vishvas Vasuki)

Apr 1, 2019, 11:10:11 PM

to sanskrit-programmers

While researching a separate (and much simpler) problem, I stumbled upon https://github.com/tyiannak/pyAudioAnalysis/wiki/5.-Segmentation . I have a feeling that this library can be used to rather easily split the audio into shlokArdha-s.


विश्वासो वासुकिजः (Vishvas Vasuki)

Apr 1, 2019, 11:13:11 PM

to sanskrit-programmers

PS: Invocation here - https://stackoverflow.com/questions/36458214/split-speech-audio-file-on-words-in-python .


Pooja P

Oct 4, 2019, 1:30:36 AM

to sanskrit-programmers

I tried it out, and with a bit of tinkering with the silence length and threshold, I got the necessary chunks - broken at the dvitIya and chaturtha pAdAntAs. My params were length 500 and threshold -26. It's a bit tricky to break at every pAda because of the saMhita. The text is the same as valmikiramayan.net. The site has some errors here and there, but the audio makes up for it, I felt.

Here're some of the chunks.

chunk0.wav

chunk1.wav

chunk2.wav

chunk3.wav

chunk4.wav

chunk5.wav
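
The pause-based splitting described in this message can be sketched with the Python standard library alone. This is only an illustration, not the tool that was actually used: it assumes the two parameters are a minimum silence length in milliseconds (500) and a loudness threshold in dBFS (-26), which is how pydub's split_on_silence names its min_silence_len and silence_thresh arguments. The function names, the 10 ms window, and the 16-bit mono WAV input are all assumptions made for the sketch.

```python
import array
import math
import sys
import wave


def read_mono_16bit(path):
    """Load a 16-bit mono WAV file; returns (samples, frame_rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono WAV")
        samples = array.array("h", wav.readframes(wav.getnframes()))
        frame_rate = wav.getframerate()
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return samples, frame_rate


def window_dbfs(samples, max_amplitude=32768):
    """RMS level of one window of 16-bit samples, in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / max_amplitude) if rms > 0 else float("-inf")


def find_pauses(samples, frame_rate, min_silence_ms=500,
                silence_thresh_db=-26, window_ms=10):
    """Return (start_ms, end_ms) spans quieter than the threshold
    for at least min_silence_ms."""
    window = max(1, frame_rate * window_ms // 1000)
    n_windows = math.ceil(len(samples) / window)
    pauses, run_start = [], None
    for i in range(n_windows):
        t = i * window * 1000 // frame_rate
        quiet = window_dbfs(samples[i * window:(i + 1) * window]) < silence_thresh_db
        if quiet and run_start is None:
            run_start = t                      # a quiet run begins
        elif not quiet and run_start is not None:
            if t - run_start >= min_silence_ms:
                pauses.append((run_start, t))  # long enough to count as a pause
            run_start = None
    total_ms = len(samples) * 1000 // frame_rate
    if run_start is not None and total_ms - run_start >= min_silence_ms:
        pauses.append((run_start, total_ms))   # audio ends in a pause
    return pauses


def chunks_between(pauses, total_ms):
    """Turn pause spans into the (start_ms, end_ms) spans of sound between them."""
    chunks, cursor = [], 0
    for start, end in pauses:
        if start > cursor:
            chunks.append((cursor, start))
        cursor = end
    if cursor < total_ms:
        chunks.append((cursor, total_ms))
    return chunks
```

Because the chunk spans keep their offsets into the original recording, they can be matched one-to-one against half-verses of the text when the counts agree; a mismatch in counts is a hint that a pause was missed or that the text differs from the recitation.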

Shreevatsa R

Oct 4, 2019, 6:06:12 AM

to sanskrit-programmers

Wow, this is great to know, thank you!

From the API documentation it appears one can also simply get the duration, so it should be possible to do something interesting here. Will try to look into it further!
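
As a rough idea of what the durations could be used for: given the duration of each chunk and the matching line of text, one can emit subtitle cues in the SubRip (.srt) format. The sketch below assumes the chunks are played back to back, so each cue starts where the previous one ended; if the original pauses are kept, the real start offsets would be needed instead. The function names are made up for illustration.

```python
def srt_timestamp(ms):
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def build_srt(durations_ms, lines):
    """Pair each chunk duration with one line of text and return SRT content.

    Assumes the chunks are contiguous: cue N starts where cue N-1 ended.
    """
    if len(durations_ms) != len(lines):
        raise ValueError("need exactly one line of text per audio chunk")
    cues, start = [], 0
    for index, (duration, text) in enumerate(zip(durations_ms, lines), start=1):
        end = start + duration
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
        start = end
    return "\n".join(cues)
```

Calling build_srt with a list of chunk durations and a list of half-verses of equal length returns text that can be saved as a .srt file next to the audio.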

विश्वासो वासुकिजः (Vishvas Vasuki)

Oct 4, 2019, 10:09:12 AM

to sanskrit-programmers

Separately, I believe (but have not confirmed) that the pATha can be obtained with high accuracy from http://parankusan.cloudapp.net/Integrated/Login.aspx .

विश्वासो वासुकिजः (Vishvas Vasuki)

Oct 5, 2019, 7:21:55 AM

to sanskrit-programmers

Actually, it seems that I've dumped the https://www.valmikiramayan.net/ text to https://github.com/vvasuki/kAvya/tree/master/content/TIkA/padya/purANa/rAmAyaNa (I don't recall how), modulo some later manual corrections (made partly by comparing files against the critical edition while listening to the recitation). There are many errors in the valmikiramayan text. For example, in "शिष्यस्तु तस्य ब्रुवतो मुनेर्वाक्यमनुत्तमम् | प्रतिजग्राह संतुष्टस्तस्य तुष्टोऽभवद्मुनिः || १-२-१९", the reciters correctly chant "तुष्टोऽभवन् मुनिः". Folks interested in contributing further corrections there may do so using the pencil icon at the top and the GitHub pull-request technique.

The old/kumbhakoNa pATha at http://parankusan.cloudapp.net/Integrated/Login.aspx reads तुष्टोऽभवद्गुरुः, and its kANDa 6 contains 131 rather than 127 sargas, so it doesn't exactly match the chanting.
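
The kind of file comparison mentioned above can be partly automated. A minimal sketch using Python's standard difflib module, which lists the word-level differences between two readings of the same line; the function name is made up, and splitting on whitespace is an assumption that will over-report differences where the two texts merely split sandhi differently:

```python
import difflib


def differing_words(reading_a, reading_b):
    """Return (words_in_a, words_in_b) pairs wherever the two readings disagree."""
    a, b = reading_a.split(), reading_b.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


site_text = "प्रतिजग्राह संतुष्टस्तस्य तुष्टोऽभवद्मुनिः"
as_chanted = "प्रतिजग्राह संतुष्टस्तस्य तुष्टोऽभवन् मुनिः"
for ours, theirs in differing_words(site_text, as_chanted):
    print(ours, "->", theirs)
```

Run over whole sargas, a report like this narrows down which lines need a human ear.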
