Forced - alignment

Aman Dudeja

Aug 28, 2019, 5:38:13 AM

to kaldi-help

I want to force-align audio to an incorrect transcript. How can I achieve that in Kaldi?

Sudheer Kolachina

Aug 28, 2019, 5:46:03 AM

to kaldi...@googlegroups.com

Do you mean a partially correct transcript? You can check out Gentle, a forced aligner based on Kaldi, which is quite convenient.

--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/48dc0ed0-165d-4735-919c-bb1bbac52448%40googlegroups.com.

Aman Dudeja

Aug 28, 2019, 6:00:05 AM

to kaldi-help

I want to time-align an audio file to a completely different transcript. Is that possible using Kaldi? If not, how can I achieve this?

Thank you

Yondu Tsai

Aug 28, 2019, 6:04:03 AM

to kaldi-help

Do you mean decoding the audio and then aligning the words that were decoded?

https://groups.google.com/d/msg/kaldi-help/g36HqAEC9lI/yY1O-SrqAwAJ

Aman Dudeja

Aug 28, 2019, 6:14:31 AM

to kaldi-help

Kaldi will always give the alignment that results in maximum probability. My input will be audio, an incorrect transcription, and a correct transcription. The number of words and the number of phonemes will be the same in both transcriptions. I want to align the audio to both transcriptions and then calculate the likelihood score for each.
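
To make the idea concrete, here is a toy sketch in plain Python (not Kaldi code) of what "align to both transcriptions and compare the scores" means. It assumes you already have per-frame log-likelihoods for each phone; the function name `forced_align_score`, the phone symbols, and all the numbers are made up for illustration. A real Kaldi setup also involves HMM transition probabilities, a lexicon, and a decoding graph, all of which this sketch ignores.

```python
import math

def forced_align_score(frame_logprobs, phones):
    """Viterbi forced alignment of a fixed phone sequence to frames.

    frame_logprobs[t][p] is the (hypothetical) log-likelihood of phone p
    at frame t. Phones are visited strictly in order and each phone must
    cover at least one frame. Returns (best_log_score, alignment), where
    alignment[t] is the index into `phones` assigned to frame t.
    """
    T, N = len(frame_logprobs), len(phones)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per phone")
    neg_inf = -math.inf
    # best[t][i]: best score of frames 0..t with frame t on phones[i]
    best = [[neg_inf] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    best[0][0] = frame_logprobs[0][phones[0]]
    for t in range(1, T):
        for i in range(N):
            stay = best[t - 1][i]                          # same phone as previous frame
            advance = best[t - 1][i - 1] if i > 0 else neg_inf  # moved on from previous phone
            if advance > stay:
                prev, back[t][i] = advance, i - 1
            else:
                prev, back[t][i] = stay, i
            if prev > neg_inf:
                best[t][i] = prev + frame_logprobs[t][phones[i]]
    # Trace back from the last phone at the last frame.
    alignment = [0] * T
    i = N - 1
    for t in range(T - 1, -1, -1):
        alignment[t] = i
        i = back[t][i]
    return best[T - 1][N - 1], alignment

# Made-up acoustic scores for five frames of audio that "sounds like" k-ae-t.
frames = [
    {"k": -0.2, "ae": -2.5, "ah": -2.8, "t": -3.0},
    {"k": -1.5, "ae": -0.4, "ah": -1.6, "t": -3.0},
    {"k": -3.0, "ae": -0.3, "ah": -1.4, "t": -2.5},
    {"k": -3.0, "ae": -1.8, "ah": -2.0, "t": -0.5},
    {"k": -3.0, "ae": -2.6, "ah": -2.7, "t": -0.2},
]
correct = ["k", "ae", "t"]
incorrect = ["k", "ah", "t"]   # same number of phones, one substituted

score_ok, ali_ok = forced_align_score(frames, correct)
score_bad, ali_bad = forced_align_score(frames, incorrect)
print("correct  :", score_ok, ali_ok)
print("incorrect:", score_bad, ali_bad)
print("correct transcript scores higher:", score_ok > score_bad)
```

Both transcripts get a valid alignment, because the aligner is forced to use exactly the phones it is given. The difference only shows up in the total log-likelihood, which is the number you would compare.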

Yondu Tsai

Aug 28, 2019, 6:29:03 AM

to kaldi-help

Oh, then use this:

https://github.com/homink/kaldi-asr.forced_decoding

Sudheer Kolachina

Aug 28, 2019, 6:30:55 AM

to kaldi...@googlegroups.com

Cool, thanks! I didn't know about online2-wav-nnet3-latgen-faster-force in Kaldi.

Aman Dudeja

Aug 28, 2019, 6:39:33 AM

to kaldi-help

Thanks for your quick response.

Do I need a trained nnet3 model to use this?

Yondu Tsai

Aug 28, 2019, 10:11:50 AM

to kaldi-help

Yes, use one of the Kaldi models:

http://kaldi-asr.org/models.html

or one of the Zamia models:

https://goofy.zamia.org/zamia-speech/asr-models/

```

Kaldi ASR, English:

kaldi-generic-en-tdnn_f Large nnet3-chain factorized TDNN model, trained on ~1200 hours of audio. Has decent background noise resistance and can also be used on phone recordings. Should provide the best accuracy but is a bit more resource intensive than the other models.
kaldi-generic-en-tdnn_sp Large nnet3-chain model, trained on ~1200 hours of audio. Has decent background noise resistance and can also be used on phone recordings. Less accurate but also slightly less resource intensive than the tdnn_f model.
kaldi-generic-en-tdnn_250 Same as the larger models but less resource intensive, suitable for use in embedded applications (e.g. a RaspberryPi 3).
kaldi-generic-en-tri2b_chain GMM Model, trained on the same data as the above two models - meant for auto segmentation tasks.

```

Jan Trmal

Aug 28, 2019, 11:16:29 AM

to kaldi-help

I'd suggest starting with the Montreal Forced Aligner; it has models for several languages. It's also a self-contained package, which might be beneficial for you at your current level of experience.

y.

Aman Dudeja

Aug 28, 2019, 11:57:12 AM

to kaldi...@googlegroups.com

Thank you very much.

One more question: can I get phoneme-level alignments using online2-wav-nnet3-latgen-faster-force?
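
For reference, whichever tool produces it, a phoneme-level alignment usually boils down to one phone label per frame, which is then collapsed into timed segments. The sketch below shows only that last step in plain Python; it says nothing about what online2-wav-nnet3-latgen-faster-force itself outputs. The helper name `frames_to_segments` and the 10 ms frame shift are assumptions (models that subsample frames, such as typical chain models, would need a larger value).

```python
from itertools import groupby

def frames_to_segments(frame_labels, frame_shift=0.01):
    """Collapse per-frame phone labels into (phone, start_sec, duration_sec).

    frame_shift is the assumed time step between frames, in seconds.
    Note: adjacent frames with the same label are merged, so two
    identical phones in a row would come out as a single segment.
    """
    segments = []
    start = 0  # index of the first frame of the current run
    for phone, run in groupby(frame_labels):
        n = sum(1 for _ in run)  # number of frames in this run
        segments.append(
            (phone, round(start * frame_shift, 3), round(n * frame_shift, 3))
        )
        start += n
    return segments

# Hypothetical per-frame labels, e.g. read off a forced alignment.
labels = ["k", "ae", "ae", "t", "t"]
for phone, start_sec, dur_sec in frames_to_segments(labels):
    print(phone, start_sec, dur_sec)
```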
