Show alignments


subash khanal

Jul 18, 2019, 1:43:15 PM
to kaldi...@googlegroups.com
Hi all,

I am trying to extract specific frames from the alignments created by the acoustic models. I used:

show-alignments data/lang/phones.txt exp/mono/0.mdl "ark,t:gunzip -c exp/mono/ali.1.gz |"

to get readable alignment files. Am I right in saying that the numbers in [ ] in the alignment files are transition-ids, one per frame, in sequential order (frame 1 - id 1, frame 2 - id 2, etc.)? I am only interested in getting the frames for the aligned phoneme, but I wanted to be sure I am understanding the file correctly.

Moreover, how good are the alignments (say, for the tri3 model) if the utterance is one sentence long versus just one word long? Are alignments for single-word utterances more accurate than those for sentence-long utterances? My experiment works on the specific frames extracted, so the alignments need to be as accurate as possible. Any guidance would be appreciated.


With regards,

Subash

Daniel Povey

Jul 18, 2019, 2:23:21 PM
to kaldi-help
Yes, the numbers correspond to frames. See ali-to-phones for info specifically about phones. It has various options; look at them carefully.
The alignments should be quite accurate, and it shouldn't matter whether you align a sentence or a word.
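
Since each number inside a bracket group is the transition-id of one frame, the frame range of each phone can be read straight off the show-alignments text output. Below is a minimal Python sketch, assuming the usual two-line layout per utterance (one line of bracketed transition-ids, one line of phone symbols, both starting with the utterance id). The function name and the sample lines are made up for illustration and are not taken from a real alignment.

```python
import re


def phone_frame_ranges(ali_line, phone_line):
    """Map each phone to its (first_frame, last_frame), 0-based and inclusive.

    ali_line:   e.g. "utt1  [ 2 1 1 1 ] [ 34 33 33 ]"
    phone_line: e.g. "utt1  SIL AH_B"
    Assumes one bracket group per phone and one transition-id per frame.
    """
    utt_id, _, rest = ali_line.strip().partition(" ")
    groups = re.findall(r"\[([^\]]*)\]", rest)
    phones = phone_line.split()[1:]  # drop the utterance id
    if len(groups) != len(phones):
        raise ValueError("number of bracket groups and phones differ")

    segments = []
    start = 0
    for phone, group in zip(phones, groups):
        num_frames = len(group.split())
        segments.append((phone, start, start + num_frames - 1))
        start += num_frames
    return utt_id, segments


# Hypothetical sample lines, invented for this sketch.
sample_ali = "utt1  [ 2 1 1 1 ] [ 34 33 33 ] [ 80 79 ]"
sample_phones = "utt1  SIL AH_B T_E"
print(phone_frame_ranges(sample_ali, sample_phones))
```

With these ranges, the feature frames for a given phone can be selected by slicing the utterance's feature matrix, provided the features have the same frame rate as the alignment.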



subash khanal

Aug 28, 2019, 10:37:02 AM
to kaldi...@googlegroups.com
Hi, 

I only needed the frames at specific locations, determined from the prompt transcript and the human-annotated transcript, so I used the tri3 model and extracted frames from its alignments. Following the group, I have seen discussion of other forced-alignment tools such as Gentle, as well as some Kaldi recipes. My question is: if my task is only to align my transcript with the audio, is the HMM tri3 model fine for that, or should I have used a dedicated forced-alignment tool?

Regards,
Subash

Daniel Povey

Aug 28, 2019, 1:07:57 PM
to kaldi-help
That model is fine.  Forced alignment is something that happens as part of training anyway, e.g. see the files ali.*.gz and the logs align.*.log.  

I suggest reading the first few chapters of the HTK Book to better understand the basic ideas of ASR; it would help clarify things.
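
The ali.*.gz files written during training can be turned into phone-level segments with ali-to-phones. Here is a minimal Python sketch for post-processing its text output. It assumes the tool was run with --write-lengths=true and a text-mode output, so that each line looks like "utt-id phone-id length ; phone-id length ; ...", and it assumes a 10 ms frame shift. The function name and the sample line are hypothetical; phone ids are the integers listed in phones.txt.

```python
def phone_segments(line, frame_shift=0.01):
    """Parse one line of phone/length pairs into timed segments.

    Returns (utt_id, [(phone_id, start_frame, num_frames, start_sec, dur_sec)]).
    frame_shift is in seconds; 0.01 is an assumption (10 ms features).
    """
    utt_id, _, rest = line.strip().partition(" ")
    segments = []
    frame = 0
    for pair in rest.split(";"):
        fields = pair.split()
        if not fields:
            continue  # tolerate a trailing separator
        phone_id, length = int(fields[0]), int(fields[1])
        segments.append(
            (
                phone_id,
                frame,
                length,
                round(frame * frame_shift, 3),
                round(length * frame_shift, 3),
            )
        )
        frame += length
    return utt_id, segments


# Hypothetical sample line, invented for this sketch.
sample = "utt1 1 4 ; 23 3 ; 57 2"
for segment in phone_segments(sample)[1]:
    print(segment)
```

Working from lengths rather than per-frame phone labels keeps two adjacent instances of the same phone as separate segments.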
