alignment to word_level/phone_level with ctm


Hunt Rui

Jun 26, 2019, 8:14:20 PM
to kaldi-help
Hi,

   When I have a lattice from forced alignment ($data/ali.lats), I use:

  1) Word transcript with ctm
     ./lattice-align-words lang/phones/word_boundary.int nnet3/final.mdl ark:$data/ali.lats ark:$data/1best_word.lats

     ./nbest-to-ctm --frame-shift=0.01 --print-silence=1 ark:$data/1best_word.lats $data/trans.ctm



  2) Phone transcript with ctm

     lattice-align-phones --replace-output-symbols=true nnet3/final.mdl ark:$data/ali.lats ark:$data/phone_aligned.lats

     lattice-to-ctm-conf --inv-acoustic-scale=10 ark:$data/phone_aligned.lats $data/trans_phones.ctm

 

  Am I right in both cases?


  Now, if I have an alignment file (a vector of transition-ids) instead of a lattice, how can I convert it to a word/phone transcript in CTM format?

  1) Phone transcript with ctm
     Can I use ali-to-phones with CTM output for this purpose?
  2) Word transcript with ctm
     I can't find code for this. Can I take the phone-level result above and use the lexicon to concatenate the phones into words?


Thanks,
Andy

Daniel Povey

Jun 26, 2019, 9:22:27 PM
to kaldi-help


   When I have a lattice from forced alignment ($data/ali.lats), I use:

  1) Word transcript with ctm
     ./lattice-align-words lang/phones/word_boundary.int nnet3/final.mdl ark:$data/ali.lats ark:$data/1best_word.lats

     ./nbest-to-ctm --frame-shift=0.01 --print-silence=1 ark:$data/1best_word.lats $data/trans.ctm



  2) Phone transcript with ctm

     lattice-align-phones --replace-output-symbols=true nnet3/final.mdl ark:$data/ali.lats ark:$data/phone_aligned.lats

     lattice-to-ctm-conf --inv-acoustic-scale=10 ark:$data/phone_aligned.lats $data/trans_phones.ctm

 

  Am I right in both cases?

Looks plausible. 



  Now, if I have an alignment file (a vector of transition-ids) instead of a lattice, how can I convert it to a word/phone transcript in CTM format?

You can't get the word 

  1) Phone transcript with ctm
     Can I use ali-to-phones with CTM output for this purpose?
That's what I would use, yes.
 
  2) Word transcript with ctm
     I can't find code for this. Can I take the phone-level result above and use the lexicon to concatenate the phones into words?
get_train_ctm.sh may help you. 
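
A minimal sketch of the phone-level path discussed above, assuming the alignments are stored in a Kaldi archive. The Kaldi invocation appears only as a comment because it needs a trained model, and the paths in it are placeholders. The runnable part assumes the resulting CTM carries integer phone IDs in field 5 and maps them to symbols with a phones.txt-style table, which is the job utils/int2sym.pl -f 5 does in Kaldi recipes. The function name, the sample CTM lines, and the tiny phone table are made up for illustration.

```shell
# Assumed upstream step (not run here; paths are placeholders):
#   ali-to-phones --ctm-output nnet3/final.mdl ark:$data/ali.ark $data/phones_int.ctm
# ali-to-phones also has a --frame-shift option (default 0.01 seconds).
# For a word-level CTM from alignments, steps/get_train_ctm.sh takes
# <data-dir> <lang-dir> <ali-dir>; check its usage message for the options.

# Map the integer phone ID in field 5 of a CTM to its symbol.
# $1 = phones.txt ("<symbol> <id>" per line), $2 = CTM with integer IDs.
ctm_int2sym() {
  awk 'NR == FNR { sym[$2] = $1; next }   # first file: build the id -> symbol table
       ($5 in sym) { $5 = sym[$5] }       # second file: rewrite field 5 when the id is known
       { print }' "$1" "$2"
}

# Made-up sample data for illustration.
tmp=$(mktemp -d)
printf '%s\n' '<eps> 0' 'SIL 1' 'AH_B 2' 'T_E 3' > "$tmp/phones.txt"
printf '%s\n' 'utt1 1 0.00 0.25 1' 'utt1 1 0.25 0.10 2' 'utt1 1 0.35 0.08 3' > "$tmp/phones_int.ctm"

ctm_int2sym "$tmp/phones.txt" "$tmp/phones_int.ctm"
```

Unknown IDs are passed through unchanged rather than dropped, so a mismatch between the model and the phone table shows up as leftover integers in the output.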



--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/1a29973f-1bc0-4e96-ac45-f5dc8381e96c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hunt Rui

Jun 26, 2019, 11:43:17 PM
to kaldi...@googlegroups.com
Thank you so much.

Sray Chen

Jul 11, 2019, 1:12:20 PM
to kaldi-help
Hi,

I use this command:
'''
lattice-align-phones --replace-output-symbols exp/tdnn_7b_chain_online/final.mdl "ark:gunzip -c $dir/lat.1.gz|" ark:- | \
  lattice-to-ctm-conf --inv-acoustic-scale=11 --frame-shift=0.03 ark:- - | \
  utils/int2sym.pl -f 5 model/graph_pp/phones.txt | \
  $filter_cmd > $dir/ctm/phone.ctm
'''
It takes around 60 GB of memory (including swap) and gets stuck at "lattice-to-ctm-conf" for about 30 minutes, then stops without any error message.
I have used the same command successfully before with a smaller lattice (which took around 30 GB of memory, including swap). Does anyone know why it fails?

Thanks,
Sray





On Wednesday, June 26, 2019 at 10:43:17 PM UTC-5, Hunt Rui wrote:
Thank you so much. 


Daniel Povey

Jul 11, 2019, 1:26:05 PM
to kaldi-help
Normally, lattice-to-ctm-conf is very fast and shouldn't take up much memory.
The only thing I can think of is that the input utterance may be super long. But the algorithm shouldn't
be more than quadratic in its length (at the very most), so I'd be surprised if you were able to make it take
so much time and memory.

Dan



Sray Chen

Jul 11, 2019, 2:28:43 PM
to kaldi-help
Yes, this time the utterance is a single line with 6346 words. The last time I successfully got phone.ctm, the utterance consisted of 4164 words.
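
As a rough check of the "at most quadratic" remark against these two utterance lengths, the sketch below compares linear and quadratic extrapolations from the 4164-word run. It assumes that word count is a fair proxy for utterance length and takes the reported figure of around 30 GB at face value. Both are approximations, so the result is only indicative. The function name is made up for illustration.

```shell
# Extrapolate memory use from an earlier run to a longer utterance.
# $1 = old word count, $2 = new word count, $3 = memory of the old run in GB.
extrapolate_mem() {
  awk -v n_old="$1" -v n_new="$2" -v mem_old="$3" 'BEGIN {
    r = n_new / n_old                                   # length ratio
    printf "length ratio: %.2f\n", r
    printf "linear estimate: %.0f GB\n", mem_old * r
    printf "quadratic estimate: %.0f GB\n", mem_old * r * r
  }'
}

# Figures reported in this thread: 4164 words at ~30 GB, now 6346 words.
extrapolate_mem 4164 6346 30
```

This prints a ratio of 1.52, a linear estimate of 46 GB, and a quadratic estimate of 70 GB. The reported figure of around 60 GB falls between the two, so the jump in memory is in line with growth that is no worse than quadratic.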


Sray Chen

Jul 11, 2019, 2:44:01 PM
to kaldi-help
I think I can just segment the audio first and then get the phone alignment.

Thank you!

Sray
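
One way to sketch the "segment first" idea is to cut each long recording into fixed-length windows and write them in Kaldi's segments format (<segment-id> <recording-id> <start> <end>). The 30-second window, the ID naming scheme, the function name, and the sample duration are all assumptions for illustration. Fixed windows can cut through words and do not split the transcript, so when a reference transcript has to stay matched to the audio, a transcript-aware tool such as Kaldi's steps/cleanup/segment_long_utterances.sh is likely the better fit.

```shell
# Write Kaldi-style segments lines for fixed-length windows.
# Input lines on stdin: "<recording-id> <duration-in-seconds>".
# $1 = window length in seconds (hypothetical default: 30).
make_uniform_segments() {
  awk -v win="${1:-30}" '{
    n = 0
    for (start = 0; start < $2; start += win) {
      end = (start + win < $2) ? start + win : $2   # clip the last window to the duration
      printf "%s-%04d %s %.2f %.2f\n", $1, ++n, $1, start, end
    }
  }'
}

# Made-up example: one 75-second recording split into 30-second windows.
echo "rec1 75.0" | make_uniform_segments 30
```

Times in a CTM produced from such segments are relative to each segment, so they need to be shifted by the segment start to get positions in the original recording.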