Getting the ctm file from a new audio file that isn't yet labelled


user435268

Jan 12, 2018, 9:51:43 PM
to kaldi-help
In an effort to speed up the creation of labelled data, is there a way to get the ctm file for a new recording that isn't labelled yet? I'm able to get the ctm file using ./steps/get_ctm.sh, but that references the data/test directory, which has the transcription, text, utt2spk, etc. I've been crawling through the steps/get_ctm.sh and steps/decode.sh code, as well as looking at lattice-align-words, but it's taking me a while. Any help is appreciated.
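For reference, each line of a CTM file written by steps/get_ctm.sh has the columns `<id> <channel> <start> <duration> <word>`, with times in seconds (when a segments file is used, the first field is the recording/file ID rather than the utterance ID). A minimal sketch for turning those lines into word start/end times, which is what "word-level alignments" amounts to here; the function name is hypothetical and any columns after the word are ignored:

```shell
# Hypothetical helper: print "<id> <start> <end> <word>" for each CTM line.
# CTM columns: <id> <channel> <start-sec> <duration-sec> <word> [...]
# Reads the files given as arguments, or stdin if there are none.
ctm_to_word_times() {
  awk '{ printf "%s %.2f %.2f %s\n", $1, $3, $3 + $4, $5 }' "$@"
}
```

Usage would look like `ctm_to_word_times exp/tri1/decode_new/score_10/new.ctm`, where that path is only an example of where get_ctm.sh might put its output.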

Daniel Povey

Jan 12, 2018, 10:02:04 PM
to kaldi-help
If you don't have the text, just don't put it there.
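In other words, a data directory used only for decoding needs wav.scp, utt2spk and spk2utt; the text file (and segments, if each wav is a single utterance) can simply be left out. A minimal sketch, assuming one utterance per .wav file, each utterance treated as its own speaker, and audio that already matches the sample rate in conf/mfcc.conf; the function name and directory layout are hypothetical:

```shell
# Hypothetical helper: build a Kaldi data directory for unlabelled audio.
# Writes wav.scp, utt2spk and spk2utt, and deliberately no "text" file.
# Assumptions: one utterance per .wav file, and each utterance is its own
# speaker (utterance ID = speaker ID = file name without ".wav").
make_unlabelled_data_dir() {
  local wav_dir=$1 data_dir=$2 abs_dir wav utt
  abs_dir=$(cd "$wav_dir" && pwd) || return 1   # wav.scp needs full paths
  mkdir -p "$data_dir" || return 1
  : > "$data_dir/wav.scp"
  : > "$data_dir/utt2spk"
  for wav in "$abs_dir"/*.wav; do
    [ -e "$wav" ] || continue       # glob matched nothing
    utt=$(basename "$wav" .wav)
    echo "$utt $wav" >> "$data_dir/wav.scp"
    echo "$utt $utt" >> "$data_dir/utt2spk"
  done
  # Kaldi expects these files sorted in C-locale order.
  LC_ALL=C sort -o "$data_dir/wav.scp" "$data_dir/wav.scp"
  LC_ALL=C sort -o "$data_dir/utt2spk" "$data_dir/utt2spk"
  # With one speaker per utterance, spk2utt has exactly the same lines as
  # utt2spk; utils/utt2spk_to_spk2utt.pl does the general conversion.
  cp "$data_dir/utt2spk" "$data_dir/spk2utt"
}
```

For example, `make_unlabelled_data_dir /path/to/new_wavs data/new` followed by `utils/validate_data_dir.sh --no-text --no-feats data/new` should then pass validation without any transcript.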


--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/e5efa728-38b5-4904-b697-a218981888b2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

user435268

Jan 15, 2018, 5:33:30 PM
to kaldi-help
Thanks for the reply! I did some more looking around and found this, which is what I'm trying to get at.


I was able to use the "online-wav-gmm-decode-faster" method to transcribe some new audio at the utterance level, which is helpful. But I'm still struggling to decode new audio without having the text, utt2spk, spk2utt and segments files for it.

I created blank text, utt2spk, segments and spk2utt files and ran the Kaldi for Dummies run.sh script, but utils/validate_data_dir.sh throws an error that spk2utt is empty. So I commented out validate_data_dir.sh so that the make_mfcc.sh script would at least create the MFCCs for the new audio. During a separate, successful run I printed out the compute-mfcc-feats call to see whether I could somehow call it directly, but it doesn't look like I can.

I'd like to be able to decode this new audio so that I can run ./steps/get_ctm.sh and get the alignments at the word level. I definitely need to understand this process better, but in the meantime, thanks in advance for any help!
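The validate_data_dir.sh error comes from the files being empty, not from the missing transcript: utt2spk and spk2utt still need one line per utterance (one speaker per utterance is fine), while text and segments can be left out entirely. A minimal sketch of the remaining steps, assuming such a data directory, a GMM model with a decoding graph (e.g. exp/tri1 and exp/tri1/graph from a Kaldi for Dummies style setup), and that it is run from the recipe directory containing steps/, utils/ and path.sh. The function name and all paths are hypothetical:

```shell
# Hypothetical wrapper: decode an unlabelled data dir and write word-level CTMs.
# Run from a recipe directory containing steps/, utils/ and path.sh.
# Arguments (all paths are assumptions):
#   $1 data dir   with wav.scp, utt2spk, spk2utt (no text, no segments needed)
#   $2 lang dir   e.g. data/lang
#   $3 graph dir  e.g. exp/tri1/graph
#   $4 decode dir inside the model dir, e.g. exp/tri1/decode_new
#      (get_ctm.sh reads final.mdl from the decode dir's parent)
decode_unlabelled() (
  set -e                  # runs in a subshell: abort at the first failing step
  data=$1 lang=$2 graph=$3 decode=$4
  nj=1                    # must not exceed the number of speakers in $data
  cmd=run.pl              # run jobs locally
  # --no-text: text may be absent, but utt2spk/spk2utt must be non-empty.
  utils/validate_data_dir.sh --no-text --no-feats "$data"
  steps/make_mfcc.sh --nj "$nj" --cmd "$cmd" "$data" "$data/log" "$data/mfcc"
  steps/compute_cmvn_stats.sh "$data" "$data/log" "$data/mfcc"
  # Scoring compares against $data/text, which does not exist, so skip it.
  steps/decode.sh --nj "$nj" --cmd "$cmd" --skip-scoring true "$graph" "$data" "$decode"
  # Expected to write one CTM per LM weight under $decode/score_<LMWT>/.
  steps/get_ctm.sh --cmd "$cmd" "$data" "$lang" "$decode"
)
```

The function body is a subshell so that `set -e` does not leak into the calling shell. For a speaker-adapted (fMLLR/SAT) model, steps/decode_fmllr.sh would take the place of steps/decode.sh.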



Daniel Povey

Jan 15, 2018, 5:37:57 PM
to kaldi-help


郝竹林

Jun 10, 2019, 10:09:25 PM
to kaldi-help
Here is a complete introduction with step-by-step instructions:

https://www.eleanorchodroff.com/tutorial/kaldi/forced-alignment.html

