Testing audio with Librispeech model

oygle

Feb 27, 2018, 12:13:46 AM
to kaldi-help
Having built the Librispeech model and run the supplied example to produce transcripts, I now need to test another audio file. What is required to do this?

1. Read through the instructions for the ASpIRE model at https://chrisearch.wordpress.com/2017/03/11/speech-recognition-using-kaldi-extending-and-using-the-aspire-model/ . I realise these instructions are for a different model, but I found the instructions very clear.

2. Converted a WAV file to a 16-bit, 8 kHz mono waveform with ffmpeg and placed it in the /kaldi/egs/librispeech/s5/data/test_clean_example/example_wav folder

3. Modified wav.scp to contain an additional line, describing the wav file from step 2

4. Do I simply run decode.sh in its current form, or does it also need modifying?
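
Step 2 above can be sketched as an ffmpeg invocation. This is only a sketch: the helper name `ffmpeg_convert_cmd` and the file names are made up for illustration, and the function just prints the command so it can be checked before it is run. The 8 kHz rate follows the post; the LibriSpeech corpus itself is 16 kHz audio, so the rate may need to match whatever the model was trained on.

```shell
# Print (do not run) an ffmpeg command that converts INPUT to 16-bit mono PCM WAV.
# Usage: ffmpeg_convert_cmd INPUT OUTPUT [SAMPLE_RATE_HZ]
# The helper name and default rate are assumptions for this sketch.
ffmpeg_convert_cmd() {
  local input="$1" output="$2" rate="${3:-8000}"
  # -ac 1            : downmix to one channel (mono)
  # -ar RATE         : resample to RATE Hz
  # -acodec pcm_s16le: 16-bit little-endian PCM samples
  printf 'ffmpeg -i "%s" -ac 1 -ar %s -acodec pcm_s16le "%s"\n' \
    "$input" "$rate" "$output"
}

# Hypothetical file names; 8000 Hz as in the post.
ffmpeg_convert_cmd my_audio.mp3 my_audio.wav 8000
```

Piping the printed line to `sh` (or pasting it into a terminal) would perform the actual conversion, provided ffmpeg is installed.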

Daniel Povey

Feb 27, 2018, 12:50:44 AM
to kaldi-help
It kind of depends on which type of model you built.


--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/743fc074-9d7c-43f4-9314-c1f57558453f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

oygle

Feb 27, 2018, 1:27:52 AM
to kaldi-help
For the Librispeech model (/kaldi/egs/librispeech/s5/..), does the decode.sh file need to be modified, please?

oygle

Mar 1, 2018, 10:16:44 PM
to kaldi-help
Here are the test results so far:

1. Created a test audio file, 2 minutes long, and placed it in kaldi/egs/librispeech/s5/data/test_clean_example/example_wav
2. Backed up the files in kaldi/egs/librispeech/s5/data/test_clean_example and kaldi/egs/librispeech/s5/data/test_clean_example/split1/1
3. Modified the spk2utt, utt2spk and wav.scp files in each of those paths to add the file from step 1
4. Ran the following:
    . cmd.sh
    . path.sh

5. Then ran the test:
online2-wav-nnet2-latgen-faster --online=true --do-endpointing=false \
    --config=exp/nnet2_online/nnet_ms_a_online/conf/online_nnet2_decoding.conf \
    --max-active=7000 --beam=15.0 --lattice-beam=6.0 --acoustic-scale=0.1 \
    --word-symbol-table=exp/nnet2_online/nnet_ms_a_online//graph_test/words.txt \
    exp/nnet2_online/nnet_ms_a_online/final.mdl \
    exp/nnet2_online/nnet_ms_a_online//graph_test/HCLG.fst \
    ark:data/test_clean_example/split1/1/spk2utt \
    'ark,s,cs:wav-copy scp,p:data/test_clean_example/split1/1/wav.scp ark:- |' \
    'ark:|gzip -c > exp/nnet2_online/nnet_ms_a_online//decode_test_clean_example_test/lat.1.gz'
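
Step 3 above (adding the new file to wav.scp, utt2spk and spk2utt) might look like the sketch below. The function name `add_utterance`, the IDs `myspk` and `myspk-utt1`, and the WAV path are placeholders, not values from this thread. In a real Kaldi setup, utils/utt2spk_to_spk2utt.pl and utils/fix_data_dir.sh do this job; awk is used here only so that the sketch runs without Kaldi.

```shell
# Append one utterance to Kaldi-style wav.scp and utt2spk, then rebuild spk2utt.
# Usage: add_utterance DATA_DIR UTT_ID SPK_ID WAV_PATH
# (hypothetical helper, written for this sketch)
add_utterance() {
  local dir="$1" utt="$2" spk="$3" wav="$4"
  mkdir -p "$dir"
  # wav.scp maps utterance ID -> audio path; utt2spk maps utterance ID -> speaker ID.
  echo "$utt $wav" >> "$dir/wav.scp"
  echo "$utt $spk" >> "$dir/utt2spk"
  # Kaldi expects these files sorted in C-locale order, without duplicates.
  LC_ALL=C sort -u -o "$dir/wav.scp" "$dir/wav.scp"
  LC_ALL=C sort -u -o "$dir/utt2spk" "$dir/utt2spk"
  # spk2utt is the inverse map: one line per speaker, listing its utterances.
  awk '{ utts[$2] = utts[$2] " " $1 }
       END { for (s in utts) print s utts[s] }' "$dir/utt2spk" \
    | LC_ALL=C sort > "$dir/spk2utt"
}

# Demo in a scratch directory, so the real data directory is left untouched.
demo_dir="$(mktemp -d)"
add_utterance "$demo_dir" myspk-utt1 myspk /path/to/my_audio.wav
cat "$demo_dir/spk2utt"   # prints: myspk myspk-utt1
```

Prefixing the utterance ID with the speaker ID, as in the placeholder IDs here, keeps the two files in a consistent sort order.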

It took about 5 minutes to decode the 2-minute audio. I monitored the CPU and usage didn't go much above 25%, which is good, as some ASR systems have gone as high as 100%. The text output was of course very different from the spoken audio, which I expected.
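
For reference, the timing above amounts to a real-time factor (decoding time divided by audio duration) of roughly 2.5. A tiny sketch of the arithmetic, using a hypothetical helper `rtf` and the approximate figures from the post:

```shell
# Real-time factor = decoding time / audio duration (both in seconds).
# Usage: rtf DECODE_SECONDS AUDIO_SECONDS
rtf() {
  awk -v d="$1" -v a="$2" 'BEGIN { printf "%.2f\n", d / a }'
}

# About 5 minutes (300 s) of decoding for 2 minutes (120 s) of audio.
rtf 300 120   # prints: 2.50
```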

I'm just wondering how to get the output text all in lowercase and how to add time intervals. I assume I would have to do additional training of the Librispeech model to improve the accuracy.
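
The lowercase part can be handled as a post-processing step on the transcript. A minimal sketch, assuming each transcript line has the form "utterance-ID WORD WORD ..."; the helper name `lowercase_transcript` and the sample line are invented for illustration:

```shell
# Lowercase every word of a transcript line, leaving the first field
# (the utterance ID) untouched. Reads stdin, writes stdout.
lowercase_transcript() {
  awk '{ for (i = 2; i <= NF; i++) $i = tolower($i); print }'
}

# Made-up sample line, not real decoder output.
printf 'myspk-utt1 HELLO WORLD\n' | lowercase_transcript
# prints: myspk-utt1 hello world
```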

Daniel Povey

Mar 1, 2018, 10:30:27 PM
to kaldi-help
To get the time marks you have to convert it to CTM format. See steps/get_ctm.sh.
If it's using less than 100% CPU then that's very odd; it shouldn't. Or perhaps the tool you are monitoring it with is expressing it as a percentage of the total number of CPUs on your machine.
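
For orientation, a CTM file has one word per line: utterance (or recording) ID, channel, start time, duration, word, and optionally a confidence score. The sketch below turns such lines into start/end intervals; the helper name `ctm_to_intervals` and the sample lines are invented for illustration and are not output from this thread.

```shell
# Convert CTM lines (id channel start duration word [conf]) into
# "id start end word" lines, where end = start + duration.
ctm_to_intervals() {
  awk '{ printf "%s %.2f %.2f %s\n", $1, $3, $3 + $4, $5 }'
}

# Made-up sample CTM content.
printf 'myspk-utt1 1 0.50 0.25 HELLO\nmyspk-utt1 1 0.75 0.50 WORLD\n' \
  | ctm_to_intervals
# prints:
#   myspk-utt1 0.50 0.75 HELLO
#   myspk-utt1 0.75 1.25 WORLD
```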

oygle

Mar 2, 2018, 12:15:35 AM
to kaldi-help
Thanks for the pointer to the time markers script. I was using KSysGuard to monitor, and with 4 CPUs the 25% was an average, so, as you say, it was probably 100% of one CPU.
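
The arithmetic behind that reading, as a small sketch (the helper name `per_core_percent` is hypothetical): an average of 25% across 4 CPUs is the same total load as 100% of a single CPU.

```shell
# Convert an average utilisation across N CPUs into the equivalent
# load on a single CPU. Usage: per_core_percent AVG_PERCENT NUM_CPUS
per_core_percent() {
  awk -v avg="$1" -v n="$2" 'BEGIN { printf "%d\n", avg * n }'
}

per_core_percent 25 4   # prints: 100
```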

oygle

Mar 2, 2018, 1:19:33 AM
to kaldi-help
That script worked okay; it produced a lot of .ctm files within 'score' paths, plus other output. Converting the uppercase output to lowercase is not really an issue, as the text can be run through a script to do that. This is the command I used to get the time markers:

bash steps/get_ctm.sh data/test_clean_example/example_wav/test.flac  data/lang exp/nnet2_online/nnet_ms_a_online/decode_test_clean_example_test

Thanks  :)
