kaldi_dir="$ASR_MODEL_DIR"
online_nnet2="$kaldi_dir/src/online2bin/online2-wav-nnet2-latgen-faster"
recipe_dir="$kaldi_dir/egs/$recipe/s5"
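# Decode a single WAV file with an online nnet2 model and write the best-path transcript.
# Relies on the globals $recipe, $input_wav_file and $output_txt_file being set by the caller.
online_nnet2_decoding()
{
    local decoding_conf=$1
    local word_symbol_table=$2
    local mdl=$3
    local fst=$4
    # The last three arguments are the spk2utt rspecifier, the wav rspecifier and the lattice wspecifier.
    # The lattice wspecifier is kept as one double-quoted string so it reaches the decoder as a single argument.
    "$online_nnet2" --do-endpointing=true \
        --online=false \
        --config="$decoding_conf" \
        --max-active=7000 \
        --beam=15.0 \
        --lattice-beam=6.0 \
        --acoustic-scale=0.1 \
        --word-symbol-table="$word_symbol_table" \
        "$mdl" \
        "$fst" \
        "ark:echo utterance-id1 utterance-id1|" \
        "scp:echo utterance-id1 $input_wav_file|" \
        "ark:|$kaldi_dir/src/latbin/lattice-best-path --acoustic-scale=0.1 ark:- ark,t:- | $recipe_dir/utils/int2sym.pl -f 2- $word_symbol_table > $output_txt_file" || exit 1
}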
You can use the approach used in Aspire to create uniform segments and decode them.
local/multi_condition/create_uniform_segments.py
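To illustrate the idea: uniform segmentation amounts to writing a Kaldi "segments" file, where each line has the form "<segment-id> <recording-id> <start> <end>" with times in seconds. The sketch below is not the Aspire script itself. The function name make_uniform_segments, the 10-second window and the 0-second overlap are assumptions picked for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kaldi): print uniform segments for one recording.
# Usage: make_uniform_segments <recording-id> <duration-sec> <window-sec> <overlap-sec>
make_uniform_segments() {
  awk -v rec="$1" -v dur="$2" -v win="$3" -v ovl="$4" 'BEGIN {
    step = win - ovl
    if (step <= 0) {
      print "overlap must be smaller than window" > "/dev/stderr"
      exit 1
    }
    n = 0
    for (start = 0; start < dur; start += step) {
      end = start + win
      if (end > dur) end = dur      # clip the last segment to the recording length
      n++
      # segment-id = recording-id plus a zero-padded index, so IDs sort in time order
      printf "%s-%05d %s %.2f %.2f\n", rec, n, rec, start, end
      if (end >= dur) break         # stop once the end of the recording is covered
    }
  }'
}

# Example: a 25-second recording, 10-second windows, no overlap.
make_uniform_segments utt1 25 10 0
```

For the example call this prints three lines, the last one being "utt1-00003 utt1 20.00 25.00". Redirecting the output to a file named "segments" gives something that extract-segments can read together with wav.scp.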
Vimal
--
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
I was able to reuse that Python script (~/kaldi/egs/aspire/local/multi_condition/create_uniform_segments.py) to split the audio into uniform segments. Here is what I did:
- I created a folder and put my WAV file there (~44 min).
- My service doesn't know anything about the speakers, so I had to create "wav.scp" with a single utterance in it:
$ echo "utt1 <abs-path-to-wav-file>" > wav.scp
- The script then created the "segments", "spk2utt" and "utt2spk" files.
Now I have 2 ways of decoding "online", and here are my concerns:
- Segmented audio, async output: with 10-second segments and 0-second overlap, I'm getting worse recognition results.
- I also wanted to get the recognized text asynchronously (that was the reason for choosing segmentation), but even with the last argument in the command below, I only get results after online2-wav-nnet2-latgen-faster has finished.
online2-wav-nnet2-latgen-faster --online=false \
--do-endpointing=false \
--config=/opt/kaldi/egs/fisher_english/s5/exp/nnet2_online/nnet_a_gpu_online/conf/online_nnet2_decoding.conf \
--max-active=7000 \
--beam=15.0 \
--lattice-beam=6.0 \
--acoustic-scale=0.1 \
--word-symbol-table=/opt/kaldi/egs/fisher_english/s5/exp/tri5a/graph/words.txt \
/opt/kaldi/egs/fisher_english/s5/exp/nnet2_online/nnet_a_gpu_online/final.mdl \
/opt/kaldi/egs/fisher_english/s5/exp/tri5a/graph/HCLG.fst \
ark:/tmp/wav/spk2utt \
'ark,s,cs:extract-segments scp,p:/tmp/wav/wav.scp /tmp/wav/segments ark:- |' \
'ark:|/opt/kaldi/src/latbin/lattice-best-path --acoustic-scale=0.1 ark:- ark,t:- | /opt/kaldi/egs/fisher_english/s5/utils/int2sym.pl -f 2- /opt/kaldi/egs/fisher_english/s5/exp/tri5a/graph/words.txt > /tmp/kaldi_output.txt'
- Unsegmented audio: with the command below (--do-endpointing=true) and a single WAV file as input, I get satisfactory results, but it uses a lot of RAM.
online2-wav-nnet2-latgen-faster --online=false \
--do-endpointing=true \
--config=/opt/kaldi/egs/fisher_english/s5/exp/nnet2_online/nnet_a_gpu_online/conf/online_nnet2_decoding.conf \
--max-active=7000 \
--beam=15.0 \
--lattice-beam=6.0 \
--acoustic-scale=0.1 \
--word-symbol-table=/opt/kaldi/egs/fisher_english/s5/exp/tri5a/graph/words.txt \
/opt/kaldi/egs/fisher_english/s5/exp/nnet2_online/nnet_a_gpu_online/final.mdl \
/opt/kaldi/egs/fisher_english/s5/exp/tri5a/graph/HCLG.fst \
'ark:echo utterance-id1 utterance-id1|' \
'scp:echo utterance-id1 /tmp/audio_file.wav|' \
'ark:|/opt/kaldi/src/latbin/lattice-best-path --acoustic-scale=0.1 ark:- ark,t:- | /opt/kaldi/egs/fisher_english/s5/utils/int2sym.pl -f 2- /opt/kaldi/egs/fisher_english/s5/exp/tri5a/graph/words.txt > /tmp/kaldi_output.txt'
Question: for my application, should I actually use online decoding with segmented audio, as in the first option above? (I don't strictly need the recognized text asynchronously; I just want to reduce RAM usage so that I can run at least 3 ASR jobs.)
--
Go to http://kaldi-asr.org/forums.html find out how to join