How to get timestamps while decoding with MFCC features?


Li Ye

Aug 25, 2019, 10:21:41 PM
to kaldi-help
Hello, I'm new to Kaldi, so this may be a basic question, but I searched and couldn't find a proper solution.
I have a pre-trained model (the DataTang TDNN chain model from Kaldi) and it works quite well. I'm trying to get word timestamps,
something like this:
word1,[startTime, endTime]; word2,[startTime, endTime]...
Here is what I found. This is the decoding part:
# decoding
if [ $stage -le 2 ]; then
  local/decode.sh exp/chain/tdnn_1a_sp exp/chain/tdnn_1a_sp/decode_offline_test_$vdate $nj
fi
and this is what local/decode.sh actually does:
set -e
model_dir=$1
decode_dir=$2
stage=1
nj=$3

.....

# step 2: decode with MFCC features
graph_dir=$model_dir/graph
if [ $stage -le 2 ]; then
  for test_set in $test_sets; do
    steps/nnet3/decode.sh --acwt 1.0 --post-decode-acwt 10.0 \
      --nj $nj --cmd "$decode_cmd" \
      $graph_dir data/${test_set}_hires $decode_dir || exit 1;
  done
fi
This decoding step calls Kaldi bash scripts. Can I pass additional parameters to get the timestamps? Or is there any documentation that would let me do this on my own?
Another question: can I do all these steps in C++? I'm more familiar with C++, but I searched and found no Kaldi C++ coding tutorial.
Thanks!

Li Ye

Aug 25, 2019, 11:26:49 PM
to kaldi-help
This decode script is from egs/wsj/s5/steps/nnet3, and it takes the following required arguments:
if [ $# -ne 3 ]; then
echo "Usage: $0 [options] <graph-dir> <data-dir> <decode-dir>"
echo "e.g.: steps/nnet3/decode.sh --nj 8 \\"
echo "--online-ivector-dir exp/nnet2_online/ivectors_test_eval92 \\"
echo " exp/tri4b/graph_bg data/test_eval92_hires $dir/decode_bg_eval92"
echo "main options (for others, see top of script file)"
echo " --config <config-file> # config containing options"
echo " --nj <nj> # number of parallel jobs"
echo " --cmd <cmd> # Command to run in parallel with"
echo " --beam <beam> # Decoding beam; default 15.0"
echo " --iter <iter> # Iteration of model to decode; default is final."
echo " --scoring-opts <string> # options to local/score.sh"
echo " --num-threads <n> # number of threads to use, default 1."
echo " --use-gpu <true|false> # default: false. If true, we recommend"
echo " # to use large --num-threads as the graph"
echo " # search becomes the limiting factor."
exit 1;
fi
I guess I cannot get timestamps with this script. Can I modify this code, or should I use another script? I see there are many scripts in egs/wsj/s5/steps/nnet3.
Thanks!

Dazhu Qiu

Aug 26, 2019, 2:47:56 AM
to kaldi-help
You will get "lat.*.gz" files in your decode folder, right? Suppose you have lat.1.gz; then you can run
lattice-1best "ark:gunzip -c DECODE_FOLDER/lat.1.gz |" ark:- | lattice-align-words LANG_PATH/phones/word_boundary.int MODEL_PATH/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | utils/int2sym.pl -f 5 LANG_PATH/words.txt > 1.ctm

to get the CTM file for lat.1.gz. The format of a CTM file is:
utterance_id channel_num start_time duration word
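
Since the end time is just start_time plus duration, a CTM file can be turned into the "word,[startTime, endTime]" layout asked for at the top of the thread. The following is only a minimal sketch: the function name ctm_to_spans and the sample lines are made up for illustration and are not Kaldi output.

```shell
# Convert CTM lines (utt channel start duration word) into
# "word,[start, end]" entries, grouped on one line per utterance.
ctm_to_spans() {
  LC_ALL=C awk '
    {
      end = $3 + $4                      # end time = start + duration
      entry = sprintf("%s,[%.2f, %.2f]", $5, $3, end)
      if ($1 in spans) { spans[$1] = spans[$1] "; " entry }
      else             { spans[$1] = entry; order[++n] = $1 }
    }
    END { for (i = 1; i <= n; i++) print order[i] ": " spans[order[i]] }
  ' "$@"
}

# Hypothetical sample data, not real decoder output.
printf '%s\n' \
  'utt1 1 0.30 0.45 hello' \
  'utt1 1 0.75 0.60 world' | ctm_to_spans
```

With a real file the call would be something like ctm_to_spans 1.ctm, since the function passes its arguments through to awk.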

For C++, you can read the source of the binaries used in the command above and reuse that code. I guess the relevant functions include
WordAlignLattice and CompactLatticeToWordAlignment.





Dazhu Qiu

Aug 26, 2019, 2:54:37 AM
to kaldi-help
@Dan, I've noticed that beginners ask many questions about the toolchain, such as how to get phone alignments, posteriors, word alignments, etc. Do you think we need a FAQ page for such basic questions? Or is there already such a page that I don't know about?



Li Ye

Aug 26, 2019, 3:54:09 AM
to kaldi-help
Hello Qiu,
Thanks for your reply. Currently I only have model_dir and decode_dir (I downloaded this open-source model from https://kaldi-asr.org/models/m10). Is LANG_PATH necessary for this command? In CMUSphinx I can get the timestamps from audio frame indices with a default frame rate. What is the mechanism in Kaldi for getting the timestamps of the decoded text from the input audio?


orum farhang

Aug 26, 2019, 6:15:54 AM
to kaldi...@googlegroups.com
This is the standard script for getting timestamps and confidences from a lattice:

egs/wsj/s5/steps/conf/get_ctm_conf.sh

Have a look at this script to understand which files you need and what the steps from lattice to CTM are.

--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/92e6d333-95b9-493a-8c85-0c05d485527a%40googlegroups.com.

Dazhu Qiu

Aug 26, 2019, 6:32:57 AM
to kaldi-help
As you can see from the above script, it only requires word_boundary.int and words.txt, so just prepare those two files.

The input symbols of a lattice are transition-ids, one per frame, so their positions give the frame indices and hence the timestamps; the output symbols are words. Everything you need is therefore already contained in the lattice, and we just extract it. For what a lattice is, please see the official documentation: https://kaldi-asr.org/doc/lattices.html
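
To make the frame-to-time step concrete: a timestamp is the frame index multiplied by the frame shift. The sketch below assumes the usual 10 ms feature frame shift as the default. Chain models typically decode with a frame subsampling factor of 3, so for a model like the one in this thread the effective shift is probably 0.03 s, which nbest-to-ctm accepts through its --frame-shift option (default 0.01). The function name frame_to_seconds is made up for illustration.

```shell
# Convert a frame index to seconds: time = frame_index * frame_shift.
# The default shift of 0.01 s is an assumption; pass 0.03 for lattices
# produced with a frame subsampling factor of 3.
frame_to_seconds() {
  frame=$1
  frame_shift=${2:-0.01}
  LC_ALL=C awk -v f="$frame" -v s="$frame_shift" 'BEGIN { printf "%.2f\n", f * s }'
}

frame_to_seconds 150        # 150 frames at 10 ms per frame
frame_to_seconds 50 0.03    # 50 frames at 30 ms per frame
```

If the times in a CTM file look three times too short, the frame shift passed to nbest-to-ctm is the first thing to check.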


Li Ye

Aug 26, 2019, 8:13:22 AM
to kaldi-help
Hello Farhang, thanks for your reply.
Now I'm trying to modify the bash scripts on my own. Everything is OK, but I have one more question. In get_ctm_conf.sh:
# begin configuration section.
cmd=run.pl
stage=0
min_lmwt=5
max_lmwt=20
use_segments=true # if we have a segments file, use it to convert
# the segments to be relative to the original files.
iter=final
beam=5 # pruning beam before MBR decoding
#end configuration section.
Here it says cmd=run.pl,
but I don't know what exactly run.pl is. Am I supposed to write it on my own, or is it located somewhere else, like get_ctm_conf.sh?
I also found:
nj=$(cat $dir/num_jobs)
lats=$(for n in $(seq $nj); do echo -n "$dir/lat.$n.gz "; done) 
I expected output on the screen, but nothing was printed.


orum farhang

Aug 26, 2019, 8:35:35 AM
to kaldi...@googlegroups.com
You don't need to change run.pl. It runs your jobs on the local machine (as opposed to SGE), and normally you don't need to change it.

Check whether the decoding directory has a 'num_jobs' file, which should contain a number (if you used 8 jobs to decode your test set, it should contain 8). The for loop then iterates from 1 to 8 and builds the names 'lat.1.gz', 'lat.2.gz', ..., 'lat.8.gz' under your decoding directory. If those files are not there, the script may not find any lattice to process. In that case you need to give the name of your lattice file to the script directly.
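
One likely reason nothing appears on the screen is that lats=$(...) is a command substitution: the output of the loop is captured in the variable rather than printed. The sketch below rebuilds the same list, echoes it, and reports files that do not exist. The function name list_lattices and the throwaway directory are made up for illustration.

```shell
# Build the lattice file list the way the quoted lines do, print it,
# and warn on stderr about any file that is missing.
list_lattices() {
  dir=$1
  nj=$(cat "$dir/num_jobs") || return 1
  lats=$(for n in $(seq "$nj"); do printf '%s ' "$dir/lat.$n.gz"; done)
  echo "$lats"               # the assignment alone prints nothing
  for f in $lats; do
    [ -f "$f" ] || echo "missing: $f" >&2
  done
}

# Demonstration on a temporary directory (hypothetical layout):
# num_jobs says 2, but only lat.1.gz exists.
demo=$(mktemp -d)
echo 2 > "$demo/num_jobs"
touch "$demo/lat.1.gz"
list_lattices "$demo"
rm -rf "$demo"
```

Running list_lattices on the real decode directory shows at a glance whether num_jobs and the lat.N.gz files agree.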



Daniel Povey

Aug 26, 2019, 3:32:34 PM
to kaldi-help
RE a FAQ: I think one would be a good idea.  I'm creating a page in the documentation and we can gradually add content.


Li Ye

Aug 27, 2019, 8:01:46 AM
to kaldi-help
Thanks. That's it.
I'm now trying to get phonemes from the input audio. I could do this in Python by converting Chinese to Pinyin. However, I found https://github.com/kaldi-asr/kaldi/blob/master/src/latbin/lattice-to-phone-lattice.cc . Is there a Kaldi bash script that gets phonemes directly? Or should I change the output symbols of the lattice from words to phonemes?

Daniel Povey

Aug 27, 2019, 5:27:11 PM
to kaldi-help
I don't think we've wrapped it at the script level, but if you run lattice-to-phone-lattice and then lattice-best-path with output to ark,t:- and pipe that into int2sym.pl, giving it phones.txt instead of words.txt, you may get what you want. Or, instead of lattice-best-path, pipe into lattice-1best and then nbest-to-ctm; you'd have to pipe that into int2sym.pl as well.
Note: you will need the -f option to int2sym.pl to specify which fields to transform.
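
The first route could be sketched roughly as follows. This is untested and only an assumption about how the pieces fit together: it needs the Kaldi binaries on PATH, must be run from a recipe directory that contains utils/, and the function name phone_best_path and all paths are placeholders. With text output from lattice-best-path, each line is an utterance id followed by integer ids, so -f 2- asks int2sym.pl to map every field from the second onward.

```shell
# Sketch: best-path phone sequence for one gzipped lattice archive.
# Arguments (placeholders): model file, lattice file, phones.txt.
phone_best_path() {
  mdl=$1
  lat=$2
  phones=$3
  lattice-to-phone-lattice "$mdl" "ark:gunzip -c $lat |" ark:- |
    lattice-best-path ark:- ark,t:- |
    utils/int2sym.pl -f 2- "$phones"
}

# Example call with placeholder paths (not run here):
# phone_best_path MODEL_PATH/final.mdl DECODE_FOLDER/lat.1.gz LANG_PATH/phones.txt
```

For the CTM route the word id sits in the fifth column, so the matching option there would be -f 5, as in the word-level command earlier in the thread.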

