questions on mini_librispeech


Ri ki · Aug 1, 2017, 10:50:36 PM · to kaldi-help

Hi,

I am trying to run the mini_librispeech recipe. This is what I have done:

I ran the run.sh script, and it finished successfully except for the last step:

# Train a chain model
if [ $stage -le 9 ]; then
  local/chain/run_tdnn.sh --stage 0
fi


At the end of the run, it printed a message saying that this step could only run if CUDA is configured (something like that), which made sense: I built my Kaldi without the CUDA option. (I do not have a good enough GPU, so I decided to skip that. I will probably run the NN training on a GPU soon, though.)

My thinking is that since the rest finished successfully, I should be able to run the GMM decoder on it.

I took the following code snippet from decode.sh and ran it with the online2-wav-gmm-latgen-faster decoder, but it complained about the --global-cmvn-stats option.

 
online2-wav-gmm-latgen-faster --do-endpointing=$do_endpointing \
  --config=$srcdir/conf/online_decoding.conf \
  --max-active=$max_active --beam=$beam --lattice-beam=$lattice_beam \
  --acoustic-scale=$acwt --word-symbol-table=$graphdir/words.txt \
  $graphdir/HCLG.fst $spk2utt_rspecifier "$wav_rspecifier" \
  "ark:|gzip -c > $dir/lat.JOB.gz" || exit 1;

I have the following questions:

1) Is the above code snippet still valid? I.e., can I run the online2-wav-gmm-latgen-faster decoder on the mini_librispeech recipe, and with the above options?

2) The run.sh command created many directories. I am trying to use the tri3b one under the exp directory. The following is a listing of the directories in exp/ (just for info):
make_mfcc  mono  mono_ali_train_clean_5  tri1  tri1_ali_train_clean_5  tri2b  tri2b_ali_train_clean_5  tri3b  tri3b_ali_train_clean_5

3) The above code snippet references an online_decoding.conf file, but I don't see it in any of my directories. Is there a sample conf file for mini_librispeech? (There is a decode.config file under the s5/conf dir, but it's empty.)

4) Also, the above code snippet doesn't reference a .mdl file. How would it know which model to use?

5) The mini_librispeech audio files are in .flac format. Do I need to convert them to .wav before I pass them to online2-wav-gmm-latgen-faster for decoding, or would .flac files be OK?

6) Also, the above code snippet does not pass --global-cmvn-stats, but I got an error about it when I tried to run it. If that option is required, what should its value be?

Sorry for so many questions. I just want to make sure I am doing it right.

Thanks in advance

Daniel Povey · Aug 1, 2017, 10:59:02 PM · to kaldi-help

If you want to do GMM-based online decoding, you can't do it with the models trained conventionally; you need to build special models. Search for example scripts that invoke a script called prepare_online_decoding.sh and you'll find them.
> --
> Go to http://kaldi-asr.org/forums.html find out how to join
> ---
> You received this message because you are subscribed to the Google Groups
> "kaldi-help" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kaldi-help+...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Ri ki · Aug 1, 2017, 11:36:16 PM · to kaldi-help, dpo...@gmail.com

Thanks, Dan, for the quick response.

So are models trained "conventionally" nnet models, or are they something else?

Daniel Povey · Aug 1, 2017, 11:42:24 PM · to Ri ki, kaldi-help

I assume we're talking about GMM-based models since you have no GPU.
The training for our online setup for GMMs requires a different kind
of mean normalization.

Ri ki · Aug 1, 2017, 11:52:15 PM · to kaldi-help, maria...@gmail.com, dpo...@gmail.com

I see... thanks again. (Pardon my ignorance; I am still learning.)

Ri ki · Aug 2, 2017, 1:47:51 AM · to kaldi-help, maria...@gmail.com, dpo...@gmail.com

Hi Dan,

As you suggested, I searched for "prepare_online_decoding.sh" and looked at the scripts that call it. I found two types of scripts:

1) Most of the search results are nnet-based scripts (I believe these are not suitable for the GMM training and decoding I am trying to do).

2) The second type is not nnet-based, but the commands in those scripts are exactly the same as what I found and used initially (the code snippet in my first email).

I am confused now.

Can anyone point me to a run.sh script that I can use with the mini_librispeech example for GMM training, using online2-wav-gmm-latgen-faster for decoding?

thanks again

Daniel Povey · Aug 2, 2017, 1:56:50 AM · to Ri ki, kaldi-help

An example is

egs/rm/s5/local/online/run_gmm.sh

Ri ki · Aug 2, 2017, 2:11:33 AM · to kaldi-help, maria...@gmail.com, dpo...@gmail.com

Ahh, I missed this one (the "prepare_online_decoding" search returned 699 results; sorry, Dan).

I will go over it and try to run the GMM decoding.

thanks again

Ri ki · Aug 3, 2017, 4:56:14 AM · to kaldi-help, maria...@gmail.com, dpo...@gmail.com

Hi Dan,

I went over the run.sh and run_gmm.sh scripts from the "rm" directory, and then also the run.sh from mini_librispeech (I believe Daniel Galvez created this recipe; I saw his name and comments in the script file, hence the assumption).

I then tried to create a script that will do the online GMM training and decoding for the mini_librispeech data. I am not sure if I have done it right. When you (or Daniel Galvez) get a chance, please take a look at the following script and let me know if it works for GMM online training and decoding. (If any of the steps are not needed, please advise.)

Thanks in advance


#!/bin/bash

data=data/

data_url=www.openslr.org/resources/31
lm_url=www.openslr.org/resources/11

. ./cmd.sh
. ./path.sh

stage=0
. utils/parse_options.sh

set -euo pipefail

mkdir -p $data
for part in dev-clean-2 train-clean-5; do
  local/download_and_untar.sh $data $data_url $part
done

local/download_lm.sh $lm_url data/local/lm

# format the data as Kaldi data directories
for part in dev-clean-2 train-clean-5; do
  # use underscore-separated names in data directories.
  local/data_prep.sh $data/LibriSpeech/$part data/$(echo $part | sed s/-/_/g)
done

local/prepare_dict.sh --stage 3 --nj 30 --cmd "$train_cmd" \
  data/local/lm data/local/lm data/local/dict_nosp

utils/prepare_lang.sh data/local/dict_nosp \
  "<UNK>" data/local/lang_tmp_nosp data/lang_nosp

local/format_lms.sh --src-dir data/lang_nosp data/local/lm

# Create ConstArpaLm format language model for full 3-gram and 4-gram LMs
utils/build_const_arpa_lm.sh data/local/lm/lm_tglarge.arpa.gz \
  data/lang_nosp data/lang_nosp_test_tglarge

featdir=mfcc

for x in dev_clean_2 train_clean_5; do
  steps/make_mfcc.sh --nj 8 --cmd "$train_cmd" data/$x exp/make_feat/$x $featdir
  steps/compute_cmvn_stats.sh data/$x exp/make_feat/$x $featdir
done

# Get the shortest 500 utterances first because those are more likely
# to have accurate alignments.
utils/subset_data_dir.sh --shortest data/train_clean_5 500 data/train_500short

steps/train_mono.sh --nj 4 --cmd "$train_cmd" data/train_500short data/lang_nosp exp/mono

utils/mkgraph.sh data/lang_nosp_test_tgsmall exp/mono exp/mono/graph_nosp_tgsmall

for test in dev_clean_2; do
  steps/decode.sh --config conf/decode.config --nj 20 --cmd "$decode_cmd" \
    exp/mono/graph_nosp_tgsmall data/$test exp/mono/decode_nosp_tgsmall_$test
done

# Get alignments from monophone system.
steps/align_si.sh --nj 8 --cmd "$train_cmd" \
  data/train_clean_5 data/lang_nosp exp/mono exp/mono_ali_train_clean_5

# train tri1 [first triphone pass]
steps/train_deltas.sh --cmd "$train_cmd" \
  2000 10000 data/train_clean_5 data/lang_nosp exp/mono_ali_train_clean_5 exp/tri1

# decode tri1
utils/mkgraph.sh data/lang_nosp_test_tgsmall exp/tri1 exp/tri1/graph_nosp_tgsmall
for test in dev_clean_2; do
  steps/decode.sh --config conf/decode.config --nj 20 --cmd "$decode_cmd" \
    exp/tri1/graph_nosp_tgsmall data/$test exp/tri1/decode_nosp_tgsmall_$test
  steps/lmrescore.sh --cmd "$decode_cmd" data/lang_nosp_test_{tgsmall,tgmed} \
    data/$test exp/tri1/decode_nosp_{tgsmall,tgmed}_$test
  steps/lmrescore_const_arpa.sh \
    --cmd "$decode_cmd" data/lang_nosp_test_{tgsmall,tglarge} \
    data/$test exp/tri1/decode_nosp_{tgsmall,tglarge}_$test
done

# align tri1
steps/align_si.sh --nj 8 --cmd "$train_cmd" \
  --use-graphs true data/train_clean_5 data/lang_nosp exp/tri1 exp/tri1_ali_train_clean_5

# train and decode tri2b [LDA+MLLT]
steps/train_lda_mllt.sh --cmd "$train_cmd" \
  --splice-opts "--left-context=3 --right-context=3" \
  2500 15000 data/train_clean_5 data/lang_nosp exp/tri1_ali_train_clean_5 exp/tri2b

# decode using the LDA+MLLT model
utils/mkgraph.sh data/lang_nosp_test_tgsmall exp/tri2b exp/tri2b/graph_nosp_tgsmall

for test in dev_clean_2; do
  steps/decode.sh --config conf/decode.config --nj 20 --cmd "$decode_cmd" \
    exp/tri2b/graph_nosp_tgsmall data/$test exp/tri2b/decode_nosp_tgsmall_$test
  steps/lmrescore.sh --cmd "$decode_cmd" data/lang_nosp_test_{tgsmall,tgmed} \
    data/$test exp/tri2b/decode_nosp_{tgsmall,tgmed}_$test
  steps/lmrescore_const_arpa.sh \
    --cmd "$decode_cmd" data/lang_nosp_test_{tgsmall,tglarge} \
    data/$test exp/tri2b/decode_nosp_{tgsmall,tglarge}_$test
done

# Align all data with LDA+MLLT system (tri2b)
steps/align_si.sh --nj 8 --cmd "$train_cmd" --use-graphs true \
  data/train_clean_5 data/lang_nosp exp/tri2b exp/tri2b_ali_train_clean_5

# Do MMI on top of LDA+MLLT.
steps/make_denlats.sh --nj 8 --cmd "$train_cmd" \
  data/train_clean_5 data/lang_nosp exp/tri2b exp/tri2b_denlats
steps/train_mmi.sh data/train_clean_5 data/lang_nosp exp/tri2b_ali_train_clean_5 exp/tri2b_denlats exp/tri2b_mmi

for test in dev_clean_2; do
  steps/decode.sh --config conf/decode.config --iter 4 --nj 20 --cmd "$decode_cmd" \
    exp/tri2b/graph_nosp_tgsmall data/$test exp/tri2b_mmi/decode_it4
  steps/decode.sh --config conf/decode.config --iter 3 --nj 20 --cmd "$decode_cmd" \
    exp/tri2b/graph_nosp_tgsmall data/$test exp/tri2b_mmi/decode_it3
done

# Do the same with boosting.
steps/train_mmi.sh --boost 0.05 data/train_clean_5 data/lang_nosp \
  exp/tri2b_ali_train_clean_5 exp/tri2b_denlats exp/tri2b_mmi_b0.05

for test in dev_clean_2; do
  steps/decode.sh --config conf/decode.config --iter 4 --nj 20 --cmd "$decode_cmd" \
    exp/tri2b/graph_nosp_tgsmall data/$test exp/tri2b_mmi_b0.05/decode_it4
  steps/decode.sh --config conf/decode.config --iter 3 --nj 20 --cmd "$decode_cmd" \
    exp/tri2b/graph_nosp_tgsmall data/$test exp/tri2b_mmi_b0.05/decode_it3
done

# Do MPE.
steps/train_mpe.sh data/train_clean_5 data/lang_nosp exp/tri2b_ali_train_clean_5 exp/tri2b_denlats exp/tri2b_mpe

for test in dev_clean_2; do
  steps/decode.sh --config conf/decode.config --iter 4 --nj 20 --cmd "$decode_cmd" \
    exp/tri2b/graph_nosp_tgsmall data/$test exp/tri2b_mpe/decode_it4
  steps/decode.sh --config conf/decode.config --iter 3 --nj 20 --cmd "$decode_cmd" \
    exp/tri2b/graph_nosp_tgsmall data/$test exp/tri2b_mpe/decode_it3
done

## Do LDA+MLLT+SAT, and decode.
steps/train_sat.sh 2500 15000 data/train_clean_5 data/lang_nosp exp/tri2b_ali_train_clean_5 exp/tri3b

# decode using the tri3b model
utils/mkgraph.sh data/lang_nosp_test_tgsmall exp/tri3b exp/tri3b/graph_nosp_tgsmall

for test in dev_clean_2; do
  steps/decode_fmllr.sh --config conf/decode.config --nj 20 --cmd "$decode_cmd" \
    exp/tri3b/graph_nosp_tgsmall data/$test exp/tri3b/decode_nosp_tgsmall_$test
  steps/lmrescore.sh --cmd "$decode_cmd" data/lang_nosp_test_{tgsmall,tgmed} \
    data/$test exp/tri3b/decode_nosp_{tgsmall,tgmed}_$test
  steps/lmrescore_const_arpa.sh \
    --cmd "$decode_cmd" data/lang_nosp_test_{tgsmall,tglarge} \
    data/$test exp/tri3b/decode_nosp_{tgsmall,tglarge}_$test
done

### Not sure if this is needed???
#(
# utils/mkgraph.sh data/lang_ug exp/tri3b exp/tri3b/graph_ug
# steps/decode_fmllr.sh --config conf/decode.config --nj 20 --cmd "$decode_cmd" \
#   exp/tri3b/graph_nosp_ug data/test exp/tri3b/decode_ug
#)

# Align all data with LDA+MLLT+SAT system (tri3b)
steps/align_fmllr.sh --nj 8 --cmd "$train_cmd" --use-graphs true \
  data/train_clean_5 data/lang_nosp exp/tri3b exp/tri3b_ali_train_clean_5

# Now we compute the pronunciation and silence probabilities from training data,
# and re-create the lang directory.
steps/get_prons.sh --cmd "$train_cmd" \
  data/train_clean_5 data/lang_nosp exp/tri3b
utils/dict_dir_add_pronprobs.sh --max-normalize true \
  data/local/dict_nosp \
  exp/tri3b/pron_counts_nowb.txt exp/tri3b/sil_counts_nowb.txt \
  exp/tri3b/pron_bigram_counts_nowb.txt data/local/dict

utils/prepare_lang.sh data/local/dict \
  "<UNK>" data/local/lang_tmp data/lang

local/format_lms.sh --src-dir data/lang data/local/lm

utils/build_const_arpa_lm.sh \
  data/local/lm/lm_tglarge.arpa.gz data/lang data/lang_test_tglarge

steps/align_fmllr.sh --nj 5 --cmd "$train_cmd" \
  data/train_clean_5 data/lang exp/tri3b exp/tri3b_ali_train_clean_5

# Test the tri3b system with the silprobs and pron-probs.

# decode using the tri3b model
utils/mkgraph.sh data/lang_test_tgsmall \
  exp/tri3b exp/tri3b/graph_tgsmall
for test in dev_clean_2; do
  steps/decode_fmllr.sh --nj 10 --cmd "$decode_cmd" \
    exp/tri3b/graph_tgsmall data/$test \
    exp/tri3b/decode_tgsmall_$test
  steps/lmrescore.sh --cmd "$decode_cmd" data/lang_test_{tgsmall,tgmed} \
    data/$test exp/tri3b/decode_{tgsmall,tgmed}_$test
  steps/lmrescore_const_arpa.sh \
    --cmd "$decode_cmd" data/lang_test_{tgsmall,tglarge} \
    data/$test exp/tri3b/decode_{tgsmall,tglarge}_$test
done

## MMI on top of tri3b (i.e. LDA+MLLT+SAT+MMI)
steps/make_denlats.sh --config conf/decode.config \
  --nj 8 --cmd "$train_cmd" --transform-dir exp/tri3b_ali_train_clean_5 \
  data/train_clean_5 data/lang_nosp exp/tri3b exp/tri3b_denlats
steps/train_mmi.sh data/train_clean_5 data/lang_nosp exp/tri3b_ali_train_clean_5 exp/tri3b_denlats exp/tri3b_mmi

for test in dev_clean_2; do
  steps/decode_fmllr.sh --config conf/decode.config --nj 20 --cmd "$decode_cmd" \
    --alignment-model exp/tri3b/final.alimdl --adapt-model exp/tri3b/final.mdl \
    exp/tri3b/graph_nosp_tgsmall data/$test exp/tri3b_mmi/decode_nosp_tgsmall_$test

  # Do a decoding that uses the exp/tri3b/decode directory to get transforms from.
  steps/decode.sh --config conf/decode.config --nj 20 --cmd "$decode_cmd" \
    --transform-dir exp/tri3b/decode_nosp_tgsmall_$test \
    exp/tri3b/graph_nosp_tgsmall data/$test exp/tri3b_mmi/decode2_nosp_tgsmall_$test
done

# call prepare_online_decoding.sh
steps/online/prepare_online_decoding.sh --cmd "$train_cmd" data/train_clean_5 data/lang_nosp \
  exp/tri3b exp/tri3b_mmi/final.mdl exp/tri3b_online/ || exit 1;

# online decoding
for test in dev_clean_2; do
  steps/online/decode.sh --config conf/decode.config --cmd "$decode_cmd" --nj 20 \
    exp/tri3b/graph_nosp_tgsmall data/$test exp/tri3b_online/decode_$test
done

Daniel Galvez · Aug 3, 2017, 11:07:26 AM · to kaldi-help

Hi Ri,

The quick answer is that yes, it looks reasonable to me. You need to first call a prepare-for-online-decoding script (the point is to set things up for things like on-the-fly cepstral mean normalization [not sure if this is the exact feature transform done]), which you do.

The best way to self-check is to see whether the word error rates from the online decodings are similar to those of the offline decodings for the same model. Look at RESULTS (in the same directory as run.sh) to see how to get the WERs.
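Concretely, picking the best WER out of lines like those in RESULTS comes down to a numeric sort on the second field. A minimal self-contained sketch (the sample lines below mimic Kaldi scoring output; in a real checkout the input would come from something like `grep WER exp/*/decode*/wer_*`):

```shell
# Sort "%WER" lines numerically on the second field (the WER itself),
# so the best (lowest) result comes first.
sort -k2 -n <<'EOF' | head -n 1
%WER 17.36 [ 3495 / 20138 ] exp/tri3b/decode_tgsmall_dev_clean_2/wer_17_0.0
%WER 13.14 [ 2647 / 20138 ] exp/tri3b/decode_tglarge_dev_clean_2/wer_15_0.5
%WER 21.87 [ 4405 / 20138 ] exp/tri3b_online/decode_dev_clean_2/wer_17_0.0
EOF
```

This prints the 13.14 line, the best of the three.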

Daniel

--
Daniel Galvez

Ri ki · Aug 3, 2017, 5:47:49 PM · to kaldi-help

Thank you, Daniel Galvez, for your quick response. (Please do let me know if you see any issues with the script, like maybe I am calling commands that are not needed, or calling them in the wrong order.)

I will now run the commands one by one and will post updates here on how it goes (maybe someone else will find it useful).

Thanks again to both of you.

Ri ki · Aug 3, 2017, 6:58:50 PM · to kaldi-help

Hi Daniel Galvez,

A quick question, out of curiosity: would your recipe for mini_librispeech work for offline decoding with the online2-wav-gmm-latgen-faster decoder by passing in the online=false option? In that recipe I haven't run the last step (i.e. step 9, which trains a chain model).

So if I run the recipe just up to step 8, will it work for offline decoding with the online2-wav-gmm-latgen-faster decoder (by passing in online=false)?

thanks

Daniel Galvez · Aug 3, 2017, 9:40:33 PM · to kaldi-help

Hi Ri,

Yes, you can use online2-wav-gmm-latgen-faster if you've only done up to stage 8. But as Dan has mentioned before, you need to call the script steps/online/prepare_online_decoding.sh in order to prepare a model for the online2-* binaries, including online2-wav-gmm-latgen-faster. The output model directory is what you'd pass into an online2 decoder.

The --online=false option basically just estimates the important statistics for adaptation (CMVN, ivector, etc.) from the whole waveform, instead of using the online estimation procedures. Results from that style of decoding are not really useful, given that in normal online decoding you can't estimate those adaptation statistics from the whole waveform. I'm guessing it's there only for debugging purposes, e.g., checking whether your online estimation procedure is doing well relative to the offline estimation procedure.


Ri ki · Aug 3, 2017, 11:51:01 PM · to kaldi-help

OK, thank you, Daniel. That is helpful; I will work on it.
(I will also call prepare_online_decoding.sh after step 8 of your recipe, apart from what I am currently working on, and see how it goes.)

thanks again

Ri ki · Aug 4, 2017, 8:06:20 PM · to kaldi-help

Hi Dan/Daniel,

I am at the following step:


## MMI on top of tri3b (i.e. LDA+MLLT+SAT+MMI)
steps/make_denlats.sh --config conf/decode.config \
  --nj 8 --cmd "$train_cmd" --transform-dir exp/tri3b_ali_train_clean_5 \
  data/train_clean_5 data/lang_nosp exp/tri3b exp/tri3b_denlats

and I get the following error:

...
steps/make_denlats.sh: feature type is lda
steps/make_denlats.sh: using fMLLR transforms from exp/tri3b_ali_train_clean_5
steps/make_denlats.sh: mismatch in number of jobs with exp/tri3b_ali_train_clean_5

What is the "mismatch in number of jobs" related to?

Daniel Povey · Aug 4, 2017, 8:11:37 PM · to kaldi-help

You should use the same --nj value as you used to get the fMLLR
transforms in the 'transform_dir'. You should have been able to work
that out from the script.
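For reference, Kaldi alignment and decode directories record the job count they were created with in a file named num_jobs, so a robust way to avoid this mismatch is to read it back rather than hard-coding --nj. A minimal sketch (simulated with a temporary directory in place of a real exp/ directory):

```shell
# Alignment dirs such as exp/tri3b_ali_train_clean_5 contain a num_jobs
# file; reusing its value keeps --nj consistent with the fMLLR transforms.
alidir=$(mktemp -d)
echo 8 > "$alidir/num_jobs"        # stand-in for the real alignment dir
nj=$(cat "$alidir/num_jobs")
echo "pass --nj $nj to steps/make_denlats.sh"
rm -rf "$alidir"
```

This prints "pass --nj 8 to steps/make_denlats.sh"; with a real alignment directory you would point alidir at exp/tri3b_ali_train_clean_5.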

Ri ki · Aug 5, 2017, 3:35:28 PM · to kaldi-help, dpo...@gmail.com

Thanks, Dan. (Sorry, I missed that.)

thanks again

Ri ki · Aug 7, 2017, 3:31:26 AM · to kaldi-help, dpo...@gmail.com

Hi Dan/Daniel,

So after I trained the mini_librispeech system (both using the script I created, and using Daniel's recipe as-is up to step 8), I called prepare_online_decoding.sh and decode.sh like so:

# call prepare_online_decoding.sh
steps/online/prepare_online_decoding.sh --cmd "$train_cmd" data/train_clean_5 data/lang_nosp \
  exp/tri3b exp/tri3b_online/ || exit 1;

# online decoding
for test in dev_clean_2; do
  steps/online/decode.sh --config conf/decode.config --cmd "$decode_cmd" --nj 20 \
    exp/tri3b/graph_nosp_tgsmall data/$test exp/tri3b_online/decode_$test
done

I got a worse error rate for the online decoding (I remember Daniel saying that the online error rates should roughly match the offline error rates).

Here is the relevant info.

%WER 48.51 [ 9768 / 20138, 447 ins, 2260 del, 7061 sub ] exp/mono/decode_nosp_tgsmall_dev_clean_2/wer_8_0.0
%WER 21.01 [ 4230 / 20138, 489 ins, 575 del, 3166 sub ] exp/tri1/decode_nosp_tglarge_dev_clean_2/wer_13_0.0
%WER 24.98 [ 5030 / 20138, 398 ins, 948 del, 3684 sub ] exp/tri1/decode_nosp_tgmed_dev_clean_2/wer_15_0.0
%WER 27.52 [ 5541 / 20138, 433 ins, 1053 del, 4055 sub ] exp/tri1/decode_nosp_tgsmall_dev_clean_2/wer_14_0.0
%WER 18.67 [ 3759 / 20138, 452 ins, 516 del, 2791 sub ] exp/tri2b/decode_nosp_tglarge_dev_clean_2/wer_15_0.0
%WER 22.42 [ 4515 / 20138, 397 ins, 820 del, 3298 sub ] exp/tri2b/decode_nosp_tgmed_dev_clean_2/wer_16_0.0
%WER 24.55 [ 4944 / 20138, 375 ins, 965 del, 3604 sub ] exp/tri2b/decode_nosp_tgsmall_dev_clean_2/wer_16_0.0
%WER 13.39 [ 2696 / 20138, 380 ins, 328 del, 1988 sub ] exp/tri3b/decode_nosp_tglarge_dev_clean_2/wer_16_0.0
%WER 16.34 [ 3291 / 20138, 351 ins, 481 del, 2459 sub ] exp/tri3b/decode_nosp_tgmed_dev_clean_2/wer_16_0.0
%WER 17.88 [ 3600 / 20138, 363 ins, 561 del, 2676 sub ] exp/tri3b/decode_nosp_tgsmall_dev_clean_2/wer_15_0.0
%WER 24.70 [ 4974 / 20138, 430 ins, 907 del, 3637 sub ] exp/tri3b/decode_nosp_tgsmall_dev_clean_2.si/wer_16_0.0
%WER 13.14 [ 2647 / 20138, 377 ins, 309 del, 1961 sub ] exp/tri3b/decode_tglarge_dev_clean_2/wer_15_0.5
%WER 15.87 [ 3195 / 20138, 376 ins, 413 del, 2406 sub ] exp/tri3b/decode_tgmed_dev_clean_2/wer_17_0.0
%WER 17.36 [ 3495 / 20138, 371 ins, 500 del, 2624 sub ] exp/tri3b/decode_tgsmall_dev_clean_2/wer_17_0.0
%WER 23.81 [ 4794 / 20138, 519 ins, 701 del, 3574 sub ] exp/tri3b/decode_tgsmall_dev_clean_2.si/wer_15_0.0
%WER 21.87 [ 4405 / 20138, 629 ins, 504 del, 3272 sub ] exp/tri3b_online/decode_dev_clean_2/wer_17_0.0


As you can see (the best offline decoding error rate is 13.14% and the online decoding error rate is 21.87%), the online decoding is worse than the offline decoding.

Where did I make a mistake? (I got the same error rates with Daniel's recipe, as I said above: I commented out step 9, which is the nnet step, and instead ran prepare_online_decoding.sh followed by the decode script as mentioned above.)

thanks in advance

Daniel Povey · Aug 7, 2017, 2:17:22 PM · to Ri ki, kaldi-help

You are not comparing with the correct baseline, because you decoded
with the 'tgsmall' graph and didn't do LM rescoring.
The speaker-independent and speaker-dependent decodings for the
tgsmall graph are 24.7 and 17.9, and the online decoding is 21.9 which
is in between the two, so it's not all that surprising. For
GMM-based models there will be a performance gap between online and
offline decoding, especially if you were getting a lot of improvement
from adaptation.

Dan

Ri ki · Aug 7, 2017, 5:45:29 PM · to kaldi-help, maria...@gmail.com, dpo...@gmail.com

Thanks, Dan. I kind of suspected that I was not comparing the correct results (large vs. small; thanks for clarifying). I will run the decoding with the large LM and see what the WER is.

Regarding lmrescore: rescoring was done after decode_fmllr was run (a step ahead of my online decoding), and I thought that would have been sufficient. Maybe my understanding of LM rescoring is not adequate; I will read up on it and attempt rescoring again to see if it improves the results.

thanks again

Ri ki · Aug 14, 2017, 1:28:38 AM · to kaldi-help, maria...@gmail.com, dpo...@gmail.com

Hi Dan,

Is there a way to run the mini_librispeech nnet recipe on CPUs instead of a GPU?

When I ran the default run.sh a while back, it errored out stating that it needs to be run on a GPU. The error message I noticed is in the run_tdnn.sh file.

Then today I came across an old post stating that the nnet scripts could be run on CPUs by using the flag --use-gpu=false.

I looked at run.sh and run_tdnn.sh, but I don't see them taking this option anymore. Is this intentional?

Also, in run_tdnn.sh, --use-gpu=true is passed to steps/nnet3/chain/train.py by default.

Would it be OK to set --use-gpu=false to make it run on CPUs instead of GPUs? Or would that essentially mess up the training and the models?

Thanks in advance

Daniel Povey · Aug 14, 2017, 1:50:01 AM · to Ri ki, kaldi-help

In the inner scripts the option is supported, but it's so slow that we
don't tend to do it.

Ri ki · Aug 14, 2017, 2:06:17 AM · to kaldi-help, maria...@gmail.com, dpo...@gmail.com

OK, got it.

Would a single NVIDIA GPU (an NVIDIA Tesla K80) be OK to run the mini_librispeech recipe? (They are quite expensive to rent on AWS.)

Thanks

Daniel Galvez · Aug 14, 2017, 2:34:31 AM · to kaldi-help

Ri,

My gut instinct says that yes, one K80 is enough. We are doing cross-entropy and lattice-free MMI training, which are quite speedy on GPUs. I don't have a time estimate off the top of my head, but it will certainly be many hours (one day?).


Ri ki · Aug 14, 2017, 2:52:58 AM · to kaldi-help

Thanks, Daniel. One day seems fine for mini_librispeech.

I am guessing the whole librispeech recipe will take more than one day. Probably a week? Any estimate?

Also, are there any precautions to be taken so that I do the training correctly the first time (so as not to waste the precious GPU time)?

Thanks again

Daniel Galvez · Aug 14, 2017, 2:59:47 AM · to kaldi-help

mini_librispeech is intended to evaluate very quickly. The point of it was to do sanity checks to make sure that error rates don't suddenly go up after big changes to the library.

If you want to minimize mistakes, make sure that you use no more jobs than the number of GPUs you have, put the GPU in process-exclusive compute mode (done via nvidia-smi; google it), and make sure that your $nj variable in the bash scripts is consistent (this is probably a place where Kaldi could be better, as there are sometimes hard-coded number-of-jobs fields).
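For what it's worth, the process-exclusive compute mode mentioned above is set with nvidia-smi. The commands below are a sketch, not something runnable without an NVIDIA driver; they typically need root, and the GPU index 0 is an assumption:

```shell
# Show the current compute mode of each GPU.
nvidia-smi -q -d COMPUTE | grep "Compute Mode"

# Put GPU 0 into process-exclusive mode (only one process may use it at a time).
sudo nvidia-smi -i 0 -c EXCLUSIVE_PROCESS
```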


Ri ki · Aug 14, 2017, 3:01:31 AM · to kaldi-help

OK, thank you, Daniel.

Daniel Povey · Aug 14, 2017, 2:59:59 PM · to Ri ki, kaldi-help

Yes, but you'll have to change the --num-jobs-initial and --num-jobs-final in the script (e.g. local/chain/run_tdnn.sh) to be no greater than the number of visible GPUs, as output by `nvidia-smi`.

Also, you have to install the CUDA toolkit yourself, unless the image comes pre-loaded with it.

Daniel Povey · Aug 14, 2017, 3:58:03 PM · to Ri ki, kaldi-help

Something else: if you reduce --num-jobs-{initial,final} by a large amount, you should reduce the number of epochs, e.g. try halving it, because with fewer jobs the model will optimize faster and will overfit too soon.

Ri ki · Aug 14, 2017, 4:29:30 PM · to kaldi-help, maria...@gmail.com, dpo...@gmail.com

OK, got it. Thanks, Dan.

Ri ki · Aug 14, 2017, 4:33:18 PM · to kaldi-help, maria...@gmail.com, dpo...@gmail.com

Right. I realized that when I tried to run the nvidia-smi command: it came back as command not found.

So I need to install the NVIDIA drivers as well as the CUDA toolkit.

The CUDA toolkit I might have to install is 8.0, and it requires gcc >= 5.0, so I am thinking I need to upgrade gcc (and any other dependencies) as well.

Daniel Povey · Aug 14, 2017, 4:40:57 PM · to Ri ki, kaldi-help

We have the NVidia toolkit here at Hopkins at version 8.0, and gcc is
4.9.2-10. You shouldn't be compiling the NVidia toolkit, you should
just run the run.sh or whatever it's called, from the NVidia people,
and install it as binaries. At least that's what I recall. There
shouldn't be a dependency on gcc >= 5.0.

Daniel Povey · Aug 14, 2017, 4:51:41 PM · to Ri ki, kaldi-help

Yenda said off-list: "I think the kernel module has to be always compiled (and might demand the same compiler version as the kernel was compiled with... or something like that)."
I'd be surprised if your system had a gcc installed that's different from the one used to compile the kernel, unless you selected it manually. It could be something else, though. You could show us the error message.

Ri ki · Aug 14, 2017, 5:47:56 PM · to kaldi-help, maria...@gmail.com, dpo...@gmail.com

Oh, good to know, Dan. You saved me a lot of time. Thanks again.

This is the reason why I was thinking I needed to upgrade my gcc:

The NVIDIA driver for the Tesla K80 needs CUDA 8.0 (as per this page):

http://www.nvidia.com/download/driverResults.aspx/118962/en-us   (my Linux is Debian Jessie, but I did not find a driver for it, so I decided to download the Ubuntu 16.04 version of the driver)

I will be downloading the CUDA toolkit from the following:

https://developer.nvidia.com/cuda-downloads

(But somewhere I read that CUDA 8.0 requires gcc >= 5.0. Maybe, as you said, it's only required if I am doing development on CUDA, and not needed if I am just using the toolkit.)

Ri ki · Aug 14, 2017, 7:12:39 PM · to kaldi-help, maria...@gmail.com, dpo...@gmail.com

OK, so this is the link I was referring to. I might have misread it: as per this, it is recommended to use gcc >= 5.x, but earlier versions work as well.

https://unix.stackexchange.com/questions/218163/how-to-install-cuda-toolkit-7-8-9-on-debian-8-jessie-or-9-stretch

Daniel Povey · Aug 14, 2017, 7:15:38 PM · to Ri ki, kaldi-help

Oh yes-- the NVidia people give out a table with super-restrictive
versions that they say CUDA will work with, but in practice it will
work with a lot of versions. I think it comes down to which versions
they bothered to test with, which with some distributions of Linux is
just a single version of gcc.

Ri ki · Aug 14, 2017, 7:26:03 PM · to kaldi-help, maria...@gmail.com, dpo...@gmail.com

OK, thanks.

Ri ki · Aug 14, 2017, 11:30:42 PM · to kaldi-help, maria...@gmail.com, dpo...@gmail.com

So, finally, I finished installing the NVIDIA drivers and the CUDA toolkit successfully.

And Yenda was right: I had to install the gcc version that my Linux kernel was built with (which happened to be 4.8.2). gcc 4.9.2 had been installed during the Kaldi make process (by the extras/check_dependencies.sh script).

Just one quick question: as long as I have nvcc in my PATH, when I run make in kaldi/src it should compile for GPU, right? I don't have to set any additional env variables or flags anywhere else, correct? (Just want to make sure I do it correctly so as not to waste any time.)

Thanks again.

Daniel Povey · Aug 14, 2017, 11:32:17 PM · to Ri ki, kaldi-help

Yes, if nvcc is in your path it will compile for GPU.

Daniel Povey · Aug 14, 2017, 11:32:54 PM · to Ri ki, kaldi-help

..to be precise: if nvcc is on your path *when you run ./configure*, it will compile for GPU.
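A quick way to check this before building (the have_cmd helper is just an illustrative name, not part of Kaldi):

```shell
# ./configure only enables CUDA if nvcc is visible on PATH at configure
# time, so check before building rather than after a long compile.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

if have_cmd nvcc; then
  echo "nvcc found: ./configure will enable GPU support"
else
  echo "nvcc not found: ./configure will produce a CPU-only build"
fi
```

If nvcc shows up only after configuring, re-run ./configure and make so the GPU code actually gets built.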


Ri ki · Aug 15, 2017, 5:05:39 PM · to kaldi-help, maria...@gmail.com, dpo...@gmail.com

Hi Dan / Daniel

So, as you both suggested, I reduced the number of jobs to 1 (as I have only one GPU) and reduced the number of epochs to 5 (halved, as Dan suggested).

I had a couple of issues: at around stage 6 it exited, mostly due to memory issues. I restarted it, and then once it was in run_tdnn.sh, it failed because it required sox. Once I installed sox and restarted the TDNN part, it completed successfully.

I have a question about WER.

The RESULTS file has :

%WER 13.35 [ 2689 / 20138, 318 ins, 491 del, 1880 sub ] exp/chain/tdnn1a_sp/decode_tglarge_dev_clean_2/wer_9_0.5


while the best I got is:

%WER 16.17 [ 3256 / 20138, 391 ins, 482 del, 2383 sub ] exp/chain/tdnn1c_sp/decode_tglarge_dev_clean_2/wer_10_0.5
%WER 16.06 [ 3234 / 20138, 354 ins, 532 del, 2348 sub ] exp/chain/tdnn1c_sp_online/decode_tglarge_dev_clean_2/wer_11_0.5


I hope I am comparing the right ones this time (both the offline and online results are approximately the same for my experiment).

So there is about a 3% (more like ~2.8%) absolute difference (mine was worse).

Is the difference within expected range?

Or is it because I had only 5 epochs?

Thank you.
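[Editor's note: for anyone following along, the bracketed %WER fields decode as [ errors / reference words, insertions, deletions, substitutions ], with errors = ins + del + sub, so the gap above is an absolute WER difference (16.17 vs 13.35). A quick sanity check in Python, using the numbers from the checked-in RESULTS line:]

```python
# Sanity-check the %WER arithmetic from a Kaldi scoring line.
# errors = insertions + deletions + substitutions;
# WER = 100 * errors / number of reference words.
def wer_percent(ins, dels, subs, ref_words):
    errors = ins + dels + subs
    return errors, 100.0 * errors / ref_words

# Checked-in line: %WER 13.35 [ 2689 / 20138, 318 ins, 491 del, 1880 sub ]
errors, pct = wer_percent(318, 491, 1880, 20138)
print(errors, round(pct, 2))   # 2689 13.35
```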

Daniel Povey

unread,
Aug 15, 2017, 5:06:51 PM8/15/17
to Ri ki, kaldi-help
What is the output of chain_dir_info.pl on the directory and how does
it compare with what is in the checked-in script? That will show
whether you are undertraining (-> need more epochs) or overtraining
(->need fewer epochs).
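[Editor's note: one way to make that comparison concrete — this parser is my own sketch, not part of Kaldi — is to pull the final train/valid 'chain' log-probs out of a chain_dir_info.pl line and look at the gap; a larger gap (train much better than valid) suggests overtraining:]

```python
import re

def logprob_gap(info_line):
    """Final train logprob minus final valid logprob from a
    chain_dir_info.pl summary line. More positive = more overtraining."""
    m = re.search(r'logprob:train/valid\[[^\]]*\]=\(([^)]*)\)', info_line)
    train_part, valid_part = m.group(1).split('/')
    final_train = float(train_part.split(',')[-1])
    final_valid = float(valid_part.split(',')[-1])
    return final_train - final_valid

mine = ("exp/chain/tdnn1c_sp: logprob:train/valid[19,29,final]="
        "(-0.043,-0.035,-0.034/-0.108,-0.117,-0.113)")
checked_in = ("exp/chain/tdnn1c_sp: logprob:train/valid[10,16,final]="
              "(-0.081,-0.053,-0.046/-0.120,-0.096,-0.090)")
print(round(logprob_gap(mine), 3))        # 0.079
print(round(logprob_gap(checked_in), 3))  # 0.044
```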

Ri ki

unread,
Aug 15, 2017, 5:38:24 PM8/15/17
to kaldi-help, maria...@gmail.com, dpo...@gmail.com
Hi Dan,

Here is the info you had asked:

From my run:

exp/chain/tdnn1c_sp: num-iters=30 nj=1..1 num-params=7.0M dim=40+100->2337 combine=-0.039->-0.037 xent:train/valid[19,29,final]=(-0.972,-0.741,-0.718/-1.62,-1.64,-1.60) logprob:train/valid[19,29,final]=(-0.043,-0.035,-0.034/-0.108,-0.117,-0.113)


from the checked in values:

exp/chain/tdnn1c_sp: num-iters=17 nj=2..5 num-params=7.0M dim=40+100->2353 combine=-0.061->-0.050 xent:train/valid[10,16,final]=(-1.56,-1.17,-1.06/-1.85,-1.53,-1.46) logprob:train/valid[10,16,final]=(-0.081,-0.053,-0.046/-0.120,-0.096,-0.090)

Also, while looking for this, I came upon the actual checked-in WER for the tdnn1c_sp script, which is what I ran. This one shows about a 6% absolute difference (mine is worse).

The checked-in WER rates for tdnn1c_sp:

WER dev_clean_2 (tglarge)    10.45    <---- mine is 16.17
WER dev_clean_2 [online:]    10.56    <---- mine is 16.06



Thank you

Daniel Povey

unread,
Aug 15, 2017, 5:51:09 PM8/15/17
to Ri ki, kaldi-help
Yours is overtraining more. Try with about 2/3 the epochs you used.

Ri ki

unread,
Aug 15, 2017, 5:55:22 PM8/15/17
to kaldi-help, maria...@gmail.com, dpo...@gmail.com
ok. so I was using 5. I will use just 3 then?

Also, can I just restart at the tdnn step, or do I need to do the whole run.sh from stage 0? (I am assuming running run_tdnn.sh with epochs=3 should be OK, correct?)

thanks

Daniel Povey

unread,
Aug 15, 2017, 5:56:41 PM8/15/17
to kaldi-help, Ri ki
You can run the run_tdnn.sh script with --stage equal to whatever
stage of the run_tdnn.sh corresponds to the train.py call.
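As a concrete sketch (N is a placeholder — stage numbering differs between recipe versions, so open your copy of local/chain/run_tdnn.sh and find the stage guarding the steps/nnet3/chain/train.py call; the epochs variable is typically set near the top of the script):

```shell
# Hypothetical restart: rerun only from the training stage onward.
# Replace N with the stage number that wraps the train.py call in your
# copy of the script; set num_epochs (e.g. to 3) inside the script first
# unless your copy exposes it as a command-line option.
cd egs/mini_librispeech/s5
local/chain/run_tdnn.sh --stage N
```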

Ri ki

unread,
Aug 15, 2017, 5:59:33 PM8/15/17
to Daniel Povey, kaldi-help
ok. thanks once again

Ri ki

unread,
Aug 15, 2017, 8:30:07 PM8/15/17
to kaldi-help, dpo...@gmail.com
HI Dan,

So the new results are as follows:

exp/chain/tdnn1c_sp: num-iters=30 nj=1..1 num-params=7.0M dim=40+100->2337 combine=-0.046->-0.044 xent:train/valid[19,29,final]=(-0.972,-0.741,-0.968/-1.62,-1.64,-1.63) logprob:train/valid[19,29,final]=(-0.043,-0.035,-0.039/-0.108,-0.117,-0.104)


%WER 14.23 [ 2865 / 20138, 328 ins, 436 del, 2101 sub ] exp/chain/tdnn1c_sp/decode_tglarge_dev_clean_2/wer_9_1.0
%WER 14.32 [ 2884 / 20138, 327 ins, 440 del, 2117 sub ] exp/chain/tdnn1c_sp_online/decode_tglarge_dev_clean_2/wer_9_1.0



A little better than the last run, but still about 4% worse (absolute).

any suggestions?

thanks

Daniel Povey

unread,
Aug 15, 2017, 9:21:00 PM8/15/17
to Ri ki, kaldi-help
I think those diagnostics are not completely correct because you
overwrote the original directory and there were log files there.
Anyway mini_librispeech is just to show that things can run, it's not
a serious task so I'm not really bothered about this. You can
reproduce the results if you install GridEngine properly but don't
expect me to help much.

Ri ki

unread,
Aug 15, 2017, 9:45:48 PM8/15/17
to kaldi-help, maria...@gmail.com, dpo...@gmail.com
sounds good. thanks Dan.

Ri ki

unread,
Aug 15, 2017, 10:32:37 PM8/15/17
to kaldi-help, maria...@gmail.com, dpo...@gmail.com
Hi Dan,

These questions are more out of curiosity to better understand the kaldi process.

Acoustic Models:

1. The acoustic model in the nnet case is about 28 MB, while the GMM model is about 5.5 MB. What makes the nnet model so much larger?

HCLG.fst

2. In both cases (nnet vs GMM) the HCLG.fst is approximately the same size (about 550 MB). If we are composing with the nnet model, shouldn't the nnet HCLG.fst be a bit larger (to account for the 28 MB AM size vs the 5.5 MB GMM AM)?

(If you can just point me to some documentation about these instead, I can read it to better understand the concepts.)

Thanks for patiently answering all the questions.

Daniel Povey

unread,
Aug 15, 2017, 10:33:40 PM8/15/17
to Ri ki, kaldi-help
too many questions.

Ri ki

unread,
Aug 15, 2017, 10:38:36 PM8/15/17
to Daniel Povey, kaldi-help
sorry :))

I will try to figure them out.

thanks Dan.