Kaldi-help


Patrice Yemmene

Jun 23, 2017, 1:18:58 AM
to kaldi-help, Daniel Povey, Chiang, Chi-Lung
 Dear Dan,

Below are the results I am getting. Much better now, but I still need some guidance:

In the test and train folders under kaldi-trunk/egs/digits/data, the utt2spk files start out with the same data. However, after I run the script, the utt2spk file in the train folder is emptied. This seems to be why the "Invalid line in utt2spk file:" error message is printed. How can I fix this?


Also, I am curious what I may have done wrong in the feature extraction, mono training, and MAKING G.fst steps.

Any help would be appreciated.

Thank you





===== PREPARING ACOUSTIC DATA =====

Invalid line in utt2spk file:  
Invalid line in utt2spk file:  

===== FEATURES EXTRACTION =====

steps/make_mfcc.sh --nj 1 --cmd run.pl data/train exp/make_mfcc/train mfcc
utils/validate_data_dir.sh: empty file spk2utt
steps/make_mfcc.sh --nj 1 --cmd run.pl data/test exp/make_mfcc/test mfcc
utils/validate_data_dir.sh: empty file spk2utt
steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train mfcc
make_cmvn.sh: no such file data/train/feats.scp
steps/compute_cmvn_stats.sh data/test exp/make_mfcc/test mfcc
make_cmvn.sh: no such file data/test/feats.scp

===== PREPARING LANGUAGE DATA =====

utils/prepare_lang.sh data/local/dict <UNK> data/local/lang data/lang
Checking data/local/dict/silence_phones.txt ...
--> reading data/local/dict/silence_phones.txt
--> data/local/dict/silence_phones.txt is OK

Checking data/local/dict/optional_silence.txt ...
--> reading data/local/dict/optional_silence.txt
--> data/local/dict/optional_silence.txt is OK

Checking data/local/dict/nonsilence_phones.txt ...
--> reading data/local/dict/nonsilence_phones.txt
--> ERROR: empty line in data/local/dict/nonsilence_phones.txt (line 18)
--> ERROR: empty line in data/local/dict/nonsilence_phones.txt (line 19)
--> ERROR: empty line in data/local/dict/nonsilence_phones.txt (line 20)

Checking disjoint: silence_phones.txt, nonsilence_phones.txt
--> disjoint property is OK.

Checking data/local/dict/lexicon.txt
--> reading data/local/dict/lexicon.txt
--> ERROR: data/local/dict/lexicon.txt contains Carriage Return (^M) characters.

Checking data/local/dict/extra_questions.txt ...
--> data/local/dict/extra_questions.txt is empty (this is OK)
--> ERROR validating dictionary directory data/local/dict (see detailed error messages above)

*Error validating directory data/local/dict*

===== LANGUAGE MODEL CREATION =====
===== MAKING lm.arpa =====


===== MAKING G.fst =====

./run.sh: line 93: arpa2fst: command not found

===== MONO TRAINING =====

steps/train_mono.sh --nj 1 --cmd run.pl data/train data/lang exp/mono
cat: data/lang/oov.int: No such file or directory



Below is the script I am running:


#!/bin/bash
     . ./path.sh || exit 1
     . ./cmd.sh || exit 1
    nj=1       # number of parallel jobs - 1 is perfect for such a small data set
    lm_order=1 # language model order (n-gram quantity) - 1 is enough for digits grammar
     
    # Safety mechanism (allows running this script with modified arguments)
    . utils/parse_options.sh || exit 1
    [[ $# -ge 1 ]] && { echo "Wrong arguments!"; exit 1; }
    # Removing previously created data (from last run.sh execution)
    rm -rf exp mfcc data/train/spk2utt data/train/cmvn.scp data/train/feats.scp data/train/split1 data/test/spk2utt data/test/cmvn.scp data/test/feats.scp data/test/split1 data/local/lang data/lang data/local/tmp data/local/dict/lexiconp.txt
    
    echo
    echo "===== PREPARING ACOUSTIC DATA ====="
    echo
    
    # Needs to be prepared by hand (or using self-written scripts):
    #
    # spk2gender  [<speakerID> <gender>]
    # wav.scp     [<utteranceID> <full_path_to_audio_file>]
    # text        [<utteranceID> <text_transcription>]
    # utt2spk     [<utteranceID> <speakerID>]
    # corpus.txt  [<text_transcription>]
    
    # Making spk2utt files
    utils/utt2spk_to_spk2utt.pl data/train/utt2spk > data/train/spk2utt
    utils/utt2spk_to_spk2utt.pl data/test/utt2spk > data/test/spk2utt
    
    echo
    echo "===== FEATURES EXTRACTION ====="
    echo
    
    # Making feats.scp files
    mfccdir=mfcc
    # Uncomment and modify arguments in scripts below if you have any problems with data sorting
    # utils/validate_data_dir.sh data/train     # script for checking prepared data - here: for data/train directory
    # utils/fix_data_dir.sh data/train          # tool for sorting the data properly if needed - here: for data/train directory
    steps/make_mfcc.sh --nj $nj --cmd "$train_cmd" data/train exp/make_mfcc/train $mfccdir
    steps/make_mfcc.sh --nj $nj --cmd "$train_cmd" data/test exp/make_mfcc/test $mfccdir
   
    # Making cmvn.scp files
    steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train $mfccdir
    steps/compute_cmvn_stats.sh data/test exp/make_mfcc/test $mfccdir
    
    echo
    echo "===== PREPARING LANGUAGE DATA ====="
    echo
    
    # Needs to be prepared by hand (or using self written scripts):
    #
    # lexicon.txt           [<word> <phone 1> <phone 2> ...]
    # nonsilence_phones.txt    [<phone>]
    # silence_phones.txt    [<phone>]
    # optional_silence.txt  [<phone>]
    # Preparing language data
    utils/prepare_lang.sh data/local/dict "<UNK>" data/local/lang data/lang
    echo
    echo "===== LANGUAGE MODEL CREATION ====="
    echo "===== MAKING lm.arpa ====="
    echo
    loc=$(which ngram-count)
    if [ -z "$loc" ]; then
        if uname -a | grep 64 >/dev/null; then
            sdir=$KALDI_ROOT/tools/srilm/bin/i686-m64
        else
            sdir=$KALDI_ROOT/tools/srilm/bin/i686
        fi
        if [ -f "$sdir/ngram-count" ]; then
            echo "Using SRILM language modelling tool from $sdir"
            export PATH=$PATH:$sdir
        else
            echo "SRILM toolkit is probably not installed."
            echo "Instructions: tools/install_srilm.sh"
            exit 1
        fi
    fi
    local=data/local
    mkdir -p $local/tmp
    ngram-count -order $lm_order -write-vocab $local/tmp/vocab-full.txt -wbdiscount -text $local/corpus.txt -lm $local/tmp/lm.arpa
    echo
    echo "===== MAKING G.fst ====="
    echo
    lang=data/lang
    arpa2fst --disambig-symbol=#0 --read-symbol-table=$lang/words.txt $local/tmp/lm.arpa $lang/G.fst
    echo
    echo "===== MONO TRAINING ====="
    echo
    
    steps/train_mono.sh --nj $nj --cmd "$train_cmd" data/train data/lang exp/mono  || exit 1
   
   echo
   echo "===== MONO DECODING ====="
   echo
   
   utils/mkgraph.sh --mono data/lang exp/mono exp/mono/graph || exit 1
   steps/decode.sh --config conf/decode.config --nj $nj --cmd "$decode_cmd" exp/mono/graph data/test exp/mono/decode
   
   echo
  echo "===== MONO ALIGNMENT ====="
   echo
   
   steps/align_si.sh --nj $nj --cmd "$train_cmd" data/train data/lang exp/mono exp/mono_ali || exit 1
   
   echo
   echo "===== TRI1 (first triphone pass) TRAINING ====="
   echo
   
   steps/train_deltas.sh --cmd "$train_cmd" 2000 11000 data/train data/lang exp/mono_ali exp/tri1 || exit 1
   
   echo
   echo "===== TRI1 (first triphone pass) DECODING ====="
   echo
   
   utils/mkgraph.sh data/lang exp/tri1 exp/tri1/graph || exit 1
   steps/decode.sh --config conf/decode.config --nj $nj --cmd "$decode_cmd" exp/tri1/graph data/test exp/tri1/decode
  
   echo
   echo "===== run.sh script is finished ====="
   echo
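
The prepare_lang.sh output above reports carriage returns in lexicon.txt and empty lines in nonsilence_phones.txt. Stray blank lines may also be what triggers the "Invalid line in utt2spk file:" message, though the log alone does not confirm that. The following is a minimal sketch of one way to clean such files. `clean_text_file` is a hypothetical helper, not a Kaldi script, and the demo runs on a throwaway file in a temporary directory rather than on the real data:

```shell
#!/bin/bash
# Hypothetical helper (not part of Kaldi): strip carriage returns and
# drop empty or whitespace-only lines from a text file, in place.
clean_text_file() {
  local file=$1 tmp
  tmp=$(mktemp) || return 1
  # tr removes every CR; awk keeps only lines with at least one field.
  tr -d '\r' < "$file" | awk 'NF > 0' > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Demo on a throwaway file with Windows line endings and trailing blank lines.
demo=$(mktemp -d)
printf 'one W AH N\r\ntwo T UW\r\n\n\n' > "$demo/lexicon.txt"
clean_text_file "$demo/lexicon.txt"
cleaned=$(cat "$demo/lexicon.txt")
printf '%s\n' "$cleaned"
```

In the recipe itself, the same function could be pointed at data/local/dict/lexicon.txt, data/local/dict/nonsilence_phones.txt, and the utt2spk files before run.sh is started. Keep a backup copy of each file first.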

Daniel Povey

Jun 23, 2017, 1:23:13 AM
to Patrice Yemmene, kaldi-help, Chiang, Chi-Lung
I haven't heard of this problem before; you'll have to run the script step by
step and figure out which command removes utt2spk.
The "arpa2fst: command not found" error means you need to make sure Kaldi
is installed, and then ensure your path is set up. Because your
directory is one level shallower than a normal experiment directory, path.sh
should refer to ../.. where it would normally refer to ../../..
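
As a rough sketch of the path.sh change described above: the lines below assume the recipe directory sits two levels below the Kaldi root (as with kaldi-trunk/egs/digits) and that arpa2fst was built into src/lmbin. That layout is an assumption about the local checkout. A real path.sh usually sets up more directories than this, so treat it as an illustration of the `../..` adjustment only:

```shell
#!/bin/bash
# Recipe directory assumed to be <kaldi-root>/egs/digits, i.e. two levels
# below the root, so KALDI_ROOT is ../.. rather than the usual ../../..
export KALDI_ROOT=$(cd ../.. && pwd)

# Assumed layout: arpa2fst lives in src/lmbin, the OpenFst tools in tools/openfst/bin.
export PATH=$PWD/utils:$KALDI_ROOT/src/lmbin:$KALDI_ROOT/tools/openfst/bin:$PATH

# Quick check that the binary is now visible to the shell.
if command -v arpa2fst >/dev/null 2>&1; then
  arpa2fst_status="found at $(command -v arpa2fst)"
else
  arpa2fst_status="not found; check KALDI_ROOT=$KALDI_ROOT and that Kaldi is compiled"
fi
echo "arpa2fst: $arpa2fst_status"
```

If the check still reports "not found", either KALDI_ROOT points at the wrong directory or the binaries have not been built.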

Patrice Yemmene

Jun 23, 2017, 7:44:36 AM
to Daniel Povey, kaldi-help, Chiang, Chi-Lung
Thank you. I will review what I may have missed
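
For the step-by-step check suggested earlier in the thread, one possible approach is to wrap each command of run.sh in a small function that compares a checksum of utt2spk before and after the command runs. `watch_file_around` is a hypothetical name used only for this sketch. The demo uses a temporary file and a deliberately destructive command instead of the real recipe:

```shell
#!/bin/bash
# Hypothetical helper: run one command and report whether the watched
# file was modified (or removed) while that command ran.
watch_file_around() {
  local file=$1; shift
  local before after
  before=$(cksum "$file" 2>/dev/null)
  "$@"
  after=$(cksum "$file" 2>/dev/null)
  if [ "$before" != "$after" ]; then
    echo "CHANGED: $file was modified by: $*"
    return 1
  fi
  echo "unchanged: $file after: $*"
}

# Demo with a throwaway utt2spk file.
demo=$(mktemp -d)
printf 'utt1 spk1\nutt2 spk1\n' > "$demo/utt2spk"

harmless=0
watch_file_around "$demo/utt2spk" sort -c "$demo/utt2spk" || harmless=$?

destructive=0
watch_file_around "$demo/utt2spk" sh -c ": > '$demo/utt2spk'" || destructive=$?
```

In run.sh, each step could then be prefixed with `watch_file_around data/train/utt2spk`, for example in front of the `steps/make_mfcc.sh` call. Commands that use output redirection need to be wrapped in `sh -c '...'`, as in the demo.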

veerender reddy

Oct 20, 2017, 1:04:28 PM
to kaldi-help
Respected Dan,
I am having difficulty with MFCC extraction. The following error occurred when I tried the an4 dataset:

"steps/make_mfcc.sh --nj 1 --cmd  data/test exp/make_mfcc/test mfcc
steps/make_mfcc.sh: empty argument to --cmd option

steps/compute_cmvn_stats.sh data/test exp/make_mfcc/test mfcc
make_cmvn.sh: no such file data/test/feats.scp
fix_data_dir.sh: kept all 130 utterances.
fix_data_dir.sh: old files are kept in data/test/.backup
steps/make_mfcc.sh --nj 1 --cmd  data/train exp/make_mfcc/train mfcc
steps/make_mfcc.sh: empty argument to --cmd option

steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train mfcc
make_cmvn.sh: no such file data/train/feats.scp
fix_data_dir.sh: kept all 948 utterances.
fix_data_dir.sh: old files are kept in data/train/.backup"

I am new to Kaldi and unable to sort this out. Please help me with this issue.

Daniel Povey

Oct 20, 2017, 1:09:45 PM
to kaldi-help
Likely you did not set the "train_cmd" shell variable; it should be something like
train_cmd=run.pl
or
train_cmd=queue.pl

Check your cmd.sh.
That recipe's run.sh does:
. cmd.sh
instead of (what I recommend):
. ./cmd.sh
so if you have another file called cmd.sh somewhere on your path, it
would not be invoking the current directory's cmd.sh.
Change run.sh to say:
. ./cmd.sh
That might help.
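
To illustrate the point about sourcing: in bash, `. cmd.sh` (no slash in the name) searches the directories in PATH before falling back to the current directory, whereas `. ./cmd.sh` always reads the file in the current directory. The sketch below creates a throwaway cmd.sh in a temporary directory, so its contents are an assumption made for the demo and not those of any real recipe. It also adds a guard that fails early when train_cmd is empty:

```shell
#!/bin/bash
# Demo directory with a minimal, made-up cmd.sh.
demo=$(mktemp -d)
cd "$demo" || exit 1
echo 'export train_cmd=run.pl' > cmd.sh

# "./" forces the shell to read the cmd.sh in the current directory.
. ./cmd.sh

# Fail early instead of passing an empty string to --cmd later on.
if [ -z "${train_cmd:-}" ]; then
  echo "train_cmd is empty; check cmd.sh" >&2
  exit 1
fi
echo "train_cmd=$train_cmd"
```

A guard like this near the top of run.sh would turn the "empty argument to --cmd option" failure into an immediate and more obvious error.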
> --
> Go to http://kaldi-asr.org/forums.html find out how to join
> ---
> You received this message because you are subscribed to the Google Groups
> "kaldi-help" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kaldi-help+...@googlegroups.com.
> To post to this group, send email to kaldi...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/kaldi-help/eaa5f33e-d194-4188-a78d-32610c91582b%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

veerender reddy

Oct 20, 2017, 1:21:50 PM
to kaldi-help
Thanks, it worked, sir. I had forgotten to run . ./cmd.sh when I opened the terminal.

veerender reddy

Nov 16, 2017, 8:25:17 PM
to kaldi-help


Respected Dan,
I placed the TIMIT dataset at "/home/veerender/speech-datasets/TIMIT", but when I executed the line "timit= /home/veerender/speech-datasets/TIMIT" after setting the acoustic model parameters, I got the following error:

"bash: /home/veerender/speech-datasets/TIMIT: Is a directory"

When I worked with the an4 dataset I placed it in a similar path and it worked, but in this case the shell reports that the path is a directory instead of setting the variable to point to its location. May I know how to handle this?




Daniel Povey

Nov 16, 2017, 8:28:02 PM
to kaldi-help
Remove the space after '='.
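
A small demonstration of why the space matters: with a space after `=`, the shell treats `timit=` as a temporary empty assignment for a single command and then tries to execute the path as that command, which fails because it is a directory. The sketch below uses a temporary directory as a stand-in for the real corpus path, and the variable names other than `timit` are made up for the demo:

```shell
#!/bin/bash
# Hypothetical stand-in for the real corpus directory.
corpus_dir=$(mktemp -d)

# With a space: the shell tries to run the directory as a command.
# The error message is silenced here; only the exit status is kept.
unset timit
status_with_space=0
timit= "$corpus_dir" 2>/dev/null || status_with_space=$?
value_with_space=${timit-unset}

# Without the space: a plain variable assignment.
timit=$corpus_dir

echo "with space:    exit status $status_with_space, timit is '$value_with_space'"
echo "without space: timit is '$timit'"
```

After the failed attempt the variable is still unset, which is why a later `local/timit_data_prep.sh $timit` would receive no argument.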



veerender reddy

Nov 17, 2017, 1:22:28 PM
to kaldi-help


Respected Dan,
I tried to run "local/timit_data_prep.sh $timit" and encountered the following error:


" wav-to-duration --read-entire-file=true scp:train_wav.scp ark,t:train_dur.ark

LOG (wav-to-duration[5.2.97~1-5846a2d]:main():wav-to-duration.cc:92) Printed duration for 3696 audio files.
LOG (wav-to-duration[5.2.97~1-5846a2d]:main():wav-to-duration.cc:94) Mean duration was 3.06336, min and max durations were 0.91525, 7.78881
awk: line 12: function gensub never defined  "


Is it related to the train and test files not loading properly? I would like to know what these errors mean and how to fix them.

Daniel Povey

Nov 17, 2017, 1:23:25 PM
to kaldi-help, Karel Vesely
That is an awk version issue.  In general I prefer to use perl for things like this, as it has fewer compatibility problems.
Karel, do you have time to fix this?
Dan
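
Some background on the awk issue: `gensub` is a GNU awk (gawk) extension, so other awk implementations such as mawk reject it with a message like the one quoted above. The sketch below first checks whether the default `awk` provides `gensub`, then shows a portable substitute that uses `gsub` on a copy of the string. The sample input is made up and is not the expression used in timit_data_prep.sh:

```shell
#!/bin/bash
# Report whether the default awk provides gensub (a gawk extension).
if awk 'BEGIN { x = gensub(/a/, "b", "g", "aa"); if (x == "bb") exit 0; exit 1 }' 2>/dev/null; then
  gensub_support="yes"
else
  gensub_support="no"
fi
echo "awk supports gensub: $gensub_support"

# Portable alternative: gsub edits its target in place, so work on a copy.
portable=$(echo "a.b.c" | awk '{ s = $0; gsub(/\./, "_", s); print s }')
echo "$portable"
```

When the check prints "no", the options are to install gawk and make sure it is the `awk` that the script picks up, or to rewrite the affected line along the lines of the `gsub` form above, or in perl as suggested in the reply.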

