Hi, I am new to Kaldi and I am following this example:
https://groups.google.com/forum/#!msg/kaldi-help/tzyCwt7zgMQ/wvCLpVVpBgAJ
But when I run the script I get the following problem.
sam@ubuntu:~/kaldi/egs/Digit$ ./run.sh
===== PREPARING ACOUSTIC DATA =====
===== FEATURES EXTRACTION =====
Checking data/train/text ...
--> reading data/train/text
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
utils/validate_data_dir.sh: no such file data/train/feats.scp (if this is by design, specify --no-feats)
utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea.
Search for the word 'bold' in
http://kaldi-asr.org/doc/data_prep.html for more information.
Checking data/test/text ...
--> reading data/test/text
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
utils/validate_data_dir.sh: no such file data/test/feats.scp (if this is by design, specify --no-feats)
steps/make_mfcc.sh --nj 1 --cmd run.pl data/train exp/make_mfcc/train mfcc
Checking data/train/text ...
--> reading data/train/text
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
Mal-formed spk2gender file
steps/make_mfcc.sh --nj 1 --cmd run.pl data/test exp/make_mfcc/test mfcc
utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea.
Search for the word 'bold' in
http://kaldi-asr.org/doc/data_prep.html for more information.
Checking data/test/text ...
--> reading data/test/text
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
Mal-formed spk2gender file
steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train mfcc
make_cmvn.sh: no such file data/train/feats.scp
steps/compute_cmvn_stats.sh data/test exp/make_mfcc/test mfcc
make_cmvn.sh: no such file data/test/feats.scp
Why is it not finding or creating the feats.scp file? I am using the script below.
#!/bin/bash
. ./path.sh || exit 1
. ./cmd.sh || exit 1
nj=1 # number of parallel jobs - 1 is perfect for such a small data set
lm_order=1 # language model order (n-gram quantity) - 1 is enough for digits grammar
# Safety mechanism (possible running this script with modified arguments)
. utils/parse_options.sh || exit 1
[[ $# -ge 1 ]] && { echo "Wrong arguments!"; exit 1; }
# Removing previously created data (from last run.sh execution)
rm -rf exp mfcc data/train/spk2utt data/train/cmvn.scp data/train/feats.scp data/train/split1 data/test/spk2utt data/test/cmvn.scp data/test/feats.scp data/test/split1 data/local/lang data/lang data/local/tmp data/local/dict/lexiconp.txt
echo
echo "===== PREPARING ACOUSTIC DATA ====="
echo
# Needs to be prepared by hand (or using self written scripts):
#
# spk2gender [<speaker-id> <gender>]
# wav.scp [<utteranceID> <full_path_to_audio_file>]
# text [<utteranceID> <text_transcription>]
# utt2spk [<utteranceID> <speakerID>]
# corpus.txt [<text_transcription>]
# Making spk2utt files
utils/utt2spk_to_spk2utt.pl data/train/utt2spk > data/train/spk2utt
utils/utt2spk_to_spk2utt.pl data/test/utt2spk > data/test/spk2utt
echo
echo "===== FEATURES EXTRACTION ====="
echo
# Making feats.scp files
mfccdir=mfcc
utils/validate_data_dir.sh data/train # script for checking if prepared data is all right
# utils/fix_data_dir.sh data/train # tool for data sorting if something goes wrong above
utils/validate_data_dir.sh data/test # script for checking if prepared data is all right
#utils/fix_data_dir.sh data/test # tool for data sorting if something goes wrong above
steps/make_mfcc.sh --nj $nj --cmd "$train_cmd" data/train exp/make_mfcc/train $mfccdir
steps/make_mfcc.sh --nj $nj --cmd "$train_cmd" data/test exp/make_mfcc/test $mfccdir
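
In the log above, "Mal-formed spk2gender file" is printed right after each steps/make_mfcc.sh call, and feats.scp is missing afterwards, so the spk2gender files may be worth checking first. Below is a minimal sketch of such a check. It assumes each line should be exactly "<speaker-id> <gender>" with gender written as m or f, as described on the data_prep.html page linked in the log. check_spk2gender is a hypothetical helper name, not part of Kaldi.

```shell
#!/bin/bash
# Hypothetical helper (not part of Kaldi): list every line of a spk2gender
# file that is not exactly two fields with "m" or "f" as the second field.
# Assumption: this is the format the validation step expects.
check_spk2gender() {
  local file="$1"
  awk 'NF != 2 || ($2 != "m" && $2 != "f") {
         printf "line %d: %s\n", NR, $0   # show the offending line
         bad = 1
       }
       END { exit bad + 0 }' "$file"      # non-zero status if any line failed
}

# Possible usage with the directories from the script above:
# check_spk2gender data/train/spk2gender || echo "data/train/spk2gender needs fixing"
# check_spk2gender data/test/spk2gender  || echo "data/test/spk2gender needs fixing"
```

If the helper prints any lines, fixing those entries (for example writing "m" instead of "male") and re-running run.sh would be the next thing to try.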