Problem in Feature extraction

Sher Afghan Malik

Dec 27, 2017, 2:27:50 PM

to kaldi-help

Hello, I am new to Kaldi and I am following this example:
https://groups.google.com/forum/#!msg/kaldi-help/tzyCwt7zgMQ/wvCLpVVpBgAJ

But when I run the script, I encounter this problem:
sam@ubuntu:~/kaldi/egs/Digit$ ./run.sh

===== PREPARING ACOUSTIC DATA =====

===== FEATURES EXTRACTION =====

Checking data/train/text ...
--> reading data/train/text
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
utils/validate_data_dir.sh: no such file data/train/feats.scp (if this is by design, specify --no-feats)
utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea.
   Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html
   for more information.
Checking data/test/text ...
--> reading data/test/text
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
utils/validate_data_dir.sh: no such file data/test/feats.scp (if this is by design, specify --no-feats)
steps/make_mfcc.sh --nj 1 --cmd run.pl data/train exp/make_mfcc/train mfcc
Checking data/train/text ...
--> reading data/train/text
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
Mal-formed spk2gender file
steps/make_mfcc.sh --nj 1 --cmd run.pl data/test exp/make_mfcc/test mfcc
utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea.
   Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html
   for more information.
Checking data/test/text ...
--> reading data/test/text
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
Mal-formed spk2gender file
steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train mfcc
make_cmvn.sh: no such file data/train/feats.scp
steps/compute_cmvn_stats.sh data/test exp/make_mfcc/test mfcc
make_cmvn.sh: no such file data/test/feats.scp

Why is it not finding or creating the feats.scp file? I am using the script below.
#!/bin/bash

. ./path.sh || exit 1
. ./cmd.sh || exit 1

nj=1         # number of parallel jobs - 1 is perfect for such a small data set
lm_order=1     # language model order (n-gram quantity) - 1 is enough for digits grammar

# Safety mechanism (possible running this script with modified arguments)
. utils/parse_options.sh || exit 1
[[ $# -ge 1 ]] && { echo "Wrong arguments!"; exit 1; }

# Removing previously created data (from last run.sh execution)
rm -rf exp mfcc data/train/spk2utt data/train/cmvn.scp data/train/feats.scp data/train/split1 data/test/spk2utt data/test/cmvn.scp data/test/feats.scp data/test/split1 data/local/lang data/lang data/local/tmp data/local/dict/lexiconp.txt

echo
echo "===== PREPARING ACOUSTIC DATA ====="
echo

# Needs to be prepared by hand (or using self-written scripts):
#
# spk2gender    [<speaker-id> <gender>]
# wav.scp    [<utteranceID> <full_path_to_audio_file>]
# text        [<utteranceID> <text_transcription>]
# utt2spk    [<utteranceID> <speakerID>]
# corpus.txt    [<text_transcription>]

# Making spk2utt files
utils/utt2spk_to_spk2utt.pl data/train/utt2spk > data/train/spk2utt
utils/utt2spk_to_spk2utt.pl data/test/utt2spk > data/test/spk2utt

echo
echo "===== FEATURES EXTRACTION ====="
echo

# Making feats.scp files
mfccdir=mfcc
utils/validate_data_dir.sh data/train     # script for checking if prepared data is all right
# utils/fix_data_dir.sh data/train          # tool for data sorting if something goes wrong above
utils/validate_data_dir.sh data/test     # script for checking if prepared data is all right
#utils/fix_data_dir.sh data/test          # tool for data sorting if something goes wrong above
steps/make_mfcc.sh --nj $nj --cmd "$train_cmd" data/train exp/make_mfcc/train $mfccdir
steps/make_mfcc.sh --nj $nj --cmd "$train_cmd" data/test exp/make_mfcc/test $mfccdir

Daniel Povey

Dec 27, 2017, 3:37:14 PM

to kaldi-help

The first error seems to be
"Mal-formed spk2gender file"

That file is user-generated, so that was likely the problem.
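
A quick way to locate the offending lines is sketched below. It assumes that spk2gender must have exactly two whitespace-separated fields per line, with the gender written as a lowercase m or f, which is what utils/validate_data_dir.sh appears to check. The function name check_spk2gender is hypothetical and not part of Kaldi.

```shell
#!/bin/bash
# Hypothetical helper, not a Kaldi utility: print every line of a
# spk2gender file that does not look like "<speaker-id> <m|f>".
check_spk2gender() {
  local file="$1"
  # Assumption: a valid line has exactly two fields and a lowercase gender.
  # awk exits with status 1 if at least one bad line was printed.
  awk '!(NF == 2 && ($2 == "m" || $2 == "f")) {
         printf "line %d: %s\n", NR, $0
         bad = 1
       }
       END { exit bad + 0 }' "$file"
}

# Example (path taken from this thread):
# check_spk2gender data/train/spk2gender || echo "fix the lines listed above"
```

An uppercase gender such as "M" or "F" would be reported by this check, as would a line with a missing or extra field.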

--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/94b7da72-e6de-43ff-a99f-98081a24e49f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Sher Afghan Malik

Dec 31, 2017, 5:08:30 AM

to kaldi-help

I have corrected the spk2gender file. The issue was that I wrote the gender in capital letters. That is now OK, but it is still not creating the feats.scp file.

Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html
for more information.
Checking data/test/text ...
--> reading data/test/text
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces

utils/validate_data_dir.sh: Successfully validated data-directory data/test
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
run.pl: job failed, log is in exp/make_mfcc/test/make_mfcc_test.1.log

steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train mfcc
make_cmvn.sh: no such file data/train/feats.scp
steps/compute_cmvn_stats.sh data/test exp/make_mfcc/test mfcc
make_cmvn.sh: no such file data/test/feats.scp
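
When run.pl reports a failed job, the log file named in the message normally holds the underlying error, so it is the first place to look. The later "no such file feats.scp" messages are only a consequence of the failed MFCC job. A minimal sketch, using the log path printed above; show_failed_log is a hypothetical helper, not a Kaldi script.

```shell
#!/bin/bash
# Hypothetical helper: show the last lines of a job log, where the
# failing command's error message is usually found.
show_failed_log() {
  local log="$1"
  local lines="${2:-20}"   # how many trailing lines to print (default 20)
  if [ ! -f "$log" ]; then
    echo "no such log: $log" >&2
    return 1
  fi
  tail -n "$lines" "$log"
}

# Example (log path from the run.pl message in this thread):
# show_failed_log exp/make_mfcc/test/make_mfcc_test.1.log
```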

Sher Afghan Malik

Dec 31, 2017, 6:31:10 AM

to kaldi-help

Thanks, all the issues are resolved.


Sher Afghan Malik

Dec 31, 2017, 7:04:28 AM

to kaldi-help

What are good values for WER and SER?
My values are:
%WER 60.00 [ 6 / 10, 2 ins, 0 del, 4 sub ]
%SER 40.00 [ 4 / 10 ]
exp/tri1/decode/wer_11
%WER 50.00 [ 5 / 10, 1 ins, 0 del, 4 sub ]
%SER 40.00 [ 4 / 10 ]
exp/tri1/decode/wer_12
%WER 50.00 [ 5 / 10, 1 ins, 0 del, 4 sub ]
%SER 40.00 [ 4 / 10 ]
exp/tri1/decode/wer_13
%WER 50.00 [ 5 / 10, 1 ins, 0 del, 4 sub ]
%SER 40.00 [ 4 / 10 ]
exp/tri1/decode/wer_14
%WER 50.00 [ 5 / 10, 1 ins, 0 del, 4 sub ]
%SER 40.00 [ 4 / 10 ]
exp/tri1/decode/wer_15
%WER 50.00 [ 5 / 10, 1 ins, 0 del, 4 sub ]
%SER 40.00 [ 4 / 10 ]
exp/tri1/decode/wer_16
%WER 50.00 [ 5 / 10, 1 ins, 0 del, 4 sub ]
%SER 40.00 [ 4 / 10 ]
exp/tri1/decode/wer_17
%WER 50.00 [ 5 / 10, 1 ins, 0 del, 4 sub ]
%SER 40.00 [ 4 / 10 ]
exp/tri1/decode/wer_7
%WER 80.00 [ 8 / 10, 4 ins, 0 del, 4 sub ]
%SER 50.00 [ 5 / 10 ]
exp/tri1/decode/wer_8
%WER 60.00 [ 6 / 10, 2 ins, 0 del, 4 sub ]
%SER 40.00 [ 4 / 10 ]
exp/tri1/decode/wer_9
%WER 60.00 [ 6 / 10, 2 ins, 0 del, 4 sub ]
%SER 40.00 [ 4 / 10 ]


Daniel Povey

Dec 31, 2017, 3:55:09 PM

to kaldi-help

Lower WER/SER is better. But you only have 10 words in your test set, and that probably means your training set is pretty small. Normally you'd want tens of thousands of words for testing and, ideally, hundreds of thousands for training. The more the better.
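
For reference, the percentages in the %WER lines follow from the counts in the brackets: WER is (insertions + deletions + substitutions) divided by the number of reference words. A small sketch of that arithmetic; wer_percent is a hypothetical helper name, not a Kaldi tool.

```shell
#!/bin/bash
# Hypothetical helper: word error rate in percent from the counts shown
# in a "%WER ... [ errors / total, ins, del, sub ]" line.
wer_percent() {
  local ins="$1" del="$2" sub="$3" total="$4"
  awk -v i="$ins" -v d="$del" -v s="$sub" -v n="$total" \
    'BEGIN { printf "%.2f\n", 100 * (i + d + s) / n }'
}

wer_percent 2 0 4 10   # 2 ins, 0 del, 4 sub over 10 words: prints 60.00
wer_percent 1 0 4 10   # 1 ins, 0 del, 4 sub over 10 words: prints 50.00
```

With only 10 reference words, every single error moves the WER by 10 percentage points, which is why results on such a small test set say little about model quality.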

