Error in making MFCC features


Sidrah Azhar

Jul 18, 2018, 1:54:12 AM
to kaldi-help
Hi,
I have 23 speakers in total: 12 for training and 11 for testing. My data/local/train.spk2utt file (attached) is in the right format, and data/local/spk2gender (attached) lists all speakers, i.e. both train and test. But when I reach this piece of code:

# Now make MFCC features.
# mfccdir should be some place with a largish disk where you
# want to store MFCC features.
mfccdir=${DATA_ROOT}/mfcc
for x in train test; do
 steps/make_mfcc.sh --cmd "$train_cmd" --nj $njobs \
   data/$x exp/make_mfcc/$x $mfccdir || exit 1;
 steps/compute_cmvn_stats.sh data/$x exp/make_mfcc/$x $mfccdir || exit 1;
done

utils/validate_data_dir.sh gives me this error:

utils/validate_data_dir.sh: Error: in data/train, speaker lists extracted from spk2utt and spk2gender
utils/validate_data_dir.sh: differ, partial diff is:
1,12d0
< ahmed
< ayesha
< fatima
< iqra
< izza
...
< khatija
< laiba
< lozina
< mubarra
< muneeb
< shiza
[Lengths are /tmp/kaldi.2TxI/speakers=12 versus /tmp/kaldi.2TxI/speakers.spk2gender=0]

Any help is appreciated!
Thanks.


train.spk2utt
spk2gender
data-local_directory.png

Daniel Povey

Jul 18, 2018, 1:55:32 AM
to kaldi-help
It looks like you have an empty spk2gender file. Either delete it or create a valid one.
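
The last line of the error output reports 12 speakers from spk2utt versus 0 from spk2gender, which is what points to an empty spk2gender in data/train. A minimal Python sketch of the same kind of comparison follows. It assumes both files hold one entry per line with the speaker ID as the first whitespace-separated field; the function names are hypothetical, and this is not the code that validate_data_dir.sh itself runs.

```python
def read_speakers(path):
    """Return the set of speaker IDs (first field of each non-blank line)."""
    speakers = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if fields:  # skip blank lines
                speakers.add(fields[0])
    return speakers


def compare_speakers(spk2utt_path, spk2gender_path):
    """Return (only_in_spk2utt, only_in_spk2gender) as sorted lists."""
    in_spk2utt = read_speakers(spk2utt_path)
    in_spk2gender = read_speakers(spk2gender_path)
    return sorted(in_spk2utt - in_spk2gender), sorted(in_spk2gender - in_spk2utt)
```

Calling compare_speakers("data/train/spk2utt", "data/train/spk2gender") with an empty spk2gender would put every training speaker in the first list and leave the second list empty.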

--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/ae2983b8-8567-4a50-8525-bbeb3126e58b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Sidrah Azhar

Jul 18, 2018, 2:01:59 AM
to kaldi-help
I checked, and the spk2gender file is not empty. I have been stuck on this error for almost 2 days and can't figure out what I am doing wrong.
spk2gender.png

Sidrah Azhar

Jul 18, 2018, 2:06:34 AM
to kaldi-help
The spk2gender file in the data/train directory is empty. But why?

data-train_directory.png

xiaoy...@gmail.com

Jul 18, 2018, 7:04:49 AM
to kaldi-help
Because the spk2gender file lists 23 speakers in total, while train.utt2spk contains only 12 speakers, and Kaldi expects the two files to cover the same set of speakers.
I wrote a Python script to remove the extra speakers from the spk2gender file, and the data directory then passed the utils/validate_data_dir.sh check.
spk2gender
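
The script itself was not posted in the thread. A minimal Python sketch of that kind of filter follows, assuming spk2gender has lines of the form `<speaker> <gender>` and spk2utt has the speaker ID as its first field. The function name and argument names are hypothetical.

```python
def filter_spk2gender(spk2utt_path, spk2gender_in, spk2gender_out):
    """Copy only the spk2gender lines whose speaker appears in spk2utt.

    Returns the number of lines written.
    """
    # Collect the speaker IDs that actually occur in this data subset.
    with open(spk2utt_path, encoding="utf-8") as f:
        keep = {line.split()[0] for line in f if line.strip()}

    kept = 0
    with open(spk2gender_in, encoding="utf-8") as src, \
         open(spk2gender_out, "w", encoding="utf-8") as dst:
        for line in src:
            fields = line.split()
            if fields and fields[0] in keep:
                dst.write(line)  # input order is preserved
                kept += 1
    return kept
```

For the training set this could be called as filter_spk2gender("data/train/spk2utt", "data/local/spk2gender", "data/train/spk2gender"). Kaldi also ships utils/filter_scp.pl, which filters a table by a list of IDs, so `utils/filter_scp.pl data/train/spk2utt data/local/spk2gender > data/train/spk2gender` should have a similar effect.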

Sidrah Azhar

Jul 18, 2018, 7:06:38 AM
to kaldi-help
Thank you so much.