Kaldi for dummies


Ana Montalvo

Jun 22, 2016, 9:09:26 AM
to kaldi-help
Hi all,
I am trying to build a simple ASR system using my own data, following the tutorial shared by W. Zielinski. I am getting this error:
copy-feats --compress=true ark:- ark,scp:/home/ana-cuda/Desktop/kaldi-trunk/egs/Digits_verivoz/mfcc/raw_mfcc_test.1.ark,/home/ana-cuda/Desktop/kaldi-trunk/egs/Digits_verivoz/mfcc/raw_mfcc_test.1.scp
compute-mfcc-feats --verbose=2 --config=conf/mfcc.conf scp,p:exp/make_mfcc/data/test/wav_test.1.scp ark:-

What could be happening?
Is it a problem with the signal format?
Thanks in advance
ana

Daniel Povey

Jun 22, 2016, 12:20:03 PM
to kaldi-help
I don't see an error there.

ahmet gülüm

Oct 21, 2016, 4:50:22 AM
to kaldi-help
Hi, I am following the same tutorial. When I run run.sh, it gives me this:

===== PREPARING ACOUSTIC DATA =====


===== FEATURES EXTRACTION =====

steps/make_mfcc.sh --nj 1 --cmd run.pl data/train exp/make_mfcc/train mfcc
utils/validate_data_dir.sh: Successfully validated data-directory data/train
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
Succeeded creating MFCC features for train
steps/make_mfcc.sh --nj 1 --cmd run.pl data/test exp/make_mfcc/test mfcc
utils/validate_data_dir.sh: WARNING: you have only one speaker.  This probably a bad idea.
   Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html
   for more information.
utils/validate_data_dir.sh: Successfully validated data-directory data/test
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
Succeeded creating MFCC features for test
steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train mfcc
Succeeded creating CMVN stats for train
steps/compute_cmvn_stats.sh data/test exp/make_mfcc/test mfcc
Succeeded creating CMVN stats for test

===== PREPARING LANGUAGE DATA =====

utils/prepare_lang.sh data/local/dict <UNK> data/local/lang data/lang
Checking data/local/dict/silence_phones.txt ...
--> reading data/local/dict/silence_phones.txt
--> data/local/dict/silence_phones.txt is OK

Checking data/local/dict/optional_silence.txt ...
--> reading data/local/dict/optional_silence.txt
--> data/local/dict/optional_silence.txt is OK

Checking data/local/dict/nonsilence_phones.txt ...
--> reading data/local/dict/nonsilence_phones.txt
--> data/local/dict/nonsilence_phones.txt is OK

Checking disjoint: silence_phones.txt, nonsilence_phones.txt
--> disjoint property is OK.

Checking data/local/dict/lexicon.txt
--> reading data/local/dict/lexicon.txt
--> data/local/dict/lexicon.txt is OK

Checking data/local/dict/extra_questions.txt ...
--> data/local/dict/extra_questions.txt is empty (this is OK)
--> SUCCESS [validating dictionary directory data/local/dict]

**Creating data/local/dict/lexiconp.txt from data/local/dict/lexicon.txt
sym2int.pl: undefined symbol <UNK> (in position 1)

===== LANGUAGE MODEL CREATION =====
===== MAKING lm.arpa =====


===== MAKING G.fst =====

arpa2fst -
LOG (arpa2fst:Read():arpa-file-parser.cc:90) Reading \data\ section.
LOG (arpa2fst:Read():arpa-file-parser.cc:145) Reading \1-grams: section.
LOG (arpa2fst:RemoveRedundantStates():arpa-lm-compiler.cc:341) Reduced num-states from 3 to 3

===== MONO TRAINING =====

steps/train_mono.sh --nj 1 --cmd run.pl data/train data/lang exp/mono
steps/train_mono.sh: Initializing monophone system.

What is wrong? Can anyone help me?
Best Regards
Ahmet

Danijel Korzinek

Oct 21, 2016, 7:11:39 AM
to kaldi-help
You have a line that reads "sym2int.pl: undefined symbol <UNK> (in position 1)"

Just add "<UNK> sil" to the top of your lexicon.txt file and try again.
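
If you would rather script that edit than do it by hand, here is a minimal sketch. It assumes the lexicon lives at data/local/dict/lexicon.txt (the path shown in the log above) and that "sil" is one of the phones in your silence_phones.txt. The function name ensure_oov_entry is made up for this example and is not part of Kaldi.

```python
import tempfile
from pathlib import Path


def ensure_oov_entry(lexicon_path, oov_word="<UNK>", oov_phone="sil"):
    """Prepend '<oov_word> <oov_phone>' to the lexicon unless the word is already listed.

    Hypothetical helper, not a Kaldi utility. Returns True if the file was changed.
    """
    path = Path(lexicon_path)
    lines = path.read_text(encoding="utf-8").splitlines()
    # The first whitespace-separated field on each lexicon line is the word.
    words = {line.split()[0] for line in lines if line.strip()}
    if oov_word in words:
        return False
    new_lines = [f"{oov_word} {oov_phone}"] + lines
    path.write_text("\n".join(new_lines) + "\n", encoding="utf-8")
    return True


if __name__ == "__main__":
    # Demo on a throwaway copy; point it at data/local/dict/lexicon.txt for real use.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "lexicon.txt"
        demo.write_text("yes Y\nno N\n", encoding="utf-8")
        ensure_oov_entry(demo)
        print(demo.read_text(encoding="utf-8"))
```

After the lexicon has the entry, run run.sh again so that utils/prepare_lang.sh rebuilds data/lang from it.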

ahmet gülüm

Oct 21, 2016, 7:25:29 AM
to kaldi-help
That worked, thank you so much.
Best Regards
Ahmet

Sabr Tasbolatov

Oct 24, 2016, 8:15:40 AM
to kaldi-help
+ahmet gülüm,

Could you please share the dataset you used for this tutorial? I couldn't find any ready-made digits data except this repo with one speaker.

Thanks,
Sabr

ahmet gülüm

Oct 24, 2016, 10:42:03 AM
to kaldi...@googlegroups.com
Unfortunately, I used the yes/no dataset for this tutorial. If you want, I can share it with you.
Best Regards
Ahmet
