Why is WER NaN?

asr shylock

Apr 4, 2021, 5:25:05 AM
to kaldi-help
Dear @Dan:
  We are training an ASR model with Kaldi on our own data, using the usual training procedure. However, when we decode with the monophone model, the results (WER and SER) are NaN, and we don't know why. The output is as follows:

%WER -nan [ 0 / 0, 0 ins, 0 del, 0 sub ] [PARTIAL] exp/mono/decode_test/cer_7_0.0
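For reference, the bracketed part of that line is the number of errors over the number of reference tokens, followed by the insertion, deletion and substitution counts. The error rate is their ratio, so when no reference tokens are scored at all it becomes 0/0, which is printed as -nan:

```latex
\mathrm{WER} = 100 \times \frac{I + D + S}{N},
\qquad \text{here:}\quad
100 \times \frac{0 + 0 + 0}{0} = 100 \times \frac{0}{0} = \text{NaN}
```

Here I, D and S are insertions, deletions and substitutions, and N is the number of reference tokens (characters rather than words for a cer_* file such as this one). A NaN therefore means that nothing was scored, not that the model is bad, which suggests looking at what was fed into scoring.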

Daniel Povey

Apr 4, 2021, 5:49:48 AM
to kaldi...@googlegroups.com
I don't know. Look in the decoding logs (files named *.log).


--
Go to http://kaldi-asr.org/forums.html to find out how to join the kaldi-help group
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/bcfeabe4-d99f-403c-8cf0-d8ce1d392c01n%40googlegroups.com.

asr shylock

Apr 4, 2021, 9:46:56 AM
to kaldi-help
Thanks Dan:

  We checked "exp/mono/decode_test/scoring_kaldi/log/stats1.log".
  It contains the following warning: "WARNING (align-text[5.5]:main():align-text.cc:88) Key xxxxxxxxxxx is in ark:exp/mono/decode_test/scoring_kaldi/test_filt.txt, but not in ark:-"
  So we suspect the problem is here.

Thx

saurabh vyas

Apr 4, 2021, 10:05:17 AM
to kaldi...@googlegroups.com
Check whether both files (hypothesis and reference) contain all of the utterance ids.

You can extract the first column (the utterance ids) from both files and use the diff command to find any differences.
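A minimal Python equivalent of that first-column diff, as a sketch: it assumes both transcripts are plain Kaldi text files (utterance id first, then the tokens), and the script name and arguments are placeholders. In the scoring run from this thread the hypothesis is piped in as ark:-, so it would have to be written to a file first.

```python
import sys
from typing import Iterable, Set, Tuple


def utt_ids(lines: Iterable[str]) -> Set[str]:
    """Return the first whitespace-separated field (the utterance id) of each non-empty line."""
    ids = set()
    for line in lines:
        fields = line.split()
        if fields:
            ids.add(fields[0])
    return ids


def compare_ids(ref_lines: Iterable[str], hyp_lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (ids only in the reference, ids only in the hypothesis)."""
    ref, hyp = utt_ids(ref_lines), utt_ids(hyp_lines)
    return ref - hyp, hyp - ref


def main(ref_path: str, hyp_path: str) -> None:
    with open(ref_path, encoding="utf-8") as f_ref, open(hyp_path, encoding="utf-8") as f_hyp:
        only_ref, only_hyp = compare_ids(f_ref, f_hyp)
    for utt in sorted(only_ref):
        print(f"only in reference:  {utt}")
    for utt in sorted(only_hyp):
        print(f"only in hypothesis: {utt}")


# Hypothetical usage: python compare_ids.py REF_TEXT HYP_TEXT
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Ids reported as "only in reference" would correspond to the align-text warning quoted earlier in the thread (a key present in test_filt.txt but missing from the hypothesis stream).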

asr shylock

Apr 5, 2021, 3:51:21 AM
to kaldi-help
Thanks Saurabh:
  This is very strange. I spent a long time checking text, wav.scp, utt2spk and spk2utt, and all of the files look normal.
  I also checked all of the monophone training logs, and only exp/mono/decode_test/scoring_kaldi/log/stats1.log shows a problem.

  I don't know how to solve this problem.
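Kaldi ships utils/validate_data_dir.sh for checking a data directory. As a rough illustration of the kind of cross-file check described above (not a reimplementation of that script), here is a Python sketch with hypothetical function names. It assumes the standard formats: text is "utt words...", wav.scp is "utt rxfilename", utt2spk is "utt spk", and spk2utt is "spk utt1 utt2 ...".

```python
import os
import sys
from collections import defaultdict
from typing import Dict, Iterable, List


def read_table(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map the first field (the key) of each non-empty line to its remaining fields."""
    table: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.split()
        if fields:
            table[fields[0]] = fields[1:]
    return table


def check_data_dir(text: Iterable[str], wav_scp: Iterable[str],
                   utt2spk: Iterable[str], spk2utt: Iterable[str]) -> List[str]:
    """Return human-readable problems found across the four files (empty list if none)."""
    problems: List[str] = []
    text_t, wav_t, u2s_t, s2u_t = (read_table(x) for x in (text, wav_scp, utt2spk, spk2utt))

    # text and wav.scp should cover exactly the utterances listed in utt2spk.
    for name, table in (("text", text_t), ("wav.scp", wav_t)):
        for utt in sorted(set(u2s_t) - set(table)):
            problems.append(f"{utt} is in utt2spk but not in {name}")
        for utt in sorted(set(table) - set(u2s_t)):
            problems.append(f"{utt} is in {name} but not in utt2spk")

    # spk2utt should be the exact inverse of utt2spk.
    expected = defaultdict(set)
    for utt, rest in u2s_t.items():
        if len(rest) != 1:
            problems.append(f"utt2spk line for {utt} does not have exactly one speaker")
            continue
        expected[rest[0]].add(utt)
    actual = {spk: set(utts) for spk, utts in s2u_t.items()}
    if actual != dict(expected):
        problems.append("spk2utt is not the inverse of utt2spk")
    return problems


def main(data_dir: str) -> int:
    files = {}
    for name in ("text", "wav.scp", "utt2spk", "spk2utt"):
        with open(os.path.join(data_dir, name), encoding="utf-8") as f:
            files[name] = f.read().splitlines()
    problems = check_data_dir(files["text"], files["wav.scp"], files["utt2spk"], files["spk2utt"])
    for p in problems:
        print(p)
    return 1 if problems else 0


# Hypothetical usage: python check_data_dir.py data/test
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

Note that this only compares keys, so files can pass it and still contain problems elsewhere on the line, such as invisible trailing characters.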

Looking forward to your reply.

saurabh vyas

Apr 5, 2021, 4:57:49 AM
to kaldi...@googlegroups.com
You need to look at the scoring scripts used in your recipe in detail; without knowing which scoring script you are using, it is very hard to debug.

asr shylock

Apr 6, 2021, 3:05:42 AM
to kaldi-help
Thanks @Dan, @saurabh:
  We solved this problem by running dos2unix on our lexicon and on our training and test data, converting Windows (CRLF) line endings to Unix (LF) ones.
  We also fixed some pitch errors in lexicon.txt.
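dos2unix converts Windows-style CR LF line endings to Unix LF. A trailing carriage return is invisible in most editors and terminals, which may be why the data files looked normal when inspected. The Python sketch below (function names and script name are hypothetical) reports which lines of a file end in CR LF and shows the equivalent conversion; it handles only CR LF pairs, not the other options dos2unix offers.

```python
import sys
from typing import List


def crlf_line_numbers(data: bytes) -> List[int]:
    """Return the 1-based numbers of lines that end in a Windows-style CR LF."""
    return [i for i, line in enumerate(data.split(b"\n"), start=1) if line.endswith(b"\r")]


def to_unix(data: bytes) -> bytes:
    """Convert CR LF pairs to LF, leaving everything else untouched."""
    return data.replace(b"\r\n", b"\n")


# Report (without modifying) every file named on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            bad = crlf_line_numbers(f.read())
        if bad:
            print(f"{path}: {len(bad)} CRLF line(s), first at line {bad[0]}")
```

For example, `python check_crlf.py data/local/dict/lexicon.txt data/test/text` (illustrative paths; use your recipe's own) would flag the files that need converting before rerunning data preparation. Reading the files as bytes avoids any dependence on their text encoding.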

Thx,
shylock