Error in GMM train, in lmrescore_const_arpa


Igor S

Dec 13, 2017, 6:03:57 AM
to kaldi-help
Hi,

I am trying to train a GMM, based on the TEDLIUM example.
While running stage 12 I got the following message:
"utils/mkgraph.sh: exp/tri2/graph_nosp/HCLG.fst is up to date.
steps/decode.sh --nj 8 --cmd run.pl --num-threads 4 exp/tri2/graph_nosp data/train exp/tri2/decode_nosp_train
decode.sh: feature type is lda
steps/diagnostic/analyze_lats.sh --cmd run.pl exp/tri2/graph_nosp exp/tri2/decode_nosp_train
analyze_phone_length_stats.py: WARNING: optional-silence SIL is seen only 59.6349467849% of the time at utterance begin.  This may not be optimal.
analyze_phone_length_stats.py: WARNING: optional-silence SIL is seen only 73.2390192028% of the time at utterance end.  This may not be optimal.
steps/diagnostic/analyze_lats.sh: see stats in exp/tri2/decode_nosp_train/log/analyze_alignments.log
Overall, lattice depth (10,50,90-percentile)=(1,6,77) and mean=28.1
steps/diagnostic/analyze_lats.sh: see stats in exp/tri2/decode_nosp_train/log/analyze_lattice_depth_stats.log
run.pl: 4 / 11 failed, log is in exp/tri2/decode_nosp_train/scoring/log/score.*.0.0.log
steps/decode.sh: Scoring failed. (ignore by '--skip-scoring true') "

Some of the scoring logs contain an error, while others seem fine. Here is the error in 'exp/tri2/decode_nosp_train/scoring/log/score.7.0.0.log':
        1055 of 1055 Segments For Channel a. Error: SCLITE execution failed
      Command: sclite -r exp/tri2/decode_nosp_train/score_7_0.0/stm.filt stm -h exp/tri2/decode_nosp_train/score_7_0.0/ctm.filt.filt ctm exp/tri2/decode_nosp_train/score_7_0.0/ctm.filt -F -D -o sum rsum sgml lur dtl pra prf -C det sbhist hist -O exp/tri2/decode_nosp_train/score_7_0.0 -n ctm.filt.filt at /media/USERS/igor/kaldi/egs/ru_films/s5_r8/../../../tools/sctk/bin/hubscr.pl line 658.

What is the meaning of this error? And how can I avoid it?

Thanks,
Igor

Daniel Povey

Dec 13, 2017, 12:46:15 PM
to kaldi-help
Is it possible that it's a memory issue, or that you were running the scripts twice at the same time?
And is the problem repeatable?

People have previously reported issues with sclite where the errors could not be replicated by running from the command line.  It might be something that depends on the locale or on the terminal type.  You might have to do some experimentation.

sclite is a complicated and mysterious beast, and it's not feasible for us to fully support it, but I'd be interested to hear what you find out.

Personally, my bias is towards using the native Kaldi scoring tools for new recipes, because sclite is too hard to debug.

Dan




--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/a8048c2b-2910-4920-9cb3-ee1f8a3a5853%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Igor S

Dec 14, 2017, 2:12:34 AM
to kaldi-help
The problem is repeatable. I tried 4 times before I added "--skip-scoring" to the decode stage.

I ran the script once and waited until it crashed. I also ran it from the command line, so that is not the issue.

It could be a memory issue, although the server has 128 GB of memory. How can I debug a memory issue?

I don't think it has anything to do with the locale or the terminal type. I successfully ran exactly the same script with a smaller dataset, so my guess is that it is something in the dataset. But I can't see anything wrong, and everything seemed OK until that stage.

Also, I see something peculiar in the ctm files. From a specific point on, the words are repeated, like this:
1000077 A 2145.18 0.36 большие 1.00
1000077 A 2145.54 0.49 проблемы 1.00
1000077 A 2146.32 0.36 о 1.00
1000077 A 2146.32 0.36 о 1.00
1000077 A 2147.42 0.25 да 0.95
1000077 A 2147.42 0.25 да 0.95
1000077 A 2151.29 0.63 поняла 0.99
1000077 A 2151.29 0.63 поняла 0.99
Each of the first two lines has a distinct word, but after that every word is repeated with exactly the same time. Any ideas?
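
To see how widespread the repetition is, one option is to count identical entries in a ctm file. The following is only a minimal sketch: the function name `find_duplicate_ctm_lines` is hypothetical, and the sample is the excerpt quoted above rather than a full file.

```python
from collections import Counter


def find_duplicate_ctm_lines(lines):
    """Return a dict mapping each repeated CTM entry to how often it occurs."""
    # Normalize whitespace so that spacing differences do not hide duplicates.
    counts = Counter(" ".join(line.split()) for line in lines if line.strip())
    return {entry: n for entry, n in counts.items() if n > 1}


# Sample taken from the ctm excerpt quoted in this message.
sample = [
    "1000077 A 2145.18 0.36 большие 1.00",
    "1000077 A 2145.54 0.49 проблемы 1.00",
    "1000077 A 2146.32 0.36 о 1.00",
    "1000077 A 2146.32 0.36 о 1.00",
    "1000077 A 2147.42 0.25 да 0.95",
    "1000077 A 2147.42 0.25 да 0.95",
    "1000077 A 2151.29 0.63 поняла 0.99",
    "1000077 A 2151.29 0.63 поняла 0.99",
]

duplicates = find_duplicate_ctm_lines(sample)
for entry, n in duplicates.items():
    print(f"{n}x {entry}")
```

For a real file, the lines could be read with `open(path, encoding="utf-8")` and passed to the same function.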

Thanks,
Igor


Igor S

Dec 14, 2017, 4:47:44 AM
to kaldi-help
Also, what do you mean by "native Kaldi scoring tools", and how do I use them?

Thanks,
Igor

Daniel Povey

Dec 14, 2017, 6:07:32 PM
to kaldi-help
That error, with repeated lines in the CTM, is usually caused by having previously run decoding with a larger number of jobs, or by having modified the script, so that the glob 'lat.*.gz' matches more filenames than you want.
That may be what is crashing sclite.  It's hard to debug unless you are really good at Perl.
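
A quick way to check for leftover lattices from an earlier run is to compare the files matched by 'lat.*.gz' with the job count of the latest decode. This is a rough sketch, not part of Kaldi: the helper name `find_stale_lattices` is hypothetical, and it assumes the decode directory contains a `num_jobs` file holding the job count of the most recent decode.

```python
import glob
import os
import re


def find_stale_lattices(decode_dir):
    """Return (num_jobs, lattice files whose job index is outside 1..num_jobs)."""
    # Assumption: 'num_jobs' in the decode directory records the latest job count.
    with open(os.path.join(decode_dir, "num_jobs")) as f:
        num_jobs = int(f.read().strip())

    stale = []
    for path in sorted(glob.glob(os.path.join(decode_dir, "lat.*.gz"))):
        match = re.fullmatch(r"lat\.(\d+)\.gz", os.path.basename(path))
        # Anything the glob matches that is not lat.1.gz .. lat.<num_jobs>.gz
        # would be picked up by scoring in addition to the current lattices.
        if match is None or not 1 <= int(match.group(1)) <= num_jobs:
            stale.append(path)
    return num_jobs, stale


# Decode directory taken from the log earlier in the thread; skipped if absent.
decode_dir = "exp/tri2/decode_nosp_train"
if os.path.isfile(os.path.join(decode_dir, "num_jobs")):
    num_jobs, stale = find_stale_lattices(decode_dir)
    print(f"num_jobs={num_jobs}, unexpected lattice files: {stale}")
```

If the list is non-empty, removing those files (or decoding into a fresh directory) before scoring would be the thing to try.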
The native Kaldi scoring tools are what you get if you link ../steps/score_kaldi.sh to local/score.sh.
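
As an illustration of that link, here is a small sketch that points local/score.sh at ../steps/score_kaldi.sh from within a recipe directory. The function name `link_kaldi_scoring` and the `.sclite.bak` backup suffix are hypothetical choices, and it assumes the recipe directory has the usual `local` and `steps` subdirectories.

```python
import os


def link_kaldi_scoring(recipe_dir):
    """Make local/score.sh a symlink to ../steps/score_kaldi.sh; return its path."""
    link_path = os.path.join(recipe_dir, "local", "score.sh")
    # Relative target, resolved from inside the 'local' directory.
    target = os.path.join("..", "steps", "score_kaldi.sh")

    # Keep any existing scoring script under a backup name (hypothetical suffix).
    if os.path.lexists(link_path):
        os.rename(link_path, link_path + ".sclite.bak")

    os.symlink(target, link_path)
    return link_path


# Example call, run from the recipe directory:
# link_kaldi_scoring(".")
```

A relative target keeps the link valid if the whole recipe directory is moved or copied.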


Dan

