Reg: Error at Delta Delta Training

Naresh

Sep 21, 2015, 1:53:31 PM
to kaldi-help
Dear all,
I was training a delta + delta-delta model on an English database and got the following error:


 steps/train_deltas.sh  --cmd "run.pl" 2500 15000 eval_benchmark/ data/lang exp_full/mono_ali/ exp_full/tri1
steps/train_deltas.sh --cmd run.pl 2500 15000 eval_benchmark/ data/lang exp_full/mono_ali/ exp_full/tri1
steps/train_deltas.sh: accumulating tree stats
steps/train_deltas.sh: getting questions for tree-building, via clustering
steps/train_deltas.sh: building the tree
steps/train_deltas.sh: converting alignments from exp_full/mono_ali/ to use current tree
steps/train_deltas.sh: compiling graphs of transcripts
steps/train_deltas.sh: training pass 1
run.pl: 1 / 4 failed, log is in exp_full/tri1/log/acc.1.*.log


The error occurred while accumulating the GMM statistics. The log file looks like this:

# gmm-acc-stats-ali exp_full/tri1/1.mdl scp:eval_benchmark//split4/1/feats.scp "ark,s,cs:gunzip -c exp_full/tri1/ali.1.gz|" exp_full/tri1/1.1.acc 

gmm-acc-stats-ali exp_full/tri1/1.mdl scp:eval_benchmark//split4/1/feats.scp 'ark,s,cs:gunzip -c exp_full/tri1/ali.1.gz|' exp_full/tri1/1.1.acc 
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_057
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_058
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_059
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_060
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_061
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_063
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_064
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_065
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_066
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_067
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_069
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_070
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_071
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_072
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_073
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_075
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_076
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_077
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_078
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_079
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_081
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_082
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_083
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_084
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_085
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_087
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_088
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_089
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_090
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_091
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_093
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_094
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_095
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_096
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_097
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_099
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance datacollection2_0009_Name_100
.
.
.
LOG (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:112) Done 8238 files, 5383 with errors.

But I can see the alignments of ali.1.gz using show-alignments.

Please help me out,


Daniel Povey

Sep 21, 2015, 2:27:47 PM
to kaldi-help
Make sure your alignments in mono_ali were created using exactly the same data directory (or at least exactly the same utterance list) that you are currently trying to align.
If not, you might have to recompute the alignments.
Dan
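
One way to check the "same utterance list" condition is to compare the utterance IDs in two Kaldi-style text files, for example feats.scp and a text-form dump of the alignments (such as one written with copy-int-vector and an ark,t: output). The following is only a minimal sketch: it assumes each non-empty line starts with the utterance ID, and the function names and sample lines are made up for illustration.

```python
from typing import Iterable, Set, Tuple


def utterance_ids(lines: Iterable[str]) -> Set[str]:
    """Collect the first whitespace-separated field of each non-empty line."""
    ids = set()
    for line in lines:
        fields = line.split()
        if fields:
            ids.add(fields[0])
    return ids


def compare_ids(features: Iterable[str],
                alignments: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (IDs with features but no alignment, IDs with alignment but no features)."""
    feat_ids = utterance_ids(features)
    ali_ids = utterance_ids(alignments)
    return feat_ids - ali_ids, ali_ids - feat_ids


# Made-up sample lines standing in for feats.scp and a text-form alignment file.
sample_feats = ["spk1_utt1 feats.ark:12", "spk1_utt2 feats.ark:3456"]
sample_alis = ["spk1_utt1 2 2 2 8 8 5"]

no_ali, no_feats = compare_ids(sample_feats, sample_alis)
print("features but no alignment:", sorted(no_ali))
print("alignment but no features:", sorted(no_feats))
```

With real data, open file handles can be passed in place of the sample lists. A large "features but no alignment" set would point to mismatched utterance lists or a truncated alignment file.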


--
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Naresh kumar

Sep 22, 2015, 2:55:17 AM
to kaldi...@googlegroups.com
Yes, the alignments were created using the same data directory and the same utterance list.
I recomputed the monophone alignments, but I am still getting the same error.

The align.1.log file (exp/mono_ali/log/) looks like this:

# compile-train-graphs exp_full/mono_ali/tree exp_full/mono_ali/final.mdl data/lang/L.fst "ark:utils/sym2int.pl --map-oov 7647 -f 2- data/lang/words.txt eval_benchmark//split4/1/text|" ark:- | gmm-align-compiled --transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1 --beam=100 --retry-beam=300 --careful=false "gmm-boost-silence --boost=1.25 1 exp_full/mono_ali/final.mdl - |" ark:- scp:eval_benchmark//split4/1/feats.scp "ark,t:|gzip -c >exp_full/mono_ali/ali.1.gz" 
# Started at Tue Sep 22 10:43:08 IST 2015
#
gmm-align-compiled --transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1 --beam=100 --retry-beam=300 --careful=false 'gmm-boost-silence --boost=1.25 1 exp_full/mono_ali/final.mdl - |' ark:- scp:eval_benchmark//split4/1/feats.scp 'ark,t:|gzip -c >exp_full/mono_ali/ali.1.gz' 
compile-train-graphs exp_full/mono_ali/tree exp_full/mono_ali/final.mdl data/lang/L.fst 'ark:utils/sym2int.pl --map-oov 7647 -f 2- data/lang/words.txt eval_benchmark//split4/1/text|' ark:- 
gmm-boost-silence --boost=1.25 1 exp_full/mono_ali/final.mdl - 
LOG (gmm-boost-silence:main():gmm-boost-silence.cc:93) Boosted weights for 3 pdfs, by factor of 1.25
LOG (gmm-boost-silence:main():gmm-boost-silence.cc:103) Wrote model to -
WARNING (gmm-align-compiled:AlignUtteranceWrapper():decoder-wrappers.cc:466) Retrying utterance datacollection2_0218_MobileOffice_001 with beam 300
WARNING (gmm-align-compiled:AlignUtteranceWrapper():decoder-wrappers.cc:475) Did not successfully decode file datacollection2_0218_MobileOffice_001, len = 54
LOG (compile-train-graphs:main():compile-train-graphs.cc:151) compile-train-graphs: succeeded for 13621 graphs, failed for 0
LOG (gmm-align-compiled:main():gmm-align-compiled.cc:129) Overall log-likelihood per frame is -87.3346 over 4420738 frames.
LOG (gmm-align-compiled:main():gmm-align-compiled.cc:131) Retried 1 out of 13621 utterances.
LOG (gmm-align-compiled:main():gmm-align-compiled.cc:133) Done 13620, errors on 1
# Accounting: time=510 threads=1
# Ended (code 0) at Tue Sep 22 10:51:38 IST 2015, elapsed time 510 seconds

Here only one utterance failed, and I expect that should not affect delta-delta training.



--


Regards
Naresh Kumar

Daniel Povey

Sep 22, 2015, 3:03:51 AM
to kaldi-help
Convert the alignments to text form using copy-int-vector with output to text form (ark,t:-) and see which utterance ids are present.  Maybe the file was truncated; maybe the utterance-ids are somehow out of sync with the features.  Check that the data directory validates correctly (validate_data_dir.sh or whatever it is).  Sometimes these issues can be caused by incorrectly sorted data or inconsistent utterance vs. speaker sorting.  If validation fails, read the section on data preparation carefully, and prepare your data with correct sorting and with speaker-id as a prefix of utterance-id.

Dan
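
The sorting and speaker-prefix conditions mentioned above can be checked roughly as follows. This is an illustrative sketch only, not a replacement for validate_data_dir.sh: it assumes utterance IDs should be strictly increasing in byte order (as produced by sorting with LC_ALL=C), and that utt2spk lines have the form "utterance-id speaker-id". The function names and sample data are hypothetical.

```python
from typing import Iterable, List, Tuple


def is_c_sorted(ids: List[str]) -> bool:
    """True if the IDs are strictly increasing in byte order (no duplicates)."""
    raw = [i.encode("utf-8") for i in ids]
    return all(a < b for a, b in zip(raw, raw[1:]))


def non_prefixed(utt2spk_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (utterance, speaker) pairs whose speaker-id is not a prefix of the utterance-id."""
    bad = []
    for line in utt2spk_lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # this sketch skips blank or malformed lines
        utt, spk = fields
        if not utt.startswith(spk):
            bad.append((utt, spk))
    return bad


# Made-up utt2spk lines: the second one does not use the speaker-id as a prefix.
sample_utt2spk = ["spk1_utt1 spk1", "utt2_spk1 spk1"]
sample_ids = [line.split()[0] for line in sample_utt2spk]

print("sorted in byte order:", is_c_sorted(sample_ids))
print("speaker-id not a prefix:", non_prefixed(sample_utt2spk))
```

Running the same checks on the utterance IDs taken from the text-form alignments and from feats.scp would show whether the two are ordered consistently.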



Joshua Meyer

Nov 17, 2017, 11:20:49 AM
to kaldi-help
Hi Dan,

I had this same problem, and as you suggested, `validate_data_dir.sh` did the trick.

I had to run it on each subdir after I split up my feats in the `splitN` subdirs.

Just chirping in so others know.

-josh

Daniel Povey

Nov 17, 2017, 12:44:57 PM
to kaldi-help
validate_data_dir.sh does not change anything; it just checks the data directory, so I doubt this made the difference. Possibly you were out of memory.




--
Go to http://kaldi-asr.org/forums.html find out how to join

Joshua Meyer

Nov 17, 2017, 12:53:24 PM
to kaldi...@googlegroups.com
That explains why the problem came back again.

I've been stumped ever since I replied (prematurely it seems).

I'll check memory.





--
Joshua Meyer
Ph.D. Candidate
University of Arizona

Joshua Meyer

Nov 17, 2017, 1:07:18 PM
to kaldi...@googlegroups.com
I don't get above 5% memory usage, so that shouldn't be it.

It's some bug in my own scripts I'm realizing, and probably not interesting to the group.

If I find something of interest I'll report back in.

Thanks for the help.

-josh

