Problem with lattice-align-words

dakl...@gmail.com

Apr 6, 2016, 4:15:23 AM

to kaldi-help

Hi,

I'm currently working on a French speech recognizer using LIUM models and the ESTER 2 corpus (broadcast data), but I have a problem with lattice-align-words.
I used the TEDLIUM recipe for my first experiment and everything was OK. From there, I wanted to try nnet3 and some improvements (rescoring, MBR decoding, different alignments, etc.), so I moved to the WSJ recipe, since it seems to be the most recently updated one.

The problem is that I'm stuck at the word-lattice alignment step when using word_align_lattices.sh. The command I used is:

    sil_label=`grep '!SIL' data/lang_nosp_test_tgpr/words.txt | awk '{print $2}'`

    steps/word_align_lattices.sh --cmd "$train_cmd" --silence-label $sil_label data/lang_nosp_test_tgpr   exp/tri1/decode_nosp_dev_tgpr exp/tri1/decode_nosp_dev_tgpr_ali || exit 1;
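A side note on the `sil_label` line: `grep '!SIL'` matches any line of `words.txt` that merely contains `!SIL`, so it would return more than one ID if another entry happened to include that substring. A slightly more defensive lookup, sketched here with a hypothetical helper named `word_to_id`, compares the whole first field with `awk`. It assumes the standard two-column `<word> <id>` layout of Kaldi's `words.txt`; with a typical lang directory the original `grep` gives the same answer, so this is robustness rather than a fix for the error in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the integer ID of an exact word in a Kaldi
# words.txt symbol table (assumed format: "<word> <id>" per line).
# Unlike `grep '!SIL'`, only the whole first field is compared, so entries
# that merely contain the substring "!SIL" are not picked up as well.
# Returns a nonzero status if the word is not found.
word_to_id() {
  local word="$1" words_txt="$2"
  awk -v w="$word" '$1 == w { print $2; found = 1 } END { exit !found }' "$words_txt"
}
```

Usage in the recipe would then be `sil_label=$(word_to_id '!SIL' data/lang_nosp_test_tgpr/words.txt) || exit 1`.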

which fails with this error:

    run.pl: 3 / 8 failed, log is in exp/tri1/decode_nosp_dev_tgpr_ali/log/align.*.log
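To see which of the eight jobs failed without opening every log, a small loop over the logs can help. This is a sketch that relies only on the `# Ended (code N)` trailer that run.pl writes at the end of each job log (visible at the bottom of the log quoted in this message); the function name `failed_logs` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list the run.pl job logs whose "# Ended (code N)" trailer reports
# a nonzero exit code, and show the first ERROR line of each.
# Pass the log files as arguments, e.g.
#   failed_logs exp/tri1/decode_nosp_dev_tgpr_ali/log/align.*.log
failed_logs() {
  local log code
  for log in "$@"; do
    [ -f "$log" ] || continue
    # Extract N from the last "# Ended (code N) at ..." line; logs of jobs
    # that are still running have no such line and are skipped.
    code=$(sed -n 's/^# Ended (code \([0-9][0-9]*\)).*/\1/p' "$log" | tail -n 1)
    if [ -n "$code" ] && [ "$code" -ne 0 ]; then
      echo "$log (exit code $code)"
      # The first ERROR line usually names the check that failed.
      grep -m 1 '^ERROR' "$log"
    fi
  done
}
```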

Looking in the log files, I can see that I have several invalid arcs:

    # lattice-align-words --silence-label=1 --test=true data/lang_nosp_test_tgpr/phones/word_boundary.int exp/tri1/final.mdl "ark:gunzip -c exp/tri1/decode_nosp_dev_tgpr/lat.3.gz|" "ark,t:|gzip -c >exp/tri1/decode_nosp_dev_tgpr_ali/lat.3.gz"
    # Started at Wed Apr 6 08:19:42 CEST 2016
    #
    lattice-align-words --silence-label=1 --test=true data/lang_nosp_test_tgpr/phones/word_boundary.int exp/tri1/final.mdl 'ark:gunzip -c exp/tri1/decode_nosp_dev_tgpr/lat.3.gz|' 'ark,t:|gzip -c >exp/tri1/decode_nosp_dev_tgpr_ali/lat.3.gz'
    ERROR (lattice-align-words:TestArc():word-align-lattice.cc:749) Invalid arc in aligned CompactLattice: 0 0 1 7.24455,1890.09,2_1_1_1_1_1_1_1_1_1_1_1_1_1_8_5_5_5_5_5_18
    WARNING (lattice-align-words:Close():kaldi-io.cc:496) Pipe gunzip -c exp/tri1/decode_nosp_dev_tgpr/lat.3.gz| had nonzero return status 13
    ERROR (lattice-align-words:TestArc():word-align-lattice.cc:749) Invalid arc in aligned CompactLattice: 0 0 1 7.24455,1890.09,2_1_1_1_1_1_1_1_1_1_1_1_1_1_8_5_5_5_5_5_18

    [stack trace: ]
    kaldi::KaldiGetStackTrace()
    kaldi::KaldiErrorMessage::~KaldiErrorMessage()
    kaldi::WordAlignedLatticeTester::TestArc(fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> > const&)
    kaldi::WordAlignedLatticeTester::Test()
    kaldi::TestWordAlignedLattice(fst::VectorFst<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> > > const&, kaldi::TransitionModel const&, kaldi::WordBoundaryInfo const&, fst::VectorFst<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> > > const&)
    lattice-align-words(main+0x51f) [0x67ca45]
    /lib64/libc.so.6(__libc_start_main+0xf0) [0x7f2269d79700]
    lattice-align-words(_start+0x29) [0x67c459]

    # Accounting: time=23 threads=1
    # Ended (code 255) at Wed Apr 6 08:20:05 CEST 2016, elapsed time 23 seconds
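When reading the `Invalid arc` message, the last field is the arc's CompactLattice weight, printed as a graph cost, an acoustic cost, and the arc's transition-ids joined by underscores (one transition-id per frame). The hypothetical helper below only splits that field so it is easier to inspect; it assumes the default `,` weight separator. Mapping the transition-ids back to phones would need the model (e.g. via Kaldi's `show-transitions`), which is outside this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a CompactLattice weight as printed in Kaldi
# messages, assumed to look like
#   <graph-cost>,<acoustic-cost>,<tid>_<tid>_..._<tid>
parse_clat_weight() {
  local weight="$1" graph acoustic tids n
  # The last variable receives everything after the second comma.
  IFS=, read -r graph acoustic tids <<< "$weight"
  echo "graph cost:    $graph"
  echo "acoustic cost: $acoustic"
  # One transition-id per frame, so counting them gives the arc's length.
  n=$(tr '_' '\n' <<< "$tids" | grep -c .)
  echo "num frames:    $n"
  echo "transition-ids: ${tids//_/ }"
}
```

Usage: `parse_clat_weight "7.24455,1890.09,2_1_1_1_1_1_1_1_1_1_1_1_1_1_8_5_5_5_5_5_18"`.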

I understand what the error says, but to be honest I can't figure out where it went wrong or how to fix it, since none of my previous stages (in order: data preparation, feature extraction, subset creation, mono + tri1 training, rescoring with the non-pruned trigram) reported any errors in their logs.
Do you have any idea how to solve this problem?

Thank you in advance. Best regards,

Florian B.

Daniel Povey

Apr 6, 2016, 2:42:01 PM

to kaldi-help

You seem to be running it with slightly different-than-normal options, e.g. by adding silence. I suspect it is a bug, triggered by options that haven't previously been used.

If you send me the files that will allow me to reproduce one of these failures, I can try to fix it. I can't figure out what the problem might be just from looking at the code.

Dan

--
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Florian BOYER

Apr 7, 2016, 2:48:57 AM

to kaldi-help, dpo...@gmail.com

Thank you for your answer, Dan! You're right, the problem was with the silence addition.

The command I wrote in my previous message is taken directly from the WSJ recipe. The only difference between my recipe and theirs in the previous steps is that I didn't compute the mono training alignments with --boost-silence. Could this be the cause of the errors?

Thanks again. Best regards,

Florian B.

Daniel Povey

Apr 7, 2016, 2:46:37 PM

to Florian BOYER, kaldi-help

The --boost-silence option wouldn't have been the cause of the errors. It's likely a bug in the program; please send me the files (model and alignments, and lang/ directory) so I can reproduce it. You can do it off the list.

Dan

Daniel Povey

Apr 11, 2016, 7:48:10 PM

to Florian BOYER, kaldi-help

This issue is fixed now.
Dan
