Hi,
I'm currently working on a French speech recognizer using
LIUM models and the ESTER 2 corpus (broadcast data), but I have a problem
with lattice-align-words.
I used the TEDLIUM recipe for my first experiment and everything was OK. From there, I wanted to try nnet3 and some
improvements (rescoring, MBR decoding, different alignments, etc.), so I switched to the wsj recipe, since it seems to be the most recently updated one. The problem is that I'm stuck at the word-level lattice alignment step when using word_align_lattices.sh. The command I used is:
sil_label=`grep '!SIL' data/lang_nosp_test_tgpr/words.txt | awk '{print $2}'`
steps/word_align_lattices.sh --cmd "$train_cmd" --silence-label $sil_label \
  data/lang_nosp_test_tgpr exp/tri1/decode_nosp_dev_tgpr \
  exp/tri1/decode_nosp_dev_tgpr_ali || exit 1;

which ends with an error:
run.pl: 3 / 8 failed, log is in exp/tri1/decode_nosp_dev_tgpr_ali/log/align.*.log
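For anyone trying to reproduce this, the failing jobs can be listed with a small helper along the following lines. This is only a sketch: failed_align_logs is a name I made up for illustration (it is not a Kaldi script), and it assumes nothing more than that a failed job's log contains the literal string ERROR, as in the excerpt below.

```shell
# Hypothetical helper (not part of Kaldi): print every align.*.log under
# <dir>/log that contains at least one line with the literal string "ERROR".
failed_align_logs() {
  grep -l 'ERROR' "$1"/log/align.*.log 2>/dev/null
}

# Usage, with the output directory from the command above:
#   failed_align_logs exp/tri1/decode_nosp_dev_tgpr_ali
```

Each file it prints corresponds to one of the failed jobs reported by run.pl.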
The log files show several invalid arcs:
# lattice-align-words --silence-label=1 --test=true data/lang_nosp_test_tgpr/phones/word_boundary.int exp/tri1/final.mdl "ark:gunzip -c exp/tri1/decode_nosp_dev_tgpr/lat.3.gz|" "ark,t:|gzip -c >exp/tri1/decode_nosp_dev_tgpr_ali/lat.3.gz"
# Started at Wed Apr 6 08:19:42 CEST 2016
#
lattice-align-words --silence-label=1 --test=true data/lang_nosp_test_tgpr/phones/word_boundary.int exp/tri1/final.mdl 'ark:gunzip -c exp/tri1/decode_nosp_dev_tgpr/lat.3.gz|' 'ark,t:|gzip -c >exp/tri1/decode_nosp_dev_tgpr_ali/lat.3.gz'
ERROR (lattice-align-words:TestArc():word-align-lattice.cc:749) Invalid arc in aligned CompactLattice: 0 0 1 7.24455,1890.09,2_1_1_1_1_1_1_1_1_1_1_1_1_1_8_5_5_5_5_5_18
WARNING (lattice-align-words:Close():kaldi-io.cc:496) Pipe gunzip -c exp/tri1/decode_nosp_dev_tgpr/lat.3.gz| had nonzero return status 13
ERROR (lattice-align-words:TestArc():word-align-lattice.cc:749) Invalid arc in aligned CompactLattice: 0 0 1 7.24455,1890.09,2_1_1_1_1_1_1_1_1_1_1_1_1_1_8_5_5_5_5_5_18
[stack trace: ]
kaldi::KaldiGetStackTrace()
kaldi::KaldiErrorMessage::~KaldiErrorMessage()
kaldi::WordAlignedLatticeTester::TestArc(fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> > const&)
kaldi::WordAlignedLatticeTester::Test()
kaldi::TestWordAlignedLattice(fst::VectorFst<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> > > const&, kaldi::TransitionModel const&, kaldi::WordBoundaryInfo const&, fst::VectorFst<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> > > const&)
lattice-align-words(main+0x51f) [0x67ca45]
/lib64/libc.so.6(__libc_start_main+0xf0) [0x7f2269d79700]
lattice-align-words(_start+0x29) [0x67c459]
# Accounting: time=23 threads=1
# Ended (code 255) at Wed Apr 6 08:20:05 CEST 2016, elapsed time 23 seconds
I understand what the error says, but to be honest I can't figure out where things went wrong or how to fix it, since none of my previous stages (in order: data preparation, feature extraction, subset creation, mono and tri1 training, rescoring with a non-pruned trigram) reported any errors in their logs.
Do you have any idea how to solve this problem?
Thank you in advance. Best regards,
Florian B.