Too many alignment errors in Librispeech recipe?


Xavier Anguera

May 12, 2016, 8:39:56 AM
to kaldi-help
Hi,
I am running the Librispeech recipe and getting reasonable WER results (close or equal to those in the RESULTS file), so I assume my setup is correct.
Looking closely at the logs, I notice that the alignment step tri4b_ali_clean_460 (i.e. right after adding pronunciation and silence probabilities and recreating the lang directory) produces a very large number of warnings saying that a file could not be decoded/aligned successfully. For example:

align_pass1.10.log:WARNING (gmm-align-compiled:AlignUtteranceWrapper():decoder-wrappers.cc:466) Retrying utterance 4152-12926-0010 with beam 40

align_pass1.10.log:WARNING (gmm-align-compiled:AlignUtteranceWrapper():decoder-wrappers.cc:475) Did not successfully decode file 4152-12926-0010, len = 1377


I tried raising the retry_beam to 60 but still get many of those.

I am wondering whether this is "normal" for the Librispeech recipe or whether I messed something up in my setup (in which case I cannot see where, as the WERs still look good).


It is a shame to be throwing away so much data, so any help would be appreciated!


X.
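
To see which utterances are affected, the two warning formats quoted above can be collected from the alignment logs. The following is a minimal sketch, assuming the log lines look exactly like the ones in this post; the function name `summarize_warnings` is hypothetical and not part of Kaldi.

```python
import re

# Patterns copied from the two warning lines quoted in the post above.
RETRY_RE = re.compile(r"Retrying utterance (\S+) with beam (\d+)")
FAIL_RE = re.compile(r"Did not successfully decode file ([^\s,]+), len = (\d+)")


def summarize_warnings(lines):
    """Return (retried, failed).

    retried: set of utterance ids that were retried with a wider beam.
    failed:  dict mapping utterance id -> the `len` value reported in the log.
    """
    retried = set()
    failed = {}
    for line in lines:
        match = RETRY_RE.search(line)
        if match:
            retried.add(match.group(1))
            continue
        match = FAIL_RE.search(line)
        if match:
            failed[match.group(1)] = int(match.group(2))
    return retried, failed
```

Feeding it the lines of every align_pass*.log file in the alignment directory gives the list of utterances that were dropped, which can then be compared against the total number of utterances in the training set.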


Daniel Povey

May 12, 2016, 1:27:25 PM
to kaldi-help
It's hard to know from your email what proportion were not aligned
successfully (and therefore how problematic it is). The last line of
the log file should tell you.
Dan

Xavier Anguera

May 12, 2016, 1:49:30 PM
to kaldi...@googlegroups.com
About 30% cannot be aligned.

Daniel Povey

May 12, 2016, 2:22:02 PM
to kaldi-help
I found an old directory of this, and it was only retrying a handful
(see below).
You might want to check for earlier errors, in other log files in that
directory.
And check that the phones.txt is not changed between 'lang_nosp' and 'lang'.

Dan


grep 'Retried' /home/dpovey/kaldi-svn-clean/egs/librispeech/s5/exp/tri4b_ali_clean_460/log/align_pass?.1.log

/home/dpovey/kaldi-svn-clean/egs/librispeech/s5/exp/tri4b_ali_clean_460/log/align_pass1.1.log:LOG (gmm-align-compiled:main():gmm-align-compiled.cc:180) Retried 12 out of 6511 utterances.

/home/dpovey/kaldi-svn-clean/egs/librispeech/s5/exp/tri4b_ali_clean_460/log/align_pass2.1.log:LOG (gmm-align-compiled:main():gmm-align-compiled.cc:180) Retried 23 out of 6511 utterances.
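
The "Retried N out of M utterances" summaries above can also be turned into a retry rate per log file, which makes a healthy run (a handful of retries) easy to tell apart from a broken one. This is a rough sketch that assumes only the summary format shown in the grep output; `retry_rates` is a hypothetical helper name.

```python
import re

# Matches the summary line format shown in the grep output above.
# \s+ tolerates line wrapping introduced by mail clients.
RETRIED_RE = re.compile(r"Retried\s+(\d+)\s+out\s+of\s+(\d+)\s+utterances")


def retry_rates(lines):
    """Return a list of (retried, total, fraction), one per summary line found."""
    rates = []
    for line in lines:
        match = RETRIED_RE.search(line)
        if match:
            retried = int(match.group(1))
            total = int(match.group(2))
            fraction = retried / total if total else 0.0
            rates.append((retried, total, fraction))
    return rates
```

Note that a retry is not the same as a failure: an utterance that is retried with a wider beam may still align successfully, so this rate is an upper bound on the data lost in that job.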


Xavier Anguera

May 12, 2016, 2:25:30 PM
to kaldi...@googlegroups.com
I see, thanks!
I will try to find what I broke.





--
Xavier Anguera
CTO & CSO
ELSA Corp.

Xavier Anguera

May 12, 2016, 2:40:53 PM
to kaldi...@googlegroups.com
Quick update: your hunch was totally accurate. I just realized that I recently changed the initial prepare_lang.sh call to add an external phones.txt, but did not change the second call. That must be the reason.

Thank you very much!
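
For anyone hitting the same problem, the phones.txt check suggested earlier in the thread can be automated. The sketch below assumes phones.txt uses the usual symbol-table layout of one "symbol integer-id" pair per line; the helper names `read_symbol_table` and `diff_symbol_tables` are hypothetical.

```python
def read_symbol_table(path):
    """Read a symbol table with one 'symbol integer-id' pair per line."""
    table = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            table[parts[0]] = int(parts[1])
    return table


def diff_symbol_tables(a, b):
    """Return (only_in_a, only_in_b, changed_ids) as sorted lists of symbols."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(s for s in set(a) & set(b) if a[s] != b[s])
    return only_a, only_b, changed
```

Calling `diff_symbol_tables` on the tables read from the phones.txt in the 'lang_nosp' and 'lang' directories should return three empty lists when the two prepare_lang.sh calls are consistent. The exact directory paths depend on the setup and are not given in this thread.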