End-to-end speech recognition


Moataz El Ayadi

Jun 24, 2017, 6:01:25 PM
to kaldi-help
Hello
Recently, interest in end-to-end speech recognition has increased significantly. The main claimed advantages over traditional approaches are ease of training (there is only one model, with no need to construct a lexicon or to train separate monophone and triphone acoustic models, etc.) and the ability to do transfer learning, i.e., simple retraining for new use cases.

So I'd like to ask whether there are any plans to support end-to-end ASR in Kaldi.

Thank you

P.S.: I am more familiar with the traditional system (acoustic modelling, language modelling, etc.), so I don't know under which conditions (amount of training data, recording conditions, etc.) the above claims are valid.

Thank you.

Daniel Povey

Jun 24, 2017, 6:12:42 PM
to kaldi-help
The end-to-end systems that have actually produced decent results for
realistic scenarios have generally not been word-based, but
phone-based. (Of course, you can always choose to use a graphemic
lexicon instead of phones, but that's an orthogonal issue).

If we are talking about the word-based end-to-end systems, actually,
for new use-cases it's precisely the opposite of what you say--
end-to-end systems are harder to adapt to new scenarios because you
would need training data with the same vocabulary you want to use. If
the systems are phone-based or grapheme-based, then it's no different
from adapting a conventional (not-end-to-end) system.
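To illustrate the graphemic-lexicon option mentioned above, here is a minimal sketch of how a letter-based lexicon can be generated for new vocabulary without any pronunciation dictionary. The function name `graphemic_lexicon` and the choice to drop non-alphabetic characters are assumptions made for illustration; this is not a Kaldi script.

```python
def graphemic_lexicon(words):
    """Map each word to a space-separated sequence of its letters.

    Hypothetical helper: real recipes may treat apostrophes, digits,
    or word-position markers differently.
    """
    lexicon = {}
    for word in words:
        # Assumption: keep alphabetic characters only, lower-cased.
        graphemes = [ch for ch in word.lower() if ch.isalpha()]
        if graphemes:
            lexicon[word] = " ".join(graphemes)
    return lexicon


if __name__ == "__main__":
    # New words need no acoustic training data to get an entry.
    for word, pron in graphemic_lexicon(["Kaldi", "lattice"]).items():
        print(word, pron)
```

Because any new word can be spelled out this way, a grapheme-based system can add vocabulary the same way a conventional system does, whereas a word-based output layer cannot emit a word it never saw in training.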

There was a lot of noise about end-to-end systems at one point, but I
think it's mostly died down. People have realized that you can get
better results with more conventional systems. Google realized that
the improvements they appeared to be getting from end-to-end training
(on a phone, not word level) were due to the lower frame rate and not
to the CTC objective function.
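As a rough illustration of what a lower frame rate means here, the sketch below keeps every third frame of a feature sequence, so the network is evaluated about one third as often. The factor of 3 and the name `subsample_frames` are assumptions for illustration only, not values or functions taken from this thread.

```python
def subsample_frames(frames, factor=3, offset=0):
    """Keep every `factor`-th frame, starting at index `offset`."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return frames[offset::factor]


if __name__ == "__main__":
    # One second of audio at a 10 ms frame shift gives 100 frames.
    frames = list(range(100))
    reduced = subsample_frames(frames, factor=3)
    print(len(frames), "frames in,", len(reduced), "frames out")
```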

In Kaldi we implemented lattice-free MMI (LF-MMI), which took certain
ideas from end-to-end systems but is really a more conventional system
based on a sequence-level objective function, and that's our standard
system.
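For reference, a sequence-level MMI objective is commonly written as below. This is the textbook form, not something stated in this thread, and the symbols are assumptions: X_u is the acoustic observation sequence of utterance u, w_u its reference transcript, M_w the model (HMM) for word sequence w, and P(w) the language model probability.

```latex
\mathcal{F}_{\mathrm{MMI}}
  = \sum_{u} \log
    \frac{p(\mathbf{X}_u \mid \mathbb{M}_{w_u})\, P(w_u)}
         {\sum_{w} p(\mathbf{X}_u \mid \mathbb{M}_{w})\, P(w)}
```

The "lattice-free" part refers to how the denominator sum is computed: over a full graph rather than over lattices generated in a separate decoding pass.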

There is no plan to add end-to-end training to Kaldi.
> --
> Go to http://kaldi-asr.org/forums.html find out how to join
> ---
> You received this message because you are subscribed to the Google Groups
> "kaldi-help" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kaldi-help+...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Moataz El Ayadi

Jun 24, 2017, 6:38:41 PM
to kaldi-help, dpo...@gmail.com
Hi Dan,
Thanks a lot for your quick and informative answer (as usual).

Moataz El Ayadi

Mar 6, 2018, 9:47:39 PM
to kaldi-help
Hello wonderful Kaldi team
I noticed that you pushed the code for the end-to-end method a day ago, and I'd like to thank you for that.
Is this the implementation of the LF-MMI work in this paper: http://danielpovey.com/files/2018_icassp_end2end.pdf ?

Thanks

Hossein Hadian

Mar 7, 2018, 2:35:47 PM
to kaldi-help
Yes, it is (only examples for WSJ are merged for now).


Moataz El Ayadi

Mar 7, 2018, 7:28:35 PM
to kaldi-help
Great. Looking forward to seeing results for common examples such as Switchboard and Fisher English soon.

Alim Misbullah

Mar 12, 2018, 1:51:03 AM
to kaldi-help
It also works well on a Mandarin corpus, especially with a phone-based lexicon.

With a character-based lexicon, the number of pdf-ids becomes too large, so in my case the network currently cannot be trained.

I have a question:

Is it possible to do greedy decoding in Kaldi if I use character-based end-to-end training for Mandarin?

I would like to see the results without relying on a language model.

Thanks,
Alim
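For readers unfamiliar with the term, the sketch below shows what greedy decoding usually means for a CTC-style character model: take the best-scoring symbol in each frame, collapse consecutive repeats, and drop blanks. This is a generic illustration under stated assumptions, not a Kaldi API; the function name, the blank index, and the toy scores are all made up for the example, and it presumes network outputs that correspond directly to characters, which pdf-ids do not.

```python
def greedy_decode(frame_scores, symbols, blank=0):
    """CTC-style best-path decoding without a language model.

    frame_scores: list of per-frame score lists (one score per symbol).
    symbols: list mapping output index -> label; index `blank` is the blank.
    """
    # Pick the highest-scoring symbol index in every frame.
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    decoded = []
    prev = None
    for idx in best:
        # Collapse consecutive repeats, then drop blanks.
        if idx != prev and idx != blank:
            decoded.append(symbols[idx])
        prev = idx
    return decoded


if __name__ == "__main__":
    symbols = ["<blank>", "你", "好"]  # toy inventory (assumption)
    scores = [
        [0.1, 0.8, 0.1],
        [0.1, 0.8, 0.1],
        [0.9, 0.05, 0.05],
        [0.1, 0.1, 0.8],
    ]
    print("".join(greedy_decode(scores, symbols)))
```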

Daniel Povey

Mar 12, 2018, 2:00:45 AM
to kaldi-help
The model that this e2e code produces is structurally the same as a regular chain model, and it also is intended to be used with a language model (like all so-called end-to-end systems that produce decent results).

It's just a different way of producing it, one that doesn't require alignments from a GMM system.

Dan

