Pykaldi2: Yet another speech toolkit based on Kaldi and Pytorch


Jeyhoun Marks

Jul 20, 2019, 7:26:30 AM
to kaldi-help
Any thoughts about this: https://arxiv.org/abs/1907.05955 ?
Has anyone tried it?

Daniel Povey

Jul 20, 2019, 2:36:51 PM
to kaldi-help
I was not aware of it, but that is a good team.
I see it is a bit heavy on dependencies - it installs TensorFlow, Keras, PyTorch and MXNet.
Perhaps most of them are not needed though.
Also they seem to be using 100-frames-per-second systems which IMO is not optimal for speed.
(Our LF-MMI systems have 3-fold reduced frame rate).
The Python wrapping seems to be done via CLIF.

I have actually decided to get serious about PyTorch myself, and to try to integrate with PyTorch.
Not sure of the timeline.  Probably a few months.  I wanted to do it via pybind11 though, because I get the
impression that clif is abandoned.  E.g. last commit here https://github.com/google/clif seems to be in 2017.


Dan



--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/d49883e3-fd45-4bbd-b907-0e01ae627ff9%40googlegroups.com.

Jeyhoun

Jul 20, 2019, 5:35:10 PM
to kaldi-help
I think it only depends on PyTorch (and Horovod for parallel training).
TensorFlow and MXNet are probably remnants of some base image that they forgot to clean up and remove - not actually needed or installed.
Yes, I saw the discussion on GitHub re moving to another NN training framework - so flashlight is no longer considered?



Daniel Povey

Jul 20, 2019, 5:38:39 PM
to kaldi-help
No, I'm not considering flashlight any more.  The code of flashlight itself was quite nice; my objection was more about lack of precise documentation once you get to the matrix backend based on ArrayFire.  It's kind of opaque.  It probably works fine if you are doing mainly expression-based things within that framework, but it was hard to understand further and I felt it would be hard to write code interacting with it.  Plus PyTorch has a lot of community support and a lot of things built around it, like FairSeq, which would probably come in useful.  I have to learn it better though.


Jan Trmal

Jul 20, 2019, 5:51:02 PM
to kaldi-help
I cannot speak for Dan, but I think he even talked to the authors of CLIF (or maybe it was Dogan?) and they have either moved on, or they are developing it strictly for Google-specific environments and next releases are not planned. 
y.

Jeyhoun

Jul 20, 2019, 5:51:07 PM
to kaldi-help
Thanks a lot Dan - as always, very insightful comments.

Jeyhoun

Jul 20, 2019, 5:54:47 PM
to kaldi-help
I also heard a lot of good things about pybind11, but I've never used it myself.
IMHO it would be a shame if PyKaldi is not used - it seems a very thorough piece of work.



Daniel Povey

Jul 20, 2019, 6:04:31 PM
to kaldi-help
Yes it would be a shame.  I was hoping we might be able to look at PyKaldi and somehow transfer some of the stuff done there into pybind11, if possible.  I just think we will be setting ourselves up for problems in the future if we build on top of a framework that's not well supported.


Jan Trmal

Jul 20, 2019, 6:06:17 PM
to kaldi-help
I agree, PyKaldi is very good work...
y.


Daniel Galvez

Jul 21, 2019, 1:48:18 AM
to kaldi-help
I am probably one of the handful of people qualified to migrate parts of Kaldi to pybind11, given the work I've done on Kaldi and because my job has required a lot of pybind11 usage, but I'm curious what the perceived benefit of this would be.

I do agree that pykaldi2 shows a benefit. Wrapping the lattice-generating decoder for the sake of getting gradients with respect to your logits, without doing weird subprocess calls, seems very useful. But what other use cases are there?

Daniel Povey

Jul 21, 2019, 1:51:16 AM
to kaldi-help
I'm not sure that *all* parts of Kaldi really need to be wrapped.  People do want a PyTorch-based neural net training setup though.  I have to figure out the right way to do that.  Right now I'm learning more about how PyTorch works.


Daniel Galvez

Jul 21, 2019, 2:25:19 AM
to kaldi-help
Actually, I was surprised to learn that pykaldi2 doesn't do LF-MMI training. Honestly, that seems pretty disappointing. Maybe I'm wrong, but does anyone use the vanilla MMI, MPE, or MBR criteria anymore?

BTW, my understanding is that we can implement LF-MMI using RNNs following this paper: http://bacchiani.net/resume/papers/ASRU2017.pdf. However, my understanding is that that approach is useful primarily for TPUs, which don't have GPUs' flexible programming models; last time I checked, the LF-MMI criterion is not a bottleneck in training anyway, so there is no need to reimplement Kaldi's existing CUDA kernel for it. My point is that if people want the LF-MMI criterion in PyTorch, it can be done in terms of existing primitives, *without* interfacing with Kaldi in a substantial way, unless I am mistaken (although you still need the GMM to bootstrap from, and you need to transform the denominator and numerator FSTs as discussed in the paper so that each state corresponds to only one transition id).
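For what it's worth, the "existing primitives" idea can be sketched in plain Python: a toy forward pass over an FST in the log semiring, where ordinary matrix-vector multiplication has its (+, ×) replaced by (log-add, +). Everything here (function names, the dense transition-matrix representation, the tiny graph in the comments) is made up for illustration - it is not Kaldi's actual LF-MMI code, just the shape of the computation:

```python
import math

NEG_INF = float("-inf")

def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def log_matvec(mat, vec):
    """Matrix-vector product in the log semiring: (+, *) becomes (log_add, +)."""
    out = []
    for row in mat:
        acc = NEG_INF
        for j, v in enumerate(vec):
            acc = log_add(acc, row[j] + v)
        out.append(acc)
    return out

def forward_logprob(trans, init, final, emit):
    """Total log-probability of all paths through a dense-matrix FST.
    trans[i][j]: log weight of arc j -> i (NEG_INF means no arc).
    emit[t][i]:  log emission score for state i at frame t."""
    alpha = [init[i] + emit[0][i] for i in range(len(init))]
    for t in range(1, len(emit)):
        alpha = log_matvec(trans, alpha)
        alpha = [alpha[i] + emit[t][i] for i in range(len(alpha))]
    total = NEG_INF
    for i in range(len(alpha)):
        total = log_add(total, alpha[i] + final[i])
    return total
```

In a framework like PyTorch the same recursion would run on batched tensors, and autograd would give you the occupation probabilities (the backward pass) for free.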

The pykaldi2 paper also doesn't do any experiments with on-the-fly realignment if I am understanding correctly, which is disappointing for me.

My #1 problem with integrating other neural network toolkits with kaldi was that none of them had a concept of nnet3's "given that we need this output at time t, figure out the intermediate values you need, and compute only those, caching as needed for the future". Admittedly, this is useful only for online decoding. It doesn't matter for research purposes.




Vassil Panayotov

Jul 21, 2019, 3:08:11 AM
to kaldi...@googlegroups.com
Dan, I'm not following Kaldi's development very closely these days, but I wonder if you've decided to abandon the idea of creating a new nnet library (the stuff in src/tensor/) in favor of integrating PyTorch? I can certainly see how creating a new framework could be a lot of work for a single person (even though you've basically done it twice before)...

Vassil

Rudolf A. Braun

Jul 21, 2019, 9:22:51 AM
to kaldi-help
Awesome to hear!



Daniel Povey

Jul 21, 2019, 2:21:20 PM
to kaldi-help

@Daniel:
To do that LF-MMI stuff you need good sparse-tensor support, and I think PyTorch has been changing their sparse-tensor stuff and maybe deprecating some of it. From what I've seen, it supports sparse matrices in COO (coordinate) format, but this is experimental and may change.
(Also, I suspect you'd get a substantial slowdown, because I went to a lot of effort to preserve memory locality, batch things right, etc. But as you say, it may still not dominate.)

Regarding nnet3's concept of "give me this output at this time" and also the fact that nnet3 has a unified concept of a model such that the same likelihood-evaluation code can be used with different models... I think one just has to accept a mental shift to the way that toolkits like PyTorch work, and accept that all the workflows etc. would have to be re-done and things like decoders would have to be invoked in a different way.   These things are always a tradeoff.

@Vassil: yes, I am putting my tensor stuff on hold for the time being.  I may come back to it in future, or try to persuade some of the PyTorch team to accept some of the ideas in there.  (The novel part is largely about a language/formalization to make tensor layout issues much easier to explain precisely and reason about precisely, plus a shift to more operation-based, less function-based interpretations for operations).

Dan


Daniel Galvez

Jul 21, 2019, 9:12:07 PM
to kaldi-help
You can represent a CSR sparse matrix as three dense arrays; it's just a little more work on the user's part to pass three variables instead of one. It works for TensorFlow Ops, and there's no reason it shouldn't work for PyTorch.
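As a sketch of the three-array idea (plain Python, not any particular framework's API - the function name is mine), a CSR matrix-vector product where the matrix is passed as its `indptr`, `indices`, and `data` arrays:

```python
def csr_spmv(indptr, indices, data, x):
    """y = A @ x, with A in CSR form as three flat arrays:
    indptr[i]:indptr[i+1] delimits row i's entries in indices/data,
    indices[k] is the column of the k-th stored value data[k]."""
    y = []
    for i in range(len(indptr) - 1):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y.append(acc)
    return y

# A = [[1, 0, 2],
#      [0, 3, 0]]
indptr, indices, data = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
print(csr_spmv(indptr, indices, data, [1.0, 1.0, 1.0]))  # [3.0, 3.0]
```

In TensorFlow or PyTorch the three arrays would simply be three dense tensors handed to the op together.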

My understanding is that the authors of that paper claim in their earlier work, "An efficient phone n-gram forward-backward computation using dense matrix multiplication", that the FST can be represented as a block-diagonal matrix (which immediately doesn't make sense, given that FSTs are directed). So they are probably using what I've colloquially heard of as "BOO" format - "block COO" - which would make sense for TPUs, which literally cannot do anything other than GEMM (lol).

Regardless, for GPU purposes, there is an implementation of CSR-matrix-by-dense-vector multiplication for CUDA that works on arbitrary sparsity patterns here: https://github.com/owensgroup/merge-spmv. It is a template library that can be specialized for the log semiring and tropical semiring.
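To make the semiring specialization concrete (a toy plain-Python sketch under my own names, not merge-spmv's actual C++ interface), a CSR matrix-vector product can be parameterized by the semiring's add/multiply operations; instantiating it with (min, +) - the tropical semiring - turns one SpMV into one round of shortest-path relaxation:

```python
def semiring_spmv(indptr, indices, data, x, add, mul, zero):
    """CSR SpMV with scalar (+, *) replaced by an arbitrary semiring
    (add, mul) whose additive identity is `zero`."""
    y = []
    for i in range(len(indptr) - 1):
        acc = zero
        for k in range(indptr[i], indptr[i + 1]):
            acc = add(acc, mul(data[k], x[indices[k]]))
        y.append(acc)
    return y

# Tropical semiring (min, +): arc weights in `data`, current distances in x.
# A = [[1, _, 2],
#      [_, 3, _]]   (_ = no arc)
INF = float("inf")
dist = semiring_spmv([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0],
                     [0.0, 5.0, 1.0], min, lambda a, b: a + b, INF)
print(dist)  # [1.0, 8.0]
```

Swapping in log-add for `add` and ordinary addition for `mul` gives the log-semiring version used in forward-backward.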

I suppose what I am saying is that if this were my full-time job, I would have no problem implementing LF-MMI training on GPU in pytorch or tensorflow or what-have-you.

Daniel

Vassil Panayotov

Jul 22, 2019, 3:37:32 AM
to kaldi...@googlegroups.com

FWIW, using PyTorch as a starting point makes sense to me, especially if it saves work. I guess after you figure out the proper way to pythonize Kaldi (perhaps reusing some ideas from PyKaldi{,2} - I'm not familiar with either of those, unfortunately) and see how people are using it, you'll be in a better position to see how to evolve the nnet parts - whether to enhance core PyTorch (if the developers agree), fork it, or roll your own.
I just hope Kaldi will retain (and hopefully enhance) its transparency and modularity when the Python APIs are added - I mean, higher-level interfaces are good, but the flexibility and simplicity of the backend code and recipes are worth preserving IMO, as is the performance for people using it in production.

Vassil
