Regarding integrating Kaldi's acoustic models with a mainstream toolkit:
I think that to do this right would involve a ground-up reimagining of
quite a lot of the mechanisms used in Kaldi. Even if it turned out
that there was a way to do it that enabled us to keep much of our
existing code, it would still be a huge project. The issue, IMO, is
that there's a kind of impedance mismatch between the way Kaldi is
designed and the way most of these toolkits work. Just one of the
issues is language (C++ versus python). A lot of Kaldi code is in C++
and interfacing that with some of these toolkits would be quite hard.
They can be fairly easy to use in python if you use them as intended,
but (at least this is my experience from trying to understand TensorFlow)
as soon as you either look into the internals of the python code or
into the C++ code, things spiral quickly out of control in terms of
the complexity. I imagine the TensorFlow team must have some internal
documentation on how it's designed from the C++ level, for instance,
because what is available externally doesn't help you understand it at
all, and the code is almost completely opaque. (And I consider myself
an expert-level C++ programmer.)
PyTorch seems to have a more straightforward design from a user
perspective than TensorFlow (I just attended a talk on PyTorch,
actually) and might possibly be easier to integrate with, but this
isn't something that's really on my agenda right now; anyway, most of
the same issues apply, as the design isn't really that different from
TensorFlow or Theano.
Also there are some things we do, like the self-repair and
(particularly) natural gradient, that are helpful but which would be
hard to reproduce in one of the standard deep learning toolkits, due
to fundamental differences in design.
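To give a rough idea of what the natural gradient does, here is a
minimal numpy sketch of the general technique: precondition each
gradient by the (damped) inverse of an online estimate of the Fisher
matrix E[g g^T]. This is only an illustration of the idea, not Kaldi's
actual NG-SGD code, and the names are made up:

    import numpy as np

    class NaturalGradientSketch:
        def __init__(self, dim, decay=0.95, damping=1e-2):
            self.fisher = np.eye(dim)  # running estimate of E[g g^T]
            self.decay = decay         # forgetting factor for the estimate
            self.damping = damping     # keeps the inverse well-conditioned

        def precondition(self, grad):
            # Update the Fisher estimate with the new gradient sample.
            self.fisher = (self.decay * self.fisher
                           + (1.0 - self.decay) * np.outer(grad, grad))
            # Solve (F + damping*I) x = grad instead of forming an inverse.
            return np.linalg.solve(
                self.fisher + self.damping * np.eye(len(grad)), grad)

    # Usage: precondition each gradient before the SGD update.
    ng = NaturalGradientSketch(dim=4)
    update = ng.precondition(np.random.randn(4))

A real implementation has to approximate this cheaply (Kaldi uses a
factored, online approximation), since forming and solving a full
Fisher matrix for every parameter matrix would be far too slow.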
Regarding cuDNN: the reasons we're not using it are:
- Because I didn't want an extra dependency.
- Because it would create hassles with doing inference on CPU (i.e.
we'd have to create interfaces compatible with the cuDNN ones in order
to have a parallel CPU implementation), and cuDNN does not provide a
CPU implementation.
- Because it takes away quite a lot of your freedom to experiment
(e.g. their LSTM code can't be modified into variants of LSTMs); see
the sketch after this list.
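To illustrate that last point: when the cell is written out explicitly,
as in the sketch below, an LSTM variant is a local change to the
equations; with cuDNN's fused kernel the equations are fixed. This is
an illustrative numpy sketch, not code from Kaldi or cuDNN:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_cell(x, h_prev, c_prev, W, b):
        # W has shape (4*hidden, input+hidden); b has shape (4*hidden,).
        z = W @ np.concatenate([x, h_prev]) + b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        # A variant (peepholes, a recurrent projection, cell clipping)
        # is a one- or two-line edit here; with a fused cuDNN kernel it
        # would mean giving up the kernel entirely.
        return h, c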
Regarding the model averaging:
This probably loses you 10% or so in terms of clock time, but it's
very convenient in terms of being able to run it on a GridEngine-type
setup where different jobs may run on different machines with
different types of GPU, without necessarily having fast
interconnections, and not necessarily all running at the same time.
It also has advantages in reproducibility: with asynchronous-SGD
approaches, the performance might depend on how loaded the network
is.
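For concreteness, here is a toy numpy sketch of the averaging scheme
(not Kaldi's actual nnet3 code; the names and the least-squares
objective are just for illustration): each job trains its own copy of
the model for one outer iteration, and the copies are then averaged to
form the starting point for the next one.

    import numpy as np

    def train_one_job(params, shard, lr=0.1, steps=5):
        # Stands in for one job's worth of SGD between averaging points.
        X, y = shard
        for _ in range(steps):
            grad = X.T @ (X @ params - y) / len(y)
            params = params - lr * grad
        return params

    def outer_iteration(params, shards):
        # Jobs may run on different machines, with different GPUs, at
        # different times; all they share is the averaged model at
        # iteration boundaries.
        results = [train_one_job(params.copy(), s) for s in shards]
        return np.mean(results, axis=0)

    # Usage: split the data into per-job shards and iterate.
    rng = np.random.default_rng(0)
    X, y = rng.standard_normal((40, 3)), rng.standard_normal(40)
    shards = [(X[i::4], y[i::4]) for i in range(4)]
    params = np.zeros(3)
    for _ in range(10):
        params = outer_iteration(params, shards)

Note that the jobs never communicate within an outer iteration, which
is what makes the scheme indifferent to slow interconnects and to
heterogeneous hardware.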
If we were using a standard machine learning toolkit all our recipes
would have to be re-tuned from scratch.
It's definitely something that is worthwhile, but I don't have time to
take it on right now.
I suspect some of these companies don't realize how much work it is:
it's not something you can sponsor students to do; you need experts to
devote significant time to it.
Dan