About TensorFlow and Kaldi


Searcher Ray

Sep 2, 2017, 4:54:14 PM
to kaldi-help
Dear all,

As some of you have probably already noticed, Google released a post about Kaldi integration with TensorFlow [2]. There has been previous work such as TFKaldi, and Dan has commented on TensorFlow before [1]. This time, though, the post from Google seems to be an official one, and their implementation has already been merged into the Kaldi GitHub repo. But the post seems focused more on publicity and says little about what was actually implemented. I skimmed the post and want to confirm: did they implement the RNNLM integration with TensorFlow but do nothing on the acoustic side? Of course, that is still very useful. Thanks!

cheers 

Daniel Povey

Sep 2, 2017, 4:59:25 PM
to kaldi-help
It's just related to RNNLMs, not the acoustic part.
I do plan at some point in the near future to figure out a way to
support exporting to TensorFlow for the inference (i.e. decoding) of
the acoustic models. For the training it would be extremely difficult
to integrate with TensorFlow (mostly because of LF-MMI), and anyway I
don't believe the performance in either speed, memory or WERs would be
better.

Dan

Daniel Galvez

Sep 8, 2017, 2:24:09 AM
to kaldi-help
I feel this blog post is misleading. I can understand the goal of getting publicity for recruiting and of distinguishing yourself from other ASR companies, but still.

I do want to poke you about this, Dan. I have been poked by a couple of startups in the Bay Area claiming to be willing to sponsor development on integrating Kaldi with a mainstream deep learning toolkit (I consider Caffe2 and TensorFlow the only serious options; CNTK is possible, but I suppose it's not fashionable). Again, they're probably willing to do this mostly for publicity and recruiting, but the point is that there is active interest in industry in such a thing.

I personally disagree with you about the goal of doing only decoding in TensorFlow. I don't have numbers to back me up on this, but I think it's a good bet that Kaldi is being held back in training because (1) we force a process to have only one GPU, since the CuDevice class is a singleton, and (2) Kaldi does not use cuDNN, which offers a lot of useful goodies: drastically reduced global memory reads in LSTMs, 16-bit floating-point weight storage almost for free, and the best convolution implementations around. (From my struggles adding cuDNN to Kaldi, I'd prefer to use someone else's handiwork here. Anyone curious can ask me what the problems were that made it hard.)
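Point (1) can be illustrated with a short, purely conceptual sketch. This is not Kaldi's actual API; the class and method names are illustrative only:

```python
# Conceptual sketch (not Kaldi code) of why a singleton device class
# ties a process to one GPU: every caller shares the single instance,
# so there is nowhere to hold a second device's state.
class CuDevice:
    _instance = None  # one instance per process

    def __init__(self):
        self.active_gpu = None

    @classmethod
    def instantiate(cls):
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def select_gpu(self, gpu_id):
        self.active_gpu = gpu_id

dev_a = CuDevice.instantiate()
dev_a.select_gpu(0)
dev_b = CuDevice.instantiate()   # same object, not a second device
assert dev_a is dev_b
print(dev_b.active_gpu)          # 0
```

Since every call to instantiate() hands back the same object, a single process can never drive two GPUs concurrently through this interface.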

(1) is a problem because we limit our training throughput by first doing gradient descent in multiple processes, writing the models to disk, and then calling nnet3-combine to merge them. Nowadays, NVIDIA's NCCL library could do all of this over the PCIe bus (or NVLink, though I don't remember CLSP having NVLink), but it requires that a single process be able to access multiple GPUs. Note that I don't have hard numbers on how much speedup this would give, but I suspect it's quite sizable.
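The averaging scheme described here can be sketched roughly as follows. This is a simplified stand-in: the real nnet3-combine optimizes the combination weights rather than taking a plain mean, and real parameters live in matrices on the GPU, not Python lists.

```python
# Simplified sketch of parallel training with parameter averaging:
# K workers run SGD independently on their own copy of the parameters,
# and the resulting models are combined afterwards (here: a plain mean).

def sgd_step(params, grads, lr=0.1):
    """One gradient-descent update on a flat parameter vector."""
    return [p - lr * g for p, g in zip(params, grads)]

def combine(models):
    """Average K parameter vectors element-wise (plain mean)."""
    k = len(models)
    return [sum(ps) / k for ps in zip(*models)]

start = [1.0, 2.0, 3.0]
# Each worker sees different data, hence different gradients.
worker_grads = [[0.5, 0.0, -0.5], [0.1, 0.2, 0.3], [0.0, -0.2, 0.2]]
models = [sgd_step(start, g) for g in worker_grads]
averaged = combine(models)
print([round(v, 6) for v in averaged])  # [0.98, 2.0, 3.0]
```

In Kaldi the hand-off between sgd_step and combine goes through disk; an NCCL-style allreduce would do the same combination in GPU memory, within one process.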

Meanwhile, I'm personally not convinced that LF-MMI would be hard to integrate into, e.g., TensorFlow, though I'm familiar only with the CUDA kernels.



--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Daniel Galvez

Daniel Povey

Sep 8, 2017, 5:05:28 PM
to kaldi-help
Regarding integrating Kaldi's acoustic models with a mainstream toolkit:

I think that to do this right would involve a ground-up reimagining of
quite a lot of the mechanisms used in Kaldi. Even if it turned out
that there was a way to do it that enabled us to keep much of our
existing code, it would be a huge project. The issue, IMO, is
that there's a kind of impedance mismatch between the way Kaldi is
designed and the way most of these toolkits work. Just one of the
issues is language (C++ versus python). A lot of Kaldi code is in C++
and interfacing that with some of these toolkits would be quite hard.
They can be fairly easy to use in python if you use them as intended,
but (at least this is my experience from trying to understand TensorFlow)
as soon as you either look into the internals of the python code or
into the C++ code, things spiral quickly out of control in terms of
the complexity. I imagine the TensorFlow team must have some internal
documentation on how it's designed from the C++ level, for instance,
because what is available externally doesn't help you understand it at
all, and the code is almost completely opaque. (And I consider myself
an expert-level C++ programmer.)

Pytorch seems to have a more straightforward design from a user
perspective than TensorFlow (I just attended a talk on PyTorch,
actually) and might possibly be easier to integrate with, but this
isn't something that's really on my agenda right now; anyway, most of
the same issues apply as the design isn't really that different from
TensorFlow or Theano.

Also there are some things we do, like the self-repair and
(particularly) natural gradient, that are helpful but which would be
hard to reproduce in one of the standard deep learning toolkits, due
to fundamental differences in design.

Regarding CuDNN: the reason we're not using that is
- Because I didn't want an extra dependency
- Because it would create hassles with being able to do inference on
CPU (i.e. we'd have to create interfaces that were compatible with the
CuDNN ones in order to have a parallel CPU implementation), and CuDNN
does not provide a CPU implementation
- Because it takes away quite a lot of your freedom to experiment
(e.g. their LSTM code can't be modified to implement variants of LSTMs).
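The second point can be made concrete with a hypothetical sketch (none of these class names exist in Kaldi or cuDNN): to use cuDNN on GPU while keeping CPU inference, every accelerated entry point would need a CPU twin behind a common interface.

```python
# Hypothetical sketch of the interface burden described above: since
# cuDNN provides no CPU implementation, each cuDNN-backed operation
# needs a hand-written CPU counterpart behind a shared interface.
from abc import ABC, abstractmethod

class LstmBackend(ABC):
    @abstractmethod
    def forward(self, inputs):
        """Run the recurrence over a list of per-frame inputs."""

class CpuLstmBackend(LstmBackend):
    """Stand-in CPU path; a real one must reproduce the cuDNN LSTM math."""
    def forward(self, inputs):
        state, out = 0.0, []
        for x in inputs:
            state = 0.5 * state + 0.5 * x   # toy recurrence, not a real LSTM
            out.append(state)
        return out

class CudnnLstmBackend(LstmBackend):
    """GPU path; would wrap cuDNN's RNN inference call, unavailable on CPU."""
    def forward(self, inputs):
        raise RuntimeError("requires a GPU and the cuDNN library")

def make_backend(have_gpu):
    return CudnnLstmBackend() if have_gpu else CpuLstmBackend()

backend = make_backend(have_gpu=False)
print(backend.forward([1.0, 1.0]))  # [0.5, 0.75]
```

The hassle Dan describes is that the CPU twin must stay numerically compatible with whatever cuDNN computes, doubling the maintenance surface for every wrapped operation.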

Regarding the model averaging:
This probably loses you 10% or so in terms of clock time, but it's
very convenient in terms of being able to run it on a GridEngine-type
setup where different jobs may run on different machines with
different types of GPU, without necessarily having fast
interconnections, and not necessarily all running at the same time.
It also has advantages in reproducibility since with these
asynchronous SGD things the performance might depend on how loaded the
network is.

If we were using a standard machine learning toolkit all our recipes
would have to be re-tuned from scratch.
It's definitely something that is worthwhile, but I don't have time to
take it on right now.
I suspect some of these companies don't realize how much work it is--
it's not something you can sponsor students to do, you need experts to
devote significant time to it.

Dan

Daniel Galvez

Sep 9, 2017, 2:01:52 PM
to kaldi-help
Some thoughts of my own:

It is unfortunate that Python is the only first-class interface for training in just about everything other than CNTK. Not that CNTK's other interfaces (BrainScript, the crazy model-description language, and .cntk files) are better targets. Since operations (what we call components in nnet3) can be defined in both Python and C++ in (IIRC) Caffe2, TensorFlow, and CNTK, a C++-based interface to Kaldi would always need code rewritten whenever someone wanted to use an operation implemented only in Python, for example.

Based on your claims, I have begun to dig deep into Caffe2, which has sizable C++ and Python codebases (no particular reason for that choice, other than that I consider CNTK's codebase messy and have never seriously looked at TensorFlow). Its approach is to use pybind11, which is a stripped-down version of boost::python. I like the pybind11 library quite a lot so far.

In my opinion, it's not worthwhile to redo all the recipes in, e.g., TensorFlow. There is absolutely no incentive for that as far as I'm concerned. Probably the most useful code for a mainstream neural net library to leverage is the I/O code and the decoder code. Leveraging the I/O code would allow someone to reuse much of the existing data preparation scripts by simply reading minibatches from .ark and .scp files after GMM training is done. I suspect that providing an interface to the decoder code would be much hairier, because it has a much larger scope.
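As a rough illustration of the I/O-reuse idea, here is a minimal, hedged sketch of a reader for Kaldi's text-format archives. Real pipelines mostly use binary archives, so in practice one would call Kaldi's own I/O code or an existing Python binding rather than this:

```python
# Minimal sketch of reading Kaldi text-format archives (.ark), in the
# spirit of feeding Kaldi-prepared data to another toolkit. Text format
# looks like:  utt1  [\n  1 2\n  3 4 ]\n

def read_text_ark(lines):
    """Yield (utterance_id, matrix) pairs from text-format ark lines."""
    key, rows = None, []
    for line in lines:
        line = line.strip()
        if key is None:
            if line.endswith("["):
                key = line[:-1].strip()   # "utt1 [" -> "utt1"
        else:
            done = line.endswith("]")
            if done:
                line = line[:-1]
            if line:                      # skip a bare "]" line
                rows.append([float(v) for v in line.split()])
            if done:
                yield key, rows
                key, rows = None, []

ark_text = "utt1 [\n 1 2\n 3 4 ]\nutt2 [\n 5 6 ]\n"
print(list(read_text_ark(ark_text.splitlines())))
# [('utt1', [[1.0, 2.0], [3.0, 4.0]]), ('utt2', [[5.0, 6.0]])]
```

An .scp file is then just a map from utterance IDs to byte offsets in such archives, which is what makes random-access minibatch reading cheap.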

I can't argue with you about the performance of using something other than model averaging on disk without actually experimenting with it myself (which is hard to do because of Kaldi's singleton GPU device class and CLSP's current SGE setup). I will mention, though, that there is work on performant non-asynchronous SGD precisely because of the reproducibility issues of asynchronous SGD.

The overall point is that I'm actively thinking about this.





--
Daniel Galvez

Daniel Povey

Sep 9, 2017, 2:24:47 PM
to kaldi-help
I'm not sure that it's quite correct to identify components with the
operations of the standard neural net toolkits. (Actually I have been
thinking about this topic lately myself, as I try to figure out what
the essential difference is between nnet3 and the standard neural net
toolkits).

The operations of standard neural net toolkits are n-ary operations
that are like mathematical functions, e.g. you'd have an operation
for "+", one for matrix multiplication, one for convolution, etc.
Usually they have zero or more inputs and one output. But they don't
have their own parameters (the parameters would be an input to the an
operation, e.g. a Variable in Theano or TensorFlow or PyTorch). In
nnet3 the components are unary operations that in general contain
their own parameters.

This might be the key difference-- that in nnet3 the parameters are
treated differently from data and hidden activations, i.e. they live
in the components. And the components generally accept a bunch of
configuration information. Also the models have a generic interface
so that you can write a "general-purpose" decoder, unlike with things
like TensorFlow where as I understand it, you'd generally be expected
to put model-specific code into the decoder. Plus, in nnet3 the user
doesn't say what to compute programmatically (e.g. the user doesn't
have to figure out what 't' values need to be computed in the
different layers, and decide how they are represented in matrices or
tensors); instead the user specifies the data flow between layers and
the framework decides how things are arranged into physical matrices.
Of course nnet3 is a much less general-purpose framework, i.e. it's
geared towards things that look like neural networks; it's not a
general way of doing backprop through arbitrary expressions.

Some of the things we do in nnet3 e.g. with natural gradient and
self-repair, would not be very natural to do in the standard toolkits.

Dan

Arkadi Gurevich

Sep 4, 2018, 2:57:56 PM
to kaldi-help
Hi Dan, Daniel,

I do not understand what it means to export an acoustic model to TensorFlow.
Do you intend to do the decoding (forwarding) in TensorFlow?

I want to use the parameters of the trained acoustic model (the TDNN-F of mini-librispeech)
and build a neural network in TensorFlow. Isn't that possible?

From what I understand, the resulting network is a "standard" neural network,
no different from one built in the standard tools (e.g. TensorFlow).

All best,
Arkadi

Daniel Povey

Sep 4, 2018, 5:59:58 PM
to kaldi-help
> I do not understand what it means to export an acoustic model to TensorFlow.
> Do you intend to do the decoding (forwarding) in TensorFlow?
>
> I want to use the parameters of the trained acoustic model (the TDNN-F of mini-librispeech)
> and build a neural network in TensorFlow. Isn't that possible?
>
> From what I understand, the resulting network is a "standard" neural network,
> no different from one built in the standard tools (e.g. TensorFlow).

That is not possible. The explanation for why is long.

Dan