I looked at gmm-decode-biglm-faster (e.g. https://github.com/kaldi-asr/kaldi/blob/85a3dd5f0b71e419abf1169a26b759bfc423a543/src/gmmbin/gmm-decode-biglm-faster.cc#L168), BiglmFasterDecoder (https://github.com/kaldi-asr/kaldi/blob/master/src/decoder/biglm-faster-decoder.h#L51), and *DeterministicOnDemandFst (https://github.com/kaldi-asr/kaldi/blob/85a3dd5f0b71e419abf1169a26b759bfc423a543/src/fstext/deterministic-fst.h#L100).
I see that one option is to extend LatticeFasterOnlineDecoder (https://github.com/kaldi-asr/kaldi/blob/85a3dd5f0b71e419abf1169a26b759bfc423a543/src/decoder/lattice-faster-online-decoder.h#L47) so that it uses a fst::DeterministicOnDemandFst<fst::StdArc> (lm_diff_fst) in a similar way to BiglmFasterDecoder. Then I would implement something like fst::RNNDeterministicOnDemandFst<StdArc> using some RNN language model, e.g. one based on nnet3.

Yes, although the nnet3-based RNNLMs are not ready (however, Hainan has been making progress).
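To make that concrete, here is a minimal sketch of such a wrapper. Only the Start()/Final()/GetArc() interface is the real one from fstext/deterministic-fst.h; RnnLmScorer and all of its methods are hypothetical placeholders for whatever the RNN LM implementation ends up providing, and the state ids it hands out stand in for word histories plus their RNN hidden vectors.

// Sketch only: adapts a hypothetical RNN LM to the on-demand FST
// interface that BiglmFasterDecoder-style decoders compose with.
#include "fstext/deterministic-fst.h"

// Hypothetical wrapper around an RNN LM (e.g. a future nnet3-based one);
// not an existing Kaldi class. It would keep one RNN hidden vector per
// state id that it allocates.
struct RnnLmScorer {
  int Start();                         // id of the empty-history state
  int Advance(int state, int word);    // state after consuming `word`
  float LogProb(int state, int word);  // log P(word | history of `state`)
  float FinalLogProb(int state);       // log P(</s> | history of `state`)
};

class RnnDeterministicOnDemandFst
    : public fst::DeterministicOnDemandFst<fst::StdArc> {
 public:
  typedef fst::StdArc::StateId StateId;
  typedef fst::StdArc::Weight Weight;
  typedef fst::StdArc::Label Label;

  explicit RnnDeterministicOnDemandFst(RnnLmScorer *rnn) : rnn_(rnn) { }

  StateId Start() override { return rnn_->Start(); }

  Weight Final(StateId s) override {
    // FST costs are negated log-probabilities.
    return Weight(-rnn_->FinalLogProb(s));
  }

  bool GetArc(StateId s, Label ilabel, fst::StdArc *oarc) override {
    oarc->ilabel = ilabel;
    oarc->olabel = ilabel;
    oarc->weight = Weight(-rnn_->LogProb(s, ilabel));
    oarc->nextstate = rnn_->Advance(s, ilabel);
    return true;  // an RNN LM assigns a probability to every word
  }

 private:
  RnnLmScorer *rnn_;  // not owned
};

Note that if Advance() created a fresh state for every distinct word sequence, the state space would grow without bound, which is exactly what the history limiting below is meant to prevent.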
And, as you suggest, I would limit the word history to prevent the state space from getting too large. This would have the advantage that I can get scores for any N-gram history while keeping the RNN LM relatively small in memory. However, it may be computationally expensive, depending on the size of the RNN LM and on the number of active states (word histories). I could limit the computational cost by using some form of caching, as described in https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/rnnFirstPass.pdf.
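A rough sketch of those two cost controls, with purely illustrative names (none of this is an existing Kaldi API): truncating the history bounds the number of distinct states, and a per-state score cache, in the spirit of the caching in the paper above, ensures each (history, word) RNN query is computed at most once.

#include <map>
#include <vector>

// Sketch only: bounded-history state mapping plus RNN score caching.
class BoundedHistoryLm {
 public:
  explicit BoundedHistoryLm(size_t max_words) : max_words_(max_words) { }

  // State reached from `history` after consuming `word`, truncated to
  // the most recent max_words_ words. Equal truncated histories collide
  // on purpose; that collision is what keeps the state space bounded.
  int Advance(std::vector<int> history, int word) {
    history.push_back(word);
    if (history.size() > max_words_)
      history.erase(history.begin());
    auto it = state_of_.find(history);
    if (it != state_of_.end()) return it->second;
    int s = static_cast<int>(state_of_.size());
    state_of_[history] = s;
    return s;
  }

  // Cached -log P(word | state); the RNN runs once per (state, word).
  float Score(int state, int word) {
    std::map<int, float> &row = cache_[state];
    auto it = row.find(word);
    if (it != row.end()) return it->second;
    float cost = RunRnn(state, word);  // hypothetical RNN forward pass
    row[word] = cost;
    return cost;
  }

 private:
  float RunRnn(int state, int word);  // hypothetical, defined elsewhere

  size_t max_words_;
  std::map<std::vector<int>, int> state_of_;
  std::map<int, std::map<int, float>> cache_;
};

One design point worth noting: when two different long histories map to the same truncated state, whichever RNN hidden vector reached that state first would be the one a RunRnn() implementation keeps, which is the "almost correct history" behaviour discussed below.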
However, given that the RNN LM is being approximated with a limited N-gram history (e.g. 4-grams) anyway, I could directly approximate the RNN LM by an N-gram LM offline, and extend only LatticeFasterOnlineDecoder to work with lm_diff_fst. My understanding is that this approximation would have to result in a very large LM: the approximated LM must cover all possible (reasonable) N-gram histories, since they are generated offline rather than online as in the approach described above. In this case I would save time on RNN computations, but it would require more memory (see the sketch at the end of this message).

They tried this at Microsoft (I think Geoff Zweig was involved). It does not work very well. The difference from what you get when you limit the state space by mapping, say, all things with the same 4-gram history to the same state is that with the latter, even though you might not get the correct history from more than 4 words ago, it's probably *almost* correct. When you completely throw away the longer history you get a bigger degradation, and you lose most of the benefit of using an RNNLM in the first place.
Dan
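For completeness, a sketch of the offline enumeration that the second approach implies, with the same kind of hypothetical RNN stand-in as above; it makes the memory cost visible: the number of RNN queries, and of resulting pseudo-N-gram entries, is the number of retained histories times the vocabulary size.

#include <cstdio>
#include <set>
#include <vector>

// Hypothetical stand-in for one RNN forward pass; not a real API.
float RnnLogProb(const std::vector<int> &history, int word);

// Sketch only: pre-computes an RNN score for every (history, word)
// pair and dumps one pseudo-N-gram line per pair. For a 4-gram
// approximation, `histories` holds every retained 3-word history.
void DumpApproximateLm(const std::set<std::vector<int> > &histories,
                       int vocab_size, std::FILE *out) {
  for (const std::vector<int> &h : histories) {
    for (int w = 0; w < vocab_size; ++w) {
      for (int hw : h) std::fprintf(out, "%d ", hw);
      std::fprintf(out, "%d %.4f\n", w, RnnLogProb(h, w));
    }
  }
}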