Why does the reluGRU make Kaldi crash?


lishimi...@gmail.com

Oct 26, 2017, 7:40:15 AM
to kaldi-help
I want to implement the reluGRU from the Interspeech 2017 paper "Improving speech recognition by revising gated recurrent units".
I have successfully implemented the simple GRU, and it works fine.

But when I simply change the tanh to a ReLU component, Kaldi crashes. The log looks like this:
nnet3-chain-train --apply-deriv-weights=False --l2-regularize=5e-05 --leaky-hmm-coefficient=0.1 --write-cache=exp/chain/gru_6j_relu_ld5_sp/cache.1 --xent-regularize=0.025 --optimization.min-deriv-time=-8 --optimization.max-deriv-time-relative=15 --print-interval=10 --momentum=0.0 --max-param-change=1.41421356237 --backstitch-training-scale=0.0 --backstitch-training-interval=1 --srand=0 'nnet3-am-copy --raw=true --learning-rate=0.003 --scale=0.99 exp/chain/gru_6j_relu_ld5_sp/0.mdl - |' exp/chain/gru_6j_relu_ld5_sp/den.fst 'ark,bg:nnet3-chain-copy-egs                         --frame-shift=1                         ark:/nobackup/f1/asr/zhangshaofu/kaldi/egs/swbdgru/s5c/exp/chain/gru_6j_ld5_sp/egs/cegs.1.ark ark:- |                         nnet3-chain-shuffle-egs --buffer-size=5000                         --srand=0 ark:- ark:- | nnet3-chain-merge-egs                         --minibatch-size=32 ark:- ark:- |' exp/chain/gru_6j_relu_ld5_sp/1.1.raw 
LOG (nnet3-chain-train[5.2]:IsComputeExclusive():cu-device.cc:263) CUDA setup operating under Compute Exclusive Process Mode.
LOG (nnet3-chain-train[5.2]:FinalizeActiveGpu():cu-device.cc:225) The active GPU is [3]: Tesla K20m free:4704M, used:95M, total:4799M, free/total:0.980196 version 3.5
nnet3-am-copy --raw=true --learning-rate=0.003 --scale=0.99 exp/chain/gru_6j_relu_ld5_sp/0.mdl - 
WARNING (nnet3-am-copy[5.2]:Check():nnet-nnet.cc:783) Node lda.delayed is never used to compute any output.
LOG (nnet3-am-copy[5.2]:main():nnet3-am-copy.cc:140) Copied neural net from exp/chain/gru_6j_relu_ld5_sp/0.mdl to raw format as -
WARNING (nnet3-chain-train[5.2]:Check():nnet-nnet.cc:783) Node lda.delayed is never used to compute any output.
WARNING (nnet3-chain-train[5.2]:Check():nnet-nnet.cc:783) Node lda.delayed is never used to compute any output.
nnet3-chain-shuffle-egs --buffer-size=5000 --srand=0 ark:- ark:- 
nnet3-chain-merge-egs --minibatch-size=32 ark:- ark:- 
nnet3-chain-copy-egs --frame-shift=1 ark:/nobackup/f1/asr/zhangshaofu/kaldi/egs/swbdgru/s5c/exp/chain/gru_6j_ld5_sp/egs/cegs.1.ark ark:- 
ASSERTION_FAILED (nnet3-chain-train[5.2]:HouseBackward():qr.cc:124) : 'KALDI_ISFINITE(sigma) && "Tridiagonalizing matrix that is too large or has NaNs."' 

[ Stack-Trace: ]
nnet3-chain-train() [0x11a26c0]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)
void kaldi::HouseBackward<float>(int, float const*, float*, float*)
kaldi::SpMatrix<float>::Tridiagonalize(kaldi::MatrixBase<float>*)
kaldi::SpMatrix<float>::Eig(kaldi::VectorBase<float>*, kaldi::MatrixBase<float>*) const
kaldi::nnet3::OnlineNaturalGradient::PreconditionDirectionsInternal(int, float, kaldi::Vector<float> const&, kaldi::CuMatrixBase<float>*, kaldi::CuMatrixBase<float>*, kaldi::CuVectorBase<float>*, float*)
kaldi::nnet3::OnlineNaturalGradient::PreconditionDirections(kaldi::CuMatrixBase<float>*, kaldi::CuVectorBase<float>*, float*)
kaldi::nnet3::OnlineNaturalGradient::PreconditionDirections(kaldi::CuMatrixBase<float>*, kaldi::CuVectorBase<float>*, float*)
.
.
.
kaldi::nnet3::OnlineNaturalGradient::PreconditionDirections(kaldi::CuMatrixBase<float>*, kaldi::CuVectorBase<float>*, float*)
kaldi::nnet3::NaturalGradientAffineComponent::Update(std::string const&, kaldi::CuMatrixBase<float> const&, kaldi::CuMatrixBase<float> const&)
kaldi::nnet3::AffineComponent::Backprop(std::string const&, kaldi::nnet3::ComponentPrecomputedIndexes const*, kaldi::CuMatrixBase<float> const&, kaldi::CuMatrixBase<float> const&, kaldi::CuMatrixBase<float> const&, void*, kaldi::nnet3::Component*, kaldi::CuMatrixBase<float>*) const
kaldi::nnet3::NnetComputer::ExecuteCommand()
kaldi::nnet3::NnetComputer::Run()
kaldi::nnet3::NnetChainTrainer::TrainInternal(kaldi::nnet3::NnetChainExample const&, kaldi::nnet3::NnetComputation const&)
kaldi::nnet3::NnetChainTrainer::Train(kaldi::nnet3::NnetChainExample const&)
main
__libc_start_main
nnet3-chain-train() [0xcf4999]

# Accounting: time=6 threads=1
# Finished at Thu Oct 26 19:15:44 CST 2017 with status 134


So I do not know what is wrong. Can you help me? Thank you!

lishimi...@gmail.com

Oct 26, 2017, 8:08:27 AM
to kaldi-help


When I simply change the tanh to ReLU in the normal lstm.py, the program also crashes!

Daniel Povey

Oct 26, 2017, 12:17:24 PM
to kaldi-help
It's because of divergence of the activations; it's expected. In recurrent setups you need a way to limit the magnitude of the activations.
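[Editor's note: a toy illustration of the divergence Dan describes, not Kaldi code. With a bounded nonlinearity like tanh the recurrent state cannot grow, but an unbounded ReLU lets the state grow geometrically whenever the effective recurrent gain exceeds 1, eventually producing inf/NaN values that trip the natural-gradient eigendecomposition assertion seen in the log.]

```python
import numpy as np

def run(activation, w=1.2, steps=50):
    # Scalar recurrence h_t = f(w * h_{t-1}); w > 1 models an
    # effective recurrent gain above 1 somewhere in the network.
    h = 1.0
    for _ in range(steps):
        h = activation(w * h)
    return h

relu = lambda x: max(x, 0.0)

h_tanh = run(np.tanh)  # converges to a fixed point with |h| < 1
h_relu = run(relu)     # grows like 1.2**50, i.e. by orders of magnitude
print(h_tanh, h_relu)
```

In a real network the growth is per-dimension and interacts with training, but the mechanism is the same: nothing caps the ReLU outputs, so over many time steps they can blow up.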


--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/70a22193-0c20-4192-8fa1-f9a3e78ac609%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Shane Walker

Feb 7, 2018, 12:13:50 PM
to kaldi-help
Are there straightforward activation clamping operations for this?

Daniel Povey

Feb 7, 2018, 1:38:32 PM
to kaldi-help
You'd have to do some coding to do activation clamping cleanly, but in any case I don't think it would work very well in this context.
I seem to remember someone pointing out problems with that paper "Improving speech recognition by revising gated recurrent units", and the conclusion was that the method was probably not that useful.
Gaofeng Cheng is doing a lot of work trying to get various forms of GRUs to work; maybe you can talk to him if you're interested in the topic (Gaofeng Cheng <gfche...@gmail.com>). There are a lot of options. I think there may even be xconfig-level support for some GRU variants checked in.
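[Editor's note: for readers wondering what "activation clamping" would look like, here is a minimal sketch of a clipped ReLU (ReLU6-style upper bound). The cap value and the toy recurrence are illustrative assumptions, not an existing Kaldi component; as Dan notes, adding this cleanly to Kaldi would require coding, and it may still not help much here.]

```python
import numpy as np

def clipped_relu(x, cap=6.0):
    # ReLU with an upper bound: max(0, min(x, cap)).
    # The cap prevents the recurrent state from growing without limit.
    return np.minimum(np.maximum(x, 0.0), cap)

h = 1.0
for _ in range(100):
    # Recurrent gain > 1 would diverge with a plain ReLU;
    # here the state saturates at the cap instead.
    h = clipped_relu(1.2 * h)
print(h)  # 6.0
```

The state stops at the cap rather than blowing up, at the cost of losing gradient information once saturated, which is one reason clamping alone is often not a satisfying fix.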


Dan



gaofeng

Feb 8, 2018, 4:17:42 AM
to kaldi-help
IIRC, the reported mGRU (a relu-GRU with a single gate) needs not only normalization but also a specific normalization initialization to achieve good results on their test sets.

What I mean is that even with batch normalization added directly, it would not be surprising if the reluGRU failed to reach the reported results.

Also, my experience is that initialization is a very subtle thing; at least in my experiments, it cannot guarantee performance when the training data is very large (1000+ hours, maybe 2000-10000 hours).
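[Editor's note: a NumPy sketch of one mGRU-style step with batch normalization on the ReLU candidate's pre-activation, the kind of "direct batch-normalization" mentioned above. Shapes, weight names (`Uz`, `Uh`), and the single update gate are illustrative assumptions following the paper's description, not the authors' code or a Kaldi component.]

```python
import numpy as np

rng = np.random.default_rng(0)

def batchnorm(x, eps=1e-5):
    # Per-feature batch normalization (learned scale/shift omitted for brevity).
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

B, H = 32, 64                        # batch size, hidden dim (arbitrary)
h_prev = rng.standard_normal((B, H))
Wx = rng.standard_normal((B, H))     # stands in for W_x @ x_t (precomputed input term)
Uz = rng.standard_normal((H, H)) / np.sqrt(H)  # recurrent weights, update gate
Uh = rng.standard_normal((H, H)) / np.sqrt(H)  # recurrent weights, candidate

z = 1.0 / (1.0 + np.exp(-(Wx + h_prev @ Uz)))        # update gate (sigmoid)
h_cand = np.maximum(batchnorm(Wx + h_prev @ Uh), 0)  # ReLU on normalized pre-activation
h = z * h_prev + (1.0 - z) * h_cand                  # interpolated new state
print(h.shape)
```

Normalizing the pre-activation keeps the ReLU's input in a controlled range each step, which is why the paper pairs the ReLU candidate with normalization; the point above is that the initialization of that normalization also matters.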
