Can I change activation functions in LSTM?


Donghyun Lee

Oct 16, 2015, 12:49:33 AM
to kaldi-help
Hello.

Last week, I trained an nnet3-based LSTM acoustic model on the WSJ corpus and got results.

Now I have a question: can I change the activation functions in the LSTM?

For example, the activation function of the input gate is currently implemented with a sigmoid function, but I want to change it to a tanh function.

To do this, is it enough to modify the AddLstmLayer function in the steps/nnet3/components.py file, or do I have to change other files as well?


Best regards,

Donghyun.

(P.S. Thank you for your response to my previous question, Dan.)


Xingyu Na

Oct 16, 2015, 12:55:24 AM
to kaldi...@googlegroups.com


Is it enough to modify the AddLstmLayer function in the steps/nnet3/components.py file?

Yes. That will change the activation function for you in the config file.





Daniel Povey

Oct 16, 2015, 12:59:15 AM
to kaldi-help

Is it enough to modify the AddLstmLayer function in the steps/nnet3/components.py file?

Yes, that is sufficient.
 
Or do I have to change other files?

The only thing to watch out for is that we have something in the current training script that greps for the sigmoid components; the purpose there is to detect when the sigmoids are getting over-saturated and, if so, to shrink the parameters of the model.  If you change that sigmoid to tanh, I would suggest that you replace Sigmoid with Tanh in the inline perl script, and change the 0.15 'shrink_threshold' parameter to something like 0.55.  (It represents the average derivative of the units measured on the training data, and the derivatives of tanh units have a 4 times larger range than those of the sigmoids: they go from 0 to 1 instead of from 0 to 0.25.)
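
To make that concrete, here is a rough Python sketch of the logic of that check; it is not the actual perl one-liner from the training script, and the component names and diagnostic values below are made up for illustration:

    # Illustration only: a sketch of the saturation check described above.
    # mean_deriv maps each nonlinearity component to the average derivative
    # of its units, as measured on the training data.
    mean_deriv = {"Lstm1_i": 0.40, "Lstm1_f": 0.62}  # hypothetical values

    # Sigmoid derivatives lie in (0, 0.25], so 0.15 was a sensible threshold;
    # tanh derivatives lie in (0, 1], a 4x larger range, hence roughly 0.55.
    shrink_threshold = 0.55  # was 0.15 for Sigmoid

    # If the average derivative drops below the threshold, the units are
    # considered over-saturated and the model's parameters get shrunk.
    if any(d < shrink_threshold for d in mean_deriv.values()):
        print("tanh units over-saturated: shrinking model parameters")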

You may also want to decrease the learning rates, e.g. by a factor of 4 to 10.

Dan
 




Donghyun Lee

Oct 16, 2015, 1:02:16 AM
to kaldi-help
Thank you for your response.

In the AddLstmLayer function of the steps/nnet3/components.py file, I can see the following code:

    components.append("# Defining the non-linearities")
    components.append("component name={0}_i type=SigmoidComponent dim={1}".format(name, cell_dim))
    components.append("component name={0}_f type=SigmoidComponent dim={1}".format(name, cell_dim))
    components.append("component name={0}_o type=SigmoidComponent dim={1}".format(name, cell_dim))
    components.append("component name={0}_g type=TanhComponent dim={1}".format(name, cell_dim))
    components.append("component name={0}_h type=TanhComponent dim={1}".format(name, cell_dim))


Xingyu, you mean that if I change SigmoidComponent to TanhComponent in the second line of the code above, I will get what I asked about?
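
That is, the second line would become (only the component type string changes):

    components.append("component name={0}_i type=TanhComponent dim={1}".format(name, cell_dim))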



Best regards,

Donghyun

Xingyu Na

Oct 16, 2015, 1:08:53 AM
to kaldi...@googlegroups.com
Yes. But as Dan pointed out, you need to watch out for the activation-saturation-related check in the training script.

Donghyun Lee

Oct 16, 2015, 1:11:59 AM
to kaldi-help, dpo...@gmail.com
Thank you for your response, Dan.

I have two questions:

1) Really sorry, but I don't know which perl script you are referring to.

    Is it in the wsj/s5/steps/nnet3/ directory?


2) Is it also possible to use a ReLU component instead of the Tanh components in the following code?

    components.append("component name={0}_g type=TanhComponent dim={1}".format(name, cell_dim))
    components.append("component name={0}_h type=TanhComponent dim={1}".format(name, cell_dim))


Best regards,

Donghyun.

Daniel Povey

Oct 16, 2015, 1:14:36 AM
to Donghyun Lee, kaldi-help
It's the inline perl script inside steps/nnet3/lstm/train.sh that has the word 'Sigmoid' in it; change that to Tanh.

No, you can't use ReLU there. Well, you can, but it won't work as well; the whole point of those nodes is to limit the dynamic range of the output.

Also, changing the sigmoids to tanh is not a good idea, because the whole point of the gates is that their output should be between 0 and 1.  Changing the Tanhs to sigmoids, on the other hand, is possible (but again, probably won't be helpful).
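
For reference, here are the standard LSTM equations (ignoring Kaldi's peephole connections); the sigmoid gates multiply other signals elementwise, which is why they need to stay in (0, 1):

    i_t = \sigma(W_{ix} x_t + W_{ih} h_{t-1} + b_i)
    f_t = \sigma(W_{fx} x_t + W_{fh} h_{t-1} + b_f)
    o_t = \sigma(W_{ox} x_t + W_{oh} h_{t-1} + b_o)
    g_t = \tanh(W_{gx} x_t + W_{gh} h_{t-1} + b_g)
    c_t = f_t \odot c_{t-1} + i_t \odot g_t
    h_t = o_t \odot \tanh(c_t)

Since c_t can grow without bound, the final \tanh(c_t) is what keeps h_t bounded; replacing it with a ReLU would remove that bound.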

Dan

Donghyun Lee

Oct 16, 2015, 1:20:31 AM
to kaldi-help, dino...@gmail.com, dpo...@gmail.com, asr.na...@gmail.com
Okay, I got it.

Really? I had thought I could use the ReLU function as the activation function at the input and output of the hidden nodes that make up the LSTM cell. Thank you for your comment.

And thank you, Xingyu.

Best regards,

Donghyun


