Which technique is used to initialize the weights of an LSTM?

mirfan.ms...@seecs.edu.pk

Feb 17, 2018, 2:43:43 AM

to kaldi-help

I want to know which weight initialization method is used. Is it random initialization or something else?

Daniel Povey

Feb 17, 2018, 4:01:28 PM

to kaldi-help

We pretty much just use the standard Glorot initialization (the standard deviation of each weight matrix's parameters is 1/sqrt(input dimension)). The output layer of the network is zero-initialized, though.
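To make the formula in this reply concrete: for a weight matrix W with d_in inputs and d_out outputs, the scheme described gives every entry a standard deviation of 1/sqrt(d_in). The block below writes that out, assuming zero-mean Gaussian draws (the reply states only the standard deviation), next to the Glorot/Xavier normal initialization as it is usually defined, which uses both fan-in and fan-out.

```latex
% Scheme as described in the reply (W is d_out x d_in; Gaussian is assumed):
W_{ij} \sim \mathcal{N}\!\left(0,\ \frac{1}{d_{\text{in}}}\right),
\qquad \operatorname{std}(W_{ij}) = \frac{1}{\sqrt{d_{\text{in}}}}

% Glorot/Xavier normal initialization as usually defined:
W_{ij} \sim \mathcal{N}\!\left(0,\ \frac{2}{d_{\text{in}} + d_{\text{out}}}\right)

% For d_in = d_out = d, the second variance is 2/(2d) = 1/d, matching the first.
```

The two agree only for square matrices; a variance that depends on the input dimension alone is the form often called LeCun initialization.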

I don't recall the rules for bias and peephole-parameter initialization; you'd have to check the xconfig scripts (xconfig/lstm.py) or the generated config files.

Dan

On Sat, Feb 17, 2018 at 3:20 PM, Zoltán Somogyi <zsomo...@gmail.com> wrote:

Read this PDF for an example (see section LSTM Initialization): https://arxiv.org/pdf/1707.00722.pdf

--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/65791622-bf56-4f85-81a3-a7d5b30cfad9%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

mirfan.ms...@seecs.edu.pk

Feb 19, 2018, 3:50:04 AM

to kaldi-help

Well, I checked there, and this is what it says in lstm.py:

# ng-affine-options='' [Additional options used for the full matrices in the LSTM, can be used to do things like set biases to initialize to 1]

# ng-per-element-scale-options='' [Additional options used for the diagonal matrices in the LSTM ]

and the default settings for the LSTM layer are:

    def set_default_configs(self):
        self.config = {'input': '[-1]',
                       'cell-dim': -1,  # this is a compulsory argument
                       'clipping-threshold': 30.0,
                       'delay': -1,
                       'ng-per-element-scale-options': ' max-change=0.75',
                       'ng-affine-options': ' max-change=0.75 ',
                       'self-repair-scale-nonlinearity': 0.00001,
                       'zeroing-interval': 20,
                       'zeroing-threshold': 15.0,
                       'decay-time': -1.0
                       }

I'm confused here. I think a normal distribution is used for the weights and 1 for the bias vectors. Can you tell me which method is used?

Daniel Povey

Feb 19, 2018, 3:53:56 PM

to kaldi-help

That's just a comment saying that you could potentially do that if you wanted to.

What matters are the defaults in NaturalGradientAffineComponent::InitFromConfig().

The default bias initialization has mean 0 and standard deviation 1; the default affine-parameter initialization has mean 0 and standard deviation 1/sqrt(input_dim).
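A minimal Python sketch of the defaults described here, for a single affine component: weights with mean 0 and standard deviation 1/sqrt(input_dim), biases with mean 0 and standard deviation 1. Assumptions: the draws are Gaussian (the reply gives only mean and standard deviation), and init_affine, param_stddev, bias_stddev and sample_stddev are hypothetical names for illustration. This is not a transcription of NaturalGradientAffineComponent::InitFromConfig().

```python
import math
import random


def init_affine(input_dim, output_dim, param_stddev=None, bias_stddev=1.0, seed=0):
    """Hypothetical sketch of the default affine initialization described above.

    Weights ~ N(0, param_stddev^2), with param_stddev defaulting to
    1/sqrt(input_dim); biases ~ N(0, bias_stddev^2). Gaussian draws are an
    assumption; the thread only states mean and standard deviation.
    """
    if param_stddev is None:
        param_stddev = 1.0 / math.sqrt(input_dim)
    rng = random.Random(seed)
    # weights[i][j] connects input j to output i (output_dim x input_dim).
    weights = [[rng.gauss(0.0, param_stddev) for _ in range(input_dim)]
               for _ in range(output_dim)]
    bias = [rng.gauss(0.0, bias_stddev) for _ in range(output_dim)]
    return weights, bias


def sample_stddev(values):
    """Population standard deviation of a flat list of floats."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


if __name__ == "__main__":
    w, b = init_affine(input_dim=400, output_dim=300)
    flat = [x for row in w for x in row]
    print(sample_stddev(flat))  # should be close to 1/sqrt(400) = 0.05
    print(sample_stddev(b))     # should be near 1.0 (only 300 samples)
```

Passing param_stddev=0.0 and bias_stddev=0.0 gives an all-zero layer, which is one way to mimic the zero-initialized output layer mentioned in the first reply (whether the output bias is also zeroed there is an assumption).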

It's not clear that initializing the bias with nonzero values is the right thing to do or is even the standard method, but that's the way we do it; IIRC, when I tried to change the bias initializations to 0, either the results were the same or a little worse, so I left it as-is.

Dan

mirfan.ms...@seecs.edu.pk

May 2, 2018, 8:06:47 AM

to kaldi-help

One more thing, @dan: how are the weights updated during training? Could I get a mathematical description of the weight update? I've tried to understand it from nnet-simple-component, but I failed to get the bigger picture.
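The thread contains no answer to this question. Purely as a point of reference, here is a minimal sketch of the simplest update one could write for a single component: plain minibatch SGD, W <- W - lr * dL/dW, with the step capped in L2 norm at max_change (the option that appears as max-change=0.75 in the lstm.py defaults quoted earlier). My understanding is that max-change limits the size of a component's per-minibatch parameter change in roughly this way, but treat that as an assumption. The natural-gradient preconditioning implied by the name NaturalGradientAffineComponent, and any momentum, are deliberately left out, and the function name and flat-list representation are hypothetical.

```python
import math


def sgd_step_with_max_change(params, grads, learning_rate, max_change):
    """Return updated params after one plain-SGD step with a norm cap.

    Hypothetical, simplified sketch (not Kaldi's actual code):
      delta = -learning_rate * grads
      if ||delta||_2 > max_change: delta *= max_change / ||delta||_2
    A non-positive max_change disables the cap in this sketch.
    Natural-gradient preconditioning and momentum are omitted.
    """
    delta = [-learning_rate * g for g in grads]
    norm = math.sqrt(sum(d * d for d in delta))
    if max_change > 0.0 and norm > max_change:
        # Shrink the whole step uniformly so its L2 norm equals max_change.
        scale = max_change / norm
        delta = [d * scale for d in delta]
    return [p + d for p, d in zip(params, delta)]


if __name__ == "__main__":
    # Large gradient: the raw step has norm 5.0, so it is shrunk to 0.75.
    print(sgd_step_with_max_change([1.0, 1.0], [3.0, 4.0], 1.0, 0.75))
    # Small gradient: the raw step has norm 0.1, so it is applied unchanged.
    print(sgd_step_with_max_change([1.0, 1.0], [0.1, 0.0], 1.0, 0.75))
```

Scaling the whole step, rather than clipping individual entries, keeps the update direction unchanged and only limits how far a single minibatch can move the parameters.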
