Which technique is used to initialize the weights for the LSTM?


mirfan.ms...@seecs.edu.pk

Feb 17, 2018, 2:43:43 AM
to kaldi-help
I want to know which weight initialization method is used for the LSTM. Is it random initialization or something else?

Daniel Povey

Feb 17, 2018, 4:01:28 PM
to kaldi-help
We pretty much just use the standard glorot initialization (stddev of each weight matrix's parameters = 1/sqrt(input dimension)). The output layer of the network has zero initialization, though.
I don't recall the rules for the bias initialization and peephole parameter initialization; you'd have to check the xconfig scripts (xconfig/lstm.py) or the generated config files.
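
For reference, the rule quoted above (stddev = 1/sqrt(input dimension)) can be compared numerically with the variance formula from the Glorot and Bengio paper, 2/(fan_in + fan_out). This is only a minimal sketch: the function names are hypothetical and are not part of Kaldi. The two values coincide when the input and output dimensions are equal.

```python
import math


def fan_in_stddev(input_dim):
    """Stddev as stated in this thread: 1 / sqrt(input dimension)."""
    return 1.0 / math.sqrt(input_dim)


def glorot_stddev(input_dim, output_dim):
    """Stddev from the Glorot/Bengio variance 2 / (fan_in + fan_out)."""
    return math.sqrt(2.0 / (input_dim + output_dim))


# Square matrix: both rules give the same stddev.
print(fan_in_stddev(256), glorot_stddev(256, 256))
# Non-square matrix: the two rules differ.
print(fan_in_stddev(256), glorot_stddev(256, 1024))
```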

Dan


On Sat, Feb 17, 2018 at 3:20 PM, Zoltán Somogyi <zsomo...@gmail.com> wrote:
Read this PDF for an example (see section LSTM Initialization): https://arxiv.org/pdf/1707.00722.pdf 



--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/65791622-bf56-4f85-81a3-a7d5b30cfad9%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

mirfan.ms...@seecs.edu.pk

Feb 19, 2018, 3:50:04 AM
to kaldi-help
Well, I looked there, and this is what it says in lstm.py:


#   ng-affine-options=''                [Additional options used for the full matrices in the LSTM, can be used to do things like set biases to initialize to 1]

#   ng-per-element-scale-options=''     [Additional options used for the diagonal matrices in the LSTM ]



and the default settings for the LSTM are:

def set_default_configs(self):
    self.config = {'input': '[-1]',
                   'cell-dim': -1,  # this is a compulsory argument
                   'clipping-threshold': 30.0,
                   'delay': -1,
                   'ng-per-element-scale-options': ' max-change=0.75',
                   'ng-affine-options': ' max-change=0.75 ',
                   'self-repair-scale-nonlinearity': 0.00001,
                   'zeroing-interval': 20,
                   'zeroing-threshold': 15.0,
                   'decay-time': -1.0
                   }


I'm confused here. I think a normal distribution is being used for the weights and 1 for the bias vectors. Can you tell me which method is used?




Daniel Povey

Feb 19, 2018, 3:53:56 PM
to kaldi-help
That's just a comment saying that you could potentially do that if you wanted to.
What matters is the defaults in NaturalGradientAffineComponent::InitFromConfig().
The default bias initialization has mean 0 and a standard deviation of 1; the default affine-parameter initialization has mean 0 and a standard deviation of 1/sqrt(input_dim).

It's not clear that initializing the bias with nonzero values is the right thing to do or is even the standard method, but that's the way we do it; IIRC, when I tried to change the bias initializations to 0, either the results were the same or a little worse, so I left it as-is.
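
The defaults described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not Kaldi's code: `init_affine` and `sample_stddev` are hypothetical helpers, and the `zero_init` branch assumes that the zero-initialized output layer also has a zero bias, which the thread does not state.

```python
import math
import random


def init_affine(input_dim, output_dim, rng, zero_init=False):
    """Return (weights, bias) for one affine layer.

    weights has output_dim rows of input_dim entries each.
    """
    if zero_init:
        # Assumption: "zero initialization" covers both weights and bias.
        weights = [[0.0] * input_dim for _ in range(output_dim)]
        bias = [0.0] * output_dim
        return weights, bias
    # Weights: mean 0, stddev 1/sqrt(input_dim), as stated in the thread.
    param_stddev = 1.0 / math.sqrt(input_dim)
    weights = [[rng.gauss(0.0, param_stddev) for _ in range(input_dim)]
               for _ in range(output_dim)]
    # Bias: mean 0, stddev 1, as stated in the thread.
    bias = [rng.gauss(0.0, 1.0) for _ in range(output_dim)]
    return weights, bias


def sample_stddev(values):
    """Population standard deviation of a list of floats."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
weights, bias = init_affine(256, 128, rng)
flat = [w for row in weights for w in row]
# The sample stddev of the weights should be near 1/sqrt(256).
print(sample_stddev(flat), 1.0 / math.sqrt(256))
```

Passing `zero_init=True` gives the all-zero parameters mentioned earlier for the output layer.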


Dan



mirfan.ms...@seecs.edu.pk

May 2, 2018, 8:06:47 AM
to kaldi-help
One more thing, @dan: how are the weights updated during training? Could I get a mathematical description of the weight update? I've tried to understand it from nnet-simple-component but failed to get the bigger picture.