A little help understanding AdaGrad in learning_rule.py


Muhammad Ali

Jul 7, 2015, 2:58:11 PM
to pylea...@googlegroups.com
I am reading through the source code to understand AdaGrad and write a custom implementation of it. I'd be really thankful if anyone could help me resolve a small confusion. Line 364 in learning_rule.py says:

sum_square_grad = sharedX(param.get_value() * 0.)

I do not understand what this line is doing. Why multiply by zero? Which part of the algorithm does this correspond to?
I'm sorry if I sound naive; I'm just hoping to learn.

Matthew Koichi Grimes

Jul 7, 2015, 3:01:38 PM
to pylea...@googlegroups.com
It's creating a shared variable of dtype floatX (usually floatX=float32) with the same array shape as param, and filling it with zeros.
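To make that concrete, here is a minimal sketch. The sharedX stand-in below is my assumption about what pylearn2's helper does (essentially theano.shared plus a cast to floatX), not the exact implementation:

import numpy as np
import theano

def sharedX(value, name=None):
    # Stand-in for pylearn2's sharedX (assumed): a theano.shared
    # variable cast to floatX.
    return theano.shared(
        np.asarray(value, dtype=theano.config.floatX), name=name)

# Suppose `param` is a shared weight matrix:
param = sharedX(np.random.randn(3, 4), name='W')

# param.get_value() * 0. is a zero array with the same shape and
# dtype as the parameter, so this is simply a zero-initialized
# accumulator that matches `param` exactly.
sum_square_grad = sharedX(param.get_value() * 0.)

print(sum_square_grad.get_value().shape)  # (3, 4), all zeros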


Muhammad Ali

Jul 7, 2015, 3:10:01 PM
to pylea...@googlegroups.com, mk...@cam.ac.uk
Hi Matthew! Thank you for responding.
Moving on to line 370:

new_sum_squared_grad = ( sum_square_grad + T.sqr(grads[param]) )

The loop then adds the squares of all the parameters to new_sum_squared_grad. Isn't AdaGrad supposed to keep a per-parameter history, rather than squaring and summing all the parameters together? Where is my understanding wrong?

Matthew Koichi Grimes

Jul 7, 2015, 3:36:23 PM
to Muhammad Ali, pylea...@googlegroups.com
Sorry, I'm not familiar with the AdaGrad algorithm, so I can't help you there.

Pascal Lamblin

Jul 7, 2015, 4:48:31 PM
to pylea...@googlegroups.com
On Tue, Jul 07, 2015, Muhammad Ali wrote:
> Hi Matthew! Thank you for responding.
> Moving on to line 370:
>
> new_sum_squared_grad = ( sum_square_grad + T.sqr(grads[param]) )
>
> The loop then adds the squares of all the parameters to
> new_sum_squared_grad. Isn't AdaGrad supposed to keep a per-parameter
> history, rather than squaring and summing all the parameters together?
> Where is my understanding wrong?

That is right, and that is exactly what this code does.
That line sits in the middle of a loop over all parameters, so it only
deals with one parameter (`param`).

There is a separate `sum_square_grad`, as well as a separate
`new_sum_squared_grad`, for each param.
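
To illustrate, here is a rough sketch of the shape of such a loop. The names and hyperparameters (learning_rate, eps) are illustrative, not the exact pylearn2 code; the eps term is my own addition for numerical stability:

import numpy as np
import theano
import theano.tensor as T

def sharedX(value):
    # floatX-typed shared variable, as in pylearn2's utils (assumed).
    return theano.shared(np.asarray(value, dtype=theano.config.floatX))

def adagrad_updates(params, grads, learning_rate=0.01, eps=1e-8):
    updates = []
    for param in params:
        # Each parameter gets its OWN accumulator, created inside
        # the loop -- this is the per-parameter history.
        sum_square_grad = sharedX(param.get_value() * 0.)

        # Accumulate the element-wise square of this parameter's
        # gradient (not of the parameter itself).
        new_sum_squared_grad = sum_square_grad + T.sqr(grads[param])

        # Scale each element's step by the root of its own
        # accumulated squared-gradient history.
        delta = learning_rate * grads[param] \
            / T.sqrt(new_sum_squared_grad + eps)

        updates.append((sum_square_grad, new_sum_squared_grad))
        updates.append((param, param - delta))
    return updates

Each pass through the loop creates a fresh sum_square_grad, so the histories never mix across parameters.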



--
Pascal