On Tue, Jul 07, 2015, Muhammad Ali wrote:
> Hi Matthew! Thank you for responding.
> Going on at line 370:
>
> new_sum_squared_grad = ( sum_square_grad + T.sqr(grads[param]) )
>
> The loop then adds the squares of all the parameters to
> new_sum_squared_grad. Isn't AdaGrad supposed to retain a per parameter
> history instead of squaring and adding all the parameters? Where is my
> understanding wrong?
That's right, and that is exactly what this code does.
That line sits inside a loop over all parameters, so it only deals with
one parameter (`param`) at a time.
There is a separate `sum_square_grad`, and a separate
`new_sum_squared_grad`, for each param.
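To make the per-parameter bookkeeping concrete, here is a minimal sketch in plain NumPy (standing in for Theano shared variables; the names `params`, `grads`, and `sum_square_grads` mirror the pylearn2 code but the function itself is illustrative, not the library's API):

```python
import numpy as np

def adagrad_step(params, grads, sum_square_grads, lr=0.01, eps=1e-7):
    """One AdaGrad update. Each parameter keeps its own accumulator,
    with the same shape as the parameter itself."""
    for name in params:
        # per-parameter history of squared gradients
        sum_square_grads[name] += np.square(grads[name])
        # element-wise learning-rate scaling from that history
        scale = lr / (np.sqrt(sum_square_grads[name]) + eps)
        params[name] -= scale * grads[name]

params = {"W": np.ones((2, 2)), "b": np.zeros(2)}
grads = {"W": np.full((2, 2), 0.5), "b": np.array([1.0, -1.0])}
# one zero-filled accumulator per parameter, analogous to
# sharedX(param.get_value() * 0.) in the loop
sum_square_grads = {k: np.zeros_like(v) for k, v in params.items()}

adagrad_step(params, grads, sum_square_grads)
```

After one step, `sum_square_grads["W"]` holds 0.25 in every entry (the square of that parameter's own gradient), independent of `"b"`'s accumulator.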
>
> On Wednesday, July 8, 2015 at 12:01:38 AM UTC+5, Matthew Koichi Grimes
> wrote:
> >
> > It's creating a shared variable of dtype floatX (usually floatX=float32)
> > with the same array shape as param, and filling it with zeros.
> >
> > On Tue, Jul 7, 2015 at 7:58 PM Muhammad Ali <mohumm...@gmail.com> wrote:
> >
> >> I am reading through the source code in an effort to understand and
> >> implement a custom implementation of AdaGrad. I'd be really thankful if
> >> anyone could help me resolve a small confusion. Line 364 in
> >> learning_rule.py says:
> >>
> >> sum_square_grad = sharedX(param.get_value() * 0.)
> >>
> >> I do not understand what this line is trying to do. Why multiply with the
> >> zero? What part of the algorithm can I see here?
> >> I'm sorry if I sound naive. I hope to learn more and more.
> >>
--
Pascal