Does Caffe normalize the gradient by the batch size?

Ran Manor
Jan 17, 2015, 3:56:14 PM
to caffe...@googlegroups.com
Hi,

Does Caffe normalize the gradient by the batch size?
Can someone point me to where it happens in the code?

Thanks,
Ran

Evan Shelhamer
Jan 19, 2015, 3:57:39 PM
to Ran Manor, caffe...@googlegroups.com
The loss is divided by the batch size, which accordingly scales the gradients, as in EUCLIDEAN_LOSS https://github.com/BVLC/caffe/blob/master/src/caffe/layers/euclidean_loss_layer.cpp#L31 for example.

Evan Shelhamer

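To make the normalization concrete, here is a minimal standalone sketch (not Caffe's actual layer code; the function name and toy numbers are made up for illustration) of the arithmetic: when the loss is divided by the batch size N, every per-example gradient picks up the same 1/N factor.

#include <cstddef>
#include <iostream>
#include <vector>

// Euclidean (L2) loss normalized by batch size N:
//   loss = sum_i (pred_i - label_i)^2 / (2 * N)
// Because the loss carries a 1/N factor, the gradient w.r.t. each
// prediction, (pred_i - label_i) / N, carries the same 1/N factor.
double euclidean_loss(const std::vector<double>& pred,
                      const std::vector<double>& label,
                      std::size_t batch_size,
                      std::vector<double>* grad) {
  double sum_sq = 0.0;
  grad->resize(pred.size());
  for (std::size_t i = 0; i < pred.size(); ++i) {
    const double diff = pred[i] - label[i];
    sum_sq += diff * diff;
    (*grad)[i] = diff / static_cast<double>(batch_size);  // 1/N from the loss
  }
  return sum_sq / (2.0 * static_cast<double>(batch_size));
}

int main() {
  // Two "examples" in the batch (N = 2), one scalar prediction each.
  const std::vector<double> pred  = {1.0, 3.0};
  const std::vector<double> label = {0.0, 1.0};
  std::vector<double> grad;
  const double loss = euclidean_loss(pred, label, pred.size(), &grad);
  std::cout << "loss = " << loss                                     // (1 + 4) / (2 * 2) = 1.25
            << ", grad = {" << grad[0] << ", " << grad[1] << "}\n";  // {0.5, 1}
  return 0;
}

Doubling the batch (while drawing from the same data) roughly doubles the sum of squared differences but also doubles N, so the loss and the gradient magnitudes stay on the same scale.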

Ran Manor
Jan 19, 2015, 5:28:29 PM
to Evan Shelhamer, caffe...@googlegroups.com
Thanks!

ngc...@gmail.com
Jan 27, 2015, 10:31:53 PM
to caffe...@googlegroups.com, shel...@eecs.berkeley.edu
So the learning rate should be invariant to the batch size, right?
I.e., if I use a large batch, I do not need to reduce the learning rate proportionally; and if I get a NaN, I should just try smaller learning rates rather than tweak the batch size.
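A quick sketch of the arithmetic behind that question, assuming the loss is averaged over the batch as described above: one SGD step is

    w <- w - lr * (1/N) * sum_{i=1..N} grad_i(w)

so the gradient is a mean over the N examples and its expected magnitude does not grow with N; a larger batch mainly reduces the variance of the gradient estimate. To first order, then, the same learning rate should work across batch sizes (how far that holds in practice for very large batches is a separate question not settled in this thread).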