Does Caffe normalize the gradient by the batch size?

Ran Manor
Jan 17, 2015, 3:56:14 PM
to caffe...@googlegroups.com
Hi,

Does Caffe normalize the gradient by the batch size?
Can someone point me to where it happens in the code?

Thanks,
Ran

Evan Shelhamer
Jan 19, 2015, 3:57:39 PM
to Ran Manor, caffe...@googlegroups.com
The loss is divided by the batch size, which accordingly scales the gradients, as in EUCLIDEAN_LOSS https://github.com/BVLC/caffe/blob/master/src/caffe/layers/euclidean_loss_layer.cpp#L31 for example.

Evan Shelhamer

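To make the normalization concrete, here is a minimal standalone sketch (not Caffe's actual layer code; the function name and toy numbers are made up for illustration) of the arithmetic: when the loss is divided by the batch size N, every per-example gradient picks up the same 1/N factor.

#include <cstddef>
#include <iostream>
#include <vector>

// Euclidean (L2) loss normalized by batch size N:
//   loss = sum_i (pred_i - label_i)^2 / (2 * N)
// Because the loss carries a 1/N factor, the gradient w.r.t. each
// prediction, (pred_i - label_i) / N, carries the same 1/N factor.
double euclidean_loss(const std::vector<double>& pred,
                      const std::vector<double>& label,
                      std::size_t batch_size,
                      std::vector<double>* grad) {
  double sum_sq = 0.0;
  grad->resize(pred.size());
  for (std::size_t i = 0; i < pred.size(); ++i) {
    const double diff = pred[i] - label[i];
    sum_sq += diff * diff;
    (*grad)[i] = diff / static_cast<double>(batch_size);  // 1/N from the loss
  }
  return sum_sq / (2.0 * static_cast<double>(batch_size));
}

int main() {
  // Two "examples" in the batch (N = 2), one scalar prediction each.
  const std::vector<double> pred  = {1.0, 3.0};
  const std::vector<double> label = {0.0, 1.0};
  std::vector<double> grad;
  const double loss = euclidean_loss(pred, label, pred.size(), &grad);
  std::cout << "loss = " << loss                                     // (1 + 4) / (2 * 2) = 1.25
            << ", grad = {" << grad[0] << ", " << grad[1] << "}\n";  // {0.5, 1}
  return 0;
}

Doubling the batch (while drawing from the same data) roughly doubles the sum of squared differences but also doubles N, so the loss and the gradient magnitudes stay on the same scale.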

Ran Manor
Jan 19, 2015, 5:28:29 PM
to Evan Shelhamer, caffe...@googlegroups.com
Thanks!

ngc...@gmail.com
Jan 27, 2015, 10:31:53 PM
to caffe...@googlegroups.com, shel...@eecs.berkeley.edu
So the learning rate should be invariant to the batch size, right?
I.e., if I use a large batch, I do not need to reduce the learning rate proportionally; and if I get a NaN, I should just try smaller learning rates rather than tweak the batch size.
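A quick sketch of the arithmetic behind that question, assuming the loss is averaged over the batch as described above: one SGD step is

    w <- w - lr * (1/N) * sum_{i=1..N} grad_i(w)

so the gradient is a mean over the N examples and its expected magnitude does not grow with N; a larger batch mainly reduces the variance of the gradient estimate. To first order, then, the same learning rate should work across batch sizes (how far that holds in practice for very large batches is a separate question not settled in this thread).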