Example: Fine-tuning on the Flickr dataset


Ankit Arya

Oct 5, 2014, 2:04:27 AM
to caffe...@googlegroups.com

"We will also decrease the overall learning rate base_lr in the solver prototxt, but boost the blobs_lron the newly introduced layer. The idea is to have the rest of the model change very slowly with new data, but let the new layer learn fast. Additionally, we set stepsize in the solver to a lower value than if we were training from scratch, since we’re virtually far along in training and therefore want the learning rate to go down faster. Note that we could also entirely prevent fine-tuning of all layers other than fc8_flickr by setting their blobs_lr to 0."

What I don't understand is why setting blobs_lr to 0 would prevent fine-tuning. From what I understand, blobs_lr is responsible for training the last layer, and base_lr for fine-tuning the network. Can someone elaborate on this? I am confused.

Thanks in advance

Yangqing Jia

Oct 5, 2014, 2:08:18 AM
to Ankit Arya, caffe...@googlegroups.com
base_lr is the base scale of the learning rate applied to the network; blobs_lr is a per-blob multiplier applied on top of it (blobs_lr defaults to 1). The final learning rate for a parameter blob is base_lr * blobs_lr.
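As a concrete sketch (numbers purely illustrative): with base_lr: 0.001 in the solver and the new layer declared in the old-style prototxt as

layers {
  name: "fc8_flickr"
  type: INNER_PRODUCT
  bottom: "fc7"
  top: "fc8_flickr"
  blobs_lr: 10   # multiplier for the weight blob -> effective lr = 0.001 * 10 = 0.01
  blobs_lr: 20   # multiplier for the bias blob   -> effective lr = 0.001 * 20 = 0.02
  inner_product_param {
    num_output: 20   # number of new target classes
  }
}

the weights of fc8_flickr train at 0.01 and the bias at 0.02, while any layer left at the default blobs_lr: 1 keeps the plain 0.001.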

The reason for splitting the two is that, when we do SGD and want to gradually reduce the overall learning rate, we can simply decrease base_lr. blobs_lr lets us change the *relative* scale between different parameters - for example, we can set the learning rate of the bias terms to be twice that of the weights.
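Concretely, a layer whose bias should learn twice as fast as its weights lists two multipliers, one per parameter blob (a sketch of the usual convention):

blobs_lr: 1   # weight blob: effective lr = base_lr * 1
blobs_lr: 2   # bias blob:   effective lr = base_lr * 2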

Yangqing


Evan Shelhamer

Oct 5, 2014, 3:22:53 AM
to Yangqing Jia, Ankit Arya, caffe...@googlegroups.com
In the case of blobs_lr = 0, the point is to fix parts of the model and learn only a subset of the parameters. For instance, you could quickly learn a new classifier on top of the fc7 features by setting the blobs_lr of every layer except the new fc8-style layer to 0.

Caffe configures the loss and learning rates when setting up the backward pass, and will only compute the gradients for the last layer if the rest of the parameters are fixed.
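A rough sketch of what that looks like in the train_val prototxt (old-style layer syntax, values illustrative): the pretrained layers get zero multipliers and only the new classifier gets nonzero ones:

layers {
  name: "fc7"
  type: INNER_PRODUCT
  bottom: "fc6"
  top: "fc7"
  blobs_lr: 0   # weights fixed: learning rate multiplier 0, no updates
  blobs_lr: 0   # bias fixed
  inner_product_param { num_output: 4096 }
}
layers {
  name: "fc8_flickr"
  type: INNER_PRODUCT
  bottom: "fc7"
  top: "fc8_flickr"
  blobs_lr: 10  # only this layer's parameters are actually learned
  blobs_lr: 20
  inner_product_param { num_output: 20 }
}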

Evan Shelhamer

Ankit Arya

Oct 5, 2014, 12:47:36 PM
to caffe...@googlegroups.com, jia...@gmail.com, ankit....@gmail.com, shel...@eecs.berkeley.edu
Thanks Yangqing and Evan, that was helpful!
I get the point now. Sorry if it was a naive question; I'm just getting started with deep learning.