Variance in optimum learning rate value to fine-tune FCN in different frameworks


Raúl Gombru

Jul 11, 2016, 5:52:36 AM7/11/16
to Caffe Users
Hello,

I have been fine-tuning FCN models on my own data with both Caffe and MatConvNet. The initial weights are adapted from a CNN trained on PASCAL VOC. I have observed a huge variance in the optimum learning rate between the two frameworks. In Caffe the optimum learning rate is around 1e-12; moreover, if a learning rate higher than 1e-10 is used, the net doesn't learn at all. In MatConvNet, on the other hand, the optimum learning rate is around 1e-4.

Any ideas about this difference? Isn't 1e-12 too tiny a learning rate?

The FCN models I'm using are:

- Caffe: https://github.com/shelhamer/fcn.berkeleyvision.org
- MatConvNet: https://github.com/vlfeat/matconvnet-fcn

Aside from my experiments, I have observed the same kind of variance in the learning rates used in several text-localization papers. The authors do not specify the framework used for training, but the differences match the ones I have observed:

- Accurate Text Localization in Natural Image with Cascaded Convolutional Text Network - Tong He (1e-10)
- Synthetic Data for Text Localisation in Natural Images - Ankush Gupta (1e-4)

Thank you

Evan Shelhamer

Jul 25, 2016, 5:33:05 PM7/25/16
to Raúl Gombru, Caffe Users
The MatConvNet edition of FCNs is actually different from the FCN paper and Caffe edition of the code. Most importantly, they resize or crop all of their inputs to 512x512 instead of keeping the original dimensions of each input and reshaping the net, as is done in the FCN paper. There may be other differences; I have not done a comprehensive comparison.

The difference in learning rate is due to the choice of normalized/unnormalized loss. If all inputs are the same dimensions the choice is irrelevant, provided an appropriate learning rate is chosen, but for inputs of differing dimensions it is important to pick an unnormalized loss. The unnormalized loss weights every pixel the same no matter the image dimensions, but the normalized loss re-weights a pixel according to the size of the image that contains it.
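To make the distinction concrete, here is a minimal NumPy sketch of the two loss variants for dense prediction (my own illustration, not code from either framework):

```python
import numpy as np

def pixel_losses(logits, labels):
    """Per-pixel softmax cross-entropy. logits: (H, W, C), labels: (H, W) ints."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    h, w = labels.shape
    return -log_probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]

def unnormalized_loss(logits, labels):
    # Sum over pixels: each pixel contributes the same no matter the image size.
    return pixel_losses(logits, labels).sum()

def normalized_loss(logits, labels):
    # Mean over pixels: a pixel's weight shrinks as the image grows.
    return pixel_losses(logits, labels).mean()

rng = np.random.default_rng(0)
for h, w in [(8, 8), (64, 64)]:
    logits = rng.normal(size=(h, w, 3))
    labels = rng.integers(0, 3, size=(h, w))
    print(h * w, unnormalized_loss(logits, labels), normalized_loss(logits, labels))
```

With the summed loss, the loss (and hence the gradient magnitude) grows roughly in proportion to image area, so a much smaller learning rate is needed; with the mean loss, each pixel's individual contribution is diluted on larger images, which is exactly the re-weighting described above.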

To follow the FCN paper's method, you should use the fcn.berkeleyvision.org implementation, which covers the journal version of our work that gives improved results.

Evan Shelhamer






Raúl Gombru

Jul 28, 2016, 8:13:14 AM7/28/16
to Caffe Users, raulg...@gmail.com
Thank you for your answer, Evan! It has been really helpful.

So I understand that the unnormalized loss yields higher loss values, since it is not normalized per pixel. That means we have to use lower learning rates in order to change the net parameters by the same amount as in the normalized case.

The numbers also make sense to me: a 1 Mp image has 1,000,000 pixels, so the difference in the learning rates' order of magnitude (1e-10 unnormalized vs. 1e-4 normalized) agrees with the order of magnitude of the loss normalization.
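That back-of-the-envelope argument can be written out explicitly (the pixel count and learning rates are the approximate values quoted in this thread, not measurements):

```python
# If the unnormalized (summed) loss gradient is ~N_pixels times larger than
# the normalized (mean) one, the learning rate must be ~N_pixels times
# smaller to take a comparably sized parameter step.
n_pixels = 1_000_000            # ~1 Mp image
lr_normalized = 1e-4            # MatConvNet-side value from the thread
lr_unnormalized = lr_normalized / n_pixels
print(lr_unnormalized)          # on the order of 1e-10, matching the Caffe side
```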