The learning rate of deconvolutional layer

Libra Allen

Aug 6, 2017, 12:17:59 PM

to Caffe Users

When using the official FCN demo (https://github.com/shelhamer/fcn.berkeleyvision.org), I found that the learning rate of the deconvolutional layers is 0. I can't understand why. Could anyone explain this?

Przemek D

Aug 22, 2017, 6:44:44 AM

to Caffe Users

From the readme on this very site:

In our original experiments the interpolation layers were initialized to bilinear kernels and then learned. In follow-up experiments, and this reference implementation, the bilinear kernels are fixed. There is no significant difference in accuracy in our experiments, and fixing these parameters gives a slight speed-up.
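To make "bilinear kernels" concrete, here is a minimal sketch of how such a fixed interpolation kernel can be built. It is plain Python with no Caffe dependency; the function name `bilinear_kernel` is hypothetical, and the code uses the usual triangular-profile construction rather than being copied from the reference implementation.

```python
def bilinear_kernel(size):
    """Return a size x size bilinear interpolation kernel as nested lists."""
    factor = (size + 1) // 2
    # Centre of the kernel: on a pixel for odd sizes, between pixels for even sizes.
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    # 1-D triangular profile; the 2-D kernel is its outer product with itself.
    profile = [1 - abs(i - center) / factor for i in range(size)]
    return [[a * b for b in profile] for a in profile]


# Assumption for illustration: kernel_size 4, the size typically paired with stride 2.
kernel = bilinear_kernel(4)
for row in kernel:
    print(row)
```

Because these weights are just interpolation coefficients, leaving them fixed costs nothing in expressiveness for plain upsampling, which is consistent with the readme's remark about accuracy.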

Alex Ter-Sarkisov

Aug 23, 2017, 7:13:08 AM

to Caffe Users

Hi Przemek,

Do you have any idea what I'm doing wrong if I get NaN everywhere almost immediately when I initialize the weights with 'bilinear'? If I initialize with 'gaussian', it does learn something.

Przemek D

Aug 23, 2017, 8:56:04 AM

to Caffe Users

Deconvolutional upsampling layers should be initialized with a bilinear filter and left fixed (lr_mult: 0). Gaussian initialization does not make much sense for those layers, even though it might seem as if the network learns something. The NaNs might be due to the learning rate being set too high: I had to go down 3-4 orders of magnitude before my FCNs started converging.
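The learning-rate point can be illustrated on a toy problem. This is only a sketch: it runs plain gradient descent on f(w) = w**2, not on an FCN, and the function name and the two learning rates are assumptions chosen for illustration.

```python
def gradient_descent(lr, steps=200, w0=1.0):
    """Minimise f(w) = w**2 (gradient 2*w) with a fixed learning rate."""
    w = w0
    for _ in range(steps):
        w = w - lr * 2.0 * w  # each step multiplies w by (1 - 2*lr)
    return w


small = gradient_descent(lr=0.1)  # |1 - 2*lr| < 1, so w shrinks toward 0
large = gradient_descent(lr=1.5)  # |1 - 2*lr| > 1, so |w| grows every step

print(abs(small), abs(large))
```

In a real network the same runaway growth eventually overflows to inf and then NaN, which is why dropping the learning rate by a few orders of magnitude is a reasonable first thing to try.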

Alex Ter-Sarkisov

Aug 24, 2017, 9:04:41 AM

to Caffe Users

Well, the test error even before training is 0.6931 (ln 2), which is a local minimum of softmax cross-entropy (p1 = p2 = 1/2), and networks usually can't get out of it. Should I do something else, like initializing the Deconv layer in some specific way, perhaps?
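As a quick check of where that number comes from, here is a small sketch of two-class softmax cross-entropy with equal scores. The helper `softmax_cross_entropy` is a hypothetical stand-alone function, not Caffe's SoftmaxWithLoss layer.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against an integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


# Equal scores for both classes give p1 = p2 = 1/2, so the loss is ln 2.
loss = softmax_cross_entropy([0.0, 0.0], label=0)
print(round(loss, 4))  # 0.6931
```

Any output where both class scores are identical at every pixel, including all zeros, produces exactly this loss value.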

Alex Ter-Sarkisov

Aug 24, 2017, 9:26:40 AM

to Caffe Users

[after a bit of training] Well, it just gets stuck at all-zero output; nothing changes.
