Finetuning: Stopping some layers from learning via "lr_mult: 0" doesn't seem to work

suk

Jul 22, 2016, 10:53:55 AM
to Caffe Users
Hey all,

I have a train_val.prototxt with two loss layers, loss_heatmap and loss_fusion. The first loss, "loss_heatmap", is computed from the output of the first part of the net (8 convolution layers); the second, "loss_fusion", from the final output (7 of the first 8 conv layers plus 5 more conv layers).

I tried to switch off learning for the first eight layers by setting lr_mult to 0 for both weights and biases there, roughly as sketched below. (In a second attempt I also set decay_mult to 0, but that changed nothing.)
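For illustration, this is roughly what one of those frozen layers looks like in my train_val.prototxt (layer/blob names and the convolution_param values here are just placeholders, not the actual ones from my net; only the two param blocks matter):

layer {
  name: "conv1_frozen"                 # placeholder name
  type: "Convolution"
  bottom: "data"                       # placeholder bottom blob
  top: "conv1_frozen"
  param { lr_mult: 0 decay_mult: 0 }   # weights: no learning, no weight decay
  param { lr_mult: 0 decay_mult: 0 }   # biases: no learning, no weight decay
  convolution_param {
    num_output: 128                    # placeholder
    kernel_size: 5                     # placeholder
    stride: 1
  }
}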
However, the "loss_heatmap" continues to fall:

Resuming from [...]_iter_12.solverstate

I0722 16:27:05.602372 22003 solver.cpp:244]     Train net output #0: loss_fusion = 0.921167 (* 3 = 2.7635 loss)
I0722 16:27:05.602403 22003 solver.cpp:244]     Train net output #1: loss_heatmap = 0.0943092 (* 1 = 0.0943092 loss)
I0722 16:27:05.602428 22003 sgd_solver.cpp:106] Iteration 12, lr = 1e-08
...
I0722 16:27:08.175077 22003 solver.cpp:244]     Train net output #0: loss_fusion = 0.816552 (* 3 = 2.44966 loss)
I0722 16:27:08.175101 22003 solver.cpp:244]     Train net output #1: loss_heatmap = 0.0428295 (* 1 = 0.0428295 loss)
...
I0722 16:27:10.764152 22003 solver.cpp:244]     Train net output #0: loss_fusion = 2.43964 (* 3 = 7.31892 loss)
I0722 16:27:10.764168 22003 solver.cpp:244]     Train net output #1: loss_heatmap = 0.0161336 (* 1 = 0.0161336 loss)
...
I0722 16:27:13.525418 22003 solver.cpp:244]     Train net output #0: loss_fusion = 4.35199 (* 3 = 13.056 loss)
I0722 16:27:13.525431 22003 solver.cpp:244]     Train net output #1: loss_heatmap = 0.00575636 (* 1 = 0.00575636 loss)
...
I0722 16:27:16.177412 22003 solver.cpp:244]     Train net output #0: loss_fusion = 15.5188 (* 3 = 46.5565 loss)
I0722 16:27:16.177426 22003 solver.cpp:244]     Train net output #1: loss_heatmap = 0.00272083 (* 1 = 0.00272083 loss)

I0722 16:27:16.177470 22003 sgd_solver.cpp:106] Iteration 16, lr = 1e-08

...

In my solver prototxt you'll find, among other settings:

base_lr: 0.00000001
lr_policy: "fixed"
gamma: 0.1
momentum: 0.95
weight_decay: 0.0005


Does anyone know what might cause the first loss to keep falling, even though those weights should be completely fixed? Or is there another parameter that can change the weights even when lr_mult = 0? Is this a bug?

Thanks!

PS: The prototxt should be essentially the same as Thomas Pfister's train_val.prototxt here:
https://github.com/tpfister/caffe-heatmap/tree/master/models/heatmap-flic-fusion

suk

Jul 22, 2016, 11:25:06 AM
to Caffe Users
Ah, I have a guess (I didn't find out how to edit my earlier post, so I'm putting it here):

Probably, restarting from a solverstate (one saved with a net in which the lr_mults were not yet set to zero) will reuse the former net model. Or rather, it will simply keep training the old net.

Now, another question: does "caffe train" completely ignore the "-solver ..." argument if "-snapshot ..." is given as well? Or which parts are reloaded from the solver/net prototxts, and which from the old solverstate?

And is there a way to change parameters (the lr_mults) when starting from a snapshot, or do I have to create a .caffemodel?
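For reference, these are the two invocations as I currently understand them (paths are placeholders). The second one loads only the weights, so an edited train_val.prototxt with lr_mult: 0 should take effect and no solver history is carried over, if I'm not mistaken:

# resume training: restores the net weights and the full solver state from the snapshot
caffe train -solver solver.prototxt -snapshot snap_iter_12.solverstate

# fine-tune instead: loads only the weights from the .caffemodel into the net
# defined by the solver's prototxt; no solver history is restored
caffe train -solver solver.prototxt -weights snap_iter_12.caffemodel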

suk

Jul 22, 2016, 12:02:49 PM
to Caffe Users
Or maybe it doesn't use the former net after all:
https://groups.google.com/forum/#!searchin/caffe-users/solverstate$20finetuning/caffe-users/mMRkU4I8mRM/TNd96RnXuooJ
Is there any way to cancel the momentum?
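If it really is the momentum history from the solverstate, a minimal sketch of what I might try next, assuming the plain SGD solver (the other values are just copied from my solver above):

base_lr: 0.00000001
lr_policy: "fixed"
momentum: 0          # with momentum 0, the old update history should no longer be applied
weight_decay: 0.0005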