What's the role of gamma and weight_decay in solver.prototxt


Zhongyu Lou

Jan 4, 2015, 5:22:51 AM
to caffe...@googlegroups.com
Hey guys,

       I didn't find any explanation of gamma and weight_decay. I suppose one of them controls the change of the learning rate, but how does it work?

       I have the solver below; how is the learning rate going to change? Thanks.

net: "train_scripts/train_val.prototxt"
test_iter: 10000
test_interval: 1000000
base_lr: 0.0001
lr_policy: "step"
gamma: 0.1
stepsize: 50000
display: 20
max_iter: 300000
momentum: 0.9
weight_decay: 0.00005
snapshot: 10000
snapshot_prefix: "models/caffenet_train"



Daniel Orf

Jan 5, 2015, 5:01:10 PM
to caffe...@googlegroups.com
The learning rate is multiplied by gamma at each step. In your example, the learning rate would drop from 0.0001 to 0.00001 (0.0001 × 0.1) at 50,000 iterations, to 0.000001 at 100,000 iterations, and so on.
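For anyone wanting to check this against their own solver: Caffe's "step" policy computes the rate as base_lr * gamma ^ floor(iter / stepsize). A minimal Python sketch of that formula (step_lr is my own name for illustration, not a Caffe function):

import math

def step_lr(base_lr, gamma, stepsize, it):
    # Caffe "step" policy: lr = base_lr * gamma ^ floor(iter / stepsize)
    return base_lr * gamma ** (it // stepsize)

# With the solver above: 1e-4 up to iteration 49999, 1e-5 from 50000,
# 1e-6 from 100000, and so on.
for it in (0, 49999, 50000, 100000):
    print(it, step_lr(1e-4, 0.1, 50000, it))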

ada...@ucr.edu

Oct 24, 2016, 1:38:08 PM
to Caffe Users, dani...@gmail.com
And what about "weight_decay"? What does it do? Is it a regularization parameter (an L2 or L1 penalty on the weights)? If so, why is it called "decay"?

ada...@ucr.edu

Oct 24, 2016, 3:46:49 PM
to Caffe Users, dani...@gmail.com
From some old discussions (link1, link2) I got the idea that the 'weight_decay' parameter is the regularization coefficient for an L2 penalty over the weights. For example, in the cifar10 solver, the weight_decay value is 0.004. Does that mean the loss to be minimized is "cross-entropy + 0.004*sum_of_L2_Norm_of_all_weights"? Or is it, by any chance, "cross-entropy + 0.004/2*sum_of_L2_Norm_of_all_weights"?
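To make the two readings concrete, here is a rough Python sketch of a plain SGD-with-momentum step in which the decay is applied the way Caffe's SGD solver applies L2 regularization: weight_decay * w is added to the gradient. That added term is the gradient of the penalty (weight_decay / 2) * ||w||^2, which would correspond to the second reading. This is an illustration under that assumption, not Caffe code.

import numpy as np

def sgd_update(w, grad, history, base_lr, weight_decay, momentum):
    # L2 "decay": add weight_decay * w to the gradient, i.e. the
    # gradient of the penalty (weight_decay / 2) * ||w||^2.
    g = grad + weight_decay * w
    history[:] = momentum * history + base_lr * g  # momentum buffer
    w -= history

w = np.array([0.5, -0.3])
history = np.zeros_like(w)
sgd_update(w, np.array([0.1, 0.2]), history,
           base_lr=1e-4, weight_decay=5e-5, momentum=0.9)
print(w)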