What's the role of gamma and weight_decay in solver.prototxt

Zhongyu Lou

Jan 4, 2015, 5:22:51 AM

to caffe...@googlegroups.com

Hey guys,

I couldn't find any explanation of gamma and weight_decay. I suppose one of them controls how the learning rate changes, but how does it work?

I have the solver below. How is the learning rate going to change? Thanks.

net: "train_scripts/train_val.prototxt"
test_iter: 10000
test_interval: 1000000
base_lr: 0.0001
lr_policy: "step"
gamma: 0.1
stepsize: 50000
display: 20
max_iter: 300000
momentum: 0.9
weight_decay: 0.00005
snapshot: 10000
snapshot_prefix: "models/caffenet_train"

Daniel Orf

Jan 5, 2015, 5:01:10 PM

to caffe...@googlegroups.com

With lr_policy: "step", the learning rate is multiplied by gamma every stepsize iterations. In your example, the learning rate goes from 0.0001 to 0.00001 (0.0001 × 0.1) at iteration 50000, to 0.000001 at iteration 100000, and so on.
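To make the schedule concrete, here is a minimal Python sketch of the "step" policy as described above, i.e. lr = base_lr * gamma ^ floor(iter / stepsize). The function name step_lr is made up for illustration and is not part of Caffe; the values are taken from the solver in the question.

```python
def step_lr(base_lr, gamma, stepsize, iteration):
    """Learning rate at a given iteration under lr_policy: "step".

    Hypothetical helper for illustration only, not a Caffe function.
    """
    # Integer division counts how many full steps have been completed.
    return base_lr * gamma ** (iteration // stepsize)


if __name__ == "__main__":
    # Values from the solver above: base_lr 0.0001, gamma 0.1, stepsize 50000.
    for it in (0, 49999, 50000, 100000, 299999):
        print(it, step_lr(0.0001, 0.1, 50000, it))
```

With max_iter: 300000 and stepsize: 50000, the rate is reduced five times over the run, so the final 50000 iterations use base_lr × 0.1^5.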

ada...@ucr.edu

Oct 24, 2016, 1:38:08 PM

to Caffe Users, dani...@gmail.com

And what about "weight_decay"? What does it do? Is it a regularization parameter (for an L2 or L1 penalty on the weights)? If so, why is it called "decay"?

ada...@ucr.edu

Oct 24, 2016, 3:46:49 PM

to Caffe Users, dani...@gmail.com

From some old discussions (link1, link2), I got the idea that the 'weight_decay' parameter is the regularization parameter for the L2 loss over the weights. For example, in the cifar10 solver, the weight_decay value is 0.004. Does that mean the loss to be minimized is "cross-entropy + 0.004*sum_of_L2_Norm_of_all_weights"? Or is it, by any chance, "cross-entropy + 0.004/2*sum_of_L2_Norm_of_all_weights"?
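The two candidate forms can be compared by looking at the gradient each one produces. The Python sketch below rests on an assumption: that, as I understand Caffe's SGD solver to do for its default L2 regularization, weight_decay * w is added to each weight's gradient before the update. If that holds, the matching penalty is the "/2" form, (weight_decay / 2) * sum(w^2), since its derivative is exactly weight_decay * w. All function names are hypothetical, and momentum and per-layer decay multipliers are left out.

```python
def l2_penalty(weights, weight_decay):
    """Penalty whose gradient is weight_decay * w: (weight_decay / 2) * sum(w^2)."""
    return 0.5 * weight_decay * sum(w * w for w in weights)


def l2_penalty_grad(weights, weight_decay):
    """Derivative of l2_penalty with respect to each weight."""
    return [weight_decay * w for w in weights]


def sgd_step(weights, data_grads, lr, weight_decay):
    """One plain SGD update (no momentum) with the assumed L2 regularization.

    ASSUMPTION: the regularizer contributes weight_decay * w to the gradient.
    """
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, data_grads)]


if __name__ == "__main__":
    w = [1.0, -2.0]
    # With a zero data gradient, every weight is scaled by (1 - lr * weight_decay).
    print(sgd_step(w, [0.0, 0.0], lr=0.1, weight_decay=0.004))
```

Under that assumption the name makes sense: even when the data gradient is zero, each update multiplies every weight by (1 - lr * weight_decay), so the weights decay toward zero unless the loss pushes back.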
