Learning rate, Momentum and Weight_decay

Ninja At Work

Nov 8, 2015, 9:30:07 PM

to Caffe Users

Hi

When solving with SGD we have lr_rate and momentum. We use gamma to multiply the base lr_rate, but what happens to the momentum at 0.9? The doc recommendation says to decrease lr if momentum is increased, and vice versa. Why?

And what is weight_decay?

Sorry for the noob questions.

Any clarification is appreciated.

-Thanks

escorciav

Nov 9, 2015, 12:02:29 PM

to Caffe Users

Weight decay is the regularization constant of typical machine learning optimization problems.

In a few words, and loosely speaking, it can help your model generalize. I recommend checking some machine learning slides that cover optimization in detail to get a clear sense of what it means.

Victor
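
To make the reply above concrete, here is a minimal sketch of how an L2 weight decay term enters a plain SGD update. It is not Caffe code: the function name `sgd_step` and the example values are made up for illustration, and it assumes the L2 form of regularization.

```python
import numpy as np

def sgd_step(w, grad_loss, lr, weight_decay):
    """One plain SGD step with L2 weight decay (illustrative sketch, not Caffe code)."""
    # The penalty 0.5 * weight_decay * ||w||^2 adds weight_decay * w to the gradient,
    # so every step also pulls the weights slightly toward zero.
    grad = grad_loss + weight_decay * w
    return w - lr * grad

# With a zero data gradient, only the decay term acts on the weights.
w = np.array([1.0, -2.0])
w_next = sgd_step(w, np.zeros_like(w), lr=0.1, weight_decay=0.01)
print(w_next)
```

In this toy case each weight shrinks by a factor of 1 - lr * weight_decay per step, which is the sense in which the term discourages large weights.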

Jan C Peters

Nov 9, 2015, 2:58:54 PM

to Caffe Users

Yeah, you should definitely have a look at the corresponding scientific literature, maybe starting with http://research.microsoft.com/pubs/192769/tricks-2012.pdf.

In short: for most of these settings there are some heuristics that worked well in the past, but we don't actually have any true insight into most of them, e.g. what the values should be in a specific case. For weight decay one usually chooses something in the range [1e-6, 1e-4], afaik. The base_lr, the lr policy and its associated params (gamma, power, step) are also crucial to training success, but I (and probably most other people) am not sure what the best choice is here. Maybe try the recommendation from the referenced paper? It also depends somewhat on the solver type: from past experience I would recommend AdaDelta with a fixed learning rate of 1.0 (the adaptation of the gradient updates is completely managed by AdaDelta, so you don't have to care about it anymore).

Jan
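
As a rough illustration of the original question about gamma and momentum, the sketch below assumes a "step" style schedule, where the rate is multiplied by gamma every `stepsize` iterations, and the classic momentum update v = momentum * v - lr * grad. The function names and numbers are hypothetical, and this is not Caffe's implementation. Under those assumptions, gamma only rescales the learning rate while the momentum value itself stays at 0.9. With a constant gradient the velocity settles near lr / (1 - momentum), so momentum 0.9 behaves like a step roughly 10 times larger, which is one way to read the advice to lower lr when raising momentum.

```python
import math

def step_lr(base_lr, gamma, stepsize, iteration):
    """Learning rate under an assumed 'step' schedule: multiply by gamma every stepsize iterations."""
    return base_lr * gamma ** (iteration // stepsize)

def momentum_velocity(lr, momentum, grad, steps):
    """Velocity after `steps` momentum updates with a constant scalar gradient."""
    v = 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad
    return v

lr_start = step_lr(0.01, 0.1, 1000, 0)      # no decay applied yet
lr_later = step_lr(0.01, 0.1, 1000, 2500)   # decayed twice: 0.01 * 0.1 ** 2

# The size of the settled velocity is the effective step per iteration.
effective_step = -momentum_velocity(lr=0.01, momentum=0.9, grad=1.0, steps=500)
print(lr_start, lr_later, effective_step, 0.01 / (1 - 0.9))
```

Real gradients are not constant, so the factor 1 / (1 - momentum) is only a rule of thumb for how much momentum amplifies the updates.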

Ninja At Work

Nov 9, 2015, 4:26:34 PM

to Caffe Users

Thanks Victor :)

Ninja At Work

Nov 9, 2015, 6:08:25 PM

to Caffe Users

Thanks for the explanation Jan :)

I'm reading the useful link :)

I'm still not sure about the fixed rate and/or AdaDelta; it seems something like the Auto setting on a Nikon D810A DSLR :)
