Why is SGD better?


Daniela G

unread,
Aug 19, 2016, 9:36:35 AM8/19/16
to Caffe Users
Hi!
I noticed that many of the default nets in Caffe use the SGD solver.
Why is it better than the others? Or what are its advantages?

Thank you

charles....@digitalbridge.eu

unread,
Aug 19, 2016, 11:52:31 AM8/19/16
to Caffe Users
SGD is simple, inherently parallelisable, and works naturally with mini-batches: the stochastic gradient computed on a batch is an unbiased estimate of the true gradient. It is also suitable in cases where a local minimum is sufficient (though this can be said of vanilla GD as well).

There are modifications of it which may be better, depending on your use case. A useful one is SGD with momentum, which accumulates a decaying average of past gradients (a "velocity"), so that consistent gradient directions are amplified and oscillations are damped; the learning rate itself stays fixed.
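The momentum update can be sketched in a few lines. This is a minimal illustration of the heavy-ball rule, not Caffe's implementation; the learning rate and momentum values below are illustrative:

```python
# Minimal sketch of the SGD + momentum ("heavy ball") update.
# Not Caffe's code; lr and momentum values are illustrative.
def sgd_momentum_step(w, v, grad, lr=0.1, momentum=0.9):
    """v accumulates a decaying sum of past gradients; the learning
    rate stays fixed -- momentum changes the step, not the rate."""
    v = momentum * v - lr * grad
    return w + v, v

# Usage: minimize f(w) = w^2, whose gradient is 2w, starting at w = 1.
w, v = 1.0, 0.0
for _ in range(300):
    w, v = sgd_momentum_step(w, v, 2.0 * w)
print(w)  # w ends up very close to the minimum at 0
```

Setting momentum to 0 recovers plain SGD, which makes it easy to compare the two on the same problem.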

Evan Shelhamer

unread,
Aug 19, 2016, 1:49:36 PM8/19/16
to charles....@digitalbridge.eu, Caffe Users
SGD with momentum

Note that virtually all deep learning optimization is done with SGD + momentum, not plain SGD.
SGD + momentum is a pretty strong default, and while more sophisticated variants like Adam can help, that's not always the case. At any rate, some tuning is usually necessary whatever the method.
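In Caffe, the choice between these solvers is made in the solver definition. A sketch of a solver.prototxt using SGD + momentum (field names per Caffe's SolverParameter; file names and values here are illustrative, not a recommended configuration):

```protobuf
# Hypothetical solver.prototxt sketch for SGD + momentum.
net: "train_val.prototxt"   # illustrative net definition file
type: "SGD"                 # plain SGD becomes SGD + momentum when momentum > 0
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
lr_policy: "step"           # drop the rate by gamma every stepsize iterations
gamma: 0.1
stepsize: 10000
max_iter: 45000
# To try Adam instead, swap in:  type: "Adam"  (and retune base_lr)
```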
 

Evan Shelhamer






Daniela G

unread,
Aug 26, 2016, 5:27:11 AM8/26/16
to Caffe Users
Thank you!