Which gradient algorithm to use?

Thomas Wood

Sep 22, 2016, 5:31:44 AM
to Caffe Users
Hi,
I'm using similarity learning (triplet training) for a face model.
I've been experimenting with SGD, AdaGrad, AdaDelta and Nesterov.
My CNN maps a face image to a 128-float vector in a triplet embedding space (GoogLeNet modified to have 128 outputs). I can then quantify the similarity of two faces by computing the dot product of the two vectors. I'm using a fork of Caffe called caffe-sl.
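The similarity step is roughly the following (a NumPy sketch for illustration only; the L2 normalisation is an assumption about how the embeddings are prepared, not something caffe-sl guarantees):

    import numpy as np

    def similarity(a, b):
        # a, b: the 128-float embeddings produced by the network
        a = a / np.linalg.norm(a)   # assumed: embeddings are L2-normalised
        b = b / np.linalg.norm(b)
        return float(np.dot(a, b))  # higher = more similar; use 1 - dot as a distance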
I'm wondering if anybody has a recommendation for the best solver to use in these circumstances?
I think SGD is too basic, so I've been reading about the advantages of the more advanced algorithms such as AdaGrad. However, I'm unsure which is best in my case.

Are some solvers inherently better than others in all cases, or does it depend highly on the use case?
I'm leaning towards AdaGrad, as it's given me the best accuracy so far, but I'm not sure if it really is better!

Thank you!

Thomas Wood

Sep 22, 2016, 5:32:52 AM
to Caffe Users
In addition, lots of pages seem to state that AdaGrad and AdaDelta eliminate the need for a learning rate (LR). But the LR option is still there in the solver, and it clearly affects how my model learns, so this makes no sense to me. Any clarification welcome, please!
Thanks

charles....@digitalbridge.eu

Sep 22, 2016, 8:41:03 AM
to Caffe Users
There's no clear-cut answer. Try the different options for a short amount of time and compare their losses, and/or look at the literature around problems similar to your own and take inspiration from the settings they used. For a quick comparison, something like the sketch below works.
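This is only a rough pycaffe sketch; it assumes you have one solver prototxt per solver type, identical except for the `type:` field, and that your net has a top blob named "loss":

    import caffe

    for name in ["sgd", "adagrad", "adadelta", "nesterov"]:
        # hypothetical filenames: solver_sgd.prototxt, solver_adagrad.prototxt, ...
        solver = caffe.get_solver("solver_%s.prototxt" % name)
        losses = []
        for _ in range(500):                        # short run, enough to compare trends
            solver.step(1)
            losses.append(float(solver.net.blobs["loss"].data))
        print(name, sum(losses[-50:]) / 50.0)       # mean loss over the last 50 iterations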

Thomas Wood

Sep 22, 2016, 9:15:12 AM
to Caffe Users
Thank you, I will keep experimenting. From what I can tell, it's best not to use plain SGD. I found this page really helpful for getting an overview of what the algorithms are doing.
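It also cleared up my earlier LR question: the adaptive solvers adapt a per-parameter scale, but a global learning rate (base_lr in Caffe) still multiplies every step, which is why changing it affects training. Roughly, for AdaGrad (a NumPy sketch of the idea, not Caffe's actual implementation):

    import numpy as np

    def adagrad_step(theta, grad, hist, base_lr=0.01, eps=1e-8):
        hist += grad ** 2                                # accumulate squared gradients
        theta -= base_lr * grad / (np.sqrt(hist) + eps)  # per-parameter scale, global LR
        return theta, hist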