Hi,
I'm using similarity learning (triplet training) to train a face model.
I've been experimenting with SGD, AdaGrad, AdaDelta and Nesterov.
My CNN maps a face image to a vector of 128 floats in a triplet embedding space, using GoogLeNet modified to have 128 outputs. I can then quantify the similarity of two faces by calculating the dot product of the two vectors. I'm using a fork of Caffe called caffe-sl.
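To make the comparison step concrete, here is a minimal plain-Python sketch of scoring two embeddings with a dot product. It is not the caffe-sl code: the vectors are random placeholders rather than real network outputs, the function names are made up for illustration, and the L2 normalisation is an assumption (with it, the dot product equals cosine similarity).

```python
import math
import random

def l2_normalize(v):
    """Scale a vector to unit length (assumed step, not confirmed for caffe-sl)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot_similarity(a, b):
    """Dot product of two L2-normalised embeddings; higher means more similar."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

# Placeholder 128-float embeddings standing in for the CNN outputs.
random.seed(0)
emb_a = [random.gauss(0.0, 1.0) for _ in range(128)]
emb_b = [random.gauss(0.0, 1.0) for _ in range(128)]

print(dot_similarity(emb_a, emb_a))  # same face vector compared with itself
print(dot_similarity(emb_a, emb_b))  # two unrelated placeholder vectors
```

With normalised vectors the score always lies between -1 and 1, so a single threshold can be used to decide whether two faces match.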
Does anybody have a recommendation for which solver is best to use in this circumstance?
I think SGD is too basic, so I've been reading about the advantages of the more advanced algorithms such as AdaGrad. However, I'm unsure which is best in my case.
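For context, the difference between a plain SGD update and an AdaGrad update can be sketched roughly as follows. This is a simplified plain-Python illustration of the textbook update rules, not Caffe's solver code; the function names, the learning rate and the toy gradient values are all assumptions chosen for the example, and momentum and weight decay are left out.

```python
import math

def sgd_step(w, grad, lr):
    """Plain SGD: every weight uses the same learning rate."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]

def adagrad_step(w, grad, hist, lr, eps=1e-8):
    """AdaGrad: each weight's step is scaled by its accumulated squared gradients."""
    new_hist = [h + g * g for h, g in zip(hist, grad)]
    new_w = [wi - lr * gi / (math.sqrt(h) + eps)
             for wi, gi, h in zip(w, grad, new_hist)]
    return new_w, new_hist

# Toy example: one weight with a large gradient, one with a small gradient.
w = [1.0, 1.0]
grad = [10.0, 0.1]

w_sgd = sgd_step(w, grad, lr=0.1)
w_ada, hist = adagrad_step(w, grad, hist=[0.0, 0.0], lr=0.1)

print(w_sgd)  # SGD steps are proportional to the raw gradient size
print(w_ada)  # the first AdaGrad step is about lr for both weights
```

In this toy case SGD moves the first weight a hundred times further than the second, while AdaGrad's per-weight scaling gives both a similar first step, which is the kind of behaviour that might matter for sparse or unevenly scaled gradients.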
Are some solver mechanisms inherently better than others in all cases? Or does it depend highly on the use case?
I am leaning towards AdaGrad as it has given me the best accuracy so far, but I'm not sure whether it really is better!
Thank you!