I'm using:
Dataset: CIFAR-10
Model: GoogLeNet (Inception v1)
Learning rate: step schedule, 0.001 (0-60000 iter), 0.0001 (60000-65000 iter), 0.00001 (65000-70000 iter)
and comparing two optimizers:
1) SGD with momentum: 0.9, weight_decay: 0.004
2) Adam with momentum: 0.9, momentum2: 0.999
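For reference, the Adam update with these hyperparameters (momentum = beta1 = 0.9, momentum2 = beta2 = 0.999, following Kingma & Ba) can be sketched in plain Python for a single scalar weight; the function name and structure here are mine, not from any framework:

```python
import math

def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar weight.

    w: parameter, g: gradient, m/v: running first/second moment
    estimates, t: 1-based timestep. Returns updated (w, m, v).
    """
    m = beta1 * m + (1 - beta1) * g        # first moment ("momentum")
    v = beta2 * v + (1 - beta2) * g * g    # second moment ("momentum2")
    m_hat = m / (1 - beta1 ** t)           # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v
```

Note that, unlike my SGD run, this update has no weight_decay term; whether that matters here is part of my question.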
I found that Adam converges faster than SGD in early iterations, but ends up worse in later iterations, as shown below (red: SGD, green: Adam).

Is there any way to improve Adam's final accuracy here? Thanks!