I'm using:
Dataset: CIFAR-10
Model: GoogLeNet (Inception v1)
Learning rate: step schedule, 0.001 (0-60000 iter), 0.0001 (60000-65000 iter), 0.00001 (65000-70000 iter)
and comparing two optimizers:
1) SGD with momentum: 0.9, weight_decay: 0.004
2) Adam with momentum: 0.9, momentum2: 0.999
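For reference, the Adam update with these hyperparameters (momentum = beta1 = 0.9, momentum2 = beta2 = 0.999, following Kingma & Ba) can be sketched in plain Python for a single scalar weight; the function name and structure here are mine, not from any framework:

```python
import math

def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar weight.

    w: parameter, g: gradient, m/v: running first/second moment
    estimates, t: 1-based timestep. Returns updated (w, m, v).
    """
    m = beta1 * m + (1 - beta1) * g        # first moment ("momentum")
    v = beta2 * v + (1 - beta2) * g * g    # second moment ("momentum2")
    m_hat = m / (1 - beta1 ** t)           # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v
```

Note that, unlike my SGD run, this update has no weight_decay term; whether that matters here is part of my question.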
I found that Adam converges faster than SGD in early iterations, but ends up worse in later iterations, as shown below (red: SGD, green: Adam).

Is there any way to improve Adam's final accuracy here? Thanks!