I am using:
Dataset: CIFAR-10
Model: GoogLeNet (Inception v1)
Learning rate: step schedule with fixed values — 0.001 (0–60000 iterations), 0.0001 (60000–65000), 0.00001 (65000–70000)
and am running two experiments:
1) SGD with momentum: 0.9, weight_decay: 0.004
2) Adam with momentum (beta1): 0.9, momentum2 (beta2): 0.999
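For clarity, the two update rules being compared can be sketched in plain NumPy. This is a toy quadratic objective, not the actual CIFAR-10 run; the `sgd_momentum_step`/`adam_step` names and the test problem are my own, and Caffe's `momentum`/`momentum2` correspond to Adam's beta1/beta2:

```python
import numpy as np

def sgd_momentum_step(w, g, v, lr=0.001, momentum=0.9, weight_decay=0.004):
    # Caffe-style SGD: velocity accumulates the (L2-regularized) gradient.
    v = momentum * v - lr * (g + weight_decay * w)
    return w + v, v

def adam_step(w, g, m, u, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: bias-corrected first- and second-moment estimates of the gradient.
    m = beta1 * m + (1 - beta1) * g
    u = beta2 * u + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)
    u_hat = u / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(u_hat) + eps), m, u

# Toy objective f(w) = 0.5 * ||w||^2, so the gradient is simply w.
w0 = np.array([5.0, -3.0])
w_sgd, v = w0.copy(), np.zeros(2)
w_adam, m, u = w0.copy(), np.zeros(2), np.zeros(2)
for t in range(1, 10001):
    w_sgd, v = sgd_momentum_step(w_sgd, w_sgd, v)
    w_adam, m, u = adam_step(w_adam, w_adam, m, u, t)
```

On this toy problem both optimizers approach the minimum; note that Adam's step size is roughly `lr` regardless of gradient magnitude, while SGD's steps shrink with the gradient, which is one commonly cited reason their late-stage behavior differs.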
I found that Adam outperforms SGD in the early iterations but falls behind in the later ones, as shown below (red: SGD, green: Adam).
![](https://lh3.googleusercontent.com/-q0mqvmDSpZQ/WCLemMwxqHI/AAAAAAAAAEA/F1CHOYvDn7E8zifDA08dEUB_exA4yhK1wCLcB/s400/trainvalerr.jpg)
Is there any way to improve Adam's late-stage performance here? Thanks!