A few days ago, my loss was stuck too; it had plateaued at a value that wasn't good enough, and I needed it to go much lower.
I had been using SGD and AdaGrad with several different learning rates, but neither solved the problem.
Then I tried Adam. At first the loss actually went up, and I was sure something was wrong, but around epoch 14 it suddenly dropped well below the minimum loss I had ever reached with SGD. This is just one small, specific example.
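If you want to try the same thing, here is a minimal sketch (assuming PyTorch) of swapping the optimizer while keeping everything else fixed. The toy model, random data, and learning rates below are placeholders I made up for illustration, not the ones from my experiment.

```python
# Minimal sketch of comparing SGD, AdaGrad, and Adam on the same toy problem.
# The model, data, and hyperparameters are placeholders, not from my actual run.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression data standing in for the real dataset.
X = torch.randn(512, 20)
y = torch.randn(512, 1)

def train(optimizer_name: str, epochs: int = 30) -> float:
    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
    loss_fn = nn.MSELoss()

    # Only the optimizer changes between runs.
    if optimizer_name == "sgd":
        opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    elif optimizer_name == "adagrad":
        opt = torch.optim.Adagrad(model.parameters(), lr=1e-2)
    else:  # "adam"
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    best = float("inf")
    for epoch in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
        best = min(best, loss.item())
        print(f"{optimizer_name} epoch {epoch:02d} loss {loss.item():.4f}")
    return best

# Compare the best loss each optimizer reaches; watch whether Adam dips later.
for name in ("sgd", "adagrad", "adam"):
    print(f"best {name} loss: {train(name):.4f}")
```

The point isn't that Adam always wins, just that it can behave very differently from SGD/AdaGrad on the same setup, so it's worth running the comparison yourself before giving up on a stuck loss.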