Imagenet training stuck at loss ~ 6.91 from iteration 100 to 83000


César Salgado

Sep 9, 2014, 10:27:23 AM
to caffe...@googlegroups.com
Hi,

I'm training ImageNet and the loss doesn't get below 6.91 even after 83,000 iterations. It drops from 7.66 to 6.91 in the first 100 iterations and then doesn't get any lower.
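(A softmax loss flat at ~6.91 over the 1000 ImageNet classes is chance-level prediction, since -ln(1/1000) = ln(1000) ≈ 6.908, so from iteration 100 onward the network is effectively guessing uniformly.)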

Is this problem related to the fact that I'm using batch size = 64 instead of 256? (I can't set it higher than 64 because I don't have enough video memory.)

My imagenet_solver.prototxt looks as follows:

  net: "imagenet_train_val.prototxt"
  test_iter: 1000
  test_interval: 4000
  base_lr: 0.01
  lr_policy: "step"
  gamma: 0.1
  stepsize: 400000
  display: 20
  max_iter: 1800000
  momentum: 0.9
  weight_decay: 0.0005
  snapshot: 40000
  snapshot_prefix: "caffe_imagenet_train"
  solver_mode: GPU

Is there a combination of learning rate and momentum that can solve this problem?

Thanks,

Evan Shelhamer

Sep 9, 2014, 10:37:45 AM
to César Salgado, caffe...@googlegroups.com
The mini-batch size is an important hyperparameter in tuning SGD. You cannot simply reduce it for memory reasons and expect the same mathematical result.

There is a thread on how to jointly tune the mini-batch size, learning rate, and other settings for low-memory training here: https://github.com/BVLC/caffe/issues/430
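As a rough sketch only (the specific numbers here are an assumption for illustration, not a recipe from that thread): if the 0.01 base learning rate was tuned for batch size 256, one common heuristic is to scale it by the batch-size ratio when training with batch 64, leaving the rest of your posted solver as-is:

  # Hypothetical adjustment for batch_size = 64; assumes base_lr 0.01
  # was tuned for batch 256 and scales it by 64/256.
  net: "imagenet_train_val.prototxt"
  test_iter: 1000
  test_interval: 4000
  base_lr: 0.0025        # 0.01 * (64 / 256)
  lr_policy: "step"
  gamma: 0.1
  stepsize: 400000
  display: 20
  max_iter: 1800000
  momentum: 0.9
  weight_decay: 0.0005
  snapshot: 40000
  snapshot_prefix: "caffe_imagenet_train"
  solver_mode: GPU

Whether that exact factor is right, and whether the schedule (stepsize, max_iter) should also change, is worth checking against the discussion in the issue above.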

Evan Shelhamer

