I have a model that has already been trained for 4000 iterations, and I now occasionally run into "out of memory" errors. They only seem to occur when the GPU is also being used by other CUDA tasks running on the same machine, so my thought is that training might become more reliable with a smaller batch size. Is it okay to change the batch size mid-training, or do I have to restart with the smaller size from iteration 1?
Right now I am resuming training this way:
import caffe

solver = caffe.SGDSolver('models/bvlc_googlenet/solver.prototxt')
solver.restore('models/bvlc_googlenet/bvlc_googlenet_iter_4000.solverstate')
What would be the right place to change the batch size (assuming that is a feasible approach at all)?
Is it enough to change it inside solver.prototxt, or does that value get overwritten when the solverstate is restored?
Or do I have to change it after restore(), via the script?
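In case it helps, this is a rough sketch of what I had in mind: rewriting the batch_size value in the net prototxt on disk before constructing the solver. The set_batch_size helper is hypothetical (my own naming), and it assumes the batch size appears in the prototxt as a plain "batch_size: N" field inside a data layer:

```python
import re

def set_batch_size(prototxt_text, new_size):
    # Replace every "batch_size: N" occurrence with the new value.
    # Assumes the field is written on one line, as Caffe prototxts usually do.
    return re.sub(r'batch_size:\s*\d+', 'batch_size: %d' % new_size, prototxt_text)

# Tiny example of the rewrite on a data_param fragment:
example = 'data_param {\n  batch_size: 128\n}'
print(set_batch_size(example, 32))
# data_param {
#   batch_size: 32
# }
```

I would apply this to the train/val prototxt referenced by the solver before calling restore(), but I am not sure whether the restored solverstate would then still be compatible.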
Thanks
Mario Klingemann