actual behavior of iter_size

zhao...@gmail.com

Oct 8, 2016, 11:58:10 PM

to Caffe Users

Hi all,

I am confused about the actual behavior of the "iter_size" option.

In this discussion, Shelhamer said: "Solver called forward&backward iter_size times and the weight_diff will accumulate while backward".

In my opinion, when iter_size > 1, the repeated Forward & Backward passes all have the same input data, output loss, and gradient. So what is the difference between 1) setting iter_size=10 and 2) using a learning rate 10 times larger?

Also, SGDSolver::ApplyUpdate() normalizes the weight_diff, that is, it divides weight_diff by iter_size. So weight_diff is accumulated iter_size times and then divided by iter_size. Isn't that useless?

My core argument is: the Forward & Backward passes within one iter_size cycle all have the same input, output, and gradient. Am I right?

Jonathan R. Williford

Oct 10, 2016, 4:05:25 PM

to Caffe Users

iter_size is a way to effectively increase the batch size without requiring the extra GPU memory. If you have an iter_size of 10, then the gradients will be accumulated for 10 training iterations and then the weights will only be updated once. If you have an iter_size of 1, then the weights are updated with the gradient every training iteration.
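To illustrate the accumulation described above, here is a minimal NumPy sketch. It is not Caffe code: the linear model, the `grad_mse` helper, and the sizes are assumptions made up for the example. It assumes each of the iter_size passes sees a different mini-batch, and shows that summing the per-batch gradients and dividing by iter_size gives the same gradient as one large batch.

```python
import numpy as np

def grad_mse(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y) ** 2) with respect to w (hypothetical toy model)."""
    residual = X @ w - y
    return X.T @ residual / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))   # 40 samples, 3 features (made-up data)
y = rng.normal(size=40)
w = np.zeros(3)

iter_size = 4      # number of forward/backward passes per weight update
batch_size = 10    # samples per pass

# Accumulate the gradient over iter_size different mini-batches.
accumulated = np.zeros_like(w)
for i in range(iter_size):
    rows = slice(i * batch_size, (i + 1) * batch_size)
    accumulated += grad_mse(w, X[rows], y[rows])

# Normalize by iter_size, as the solver does before applying the update.
accumulated /= iter_size

# Gradient of a single batch of size iter_size * batch_size.
large_batch = grad_mse(w, X, y)

print(np.allclose(accumulated, large_batch))
```

Under these assumptions the two gradients agree, so the division by iter_size is what makes the accumulated gradient behave like an average over a larger batch rather than a larger step.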

The relationship between iter_size and the learning rate is not that straightforward. Increasing the batch size decreases the variance of the gradient estimate. If you decrease the learning rate instead of increasing iter_size, you will be taking more, smaller steps, sometimes in the direction of the estimated gradient and sometimes away from it.
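The variance claim can be checked with a small simulation. This is only a sketch under assumed conditions: the per-sample "gradients" are independent draws from a normal distribution with made-up mean and spread, which real gradients need not follow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-sample gradients: 5000 trials, 80 samples each.
per_sample = rng.normal(loc=1.0, scale=2.0, size=(5000, 80))

# Gradient estimate from a batch of 10 versus an effective batch of 80
# (for example, batch size 10 with iter_size 8).
small_batch_estimate = per_sample[:, :10].mean(axis=1)
large_batch_estimate = per_sample.mean(axis=1)

var_small = small_batch_estimate.var()
var_large = large_batch_estimate.var()

print(var_small, var_large, var_small / var_large)
```

For independent samples the variance of the averaged gradient shrinks roughly in proportion to the effective batch size, which a change in learning rate alone does not reproduce.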

Say someone creates a model with a much more expensive GPU than what you have and they use a batch size of 1024 and iter_size of 1. If your GPU only supports a batch size of 128, then you can set the iter_size to 8 and train their model with the same learning parameters (although this may affect batch normalization, if it is used).
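The arithmetic in that example can be written as a tiny helper. The function name `iter_size_for` is hypothetical and not part of Caffe; it only assumes that the effective batch size is batch size times iter_size.

```python
def iter_size_for(target_batch_size, gpu_batch_size):
    """Return the iter_size that reproduces target_batch_size with a smaller per-pass batch."""
    if gpu_batch_size <= 0 or target_batch_size % gpu_batch_size != 0:
        raise ValueError("target batch size must be a positive multiple of the GPU batch size")
    return target_batch_size // gpu_batch_size

# Original setup: batch size 1024, iter_size 1. Smaller GPU: batch size 128.
needed = iter_size_for(1024, 128)
print(needed, 128 * needed)
```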

Cheers,

Jonathan
