Hello all, I have the following problem:
I was training CaffeNet and experimenting with changing various hyperparameters, adding layers, and so on. With a certain configuration, the validation set accuracy would stall. Then, with the exact same network and hyperparameters but a slightly larger batch size, the model converges just fine, as can be seen in the images. This puzzles me: until now, I thought the batch size only affected training speed, since it controls how many images the net processes at a time, but apparently it also affects whether the model converges at all. Why?
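If I understand the usual explanation correctly, the minibatch gradient is only a noisy estimate of the full-dataset gradient, and the noise shrinks as the batch grows. Here is a toy sketch I put together to convince myself of the scaling (pure numpy, not Caffe; the gradient and noise values are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: per-example "gradients" are noisy samples around a
# true gradient of 1.0. Purely illustrative, not CaffeNet itself.
true_grad = 1.0
noise_std = 5.0
n_trials = 10000

for batch_size in (8, 32, 128):
    # A minibatch gradient is the mean of batch_size per-example gradients.
    estimates = rng.normal(true_grad, noise_std,
                           size=(n_trials, batch_size)).mean(axis=1)
    print(f"batch={batch_size:4d}  std of gradient estimate={estimates.std():.3f}")

# The std shrinks roughly as 1/sqrt(batch_size), so a larger batch gives
# a less noisy descent direction at the same learning rate.
```

If that picture is right, a batch that is too small could leave the gradient so noisy that, at my learning rate, the optimizer never settles, which would explain the stalling. Is that the correct way to think about it?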
Another related question: suppose I tweak a network and, for example, replace a 5x5 conv layer with two 3x3 conv layers, and make other changes of that kind. This results in a deeper network that requires more memory, so I may be forced to decrease the batch size. In that case, if my model does not converge, how can I tell whether it is because of the smaller batch size or because something is wrong with the model itself?
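To make the memory point concrete, here is a rough back-of-envelope sketch of the 5x5 -> two 3x3 swap (the channel counts and feature-map size are hypothetical, just for illustration):

```python
def conv_params(k, c_in, c_out):
    """Weights in a k x k conv layer, ignoring biases."""
    return k * k * c_in * c_out

c_in = c_out = 256
h = w = 13  # spatial size of the feature map, hypothetical

print(f"one 5x5 conv:  {conv_params(5, c_in, c_out):,} weights")
print(f"two 3x3 convs: {2 * conv_params(3, c_in, c_out):,} weights")

# The weights themselves actually shrink (18*c_in*c_out vs 25*c_in*c_out),
# but the two-layer version produces an extra intermediate activation map
# that has to be kept around for backprop:
extra_bytes = h * w * c_out * 4  # float32, per image
print(f"extra activations: {extra_bytes / 1024:.1f} KiB per image")
# That cost scales with batch size, which is why the batch may have to shrink.
```

So, if I am counting right, it is mainly the extra activations, not the weights, that force the smaller batch, but the disambiguation question still stands.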
Thanks!