Hello all, I have the following problem:
I was training CaffeNet and experimenting with changing various hyperparameters, adding layers, and so on. With a certain configuration, the validation set accuracy would stall. Then, with the exact same network and hyperparameters but a slightly larger batch size, the model converges just fine, as can be seen in the images. This puzzles me: until now, I thought the batch size only affected training speed, since it controls how many images the net processes at a time, but apparently it also affects whether the model converges at all. Why?
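If I understand the usual explanation correctly, the minibatch gradient is only a noisy estimate of the full-dataset gradient, and the noise shrinks as the batch grows. Here is a toy sketch I put together to convince myself of the scaling (pure numpy, not Caffe; the gradient and noise values are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: per-example "gradients" are noisy samples around a
# true gradient of 1.0. Purely illustrative, not CaffeNet itself.
true_grad = 1.0
noise_std = 5.0
n_trials = 10000

for batch_size in (8, 32, 128):
    # A minibatch gradient is the mean of batch_size per-example gradients.
    estimates = rng.normal(true_grad, noise_std,
                           size=(n_trials, batch_size)).mean(axis=1)
    print(f"batch={batch_size:4d}  std of gradient estimate={estimates.std():.3f}")

# The std shrinks roughly as 1/sqrt(batch_size), so a larger batch gives
# a less noisy descent direction at the same learning rate.
```

If that picture is right, a batch that is too small could leave the gradient so noisy that, at my learning rate, the optimizer never settles, which would explain the stalling. Is that the correct way to think about it?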
Another related question: suppose I tweak a network and, for example, replace a 5x5 conv layer with two 3x3 conv layers, and make other changes of that kind. This results in a deeper network that requires more memory, so I may be forced to decrease the batch size. In that case, if my model does not converge, how can I tell whether it is because of the smaller batch size or because something is wrong with the model itself?
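To make the memory point concrete, here is a rough back-of-envelope sketch of the 5x5 -> two 3x3 swap (the channel counts and feature-map size are hypothetical, just for illustration):

```python
def conv_params(k, c_in, c_out):
    """Weights in a k x k conv layer, ignoring biases."""
    return k * k * c_in * c_out

c_in = c_out = 256
h = w = 13  # spatial size of the feature map, hypothetical

print(f"one 5x5 conv:  {conv_params(5, c_in, c_out):,} weights")
print(f"two 3x3 convs: {2 * conv_params(3, c_in, c_out):,} weights")

# The weights themselves actually shrink (18*c_in*c_out vs 25*c_in*c_out),
# but the two-layer version produces an extra intermediate activation map
# that has to be kept around for backprop:
extra_bytes = h * w * c_out * 4  # float32, per image
print(f"extra activations: {extra_bytes / 1024:.1f} KiB per image")
# That cost scales with batch size, which is why the batch may have to shrink.
```

So, if I am counting right, it is mainly the extra activations, not the weights, that force the smaller batch, but the disambiguation question still stands.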
Thanks!