I'm confused about the effect of batch size on multi-GPU training. Please correct me if I'm wrong.
That is, comparing:
(1) Using a batch size of 64 (literally, in the prototxt) and training on a single GPU
(2) Using a batch size of 16 (literally, in the prototxt) and training on 4 GPUs
In both scenarios the effective batch size during training is 64, right? (according to https://github.com/BVLC/caffe/blob/master/docs/multigpu.md)
If so, does that mean both scenarios should theoretically produce similar results?
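To make my understanding concrete, here is a minimal sketch of the arithmetic I'm assuming, i.e. that Caffe's data-parallel multi-GPU training runs the prototxt batch size on each GPU, so the samples consumed per iteration is the prototxt batch times the GPU count (the function name is mine, just for illustration):

```python
def effective_batch_size(prototxt_batch: int, num_gpus: int) -> int:
    """Samples consumed per training iteration, assuming each GPU
    processes one full prototxt-sized batch (data parallelism)."""
    return prototxt_batch * num_gpus

# Scenario (1): batch 64 in the prototxt, 1 GPU
print(effective_batch_size(64, 1))  # 64
# Scenario (2): batch 16 in the prototxt, 4 GPUs
print(effective_batch_size(16, 4))  # 64
```

So under this assumption both scenarios consume 64 samples per iteration.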