Multiple GPUs, GPU memory, batch size, and batch accumulation


Ken M Erney

14 Sep 2016, 15:56:47
to DIGITS Users
I have a system set up with DIGITS installed from source. This includes DIGITS 4.1-dev and Caffe 0.15.13 with NCCL. I have two cards, both smaller Quadro K2200 cards with 4 GB of RAM each. I can run the KITTI example by setting the batch size to 2 and the batch accumulation to 5. I am trying to figure out what this actually means and what I need to set these to for other networks so that they will work with my cards. I imagine the answer could be involved and have some complex nuances. Does anybody have advice on resources I could read that would help me understand how these parameters relate to GPU resources?

Thanks,
Ken



Greg Heinrich

14 Sep 2016, 18:26:49
to DIGITS Users
Hello,
with nv-caffe we do "strong" scaling, i.e. if you have a mini-batch size of 8 and train over 2 GPUs, then each GPU processes 4 samples on every iteration. Mini-batch training allows more of the work to be parallelized, so it generally leads to faster processing on GPUs. There is also merit in mini-batch training in that the network is less likely to diverge, since there is less variance in a mini-batch than in a single sample.

Consequently, even when you don't have the GPU resources (memory) to train on large mini-batches, you may still want to perform the parameter updates only after processing a certain number of samples. This is where batch accumulation comes in: suppose you want to train on mini-batches of 10 samples but your GPU can only process 2 samples at a time. You can reach a numerically identical solution by using a mini-batch size of 2 samples and a batch accumulation of 5 iterations. I hope this helps.
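To make that equivalence concrete, here is a minimal NumPy sketch (not DIGITS or Caffe code; the linear model and random data are made up purely for illustration). It shows that accumulating the gradients of 5 micro-batches of 2 samples, then averaging them, gives the same parameter update as one mini-batch of 10 samples:

```python
# Minimal sketch of batch accumulation (illustrative only, not DIGITS/Caffe code).
# The gradient of the mean loss over a mini-batch of 10 samples equals the
# average of the gradients computed over 5 micro-batches of 2 samples each.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))   # 10 samples, 3 features (made-up data)
y = rng.normal(size=10)
w = rng.normal(size=3)         # current weights of a toy linear model

def grad(Xb, yb, w):
    """Gradient of the mean squared error 0.5*mean((Xb @ w - yb)**2) w.r.t. w."""
    return Xb.T @ (Xb @ w - yb) / len(yb)

# One update computed on the full mini-batch of 10
g_full = grad(X, y, w)

# Same update via batch accumulation: 5 chunks of 2, gradients averaged
g_accum = np.zeros_like(w)
for i in range(0, 10, 2):
    g_accum += grad(X[i:i+2], y[i:i+2], w)
g_accum /= 5

print(np.allclose(g_full, g_accum))  # True: identical parameter update
```

The same reasoning is why the multi-GPU case behaves like a single larger batch: each GPU computes the gradient of its share of the mini-batch, and the averaged result is what drives the update.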

A good paper to read on the subject is "Efficient back-prop": http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf

Regards,
Greg.

Ken M Erney

15 Sep 2016, 7:53:11
to DIGITS Users
Thanks Greg, now I understand batch size vs. batch accumulation. The paper is also a good read. From it I was able to find some additional resources that discuss batch size and noise. Thanks again.