Multi-GPU: one GPU uses significantly more memory than the others


Amir Habibian

Jan 5, 2016, 11:17:33 AM
to Caffe Users

Hi everyone,

I am trying to train a very deep network (ResNet, http://arxiv.org/abs/1512.03385) in Caffe. I am training on 8 GPUs with batch size = 32.

My problem is that 1 out of the 8 GPUs uses significantly more memory (almost twice as much as the others), which prevents me from training bigger networks with more than 70 layers. Any insight is appreciated :)

Memory usage between the 8 GPUs:
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    2     12626    C   ./build/tools/caffe                           5979MiB |
|    3     12626    C   ./build/tools/caffe                           5832MiB |
|    4     12626    C   ./build/tools/caffe                           6272MiB |
|    5     12626    C   ./build/tools/caffe                           5832MiB |
|    6     12626    C   ./build/tools/caffe                           5979MiB |
|    7     12626    C   ./build/tools/caffe                           5832MiB |
|    8     12626    C   ./build/tools/caffe                          11248MiB |
|    9     12626    C   ./build/tools/caffe                           5832MiB |
+-----------------------------------------------------------------------------+

Amir Habibian

Jan 7, 2016, 5:55:03 AM
to Caffe Users

Solved by:


xkszltl commented a day ago

That's for the test phase. Caffe runs the test phase on only one GPU. You can reduce the test batch size and increase the number of test iterations.

I don't know why Caffe does not parallelize the test phase.

============================================================
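Concretely, this trade-off lives in two prototxt files: shrink batch_size in the TEST-phase data layer of the net definition, and raise test_iter in the solver so that batch_size x test_iter still covers the whole validation set. A minimal sketch with illustrative paths and numbers (assuming a 50,000-image validation set; these values are not from this thread):

# train_val.prototxt -- TEST-phase data layer
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TEST }
  data_param {
    source: "examples/imagenet/ilsvrc12_val_lmdb"  # illustrative path
    batch_size: 10   # reduced, e.g. from 50; less memory on the GPU that runs the test phase
    backend: LMDB
  }
}

# solver.prototxt
test_iter: 5000   # raised so that 10 x 5000 = 50000 still covers the validation set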

Amir Abdi

Jun 1, 2016, 9:45:29 PM
to Caffe Users
How can I set up my Ubuntu / Caffe installation to run training on multiple GPUs? (in my case, 2 GPUs)
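With a Caffe binary built with multi-GPU support, training is launched with the -gpu flag of the caffe tool; the solver path below is illustrative:

# train on GPUs 0 and 1 (device IDs as reported by nvidia-smi)
./build/tools/caffe train -solver models/mynet/solver.prototxt -gpu 0,1

# or spread training across every visible GPU
./build/tools/caffe train -solver models/mynet/solver.prototxt -gpu all

The -gpu flag takes a comma-separated list of device IDs, or the keyword all.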

Ben

Aug 3, 2016, 10:29:54 PM
to Caffe Users
Hi Habibian, I'm training ResNet-101 now and I'm facing the same issue. I set a smaller batch size for the test stage, and testing works fine.
But when, like you, I set batch size 32 for each GPU (8 K40m GPUs), I failed with "cuda out of memory". Did you have this problem during training?
Can you give me some help, please?
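A hedged note on one likely cause: in Caffe's data-parallel multi-GPU training, the batch_size in the net prototxt applies per GPU, so 32 per GPU on 8 GPUs is an effective batch of 256, and ResNet-101 at batch 32 can exceed a K40m's 12 GB even before the extra test-phase allocation. One workaround is to shrink the per-GPU batch and restore the effective batch with the solver's iter_size (gradient accumulation); the numbers below are illustrative:

# train_val.prototxt -- TRAIN-phase data layer
data_param {
  source: "path/to/train_lmdb"  # illustrative path
  batch_size: 8                 # per GPU: 8 x 8 GPUs = 64 images per solver step
  backend: LMDB
}

# solver.prototxt
iter_size: 4   # accumulate gradients over 4 steps: 8 x 8 x 4 = 256 effective batch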