Hi everyone,
I am trying to train a very deep network (ResNet, http://arxiv.org/abs/1512.03385) in Caffe, on 8 GPUs with a batch size of 32.
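For reference, I launch training roughly like this (the solver path is illustrative; the GPU IDs match the nvidia-smi output below):

./build/tools/caffe train --solver=models/resnet/solver.prototxt --gpu=2,3,4,5,6,7,8,9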
My problem is that 1 of the 8 GPUs uses significantly more memory than the others (almost twice as much), which prevents me from training larger networks with more than 70 layers. Any insight is appreciated :)
Memory usage across the 8 GPUs:
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    2     12626    C   ./build/tools/caffe                          5979MiB  |
|    3     12626    C   ./build/tools/caffe                          5832MiB  |
|    4     12626    C   ./build/tools/caffe                          6272MiB  |
|    5     12626    C   ./build/tools/caffe                          5832MiB  |
|    6     12626    C   ./build/tools/caffe                          5979MiB  |
|    7     12626    C   ./build/tools/caffe                          5832MiB  |
|    8     12626    C   ./build/tools/caffe                         11248MiB  |
|    9     12626    C   ./build/tools/caffe                          5832MiB  |
+-----------------------------------------------------------------------------+