Model: resnet50
Batch size: 64
Number of GPUs: 1
Running warmup...
Running benchmark...
Iter #0: 844.3 img/sec per GPU
Iter #1: 844.0 img/sec per GPU
Iter #2: 843.6 img/sec per GPU
Iter #3: 843.5 img/sec per GPU
Iter #4: 843.5 img/sec per GPU
Iter #5: 842.0 img/sec per GPU
Iter #6: 841.3 img/sec per GPU
Iter #7: 841.8 img/sec per GPU
Iter #8: 841.1 img/sec per GPU
Iter #9: 841.1 img/sec per GPU
Img/sec per GPU: 842.6 +-2.4
Total img/sec on 1 GPU(s): 842.6 +-2.4
Run with two GPUs on the same node:
Model: resnet50
Batch size: 64
Number of GPUs: 2
Running warmup...
Running benchmark...
Iter #0: 235.7 img/sec per GPU
Iter #1: 251.5 img/sec per GPU
Iter #2: 217.0 img/sec per GPU
Iter #3: 239.4 img/sec per GPU
Iter #4: 257.2 img/sec per GPU
Iter #5: 258.3 img/sec per GPU
Iter #6: 248.4 img/sec per GPU
Iter #7: 242.6 img/sec per GPU
Iter #8: 238.0 img/sec per GPU
Iter #9: 240.3 img/sec per GPU
Img/sec per GPU: 242.8 +-22.4
Total img/sec on 2 GPU(s): 485.7 +-44.8
Hi Shruti,
What version of NCCL is installed on the system? Horovod has environment variables you can set to force use of NCCL.
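For example (the exact commands depend on your setup; I'm assuming a pip-installed, PyTorch-based Horovod below, and the script name is just a guess based on your output):

    # Check which NCCL version your PyTorch build was compiled against
    python -c "import torch; print(torch.cuda.nccl.version())"

    # Reinstall Horovod with NCCL forced for all GPU collectives
    HOROVOD_GPU_OPERATIONS=NCCL pip install --no-cache-dir --force-reinstall horovod

    # At run time, have NCCL log what it is doing
    NCCL_DEBUG=INFO mpirun -np 2 python pytorch_synthetic_benchmark.py

If NCCL_DEBUG=INFO prints nothing during the two-process run, Horovod has likely fallen back to MPI for the allreduce, which could explain the large per-GPU slowdown you're seeing.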
You may also want to use nvidia-smi to double-check whether the benchmark is actually using both GPUs when you launch two MPI processes.
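Something like the following in a second terminal while the benchmark is running will print per-GPU utilization once a second:

    nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv -l 1

If only one GPU shows activity, both ranks may be pinned to the same device.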
Also, you might try asking an AI assistant such as ChatGPT (o4-mini) about this problem; it can be quite helpful for narrowing things down.
Howard