Poor performance when running multiple tasks on a node with multiple GPUs

KY

May 17, 2021, 12:49:59 PM
to hoomd-users
Hi everyone,

I would like to run multiple tasks on a node with multiple GPUs. I submit the following script to Slurm:

#!/bin/bash
# choose our partition
#SBATCH -p x-gpu

# gpu node has 2x10 cores and 10 gpu
#SBATCH -N 1 -n 20 --gres=gpu:10

source ~/.bashrc
mpirun -np 1 python init1.py --gpu=1 > init1.log &
mpirun -np 1 python init2.py --gpu=2 > init2.log &
mpirun -np 1 python init3.py --gpu=3 > init3.log &
mpirun -np 1 python init4.py --gpu=4 > init4.log &
mpirun -np 1 python init5.py --gpu=5 > init5.log &
mpirun -np 1 python init6.py --gpu=6 > init6.log &
mpirun -np 1 python init7.py --gpu=7 > init7.log
wait

I usually get 10000 TPS on a single RTX 2080 Ti GPU, but I see only 1000 TPS when I run multiple tasks on a node with multiple GPUs. I use a separate mpirun for each task because otherwise HOOMD would run all of the jobs on a single GPU.

Does anyone know why this happens?
HOOMD version 2.9.6

Joshua Anderson

May 17, 2021, 12:54:50 PM
to hoomd...@googlegroups.com
KY,

HOOMD does not choose which GPU to use; either the CUDA driver does or you do. The CUDA driver will put all processes on the same GPU. You can select GPUs using `context.initialize` in HOOMD v2 or `device.GPU` in v3.
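
For anyone reading later, here is a minimal sketch of what that selection might look like inside a script. The helper names `hoomd_v2_args` and `select_gpu` are made up for illustration, and the `gpu_ids` keyword is assumed from the v3 documentation, so check it against the version you have installed. In v2, calling `context.initialize()` with no arguments parses the script's command line, which is why passing `--gpu=1` to the script also works.

```python
def hoomd_v2_args(gpu_id):
    """Build the option string for hoomd.context.initialize (HOOMD v2).

    --gpu=N selects GPU N and implies GPU execution mode.
    """
    return f"--gpu={gpu_id}"


def select_gpu(gpu_id):
    """Hypothetical helper: pick one GPU explicitly in HOOMD v2 or v3."""
    import hoomd  # imported here so the module loads without HOOMD installed

    if hasattr(hoomd, "context"):
        # HOOMD v2: pass the options as a string instead of reading sys.argv.
        return hoomd.context.initialize(hoomd_v2_args(gpu_id))
    # HOOMD v3 (assumption: the keyword is gpu_ids, as in the v3 docs).
    return hoomd.device.GPU(gpu_ids=[gpu_id])


if __name__ == "__main__":
    print(hoomd_v2_args(1))
```

Selecting the GPU explicitly only controls which device is used; it does not affect which CPU core the process is bound to.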
------
Joshua A. Anderson, Ph.D.
Research Area Specialist, Chemical Engineering, University of Michigan
> --
> You received this message because you are subscribed to the Google Groups "hoomd-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to hoomd-users...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/hoomd-users/1e4e2b6a-f26a-488f-9a4d-dd566365b22dn%40googlegroups.com.

谯楷耀

May 17, 2021, 1:03:42 PM
to hoomd...@googlegroups.com
Hi Joshua,

Thanks. As I said, I already put the tasks on different GPUs using --gpu=#, and that works: the log files show HOOMD choosing different GPU devices. The question is why the performance drops so dramatically. I think HOOMD has enough CPU and memory resources.

Joshua Anderson

May 18, 2021, 9:33:57 AM
to hoomd...@googlegroups.com
KY,

Log in to the node where the job is running and check it with htop. Your cluster's mpirun may be binding all of the processes to the same CPU core. You can also verify with nvidia-smi that the different processes are assigned to different GPUs.
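
As a complement to htop, the binding can also be checked from inside each job. The following is a minimal sketch, assuming a Linux node (`os.sched_getaffinity` is not available on every platform); the function name `allowed_cores` is made up for illustration.

```python
import os


def allowed_cores():
    """Return the sorted CPU core ids this process may be scheduled on (Linux only)."""
    return sorted(os.sched_getaffinity(0))


if __name__ == "__main__":
    cores = allowed_cores()
    print(f"pid {os.getpid()} may run on {len(cores)} core(s): {cores}")
```

If every job reports the same single core, the processes are competing for that core even though each one has its own GPU.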
------
Joshua A. Anderson, Ph.D.
Research Area Specialist, Chemical Engineering, University of Michigan

KY

May 18, 2021, 10:47:21 AM
to hoomd-users
Joshua,

Thanks. It turns out that the problem was indeed mpirun: the cluster was binding all of the processes to the same CPU core. I am posting the working script here in case someone needs it.

#!/bin/bash
#SBATCH -p x-gpu
# gpu node has 2x10 cores and 10 gpu
#SBATCH -N 1 -n 20 --gres=gpu:10
source ~/.bashrc
cd /home/kqiao/xgpu-scratch/kqiao/hoomd/poly-swap/test
mpirun --cpu-set 0-1 --bind-to core -np 1 python init1.py --gpu=1 > init1.log &
mpirun --cpu-set 2-3 --bind-to core -np 1 python init2.py --gpu=2 > init2.log &
mpirun --cpu-set 4-5 --bind-to core -np 1 python init3.py --gpu=3 > init3.log &
mpirun --cpu-set 6-7 --bind-to core -np 1 python init4.py --gpu=4 > init4.log &
mpirun --cpu-set 8-9 --bind-to core -np 1 python init5.py --gpu=5 > init5.log &
mpirun --cpu-set 10-11 --bind-to core -np 1 python init6.py --gpu=6 > init6.log &
mpirun --cpu-set 12-13 --bind-to core -np 1 python init7.py --gpu=7 > init7.log &
mpirun --cpu-set 14-15 --bind-to core -np 1 python init8.py --gpu=8 > init8.log &
mpirun --cpu-set 16-17 --bind-to core -np 1 python init9.py --gpu=9 > init9.log 
wait
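
The repeated lines in the script follow a simple pattern: job i gets its own pair of cores and its own GPU. A small sketch that generates them, assuming the same file naming (`initN.py`, `initN.log`) and two cores per job; `job_commands` is a hypothetical helper, not part of HOOMD or mpirun.

```python
def job_commands(n_jobs, cores_per_job=2, first_gpu=1):
    """Build one mpirun line per job, each with its own core range and GPU."""
    commands = []
    for i in range(n_jobs):
        job = i + 1
        lo = i * cores_per_job           # first core of this job's range
        hi = lo + cores_per_job - 1      # last core of this job's range
        commands.append(
            f"mpirun --cpu-set {lo}-{hi} --bind-to core -np 1 "
            f"python init{job}.py --gpu={first_gpu + i} > init{job}.log &"
        )
    return commands


if __name__ == "__main__":
    # Print a job list equivalent to the script above, followed by wait.
    print("\n".join(job_commands(9)))
    print("wait")
```

Every generated line ends with `&`, so the final `wait` is what keeps the batch script alive until all jobs finish.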