Poor performance when running multiple tasks on a node with multiple GPUs

KY

May 17, 2021, 12:49:59 PM

to hoomd-users

Hi everyone,

I would like to run multiple tasks on a node with multiple GPUs, so I submit the following script to Slurm:

#!/bin/bash
# choose our partition
#SBATCH -p x-gpu
# the GPU node has 2x10 cores and 10 GPUs
#SBATCH -N 1 -n 20 --gres=gpu:10

source ~/.bashrc

mpirun -np 1 python init1.py --gpu=1 > init1.log &
mpirun -np 1 python init2.py --gpu=2 > init2.log &
mpirun -np 1 python init3.py --gpu=3 > init3.log &
mpirun -np 1 python init4.py --gpu=4 > init4.log &
mpirun -np 1 python init5.py --gpu=5 > init5.log &
mpirun -np 1 python init6.py --gpu=6 > init6.log &
mpirun -np 1 python init7.py --gpu=7 > init7.log
wait

Usually I get 10000 TPS on an RTX 2080 Ti GPU, yet I get only 1000 TPS when I run multiple tasks on a node with multiple GPUs. I use a separate mpirun for each task because otherwise HOOMD would run all the jobs on a single GPU.

Does anyone have an idea why this happens?

HOOMD version: 2.9.6

Joshua Anderson

May 17, 2021, 12:54:50 PM

to hoomd...@googlegroups.com

KY,

HOOMD does not choose which GPU to use; either the CUDA driver does or you do. By default, the CUDA driver puts all processes on the same GPU. You can select GPUs using `context.initialize` in HOOMD v2 or `device.GPU` in v3.
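Choosing the GPU yourself can also be done at the driver level. The following is a sketch of an alternative that does not come from the thread: it sets the standard CUDA environment variable `CUDA_VISIBLE_DEVICES` so that each task sees exactly one GPU, which then appears to HOOMD as its only device, so the scripts would be run without `--gpu=#`. It assumes Open MPI (whose `-x` option forwards an environment variable to the launched process), a job that owns every GPU on the node, and the task-to-GPU numbering from the post. `NTASKS`, `DRY_RUN` and `launch_task` are hypothetical names; by default the script only prints the commands.

```shell
#!/bin/bash
# Sketch (not from the thread): give each task exactly one visible GPU with
# the CUDA environment variable CUDA_VISIBLE_DEVICES instead of --gpu=#.
# Assumes Open MPI (-x forwards a variable to the launched process) and a job
# that owns every GPU on the node. NTASKS, DRY_RUN and launch_task are
# hypothetical names; with DRY_RUN=1 (the default) commands are only printed.
NTASKS=${NTASKS:-7}
DRY_RUN=${DRY_RUN:-1}

# Launch (or print) task i on physical GPU i, matching init1.py --gpu=1 etc.
launch_task() {
    local i=$1
    local gpu=$i
    if [ "$DRY_RUN" = 1 ]; then
        echo "CUDA_VISIBLE_DEVICES=$gpu mpirun -x CUDA_VISIBLE_DEVICES -np 1 python init$i.py > init$i.log &"
    else
        # Inside the task, the selected GPU is the only one HOOMD can see.
        CUDA_VISIBLE_DEVICES="$gpu" mpirun -x CUDA_VISIBLE_DEVICES -np 1 \
            python "init$i.py" > "init$i.log" &
    fi
}

for i in $(seq 1 "$NTASKS"); do
    launch_task "$i"
done

# Wait for all background tasks before the batch script exits.
wait
```

Set `DRY_RUN=0` in the job script to actually launch the tasks.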
------
Joshua A. Anderson, Ph.D.
Research Area Specialist, Chemical Engineering, University of Michigan

谯楷耀

May 17, 2021, 1:03:42 PM

to hoomd...@googlegroups.com

Hi Joshua,

Thanks. As I said, I already place the tasks on different GPUs using `--gpu=#`, and that works: the log files show that HOOMD selects a different GPU device for each task. The question is why the performance drops so dramatically. I think HOOMD has enough CPU and memory resources.

> On May 18, 2021, at 00:54, Joshua Anderson <joaa...@umich.edu> wrote:
>
> KY,

Joshua Anderson

May 18, 2021, 9:33:57 AM

to hoomd...@googlegroups.com

KY,

Log in to the node while the job is running and check it with htop. Your cluster's mpirun may be binding all of the processes to the same CPU core. You can also use nvidia-smi to verify that the different processes are assigned to different GPUs.
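These checks can also be scripted. This is a minimal sketch, assuming Linux (it reads `Cpus_allowed_list` from `/proc/<pid>/status`) and that the HOOMD processes' command lines start with `python init`, as in the posted script; `PATTERN` and `affinity_of` are hypothetical names. If every task reports the same single CPU, the tasks are all bound to one core and compete for it.

```shell
#!/bin/bash
# Sketch: show which CPUs and GPU each running task is using.
# Run on the compute node while the job is active. Assumes Linux /proc and
# that task command lines start with "python init"; PATTERN and affinity_of
# are hypothetical names.
PATTERN="${PATTERN:-^python init}"

# Print the CPU list a process is allowed to run on, e.g. "0" or "0-1".
affinity_of() {
    awk '/^Cpus_allowed_list:/ {print $2}' "/proc/$1/status" 2>/dev/null
}

# One line per task; identical single-CPU lists mean a shared core.
for pid in $(pgrep -f "$PATTERN" 2>/dev/null); do
    echo "pid $pid allowed CPUs: $(affinity_of "$pid")"
done

# Which GPU each compute process is on (skipped if nvidia-smi is missing).
if command -v nvidia-smi > /dev/null 2>&1; then
    nvidia-smi --query-compute-apps=pid,process_name,gpu_bus_id,used_memory --format=csv
fi
```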

KY

May 18, 2021, 10:47:21 AM

to hoomd-users

Joshua,

Thanks. It turns out that the problem was indeed mpirun: the cluster was binding all processes to the same CPU core. I am posting the following script here in case someone needs it.

#!/bin/bash
#SBATCH -p x-gpu
# the GPU node has 2x10 cores and 10 GPUs
#SBATCH -N 1 -n 20 --gres=gpu:10

source ~/.bashrc

cd /home/kqiao/xgpu-scratch/kqiao/hoomd/poly-swap/test

mpirun --cpu-set 0-1 --bind-to core -np 1 python init1.py --gpu=1 > init1.log &
mpirun --cpu-set 2-3 --bind-to core -np 1 python init2.py --gpu=2 > init2.log &
mpirun --cpu-set 4-5 --bind-to core -np 1 python init3.py --gpu=3 > init3.log &
mpirun --cpu-set 6-7 --bind-to core -np 1 python init4.py --gpu=4 > init4.log &
mpirun --cpu-set 8-9 --bind-to core -np 1 python init5.py --gpu=5 > init5.log &
mpirun --cpu-set 10-11 --bind-to core -np 1 python init6.py --gpu=6 > init6.log &
mpirun --cpu-set 12-13 --bind-to core -np 1 python init7.py --gpu=7 > init7.log &
mpirun --cpu-set 14-15 --bind-to core -np 1 python init8.py --gpu=8 > init8.log &
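The eight launch lines above follow a simple pattern: task i gets GPU i and the two cores 2(i-1) and 2(i-1)+1. The sketch below generates them in a loop with the same Open MPI options as the post, and ends with `wait` so that the batch script does not exit (and Slurm does not end the job) while the background tasks are still running. `NTASKS`, `CORES_PER_TASK`, `DRY_RUN` and `cpu_set_for` are hypothetical names; with the default `DRY_RUN=1` it only prints the commands.

```shell
#!/bin/bash
# Sketch: generate the per-task launch lines of the script above in a loop.
# Task i gets GPU i (same numbering as the post) and CORES_PER_TASK cores
# starting at core (i-1)*CORES_PER_TASK. NTASKS, CORES_PER_TASK, DRY_RUN and
# cpu_set_for are hypothetical names; DRY_RUN=1 (the default) only prints.
NTASKS=${NTASKS:-8}
CORES_PER_TASK=${CORES_PER_TASK:-2}
DRY_RUN=${DRY_RUN:-1}

# Print the --cpu-set range for task i (1-based), e.g. "2-3" for task 2.
cpu_set_for() {
    local first=$(( ($1 - 1) * CORES_PER_TASK ))
    local last=$(( first + CORES_PER_TASK - 1 ))
    echo "$first-$last"
}

for i in $(seq 1 "$NTASKS"); do
    cpus=$(cpu_set_for "$i")
    if [ "$DRY_RUN" = 1 ]; then
        echo "mpirun --cpu-set $cpus --bind-to core -np 1 python init$i.py --gpu=$i > init$i.log &"
    else
        mpirun --cpu-set "$cpus" --bind-to core -np 1 \
            python "init$i.py" --gpu="$i" > "init$i.log" &
    fi
done

# Keep the batch script alive until every background task has finished;
# Slurm ends the job (and its processes) once the script exits.
wait
```

To confirm where each task actually landed, Open MPI's `--report-bindings` option can be added to the mpirun line; it prints the cores each rank was bound to.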