[slurm-users] Job allocation from a heterogeneous pool of nodes


Le, Viet Duc

Dec 7, 2022, 3:42:57 AM
to slurm...@lists.schedmd.com

Dear slurm community, 


I am encountering a unique situation where I need to allocate jobs to nodes with different numbers of CPU cores. For instance: 

node01:  Xeon 6226 32 cores

node02:  EPYC 7543 64 cores


salloc --partition=all --nodes=2 --nodelist=node01,node02 --ntasks-per-node=32 --comment=etc

If --ntasks-per-node is larger than 32, the job cannot be allocated, since node01 has only 32 cores.


In the context of NVIDIA's HPL container, we need to pin MPI processes according to NUMA affinity for best performance. 

For HGX-1, the 8 A100s have affinity, in pairs, with the 1st, 3rd, 5th, and 7th NUMA domains.

With --ntasks-per-node=32, only the first half of the EPYC's NUMA domains is available, and we had to assign the 4th-7th A100s to the 0th and 2nd NUMA domains, leading to some performance degradation.
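For readers who want to reproduce the pinning, the pair-per-domain mapping above can be sketched as a small per-rank wrapper (a sketch only: the one-GPU-per-local-rank assignment and the use of SLURM_LOCALID are assumptions, not taken from this thread):

```shell
# Sketch: compute the NUMA domain for each local MPI rank, assuming
# 8 GPUs paired on NUMA domains 1, 3, 5, 7 as described above.
rank=${SLURM_LOCALID:-0}       # local rank on the node (0 when run standalone)
gpu=$rank                      # assumed: one GPU per local rank
numa=$(( (gpu / 2) * 2 + 1 ))  # GPUs 0-1 -> 1, 2-3 -> 3, 4-5 -> 5, 6-7 -> 7
echo "rank $rank -> GPU $gpu -> NUMA domain $numa"
# A real wrapper might then do:
# exec numactl --cpunodebind=$numa --membind=$numa "$@"
```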


I am looking for a way to request more tasks than the number of physically available cores, i.e.  

salloc --partition=all --nodes=2 --nodelist=node01,node02 --ntasks-per-node=64 --comment=etc


Your suggestions are much appreciated. 


Regards, 

Viet-Duc

Brian Andrus

Dec 7, 2022, 12:27:25 PM
to slurm...@lists.schedmd.com

You may want to look here:

https://slurm.schedmd.com/heterogeneous_jobs.html
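For the archive, the basic shape from that page: the components of a heterogeneous job are separated by ':', and srun can target individual components with --het-group (a rough sketch with placeholder partition names, not tested on this cluster):

```shell
# Placeholder partitions p1/p2; ':' separates the two job components.
salloc --partition=p1 --nodes=1 --ntasks-per-node=32 : \
       --partition=p2 --nodes=1 --ntasks-per-node=64

# Inside the allocation, address one component at a time:
srun --het-group=0 -n 4 hostname
srun --het-group=1 -n 8 hostname
```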

Brian Andrus

Le, Viet Duc

Dec 17, 2022, 8:19:59 AM
to Slurm User Community List
Hi Brian, 

Thanks for suggesting this interesting feature of Slurm, and sorry for the late follow-up: I only had access to the cluster for a short time.

We are now able to run the HPL benchmark across different partitions with the correct NUMA affinity.
For future reference, here is the procedure:

$ salloc \
       --partition=v100 --nodes=1 --ntasks-per-node=40 --gres=gpu:4 : \
       --partition=a100 --nodes=1 --ntasks-per-node=64 --gres=gpu:8

$ srun \
       -n 4 hpl.sh : \
       -n 8 hpl.sh

Initially we thought there would be some performance degradation when mixing partitions, but at least for this small-scale test it seems to be negligible.

Thanks. 
Viet-Duc  