[slurm-users] Sharding GPUs


Alessandro D'Auria via slurm-users

Feb 13, 2026, 5:26:19 AM (9 days ago) Feb 13
to slurm...@lists.schedmd.com
Dear Community,

We are trying to activate sharding.
Our compute nodes are configured with 64 cores, 4 physical MI250X GPUs (8 logical GPUs), and 4 NUMA domains: 1 physical GPU (2 logical GPUs) per NUMA domain, and 1 logical GPU per L3 cache domain.

gres.conf
AutoDetect=rsmi
NodeName=c6n[3339-3348] Name=shard File=/dev/dri/renderD128 Count=4
NodeName=c6n[3339-3348] Name=shard File=/dev/dri/renderD129 Count=4
NodeName=c6n[3339-3348] Name=shard File=/dev/dri/renderD130 Count=4
NodeName=c6n[3339-3348] Name=shard File=/dev/dri/renderD131 Count=4
NodeName=c6n[3339-3348] Name=shard File=/dev/dri/renderD132 Count=4
NodeName=c6n[3339-3348] Name=shard File=/dev/dri/renderD133 Count=4
NodeName=c6n[3339-3348] Name=shard File=/dev/dri/renderD134 Count=4
NodeName=c6n[3339-3348] Name=shard File=/dev/dri/renderD135 Count=4
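For completeness, here is a sketch of the slurm.conf fragment I would expect to match the gres.conf above (the gpu:8 and shard:32 counts are my assumption, derived from the 8 render devices with Count=4 shards each):

GresTypes=gpu,shard
NodeName=c6n[3339-3348] Gres=gpu:8,shard:32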

If I request 2 cores with block:cyclic distribution, I get the expected result:
srun -N1 -n2 -c1 --cpu-bind=cores -m block:cyclic --pty bash
cpuset cgroup is 1,17

But if I add 2 shards to the request, I get this unexpected result:
srun -N1 -n2 -c1 --cpu-bind=cores --gres=shard:2 -m block:cyclic --pty bash
cpuset cgroup is 1-2
ROCR_VISIBLE_DEVICES=0


Is it possible to request 2 shards in a round-robin fashion, so that a multi-GPU job runs on different GPUs?
srun -N1 -n2 -c1 --cpu-bind=cores --gres=shard:2 -m block:cyclic --pty bash

In practice, I would like to get this result:
cpuset cgroup is 1,17
ROCR_VISIBLE_DEVICES=0,1
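As a possible workaround (a sketch, assuming the gpu gres is also configured on these nodes): if the goal is simply one task per logical GPU, requesting whole GPUs per task instead of shards should spread the two tasks across different devices:

srun -N1 -n2 -c1 --cpu-bind=cores --gpus-per-task=1 -m block:cyclic --pty bash

This of course allocates full GPUs rather than shards, so it only applies when shard-level sharing with other jobs is not required.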

Thank you in advance,
Alessandro

--
slurm-users mailing list -- slurm...@lists.schedmd.com
To unsubscribe send an email to slurm-us...@lists.schedmd.com

Alessandro D'Auria via slurm-users

Feb 13, 2026, 10:52:49 AM (9 days ago) Feb 13
to slurm...@lists.schedmd.com
Adding some more information:

OS: RHEL 8.9
Slurm 25.11.2