I have some users running Ray on Slurm.
I will preface this by saying we are new Slurm users, so we may not be doing everything exactly right.
The only issue we have come across so far was somewhat Ray-specific.
Pardon my lack of specificity (the Ray user I worked on this with is on vacation at the moment), but there was an environment variable that needed to be unset so that Ray wouldn’t kneecap itself when it hit a cpuset corner case in cgroup fencing.
In this workload, the user spawns a “Ray head”, and it is important to mention that this head may not have the same resources allocated to it as the “Ray workers”.
TL;DR: the Ray head would be given fewer CPUs than the worker(s), and in some corner cases a spawned worker PID would inherit a smaller cpuset via an environment variable passed along from the Ray head, which spawns the workers via srun.
The user noticed that some workers could reach 100% utilization of their allocated CPUs, while other workers running identical workloads ended up at partial usage. We discovered this was due to the cpuset being inherited in a way we didn’t intend.
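A quick way to spot this symptom from inside a worker step is to compare the number of CPUs the process can actually run on with the number it was allocated. The following is only a rough sketch: `check_cpu_count` is a hypothetical helper name, and it assumes `SLURM_CPUS_ON_NODE` reflects the CPU count the step should have on that node, which may not match every site's configuration.

```shell
#!/bin/bash
# Sketch: run inside a worker step to see whether it can use all the CPUs it expects.

# Hypothetical helper: warn when the usable CPU count is below the expected count.
check_cpu_count() {
    local expected=$1 usable=$2
    if [ "$usable" -lt "$expected" ]; then
        echo "WARN: only $usable of $expected CPUs usable"
        return 1
    fi
    echo "OK: $usable of $expected CPUs usable"
}

# nproc honors the process affinity mask, so an inherited smaller mask shows up here.
# Assumption: SLURM_CPUS_ON_NODE is the CPU count this step should see on the node.
if [ -n "${SLURM_CPUS_ON_NODE:-}" ]; then
    check_cpu_count "$SLURM_CPUS_ON_NODE" "$(nproc)" || true
fi
```

Outside of a Slurm job the final block does nothing, so the script is safe to run anywhere.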
I’ll have to follow up with the environment variable we had to unset when that user is back.
But here is my quick and dirty bash script that shows the CPUs allocated to each job’s cgroup and the CPUs allowed for each PID inside that cgroup. These should match, but didn’t always, which was our discovery.
Pass the uid of the user submitting the jobs as the first argument.
#!/bin/bash
# Usage: pass the numeric uid of the user submitting the jobs as $1.
# UID is a read-only variable in bash, so use a different name.
USER_ID=$1
CGROUP_ROOT=/sys/fs/cgroup/cpuset/slurm/uid_$USER_ID
for JOB_DIR in "$CGROUP_ROOT"/job_*
do
    # Skip the unexpanded glob when the user has no jobs on this node.
    [ -d "$JOB_DIR" ] || continue
    JOB=${JOB_DIR##*job_}
    echo "Slurm Job: $JOB"
    echo -n "Cgroup CPU set: "
    cat "$JOB_DIR/cpuset.cpus"
    for PID in $(cat "$JOB_DIR/step_0/cgroup.procs")
    do
        echo -n "CPUs allocated for PID $PID: "
        awk '/Cpus_allowed_list/ {print $2}' "/proc/$PID/status"
    done
    echo ""
done
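Example output from three nodes. On slurmd3 and slurmd2 the cgroup holds CPUs 0-7, but two of the PIDs are limited to 0-3:
slurmd3: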
Slurm Job: 409
Cgroup CPU set: 0-7
CPUs allocated for PID 7907: 0-7
CPUs allocated for PID 7912: 0-3
CPUs allocated for PID 7931: 0-3
slurmd1:
Slurm Job: 406
Cgroup CPU set: 0-3
CPUs allocated for PID 7409: 0-3
CPUs allocated for PID 7414: 0-3
CPUs allocated for PID 7425: 0-3
slurmd2:
Slurm Job: 408
Cgroup CPU set: 0-7
CPUs allocated for PID 7491: 0-7
CPUs allocated for PID 7496: 0-3
CPUs allocated for PID 7515: 0-3
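If reading through that output by eye gets tedious, the comparison could be automated along the following lines. This is a minimal sketch, not a tested admin tool: `pid_cpus` and `check_job` are hypothetical helper names, `PROC_ROOT` and `CGROUP_ROOT` are overridable variables introduced here for illustration, and it assumes the same cgroup v1 layout and `step_0` path as the script above. It compares the two CPU lists as plain strings, and a PID that is narrower than its cgroup is not necessarily a problem if task binding is configured on purpose.

```shell
#!/bin/bash
# Sketch: report PIDs whose allowed CPU list differs from the job's cgroup cpuset.

# Hypothetical helper: print Cpus_allowed_list for one PID (empty if the PID is gone).
pid_cpus() {
    awk '/^Cpus_allowed_list:/ {print $2}' "${PROC_ROOT:-/proc}/$1/status" 2>/dev/null
}

# Hypothetical helper: compare every PID in a job's step_0 against the job cpuset.
check_job() {
    local job_dir=$1 want pid got
    want=$(cat "$job_dir/cpuset.cpus")
    while read -r pid; do
        got=$(pid_cpus "$pid")
        # A PID may exit between listing and inspection; skip it if so.
        [ -n "$got" ] || continue
        if [ "$got" != "$want" ]; then
            echo "MISMATCH job=${job_dir##*job_} pid=$pid cgroup=$want pid_cpus=$got"
        fi
    done < "$job_dir/step_0/cgroup.procs"
}

# Scan all jobs for the uid given as $1; do nothing when no uid is passed.
if [ -n "${1:-}" ]; then
    for dir in "${CGROUP_ROOT:-/sys/fs/cgroup/cpuset/slurm}/uid_$1"/job_*; do
        if [ -d "$dir" ]; then
            check_job "$dir"
        fi
    done
fi
```

Run with the submitting user's uid as the only argument, it prints one line per mismatched PID and nothing at all when every PID matches its cgroup.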
Otherwise, I’ve not had issues with users spawning jobs from within jobs, but I’m not a seasoned Slurm admin, so that may not hold for others.
Reed