Hey,
I am currently trying to understand how I can schedule a job that needs a GPU.
I read about GRES https://slurm.schedmd.com/gres.html and tried to use:
GresTypes=gpu
NodeName=test Gres=gpu:1
But calling the following, after a 'sudo scontrol reconfigure':
srun --gpus 1 hostname
didn't work:
srun: error: Unable to allocate resources: Invalid generic resource (gres) specification
so I read more at https://slurm.schedmd.com/gres.conf.html, but that didn't really help me.
I am rather confused. GRES claims to be about generic resources, but then it comes with three predefined resource types (GPU, MPS, MIG), and using one of those didn't work in my case.
Obviously, I am misunderstanding something, but I am unsure where to look.
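(For completeness, a quick way to see what GRES the controller currently knows about is something like the following; the node name is the one from my slurm.conf above:)
sinfo -N -o "%N %G"                     # each node with its configured GRES
scontrol show node test | grep -i gres  # GRES as seen by slurmctld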
Best regards,
Xaver Stiensmeier
Alright,
I tried a few more things, but I still wasn't able to get past: srun: error: Unable to allocate resources: Invalid generic resource (gres) specification.
I should mention that the node I am trying to test GPU scheduling with doesn't actually have a GPU, but Rob was kind enough to find out that you do not need a real GPU as long as you point the gres.conf entry at some file in /dev/. As mentioned: this is just for testing purposes; in the end we will run this on a node with a GPU, but that node is not available at the moment.
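For reference, the gres.conf on that node currently contains roughly the following (the device file is arbitrary, the node name matches my slurm.conf):
NodeName=test Name=gpu File=/dev/tty0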
The error isn't changing
If I omit "GresTypes=gpu" and "Gres=gpu:1", I still get the same error.
Debug Info
I added the gpu debug flag and logged the following:
[2023-07-18T14:59:45.026] restoring original state of nodes
[2023-07-18T14:59:45.026] select/cons_tres: part_data_create_array: select/cons_tres: preparing for 2 partitions
[2023-07-18T14:59:45.026] error: GresPlugins changed from (null) to gpu ignored
[2023-07-18T14:59:45.026] error: Restart the slurmctld daemon to change GresPlugins
[2023-07-18T14:59:45.026] read_slurm_conf: backup_controller not specified
[2023-07-18T14:59:45.026] error: GresPlugins changed from (null) to gpu ignored
[2023-07-18T14:59:45.026] error: Restart the slurmctld daemon to change GresPlugins
[2023-07-18T14:59:45.026] select/cons_tres: select_p_reconfigure: select/cons_tres: reconfigure
[2023-07-18T14:59:45.027] select/cons_tres: part_data_create_array: select/cons_tres: preparing for 2 partitions
[2023-07-18T14:59:45.027] No parameter for mcs plugin, default values set
[2023-07-18T14:59:45.027] mcs: MCSParameters = (null). ondemand set.
[2023-07-18T14:59:45.028] _slurm_rpc_reconfigure_controller: completed usec=5898
[2023-07-18T14:59:45.952] SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched_min_interval=2
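(For completeness: the output above comes from enabling the GRES debug flag, which as far as I know can be set either in slurm.conf or at runtime:)
DebugFlags=gres                      # in slurm.conf
sudo scontrol setdebugflags +gres    # or on the fly, on the controller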
I am a bit unsure what to do next to further investigate this issue.
Best regards,
Xaver
Okay,
thanks to S. Zhang I was able to figure out why nothing changed. While I did restart slurmctld at the beginning of my tests, I didn't do so later because I felt it was unnecessary, but it is right there in the fourth line of the log that this is needed. Somehow I misread it and thought slurmctld was restarted automatically.
Given the setup:
slurm.conf
...
GresTypes=gpu
NodeName=NName SocketsPerBoard=8 CoresPerSocket=1 RealMemory=8000 GRES=gpu:1 State=UNKNOWN
...
gres.conf
NodeName=NName Name=gpu File=/dev/tty0
When restarting, I get the following error:
error: Setting node NName state to INVAL with reason:gres/gpu count reported lower than configured (0 < 1)
So it is still not working, but at least I get a more helpful log message. Since I know that this /dev/tty trick works in principle, I am still unsure where the current error lies, but I will keep investigating. I am thankful for any ideas in that regard.
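(In case anyone wants to follow along, the things I intend to check next are on the node side, roughly like this; the paths are just the defaults on my installation:)
cat /etc/slurm/gres.conf                           # is the gres.conf actually present on the node itself?
sudo grep -i gres /var/log/slurm/slurmd.log        # what did slurmd detect at startup?
scontrol show node NName | grep -iE 'gres|reason'  # what does the controller currently think?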
Best regards,
Xaver
Hi Hermann,
count doesn't make a difference, but I noticed that when I reconfigure Slurm and then do reloads afterwards, the error "gres/gpu count reported lower than configured" no longer appears. So maybe a reconfigure is simply needed after reloading slurmctld, or maybe the error just isn't shown anymore because the node is still invalid. However, I still get the error:
error: _slurm_rpc_node_registration node=NName: Invalid argument
If I understand correctly, this is telling me that there's something wrong with my slurm.conf. I know that all pre-existing parameters are correct, so I assume it must be the gpus entry, but I don't see where it's wrong:
NodeName=NName SocketsPerBoard=8 CoresPerSocket=1 RealMemory=8000 Gres=gpu:1 State=CLOUD # bibiserv
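(Side note: since the node was set to INVAL earlier, I assume it will also have to be resumed manually once the registration error is fixed, e.g.:)
sudo scontrol update NodeName=NName State=RESUME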
Thanks for all the help,
Xaver
Hey everyone,
I am answering my own question:
It wasn't working because I need to reload slurmd on the machine, too. So the full "test gpu management without gpu" workflow is:
1. Start your slurm cluster.
2. Add a gpu to an instance of your choice in the slurm.conf, for example:
DebugFlags=GRES # consider this for initial setup.
SelectType=select/cons_tres
GresTypes=gpu
NodeName=master SocketsPerBoard=8 CoresPerSocket=1 RealMemory=8000 GRES=gpu:1 State=UNKNOWN
3. Register it in gres.conf and point it at some device file:
NodeName=master Name=gpu File=/dev/tty0 Count=1 # count seems to be optional
4. Reload slurmctld (on the master) and slurmd (on the gpu node)
sudo systemctl restart slurmctld
sudo systemctl restart slurmd
I haven't tested this solution thoroughly yet, but at least commands like 'sudo systemctl restart slurmd' on the master run without any issues afterwards.
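(For anyone verifying this: a quick check, assuming the node and GRES names from above, would be something like:)
scontrol show node master | grep -i gres   # should now report Gres=gpu:1
srun --gpus 1 hostname                     # the command that originally failed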
Thank you for all your help!
Best regards,
Xaver