Hey,
I am currently trying to understand how I can schedule a job that needs a GPU.
I read about GRES https://slurm.schedmd.com/gres.html and tried to use:
GresTypes=gpu
NodeName=test Gres=gpu:1
But calling the following, after a 'sudo scontrol reconfigure':
srun --gpus 1 hostname
didn't work:
srun: error: Unable to allocate resources: Invalid generic resource (gres) specification
so I read more at https://slurm.schedmd.com/gres.conf.html, but that didn't really help me.
I am rather confused. GRES claims to be about generic resources, but then it comes with three predefined resource types (GPU, MPS, MIG), and using one of those didn't work in my case.
Obviously, I am misunderstanding something, but I am unsure where to look.
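(For completeness, a quick way to see what GRES the controller currently knows about is something like the following; the node name is the one from my slurm.conf above:)
sinfo -N -o "%N %G"                     # each node with its configured GRES
scontrol show node test | grep -i gres  # GRES as seen by slurmctld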
Best regards,
Xaver Stiensmeier
Alright,
I tried a few more things, but I still wasn't able to get past: srun: error: Unable to allocate resources: Invalid generic resource (gres) specification.
I should mention that the node I am trying to test GPU scheduling with doesn't actually have a GPU, but Rob was kind enough to find out that you do not need a real GPU as long as you point the gres.conf entry at some file in /dev/. As mentioned: this is just for testing purposes; in the end we will run this on a node with a GPU, but that node is not available at the moment.
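For reference, the gres.conf on that node currently contains roughly the following (the device file is arbitrary, the node name matches my slurm.conf):
NodeName=test Name=gpu File=/dev/tty0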
The error isn't changing
If I omit "GresTypes=gpu" and "Gres=gpu:1", I still get the same error.
Debug Info
I added the gpu debug flag and logged the following:
[2023-07-18T14:59:45.026] restoring original state of nodes
[2023-07-18T14:59:45.026] select/cons_tres: part_data_create_array: select/cons_tres: preparing for 2 partitions
[2023-07-18T14:59:45.026] error: GresPlugins changed from (null) to gpu ignored
[2023-07-18T14:59:45.026] error: Restart the slurmctld daemon to change GresPlugins
[2023-07-18T14:59:45.026] read_slurm_conf: backup_controller not specified
[2023-07-18T14:59:45.026] error: GresPlugins changed from (null) to gpu ignored
[2023-07-18T14:59:45.026] error: Restart the slurmctld daemon to change GresPlugins
[2023-07-18T14:59:45.026] select/cons_tres: select_p_reconfigure: select/cons_tres: reconfigure
[2023-07-18T14:59:45.027] select/cons_tres: part_data_create_array: select/cons_tres: preparing for 2 partitions
[2023-07-18T14:59:45.027] No parameter for mcs plugin, default values set
[2023-07-18T14:59:45.027] mcs: MCSParameters = (null). ondemand set.
[2023-07-18T14:59:45.028] _slurm_rpc_reconfigure_controller: completed usec=5898
[2023-07-18T14:59:45.952] SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched_min_interval=2
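(For completeness: the output above comes from enabling the GRES debug flag, which as far as I know can be set either in slurm.conf or at runtime:)
DebugFlags=gres                      # in slurm.conf
sudo scontrol setdebugflags +gres    # or on the fly, on the controller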
I am a bit unsure what to do next to further investigate this issue.
Best regards,
Xaver
Okay,
thanks to S. Zhang I was able to figure out why nothing changed. While I did restart slurmctld at the beginning of my tests, I didn't do so later because I felt it was unnecessary, but it is right there in the fourth line of the log that this is needed. Somehow I misread it and thought slurmctld was restarted automatically.
Given the setup:
slurm.conf
...
GresTypes=gpu
NodeName=NName SocketsPerBoard=8 CoresPerSocket=1 RealMemory=8000 GRES=gpu:1 State=UNKNOWN
...
gres.conf
NodeName=NName Name=gpu File=/dev/tty0
When restarting, I get the following error:
error: Setting node NName state to INVAL with reason:gres/gpu count reported lower than configured (0 < 1)
So it is still not working, but at least I get a more helpful log message. Since I know that this /dev/tty trick works in principle, I am still unsure where the current error lies, but I will keep investigating. I am thankful for any ideas in that regard.
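(In case anyone wants to follow along, the things I intend to check next are on the node side, roughly like this; the paths are just the defaults on my installation:)
cat /etc/slurm/gres.conf                           # is the gres.conf actually present on the node itself?
sudo grep -i gres /var/log/slurm/slurmd.log        # what did slurmd detect at startup?
scontrol show node NName | grep -iE 'gres|reason'  # what does the controller currently think?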
Best regards,
Xaver
Hi Hermann,
count doesn't make a difference, but I noticed that when I reconfigure Slurm and then do reloads afterwards, the error "gres/gpu count reported lower than configured" no longer appears. So maybe a reconfigure is simply needed after reloading slurmctld, or maybe the error just isn't shown anymore because the node is still invalid. However, I still get the error:
error: _slurm_rpc_node_registration node=NName: Invalid argument
If I understand correctly, this is telling me that there's something wrong with my slurm.conf. I know that all pre-existing parameters are correct, so I assume it must be the gpus entry, but I don't see where it's wrong:
NodeName=NName SocketsPerBoard=8 CoresPerSocket=1 RealMemory=8000 Gres=gpu:1 State=CLOUD # bibiserv
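(Side note: since the node was set to INVAL earlier, I assume it will also have to be resumed manually once the registration error is fixed, e.g.:)
sudo scontrol update NodeName=NName State=RESUME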
Thanks for all the help,
Xaver
Hey everyone,
I am answering my own question:
It wasn't working because I need to reload slurmd on the machine, too. So the full "test gpu management without gpu" workflow is:
1. Start your slurm cluster.
2. Add a gpu to an instance of your choice in the slurm.conf, for example:
DebugFlags=GRES # consider this for initial setup.
SelectType=select/cons_tres
GresTypes=gpu
NodeName=master SocketsPerBoard=8 CoresPerSocket=1 RealMemory=8000 GRES=gpu:1 State=UNKNOWN
3. Register it in gres.conf and point it at some device file:
NodeName=master Name=gpu File=/dev/tty0 Count=1 # count seems to be optional
4. Reload slurmctld (on the master) and slurmd (on the gpu node)
sudo systemctl restart slurmctld
sudo systemctl restart slurmd
I haven't tested this solution thoroughly yet, but at least commands like 'sudo systemctl restart slurmd' on the master run without any issues afterwards.
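(For anyone verifying this: a quick check, assuming the node and GRES names from above, would be something like:)
scontrol show node master | grep -i gres   # should now report Gres=gpu:1
srun --gpus 1 hostname                     # the command that originally failed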
Thank you for all your help!
Best regards,
Xaver