[slurm-users] Users do not have local groups when in a SLURM session (but do when 'su-ed')


Will Furnell - STFC UKRI

Jun 8, 2022, 4:48:23 AM
to slurm...@lists.schedmd.com

Hello,

 

We’re facing a really weird issue that we are completely stuck on. When users run a job through SLURM, their processes don’t have the system’s local groups (from /etc/group on the compute node) applied. They DO have the local groups if I ‘su’ to the user on the compute node, and `id user1` run as that user also lists them, but as far as the system is concerned they don’t have these groups when reading or writing files.

 

I have read through this thread: https://groups.google.com/g/slurm-users/c/r4DyJqgduAc, which describes exactly the same issue, but as far as I can see it was never resolved. I’ve also tried initgroups as suggested here, with no luck: https://groups.google.com/g/slurm-users/c/ZTyIO_lYxac/m/4PAd_ykTAgAJ. If anyone has any ideas of what I could try, that would be very much appreciated!


The groups that _are_ shown come from Active Directory, via SSSD. Below is some sample output that may help with debugging and explain the issue:

 

[user1@ui-host~]$ salloc
salloc: Granted job allocation 530
bash-4.2$ id
uid=1003353(user1) gid=1003353 groups=1003353,1851018(SCARFFILEASR92),1851500(adhoc_storage),1851501(analysis_computers),1851502(fileserver),1851504(node_provisioning)
bash-4.2$ hostname
octocn01.nubes.stfc.ac.uk

 

Interestingly, if I run `id user1` here, I DO see the local groups, and they also appear when I ‘su’ to the user as root on the compute node:

 

[root@octocn01 ~]# su user1
bash-4.2$ id
uid=1003353(user1) gid=1003353 groups=1003353,1001(octopus_scarf_rwR92),9999(will_test),1851018(SCARFFILEASR92),1851500(adhoc_storage),1851501(analysis_computers),1851502(fileserver),1851504(node_provisioning)

 

Relevant entries in /etc/group on octocn01:

octopus_scarf_rwR92:x:1001:user1
will_test:x:9999:user1
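In the sessions above, `id` with no argument reports the credentials of the calling process, which is what the kernel checks when a file is read or written, whereas `id user1` performs a fresh name-service lookup (here, /etc/group plus SSSD). That is why the two can disagree inside the job. A minimal POSIX sh sketch that prints the GIDs a lookup says the user should have but the job's process does not hold; `only_in_second` and `check_job_groups` are hypothetical helpers, not part of Slurm:

```shell
#!/bin/sh
# Sketch: report groups that a name-service lookup says a user belongs to
# but that the *current process* does not hold. Run it inside the job.

# Print each ID from list $2 that does not appear in list $1.
# $1: space-separated GIDs the process holds
# $2: space-separated GIDs from the lookup
only_in_second() {
    for g in $2; do
        case " $1 " in
            *" $g "*) ;;                 # already held by the process
            *) printf '%s\n' "$g" ;;     # expected, but missing
        esac
    done
}

# Compare this process's credentials with a fresh lookup for user $1.
check_job_groups() {
    have=$(id -G)        # groups attached to this process (used for file access)
    want=$(id -G "$1")   # groups resolved now via NSS (/etc/group, SSSD, ...)
    only_in_second "$have" "$want"
}
```

Inside the allocation this would be run as `check_job_groups user1`; given the output above, one would expect it to print 1001 and 9999, the two local groups.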

 

Thank you very much,

 

Will Furnell.


Will Furnell - STFC UKRI

Jun 13, 2022, 6:51:02 AM
to slurm...@lists.schedmd.com

Hi all,


I was able to solve this issue thanks to this bug report: https://bugs.schedmd.com/show_bug.cgi?id=13217

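

Basically, I needed to set

LaunchParameters=disable_send_gids

in my configuration (slurm.conf), and everything started working. This is probably because, by default, the controller looks up the user’s supplementary groups itself and sends them with the job, and the controller doesn’t have the local groups that the compute nodes do.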
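To confirm the option is actually in effect after updating slurm.conf and reconfiguring or restarting the daemons, the running configuration can be queried with `scontrol show config`. The sketch below assumes that command's usual `Key = value` line layout and that LaunchParameters holds a comma-separated list of options; `has_disable_send_gids` and `check_launch_parameters` are hypothetical helper names:

```shell
#!/bin/sh
# Sketch: check whether the running Slurm config includes disable_send_gids.

# Succeed if the comma-separated LaunchParameters value $1 lists
# the disable_send_gids option.
has_disable_send_gids() {
    case ",$1," in
        *,disable_send_gids,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Read LaunchParameters from "scontrol show config" (assumed to print lines
# like "LaunchParameters        = disable_send_gids") and report the result.
check_launch_parameters() {
    value=$(scontrol show config | awk -F' = ' '/^LaunchParameters/ {print $2}')
    if has_disable_send_gids "$value"; then
        echo "disable_send_gids is set (LaunchParameters=$value)"
    else
        echo "disable_send_gids is NOT set (LaunchParameters=$value)"
    fi
}
```

Only `check_launch_parameters` needs a host with Slurm's client tools; the string check itself is plain sh, so it treats a value such as `(null)` as "not set".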


Thanks,

 

Will.
