Hello,
We’re facing a really weird issue that we are completely stuck on. When users run a job through SLURM, the system's local groups (from /etc/group on the compute node) are not applied to their processes. The users DO have the local groups if I `su` to the user on the compute node. They also DO show the local groups if I run `id user1` while logged in as that user inside the job. However, as far as the system is concerned (e.g. permission checks when reading/writing files), they don't have these groups.
I have read through the thread here: https://groups.google.com/g/slurm-users/c/r4DyJqgduAc which describes exactly the same issue as mine, but unfortunately I can’t see that it was resolved. If anyone has any ideas of what I could try, that would be very much appreciated! (I’ve also tried initgroups as shown here, with no luck: https://groups.google.com/g/slurm-users/c/ZTyIO_lYxac/m/4PAd_ykTAgAJ)
The groups that _are_ shown come from Active Directory, through SSSD. Below is some sample output that may help with debugging and explain my issue:
[user1@ui-host~]$ salloc
salloc: Granted job allocation 530
bash-4.2$ id
uid=1003353(user1) gid=1003353 groups=1003353,1851018(SCARFFILEASR92),1851500(adhoc_storage),1851501(analysis_computers),1851502(fileserver),1851504(node_provisioning)
bash-4.2$ hostname
Interestingly, if I run `id user1` here, I DO get the output below!?
[root@octocn01 ~]# su user1
bash-4.2$ id
uid=1003353(user1) gid=1003353 groups=1003353,1001(octopus_scarf_rwR92),9999(will_test),1851018(SCARFFILEASR92),1851500(adhoc_storage),1851501(analysis_computers),1851502(fileserver),1851504(node_provisioning)
Relevant lines of `cat /etc/group` on octocn01:
octopus_scarf_rwR92:x:1001:user1
will_test:x:9999:user1
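The `id` vs `id user1` discrepancy has a likely explanation. With no arguments, `id` reports the groups actually attached to the running process, which are the ones the kernel checks for file access. `id user1` instead asks the name service (SSSD plus /etc/group) which groups user1 should have. The bash sketch below is meant to be run inside a job step, for example via `srun`. It prints any GIDs that the name service lists for the user but that the process does not hold. The function name `missing_gids` is only an illustrative choice, and the script assumes GNU coreutils `id`.

```bash
#!/usr/bin/env bash
# Sketch: compare the supplementary groups this process actually holds
# with the groups the name service (SSSD, /etc/group, ...) reports for
# the same user. Run it inside a job step, e.g. `srun ./check_groups.sh`.

# Use byte-wise collation so `sort` and `comm` agree on ordering.
export LC_ALL=C

# missing_gids PROCESS_GIDS NSS_GIDS
# Both arguments are space-separated GID lists as printed by `id -G`.
# Prints, one per line, the GIDs in NSS_GIDS that are absent from PROCESS_GIDS.
missing_gids() {
  comm -13 <(tr ' ' '\n' <<<"$1" | sort -u) \
           <(tr ' ' '\n' <<<"$2" | sort -u)
}

user="$(id -un)"

# `id -G` (no user argument) reads the current process's credentials;
# `id -G "$user"` looks the user up through NSS instead.
missing="$(missing_gids "$(id -G)" "$(id -G "$user")")"

if [ -n "$missing" ]; then
  echo "Groups NSS lists for $user that this process does not hold:"
  for gid in $missing; do
    # Show the full group entry if it resolves, otherwise just the GID.
    getent group "$gid" || echo "$gid"
  done
else
  echo "Process groups match NSS for $user"
fi
```

Given the output above, on octocn01 this would be expected to report GIDs 1001 and 9999 (octopus_scarf_rwR92 and will_test) while the problem is present.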
Thank you very much,
Will Furnell.
Hi all,
I was able to solve this issue thanks to this bug report:
https://bugs.schedmd.com/show_bug.cgi?id=13217
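Basically, I needed to set
`LaunchParameters=disable_send_gids`
in my slurm.conf, and everything started working. By default, slurmctld looks up the user's extended GIDs itself and sends them to the compute nodes with the job. The controller doesn't have the local groups that the compute nodes do, so those groups were never applied. With disable_send_gids, each node resolves the groups locally instead.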
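A minimal sketch of applying and checking the setting follows. It assumes a single slurm.conf shared by the controller and all nodes, and that the Slurm daemons are restarted (or otherwise re-read the file) after the edit. The exact restart procedure is site-specific and is not covered here. `<existing options>` is a placeholder, not literal syntax.

```bash
# 1. In slurm.conf (kept identical on the controller and all nodes), add:
#
#      LaunchParameters=disable_send_gids
#
#    LaunchParameters takes a comma-separated list, so if the line already
#    exists, append the option to it instead of adding a second line, e.g.
#      LaunchParameters=<existing options>,disable_send_gids

# 2. After the daemons have picked up the change, confirm the controller
#    reports the option:
scontrol show config | grep -i '^LaunchParameters'

# 3. Check that a job step on the affected node now carries the
#    node-local groups (octopus_scarf_rwR92 and will_test):
srun -w octocn01 id
```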
Thanks,
Will.