Sudo on slurm compute node


Ankit Maroo

Apr 26, 2022, 11:04:34 PM
to google-cloud-slurm-discuss
Hi all,

I am running a Slurm cluster using the basic example in the terraform section of the slurm-gcp repo. The setup uses OS Login, and the user has admin access to the compute node.

If the admin user SSHes into the login node using IAP, they can run `sudo su` and become root. But if the same user starts an interactive session with `srun --exclusive=user --pty --partition=P1 /bin/bash`, they cannot `sudo su` on the compute node. If the user SSHes directly to the compute node using IAP, `sudo su` works.

Things I have tried:
1. Adding /var/google-sudoers.d/$user with a rule allowing all permissions at startup (after the Google agent and before sshd). This results in the error `sudo: account validation failure, is your account locked`.

The setup seems to involve multiple systems, such as OS Login and munge, and I don't understand PAM, NSS, etc. well enough to experiment with it. Is there a way to make sure the OS Login permissions are reflected on the compute node without needing to SSH into the node separately?
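
One way to narrow this down is to compare how the account is resolved in the two kinds of session. The sketch below is only a diagnostic aid and rests on an assumption: that the difference between the `srun` shell and the direct ssh session shows up in NSS lookups. `nss_sources` is a hypothetical helper name, not part of Slurm or OS Login; it prints the lookup sources configured for one database in an nsswitch.conf-style file.

```shell
# nss_sources DATABASE FILE
# Print the lookup sources listed for DATABASE (e.g. passwd, group)
# in an nsswitch.conf-style FILE, separated by single spaces.
# Hypothetical helper for illustration only.
nss_sources() {
  awk -v db="$1:" '
    $1 == db {
      $1 = ""            # drop the "passwd:" label; awk rebuilds the line
      sub(/^ +/, "")     # trim the leading space left behind
      print
    }
  ' "$2"
}
```

Run `nss_sources passwd /etc/nsswitch.conf`, `getent passwd "$(id -un)"`, and `id` once inside the `srun` shell and once in a direct ssh session on the same node, then compare the output. A difference in the passwd entry or group list between the two sessions would point at NSS rather than at the sudoers file.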


Alex Chekholko

Apr 27, 2022, 11:47:15 AM
to Ankit Maroo, google-cloud-slurm-discuss
Hi Ankit,

Just to clarify, you want a regular user to be able to get root from inside an interactive compute job? That sounds unusual. What problem are you trying to solve?

Off the top of my head, one workaround could be to provision one cluster per user and then have the user just do everything as root.  Obviously not the way we would do it in the old multi-user environment days but you could call it "cloud native" :)

Regards,
Alex


Ankit Maroo

Apr 27, 2022, 5:20:11 PM
to Alex Chekholko, google-cloud-slurm-discuss
Hi, thanks for the reply. 

Yes, using root is mostly avoidable, but in my case the users will rely heavily on the interactive machine for ML tasks with GPUs, and they need to be able to sudo on it. Our current workaround is to open a second ssh connection to the node we got via srun; that works because OS Login allows that user to `sudo su`. The question is how to avoid this extra step, and why I can't get srun to give me a machine where I can `sudo su`.


Thanks,
Ankit

Ward Harold

Apr 27, 2022, 7:35:03 PM
to Ankit Maroo, Alex Chekholko, google-cloud-slurm-discuss
Have you tried setting setuid for the specific binary or binaries that users need to be root to run? 

Slurm was developed to work on large shared systems where giving random users sudo privileges wouldn't fly, so it doesn't surprise me that srun would prevent that by default.

... WkH
Ward Harold |  Solutions Architect | w...@google.com | 512-751-9198



Ankit Maroo

May 2, 2022, 12:30:44 PM
to Ward Harold, Alex Chekholko, google-cloud-slurm-discuss
That would help, but I am looking to provide full root access to the node. This is a special R&D group, and they need to be able to do anything with the compute node before things are fully baked.

Ankit

Ward Harold

May 2, 2022, 2:02:28 PM
to Ankit Maroo, Alex Chekholko, google-cloud-slurm-discuss
While, per previous comments, that's a pretty obvious anti-pattern, you could try removing 'slurm' from the /etc/nsswitch.conf file. That takes the nss_slurm module out of the lookup path.
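
For illustration, the edit amounts to dropping the `slurm` source from the `passwd` and `group` lines of /etc/nsswitch.conf. This is a minimal sketch, assuming nss_slurm was enabled by listing `slurm` on those two lines; check the node's actual file first. `strip_nss_slurm` is a hypothetical helper name. It reads the file on stdin and writes the result to stdout, so nothing is modified in place.

```shell
# strip_nss_slurm: filter an nsswitch.conf-style file from stdin to stdout,
# removing the "slurm" source from the passwd and group lines.
# All other lines pass through unchanged.
# Limitation of this sketch: it does not handle a bracketed action such as
# [NOTFOUND=return] placed directly after "slurm"; that would be left behind.
strip_nss_slurm() {
  awk '
    $1 == "passwd:" || $1 == "group:" {
      out = $1
      for (i = 2; i <= NF; i++)
        if ($i != "slurm") out = out " " $i   # keep every source except slurm
      print out
      next
    }
    { print }
  '
}
```

A cautious way to apply it is `strip_nss_slurm < /etc/nsswitch.conf > /tmp/nsswitch.conf.new`, then `diff /etc/nsswitch.conf /tmp/nsswitch.conf.new` to review the change before copying it into place as root. If the compute nodes are re-created from an image, the change would have to be reapplied there, for example from a startup script.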

... WkH
Ward Harold |  Solutions Architect | w...@google.com | 512-751-9198

