[slurm-users] unable to ssh onto compute nodes on which I have running jobs


byron

Jul 27, 2022, 11:54:16 AM
to Slurm User Community List
Hi

When a user tries to log in to a compute node on which they have a running job, they get the error:

Access denied: user blahblah (uid=3333) has no active jobs on this node.
Authentication failed.

I recently upgraded Slurm to 20.11.9 and was under the impression that, prior to the upgrade, users were able to ssh into nodes where they had running jobs, but it's entirely possible that I'm mistaken.

Either way, can someone explain how to enable that behaviour, please?

Thanks




Brian Andrus

Jul 27, 2022, 12:12:16 PM
to slurm...@lists.schedmd.com
Verify that their uid on the node is the same as the uid your master sees.

Brian Andrus

Lloyd Goodman

Jul 27, 2022, 12:29:56 PM
to Slurm User Community List
I don't think that's the source of the problem.  All our user accounts are centrally managed using sssd.

And just to be sure, I ran "getent passwd <username>" on the management, head and compute nodes, and they all returned the same values.
--
Lloyd Goodman // HPC Systems Administrator
CFMS Services Ltd // Bristol & Bath Science Park // Dirac Crescent // Emersons Green // Bristol // BS16 7FR 
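
The uid comparison described above can be scripted. This is a minimal sketch, assuming passwordless ssh to each host; the function name `uid_per_host`, the `RUN_ON` override hook and the hostnames in the usage comment are hypothetical and not taken from the thread.

```shell
#!/usr/bin/env bash
# Print the numeric uid that each host reports for one user, one line per host,
# so that a mismatch between nodes stands out.
# RUN_ON is a hypothetical hook: it defaults to ssh and can be replaced by any
# command that accepts "<host> <command...>".
uid_per_host() {
    local user=$1; shift
    local host
    for host in "$@"; do
        printf '%s %s\n' "$host" "$(${RUN_ON:-ssh} "$host" id -u "$user")"
    done
}

# Usage (placeholder hostnames):
#   uid_per_host blahblah mgmt01 head01 node001
```

If every line shows the same uid, a uid mismatch is unlikely to be the cause and the PAM configuration is the next thing to look at.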
 

Brian Andrus

Jul 27, 2022, 12:50:51 PM
to slurm...@lists.schedmd.com

Lloyd,

You could check the order of entries in your pam.d/sshd (and related/included) files.

See where pam_slurm_adopt is, how it is being called, and whether there are settings that are interfering.

Does this occur only on a single node, or all of them?

Brian Andrus

byron

Jul 27, 2022, 1:32:30 PM
to Slurm User Community List
This happens on all our compute nodes.

I can't find any mention of pam_slurm_adopt in /etc/pam.d.  All I have is this line in sshd: account required pam_slurm.so.

Fulcomer, Samuel

Jul 27, 2022, 2:44:08 PM
to Slurm User Community List
From our /etc/pam.d/sshd on our compute nodes:


account    required     pam_nologin.so
account    sufficient    pam_access.so
account    include      password-auth
-account    required      pam_slurm_adopt.so


...and in /etc/pam.d/password-auth:

#-session     optional      pam_systemd.so

Note that disabling pam_systemd is necessary to have the ssh login properly fenced by cgroups.
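
To inspect a stack like the one above on your own nodes, something along these lines may help. It is only a sketch: the function names are hypothetical, and files pulled in through "include" lines (such as password-auth) have to be checked separately.

```shell
#!/usr/bin/env bash
# Show the active (non-comment) account and session lines of a PAM file,
# with line numbers, so the order of the modules is easy to read.
pam_active_lines() {
    grep -nE '^[[:space:]]*-?(account|session)[[:space:]]' "$1"
}

# Succeeds if a pam_systemd.so session line is still enabled (not commented
# out) in the given file.
pam_systemd_enabled() {
    grep -qE '^[[:space:]]*-?session[[:space:]].*pam_systemd\.so' "$1"
}

# Usage:
#   pam_active_lines /etc/pam.d/sshd
#   pam_systemd_enabled /etc/pam.d/password-auth && echo "pam_systemd is active"
```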

Bernd Melchers

Jul 27, 2022, 4:44:18 PM
to Slurm User Community List
> This happens on all our compute nodes.
> I can't find any mention of pam_slurm_adopt in /etc/pam.d. All I have
> is this line in sshd: account required pam_slurm.so.

We had a similar problem, caused by wrong permission bits on the
ssh host key files in /etc/ssh/.
Now we have:
-rw-r--r-- root root for the public host keys and
-rw-r----- root ssh_keys for the private part of the host keys

Kind regards
Bernd Melchers

--
Archiv- und Backup-Service | fab-s...@zedat.fu-berlin.de
Freie Universität Berlin | Tel. +49-30-838-55905
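
A quick way to compare a node's host keys against the permissions listed above is sketched here. It assumes GNU stat (for the -c format option); the function name is hypothetical, and the ssh_keys group is the one named in the post, which may not exist on every distribution.

```shell
#!/usr/bin/env bash
# Report mode, owner and group for each ssh host key in a directory:
# private keys first, then public keys.
hostkey_perms() {
    local dir=${1:-/etc/ssh}
    stat -c '%A %U %G %n' "$dir"/ssh_host_*_key "$dir"/ssh_host_*_key.pub
}

# To match the listing in the post (run as root, adjust the group if needed):
#   chown root:ssh_keys /etc/ssh/ssh_host_*_key     && chmod 640 /etc/ssh/ssh_host_*_key
#   chown root:root     /etc/ssh/ssh_host_*_key.pub && chmod 644 /etc/ssh/ssh_host_*_key.pub
```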

byron

Aug 3, 2022, 6:19:25 AM
to Slurm User Community List
Thanks for everyone's help.  All I needed to do was compile a new version of pam_slurm.so.  I'm aware there's a newer pam_slurm_adopt, but everything was already set up for pam_slurm.so, so I just went with that.

Regards
Lloyd
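
For anyone hitting the same thing after an upgrade, a rough check for a stale PAM module could look like the sketch below. Both default paths are assumptions that vary by distribution and install prefix, and the function name is hypothetical. A module that is older than the slurmd binary is only a hint that it was not rebuilt, not proof.

```shell
#!/usr/bin/env bash
# Succeeds if the PAM module file is older than the slurmd binary
# (or if the module file does not exist at all).
pam_module_stale() {
    local module=${1:-/lib64/security/pam_slurm.so}   # assumed install path
    local slurmd=${2:-/usr/sbin/slurmd}               # assumed install path
    [ "$module" -ot "$slurmd" ]
}

# Usage:
#   pam_module_stale && echo "pam_slurm.so predates slurmd; consider rebuilding"
#
# Rebuild sketch, run from a Slurm build tree in which configure has already
# been run (the pam_slurm source lives in contribs/pam):
#   cd contribs/pam && make && make install
```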