[slurm-users] ssh-keys on compute nodes?


Durai Arasan

Jun 8, 2020, 11:17:11 AM
to slurm...@lists.schedmd.com
Hi,

we are setting up a slurm cluster and are at the stage of adding ssh keys of the users to the nodes.

I thought it would be sufficient to add the users' ssh keys only to the designated login nodes. But I have heard that it is necessary to add them to the compute nodes as well for slurm to be able to submit users' jobs successfully. Apparently this is especially true for MPI jobs.

So is it true that ssh keys of the users must be added to the ~/.ssh/authorized_keys of *all* nodes and not just the login nodes?

Thanks,
Durai

Jeffrey T Frey

Jun 8, 2020, 11:29:44 AM
to Slurm User Community List
An MPI library tightly integrated with Slurm (e.g. Intel MPI, Open MPI) can use "srun" to start the remote workers. In some cases "srun" can be used directly for MPI startup (e.g. "srun" instead of "mpirun").
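For illustration, a job script for a Slurm-integrated MPI might launch the ranks with srun directly; the program name and resource options here are hypothetical, not from this thread:

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
# With an MPI library tightly integrated with Slurm, srun starts the
# remote ranks itself -- no ssh to the compute nodes is involved.
srun ./my_mpi_program
```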


Other/older MPI libraries that start remote processes using "ssh" would, naturally, require keyless ssh logins to work across all compute nodes in the cluster.


When we provision user accounts on our Slurm cluster we still create .ssh, generate an .ssh/id_rsa key pair (needed for older X11 tunneling via libssh2), and add the public key to .ssh/authorized_keys. All officially-supported MPIs on the cluster are tightly integrated with Slurm, but there are commercial products and older software our clients use that are not, so having keyless access ready for them helps those users get their workflows working more quickly.
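A minimal sketch of that provisioning step might look like the following -- one passphraseless key pair per user in a shared NFS home, so the same authorized_keys works on every node. This is an illustrative sketch, not the exact script used on any site; paths are the OpenSSH defaults.

```shell
# Create the user's .ssh directory with the permissions sshd requires.
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"

# Generate a passphraseless (i.e. unencrypted) RSA key pair if one
# does not already exist.
if [ ! -f "$HOME/.ssh/id_rsa" ]; then
    ssh-keygen -q -t rsa -N '' -f "$HOME/.ssh/id_rsa"
fi

# Authorize the user's own public key; on a shared home this takes
# effect on all nodes at once.
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
```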

gil...@rist.or.jp

Jun 8, 2020, 11:37:56 AM
to Slurm User Community List

Durai,

A high-quality MPI implementation uses Slurm (e.g. srun) to spawn MPI tasks/daemons, and hence does not require (passwordless) SSH between nodes.

Cheers,

Gilles

Durai Arasan

Jun 8, 2020, 11:42:32 AM
to Slurm User Community List
Ok, that was useful information.

> When we provision user accounts on our Slurm cluster we still add .ssh, .ssh/id_rsa (needed for older X11 tunneling via libssh2), and add the public key to .ssh/authorized_keys.

So when you provision user accounts, you add the public key to .ssh/authorized_keys of *all* nodes on the cluster? Not just the login nodes?

Thanks,
Durai 

Reuti

Jun 8, 2020, 11:45:55 AM
to Slurm User Community List
Hi,
Instead of doing this for each and every user in the cluster, you can also implement hostbased authentication. I put some notes here:

https://arc.liv.ac.uk/SGE/howto/hostbased-ssh.html

Note: inside the cluster each node is a source and/or target machine depending on the parallel library, hence both steps must be done. But often all nodes share an identical image anyway. The login node, of course, is solely a source machine.

Note: I found that an update of the SSH package may reset the changed permission bits on /usr/lib64/ssh/ssh-keysign; if host-based authentication stops working after an `rpm -U ssh`, check this flag.
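For reference, a minimal host-based setup roughly follows the notes above. The option names are real OpenSSH keywords and the paths are the OpenSSH defaults, but treat this as a sketch -- the exact stanzas on your distribution may differ:

```
# Target (server) side -- /etc/ssh/sshd_config:
HostbasedAuthentication yes
# ...and list the trusted source hosts, one per line, in
# /etc/ssh/shosts.equiv. Each source host's public host key must
# also be present in /etc/ssh/ssh_known_hosts.

# Source (client) side -- /etc/ssh/ssh_config:
Host *
    HostbasedAuthentication yes
    EnableSSHKeysign yes
```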

-- Reuti


Durai Arasan

Jun 8, 2020, 12:02:09 PM
to Jeffrey T Frey, Slurm User Community List
Hi Jeffrey,

Thanks for the clarification.

But this is concerning, as the users will be able to ssh into any node. How do you prevent that?

Best,
Durai

On Mon, Jun 8, 2020 at 5:55 PM Jeffrey T Frey <fr...@udel.edu> wrote:
User home directories are on a shared (NFS) filesystem that's mounted on every node.  Thus, they have the same id_rsa key and authorized_keys file present on all nodes.

Jeffrey T Frey

Jun 8, 2020, 12:07:28 PM
to Durai Arasan, Slurm User Community List
There's a Slurm PAM module you can use to gate ssh access -- basically it checks to see if the user has a job running on the node and moves any ssh sessions to the first cgroup associated with that user on that node. If you don't use cgroup resource limiting I think it just gates access w/o any such cgroup assignments.
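For context, enabling that module is essentially a one-line change to the sshd PAM stack. This fragment is only a sketch -- placement within the account stack matters, so follow the pam_slurm_adopt documentation for your distribution:

```
# /etc/pam.d/sshd fragment (illustrative; place at the end of the
# account stack so earlier account modules still apply)
account    required     pam_slurm_adopt.so
```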

Ole Holm Nielsen

Jun 8, 2020, 12:41:33 PM
to slurm...@lists.schedmd.com
On 08-06-2020 18:07, Jeffrey T Frey wrote:
> There's a Slurm PAM module you can use to gate ssh access -- basically it checks to see if the user has a job running on the node and moves any ssh sessions to the first cgroup associated with that user on that node. If you don't use cgroup resource limiting I think it just gates access w/o any such cgroup assignments.

The pam_slurm_adopt[1] module is used by lots of Slurm sites for
restricting access by SSH. See the discussion in
https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#pam-module-restrictions

/Ole

[1] https://slurm.schedmd.com/pam_slurm_adopt.html

Durai Arasan

Jun 9, 2020, 5:22:10 AM
to Slurm User Community List
Hi,

Can you please help me understand how the passwordless ssh works on SLURM?

I was under the assumption that jobs/tasks are ultimately submitted by the "slurm" linux user and not by the linux user who wants to run jobs. Is this not correct? So is it not sufficient for only the "slurm" linux user to have passwordless ssh access to all nodes? Why do we have to give passwordless ssh access to every user of the cluster?

Thanks,
Durai
Zentrum für Datenverarbeitung
Tübingen

Ole Holm Nielsen

Jun 9, 2020, 6:44:04 AM
to slurm...@lists.schedmd.com
Hi Durai,

I can only try to explain how I understand this: The "slurm" user runs
only the slurmctld and slurmdbd central server daemons. On the compute
nodes, the slurmd daemon runs as the root user so that it can start user
tasks on behalf of normal users.

The "slurm" user should *not* have password-less SSH!

Normal users also do not need SSH if their MPI tasks are started with
Slurm's "srun". Users only need password-less SSH if they have some
strange MPI software, in which case you need to set up SSH authorized_keys
files for such users.

/Ole
--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Fysikvej Building 309, DK-2800 Kongens Lyngby, Denmark
E-mail: Ole.H....@fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Mobile: (+45) 5180 1620

Michael Jennings

Jun 9, 2020, 11:45:43 AM
to Slurm User Community List
On Tuesday, 09 June 2020, at 12:43:34 (+0200),
Ole Holm Nielsen wrote:

> in which case you need to set up SSH authorized_keys files for such
> users.

I'll admit that I didn't know about this until I came to LANL, but
there's actually a much better alternative than having to create user
key pairs and manage users' ~/.ssh/authorized_keys files: Host-based
Authentication.

Setting "HostbasedAuthentication yes" and configuring it properly on
all the cluster hosts allows a cryptographically-secured equivalent of
what used to be known as RHosts-style Authentication using ~/.rhosts
and /etc/hosts.equiv. Essentially, it allows host-key-authenticated
systems to recognize each other, and once that completes successfully,
the target host trusts the source host to accurately introduce the
user who's logging in.

Once you have host-based authentication working, users can SSH around
inside your cluster seamlessly (subject to additional restrictions, of
course, like access.conf or pam_slurm_adopt) without needing hackish
extra utilities to create and manage cluster-specific passphraseless
key pairs for every single user! :-)

There's a great cookbook online that tells you step-by-step how to set
it up: https://en.wikibooks.org/wiki/OpenSSH/Cookbook/Host-based_Authentication

HTH!
Michael

--
Michael E. Jennings <m...@lanl.gov>
HPC Systems Team, Los Alamos National Laboratory
Bldg. 03-2327, Rm. 2341 W: +1 (505) 606-0605

Prentice Bisbal

Jun 9, 2020, 3:26:57 PM
to slurm...@lists.schedmd.com
Host-based security is not considered as safe as user-based security, so
should only be used in special cases.
Prentice Bisbal
Lead Software Engineer
Research Computing
Princeton Plasma Physics Laboratory
http://www.pppl.gov


Ole Holm Nielsen

Jun 9, 2020, 3:27:46 PM
to slurm...@lists.schedmd.com
Hi Michael,

Thanks very much, this is really cool! I need to look into the
HostbasedAuthentication for intra-cluster MPI tasks spawned by SSH (not
using srun).

Presumably external access still needs to use SSH authorized keys?

Best regards,
Ole

Ole Holm Nielsen

Jun 9, 2020, 3:35:02 PM
to slurm...@lists.schedmd.com
Hi Prentice,

Could you kindly elaborate on this statement? Is host-based security
safe inside a compute cluster compared to user-based SSH keys?

Thanks,
Ole

Michael Jennings

Jun 9, 2020, 11:29:18 PM
to Slurm User Community List
On Tuesday, 09 June 2020, at 15:26:36 (-0400),
Prentice Bisbal wrote:

> Host-based security is not considered as safe as user-based security, so
> should only be used in special cases.

That's a pretty significant claim, and certainly one that would need
to be backed up with evidence, references, etc.

Especially given that, from a cryptographic perspective, there's no
significant difference. The host keys are created, exchanged, and
validated in essentially the same manner as the user keys. Plus,
given that host-based authentication is set up and maintained by the
system admin(s) (presumably) carefully and with no opportunity for
users to "accidentally" introduce errors or flaws into their
configurations, one can easily see a clear argument for the
superiority of authenticating both host and user via a methodology
possessing none of these flaws or opportunities for tragedy! :-)

If your concerns are related to STIG compliance and/or other similar
policy-based safeguards, remember that clusters are a unique case --
one in which there is no significant difference between "compromised
cluster node" and "compromised cluster" (excepting the
master/SMW/admin host, of course) -- and such blanket policies have
*never* really made much sense in the HPC world.

So while it may be a "bad idea" in general for hosts to trust each
other, if the alternative is forcibly maintaining unencrypted private
keys (that's what passphraseless key pairs are, after all!) and
relevant configuration stanza(s) per user to facilitate free
intracluster SSHing, host-based authentication managed and maintained
by the system's administrative staff *is*, unequivocally, a superior
solution.

And above all, remember the cardinal rule of security/insecurity
claims: Sweeping generalizations about cybersecurity are ALWAYS
WRONG! ;-)

Michael Jennings

Jun 9, 2020, 11:35:49 PM
to slurm...@lists.schedmd.com
On Tuesday, 09 June 2020, at 21:27:27 (+0200),
Ole Holm Nielsen wrote:

> Thanks very much, this is really cool! I need to look into the
> HostbasedAuthentication for intra-cluster MPI tasks spawned by SSH (not
> using srun).
>
> Presumably external access still needs to use SSH authorized keys?

Or some other authentication method, yes. We use MFA, IP address
restrictions, and other techniques to secure cluster borders; only
once the user has been thoroughly authenticated and allowed entry to
the cluster login nodes (what we refer to as FEs or "front-end" nodes)
can the user then SSH freely within the cluster. (And, to be fair,
not all clusters allow free internal movement. Depends on the
cluster.)

And I will readily admit that I, somewhat selfishly, would love to see
a blurb about host-based auth in your thorough and wonderfully written
wiki! O;-)

Prentice Bisbal

Jun 10, 2020, 1:43:22 PM
to slurm...@lists.schedmd.com
Gladly! User-based security means that you need to enter a user password
or something similar, like Kerberos keys or SSH keys, to authenticate with
a different host. In every place I've worked, passwordless ssh keys
were forbidden, so even when using SSH keys, the key would need to be
unlocked with its passphrase the first time it's used. In this
scenario, if a user account is compromised on one system, the damage is
limited to that system.

With host-based security, all the hosts in the trusted group allow users
to go from one machine to the other without using a password. In this
case, if a user account is compromised on one system, then that user
account now compromised on *every* system in the trusted group.

Does that make sense?

There's a reason why host-based authentication is not the default
behavior in SSH.

Prentice

Durai Arasan

Jun 16, 2020, 9:18:25 AM
to Slurm User Community List
Thank you. We are planning to put ssh keys on login nodes only and use the PAM module to control access to compute nodes. Will such a setup work? Or do the ssh keys need to be on the compute nodes as well for PAM to work? I'm sorry, but this is not clearly stated in any documentation...

Durai
Zentrum für Datenverarbeitung
Tübingen

Ole Holm Nielsen

Jun 17, 2020, 3:59:43 AM
to slurm...@lists.schedmd.com
On 6/16/20 3:17 PM, Durai Arasan wrote:
> Thank you. We are planning to put ssh keys on login nodes only and use the
> PAM module to control access to compute nodes. Will such a setup work? Or
> do the ssh keys need to be on the compute nodes as well for PAM to work?
> I'm sorry, but this is not clearly stated in any documentation...

That will work. You need to configure the pam_slurm_adopt module to
control access, see
https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#pam-module-restrictions

SSH authentication must of course be configured in order for users to ssh
to the nodes, but I don't believe that PAM cares about that.

/Ole

Ole Holm Nielsen

Jun 17, 2020, 4:27:19 AM
to slurm...@lists.schedmd.com
On 6/9/20 5:45 PM, Michael Jennings wrote:
> On Tuesday, 09 June 2020, at 12:43:34 (+0200),
> Ole Holm Nielsen wrote:
>
>> in which case you need to set up SSH authorized_keys files for such
>> users.
>
> I'll admit that I didn't know about this until I came to LANL, but
> there's actually a much better alternative than having to create user
> key pairs and manage users' ~/.ssh/authorized_keys files: Host-based
> Authentication.
>
> Setting "HostbasedAuthentication yes" and configuring it properly on
> all the cluster hosts allows a cryptographically-secured equivalent of
> what used to be known as RHosts-style Authentication using ~/.rhosts
> and /etc/hosts.equiv. Essentially, it allows host-key-authenticated
> systems to recognize each other, and once that completes successfully,
> the target host trusts the source host to accurately introduce the
> user who's logging in.
>
> Once you have host-based authentication working, users can SSH around
> inside your cluster seamlessly (subject to additional restrictions, of
> course, like access.conf or pam_slurm_adopt) without needing hackish
> extra utilities to create and manage cluster-specific passphraseless
> key pairs for every single user! :-)

The host-based SSH authentication is a good idea, but only inside the
cluster's security perimeter, and one should not trust computers external
to the cluster nodes in this way.

I was looking at the OpenSSH documentation and the cookbooks on the net
for configuring host-based SSH authentication. The information can be a
little imprecise, so after a good deal of testing I've written a new
section in my Wiki page for Slurm on CentOS 7 systems:

https://wiki.fysik.dtu.dk/niflheim/SLURM#ssh-keys-for-password-less-access-to-cluster-nodes

This also includes ways to gather SSH public keys from the cluster nodes.
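As a sketch of that key-gathering step (hostnames are illustrative, and this obviously needs network access to the running nodes, so treat it as a fragment):

```shell
# Collect each node's public host key into the system-wide
# known-hosts file, so ssh-keysign / host-based authentication can
# verify the source hosts.
for h in login01 node001 node002; do
    ssh-keyscan -t ed25519 "$h"
done >> /etc/ssh/ssh_known_hosts
```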

Comments are welcome.

Best regards,
Ole

Brian Andrus

Jun 19, 2020, 10:27:26 AM
to slurm...@lists.schedmd.com

Nice write-up Ole!

I especially like the statement (emphasis added):

    For security reasons it is strongly recommended not to include the Slurm servers slurmctld and slurmdbd hosts in the Host-based_Authentication because normal users have no business on those servers!

Brian Andrus

Mark Hahn

Jun 19, 2020, 12:56:02 PM
to Slurm User Community List
> The host-based SSH authentication is a good idea, but only inside the
> cluster's security perimeter, and one should not trust computers external to
> the cluster nodes in this way.

Even more than that! Hostbased allows you to define intersecting sets of
asymmetric trust. For instance, usually symmetric trust among compute nodes,
and they trust login nodes. But perhaps login nodes don't trust compute
nodes, but do trust each other. And admin nodes don't trust anyone,
but everyone trusts them. If you have "equivalent" clusters (same LDAP,
etc), then you might want login nodes of different clusters to trust each other.
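As an illustrative sketch (hostnames hypothetical), the asymmetry falls out of giving each node class its own shosts.equiv:

```
# /etc/ssh/shosts.equiv on a compute node:
# trust the login nodes and the other compute nodes
login01
login02
node001
node002

# /etc/ssh/shosts.equiv on a login node:
# trust only the other login node, not the computes
login01
login02
```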

The big win is that you entirely avoid the presence of users' private keys on the cluster.

We've used this widely in ComputeCanada since about 2003.

regards, mark hahn.

Ole Holm Nielsen

Jun 19, 2020, 2:58:31 PM
to slurm...@lists.schedmd.com
On 19-06-2020 18:55, Mark Hahn wrote:
>> The host-based SSH authentication is a good idea, but only inside the
>> cluster's security perimeter, and one should not trust computers
>> external to the cluster nodes in this way.
>
> Even more than that!  Hostbased allows you to define intersecting sets of
> asymmetric trust.  For instance, usually symmetric trust among compute
> nodes,
> and they trust login nodes.  But perhaps login nodes don't trust compute
> nodes, but do trust each other.  And admin nodes don't trust anyone, but
> everyone trusts them.  If you have "equivalent" clusters (same LDAP,
> etc), then you might want login nodes of different clusters to trust
> each other.

So how do you configure that? Let me guess: you configure host-based
SSH authentication on all nodes, but who trusts whom is configured in
the /etc/ssh/shosts.equiv file? Do you have any guidelines for how to
configure such asymmetric trust?

> The big win is that you entirely avoid the presence of private keys on
> the cluster.
>
> We've used this widely in ComputeCanada since about 2003.

/Ole
