[slurm-users] x11 forwarding not available?


Dave Botsch

unread,
Oct 15, 2018, 5:51:34 PM10/15/18
to slurm...@lists.schedmd.com


Wanted to test X11 forwarding. X11 forwarding works as a normal user
just ssh'ing to a node and running xterm/etc.

With srun, however:

srun -n1 --pty --x11 xterm
srun: error: Unable to allocate resources: X11 forwarding not available

So, what am I missing?

Thanks.

PS

srun --version
slurm 17.11.7

rpm -qa |grep slurm
ohpc-slurm-server-1.3.5-8.1.x86_64
...


--
********************************
David William Botsch
Programmer/Analyst
@CNFComputing
bot...@cnf.cornell.edu
********************************

Rhian Resnick

unread,
Oct 15, 2018, 5:56:25 PM10/15/18
to slurm...@lists.schedmd.com



Double check that /etc/ssh/sshd_config allows X11 forwarding on the node, as it is disabled by default (I think):


X11Forwarding yes
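
A quick way to check the effective value on a node (assuming OpenSSH, run as root) is:

sshd -T | grep -i x11forwarding

which prints the setting sshd is actually using after parsing its config.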



Rhian Resnick

Associate Director Research Computing

Enterprise Systems

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222





From: slurm-users <slurm-use...@lists.schedmd.com> on behalf of Dave Botsch <bot...@cnf.cornell.edu>
Sent: Monday, October 15, 2018 5:51 PM
To: slurm...@lists.schedmd.com
Subject: [slurm-users] x11 forwarding not available?
 

Dave Botsch

unread,
Oct 15, 2018, 7:07:55 PM10/15/18
to Slurm User Community List
Hi.

X11 forwarding is enabled and works for normal ssh.

Thanks.

On Mon, Oct 15, 2018 at 09:55:59PM +0000, Rhian Resnick wrote:
>
>
> Double check that /etc/ssh/sshd_config allows X11 forwarding on the node, as it is disabled by default (I think):
>
>
> X11Forwarding yes
>
>
>
>
> Rhian Resnick
>
> Associate Director Research Computing
>
> Enterprise Systems
>
> Office of Information Technology
>
>
> Florida Atlantic University
>
> 777 Glades Road, CM22, Rm 173B
>
> Boca Raton, FL 33431
>
> Phone 561.297.2647
>
> Fax 561.297.0222
>
> [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg>

R. Paul Wiegand

unread,
Oct 15, 2018, 9:32:22 PM10/15/18
to Slurm User Community List
I believe you also need:

X11UseLocalhost no
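
A minimal sshd_config excerpt on the compute nodes would then be something like this (assuming OpenSSH; restart sshd afterwards):

# /etc/ssh/sshd_config on the compute nodes
X11Forwarding yes
X11UseLocalhost no

systemctl restart sshd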

Mahmood Naderan

unread,
Oct 16, 2018, 12:05:47 AM10/16/18
to Slurm User Community List
Dave,
With previous versions, I followed some steps with help from folks here; I don't know about newer versions.

Please send me a reminder in the next 24 hours and I will send you the instructions. At the moment, I don't have access to the server.

Regards,
Mahmood



Sent from Gmail on Android

Olivier Sallou

unread,
Oct 16, 2018, 3:30:12 AM10/16/18
to Slurm User Community List, Dave Botsch


On 10/16/2018 01:07 AM, Dave Botsch wrote:
> Hi.
>
> X11 forwarding is enabled and works for normal ssh.

I faced the same issue, with ssh X11 forwarding working as expected on
compute nodes, but not with srun --x11.

I patched slurm locally to make it work.

what you can try to see if it is the same issue:


srun -n1 --x11 --pty bash


# xterm
// you should have an authorization failure error

// on connected node
# xauth list

you will get a list of MIT-MAGIC-COOKIE entries like

myslurmmaster/unix:10  MIT-MAGIC-COOKIE-1  YYYYYY
myslurmnode/unix:52  MIT-MAGIC-COOKIE-1  XXXXXX

# echo $DISPLAY
localhost:52.0


To make it work manually I did (of course adapting node names and
display port number):

xauth remove myslurmnode/unix:52
xauth add localhost:52.0 MIT-MAGIC-COOKIE-1 XXXXXX

(reusing the cookie value from the myslurmnode entry)

then xterm (for example) worked.
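
A rough sketch of the same workaround in one go (assuming the stale cookie
is registered under the node's hostname, as above):

port=$(echo "$DISPLAY" | cut -d: -f2 | cut -d. -f1)
cookie=$(xauth list "$(hostname)/unix:$port" | awk '{print $3}')
xauth remove "$(hostname)/unix:$port"
xauth add "localhost:$port.0" MIT-MAGIC-COOKIE-1 "$cookie"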

If this is the same problem, slurm can be easily patched to work (I can
show you how).

Olivier
Olivier Sallou
Univ Rennes, Inria, CNRS, IRISA
Irisa, Campus de Beaulieu
F-35042 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438



Tina Friedrich

unread,
Oct 16, 2018, 4:48:25 AM10/16/18
to Slurm User Community List
I had an issue getting X11 forwarding via Slurm (srun/sbatch) to work; ssh
worked fine. Tracked it down to the host name setting on the nodes; as per the
RedHat/CentOS default, the hostname was set to the fully qualified name. Turns
out Slurm's X11 forwarding doesn't work with that; setting the hostnames to
the short name made it all magically work.
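
In case it's useful, the check and fix were roughly (RHEL/CentOS 7, so
hostnamectl; node01 is just an example name):

hostname       # was showing the FQDN, e.g. node01.cluster.example.com
hostname -s    # the short name, e.g. node01
hostnamectl set-hostname node01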

Tina
Tina Friedrich, Snr HPC Systems Administrator, Advanced Research Computing
Research Computing and Support Services, Academic IT
IT Services, University of Oxford
http://www.arc.ox.ac.uk

Jeffrey Frey

unread,
Oct 16, 2018, 9:04:50 AM10/16/18
to Slurm User Community List
Make sure you're using RSA keys in users' accounts -- we'd started setting up ECDSA on-cluster keys as we built our latest cluster, but libssh at that point didn't support them. And since the Slurm X11 plugin is hard-coded to only use ~/.ssh/id_rsa, that further tied us to RSA. It would be nice for the host and user key files to be configurable options; more configurable options for that plugin in general would be useful.
::::::::::::::::::::::::::::::::::::::::::::::::::::::
Jeffrey T. Frey, Ph.D.
Systems Programmer V / HPC Management
Network & Systems Services / College of Engineering
University of Delaware, Newark DE 19716
Office: (302) 831-6034 Mobile: (302) 419-4976
::::::::::::::::::::::::::::::::::::::::::::::::::::::





Olivier Sallou

unread,
Oct 16, 2018, 9:13:06 AM10/16/18
to Slurm User Community List


On 10/16/2018 03:04 PM, Jeffrey Frey wrote:
> Make sure you're using RSA keys in users' accounts -- we'd started setting-up ECDSA on-cluster keys as we built our latest cluster but libssh at that point didn't support them. And since the Slurm X11 plugin is hard-coded to only use ~/.ssh/id_rsa, that further tied us to RSA. It would be nice for the host and user key files to be configurable options; more configurable options for that plugin in general would be useful.

and the RSA key needs to be passphrase-less (a shame...)
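
Given that the plugin only looks at ~/.ssh/id_rsa, a minimal per-user setup
would be something along these lines (passphrase-less key at the default
path, public key appended to authorized_keys shared by the nodes):

ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys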

Dave Botsch

unread,
Oct 16, 2018, 9:31:05 AM10/16/18
to Slurm User Community List
Hrm... it looks like the default install of OHPC went with DSA keys
instead:

.ssh]$ cat config
# Added by Warewulf 2018-10-08
Host *
IdentityFile ~/.ssh/cluster
StrictHostKeyChecking=no
$ file cluster
cluster: PEM DSA private key


Now I have to find where that's configured since it autocreates those on
user login.

Dave Botsch

unread,
Oct 16, 2018, 9:31:39 AM10/16/18
to Slurm User Community List
That's not the issue here (though I have experienced that before).
Regular ssh forwarding works fine.

Dave Botsch

unread,
Oct 16, 2018, 9:33:05 AM10/16/18
to Slurm User Community List
Hi.

Reminder :)

Dave Botsch

unread,
Oct 16, 2018, 9:39:36 AM10/16/18
to Slurm User Community List
At least by itself, switching to RSA keys did not fix it.

I used ssh-keygen to create an RSA key and edited .ssh/config to point to
that instead of the DSA key. So unless srun is bypassing that
.ssh/config... nope.

Dave Botsch

unread,
Oct 16, 2018, 9:43:27 AM10/16/18
to Slurm User Community List
Sadly that did not make a difference.

Tina Friedrich

unread,
Oct 16, 2018, 10:59:25 AM10/16/18
to Slurm User Community List
Regular ssh forwarding worked just fine with the long hostnames. It's just
Slurm's version thereof that doesn't. (I think the same goes for DSA/RSA keys etc.)

Tina

Dave Botsch

unread,
Oct 16, 2018, 11:31:12 AM10/16/18
to Slurm User Community List
Hmm..

my hostname is already set to the short hostname per the output of
"hostname" .

Dave Botsch

unread,
Oct 16, 2018, 11:40:00 AM10/16/18
to slurm...@lists.schedmd.com
Ok. Progress.

Per:

https://bugs.schedmd.com/show_bug.cgi?id=4721

I was missing PrologFlags=x11 in slurm.conf.
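
For reference, that's just the one line in slurm.conf (controller and
nodes), followed by restarting the daemons to be safe:

PrologFlags=x11

systemctl restart slurmctld   # on the controller
systemctl restart slurmd      # on the compute nodes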

So now my issue is just that X11 forwarding doesn't work... slurmd.log:

[2018-10-16T11:38:34.136] [27.extern] error: ssh public key
authentication failure: Username/PublicKey combination invalid
[2018-10-16T11:38:34.136] [27.extern] error: x11 port forwarding setup
failed
[2018-10-16T11:38:34.136] [27.extern] error: _spawn_job_container:
failed retrieving x11 display value: No such file or directory
[2018-10-16T11:38:34.146] [27.extern] done with job
[2018-10-16T11:38:34.297] launch task 27.0 request from
1000...@10.84.184.133 (port 3287)
[2018-10-16T11:38:34.357] error: could not get x11 forwarding display
for job 27 step 0, x11 forwarding disabled
[2018-10-16T11:38:34.364] [27.0] in _window_manager

Will have to try RSA again and see if that gets things going.

Mahmood Naderan

unread,
Oct 16, 2018, 2:19:52 PM10/16/18
to Slurm User Community List
Dave,
My platform is Rocks with Centos 7.0. It may not be exactly your case,
but it may help you with some ideas on what to do. I used
https://github.com/hautreux/slurm-spank-x11 and here is the guide
which Ian Mortimer told me:



There should be a binary slurm-spank-x11 and a library x11.so which
have to be installed in the correct locations. Two configuration files
also have to be installed.

These have to be installed on all compute nodes as well as the head
node or login node(s) so the simplest way is to build a package.
Fortunately a spec file is included in the spank-x11 distribution but I
had to make some changes to make it usable.

Move slurm-spank-x11-0.2.5.tar.gz to ~/rpmbuild/SOURCES
My modified spec file is attached. Copy that to ~/rpmbuild/SPECS and
run:

rpmbuild -bb --clean ~/rpmbuild/SPECS/slurm-spank-x11.spec

That should build a package:

~/rpmbuild/RPMS/x86_64/slurm-spank-x11-0.2.5-3.x86_64.rpm

Copy that package to /export/rocks/install/contrib/7.0/x86_64/RPMS/
and add the package to the package list in:

/export/rocks/install/site-profiles/7.0/nodes/extend-compute.xml

If you use login node(s) also add it to the package list in:

/export/rocks/install/site-profiles/7.0/nodes/extend-login.xml

Then rebuild the distro with the new package added:

cd /export/rocks/install; rocks create distro

Now you need to install it on the login node(s) (or front end) and all
the compute nodes. You can use pdsh or 'rocks run host' for that.

The last step is to create the file /etc/slurm/plugstack.conf to enable
plugins:

echo 'include /etc/slurm/plugstack.conf.d/*' \
>> /etc/slurm/plugstack.conf

Copy that file to the compute nodes and login node(s). To ensure the
file is created when you reinstall your nodes or install new nodes, you
also need to add the command to the %post section of extend-compute.xml
(and extend-login.xml if you're using it).
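
One detail the steps above gloss over: the plugin itself is enabled by a
line inside a file in plugstack.conf.d, something like the following (the
exact plugin options depend on how slurm-spank-x11 was built, so treat it
as a sketch):

# /etc/slurm/plugstack.conf.d/x11.conf
optional x11.so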

When that's all done restart slurm everywhere with:

rocks sync slurm

You should then be able to get an interactive login with:

srun --x11 --pty bash



Regards,
Mahmood



On Tue, Oct 16, 2018 at 5:04 PM Dave Botsch <bot...@cnf.cornell.edu> wrote:
>
> Hi.
>
> Reminder :)

Michael Jennings

unread,
Oct 16, 2018, 3:42:16 PM10/16/18
to Slurm User Community List
On Tuesday, 16 October 2018, at 09:30:13 (-0400),
Dave Botsch wrote:

> Hrm... it looks like the default install of OHPC went with DSA keys
> instead:
>
> .ssh]$ cat config
> # Added by Warewulf 2018-10-08
> Host *
> IdentityFile ~/.ssh/cluster
> StrictHostKeyChecking=no
> $ file cluster
> cluster: PEM DSA private key

That's not OHPC. That's a (rather unfortunate) part of Warewulf
called `cluster-env`, a tool used to seamlessly make passphrase-less
SSH work within a cluster without admin/user intervention. You can
see the code here:
https://github.com/warewulf/warewulf3/blob/master/cluster/bin/cluster-env

If you install the warewulf-cluster RPM, a script installed as
/etc/profile.d/cluster-env.sh will run /usr/bin/cluster-env on each
login (for sh/ksh/bash users...and an equivalent script is installed
for csh/tcsh users). See
e.g. https://github.com/warewulf/warewulf3/blob/master/cluster/etc/cluster-env.sh
for the stub script.

The above version on GitHub has been updated to use RSA keys instead
of DSA, but the *actually* correct solution, rather than forcibly
altering each user's SSH configuration and ~/.ssh/ contents, is to
enable Host-based authentication for SSH in /etc/ssh/sshd_config (or
GSSAPI authentication, or host-based certificates, or any of the other
options available to have machines authenticate themselves so that
users can move between cluster hosts seamlessly and securely).

When that utility was written, DSA was the "state-of-the-art," and it
unfortunately went untouched for a very long time. The key type
should not have been hard-coded with no way to permit site-specific
configuration, but it was. As I said, though, there are better ways
to accomplish user auth between nodes without passphrases, and I
recommend disabling `cluster-env` and using one of those alternatives
instead. (In fact, it's probably best to remove the entire
warewulf-cluster RPM. wwinit and wwfirstboot are similarly ancient
tools in need of updating/replacement.)

As for X11 forwarding/authentication, there is no easy/simple answer
to why it won't work. Lots of things need to be in sync for it to
work, including xauth, xhost, $DISPLAY, firewall rules, etc., and
there are numerous opportunities for minor misconfigurations to break
the whole kit-and-kaboodle. To troubleshoot, I recommend examining
the values of $DISPLAY and the results of `xauth list` and `xhost`
under both working and non-working conditions, and see if you can see
a pattern. Also make sure `ssh -Y` is being used all along the way,
not just `ssh -X`.
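
Something like the following, run in both a plain "ssh -Y node" session and
inside "srun --x11 --pty bash", usually makes the difference jump out:

echo $DISPLAY
xauth list "$DISPLAY"   # is there a cookie for exactly this display?
xhost                   # what access control is the X server reporting?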

Our solution at LANL uses a 130-line Perl script that does proper
NFS-based locking of the user's ~/.Xauthority file, forcibly resets
their $DISPLAY to the correct value, and adds the correct entry to
~/.Xauthority using `xauth add`. Our experience has been that's the
only way to correctly handle all cases. (And no, unfortunately I
can't share it, but it's not a difficult thing to write.)
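
Not that script, obviously, but the general shape in shell would be
something like the following (cookie/port here are placeholders for values
your prolog already knows, and whether flock is good enough on your
particular NFS setup is its own question):

(
  flock -x 9                                         # serialize updates to ~/.Xauthority
  xauth -f ~/.Xauthority add "localhost:$port.0" MIT-MAGIC-COOKIE-1 "$cookie"
) 9> ~/.Xauthority.lock
export DISPLAY="localhost:$port.0"                   # force DISPLAY to the forwarded value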

Michael

--
Michael E. Jennings <m...@lanl.gov>
HPC Systems Team, Los Alamos National Laboratory
Bldg. 03-2327, Rm. 2341 W: +1 (505) 606-0605

Dave Botsch

unread,
Oct 16, 2018, 3:54:31 PM10/16/18
to Slurm User Community List
So I got what I wanted working with RSA keys (and making sure to put the
public RSA key in ~/.ssh/authorized_keys), and of course that
PrologFlags=x11 statement in slurm.conf.

What I ended up doing was just creating my own separate script, analogous
to cluster-env, to create the RSA keys. I'm trying not to stray too far
from the defaults to make upgrades easier.

Thanks.

Chris Samuel

unread,
Oct 16, 2018, 11:59:00 PM10/16/18
to slurm...@lists.schedmd.com
On Wednesday, 17 October 2018 12:04:05 AM AEDT Jeffrey Frey wrote:

> Make sure you're using RSA keys in users' accounts

We use SSH's host based authentication instead (along with pam_slurm_adopt on
compute nodes so users can only get into nodes they have a job on).

X11 forwarding works here.
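
For anyone wanting to try that route, the server side is roughly the
following (plus listing the cluster hosts in /etc/ssh/shosts.equiv and
ssh_known_hosts, and enabling HostbasedAuthentication / EnableSSHKeysign on
the client side; this is only the gist):

# /etc/ssh/sshd_config on the compute nodes
HostbasedAuthentication yes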

--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC




Dave Botsch

unread,
Oct 17, 2018, 12:08:04 AM10/17/18
to Slurm User Community List
Hmm... I will have to investigate pam_slurm_adopt.

********************************
David William Botsch
Programmer/Analyst
@CNFComputing
bot...@cnf.cornell.edu
********************************
