run ssh server in a Singularity container?


Vang Le Quy

Aug 27, 2019, 3:44:01 AM
to singularity

This is my setup:

Laptop (LT) -> Slurm front-end (FE) with the Singularity executable -> compute node (CN) with GPUs, which also has the Singularity executable.

FE and CN share a partition, so users have shared home directories on FE and CN.

Singularity version: 3.3.0

Intermediate goal:

Start a Singularity container/instance on FE with an SSH server running inside it, then SSH from LT into that container.

Final goal:

Submit a Slurm job from FE which in turn starts a Singularity container/instance on CN, then SSH from LT to that container on CN via FE, because users can't log on to CN directly.
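
For illustration, the end state from LT would be something like the following ~/.ssh/config entry (host names, user names and the forwarded port are placeholders):

    # ~/.ssh/config on LT (all names and the port are placeholders)
    Host cn-container
        HostName cn01                     # compute node running the instance
        Port 2222                         # host port mapped to the container's sshd
        ProxyJump feuser@fe.example.org   # hop through the front-end
        User vang

so that "ssh cn-container" lands directly inside the container.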


My attempt so far:

This is my DEF file:

#############
Bootstrap: docker
From: nvidia/tensorflow:19.05-py3
Registry: nvcr.io
IncludeCmd: yes

%environment
    export LANG=en_US.UTF-8

%post
    apt-get update && apt-get install -y --no-install-recommends apt-utils
    LANG=en_US.UTF-8
    # Language/locale settings
    echo $LANG UTF-8 > /etc/locale.gen
    apt-get install -y locales && update-locale --reset LANG=$LANG

    apt-get install -y --no-install-recommends wget lsb-release parallel vim openssh-server
    systemctl enable ssh

%startscript
    systemctl start ssh
###############

Build:
singularity build --fakeroot tensorflow_19.05-py3.sif Singularity.def

Run:
mkdir overlay
singularity instance start -B /run:/run  --writable-tmpfs --overlay $PWD/overlay  --fakeroot --net --network-args="portmap=2222:22/tcp" ./tensorflow_19.05-py3.sif sshins

INFO:    Convert SIF file to sandbox...
WARNING: Ignoring --writable-tmpfs as it requires overlay support
Could not watch jobs: Operation not permitted

INFO:    instance started successfully

Test ssh server status:

singularity shell instance://sshins
Singularity rootfs-054517279:/tmp/> whoami
root
Singularity rootfs-054517279:/tmp> service ssh status
Failed to retrieve unit: Access denied
Failed to get properties: Access denied
Singularity rootfs-054517279:/tmp> ps -ef
UID         PID   PPID  C STIME TTY          TIME CMD
root          1      0  0 07:33 ?        00:00:00 sinit
root         25      0  0 07:34 pts/27   00:00:00 /bin/bash --norc
root        253     25  0 07:35 pts/27   00:00:00 ps -ef


On FE:
ssh -p 2222 feuser@localhost
ssh: connect to host localhost port 2222: Connection refused

This is a dead end for me at the moment. Any info or suggestions are welcome.

Kind regards
Vang

Thomas Hartmann

Aug 27, 2019, 4:40:18 AM
to singu...@lbl.gov
Hi Vang,

Not a real answer, but a full-grown sshd with systemd underneath might be
overkill; a busybox/dropbear SSH server might be more streamlined.
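
Untested sketch of what that could look like, assuming an Ubuntu base image (the package name, key path and port are guesses):

%post
    apt-get update && apt-get install -y --no-install-recommends dropbear-bin

and then, from inside the running container:

    mkdir -p $HOME/.dropbear
    [ -f $HOME/.dropbear/rsa_key ] || dropbearkey -t rsa -f $HOME/.dropbear/rsa_key
    dropbear -F -E -p 2222 -r $HOME/.dropbear/rsa_key    # -F: stay in foreground, -E: log to stderr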

But before going further: have you checked that the ports on your
cluster nodes are accessible from the outside? I would expect that your
admins have also set up a more or less restrictive firewall.

Cheers,
Thomas


Oliver Freyermuth

Aug 27, 2019, 5:08:12 AM
to singu...@lbl.gov
Hi Vang,

I would expect exactly what Thomas suggested - commonly, cluster nodes are firewalled and / or NATted
(or, even more fun at some HPC sites, have no outbound connectivity at all).

However, I expect your functionality can be emulated by running an interactive job. I've never used Slurm with containers,
but with HTCondor and the correct setup, running an interactive job gives you an ssh shell into the correct container environment
(even if the worker node is firewalled and / or NATted). In HTCondor 8.6, this requires running sshd inside the container;
in 8.8 (once some remaining issues are fixed) sshd will run outside the container and connect to it via nsenter and ptys.

I'd presume Slurm offers something similar. If not, you can probably run sshd directly (without any systemd around) as long as you can somehow get direct network connectivity.
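
A minimal sketch of what "directly" could look like here, assuming OpenSSH is already installed in the image (the port is arbitrary, and sshd's privilege-separation directory, /run/sshd on Ubuntu, has to exist in the container):

    singularity exec instance://sshins /usr/sbin/sshd -D -e -p 2222    # foreground, log to stderr instead of syslog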

Cheers,
Oliver


Vang Le Quy

Aug 27, 2019, 5:28:54 AM
to singularity
Hi Thomas and Oliver.

I need a full Ubuntu instance with TensorFlow, the GPU driver, etc. to do some machine learning inside the container. The SSH connection will allow remote debugging of Python code with PyCharm, so a lightweight busybox will not work. Regarding the firewall, that reason can be excluded for now, because I logged on to the FE and did ssh to localhost. More importantly, the SSH service itself inside the instance is not running (see the terminal output at the end of my first email), so I must get the SSH server up and running inside the instance first.

Cheers
Vang

Oliver Freyermuth

Aug 27, 2019, 5:34:06 AM
to singu...@lbl.gov
Hi Vang,

On 27.08.19 at 11:28, Vang Le Quy wrote:
> Hi Thomas and Oliver.
>
> I need a full Ubuntu instance with TensorFlow, the GPU driver, etc. to do some machine learning inside the container. The SSH connection will allow remote debugging of Python code with PyCharm, so a lightweight busybox will not work.

this is exactly what our users do via interactive jobs (but with HTCondor). Did you check if this works or can be made to work with Slurm?
Maybe your admins can offer this?

> Regarding the firewall, that reason can be excluded for now, because I logged on to the FE and did ssh to localhost.

I don't see how an ssh to localhost tests anything related to firewall setup?

> More importantly, the SSH service itself inside the instance is not running (see the terminal output at the end of my first email), so I must get the SSH server up and running inside the instance first.

Did you try to run sshd manually instead of using systemd as I mentioned?

Cheers,
Oliver


Vang Le Quy

Aug 27, 2019, 9:08:57 AM
to singularity
Hi Oliver

> this is exactly what our users do via interactive jobs (but with HTCondor). Did you check if this works or can be made to work with Slurm?
> Maybe your admins can offer this?

With Slurm it only works like this: submit an interactive job, then ssh to the compute node.

> I don't see how an ssh to localhost tests anything related to firewall setup?

Sorry for the ambiguity. FE and CN are test servers, so no firewall or extra security is set up; they are default Ubuntu installs. Furthermore, the network guys at my place control port openings on the routers and switches, so ssh to localhost on FE is sufficient. As direct evidence, this works for an nginx service:

user@FE:~ srun --pty bash -l
user@CN:~ singularity instance start --net --network-args="portmap=2222:80/tcp" --fakeroot nginx.img webtest

I could then visit http://CN:2222

> Did you try to run sshd manually instead of using systemd as I mentioned?

I tried after your suggestion. There are several errors before sshd really starts (e.g. the missing /var/run/sshd directory, directory permissions, etc.), and I still can't connect.

My conclusion so far is that the SSH server requires more things to start up properly inside Singularity, but I don't know exactly what those things are yet. Keep looking...
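
As a hedged sketch (not verified here), the usual missing pieces are roughly these, assuming a writable image and a spare port:

    mkdir -p /var/run/sshd             # privilege-separation directory sshd insists on
    ssh-keygen -A                      # generate host keys if the image has none
    /usr/sbin/sshd -D -d -e -p 2222    # foreground, one debug session, log to stderr instead of syslog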

Dave Dykstra

Aug 27, 2019, 5:03:32 PM
to Vang Le Quy, singularity
Hi Vang,

Here's a completely different approach you might like. Instead of using
ssh inside a container for debugging, you can use "nsenter" from another
process on the host to join an existing namespace. For example, the
script below works when a container was started with singularity -c -i -p
and an unprivileged user namespace (that is, with -u or without setuid).

Dave


#!/bin/bash
# This assumes singularity was run with -c -i -p using
# unprivileged namespaces.
# Written by Dave Dykstra, 27 November 2017

usage()
{
    echo "Usage: singularity-attach <pid>" >&2
    exit 1
}

if [ $# != 1 ]; then
    usage
fi

if ! kill -0 "$1"; then
    usage
fi

# enter the target's user, mount, IPC and PID namespaces (keeping credentials),
# switch to its root and working directory, and start bash with its environment
eval exec nsenter -t $1 -U --preserve-credentials -m -i -p -r -w /usr/bin/env \
    -i $(xargs -0 bash -c 'printf "%q\n" "$@"' -- </proc/$1/environ) /bin/bash
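
A hypothetical usage example, assuming the script is saved as singularity-attach and you pass the PID of the container's init process (the "sinit" visible in the ps output earlier in the thread):

    ./singularity-attach $(pgrep -x -u $USER sinit)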

Vang Le Quy

Aug 28, 2019, 4:31:52 AM
to singularity, lqva...@gmail.com
Hi Dave
Thanks for the suggestions. nsenter seems to be an interesting tool. However, I don't see how I can use it with PyCharm to debug Python code running inside the container.

I found some more information thanks to Oliver's suggestion about enabling sshd debug messages. One of the missing components for sshd is the logging service:

/usr/sbin/sshd -d
debug1: sshd version OpenSSH_7.6, OpenSSL 1.0.2n  7 Dec 2017
debug1: private host key #0: ssh-rsa SHA256:EDWRBlyyVHfWKk/cAp2CI1GcnN1OLvxrSu8ay4jxQmM
debug1: private host key #1: ecdsa-sha2-nistp256 SHA256:mMeGj7V1XxyJSkmAKGqNsBYLqVq3C19n0fRkPnEGVpeM
debug1: private host key #2: ssh-ed25519 SHA256:lN7is7WPx95JBWUjcvj9GSsc44/dZ/X4xVTGJ5YL28M
debug1: rexec_argv[0]='/usr/sbin/sshd'
debug1: rexec_argv[1]='-d'
debug1: Set /proc/self/oom_score_adj from 0 to -1000
debug1: Bind to port 22 on 0.0.0.0.
Server listening on 0.0.0.0 port 22.
debug1: Bind to port 22 on ::.
Server listening on :: port 22.
debug1: Server will not fork when running in debugging mode.
debug1: rexec start in 5 out 5 newsock 5 pipe -1 sock 8
debug1: inetd sockets after dupping: 3, 3
Connection from 172.19.8.14 port 51288 on 10.23.0.22 port 22 <========================= Login attempt
debug1: Client protocol version 2.0; client software version OpenSSH_7.2p2 Ubuntu-4ubuntu2.8
debug1: match: OpenSSH_7.2p2 Ubuntu-4ubuntu2.8 pat OpenSSH* compat 0x04000000
debug1: Local version string SSH-2.0-OpenSSH_7.6p1 Ubuntu-4ubuntu0.3
debug1: permanently_set_uid: 105/65534 [preauth]
permanently_set_uid: was able to restore old [e]gid [preauth]
debug1: do_cleanup [preauth]
debug1: monitor_read_log: child log fd closed <=========================Looking for log facility, and failed
debug1: do_cleanup
debug1: Killing privsep child 50
debug1: audit_event: unhandled event 12

So I went ahead and ran:

Singularity rootfs-409417780:/tmp> service rsyslog start
 * Starting enhanced syslogd rsyslogd
mknod: /dev/xconsole: Permission denied
chown: cannot access '/dev/xconsole': No such file or directory

/dev seems to be shared from the host:
ls -lah /dev/ |head
total 0
drwxr-xr-x 21 nobody nogroup     4.0K Jul 26 06:53 .
drwxr-xr-x 23 root   root         580 Aug 28 10:13 ..
crw-------  1 nobody nogroup  10, 175 Jul  3 08:50 agpgart
crw-------  1 nobody nogroup  10, 235 Jul  3 08:50 autofs
drwxr-xr-x  2 nobody nogroup      400 Jul  3 08:50 block
drwxr-xr-x  2 nobody nogroup      120 Jul  3 08:50 bsg
crw-rw----  1 nobody nogroup  10, 234 Jul  3 08:50 btrfs-control
lrwxrwxrwx  1 nobody nogroup        3 Jul  3 08:50 cdrom -> sr0
lrwxrwxrwx  1 nobody nogroup        3 Jul  3 08:50 cdrw -> sr0

That means I can't start syslog. Still no go :(

Josef Dvoracek

Aug 28, 2019, 7:53:33 AM
to singu...@lbl.gov

Something like this?

jose@koios2:~/projects/container-recipes/generic_containers$ cat centos7_w_ssh.def
BootStrap: docker
From: centos:7

#singularity file -----------------------
%post
    yum -y install yum-utils openssh-server

    ssh-keygen -A

    #my user:
    echo "+:jose:ALL" > /etc/security/access.conf
    # tweaks so sshd can run without root: disable privilege separation. Works with OpenSSH 7, not with 8
    echo "UsePrivilegeSeparation no" > /etc/ssh/sshd_config

    find /etc/ssh/ -type d -exec chmod 755 {} +
    find /etc/ssh/ -type f -exec chmod 644 {} +

#singularity file -----------------------

Run with:

singularity exec centos7_w_ssh.sif /usr/sbin/sshd -p 12121 -D -d -e

Connect (since home is automounted, an authorized_keys file with my key is already there):


ssh localhost -p 12121
The authenticity of host '[localhost]:12121 ([::1]:12121)' can't be established.
ECDSA key fingerprint is SHA256:K8gLCw1b9ZicrCXdhh/V68XvuI9bTeHjY3XL3dxuTvk.
ECDSA key fingerprint is MD5:40:38:54:e0:95:c3:d1:e9:23:ab:b6:d9:d7:74:6d:36.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[localhost]:12121' (ECDSA) to the list of known hosts.
Attempt to write login records by non-root user (aborting)
Environment:
  USER=jose
  LOGNAME=jose
  HOME=/home/users/jose
  PATH=/usr/local/bin:/usr/bin
  MAIL=/var/mail/jose
  SHELL=/bin/sh
  SSH_CLIENT=::1 57690 12121
  SSH_CONNECTION=::1 57690 ::1 12121
  SSH_TTY=/dev/pts/17
  TERM=screen
-sh-4.2$ Connection to localhost closed by remote host.
Connection to localhost closed.


Josef Dvoracek
Institute of Physics @ Czech Academy of Sciences
cell: +420 608 563 558 | office: +420 266 052 669 | fzu phone nr. : 2669