Jupyterhub + docker + swarm + can't connect to kernel - How to debug?


Ted Liefeld

Nov 4, 2016, 2:12:22 PM
to Project Jupyter

I have a JupyterHub (running on Ubuntu on AWS) that uses a custom authentication plugin and DockerSpawner to launch the single-user Jupyter servers.  With that configuration, all was well.

Now I am trying to transition to Docker Swarm to allow a bit more flexibility if/when the number of users goes up. I basically followed the instructions here:  https://zonca.github.io/2016/05/jupyterhub-docker-swarm.html.

After quite a bit of playing around (e.g. with the AWS VPC, swarm ports, etc.) I have things almost working, except that the Jupyter servers that get launched cannot connect to their kernels.  What is working:
1. user login
2. new docker container starts (on the remote swarm node)
    - startup logs look identical to the non-swarm version
    - then I see what look like HTTP 400 errors in the log, e.g.
      [W 2016-11-04 17:51:09.323 ted3 log:47] 400 GET /user/ted3/api/kernels/1ebfdfcf-0347-4b5c-af6b-8cd58b6749f6/channels?session_id=F88D1AA65C694970935729229155198A (69.173.127.25) 10.30ms referer=None

In the browser the hub comes up fine, it starts the individual server and loads a notebook, but then shows a 'Could not connect to kernel' error dialog.  AFAICT no errors appear in the hub logs.

So clearly there is a problem with the forwarding somewhere between the hub, the containerized jupyter and the containerized kernel.  My suspicion is that it's the last step.
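
One way I can think of to narrow this down is to exercise the kernel channels websocket directly, outside the browser. A minimal sketch, assuming the websocket-client package is installed; the hostname, kernel id and cookie value below are placeholders copied from a real session:

# Minimal handshake check against the kernel channels websocket.
# Requires `pip install websocket-client`; the URL, kernel id and cookie
# value are placeholders to be filled in from an actual session.
import websocket

url = ("wss://myhub.example.org/user/ted3/api/kernels/"
       "KERNEL-ID/channels?session_id=debug-test")

try:
    ws = websocket.create_connection(
        url,
        header=["Cookie: PASTE-BROWSER-COOKIE-HEADER-HERE"],  # placeholder
        timeout=10,
    )
    print("handshake OK")
    ws.close()
except Exception as e:
    # A 400/403 here (rather than a timeout) should show which hop is
    # rejecting the Upgrade request.
    print("handshake failed:", e)

Running the same check against the swarm node's port 8888 directly (bypassing the load balancer and proxy) should show whether the problem is inside the hub/proxy path or in front of it.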

Can anyone suggest how to go about debugging this?  For reference, my jupyterhub config is below.  The container image referenced is a slightly modified single-user container that just adds a couple of notebook extensions (which, according to the container logs, are installed and activated fine).  Also, this is running without SSL: HTTPS is provided by fronting everything with an AWS load balancer and restricting direct access to the HTTP port to the VPC it's in (which both the nodes and the hub are in).

Thanks for any suggestions,

=============== jupyterhub config below ===================
# Configuration file for jupyterhub.
c = get_config()
import os
pjoin = os.path.join
c.JupyterHub.confirm_no_ssl = True

# put the JupyterHub cookie secret and state db
# in /var/run/jupyterhub
runtime_dir = '/var/run/jupyterhub'
c.JupyterHub.cookie_secret_file = pjoin(runtime_dir, 'cookie_secret')
c.JupyterHub.db_url = pjoin(runtime_dir, 'jupyterhub.sqlite')

import sys
sys.path.append('/home/ubuntu/jupyter_auth')
import genomespaceauthenticator
c.JupyterHub.authenticator_class = genomespaceauthenticator.GenomeSpaceAuthenticator
c.GenomeSpaceAuthenticator.user_dir_path = "/jupyterhub/userdirs/"

# proxy settings
c.JupyterHub.proxy_api_ip = '0.0.0.0'
c.JupyterHub.proxy_auth_token = 'REDACTED'

c.JupyterHub.hub_ip = '0.0.0.0'
c.JupyterHub.extra_log_file = '/home/ubuntu/logs_jupyterhub/jupyterhub.log'

import dockerspawner
c.JupyterHub.spawner_class = dockerspawner.DockerSpawner
c.DockerSpawner.remove_containers = True

# The docker instances need access to the Hub, so the default loopback port doesn't work:
from jupyter_client.localinterfaces import public_ips
c.JupyterHub.hub_ip = public_ips()[0]

# The IP address (or hostname) the single-user server should listen on
c.Spawner.ip = '0.0.0.0'
c.DockerSpawner.container_image = "gssingleuser"

# mount the custom css where jupyterhub in the container expects it
c.DockerSpawner.volumes = {
    '/home/ubuntu/genomespace_jupyterhub_config/docker_custom':'/opt/conda/lib/python3.5/site-packages/notebook/static/custom',
    '/jupyterhub/userdirs/{username}' : '/home/jovyan/work',
    '/home/ubuntu/genomespace/combined' : '/combined',
    '/home/ubuntu/genomespace_jupyterhub_config/extensions' : '/jupyter-hub-extensions'
}

#
# CHANGES FOR SWARM
#
os.environ["DOCKER_HOST"] = ":4000"
from IPython.utils.localinterfaces import public_ips
c.JupyterHub.hub_ip = public_ips()[0]

# The docker instances need access to the Hub, so the default loopback port
# doesn't work. We need to tell the hub to listen on 0.0.0.0 because it's in a
# container, and we'll expose the port properly when the container is run. Then,
# we explicitly tell the spawned containers to connect to the proper IP address.
c.JupyterHub.proxy_api_ip = '0.0.0.0'
c.DockerSpawner.container_ip = '0.0.0.0'
c.DockerSpawner.use_internal_ip = False

c.DockerSpawner.hub_ip_connect = c.JupyterHub.hub_ip
#
# END CHANGES FOR SWARM
#
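
For completeness, one sanity check I can run with the config above is to make sure a container on a remote swarm node can actually reach the hub API at hub_ip. A minimal sketch to run inside a spawned container, assuming requests is available, the default hub API port of 8081, and a placeholder address for whatever public_ips()[0] resolved to:

# Quick reachability check of the hub API from inside a spawned container.
import requests

hub_api = "http://10.0.0.5:8081/hub/api"   # placeholder for hub_ip; 8081 is the default hub API port
try:
    resp = requests.get(hub_api, timeout=5)
    # Any HTTP response at all (even an auth error) means the container can
    # reach the hub over the network.
    print(resp.status_code, resp.text)
except requests.exceptions.RequestException as e:
    # A timeout or connection error means the swarm node cannot reach hub_ip.
    print("cannot reach hub:", e)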



Ted Liefeld

Nov 7, 2016, 6:09:51 PM
to Project Jupyter
Still hoping someone out there has an idea...

I installed net-tools in the container and found the following using netstat -tnlp:

sudo netstat -tnlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:33391         0.0.0.0:*               LISTEN      -              
tcp        0      0 127.0.0.1:52271         0.0.0.0:*               LISTEN      -              
tcp        0      0 127.0.0.1:36081         0.0.0.0:*               LISTEN      -              
tcp        0      0 127.0.0.1:49909         0.0.0.0:*               LISTEN      -              
tcp        0      0 0.0.0.0:8888            0.0.0.0:*               LISTEN      -              
tcp        0      0 127.0.0.1:39195         0.0.0.0:*               LISTEN      -              
tcp        0      0 127.0.0.1:51195         0.0.0.0:*               LISTEN      -   

I am wondering if the problem is that the kernel is listening on 127.0.0.1 instead of 0.0.0.0.  I'm still not sure how to control that from jupyterhub_config.py, though, since I already set the IPs either explicitly to the hub's address or to 0.0.0.0.
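
If the kernel bind address does turn out to matter, my understanding is that it would be controlled in the notebook server's own config inside the single-user image rather than in jupyterhub_config.py. A sketch, assuming a jupyter_notebook_config.py baked into the gssingleuser image (whether this is actually needed is an open question -- the 127.0.0.1 listeners above may just be the kernel ZMQ ports that only the notebook server in the same container talks to):

# Sketch of a jupyter_notebook_config.py for the single-user image.
c = get_config()

# The HTTP/websocket listener (already 0.0.0.0:8888 in the netstat output).
c.NotebookApp.ip = '0.0.0.0'

# Where kernels bind their ZMQ sockets (the default is 127.0.0.1).
c.KernelManager.ip = '0.0.0.0'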

Anyone?  Bueller?


Ted Liefeld

Nov 8, 2016, 2:25:39 PM
to Project Jupyter
Found a solution and posting it here for anyone who wants to do this in the future...

It seems the problem was websockets not making it through the ELB load balancer.  See this post for details:

        http://blog.flux7.com/web-apps-websockets-with-aws-elastic-load-balancing

Fundamentally, the fix is to change the listener on the (classic) load balancer to use SSL (secure TCP) as the protocol instead of HTTPS, so that websocket connections are also forwarded.
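
For anyone who wants to script the change instead of clicking through the console, here is a rough sketch using boto3 against a classic ELB; the load balancer name, instance port and certificate ARN are placeholders for your own setup:

# Replace an HTTPS:443 listener on a classic ELB with an SSL (secure TCP)
# listener so websocket upgrades are passed straight through.
import boto3

elb = boto3.client('elb')
lb_name = 'my-jupyterhub-elb'   # placeholder

# Drop the existing HTTPS listener on 443...
elb.delete_load_balancer_listeners(LoadBalancerName=lb_name,
                                   LoadBalancerPorts=[443])

# ...and recreate it as SSL -> TCP so the connection is forwarded verbatim.
elb.create_load_balancer_listeners(
    LoadBalancerName=lb_name,
    Listeners=[{
        'Protocol': 'SSL',
        'LoadBalancerPort': 443,
        'InstanceProtocol': 'TCP',
        'InstancePort': 8000,   # placeholder: whatever port the hub/proxy listens on
        'SSLCertificateId': 'arn:aws:iam::123456789012:server-certificate/placeholder',
    }],
)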

Case closed (for now)

MinRK

Nov 11, 2016, 1:57:37 PM
to Project Jupyter
Ted,

Sorry we didn't get to this before you did; a lot of Jupyter folks are at a Jupyter team meeting all week, so we are falling behind on GitHub and email. Thanks for posting your solution!

-MinRK

