Hi,
I promise I have searched extensively through the JupyterHub gitlab, the jhub_shibboleth_auth and jhub_remote_user_authenticator docs, the Shibboleth and Jupyter docs, and material on reverse proxies, including websockets with httpd.
tl;dr: With the proxy turned on, everything works until the kernel tries to connect to Spark; with the proxy off, users can connect only if they already have a cookie from a previous login.
My setup:
Ubuntu 16.04
Jupyterhub 0.9.2
Shibboleth 2.6.1-1
Apache2 (httpd) 2.4.18
I've tried both jhub_remote_user_authenticator-0.0.2 and jhub_shibboleth_auth-1.3.0
The goal is a Shibboleth-protected JupyterHub instance that connects to a Spark instance on a separate Hadoop cluster.
If I have the proxy turned on, users can log in without trouble and their notebook server starts up, but everything stops working as soon as they try to start a kernel. I've tried pyspark, sparkmagic, plain old python3, and probably one or two others, and none of them start.
If the proxy is turned off after the user already has the notebook cookie, they can reconnect to the non-proxied URL and launch their notebook, and the kernel is fine.
If a user without a cookie tries to log in at the non-proxied URL while the proxy is turned off, they get a 403 Forbidden error: REMOTE_USER is not passed from Shibboleth to the hub, so no notebook server is started for them.
My configs:
Jupyterhub with proxy on: (with the proxy turned off you would swap the bind_url lines)
c.JupyterHub.admin_access = True
c.JupyterHub.hub_ip = '10.138.20.98'
c.Application.log_level = 'DEBUG'
c.JupyterHub.extra_log_file = '/var/log/jupyterhub.log'
c.JupyterHub.ssl_cert = '/etc/jupyterhub/ssl.crt'
c.JupyterHub.ssl_key = '/etc/jupyterhub/ssl.key'
c.Authenticator.admin_users = {'bryn'}
#c.Spawner.notebook_dir = '~/notebooks'
c.JupyterHub.authenticator_class = 'jhub_shibboleth_auth.shibboleth_auth.ShibbolethAuthenticator'
#c.JupyterHub.authenticator_class = 'jhub_remote_user_authenticator.remote_user_auth.RemoteUserAuthenticator'
Apache with proxy on and websockets:
<IfModule mod_ssl.c>
<ifModule mod_proxy.c>
<VirtualHost *:443>
ServerAdmin help@$UNIV.edu
ServerName jupyter-dev.$UNIV.edu
ErrorLog ${APACHE_LOG_DIR}/error-ssl.log
CustomLog ${APACHE_LOG_DIR}/access-ssl.log combined
SSLEngine on
SSLCertificateFile /etc/ssl/certs/ssl-cert-jupyter-dev.crt
SSLCertificateKeyFile /etc/ssl/private/ssl-cert-jupyter-dev.key
SSLCACertificatePath /etc/ssl/certs/
SSLCACertificateFile /etc/ssl/certs/incommon-2015.crt
ProxyVia On
ProxyRequests Off
ProxyPreserveHost on
SSLProxyEngine on
<Location />
Authtype shibboleth
ShibRequireSession On
ShibUseHeaders On
require shibboleth
RequestHeader set REMOTE_USER %{REMOTE_USER}s
</Location>
# <ifModule mod_wstunnel.c>
# <Location ~ "/jupyter/(user/[^/]*)/(api/kernels/[^/]+/channels|terminals/websocket)/?"\ >
# </Location>
# </IfModule>
</VirtualHost>
</IfModule>
</IfModule>
Apache2 without proxy info:
<IfModule mod_ssl.c>
<VirtualHost *:443>
ServerAdmin help@$UNIV.edu
ServerName jupyter-dev.$UNIV.edu
ErrorLog ${APACHE_LOG_DIR}/error-ssl.log
CustomLog ${APACHE_LOG_DIR}/access-ssl.log combined
SSLEngine on
SSLCertificateFile /etc/ssl/certs/ssl-cert-jupyter-dev.crt
SSLCertificateKeyFile /etc/ssl/private/ssl-cert-jupyter-dev.key
SSLCACertificatePath /etc/ssl/certs/
SSLCACertificateFile /etc/ssl/certs/incommon-2015.crt
<Location />
Authtype shibboleth
ShibRequireSession On
ShibUseHeaders On
require shibboleth
RequestHeader set REMOTE_USER %{REMOTE_USER}s
</Location>
</VirtualHost>
</IfModule>
I have tried approximately 30 variations of the websockets proxy configuration and of the main proxy configuration, with Shibboleth turned on and off, with new sparkmagic configs, etc.
When the proxy is turned off, the user goes straight to tornado and bypasses Shibboleth, so of course their user info is not passed in. If they had already logged in, their cookie is still good, but I can't go through this song and dance for each new user.
This is what is in jupyterhub.log when the proxy is turned off; you can see that the hub is definitely not getting REMOTE_USER from Shibboleth:
==> /var/log/jupyterhub.log <==
[I 2018-10-10 12:33:19.664 JupyterHub log:158] 302 GET / -> /hub (@10.237.5.144) 0.91ms
[I 2018-10-10 12:33:19.691 JupyterHub log:158] 302 GET /hub -> /hub/ (@10.237.5.144) 0.69ms
[I 2018-10-10 12:33:19.708 JupyterHub log:158] 302 GET /hub/ -> /hub/login (@10.237.5.144) 0.80ms
[D 2018-10-10 12:33:19.727 JupyterHub base:880] No template for 403
[W 2018-10-10 12:33:19.729 JupyterHub log:158] 403 GET /hub/login (@10.237.5.144) 3.34ms
[D 2018-10-10 12:33:19.894 JupyterHub log:158] 304 GET /favicon.ico (@10.237.5.144) 0.84ms
[I 2018-10-10 12:34:33.432 JupyterHub proxy:301] Checking routes
[I 2018-10-10 12:39:33.433 JupyterHub proxy:301] Checking routes
[I 2018-10-10 12:44:33.433 JupyterHub proxy:301] Checking routes
[I 2018-10-10 12:49:33.432 JupyterHub proxy:301] Checking routes
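For what it's worth, here is my mental model of why the 403 on /hub/login shows up in the log above, as a minimal Python sketch. This is not the actual code of either authenticator package; the function and exception names are made up for illustration. I'm assuming the authenticator simply reads a REMOTE_USER request header and rejects the login when it is missing or empty.

```python
from typing import Mapping


class Forbidden(Exception):
    """Hypothetical stand-in for the HTTP 403 seen in jupyterhub.log."""


def resolve_remote_user(headers: Mapping[str, str],
                        header_name: str = "REMOTE_USER") -> str:
    """Return the username from the request headers, or raise Forbidden.

    Assumption: the header is only present when the request has passed
    through Apache + Shibboleth, which set it before forwarding.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    user = normalized.get(header_name.lower(), "").strip()
    if not user:
        # Direct connection to tornado: nothing upstream set the header.
        raise Forbidden("403: no %s header on the request" % header_name)
    return user


# Request that came through Apache/Shibboleth (proxy on):
print(resolve_remote_user({"REMOTE_USER": "bryn"}))

# Request that went straight to the hub (proxy off, no cookie):
try:
    resolve_remote_user({"Host": "jupyter-dev.example.edu"})
except Forbidden as err:
    print(err)
```

If that model is right, a user with no cookie can never log in at the non-proxied URL, because nothing in front of the hub sets the header.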
I would appreciate any tips to try or sites to read or things to consider. Let me know if you want more information, as well.