Shibboleth + apache2 reverse proxy + .sparkmagic + spark + cookies = major headache

Bryn Smith

Oct 10, 2018, 1:02:21 PM
to Project Jupyter
Hi,
I promise I have searched extensively: the JupyterHub GitHub repository, jhub_shibboleth_auth, jhub_remote_user_authenticator, the Shibboleth docs, the Jupyter docs, and reverse-proxy material, even websockets with httpd.

tl;dr: With the proxy on, everything works until a kernel starts, so nothing ever reaches Spark. With the proxy off, users can only connect if they already have a cookie from an earlier login.

My setup:

Ubuntu 16.04
Jupyterhub 0.9.2
Shibboleth 2.6.1-1
Apache2 (httpd) 2.4.18
I've tried both jhub_remote_user_authenticator-0.0.2 and jhub_shibboleth_auth-1.3.0

The goal is a Shibboleth-protected JupyterHub instance that connects to Spark on a separate Hadoop cluster.

If I have the proxy turned on, users can log in without a problem, their notebook server starts, and everything is fine until they try to start a kernel. I've tried pyspark, sparkmagic, plain Python 3, and probably one or two others; none of them start.

If the proxy is turned off after the user already has the notebook cookie, they can reconnect to the non-proxied URL, launch their notebook, and the kernel works fine.

If a user without a cookie tries to log in while the proxy is turned off and connects to the non-proxied URL, they get a 403 Forbidden error: REMOTE_USER is not passed from Shibboleth to the hub, so they get no notebook and nothing else either.


My configs:
JupyterHub with the proxy on (with the proxy turned off, swap the two bind_url lines):

c.JupyterHub.admin_access = True

c.JupyterHub.hub_ip = '10.138.20.98'

c.JupyterHub.bind_url = 'https://127.0.0.1:8000'
#c.JupyterHub.bind_url = 'https://jupyter-dev.$UNIV.edu:8000'

c.Application.log_level = 'DEBUG'
c.JupyterHub.extra_log_file = '/var/log/jupyterhub.log'

c.JupyterHub.ssl_cert = '/etc/jupyterhub/ssl.crt'

c.JupyterHub.ssl_key = '/etc/jupyterhub/ssl.key'

c.Authenticator.admin_users = {'bryn'}

#c.Spawner.notebook_dir = '~/notebooks'

c.JupyterHub.authenticator_class = 'jhub_shibboleth_auth.shibboleth_auth.ShibbolethAuthenticator'
#c.JupyterHub.authenticator_class = 'jhub_remote_user_authenticator.remote_user_auth.RemoteUserAuthenticator'

Apache with proxy on and websockets:

<IfModule mod_ssl.c>
  <IfModule mod_proxy.c>
    <VirtualHost *:443>
      ServerAdmin help@$UNIV.edu
      ServerName jupyter-dev.$UNIV.edu

      ErrorLog ${APACHE_LOG_DIR}/error-ssl.log
      CustomLog ${APACHE_LOG_DIR}/access-ssl.log combined

      SSLEngine on

      SSLCertificateFile /etc/ssl/certs/ssl-cert-jupyter-dev.crt
      SSLCertificateKeyFile /etc/ssl/private/ssl-cert-jupyter-dev.key

      SSLCACertificatePath /etc/ssl/certs/
      SSLCACertificateFile /etc/ssl/certs/incommon-2015.crt

      ProxyVia On
      ProxyRequests Off
      ProxyPreserveHost on
      SSLProxyEngine on

      <Location />
        Authtype shibboleth
        ShibRequireSession On
        ShibUseHeaders On
        require shibboleth
        RequestHeader set REMOTE_USER %{REMOTE_USER}s
        ProxyPass wss://127.0.0.1:8000
        ProxyPassReverse wss://127.0.0.1:8000
        ProxyPassReverse https://127.0.0.1:8000/
      </Location>

      # <IfModule mod_wstunnel.c>
      #   <Location ~ "/jupyter/(user/[^/]*)/(api/kernels/[^/]+/channels|terminals/websocket)/?">
      #   </Location>
      # </IfModule>

    </VirtualHost>
  </IfModule>
</IfModule>
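The commented-out websocket block above only matches paths under a `/jupyter/` prefix, while this hub is served from `/`. (Separately, the module's identifier is `mod_proxy_wstunnel.c`, so `<IfModule mod_wstunnel.c>` would never be true even if the block were uncommented.) A quick way to check a `<Location ~ ...>` regex is to run it against sample request paths. The sketch below does that with Python's `re`, assuming the simple syntax used here behaves the same in Apache's PCRE; the sample paths (user `bryn`, kernel id `abc123`) and the `ROOT_PATTERN` variant with the prefix dropped are hypothetical.

```python
import re

# Pattern from the commented-out block: only matches under a /jupyter/ prefix.
PREFIXED_PATTERN = r"/jupyter/(user/[^/]*)/(api/kernels/[^/]+/channels|terminals/websocket)/?"
# Hypothetical variant for a hub served at the site root (prefix removed, anchored).
ROOT_PATTERN = r"^/(user/[^/]*)/(api/kernels/[^/]+/channels|terminals/websocket)/?"

# Hypothetical request paths: kernel websocket, terminal websocket, plain page.
SAMPLE_PATHS = [
    "/user/bryn/api/kernels/abc123/channels",
    "/user/bryn/terminals/websocket/1",
    "/user/bryn/tree",
]


def matches(pattern: str, path: str) -> bool:
    """Mimic Apache's unanchored <Location ~ ...> matching with re.search."""
    return re.search(pattern, path) is not None


if __name__ == "__main__":
    for path in SAMPLE_PATHS:
        print(f"{path}: prefixed={matches(PREFIXED_PATTERN, path)} "
              f"root={matches(ROOT_PATTERN, path)}")
```

With the hub at `/`, only the root variant matches the two websocket paths, and neither pattern matches `/user/bryn/tree`.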

Apache2 without the proxy directives:

<IfModule mod_ssl.c>
  <VirtualHost *:443>
    ServerAdmin help@$UNIV.edu
    ServerName jupyter-dev.$UNIV.edu

    ErrorLog ${APACHE_LOG_DIR}/error-ssl.log
    CustomLog ${APACHE_LOG_DIR}/access-ssl.log combined

    SSLEngine on

    SSLCertificateFile /etc/ssl/certs/ssl-cert-jupyter-dev.crt
    SSLCertificateKeyFile /etc/ssl/private/ssl-cert-jupyter-dev.key

    SSLCACertificatePath /etc/ssl/certs/
    SSLCACertificateFile /etc/ssl/certs/incommon-2015.crt

    <Location />
      Authtype shibboleth
      ShibRequireSession On
      ShibUseHeaders On
      require shibboleth
      RequestHeader set REMOTE_USER %{REMOTE_USER}s
    </Location>

  </VirtualHost>
</IfModule>


I have tried roughly 30 variations of the websocket proxy configuration and the main proxy configuration, with Shibboleth turned on and off, new sparkmagic configs, etc.
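One way to test each variation without a browser is to send a websocket upgrade request through Apache and check whether the reply is `101 Switching Protocols`. The sketch below uses only the Python standard library and the opening handshake from RFC 6455. The host, the kernel path, and the cookie passed to `probe()` are placeholders, and it assumes you copy a valid session cookie from a logged-in browser, since the kernel endpoint requires authentication. A status other than 101 only hints at where the upgrade is being lost; it is not a full diagnosis.

```python
import base64
import hashlib
import os
import socket
import ssl

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def make_key() -> str:
    """Random 16-byte nonce, base64-encoded, for Sec-WebSocket-Key."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def expected_accept(key: str) -> str:
    """Value a compliant server must return in Sec-WebSocket-Accept."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_request(host: str, path: str, key: str, cookie: str = "") -> bytes:
    """HTTP/1.1 upgrade request, similar to the one a browser sends."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        f"Origin: https://{host}",
    ]
    if cookie:
        lines.append(f"Cookie: {cookie}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def status_code(response: bytes) -> int:
    """Parse the status code from the first line of an HTTP response."""
    status_line = response.split(b"\r\n", 1)[0]
    return int(status_line.split()[1])


def probe(host: str, path: str, cookie: str = "", port: int = 443) -> int:
    """Send the upgrade over TLS and return the HTTP status (101 = upgraded)."""
    key = make_key()
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(build_upgrade_request(host, path, key, cookie))
            # A single recv is enough for a sketch; long headers may be cut off.
            response = tls.recv(4096)
    code = status_code(response)
    if code == 101 and expected_accept(key).encode("ascii") not in response:
        raise ValueError("101 received but Sec-WebSocket-Accept does not match")
    return code
```

Usage would look like `probe("jupyter-dev.example.edu", "/user/bryn/api/kernels/<kernel-id>/channels", cookie="...")`, with the host, kernel id, and cookie filled in from a real session.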

When the proxy is turned off, the user goes straight to JupyterHub on port 8000 (Tornado), bypassing Apache and therefore Shibboleth, so of course no REMOTE_USER is passed in. If they had already logged in, their cookie is still good, but I can't go through this song and dance for each new user.


This is what jupyterhub.log shows when the proxy is turned off; the hub is definitely not getting REMOTE_USER from Shibboleth, so /hub/login returns 403:
==> /var/log/jupyterhub.log <==
[I 2018-10-10 12:33:19.664 JupyterHub log:158] 302 GET / -> /hub (@10.237.5.144) 0.91ms
[I 2018-10-10 12:33:19.691 JupyterHub log:158] 302 GET /hub -> /hub/ (@10.237.5.144) 0.69ms
[I 2018-10-10 12:33:19.708 JupyterHub log:158] 302 GET /hub/ -> /hub/login (@10.237.5.144) 0.80ms
[D 2018-10-10 12:33:19.727 JupyterHub base:880] No template for 403
[W 2018-10-10 12:33:19.729 JupyterHub log:158] 403 GET /hub/login (@10.237.5.144) 3.34ms
[D 2018-10-10 12:33:19.894 JupyterHub log:158] 304 GET /favicon.ico (@10.237.5.144) 0.84ms
[D 2018-10-10 12:34:33.420 JupyterHub proxy:678] Proxy: Fetching GET http://127.0.0.1:8001/api/routes
[I 2018-10-10 12:34:33.432 JupyterHub proxy:301] Checking routes
[D 2018-10-10 12:39:33.420 JupyterHub proxy:678] Proxy: Fetching GET http://127.0.0.1:8001/api/routes
[I 2018-10-10 12:39:33.433 JupyterHub proxy:301] Checking routes
[D 2018-10-10 12:44:33.420 JupyterHub proxy:678] Proxy: Fetching GET http://127.0.0.1:8001/api/routes
[I 2018-10-10 12:44:33.433 JupyterHub proxy:301] Checking routes
[D 2018-10-10 12:49:33.420 JupyterHub proxy:678] Proxy: Fetching GET http://127.0.0.1:8001/api/routes
[I 2018-10-10 12:49:33.432 JupyterHub proxy:301] Checking routes
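The three scenarios and the 403 in this log fit a simple model of header-based login: a valid session cookie from an earlier login is enough on its own, a request carrying `REMOTE_USER` (which only Apache with Shibboleth sets) logs the user in, and anything else is refused. The sketch below is a hypothetical model of that decision, not the code of `jhub_shibboleth_auth` or `jhub_remote_user_authenticator`; the function name, the `session` cookie name, and the session store are invented for illustration.

```python
from typing import Dict, Optional, Tuple


def hub_login(
    headers: Dict[str, str],
    cookies: Dict[str, str],
    sessions: Dict[str, str],
    header_name: str = "REMOTE_USER",
) -> Tuple[int, Optional[str]]:
    """Hypothetical model of a header-trusting hub: (HTTP status, username)."""
    # 1. A valid session cookie from an earlier login is enough on its own.
    token = cookies.get("session")  # hypothetical cookie name
    if token is not None and token in sessions:
        return 200, sessions[token]
    # 2. Otherwise the username must arrive in the header Apache/Shibboleth sets.
    user = headers.get(header_name, "").strip()
    if user:
        return 200, user
    # 3. Direct access to port 8000 skips Apache, so there is no header: 403.
    return 403, None


if __name__ == "__main__":
    sessions = {"abc": "bryn"}  # hypothetical: bryn logged in earlier via the proxy
    print(hub_login({"REMOTE_USER": "newuser"}, {}, sessions))  # proxy on
    print(hub_login({}, {"session": "abc"}, sessions))          # proxy off, has cookie
    print(hub_login({}, {}, sessions))                          # proxy off, no cookie
```

Running it prints `(200, 'newuser')`, `(200, 'bryn')` and `(403, None)`. Because the cookie path never looks at the header, the model also shows why returning users still get in with the proxy off while new users cannot.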


I would appreciate any tips to try, sites to read, or things to consider. Let me know if you need more information, as well.

Evan Clark

Oct 10, 2018, 1:36:22 PM
to jup...@googlegroups.com
Any reason why you are proxying to wss instead of ws? In this config I'd expect SSL termination at Apache, not at JupyterHub. I had a similar problem, and it was caused by websockets not being proxied properly. I resolved it by moving the websocket proxy rules higher in the config and, I believe, by using mod_rewrite instead.
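The wss-versus-ws question comes down to matching the backend scheme to where TLS ends. If the hub keeps `ssl_cert`/`ssl_key` and an `https://` `bind_url`, Apache's backend targets are `https://` and `wss://`; if TLS terminates at Apache and the hub binds plain `http://`, they become `http://` and `ws://`. The helper below is a hypothetical illustration of that mapping, not part of JupyterHub or Apache.

```python
from urllib.parse import urlsplit, urlunsplit


def backend_targets(bind_url: str) -> tuple:
    """Hypothetical helper: (HTTP target, websocket target) for Apache's proxy rules."""
    parts = urlsplit(bind_url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unexpected scheme in bind_url: {parts.scheme!r}")
    # TLS at the hub means the websocket hop must be TLS too, and vice versa.
    ws_scheme = "wss" if parts.scheme == "https" else "ws"
    path = parts.path or "/"
    http_target = urlunsplit((parts.scheme, parts.netloc, path, "", ""))
    ws_target = urlunsplit((ws_scheme, parts.netloc, path, "", ""))
    return http_target, ws_target
```

For the posted config (`bind_url = 'https://127.0.0.1:8000'` with `ssl_cert` set) this gives `https://127.0.0.1:8000/` and `wss://127.0.0.1:8000/`; switching the hub to plain HTTP changes both to `http://` and `ws://`. Ordinary requests and websocket upgrades still need their own proxy rules; JupyterHub's reverse-proxy documentation shows a mod_rewrite rule conditioned on the `Upgrade: websocket` header for the websocket side.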

Regards,
Evan Clark
 

Bryn Smith

Oct 10, 2018, 3:08:08 PM
to Project Jupyter
The wss: vs. ws: is an artifact of trying lots of different variations. Moving the websocket proxy higher than what? The Shibboleth config lines?
I've just tried mod_rewrite instead of mod_proxy (with both ws: and wss: for the websockets), and it made no difference: the kernel still wouldn't connect. I'm agnostic about where SSL terminates.
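Since the SSL termination point is open, one option to try, assuming Apache keeps handling HTTPS for browsers, is to let the hub listen on plain HTTP on the loopback interface. The fragment below is a partial `jupyterhub_config.py` sketch, not a tested fix; the matching Apache rules would then target `http://127.0.0.1:8000/` for ordinary requests and `ws://127.0.0.1:8000/` for websocket upgrades.

```python
# jupyterhub_config.py (partial sketch): TLS terminates at Apache, not at the hub.
c = get_config()  # injected by the traitlets config loader when JupyterHub reads this file

# Plain HTTP on loopback only; assuming nothing else can reach port 8000,
# Apache on 443 is the only way in, so requests pass through Shibboleth.
c.JupyterHub.bind_url = 'http://127.0.0.1:8000'

# Leave these unset when Apache terminates TLS:
# c.JupyterHub.ssl_cert = '/etc/jupyterhub/ssl.crt'
# c.JupyterHub.ssl_key = '/etc/jupyterhub/ssl.key'

c.JupyterHub.authenticator_class = 'jhub_shibboleth_auth.shibboleth_auth.ShibbolethAuthenticator'
```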