Hello fellow JupyterHub enthusiasts,
We've been trialling JupyterHub as a possible multi-tenant notebook solution at our org. We have a single Hub+proxy running on an OpenStack VM with the Toree and PySpark kernels, the default authenticator (with sss configured on the host to point to LDAP) and the default spawner, and things look great. The next step is rolling this out as a production solution to support ~200 users, and I was hoping to tap into the wisdom of the community on a few questions:
1) Remote Spawner
The way I understand it, with the default spawner the hub, the proxy and the single-user Jupyter servers all run on the same host. (Correct me if I'm wrong.) I don't think this is ideal for production, as there isn't complete user isolation (I understand we have the notion of a 'sandbox' root directory for each user, but we don't have CPU isolation, for example). Given this, the idea of the RemoteSpawner looks good: presumably the hub and proxy will run on host 1, while host 2 will run the remote single-user servers, at least isolating the hub and proxy from the user servers. If I understand correctly, the remote spawner does not do any load balancing, in the sense that it doesn't attempt to distribute single-user servers among n remote hosts. However, the GitHub documentation (https://github.com/zonca/remotespawner) suggests this code is still not production ready; is anyone using the remote spawner successfully in production?
(Also, while the Docker solution sounds great, unfortunately we don't have a Docker-ready production OS image at our org today.)
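To make concrete what I mean by distributing single-user servers among n remote hosts, here is a rough sketch of the placement logic I have in mind. It is not the remotespawner API, just a plain "least loaded host" picker; the host names and the `pick_host` helper are made up for illustration.

```python
from collections import Counter

# Hypothetical pool of remote hosts for single-user servers (names are made up).
REMOTE_HOSTS = ["nb-host-1", "nb-host-2", "nb-host-3"]


def pick_host(active_servers, hosts=REMOTE_HOSTS):
    """Return the host currently running the fewest single-user servers.

    active_servers maps username -> host for servers that are already running.
    Ties go to the host listed first.
    """
    load = Counter(active_servers.values())
    return min(hosts, key=lambda host: load[host])


if __name__ == "__main__":
    running = {"alice": "nb-host-1", "bob": "nb-host-1", "carol": "nb-host-2"}
    print(pick_host(running))  # nb-host-3, since it has no servers yet
```

Counting servers per host ignores how heavy each user's Spark driver is, so a real version would probably need per-host CPU and memory figures instead.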
2) Load Balancer
I'm wondering if I need a load balancer (with session stickiness) in front of multiple hub servers (possibly connecting to multiple remote single-user servers in the backend). I suppose the answer depends on what kind of load a single Hub server can take. Does anybody have any numbers I could use, or any documentation that would give me a starting point for an estimate? Given that the Hub+proxy server is going to be primarily forwarding HTTP requests, a single server should be able to handle the ~200 users that we have, correct?
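By session stickiness I mean that a given user should always land on the same Hub backend. A minimal sketch of that idea, assuming a hash of the username picks the backend (real load balancers more commonly stick on a cookie or the source IP; the backend names and `sticky_backend` are hypothetical):

```python
import hashlib

# Hypothetical Hub backends behind the load balancer (names are made up).
HUB_BACKENDS = ["hub-a", "hub-b"]


def sticky_backend(username, backends=HUB_BACKENDS):
    """Map a username to one backend deterministically (hash modulo pool size)."""
    digest = hashlib.sha256(username.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    # The same user is routed to the same backend on every request.
    print(sticky_backend("abinav") == sticky_backend("abinav"))  # True
```

One caveat with this scheme is that changing the number of backends remaps most users, which is part of why I would rather avoid multiple Hubs if a single one can cope.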
3) Resource Requirements
What sort of hardware (in terms of memory, CPU and disk) would work for a Hub+proxy server (assuming I'm running the single-user servers remotely)? Do you have any examples from production? From what I see, the Hub+proxy needs CPU and memory primarily for handling incoming HTTP requests, and disk for maintaining hub state. The single-user servers, on the other hand, should primarily need resources for running the Spark drivers themselves, but what sort of overhead would Jupyter (and the Hub bits) add to that?
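For the single-user side, this is the back-of-envelope arithmetic I have in mind. The per-user figures below are placeholders I made up, not measurements, and the overhead number is exactly the part I'm hoping someone can fill in from experience.

```python
def user_host_memory_gb(concurrent_users, driver_gb, overhead_gb):
    """Total memory needed across the hosts running single-user servers.

    driver_gb:   memory per Spark driver (placeholder value, not measured)
    overhead_gb: per-user Jupyter server and kernel overhead (placeholder value)
    """
    return concurrent_users * (driver_gb + overhead_gb)


if __name__ == "__main__":
    # Worst case: all ~200 users active at once, with made-up per-user figures.
    print(user_host_memory_gb(200, driver_gb=1.0, overhead_gb=0.25))  # 250.0
```

In practice I'd expect only a fraction of the 200 users to be active at the same time, so the real requirement should be lower than this worst case.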
Appreciate the help!
Regards,
Abinav