Questions on scaling out Jupyter Hub


abhinav...@gmail.com

unread,
May 23, 2016, 3:02:01 AM5/23/16
to Project Jupyter
Hello fellow Jupyter Hub enthusiasts,

We've been trialling Jupyter Hub as a possible multi-tenant notebook solution at our org. We have a single Hub+proxy running on an OpenStack VM with the Toree and PySpark kernels, with the default authenticator (but sssd configured on the host to point to LDAP) and the default spawner, and things look great. So the next step is rolling this out as a production solution to support ~200 users, and I was hoping to tap into the wisdom of the community on a few questions that I have:

1) Remote Spawner
The way I understand it, the default spawner launches the hub, the proxy and the single user Jupyter servers on the same host. (Correct me if I'm wrong.) I don't think this is the ideal setup for production, as there isn't complete user isolation (I understand we have the notion of a 'sandbox' root directory for each user, but we don't have CPU isolation, for example). Given this, the idea of the RemoteSpawner looks good - presumably the hub and proxy will run on host 1, while host 2 runs the remote single user servers, at least isolating the hub and proxy from the user servers. If I understand correctly, the remote spawner does not do any load balancing, in the sense that it doesn't attempt to distribute single user servers among n remote hosts. However, the GitHub documentation (https://github.com/zonca/remotespawner) suggests this code is still not production ready; is anyone using the remote spawner successfully in production?
(Also, while the Docker solution sounds great, unfortunately, we don't have a docker-ready production OS image at our org today.)
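(For anyone following along: switching spawners is just a config change in jupyterhub_config.py. A minimal sketch, assuming the zonca/remotespawner package is installed - only spawner_class and hub_ip are standard JupyterHub settings; any remote-host options the package needs are in its README and are not shown here:)

```python
# jupyterhub_config.py -- sketch; assumes zonca/remotespawner is installed.
# c.JupyterHub.spawner_class and c.JupyterHub.hub_ip are standard JupyterHub
# config; RemoteSpawner's own host/SSH settings are package-specific and
# deliberately omitted -- check the package README for the actual trait names.
c.JupyterHub.spawner_class = 'remotespawner.RemoteSpawner'

# The remote single-user servers must be able to reach the Hub's API,
# so bind the Hub to an address visible from the remote host rather
# than the default localhost.
c.JupyterHub.hub_ip = '0.0.0.0'
```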

2) Load Balancer
I'm wondering if I need a load balancer (with session stickiness) in front of multiple hub servers (possibly connecting multiple remote single user servers in the backend). I suppose the answer depends on what kind of load a single Hub server can take. Does anybody have numbers for me to absorb, or any documentation that would give me a starting point for an estimate? Given that the Hub+proxy server is going to primarily be forwarding HTTP requests, a single server should be able to handle the ~200 users that we have, correct?

3) Resource Reqs
What sort of hardware (in terms of memory, CPU and disk) would work for a Hub+Proxy server (assuming I'm running single user servers remotely)? Do you have any examples from production? From what I see, the Hub+Proxy needs CPU and memory primarily for incoming HTTP requests, and disk for maintaining hub state. The single user servers, on the other hand, should primarily need resources for running the Spark drivers themselves, but what sort of overhead would Jupyter (and the Hub bits) add to that?

Appreciate the help!

Regards,
Abhinav

Tim Head

unread,
May 23, 2016, 3:19:18 AM5/23/16
to jup...@googlegroups.com
Hi Abhinav,


On Mon, May 23, 2016 at 9:02 AM <abhinav...@gmail.com> wrote:


> 1) Remote Spawner

Have you looked into the docker swarm spawner? Using docker to spawn kernels would allow you to do resource limiting (CPU and RAM) and isolate your users from each other. Once you have the docker spawner working it is not a lot of extra work to switch to docker swarm which will allow you to distribute kernels across multiple machines (for free).
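(Roughly, the swap Tim describes is again just a spawner change in jupyterhub_config.py. A minimal sketch, assuming the dockerspawner package is installed - the image name and limit values are placeholders to be sized for your own workloads:)

```python
# jupyterhub_config.py -- sketch; assumes the dockerspawner package.
# Start with DockerSpawner on a single host, then swap in SwarmSpawner
# to distribute containers across a swarm with the same config.
c.JupyterHub.spawner_class = 'dockerspawner.SwarmSpawner'

# Per-user resource limits (the CPU/RAM isolation mentioned above);
# the values here are placeholders, not recommendations.
c.Spawner.mem_limit = '4G'
c.Spawner.cpu_limit = 2.0

# Image each single-user server runs in; any image with jupyterhub
# installed should work.
c.DockerSpawner.image = 'jupyter/scipy-notebook'
```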

> 3) Resource Reqs

The question of resources came up a while ago and some of the wisdom from that thread got collected here: https://github.com/jupyterhub/jupyterhub/issues/505

T

abhinav...@gmail.com

unread,
May 23, 2016, 5:56:50 AM5/23/16
to Project Jupyter

Thanks Tim! Comments inline.

On Monday, May 23, 2016 at 12:49:18 PM UTC+5:30, Tim Head wrote:
> Hi Abhinav,
>
> On Mon, May 23, 2016 at 9:02 AM <abhinav...@gmail.com> wrote:
>
> 1) Remote Spawner
>
> Have you looked into the docker swarm spawner? Using docker to spawn kernels would allow you to do resource limiting (CPU and RAM) and isolate your users from each other. Once you have the docker spawner working it is not a lot of extra work to switch to docker swarm which will allow you to distribute kernels across multiple machines (for free).
Yes, this seems perfect but unfortunately, we don't have a docker-ready production OS image in our org. I'm pushing for it, but it's probably not going to happen, which is why I'm looking at the remote spawner instead.
 

> 3) Resource Reqs
>
> The question of resources came up a while ago and some of the wisdom from that thread got collected here: https://github.com/jupyterhub/jupyterhub/issues/505
This is great! From this, it looks like the hub or the proxy being able to handle 100s of concurrent users is not the problem, but fitting those 100s of users onto a single machine is. If I have to distribute my 200 users between, say, two hosts, then I'll need a load balancer in front of two Hub+Proxy servers, correct?
Also, I'd be interested in knowing what sort of config people are using for the machines running the Hub+Proxy processes.

Regards,
Abhinav
 
