JupyterHub in production

jst...@tacc.utexas.edu

Apr 8, 2016, 3:17:36 PM
to Project Jupyter
Hi all,

First off, thanks for this incredible project! Our "preview-release" instance of JupyterHub at the Texas Advanced Computing Center (UT Austin) has been generating lots of excitement and we'd like to move to a production (or beta?) release. I was hoping you could give me some guidance on capacity planning. Currently, we're running the hub via DockerSpawner on a single VM. With that approach, do you have any (rough) estimates for the number of concurrent users that could be supported as a function of cores/memory/disk? Naturally, it will depend heavily on usage patterns, but any experience you could share would be much appreciated. For instance, if supporting any kind of "real" multi-user environment means moving to Swarm and a cluster, that would be great to know.

Thanks very much!
Joe

MinRK

Apr 11, 2016, 7:25:50 AM
to Project Jupyter

On Fri, Apr 8, 2016 at 9:17 PM, <jst...@tacc.utexas.edu> wrote:

> First off, thanks for this incredible project! [...] Currently, we're running the hub via DockerSpawner on a single VM. With that approach, do you have any (rough) estimates for the number of concurrent users that could be supported as a function of cores/memory/disk?

We don’t have precise estimates for how many concurrent users is too many, though we do know that JupyterHub has worked well with ~100 concurrent users (a few hundred total) on a couple of beefy machines. I’ve been trying to collect information on the resources various deployments have used for a given number of users, but haven’t done a good job of it so far.

The main common point is that memory is almost always the limiting factor, partly because memory is what Jupyter uses the most of, and partly because it is the resource whose exhaustion leads to the most unpleasant failures. Very little in the Jupyter stack itself is CPU intensive, so the CPU limit is determined almost entirely by the code users are running. And since oversubscribing CPUs behaves okay for many workloads, being stingy with CPU isn’t so bad (as long as it’s not a course on CPU optimization!).

The same goes for disk: really big notebooks with lots of high-res images run to tens of MB, but most are quite small. Docker, on the other hand, tends to need a lot of disk space to function properly unless you are super judicious about cleaning up; I wouldn’t run Docker on a VM with less than 40GB of disk. How much disk you need per user with Docker will depend significantly on how you mount things into the containers and what users do inside them. If they are regularly installing a whole new scipy stack, for instance, that’s gigabytes of files duplicated across containers.
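If it helps, here’s a minimal sketch of DockerSpawner settings aimed at keeping disk usage in check. The option names (container_image, remove_containers, volumes) come from dockerspawner but have moved around between versions, so treat this as an assumption to verify against your installed copy rather than a tested config:

    # jupyterhub_config.py -- a sketch, not a tested config
    c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'

    # One shared image for every user, so its layers are stored on disk only once.
    c.DockerSpawner.container_image = 'jupyter/scipy-notebook'

    # Remove containers when servers stop, instead of letting them accumulate.
    c.DockerSpawner.remove_containers = True

    # Keep user work on a host volume so it survives container removal;
    # DockerSpawner expands {username} per user.
    c.DockerSpawner.volumes = {
        '/srv/jupyterhub/users/{username}': '/home/jovyan/work',
    }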

As for memory, since JupyterHub starts a notebook server for each user, and Jupyter starts a new interpreter for each notebook, you are looking at a baseline of ~50 MB/user just for turning it on. After that, it’s highly usage dependent. If you are doing any kind of analysis, a few hundred MB is probably the minimum you would allocate to each user. I typically give 0.5-1GB per user, but you could easily need more.
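If you want to enforce those allocations rather than just plan around them, newer JupyterHub versions expose mem_limit and cpu_limit traits on the Spawner; how strictly they are enforced depends on the spawner (DockerSpawner can translate them into Docker resource limits), so a sketch, with values to taste:

    # jupyterhub_config.py -- per-user resource caps, a sketch
    c.Spawner.mem_limit = '1G'  # memory cap per single-user server
    c.Spawner.cpu_limit = 1.0   # CPUs per user; mild oversubscription is usually fine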

Here are some reference numbers to start with; build in some headroom, depending on usage:

CPU: 0.5-1 cores / user*
memory: 0.5-1GB / user**
disk: 2-10GB / user***

* The user count for CPU limits is based on truly concurrent users, actually looking at and running their notebooks at the same time. For interactive notebooks, people tend to spend more time writing/thinking than running code. Multi-threaded libraries and CPU-intensive code change this calculus.

** The user count for memory is concurrent servers, which are typically culled after an idle period. This varies widely, and can be ~15 minutes for aggressive deployments, or 12-24 hours for less resource-constrained ones; a sketch of a culling setup follows these footnotes.

*** The user count for disk is total users, since disk doesn’t get freed when a server stops. How much space people need varies a lot; you can cut this down substantially if you don’t allow users to perform installations and they aren’t likely to download or produce large data sets.
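On the culling point above: here is a sketch of running the cull_idle_servers.py script that ships in the JupyterHub examples as a hub-managed service. The services mechanism and the --timeout flag are assumptions to check against your JupyterHub version:

    # jupyterhub_config.py -- idle culling as a hub-managed service, a sketch
    import sys

    c.JupyterHub.services = [
        {
            'name': 'cull-idle',
            'admin': True,  # needs permission to stop other users' servers
            # stop servers idle for more than an hour; tune --timeout to taste
            'command': [sys.executable, 'cull_idle_servers.py', '--timeout=3600'],
        },
    ]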

Hope that helps,
-MinRK




jst...@tacc.utexas.edu

Apr 11, 2016, 1:27:24 PM
to Project Jupyter
That's extremely helpful, thanks very much! I'll be glad to provide deployment details and usage numbers as we get further along.

Best,
Joe

Matthias Bussonnier

Apr 11, 2016, 2:05:18 PM
to jup...@googlegroups.com

Would someone be kind enough to copy Min's response into the JupyterHub docs? I'll be on a plane soon, so I'll have difficulty doing it myself.

Thanks!
--
M

Brian Granger

Apr 11, 2016, 8:35:00 PM
to Project Jupyter
Great idea, Matthias.



--
Brian E. Granger
Associate Professor of Physics and Data Science
Cal Poly State University, San Luis Obispo
@ellisonbg on Twitter and GitHub
bgra...@calpoly.edu and elli...@gmail.com

Carol Willing

Apr 11, 2016, 8:49:22 PM
to Project Jupyter
Hi Matthias,

I opened an issue on JupyterHub to create a document on capacity planning guidance. I included much of Joe's question and Min's thorough response.

Safe travels!

Carol

Brian Granger

Apr 11, 2016, 9:03:23 PM
to Project Jupyter
Thanks @willingc

Fernando Perez

Apr 13, 2016, 3:06:52 AM
to Project Jupyter
On Mon, Apr 11, 2016 at 5:49 PM, Carol Willing <will...@gmail.com> wrote:
> I opened an issue on JupyterHub to create a document on capacity planning guidance. I included much of Joe's question and Min's thorough response.

Awesome, thanks so much! These are the nuggets that we want to capture in the docs...


--
Fernando Perez (@fperez_org; http://fperez.org)
fperez.net-at-gmail: mailing lists only (I ignore this when swamped!)
fernando.perez-at-berkeley: contact me here for any direct mail