Load Testing JupyterHub


Thorin Tabor

Sep 2, 2016, 6:49:13 PM
to jup...@googlegroups.com
Hello,

We've been operating a JupyterHub instance for some time, and are now
looking to gain some insight into the maximum number of users or
notebooks that our current hardware can handle. Basically we want to do
some load testing on our JupyterHub environment.

Since we are surely not alone in doing this sort of testing, I wanted to
ask the Jupyter development community what options and tools are out
there, and which work well with JupyterHub. What are people using? What
are the best practices?

Thorin

Fernando Perez

Sep 2, 2016, 6:55:13 PM
to Project Jupyter, Shreyas Cholia, Ryan Lovett, Yong Qin
I think Yuvi at Wikimedia has been working on some tools for load monitoring, though I don't have a link handy.  And Ryan Lovett at Berkeley has also been looking into this, as well as folks at LBL.  I'm cc'ing a few of them here as I'm not sure if they are on the list or not (pretty sure Yuvi is, not sure about the others), so at least they are aware of the discussion...

Cheers,

--
Fernando Perez (@fperez_org; http://fperez.org)
fperez.net-at-gmail: mailing lists only (I ignore this when swamped!)
fernando.perez-at-berkeley: contact me here for any direct mail

DVD PS

Feb 16, 2017, 5:00:42 AM
to Project Jupyter, sch...@lbl.gov, ry...@berkeley.edu, yong...@lbl.gov


I'm wondering whether anyone has managed to do a load test. Here at University College London, we are also trying to get a service ready for students, but we won't deploy it until we have some understanding of how many users we can handle. The load test team is unaware of any tool they could use with JupyterHub (their script-recording tools fail on the JavaScript).

Any help would be really appreciated.

Thanks a lot,
David

Thomas Kluyver

Feb 16, 2017, 6:24:11 AM
to Project Jupyter
On 16 February 2017 at 10:00, DVD PS <d.perez...@ucl.ac.uk> wrote:
> we won't deploy it till we have some understanding of how many we can handle

I don't know of any tools to help you with this, unfortunately.

Each user's notebook server takes ~40MB of memory, and each Python kernel started adds another ~30MB before loading any libraries. But resource use will often be dominated by the code users run inside their notebooks.
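
As a back-of-the-envelope sketch of those figures: the function below only adds up the idle footprint (~40MB per notebook server, ~30MB per Python kernel). The function name and the `kernels_per_user` parameter are made up for illustration, and the result is a floor, not a forecast, since it ignores whatever the users' own code allocates.

```python
# Approximate idle footprints quoted above; both are rough figures.
NOTEBOOK_SERVER_MB = 40  # one single-user notebook server
PYTHON_KERNEL_MB = 30    # one Python kernel before any libraries are imported


def baseline_memory_mb(users, kernels_per_user=1):
    """Idle memory floor in MB for `users` users (hypothetical helper)."""
    per_user = NOTEBOOK_SERVER_MB + kernels_per_user * PYTHON_KERNEL_MB
    return users * per_user


print(baseline_memory_mb(100))                      # 7000
print(baseline_memory_mb(100, kernels_per_user=2))  # 10000
```

So 100 idle users with one kernel each need roughly 7GB before anyone runs a cell.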

Thomas

Doug Blank

Feb 16, 2017, 6:53:24 AM
to jup...@googlegroups.com
We've been using JupyterHub since it was almost ready to use, and there isn't an easy way to answer the question "how many students can a machine handle?" We use a variety of kernels, and some languages are better than others. We have some JavaScript kernels that have a small impact on the server, and Java kernels with which just a dozen students can bring a machine to its knees (at least for a short time, while compiling). IPython is pretty good, but of course it depends on what they are doing.

We use JupyterHub across Physics, Biology, and Computer Science. The Physics courses tend to have the highest load per cell, partly due to their style of processing, but also because of their problem sets.

Our server has 512 gigabytes of RAM, and 12 cores. It is actually just the head node on a cluster. Our goal is to get it set up such that a student's kernel would spin up on the node that is most available (probably using docker). But we have to figure out how that plays with the other jobs and the scheduler. 

I had tried a CPU limiter in the past. Our load is getting so high now, we may have to revisit that.

I'd be glad to help answer this question as we have a live, operating setup. Feel free to contact me directly if there is something I can do (e.g., provide other stats, run a test program, etc.)

-Doug
 

--
You received this message because you are subscribed to the Google Groups "Project Jupyter" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jupyter+unsubscribe@googlegroups.com.
To post to this group, send email to jup...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jupyter/CAOvn4qgxVXwS97MUQXxRQ8FioJh_DmjOTtcX9-KottgLmkVfJw%40mail.gmail.com.

For more options, visit https://groups.google.com/d/optout.

Brian Granger

Feb 16, 2017, 12:40:42 PM
to Project Jupyter
In general, the bottleneck is always going to be the
N_students*resources_per_student.

That varies widely by what the students are doing. In our data science
courses, we give each student 2GB RAM and approximately 4 students
per CPU core. The CPU stuff isn't typically the issue - RAM *always*
is.

For more basic python programming stuff, you could use less RAM per
student, but I wouldn't go below around 500MB per student.
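
A rough sketch of that rule of thumb, for anyone who wants to plug in their own numbers: the function name and the 64GB, 16-core machine are made-up examples, not figures from this thread, and the sketch reserves nothing for the Hub or the operating system.

```python
def max_students(ram_gb, cores, ram_per_student_gb=2.0, students_per_core=4):
    """Return (student limit, bottleneck) for one machine (hypothetical helper)."""
    by_ram = int(ram_gb // ram_per_student_gb)  # students that fit in memory
    by_cpu = cores * students_per_core          # students the cores can share
    bottleneck = "RAM" if by_ram <= by_cpu else "CPU"
    return min(by_ram, by_cpu), bottleneck


# Hypothetical 64GB, 16-core machine with the data science defaults above.
print(max_students(64, 16))                          # (32, 'RAM')

# Same machine at the 500MB floor suggested for basic Python courses.
print(max_students(64, 16, ram_per_student_gb=0.5))  # (64, 'CPU')
```

With 2GB per student the example machine runs out of RAM at 32 students, well before the 4-per-core rule would cap it at 64.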

Cheers,

Brian



--
Brian E. Granger
Associate Professor of Physics and Data Science
Cal Poly State University, San Luis Obispo
@ellisonbg on Twitter and GitHub
bgra...@calpoly.edu and elli...@gmail.com

Ryan Lovett

Feb 16, 2017, 6:42:16 PM
to DVD PS, Project Jupyter, Shreyas Cholia, Yong Qin
Hi all,

Sorry for not noticing this thread earlier. Yes, Yuvi set up a Selenium cluster to test a KubeSpawner-based deployment. He got as high as 1000 simultaneous active users before there was a problem, and even then the problem was with KubeSpawner itself.
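
To give a feel for the general shape of such a test, here is a minimal sketch of a concurrent-user harness using only the Python standard library. It is not Yuvi's setup: `simulated_session` is a hypothetical placeholder that just sleeps, and a real test would replace it with something that logs in, starts a server, and runs cells (for example by driving a browser with Selenium).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def simulated_session(user_id):
    """Hypothetical stand-in for one user's session.

    A real test would log in, spawn a server, and execute cells here.
    """
    time.sleep(0.01)
    return True


def run_load_test(n_users, session=simulated_session):
    """Run n_users sessions at once; report failures and the slowest session."""

    def timed(user_id):
        start = time.perf_counter()
        try:
            ok = bool(session(user_id))
        except Exception:
            ok = False  # any exception counts as a failed session
        return ok, time.perf_counter() - start

    # One thread per simulated user so all sessions overlap.
    with ThreadPoolExecutor(max_workers=n_users) as pool:
        results = list(pool.map(timed, range(n_users)))

    return {
        "users": n_users,
        "failures": sum(1 for ok, _ in results if not ok),
        "max_seconds": max(duration for _, duration in results),
    }


# Step the user count up and watch for failures or rising latency.
for n in (10, 50):
    print(run_load_test(n))
```

Stepping the user count up until failures appear or the slowest session degrades gives a rough ceiling for a given deployment.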




Ryan