Using Docker for a Kaggle-kernel-like feature


Aakash Sharma

Feb 13, 2020, 8:28:00 AM
to Project Jupyter
Hi, I understand JupyterHub has a Docker spawner that gives each user their own container, isolating users so they do not interfere with each other's work. But what if we want a Kaggle-kernel-like feature where, for each notebook, we spawn a container, install the required packages and libraries, build the environment, and assign it to the user? Essentially, multiple containers spawned with the required dependencies and assigned to a user. How might this be achieved? Thx, Aakash

Luciano Resende

Feb 13, 2020, 10:21:21 AM
to jup...@googlegroups.com
On Thu, Feb 13, 2020 at 5:28 AM Aakash Sharma <heyits...@gmail.com> wrote:

I believe we started in that direction with our Docker support in Jupyter Enterprise Gateway, but ended up focusing more on `docker swarm` support: https://jupyter-enterprise-gateway.readthedocs.io/en/latest/kernel-docker.html

This enables each kernel to run as an individual Docker container.

We also have support for Kubernetes, which in combination with
JupyterHub can give you pretty much what you describe above:

https://jupyter-enterprise-gateway.readthedocs.io/en/latest/kernel-kubernetes.html
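
In case a concrete example helps: once a gateway is running, a Notebook server can be pointed at it so that every kernel it starts is launched remotely as a container. A minimal sketch, assuming Notebook 6+ with the built-in gateway client; the hostname is a placeholder you would replace with your own Enterprise Gateway endpoint:

```python
# jupyter_notebook_config.py -- minimal sketch; the hostname below is an
# assumption, substitute your own Enterprise Gateway endpoint.
c.GatewayClient.url = "http://my-gateway-host:8888"
```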


--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

Jason Anderson

Feb 13, 2020, 10:46:57 AM
to jup...@googlegroups.com
Hi Aakash,

You could look at the named servers[1] feature for this. It allows users to spawn multiple Jupyter server containers assigned to them. A solution for your use case would probably require some additional functionality built into JupyterHub. We have explored something similar as a three-part solution.
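
As a side note, named servers can also be started programmatically through the Hub REST API, which may be the easiest way to drive this from an external service. A rough sketch, assuming named servers are enabled on the Hub and you have an API token allowed to start servers for the target user (the URL, user, and token below are placeholders):

```python
import requests

HUB_API = "http://hub.example.com/hub/api"   # placeholder Hub API URL
TOKEN = "REPLACE_WITH_API_TOKEN"             # placeholder API token

# Requires c.JupyterHub.allow_named_servers = True on the Hub.
# Start a named server called "experiment-1" for user "aakash".
resp = requests.post(
    f"{HUB_API}/users/aakash/servers/experiment-1",
    headers={"Authorization": f"token {TOKEN}"},
    # The JSON body is handed to the spawner as user_options.
    json={"notebook_url": "https://example.com/some-notebook.ipynb"},
)
resp.raise_for_status()
```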

Part 1: use the pre_spawn_hook[2] functionality of the spawner to adjust the spawn configuration based on something in the request. In our case, we inspected the request query string for some special arguments. In your case, those arguments could contain the location of the notebook and/or perhaps the package requirements definition. This information can be passed to the spawner as environment variables; other ways of passing the information may be possible.
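
To make Part 1 concrete, here is a minimal sketch of what the hook could look like in jupyterhub_config.py. It assumes the spawner exposes the originating request handler as spawner.handler (check this against your JupyterHub version), and the NOTEBOOK_URL / REQUIREMENTS_URL names are made up for illustration:

```python
# jupyterhub_config.py -- rough sketch of Part 1
def pre_spawn_hook(spawner):
    # Assumption: the spawner exposes the originating request handler;
    # if it does not in your version, pass the values via user_options instead.
    handler = getattr(spawner, "handler", None)
    if handler is not None:
        notebook_url = handler.get_query_argument("notebook_url", default="")
        requirements_url = handler.get_query_argument("requirements_url", default="")
        # Hand the values to the single-user container as environment variables.
        spawner.environment.update({
            "NOTEBOOK_URL": notebook_url,
            "REQUIREMENTS_URL": requirements_url,
        })

c.Spawner.pre_spawn_hook = pre_spawn_hook
```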

Part 2: extend a default JupyterLab Docker container with a custom start script that checks those environment variables. Maybe it fetches the notebook from the remote location and pip installs the required packages. I am not aware of a pre-built solution for this; repo2docker has the ability to do it, but last time I looked it was not possible to extract just the logic that scans for requirements.txt files and installs them; it can only output new Dockerfiles. Still a good place to steal from ;). Once the packages are installed, you call the original JupyterLab start script.
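
A rough sketch of such a start script, written in Python for readability. It assumes the base image's original entrypoint is start-notebook.sh (as in the jupyter/docker-stacks images) and reuses the made-up environment variable names from Part 1:

```python
#!/usr/bin/env python3
# start-custom.py -- rough sketch of Part 2 (untested)
import os
import subprocess
import sys
import urllib.request

notebook_url = os.environ.get("NOTEBOOK_URL")          # made-up name from Part 1
requirements_url = os.environ.get("REQUIREMENTS_URL")  # made-up name from Part 1

if requirements_url:
    # Fetch and install the requirements before the server starts.
    urllib.request.urlretrieve(requirements_url, "/tmp/requirements.txt")
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "-r", "/tmp/requirements.txt"]
    )

if notebook_url:
    # Drop the notebook into the user's home directory.
    urllib.request.urlretrieve(notebook_url, os.path.expanduser("~/notebook.ipynb"))

# Hand off to the image's original start script with whatever args were passed.
os.execvp("start-notebook.sh", ["start-notebook.sh"] + sys.argv[1:])
```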

Part 3 (optional): register a custom handler (via extra_handlers[3]) in JupyterHub if you want to make it easier to direct the user to your spawn URL. In Part 1, your spawn URL would have to look something like /spawn/{user_name}/{server_name}?notebook_url={notebook}. The server_name would be auto-generated, as each server must have a unique name; we used a hash of the input params. It is probably also possible to encode the input params (like your notebook name) directly as the server name.
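
A rough sketch of such a handler; the /launch path and the notebook_url parameter are made up for illustration, and the exact base URL prefix and extra_handlers behaviour can differ between JupyterHub versions:

```python
# jupyterhub_config.py -- rough sketch of Part 3
import hashlib
from urllib.parse import urlencode

from jupyterhub.handlers import BaseHandler
from tornado import web

class LaunchHandler(BaseHandler):
    """Redirect /launch?notebook_url=... to a named-server spawn URL."""

    @web.authenticated
    def get(self):
        notebook_url = self.get_query_argument("notebook_url", default="")
        # Each named server needs a unique name, so derive a short,
        # stable one from the input parameters.
        server_name = hashlib.sha1(notebook_url.encode()).hexdigest()[:10]
        user = self.current_user
        query = urlencode({"notebook_url": notebook_url})
        # Assumes the Hub is served under the default /hub prefix.
        self.redirect(f"/hub/spawn/{user.name}/{server_name}?{query}")

c.JupyterHub.extra_handlers = [(r"/launch", LaunchHandler)]
```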

Hope this gives you some ideas,
/Jason


[1]: https://jupyterhub.readthedocs.io/en/stable/reference/rest.html#enabling-users-to-spawn-multiple-named-servers-via-the-api
[2]: https://jupyterhub.readthedocs.io/en/stable/api/spawner.html#jupyterhub.spawner.Spawner.pre_spawn_hook
[3]: https://jupyterhub.readthedocs.io/en/stable/api/app.html#jupyterhub.app.JupyterHub.extra_handlers

Tim Head

Feb 13, 2020, 11:07:13 AM
to Project Jupyter
If you are looking for a production-grade setup that builds containers on demand and uses JupyterHub in the background, give https://github.com/jupyterhub/binderhub a spin.

T


Jason Anderson

Feb 13, 2020, 1:49:07 PM
to jup...@googlegroups.com
Tim, that's cool; I didn't realize BinderHub supported authentication. Definitely a more production-grade setup.

Aakash Sharma

Feb 13, 2020, 7:12:06 PM
to jup...@googlegroups.com
Thanks for your suggestions, I will check out these options.

best,
Aakash
