custom notebook deployments


Jeremy Freeman

Jul 31, 2015, 12:59:55 AM
to Project Jupyter

Hey everyone, wanted to let the community know that we’ve been prototyping a way to make it really easy to turn a GitHub repo with Jupyter notebooks into a tmpnb deployment.


Having done a few custom tmpnb deployments for our own projects (e.g. codeneuro.notebooks.org), we realized it could be cool to streamline and scale the process, obviously building on all the awesome work done by the community so far. 


Our working model is: users specify a GitHub repo (with notebooks), a set of dependencies (just Python ones, for now), and a set of services (like Spark); we automatically build Docker images and return a button that lets anyone launch an instance of the deployment (notebooks + services) on a cluster running Kubernetes. We’re testing now on AWS + GCE (hosting is flexible) and making sure everything works.
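
To make the working model above concrete, here is a minimal Python sketch of the "spec in, Docker image definition out" step. It is an illustration only, not the prototype's actual code: the `DeploymentSpec` class, the `render_dockerfile` helper, the `/home/notebooks` path and the choice of `jupyter/minimal-notebook` as base image are all assumptions made for this example. Services such as Spark are recorded in the spec but not rendered here, since they would presumably run as separate containers on the cluster.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DeploymentSpec:
    """Hypothetical description of one deployment (not the project's real schema)."""
    repo: str                                               # GitHub "user/name" slug
    dependencies: List[str] = field(default_factory=list)   # Python packages only
    services: List[str] = field(default_factory=list)       # e.g. ["spark"]


def render_dockerfile(spec: DeploymentSpec,
                      base_image: str = "jupyter/minimal-notebook") -> str:
    """Render Dockerfile text that clones the repo and installs its Python deps.

    The base image and the clone target directory are assumptions for this sketch.
    """
    lines = [
        f"FROM {base_image}",
        f"RUN git clone https://github.com/{spec.repo}.git /home/notebooks",
    ]
    if spec.dependencies:
        # One pip invocation keeps the image to a single dependency layer.
        lines.append("RUN pip install " + " ".join(spec.dependencies))
    lines.append("WORKDIR /home/notebooks")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    spec = DeploymentSpec(repo="someuser/somerepo",
                          dependencies=["numpy", "pandas"],
                          services=["spark"])
    print(render_dockerfile(spec))
```

The returned text could then be handed to whatever builds the image; the launch button would simply link to an endpoint that starts a container from that image.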


Would love to hear any feedback on the idea, and we’ll share a version here (with code) to test out as soon as it’s ready!


-- Jeremy Freeman and Andrew Osheroff

Fernando Perez

Jul 31, 2015, 1:24:56 AM
to jup...@googlegroups.com
On Thu, Jul 30, 2015 at 9:59 PM, Jeremy Freeman <freeman...@gmail.com> wrote:

Would love to hear any feedback on the idea, and we’ll share a version here (with code) to test out as soon as it’s ready!

Absolutely delighted to see this coming to fruition.  Fantastic job, Jeremy, thanks so much!


--
Fernando Perez (@fperez_org; http://fperez.org)
fperez.net-at-gmail: mailing lists only (I ignore this when swamped!)
fernando.perez-at-berkeley: contact me here for any direct mail

Kyle Kelley

Jul 31, 2015, 12:55:18 PM
to jup...@googlegroups.com, Jesse Noller
Jeremy and Andrew,

I'm really glad to hear about trying this style of deployment with Kubernetes, as well as the fact that you've got it down to "here's a simple button". This sounds wonderful! Most of the security ramifications of a setup like this have worried me, but I'd be really happy to help, review, audit, and contribute as much as possible.

You certainly have my blessing on getting hosting (via Rackspace), especially if we can run try.jupyter.org over it. I've been looking to update tmpnb to a different style, but have been focused on the other end of Jupyter -- the frontend...

Reach me anytime. I've loved all your work and hope we get to meet up soon.

-- Kyle Kelley




--
You received this message because you are subscribed to the Google Groups "Project Jupyter" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jupyter+u...@googlegroups.com.
To post to this group, send email to jup...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jupyter/d8be86fe-4e00-4f42-b4e0-b671c3e771c7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.




Nicholas Bollweg

Jul 31, 2015, 11:58:12 PM
to Project Jupyter
Perhaps we'd like one of the places where such buttons would appear to be nbviewer?

Jeremy Freeman

Aug 3, 2015, 11:11:01 PM
to Project Jupyter

Hi all, thanks for the thoughts! Replies below


Absolutely delighted to see this coming to fruition.  Fantastic job, Jeremy, thanks so much!


Thanks for the encouragement Fernando!


You certainly have my blessing on getting hosting (via Rackspace), especially if we can run try.jupyter.org over it.


Really awesome Kyle, this is all *terrific* to hear. We're now looking into Rackspace + Kubernetes integration, so far seems reasonable, e.g. https://github.com/GoogleCloudPlatform/kubernetes/blob/release-1.0/docs/getting-started-guides/rackspace.md, Andrew might jump in here with more comments. And running try.jupyter.org over this sounds awesome. Once it's all set up, it could probably be a repo (with a button) like any other, but wrapped by a custom URL and some nice custom styles. That'd be fun to work on together. And we'll definitely take you up on the offer to evaluate / audit / help, will ping about it again soon!


Perhaps we'd like one of the places where such buttons would appear to be nbviewer?


Linking with nbviewer would be great! Once the deploy button is created the link could go anywhere, but it'd be cool if nbviewer automatically picked it up when displaying a repo (if it exists), or something like that.

Nicholas Bollweg

Aug 5, 2015, 1:05:51 AM
to Project Jupyter
Keep up the good work!

From the nbviewer perspective, a GH-focused feature would be a great start... but what if the provision/config stuff could live in the notebook itself, as metadata?


Basically, look for a particular key (say, "deploy") and rebuild the files described there into the docker(-compose) build context.


Then namespace the build via notebook sha1/256, neatly handling naming and cache invalidation.
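
A minimal sketch of that namespacing idea in Python: hash a canonical serialization of the notebook and use a prefix of the digest as the image tag. The function name `build_namespace`, the `notebook-deploy` repository name and the 12-character tag length are assumptions made for illustration.

```python
import hashlib
import json


def build_namespace(notebook: dict, repository: str = "notebook-deploy") -> str:
    """Derive an image name of the form "<repository>:<digest prefix>".

    Serializing with sorted keys and fixed separators makes the digest
    independent of key order, so identical notebooks always map to the
    same tag (cache hit) and any edit yields a new one (cache invalidation).
    """
    canonical = json.dumps(notebook, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{repository}:{digest[:12]}"


if __name__ == "__main__":
    nb = {"metadata": {"deploy": {"requirements.txt": ["pandas"]}}, "cells": []}
    print(build_namespace(nb))
```

One design choice to weigh: hashing the whole notebook triggers a rebuild on every cell edit. Hashing only the "deploy" metadata instead would rebuild only when the environment description changes.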


Still plenty of reasons to support repos, but the metadata approach would give a closed form that could work for any provider... and be a structure that could drive other deployments (desktop, hpc, dashboard).


Data files would still be a concern, but if they were also addressable in an immutable format (commit blob, s3, magnet url) it would all kind of click.


Pulling all of that together:


{
  "metadata": {
    "deploy": {
      "docker-compose.yml": {
        "kernel": {
          "image": "ipython/kernel",
          "volumes": [
            ".:notebooks/"
          ],
          "links": [
            "db"
          ]
        },
        "db": {
          "image": "postgres"
        }
      },
      "big-data-file.hdf5": "http://s3.amazon.com/2183718249184891481924",
      "requirements.txt": [
        "pandas"
      ]
    }
  },
  "cells": []
}

It seems like JSON (or YAML) content could live in the metadata as the tree object itself, rather than being string-encoded. Otherwise, unstructured data is just a list of lines to be joined with newlines, while a bare string is a URL. Or use a mimebundle.
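
The encoding rules just described (tree = structured file, list = lines of text, string = URL) can be sketched in Python as a function that turns the "deploy" metadata into a plan for the build context. The name `plan_build_context` and the ("write" / "fetch") tuple format are hypothetical, chosen only for this example. Trees are serialized as JSON here, on the assumption that the consuming tool accepts JSON wherever it expects YAML.

```python
import json


def plan_build_context(notebook: dict) -> dict:
    """Map each entry under metadata.deploy to an action for the build context.

    Returns {filename: ("write", text)} for inline content and
    {filename: ("fetch", url)} for content addressed by URL.
    """
    deploy = notebook.get("metadata", {}).get("deploy", {})
    plan = {}
    for filename, value in deploy.items():
        if isinstance(value, str):
            # A bare string is treated as a URL to download into the context.
            plan[filename] = ("fetch", value)
        elif isinstance(value, list):
            # A list is unstructured text: one entry per line.
            plan[filename] = ("write", "\n".join(value) + "\n")
        elif isinstance(value, dict):
            # A tree is structured data, written out as JSON.
            text = json.dumps(value, indent=2, sort_keys=True) + "\n"
            plan[filename] = ("write", text)
        else:
            raise TypeError(f"unsupported deploy entry for {filename!r}")
    return plan


if __name__ == "__main__":
    nb = {"metadata": {"deploy": {"requirements.txt": ["pandas"]}}, "cells": []}
    for name, (action, payload) in plan_build_context(nb).items():
        print(name, action, repr(payload))
```

Returning a plan rather than writing files directly keeps the sketch free of side effects; a real builder would still have to decide how to download and verify the fetched files.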


Throw some UI around opt-in, config, and searching for docker images and kernel-level dependencies, and you basically have the whole reproducible compute environment described in a single data structure, customizable from the Notebook UI, which you can pass around via gist, email or whatever.

Brian Granger

Aug 21, 2015, 5:38:51 PM
to Project Jupyter
Jeremy,

On Thu, Jul 30, 2015 at 9:59 PM, Jeremy Freeman
<freeman...@gmail.com> wrote:
> Hey everyone, wanted to let the community know that we’ve been prototyping a
> way to make it really easy to turn a GitHub repo with Jupyter notebooks into
> a tmpnb deployment.
>
>
> Having done a few custom tmpnb deployments for our own projects (e.g.
> codeneuro.notebooks.org), we realized it could be cool to streamline and
> scale the process, obviously building on all the awesome work done by the
> community so far.
>
>
> Our working model is: users specify a GitHub repo (with notebooks), a set of
> dependencies (just Python ones, for now), a set of services (like Spark), we
> automatically build Docker images, and return a button that lets anyone
> launch an instance of the deployment (notebooks + services) on a cluster
> running Kubernetes. We’re testing now on AWS + GCE (hosting is flexible) and
> making sure everything works.

I *really* like this model and am wondering if it would make sense to
also think about deploying jupyterhub in this way as well.

>
>
> Would love to hear any feedback on the idea, and we’ll share a version here
> (with code) to test out as soon as it’s ready!
>

Yes, please share the code when it is ready - there are lots of folks
who would be interested in this.

Cheers,

Brian


>
> -- Jeremy Freeman and Andrew Osheroff



--
Brian E. Granger
Associate Professor of Physics and Data Science
Cal Poly State University, San Luis Obispo
@ellisonbg on Twitter and GitHub
bgra...@calpoly.edu and elli...@gmail.com

Matt Craig

Oct 6, 2015, 8:42:35 PM
to Project Jupyter
Hi Jeremy and Andrew,

Just tried this out today and it is amazing. I could imagine doing research with a much wider range of students using this without having to think about their hardware or the amount of storage they have. 

I spent about an hour today walking my iPad around the office reducing (calibrating) astronomy data with this: http://mybinder.org/repo/mwcraig/reducer-binder

Really incredible, and eager to see this progress!

Matt Craig

Jeremy Freeman

Oct 7, 2015, 12:01:21 AM
to Project Jupyter
Thanks Matt! Really appreciate the feedback, and what a great example, that interactive table / image viewer is really cool! Also love that you were doing it from an iPad =)

-- Jeremy

Tony Hirst

Oct 30, 2015, 1:49:00 PM
to Project Jupyter
Jeremy

I've been meaning to try to get my head round the consequences of this approach for all manner of open education projects for way too long, and started having a look again today... and wondered whether you could answer a couple of questions...

My understanding is that I create an image from a username/repo slug - I do not need to own the repo - and an image is created. If I then go to http://mybinder.org/repo/username/repo a container is fired up from the image. If I go to the URL again, another container is fired up.

First question - is that a correct understanding?!

Then the real questions that come immediately to mind:

- how long do containers persist? eg we're running a FutureLearn course at the moment that makes use of IPython/Jupyter notebooks (https://www.futurelearn.com/courses/learn-to-code), but it requires learners to install Anaconda (which has caused a few issues). The course lasts 4 weeks, with learners studying a couple of hours a day maybe two days a week. Presumably, the containers are destroyed as a matter of course according to some schedule or rule - but what rule? I guess learners could always save and download their notebooks to the desktop and then upload them to a running server?

- how does the system scale? eg I'm not suggesting we point FutureLearn learners to mybinder (several thousand learners signed up...) but if I wanted to try to persuade my institution to use the binder approach, how easily would it scale? eg how easy is it to give a credit card to some back-end hosting company, get some keys, plug them in as binder settings and just expect it to work? (You can probably guess at my level of devops/sysadmin ability!;-)

Then more roadmap style questions:

- how easy is it to add additional services? eg something like RStudio, for example, or OpenRefine? I haven't had a look through the repo yet - are you using Docker Compose to tie things together?

- how easy would it be to set up the system to use an alternative kernel - eg to support a Ruby or R course, for example? (I notice that tmpnb.org offers a variety of kernels, for example?)

- are you looking at closer integration with github - eg so if I log in with github, I could then save back to my repo?

Once again, thanks for sharing this - it's rich with possibilities, if only I could persuade my institution to agree!;-)

Will be pondering it muchly over the next few days...:-)

tony