Hey everyone, wanted to let the community know that we’ve been prototyping a way to make it really easy to turn a GitHub repo with Jupyter notebooks into a tmpnb deployment.
Having done a few custom tmpnb deployments for our own projects (e.g. codeneuro.notebooks.org), we realized it could be cool to streamline and scale the process, obviously building on all the awesome work done by the community so far.
Our working model is: users specify a GitHub repo (with notebooks), a set of dependencies (just Python ones, for now), and a set of services (like Spark); we automatically build Docker images and return a button that lets anyone launch an instance of the deployment (notebooks + services) on a cluster running Kubernetes. We’re testing now on AWS + GCE (hosting is flexible) and making sure everything works.
Would love to hear any feedback on the idea, and we’ll share a version here (with code) to test out as soon as it’s ready!
-- Jeremy Freeman and Andrew Osheroff
Hi all, thanks for the thoughts! Replies below.
Absolutely delighted to see this coming to fruition. Fantastic job, Jeremy, thanks so much!
Thanks for the encouragement Fernando!
You certainly have my blessing on getting hosting (via Rackspace), especially if we can run try.jupyter.org over it.
Really awesome Kyle, this is all *terrific* to hear. We're now looking into Rackspace + Kubernetes integration, and so far it seems reasonable (e.g. https://github.com/GoogleCloudPlatform/kubernetes/blob/release-1.0/docs/getting-started-guides/rackspace.md); Andrew might jump in here with more comments. And running try.jupyter.org over this sounds awesome. Once it's all set up, it could probably be a repo (with a button) like any other, but wrapped by a custom URL and some nice custom styles. That'd be fun to work on together. And we'll definitely take you up on the offer to evaluate / audit / help, will ping about it again soon!
Perhaps we'd like one of the places where such buttons would appear to be nbviewer?
Linking with nbviewer would be great! Once the deploy button is created the link could go anywhere, but it'd be cool if nbviewer automatically picked it up when displaying a repo (if it exists), or something like that.
Basically, look for a particular key (say, "deploy") and rebuild the files described there into the docker(-compose) build context.
Then namespace the build via notebook sha1/256, neatly handling naming and cache invalidation.
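A minimal sketch of that namespacing idea, assuming the notebook is already loaded as a dict. The function name `build_tag`, the `nb-deploy` prefix, and the choice to hash a canonical JSON serialization (rather than the raw file bytes) are all illustrative assumptions, not anything tmpnb or Docker prescribes:

```python
import hashlib
import json


def build_tag(notebook: dict, prefix: str = "nb-deploy") -> str:
    """Return an image tag derived from a SHA-256 digest of the notebook.

    Hypothetical helper: the prefix and the 12-character truncation
    are illustrative choices only.
    """
    # Serialize with sorted keys and fixed separators so that key order
    # and whitespace do not change the digest.
    canonical = json.dumps(notebook, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Identical notebooks map to the same tag (cache hit); any edit
    # yields a new tag (cache invalidation).
    return f"{prefix}:{digest[:12]}"
```

Because the tag is a pure function of the notebook content, an unchanged notebook reuses an existing build and any change produces a fresh name, which is the cache behaviour described above.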
Still plenty of reasons to support repos, but the metadata approach would give a closed form that could work for any provider... and be a structure that could drive other deployments (desktop, hpc, dashboard).
Data files would still be a concern, but if they were also addressable in an immutable format (commit blob, s3, magnet url) it would all kind of click.
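To illustrate what "immutable" addressing buys you, here is a small sketch using the git blob case: git names a blob by the SHA-1 of a short header plus the file content, so a pinned blob ID can be checked against downloaded bytes. The helper names `git_blob_sha1` and `matches_pin` are hypothetical, and S3 or magnet URLs would need their own verification scheme:

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Compute the object ID git assigns to a blob with this content."""
    # Git hashes "blob <size>\0" followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def matches_pin(content: bytes, pinned: str) -> bool:
    """Check downloaded bytes against a pinned blob ID (hypothetical helper)."""
    return git_blob_sha1(content) == pinned.lower()
```

If the bytes behind a name can never change without the name changing, the data file can be folded into the same hash-based cache key as the rest of the build.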
Pulling all of that together:
{
  "metadata": {
    "deploy": {
      "docker-compose.yml": {
        "kernel": {
          "image": "ipython/kernel",
          "volumes": [
            ".:notebooks/"
          ],
          "links": [
            "db"
          ]
        },
        "db": {
          "image": "postgres"
        }
      },
      "big-data-file.hdf5": "http://s3.amazon.com/2183718249184891481924",
      "requirements.txt": [
        "pandas"
      ]
    }
  },
  "cells": []
}
It seems like JSON (or YAML) content could live in the metadata as the tree object itself, rather than being string encoded. Otherwise, non-structured data is just a list of lines to be joined with newlines, while a bare string is a URL to fetch. Or use a mimebundle.
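As a rough sketch of how a builder might interpret those three value types, assuming the "deploy" dict has already been read out of the notebook metadata: the function name `plan_build_context` is hypothetical, and writing the compose tree out as JSON relies on the assumption that whatever reads the `.yml` file accepts JSON-formatted YAML.

```python
import json


def plan_build_context(deploy: dict):
    """Split "deploy" entries into file contents to write and URLs to fetch.

    Hypothetical helper; returns (files, downloads), both keyed by filename.
    """
    files = {}
    downloads = {}
    for name, value in deploy.items():
        if isinstance(value, dict):
            # Structured data is stored as a tree and serialized on write.
            # Assumption: the consumer accepts JSON-formatted YAML.
            files[name] = json.dumps(value, indent=2) + "\n"
        elif isinstance(value, list):
            # Non-structured data: a list of lines joined with newlines.
            files[name] = "\n".join(value) + "\n"
        elif isinstance(value, str):
            # A bare string is treated as a URL to download.
            downloads[name] = value
        else:
            raise TypeError(f"unsupported deploy entry for {name!r}")
    return files, downloads
```

For the example metadata, this would yield text for `docker-compose.yml` and `requirements.txt`, plus one pending download for `big-data-file.hdf5`; actually writing the files and fetching the URLs is left out of the sketch.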
Throw some UI around opt-in, config, and searching for docker images and kernel-level dependencies, and you basically have the whole reproducible compute environment described in a single data structure, customizable from the Notebook UI, which you can pass around via gist, email or whatever.