I'm developing a Jupyter Notebook for my team to use to catalogue and analyse some proprietary data. I'm ready to share it with the team for on-going execution and development. The team generally have Windows 10 workstations and are skilled engineers, though not data scientists. No one currently uses Jupyter.
I now realise I might have thoroughly misjudged Jupyter's ability to support this sort of working environment.
Option 1: Individual installations
This is the worst-case scenario. Anyone who wants to run or modify the notebook needs to install Jupyter. Anaconda is probably the best way to go, but it's a big, ugly, scary install. Worse, every user will have to install and manage additional libraries. Any notebook change that requires a kernel change will have to be manually applied to each installation.
Surely, being client-server, this is not the intention of Jupyter.
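One way to soften the per-installation drift described in Option 1 might be a check in the notebook's first cell that compares installed library versions against what the notebook expects. The following is only a minimal sketch: the package names and version pins in `REQUIRED` are hypothetical placeholders, not taken from the actual notebook, and `find_mismatches` is an illustrative helper name.

```python
from importlib import metadata

# Hypothetical pins for illustration; the real list depends on the notebook.
REQUIRED = {"numpy": "1.12", "pandas": "0.20"}


def find_mismatches(required, get_version=metadata.version):
    """Return {package: (wanted, installed_or_None)} for each mismatch.

    A package matches when the leading components of its installed
    version equal the components of the wanted version ("1.12" matches
    "1.12.1" but not "1.120.0").
    """
    problems = {}
    for name, wanted in required.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        wanted_parts = wanted.split(".")
        if installed is None or installed.split(".")[:len(wanted_parts)] != wanted_parts:
            problems[name] = (wanted, installed)
    return problems


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(REQUIRED).items():
        print(f"{name}: notebook expects {wanted}.x, found {installed}")
```

This does not remove the need for each user to manage their own installation; it only makes a mismatch visible before the rest of the notebook fails in a confusing way.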
Option 2: One server, many clients
The obvious alternative is to host the Jupyter server on a network accessible computer and have all users connect to it with a browser. That way there's only one shared installation to manage and each user just needs a URL to access it.
But there's a gotcha - the server expects the notebook to be on its own file system! So every user will access the same notebook file. This makes version control very problematic - no one can check out their own copy of the notebook for independent edit and commit sessions. Instead, changes will overwrite the only copy, and commits/reverts/diffs will have to be done on the server (or by mounting the server's file system).
Option 3: Server in Docker image, each user runs a container
Docker to the rescue? That way we can build/maintain one server image (and even version control it) and each user only needs to have a Docker engine installed to instantiate the image (which is a friendly 8GB download!!). They connect to their own container which, with a bit of scripting trickery, will be pointing at their own copy of the notebook.
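The "scripting trickery" for pointing each container at a user's own copy of the notebook could look roughly like the sketch below. It only builds the `docker run` argument list; the image name `team-notebook:latest` is a hypothetical placeholder, and the container path `/home/jovyan/work` is an assumption borrowed from the convention used by the Jupyter Docker Stacks images, so it would need adjusting for a custom image.

```python
import subprocess
from pathlib import Path


def docker_run_command(checkout_dir, image="team-notebook:latest", port=8888):
    """Build a `docker run` argument list mounting one user's checkout.

    `image` and the container-side mount path are assumptions for
    illustration, not values from the original setup.
    """
    host_dir = Path(checkout_dir).resolve()
    return [
        "docker", "run", "--rm",
        "-p", f"{port}:8888",                   # host port -> server port in the container
        "-v", f"{host_dir}:/home/jovyan/work",  # user's own copy of the notebook
        image,
    ]


if __name__ == "__main__":
    # Each user runs this from their own version-controlled checkout.
    subprocess.run(docker_run_command("."), check=True)
```

Each user would then browse to the chosen port on localhost, and any edits land in their own checkout on the host, where they can be committed as usual.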
This option only took 20 hours to investigate before discovering that it fundamentally sucks. Working with the kernel is tricky and demands lots of new skills. But the real showstopper is that nothing that shows a Qt window will work. The qtconsole we can do without, but part of our notebook shows a File Open dialog and the best way to do that is with a Qt Widget. With the server in a Docker container expecting an X Windows environment, and the client in a Windows browser, the Widget cannot be shown.
The Qt issue was the last of many, many issues trying to get the Docker option running. Everything from matplotlib to path mapping, from os library calls to ipywidgets needed to be investigated, tweaked, Googled, chopped and changed to work. I'm fairly convinced that these dramas would be on-going.
Conclusion
There are lots of discussions around Jupyter version control. There are lots of options for read-only sharing. And there's even a project for runtime-building a Docker container to provide executable access to a notebook. But there is scant advice on using Jupyter in a team environment.
Given the endless complications when the server is not natively running on the same machine as the client, I'm starting to believe Option 1 is the only sane way to go. Before I go to my colleagues with the crappy news, are there any other suggestions?
JBB
On Tue, Jun 6, 2017 at 4:12 PM, Jean Bigboute <jeanbi...@gmail.com> wrote:
> Thank you for posting this and I hope you are successful. As an occasional Jupyter user, I have struggled to understand its use for collaborative work when so much depends on future developments and additional infrastructure. In my work environment we can't deploy IT services such as servers, containers etc. on our own. We don't always have access to help sites either. It is hard to know when to push and when to punt. I don't feel so alone any more.
The usage for collaborative work should become easier soon. We had our
dev meeting last week, and a version that allows some[1] syncing via
Google Drive is almost ready (Ian did and does a fantastic job). You
will probably see that in JupyterLab in one of the next releases (at
least as an extension). ...
Finding the right balance between consumers of Jupyter that want to
deploy it on a large number of nodes with hundreds of users – ...
The documentation can definitely be improved (and as you might
have seen in an earlier mail, Jessica is starting to work with us on
that, and we are thrilled), but we can't cover all the ground. We also
tend to assume more and more that people have internet access to get
help. If you are interested in having our docs offline and we don't
provide a link, please open an issue and we'll try to find time to make
that happen. If you have experience deploying in a difficult environment
and have managed to work around some pain points, please let us know, or
even better, make a PR against our documentation. We'd love to have a
clearer way of expressing the advantages and drawbacks of each
deployment, but so far we haven't had the time to sit down and write
these up, as things are moving really fast even for us.