I'd like to use IPython/Jupyter to analyze data sitting on a supercomputer cluster. The supercomputer is a shared resource behind a bastion-host firewall. It is not easy to get to, and opening ports on it is probably a bad idea. Consider ssh the only good way to talk to things on the supercomputer; assume that password-less ssh is possible.
Based on the docs, I see that IPython/Jupyter involves three processes communicating with each other:
a) The kernel --- where the actual work gets done
b) The notebook --- talks to the kernel on one side, and to a web browser or Qt client on the other.
c) The user's client (web browser)
I would like to run (a) on the server and (c) on the user's desktop. (b) needs to run on the user's desktop too, since getting an HTTP proxy through to the server is just about impossible / impractical / against policy in this case.
I see that IPython already offers a way to run part (a) on the server while (b) runs on your local machine.
The problem here is that the connection between (a) and (b) runs over ZMQ ports opened on the server, which are then forwarded via SSH to the notebook (running, in this case, on the user's desktop machine). Port forwarding and running processes that open server ports are probably both against supercomputer policy.
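To make that concrete, here is a sketch of what the standard approach entails (the connection-file path is hypothetical; the field names are the ones written into a kernel connection file). Each kernel opens five ZMQ ports, and every one of them would need its own SSH forward:

    import json

    # Read the connection file the remote kernel writes on startup
    # (path is hypothetical; the real file lives in the kernel's runtime dir).
    with open("kernel-1234.json") as f:
        info = json.load(f)

    # Each kernel opens five ZMQ ports -- all of them would need forwarding.
    ports = [info[k] for k in ("shell_port", "iopub_port", "stdin_port",
                               "control_port", "hb_port")]
    forwards = " ".join("-L {0}:localhost:{0}".format(p) for p in ports)
    print("ssh -N {0} server".format(forwards))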
Instead, I'd like to connect the notebook and kernel via an ssh pipe. The notebook would launch a kernel by invoking ssh with a command line to run the kernel on the remote machine, something like:
ssh server kernel
The notebook would then talk to the kernel through a pipe to the ssh process, and the kernel would talk to the notebook via its own STDIN/STDOUT; a rough sketch of the client side follows the list below. I've built such a system for another project, and it really works nicely. I understand and am OK with the "downsides" of this approach:
a) The lifetime of the kernel depends on keeping an open ssh connection between the client and server machines.
b) A kernel could only serve one notebook.
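Here is a minimal sketch of the client side, under an assumed (entirely hypothetical) framing: every message is tagged with its channel name and length-prefixed, since all five kernel channels (shell, iopub, stdin, control, heartbeat) would have to share the single pipe. The "kernel" command is the stand-in from above:

    import json
    import struct
    import subprocess

    # Launch the kernel on the server; all traffic flows over this one pipe.
    proc = subprocess.Popen(["ssh", "server", "kernel"],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def send(channel, parts):
        # Tag each message with its channel name and length-prefix the frame.
        payload = json.dumps({"channel": channel, "msg": parts}).encode("utf-8")
        proc.stdin.write(struct.pack("!I", len(payload)) + payload)
        proc.stdin.flush()

    def recv():
        # Read one length-prefixed frame; return (channel, message parts).
        (length,) = struct.unpack("!I", proc.stdout.read(4))
        frame = json.loads(proc.stdout.read(length).decode("utf-8"))
        return frame["channel"], frame["msg"]

The messages themselves would still be the usual multipart Jupyter wire-protocol frames; this framing just moves them over the pipe instead of ZMQ sockets.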
Questions:
a) Assuming the infrastructure for this KIND of networking has already been built (it has), how difficult would it be to use it in IPython/Jupyter? (A sketch of what I have in mind for the server side follows this list.)
b) Would anyone on the Jupyter team be available to do this, or to advise/assist in doing it?
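For context on (a), my mental model of the server-side agent (the "kernel" command above) is that it would start the real kernel locally, connect to that kernel's ZMQ ports on localhost (which never leave the machine), and bridge them onto STDIN/STDOUT. A much-simplified sketch, shell channel only, one request/reply, ignoring binary buffers and the other four channels; the connection-file path is again hypothetical:

    import json
    import struct
    import sys
    import zmq

    # Assume the real kernel is already running locally and wrote this
    # connection file; its ports bind to localhost only.
    with open("kernel-1234.json") as f:
        info = json.load(f)

    ctx = zmq.Context()
    shell = ctx.socket(zmq.DEALER)  # client end of the kernel's shell channel
    shell.connect("tcp://127.0.0.1:{0}".format(info["shell_port"]))

    # Relay one framed request from STDIN to the kernel, and one reply back.
    # A real agent would poll all five channels in a loop.
    (length,) = struct.unpack("!I", sys.stdin.buffer.read(4))
    frame = json.loads(sys.stdin.buffer.read(length).decode("utf-8"))
    shell.send_multipart([p.encode("utf-8") for p in frame["msg"]])

    reply = [p.decode("utf-8") for p in shell.recv_multipart()]
    out = json.dumps({"channel": "shell", "msg": reply}).encode("utf-8")
    sys.stdout.buffer.write(struct.pack("!I", len(out)) + out)
    sys.stdout.buffer.flush()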
If we could get this going, it could be a BIG win for our lab: all the data are sitting on the supercomputer cluster, and we don't have good ways to access them. People are resorting to manually rsync'ing a bunch of files to their desktops for analysis, or even just to plot them! But there's really too much data to do that easily. Jupyter would be much better.
Thank you,
-- Bob