Having been tipped off to a pipeline approach for sequencing a set of functions contained in docker containers that are created (and then destroyed) on the fly as part of a workflow
(https://www.stat.auckland.ac.nz/~paul/Reports/OpenAPI/NZcrime/nz-crime-pipeline.html), I wondered if I could do the same in Jupyter.
For example, the following command runs the contentmine scientific literature grabbing tool with a search on "aardvark":
docker run --rm --volume "${PWD}/cm":/contentmine --tty --interactive psychemedia/contentmine getpapers -q aardvark -o /contentmine/aardvark -x
I haven't tried it yet, but this should be easy enough to wrap up as a magic, allowing me to write something of the form:
%%docker psychemedia/contentmine -v "${PWD}/cm":/contentmine
getpapers -q aardvark -o /contentmine/aardvark -x
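A minimal sketch of such a magic, assuming it is written as an IPython cell magic (the `%%` form, since the command sits on the line after the image). The names `build_docker_argv` and `docker` are hypothetical. The magic line is taken as the image name followed by any `docker run` options, and the cell body as a single command. Shell variables are not expanded by the magic, so `${PWD}` is substituted by hand with the notebook's current directory.

```python
import os
import shlex
import subprocess
import sys


def build_docker_argv(line, cell):
    """Build a `docker run` argument list for a hypothetical %%docker cell magic.

    The magic line holds the image name followed by any `docker run` options;
    the cell body holds the single command to run inside the container.
    """
    # No shell is involved, so replace ${PWD} with the current directory by hand.
    line = line.replace("${PWD}", os.getcwd())
    parts = shlex.split(line)
    if not parts:
        raise ValueError("expected an image name on the magic line")
    image, docker_opts = parts[0], parts[1:]
    command = shlex.split(cell)
    # --rm removes the container when the command exits; --tty/--interactive
    # are left out because the output is captured rather than attached.
    return ["docker", "run", "--rm", *docker_opts, image, *command]


def docker(line, cell):
    """Cell magic body: run the command in a throwaway container, echo output."""
    result = subprocess.run(build_docker_argv(line, cell),
                            capture_output=True, text=True)
    print(result.stdout, end="")
    if result.returncode != 0:
        print(result.stderr, end="", file=sys.stderr)


try:
    # Register the magic only when running inside IPython/Jupyter.
    from IPython import get_ipython
    _ip = get_ipython()
    if _ip is not None:
        _ip.register_magic_function(docker, magic_kind="cell")
except ImportError:
    pass
```

Once that cell has been run in a notebook, a cell starting with `%%docker psychemedia/contentmine -v "${PWD}/cm":/contentmine` and containing the `getpapers` line should launch the container. Keeping the argument building separate from the subprocess call means the parsing can be checked without a docker daemon.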
I also started to wonder whether it would be possible to do this entirely within docker, e.g. by launching the Jupyter notebook server inside a container and giving it access to the docker daemon so that it could launch other containers, using a docker-compose configuration such as:
notebook:
  image: jupyter/notebook
  ports:
    - "8899:8888"
  volumes:
    - ./notebooks:/notebooks
    - /var/run/docker.sock:/var/run/docker.sock
  privileged: true
Inside the notebook container, we'd also need to install the docker CLI:
#Make sure docker is available in the Jupyter container
!apt-get update
!apt-get install -y docker.io
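A small check that could be run in a notebook cell after the install, to confirm that both pieces are in place. The function name `docker_preflight` is hypothetical, and the socket path is assumed to be `/var/run/docker.sock`, matching the volume mapping in the compose file.

```python
import os
import shutil
import stat


def docker_preflight(socket_path="/var/run/docker.sock"):
    """Check that the docker CLI is installed and the daemon socket is mounted."""
    # shutil.which returns the CLI's full path, or None if it isn't on PATH.
    cli = shutil.which("docker")
    try:
        # The bind-mounted daemon endpoint should show up as a Unix socket.
        is_socket = stat.S_ISSOCK(os.stat(socket_path).st_mode)
    except OSError:
        is_socket = False
    return {"cli": cli, "socket": is_socket}
```

Calling `docker_preflight()` should return a non-`None` `cli` path and `socket: True` if the install worked and the socket volume was mounted.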
I then hoped I'd be able to do something like:
!mkdir -p downloads
#Run a download command in another container and share the downloaded files back
! docker run --rm --volume "${PWD}/downloads":/contentmine --tty --interactive psychemedia/contentmine getpapers -q aardvark -o /contentmine/aardvark -x
but this appears to run from the perspective of the docker daemon, and doesn't mount the shared folder from the notebook container that launched the command-line container. (I can see the files being downloaded, presumably somewhere in the docker daemon's namespace, but they don't show up inside the Jupyter notebook.)
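A likely explanation: the daemon behind `/var/run/docker.sock` runs on the host, so the source half of `--volume SRC:DST` is resolved against the host's filesystem rather than the notebook container's. If the notebook's working directory is, say, `/notebooks`, then `${PWD}/downloads` becomes `/notebooks/downloads`, which on the host is a different directory from the one mapped by `./notebooks:/notebooks`. One way round this, sketched under that assumption, is to translate the container path into the corresponding host path before passing it to `docker run`. The function `to_host_path` is hypothetical, and the host directory behind each mount has to be supplied from outside, since the container can't see it.

```python
import posixpath


def to_host_path(container_path, mounts):
    """Map a path inside this container to the matching path on the docker host.

    `mounts` maps container mount points to absolute host directories, e.g. the
    compose entry `./notebooks:/notebooks` with its host side made absolute.
    """
    path = posixpath.normpath(container_path)
    # Try the longest (most specific) mount point first.
    for target in sorted(mounts, key=len, reverse=True):
        target_norm = posixpath.normpath(target)
        prefix = target_norm.rstrip("/") + "/"
        if path == target_norm or path.startswith(prefix):
            rel = posixpath.relpath(path, target_norm)
            return posixpath.normpath(posixpath.join(mounts[target], rel))
    raise ValueError(f"{container_path} is not under any bind-mounted directory")
```

For example, if the host's absolute path to `./notebooks` were passed into the notebook container (e.g. via a hypothetical `HOST_NOTEBOOKS_DIR` environment variable), `to_host_path("/notebooks/downloads", {"/notebooks": os.environ["HOST_NOTEBOOKS_DIR"]})` would give the directory to use as the `--volume` source.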
I couldn't work out a formulation that would allow me to share files between the Jupyter container and the short-lived command-line container launched from it.
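One formulation that may be worth trying, offered as an untested sketch: `docker run` has a `--volumes-from` flag that mounts the volumes of another container into the new one at the same paths, so the command-line container would see `/notebooks` exactly where the notebook container does. This assumes the notebook container's hostname is its short container ID, which is docker's default when no hostname is set. The helper name `volumes_from_argv` is hypothetical.

```python
import socket


def volumes_from_argv(image, command, workdir, container=None):
    """Build a `docker run` command that reuses this container's volumes."""
    # Assumption: the hostname is the (short) container ID, docker's default
    # when the compose file doesn't set an explicit hostname.
    container = container or socket.gethostname()
    return ["docker", "run", "--rm",
            "--volumes-from", container,
            # Start in the same directory so relative paths line up.
            "--workdir", workdir,
            image, *command]


# Example (needs the docker CLI and daemon socket, so not run here):
#   import subprocess
#   subprocess.run(volumes_from_argv(
#       "psychemedia/contentmine",
#       ["getpapers", "-q", "aardvark", "-o", "/notebooks/downloads/aardvark", "-x"],
#       workdir="/notebooks"), check=True)
```

Because every volume is shared, the child container would also get the docker socket mount; whether that matters depends on what the launched image does.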
Is anyone already running this sort of pattern? If so, can you let me know how to set the containers up so they can share files between themselves?
Thanks,
--tony