Calling docker command line containers from a Jupyter notebook container?


Tony Hirst

May 9, 2016, 5:59:24 AM
to Project Jupyter
Tipped off to a pipeline approach for sequencing a set of functions contained in docker containers created (and then destroyed) on the fly as part of a workflow - https://www.stat.auckland.ac.nz/~paul/Reports/OpenAPI/NZcrime/nz-crime-pipeline.html - I wondered if I could do the same in Jupyter.

If I'm running Jupyter on the desktop, for example via Anaconda, and also have docker running, I can run command line apps in docker containers working on shared files on the host.

For example, the following command runs the contentmine scientific literature grabbing tool with a search on "aardvark":

docker run --rm --volume "${PWD}/cm":/contentmine --tty --interactive psychemedia/contentmine getpapers -q aardvark -o /contentmine/aardvark -x

I haven't tried it yet, but it should be easy enough to put this into a magic form, for example allowing me to write something of the form:

%docker psychemedia/contentmine -v "${PWD}/cm":/contentmine
getpapers -q aardvark -o /contentmine/aardvark -x


I also started to wonder whether it would be possible to do this completely within docker, eg launching Jupyter notebook inside a container and giving it access to the docker daemon so it could launch other containers:

notebook:
  image: jupyter/notebook
  ports:
    - "8899:8888"
  volumes:
    - ./notebooks:/notebooks
    - /var/run/docker.sock:/var/run/docker.sock
  privileged: true 


In the notebook container, we'd also need to add a docker CLI:

#Make sure docker is available in the Jupyter container
!apt-get update
!apt-get install -y docker.io

I then hoped I'd be able to do something like:

!mkdir -p downloads
#Run a download command in another container and share the downloaded files back
! docker run --rm --volume "${PWD}/downloads":/contentmine --tty --interactive psychemedia/contentmine getpapers -q aardvark -o /contentmine/aardvark -x 

but this appears to run from the perspective of the docker daemon, and doesn't mount the shared folder in the notebook container that launched the command line container. (I can see the files download, presumably into the docker daemon namespace, but they can't be seen inside the Jupyter notebook.)

I couldn't work out a formulation that would allow me to share files between the Jupyter container and the briefly run command line container launched from the Jupyter container.

Is anyone already running this sort of pattern? If so, can you let me know how to set the containers up so they can share files between themselves?

Thanks,

--tony

Sam Moskwa

May 9, 2016, 7:05:58 AM
to Project Jupyter
Hi Tony,

If you plan to use cell magics then you can probably skip apt-get installing docker and instead pip install docker-py.

I'd probably let the jupyter container handle volumes and let the child containers access them using the --volumes-from option.

For example:

docker run --name jupyter -it -v /tmp/jupytershare:/jupytershare busybox /bin/sh
docker run --volumes-from=jupyter busybox touch /jupytershare/foo

Then in the jupyter shell confirm the file has been added $(ls /jupytershare)
And on the host $(ls /tmp/jupytershare)


Presumably there would be a range of other uses for dockermagic, where containers are not run to completion immediately and/or communication is required beyond simply leaving files in a shared volume.

I'm intrigued by the possibilities..

regards,
Sam



Tony Hirst

May 9, 2016, 12:10:55 PM
to Project Jupyter
Hi Sam

>If you plan to use cell magics then you can probably skip apt-get installing docker and instead pip install docker-py.

Where would the magics run, though? There's no docker command available in the basic notebook container?

I'd originally tried various variations on linked containers but got nowhere. Then I realised I was using the wrong linked container name! :-(

This seems to work - from notebookdockercli/docker-compose.yml:

notebook:
  image: jupyter/notebook
  ports:
    - "8899:8888"
  volumes_from:
    - contentmineshare
  volumes:
    - ./notebooks:/notebooks
    - /var/run/docker.sock:/var/run/docker.sock
  privileged: true

contentmineshare:
  image: psychemedia/contentmine 
  volumes:
    - /contentmine

Then I can run

!apt-get update
!apt-get install -y docker.io

then run the docker CLI command:

! docker run --rm --volumes-from notebookdockercli_contentmineshare_1  --tty --interactive psychemedia/contentmine getpapers -q rhinocerous -o /contentmine/rhinocerous -x

I can then see the files:

!ls /contentmine/rhinocerous/

The issue I had was using the wrong volumes-from name.. (I'm not sure how to pick up the name automatically?)
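One possible way to pick the name up automatically, as a sketch: under Docker, the entries in /proc/self/cgroup inside a container typically end with that container's 64-character ID, which could then be fed to inspect_container to recover the name. This relies on an assumption about Docker's cgroup naming, so it may not hold on every host setup:

```python
# Sketch: recover this container's ID by parsing /proc/self/cgroup, whose
# entries usually end in the 64-hex-character container ID when running
# under Docker (an assumption about Docker's cgroup layout, not a guarantee).
import re

def container_id_from_cgroup(cgroup_text):
    """Return the first 64-hex-char container ID found in the text, or None."""
    for line in cgroup_text.splitlines():
        match = re.search(r'([0-9a-f]{64})', line)
        if match:
            return match.group(1)
    return None

# Inside the notebook container you might then do something like:
# with open('/proc/self/cgroup') as f:
#     cid = container_id_from_cgroup(f.read())
# name = cli.inspect_container(cid)['Name']  # e.g. /notebookdockercli_notebook_1
```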


I take the point about docker-py, though. It could be better to use docker-py commands to create the data volume container from within the code used by the magic, and then issue the docker command line commands.

Tony Hirst

May 10, 2016, 6:24:33 AM
to Project Jupyter
I had another stab at this using docker-py.

If we know the name of the notebook container we're in, and we know the mount point of a shared directory, we can find the path to the directory that can be mounted as a volume when calling the command line container:

import docker

def getPath(container, mountdir):
    cli = docker.Client(base_url='unix://var/run/docker.sock')
    if cli.containers(filters={'name': container}):
        return [x['Source'] for x in cli.inspect_container(container)['Mounts']
                if x.get('Destination') == mountdir]
    return []

DD = getPath('/notebookdockercli_notebook_1', '/notebooks')[0]
! docker run -v {DD}:/contentmineTest --tty --interactive psychemedia/contentmine getpapers -q rhinocerous -o /contentmineTest/rhinocerous -x

Then it struck me that I was being silly too... all we really need is to link an appropriate volume into the command line container from the notebook container:

! docker run --rm --volumes-from notebookdockercli_notebook_1 psychemedia/contentmine getpapers -q rhinocerous -o /notebooks/maybe/rhinocerous -x

Doh!

In docker-py:

cli = docker.Client(base_url='unix://var/run/docker.sock')
container_id = cli.create_container(
    'psychemedia/contentmine',
    host_config=cli.create_host_config(volumes_from=['notebookdockercli_notebook_1']),
    command='getpapers -q rhinocerous -o /notebooks/testX/rhinocerous -x')
cli.start(container_id)

I'm not sure how to remove the container once it has run, though: it may take an arbitrary amount of time to complete, so how do we know when to remove it? start() doesn't seem to accept the docker run --rm switch. I suppose we could name the containers in a particular way, then do housekeeping at the end and remove them all?
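A hedged sketch of that naming-convention idea: give every container the magic launches a common name prefix (dockermagic- here is made up), then do a housekeeping pass that lists exited containers and removes the ones that match:

```python
# Housekeeping sketch: containers launched by the magic get a common name
# prefix (`dockermagic-` is illustrative), and a cleanup pass removes any
# that have exited. Docker reports names with a leading slash, hence lstrip.
PREFIX = 'dockermagic-'

def ours(container_names, prefix=PREFIX):
    """Pick out the containers this magic created, by name prefix."""
    return [n for n in container_names if n.lstrip('/').startswith(prefix)]

# With the docker-py 1.x Client API the cleanup loop might look like:
# cli = docker.Client(base_url='unix://var/run/docker.sock')
# for c in cli.containers(all=True, filters={'status': 'exited'}):
#     if ours(c['Names']):
#         cli.remove_container(c['Id'])
```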

Tony Hirst

May 11, 2016, 2:33:56 PM
to Project Jupyter
First attempt at some magics for a specific, canned example: https://gist.github.com/psychemedia/616d8586e055eb1e4b0193ac5a55b9ad