Docker options: read-only and user breaking my tool


Miguel Vazquez

Feb 12, 2016, 9:07:15 AM
to common-workflow-language
Hello,

I'm writing CWL files for some of my tools and I've run into a technical problem; actually, two. When I run my tool, which relies on a Docker image, the following options are added to my command line: --read-only=true and --user=15167

 docker run -i --volume=/home/mvazquezg/test/muts.100000:/tmp/job903869115_test/muts.100000:ro --volume=/home/mvazquezg/tmp/rbbt:/tmp/job_output:rw --volume=/tmp/tmpGmGd8H:/tmp/job_tmp:rw --workdir=/tmp/job_output --read-only=true --user=15167 --rm --env=TMPDIR=/tmp/job_tmp mikisvaz/rbbt-sequence rbbt workflow task Sequence mutated_isoforms_fast --mutations /tmp/job903869115_test/muts.100000 --organism Hsa/feb2014 > /home/mvazquezg/tmp/rbbt/out.txt

I've checked the CWL library and found this on job.py

        if img_id:
            runtime = ["docker", "run", "-i"]
            for src in self.pathmapper.files():
                vol = self.pathmapper.mapper(src)
                runtime.append("--volume=%s:%s:ro" % vol)
            runtime.append("--volume=%s:%s:rw" % (os.path.abspath(self.outdir), "/tmp/job_output"))
            runtime.append("--volume=%s:%s:rw" % (os.path.abspath(self.tmpdir), "/tmp/job_tmp"))
            runtime.append("--workdir=%s" % ("/tmp/job_output"))
            runtime.append("--read-only=true")
            euid = docker_vm_uid() or os.geteuid()
            runtime.append("--user=%s" % (euid))

The --read-only option breaks my command because it needs to write a few lock files in a certain directory. The --user option breaks it as well, because user 15167 does not exist in my Docker image, and it is certainly not the 'rbbt' user, which has euid 1000.

If I comment out both lines in job.py, everything is fine. Why is it read-only? Is there a proper way to avoid it? Can I at least make a particular directory writable, e.g. the lock file directory? Why is the wrong user, 15167, used? It looks like a bug.

Thanks very much

Miguel                                                                                                              

denis...@gmail.com

Feb 12, 2016, 10:35:10 AM
to common-workflow-language
Hi,

For the first issue, I don't know the reasoning behind making the filesystem read-only; this stung us too.
We did, however, find a workaround: declare every directory that needs read/write access as a volume in your Dockerfile
( https://github.com/common-workflow-language/cwltool/issues/30 )

Peter Amstutz

Feb 16, 2016, 9:00:21 AM
to Miguel Vazquez, common-workflow-language

Hi Miguel,

The reasoning for making the root FS read-only is to encourage portability (and give implementers more flexibility) by limiting where the tools can write.  Denis's suggestion of specifying volumes in your Dockerfile is a fine solution, since that explicitly indicates that you need to write to that directory.

The reason for --user is so that the user id inside the container matches the user invoking the container. This has two benefits: better security (the container user is not root), and it avoids the problem of output files being written as a different user such that the host user doesn't own the output.
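
The ownership problem described above can be checked on the host after a run. The following is only a minimal sketch, not part of cwltool: the helper name `files_not_owned_by` is hypothetical, and it assumes a POSIX host where `os.geteuid()` is available.

```python
import os


def files_not_owned_by(outdir, uid):
    """Return the paths under outdir whose owner uid differs from uid.

    Hypothetical helper for illustration; it only inspects ownership
    and does not change anything on disk.
    """
    foreign = []
    for root, dirs, files in os.walk(outdir):
        for name in dirs + files:
            path = os.path.join(root, name)
            # lstat so that symlinks are reported by their own owner
            if os.lstat(path).st_uid != uid:
                foreign.append(path)
    return sorted(foreign)
```

Calling `files_not_owned_by(outdir, os.geteuid())` after a container run would list output files that the invoking user does not own, which is the situation the --user option is meant to prevent.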

Docker 1.10 introduces user namespace mapping, which mitigates some of these issues.  I'm also experimenting with using the setgid bit in the output directory as another way of ensuring that the host user can access files created by the container user.

Thanks,
Peter

Miguel Vazquez

Feb 16, 2016, 10:09:27 AM
to Peter Amstutz, common-workflow-language
Hi Peter

Thanks for your clarifications. If you don't mind me being a bit pesky, I'll reply to your comments.

On Tue, Feb 16, 2016 at 3:00 PM, Peter Amstutz <peter....@curoverse.com> wrote:

> Hi Miguel,
>
> The reasoning for making the root FS read-only is to encourage portability (and give implementers more flexibility) by limiting where the tools can write.

Can you elaborate on what the problem with portability is? As I understand it, you run the container, it writes whatever it needs to write, you take it down, and the next time you run it, it has a clean slate. Isn't that right? Doesn't the fact that they are containers specifically take care of isolating them?
 

> Denis's suggestion of specifying volumes in your Dockerfile is a fine solution, since that explicitly indicates that you need to write to that directory.

I understand what you mean, but volumes in Docker are not used to indicate writable directories but persistent ones. I'm now faced with having to change my Docker image in a way that I do not find very satisfactory; I would rather set up volumes when running docker if need be.
 

> The reason for --user is so that the user id inside the container matches the user invoking the container. This has two benefits: better security (the container user is not root), and it avoids the problem of output files being written as a different user such that the host user doesn't own the output.

But what if the user inside my container does not match the user running the container, as happens with mine (mine is not 'root', it is 'rbbt')? What do you suggest I do? In fact, isn't it true in general that host and container users do not match? If this entails changing how the user is defined and bootstrapped in my images, it will definitely force me to maintain a parallel set of Docker images that are CWL compliant. Furthermore, how do you know the user of the host when you build the image?

I guess I must be missing something if your approach does in fact work well with other images. To investigate this further, I've tried a small test example, attached. It builds an image that creates a user 'test_user', sets it as the default, creates a file '/tmp/foo', and runs 'chmod 600 /tmp/foo'. As you can see in the included tests, the file is read correctly when docker is run normally, but it gives a read permission error when executed through CWL. To try it for yourself, just unpack it, move into the directory, and execute './try_it_out.sh'.
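
The failure in this test follows from the classic POSIX owner/group/other permission check. The sketch below is an illustration only: the function `can_read` is hypothetical, the uid 1000 for 'test_user' is an assumption (the thread does not state it), and the check deliberately ignores root privileges and ACLs.

```python
import stat


def can_read(mode, owner_uid, owner_gid, uid, gids):
    """Simplified POSIX read check: owner bits, then group bits, then other bits.

    Hypothetical helper; ignores uid 0 (root) and access control lists.
    """
    if uid == owner_uid:
        return bool(mode & stat.S_IRUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


# Assumed uid/gid 1000 for 'test_user'; 15167 is the uid passed via --user in the thread.
as_image_user = can_read(0o600, 1000, 1000, uid=1000, gids={1000})
as_host_uid = can_read(0o600, 1000, 1000, uid=15167, gids={15167})
```

Under these assumptions `as_image_user` is true and `as_host_uid` is false: a file with mode 600 is readable only by its owner, so running the container under any other uid produces the read permission error.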

> Docker 1.10 introduces user namespace mapping, which mitigates some of these issues.  I'm also experimenting with using the setgid bit in the output directory as another way of ensuring that the host user can access files created by the container user.

I've had that problem too. Actually, my problem was that the host could not remove the files created by the container. The docker helper tool I've cooked up myself creates a directory for results and places them there. The host can only remove them by removing the complete directory, but I have no issues creating and reading them.

Would it be too bad to add an entry in the CWL to specify additional docker options, where we could at least override the 'read-only' and 'user' parameters?
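
One way such an override could look is sketched below. This is not cwltool's code or an existing CWL feature: the function `docker_run_args` and its parameters `read_only`, `user`, and `extra_rw_dirs` are hypothetical names, and the argument list is only modeled on the job.py excerpt quoted at the start of the thread (input file mounts are left out for brevity).

```python
import os


def docker_run_args(image, outdir, tmpdir, read_only=True, user=None, extra_rw_dirs=()):
    """Build a `docker run` argument list with overridable options.

    Hypothetical sketch. `user=None` omits --user entirely; pass
    os.geteuid() to reproduce the behaviour shown in job.py.
    `extra_rw_dirs` is a sequence of (host_dir, container_dir) pairs.
    """
    args = ["docker", "run", "-i"]
    args.append("--volume=%s:%s:rw" % (os.path.abspath(outdir), "/tmp/job_output"))
    args.append("--volume=%s:%s:rw" % (os.path.abspath(tmpdir), "/tmp/job_tmp"))
    # Extra writable mounts, e.g. a directory for lock files
    for host_dir, container_dir in extra_rw_dirs:
        args.append("--volume=%s:%s:rw" % (os.path.abspath(host_dir), container_dir))
    args.append("--workdir=/tmp/job_output")
    if read_only:
        args.append("--read-only=true")
    if user is not None:
        args.append("--user=%s" % user)
    args.extend(["--rm", "--env=TMPDIR=/tmp/job_tmp", image])
    return args
```

With the defaults the root filesystem stays read-only, so a tool description would have to opt out explicitly, for example with `read_only=False` or with an entry in `extra_rw_dirs` for its lock file directory.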

I hope this discussion helps

Miguel
test_docker.tar.gz