Access to files inside Docker containers

Sascha Steinbiss

Feb 6, 2015, 11:53:24 AM
to next...@googlegroups.com
Hi,

I have a question regarding the use of Docker containers with Nextflow. I'm struggling to access files that I placed in the container during the image build. To give a minimal example, consider a very simple container which contains just one additional file:


FROM debian:wheezy
MAINTAINER ss34@sanger.ac.uk
RUN mkdir -p /data && echo 'test test this is a test' > /data/test123


Say this is in my container satta/test-docker-nf. I can confirm that the file is present:

$ docker run satta/test-docker-nf ls /data
test123
$ docker run satta/test-docker-nf cat /data/test123
test test this is a test

Now consider a Nextflow pipeline test.nf, which accesses this file:

#!/usr/bin/env nextflow

// this is the test file expected to be present inside the container
testfile = file("/data/test123")

process test {
    input:
    file testfile

    output:
    stdout result

    """
    ls -Al ${testfile}
    ls -Al /data
    ls -HAl ${testfile}
    """
}

result.subscribe { println it }

However, the file appears to be gone when I run the workflow inside the container:

$ nextflow run test.nf -with-docker satta/test-docker-nf
N E X T F L O W  ~  version 0.12.0
[warm up] executor > local
[ed/78abf5] Submitted process > test (1)
Error executing process > 'test (1)'

Caused by:
  Process 'test (1)' terminated with an error exit status

Command executed:

  ls -Al test123
  ls -Al /data
  cat test123

Command exit status:
  1

Command output:
  lrwxr-xr-x 1 1000 staff 13 Feb  6 16:46 test123 -> /data/test123
  total 0
  cat: test123: No such file or directory

Work dir:
  /Users/ss34/develop/test-docker-nf/work/ed/78abf58852ecc6817aa7e5bdaa41ea

Tip: when you have fixed the problem you can continue the execution appending to the nextflow command line the option '-resume'


The symlink is created, but its target is missing and /data appears empty. Did I miss anything? Further experiments (e.g. looking into other directories at run time) have shown that the container's file systems are not available and that I'm accessing those of the host system instead! I tried this on Linux and on Mac OS X with boot2docker.
You can find the workflow and container at satta/test-docker-nf on GitHub and Docker Hub, respectively.

Strangely enough, I can run applications I have installed in a Docker image. Any hints or directions?

Thanks
Sascha

Paolo Di Tommaso

Feb 6, 2015, 12:38:41 PM
to nextflow
Hi, 

The problem is that when a process runs in a Docker container, Nextflow automatically mounts the parent path of each input file into the container, because input files are expected to be *external* to the container. As a result, the mount hides that path in the container's internal file system, which causes the error.

I think a simple workaround is to declare them as "val" instead of "file". Since they are internal to the container, you do not need to stage them in the working directory.
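
A minimal sketch of that change, applied to the test.nf from the first message. It assumes the same process syntax and Nextflow version used in this thread, and it passes the path as a plain string, which is an assumption for illustration; it has not been run here.

```nextflow
#!/usr/bin/env nextflow

// Path that exists only inside the container image.
// A plain string is used here (assumption) so that nothing is staged.
testfile = "/data/test123"

process test {
    input:
    // "val" passes the path as a value; no symlink, no host mount of /data
    val testfile

    output:
    stdout result

    """
    ls -Al ${testfile}
    cat ${testfile}
    """
}

result.subscribe { println it }
```

Because the path is passed as a value, Nextflow should neither stage the file in the work directory nor mount the host's /data over the container's copy.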


Let me know if this solves your problem.


Cheers,
Paolo




Sascha Steinbiss

Feb 6, 2015, 5:31:00 PM
to next...@googlegroups.com
Hi Paolo,

thanks a lot! This solves my problem and also explains some background of what "file" and "val" do. Perhaps you could elaborate a bit in the documentation about what types are most appropriate to use in the Docker context? I guess some 'best practices' would definitely help...

Best regards
Sascha

Paolo Di Tommaso

Feb 7, 2015, 6:44:28 AM
to nextflow
Hi Sascha,


Happy to know that you solved your problem. Actually the original idea was to allow containers to access data in the host environment, but I see that your approach makes sense in some circumstances. 

Adding a "best practices" section to the documentation sounds like a nice idea, thanks for suggesting that.

It is also tremendously valuable for us to know how people are using Nextflow. Whenever you want to blog or simply tweet about it, it would be a great contribution.


Thanks,
Paolo


