Running Nextflow on AWS Cloud without EFS

dtz...@emory.edu

Jul 13, 2017, 2:21:44 PM
to Nextflow
Dear Paolo,

Would it be possible, in the current or a later version, to run Nextflow workflows on AWS without using EFS, instead mounting an S3 bucket on all instances/nodes at launch using s3fs? s3fs does provide POSIX filesystem capabilities, but it is not a complete filesystem and has several limitations. I would love to use Nextflow in production, but the cost of AWS EFS is quite high (roughly 15 times that of S3).

Thanks,
Jun

Paolo Di Tommaso

Jul 13, 2017, 3:04:04 PM
to nextflow
Yes, you only need to specify an S3 path as the work directory using the `-w` command line option, provided you are using the Nextflow embedded cluster as explained in the documentation.


Hope it helps

Cheers,
Paolo


--
You received this message because you are subscribed to the Google Groups "Nextflow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nextflow+unsubscribe@googlegroups.com.
Visit this group at https://groups.google.com/group/nextflow.
For more options, visit https://groups.google.com/d/optout.

dtz...@emory.edu

Jul 17, 2017, 8:30:03 PM
to Nextflow
Hello Paolo,

Thanks for the reply. Do you know whether users have successfully used an S3 bucket mounted through s3fs as a shared-storage replacement for AWS EFS?

I'm still having some issues running Nextflow on AWS, where each process runs as a Docker container. I enabled the Docker `enabled` and `sudo` settings and gave all users `rwx` permission on the work directory in the S3 bucket. I ran Nextflow with the `-w` option, and the automatically configured `ignite` executor was used. The containers start on all nodes, but Nextflow terminates midway because it cannot find input/output files. It seems the input files to the first process in the workflow contain nothing but the absolute path of the input file on the EC2 instance's local filesystem, and the Docker containers produce empty output files.
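For reference, the Docker settings described above correspond to a `nextflow.config` fragment along these lines (a sketch reconstructed from the description, not the actual attached configuration file):

```groovy
// Sketch of the Docker settings described above: run each process
// in a container, invoking Docker with sudo.
docker {
    enabled = true
    sudo    = true
}
```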

Is there perhaps any further configuration or small detail that I may be missing?

Best Regards,
Jun



Paolo Di Tommaso

Jul 18, 2017, 2:57:04 AM
to nextflow
This may be related to an incorrect input declaration in one or more processes in your pipeline.

I suggest the following: 

1) Create a small test dataset for your pipeline.
2) Make sure it runs successfully on a single machine with Docker installed and enabled in your pipeline.
3) Run the same pipeline in the AWS cloud.
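The steps above could look roughly like this on the command line (the script name and bucket are placeholders, not taken from the thread):

```shell
# 1-2) Run the pipeline on a small test dataset on a single machine,
#      with Docker enabled for all processes
nextflow run main.nf -with-docker

# 3) Run the same pipeline in the AWS cloud, e.g. with an S3 work directory
nextflow run main.nf -w s3://your-bucket/work
```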

If you have a problem in any of these steps, report the exact error message here and include the `.nextflow.log` file.


Hope it helps. 


Cheers,
Paolo



dtz...@emory.edu

Jul 18, 2017, 5:15:08 PM
to Nextflow
Hi Paolo,

It may be an incorrect configuration of the workflow script on my end. However, I was able to run the pipeline successfully on a single machine using the `local` executor. Running the pipeline with the `ignite` executor on the master node causes the error. I have attached several files, including those you requested. I hope they help you diagnose the problem I am facing.

Thanks again,
Jun
error
nextflow.config
nextflow.config.cloud
nextflow.log
workflow-docker-multiple.nf

Paolo Di Tommaso

Jul 18, 2017, 7:01:03 PM
to nextflow
Are you using a FUSE file system mounted over S3?


p


dtz...@emory.edu

Jul 21, 2017, 1:43:01 PM
to Nextflow
Paolo,

Yes, I am using the s3fs FUSE-based filesystem to mount the S3 bucket.

Jun

Paolo Di Tommaso

Jul 21, 2017, 1:47:03 PM
to nextflow
That's not a reliable file system and, above all, it's not necessary.

Nextflow has native support for S3 storage; you only need to prefix the work path with the `s3://` protocol (when running in the AWS cloud).

For example: 

`nextflow run hello -w s3://your-bucket/path`
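For completeness, the AWS credentials used by Nextflow's native S3 support can also be supplied through `nextflow.config` (a sketch with placeholder values, not taken from the thread):

```groovy
// Hypothetical nextflow.config fragment: AWS credentials and region
// for Nextflow's native S3 support. Placeholder values only.
aws {
    accessKey = '<YOUR_ACCESS_KEY>'
    secretKey = '<YOUR_SECRET_KEY>'
    region    = 'us-east-1'
}
```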


Hope it helps. 


Cheers,
Paolo

dtz...@emory.edu

Aug 8, 2017, 4:49:54 PM
to Nextflow
Hi Paolo,

Following your suggestion made it work smoothly. Thanks for the help. 

Best,
Jun