Dear all,
We have a workflow in which we want to process all of the 1000 Genomes CRAMs, so the disk requirements are fairly substantial and AWS Batch is ideal for us. However, we've run into a problem with the disk space available on our custom AMI.
The problem is that when multiple containers are scheduled on the same instance, the instance runs out of disk space. With maxForks = 1 or 2 there's no problem, but with anything higher the disk fills up.
For this particular run we can work around it by tuning our memory and CPU requirements so that at most two containers run per instance. (Obviously I could make the disk bigger, but that would just change the n of the problem from 2 to 4 or whatever -- we need to do some calculations to work out a reasonable configuration, as this is likely to be an expensive run.) That would be a satisfactory solution, since we hope this will be a one-off run and we are not building the workflow for other people to use.
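For concreteness, the kind of thing I have in mind is below -- just a sketch, where the process name and the figures are placeholders; the idea is that by requesting roughly half an instance's vCPUs and memory per task, Batch can only pack two such tasks onto each instance:

    // nextflow.config -- sketch only; 'align' and the numbers are placeholders
    process {
        withName: align {
            maxForks = 200        // overall parallelism across the run
            cpus     = 16         // roughly half the vCPUs of the target instance type,
            memory   = '30 GB'    // so Batch can place at most two of these per instance
        }
    }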
My question is not so much how to solve our specific problem as what the best way of handling this would be in future. Could we mount an EFS volume to our containers and use it as the local staging directory? (Of course, as soon as a job has completed we want to delete the local CRAM copy, because we don't want to pay for tens of TBs of disk storage.) Is there another way of doing things?
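In case it helps frame the question, this is roughly what I was imagining -- purely a sketch, not tested: the paths are invented, it assumes the EFS filesystem is already mounted on the host at /mnt/efs (e.g. via the custom AMI or a launch template), and I'm not sure whether the per-task staging area can actually be redirected to such a mount:

    // nextflow.config -- sketch only, not tested
    // assumes EFS is already mounted on the instance at /mnt/efs
    aws {
        batch {
            // make the host mount visible inside every task container
            volumes = ['/mnt/efs/scratch:/scratch']
        }
    }

Whether each task's scratch data could then be pointed there, and cleaned up as the task completes, is exactly the part I'd like advice on.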
As an aside, the AWS Batch documentation may need some updating. The limitation on Docker image sizes and the base device size no longer seems to apply to the current version of Docker on the default AWS ECS instance (and the instructions no longer work, since e.g. docker info does not show the base device size).
Many thanks
Scot