Hi,
I have already posted here about similar issues with Nextflow and AWS Batch.
I have a pipeline on AWS with several steps, but lately, after several resumes, it always fails on a featureCounts job with exit status 1. However, most of the featureCounts jobs have already succeeded and their results are already in my output dir, so I would rule out an error in the process itself.
The logs in CloudWatch and .command.log just say:
2021-02-04T08:41:01.335+01:00
nxf-scratch-dir ip-10-1-12-8:/tmp/nxf.w7cY06Noix'
The content of .command.sh is correct, and I have also checked that the input file (a BAM file) is available in the work dir previously used by the STAR task.
These are the parameters used in the configuration file (mostly copied from previous issues on GitHub):
client {
    protocol = 'https'
    uploadStorageClass = 'INTELLIGENT_TIERING'
    maxConnections = 20 // This seems to be closely tied to uploadMaxThreads - vast differences caused issues for me
    maxErrorRetry = 100
    uploadMaxThreads = 20 // Vary this according to your uploader computer
    uploadChunkSize = '100MB' // I have found larger chunk sizes to be more stable
    uploadMaxAttempts = 10
    uploadRetrySleep = '10 sec'
}
aws.region = 'us-east-1'
aws.batch.cliPath = '/home/ec2-user/miniconda/bin/aws'
aws.batch.maxTransferAttempts = 100
aws.batch.delayBetweenAttempts = 1000
aws.batch.maxParallelTransfers = 8
Any help on how I can debug this kind of failure?