AWS Batch job fails with exit 1, but limited information on how to debug it


a.arg...@gmail.com

Feb 4, 2021, 3:47:25 AM
to Nextflow

Hi,
I have already posted here about similar issues with Nextflow and AWS Batch.
I have a pipeline on AWS with several steps, but lately, after several resumes, it always fails on a featureCounts job with exit status 1. However, most of the featureCounts jobs have already succeeded and their results are already in my output dir, so I would rule out an error in the process itself.

The log in CloudWatch and .command.log just say:

2021-02-04T08:41:01.335+01:00
nxf-scratch-dir ip-10-1-12-8:/tmp/nxf.w7cY06Noix
'

The content of .command.sh is correct, and I have also checked that the input file (a BAM file) is available in the work dir previously used by the STAR task.
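For anyone wanting to repeat these checks, here is a minimal sketch, not a definitive recipe, that builds the AWS CLI commands to print the per-task files Nextflow normally writes to a task work dir (.command.sh, .command.run, .command.log, .command.err, .command.out, .exitcode). The bucket and task path are hypothetical placeholders, the helper name `task_debug_commands` is made up for illustration, and some files may be missing if the job died before they were uploaded.

```python
# File names Nextflow normally writes into each task work directory.
TASK_FILES = [
    ".command.sh",   # the script that was run
    ".command.run",  # the wrapper that stages files and launches the script
    ".command.log",  # combined output captured by the wrapper
    ".command.err",  # stderr of the task
    ".command.out",  # stdout of the task
    ".exitcode",     # exit status recorded by the wrapper
]


def task_debug_commands(workdir):
    """Return one `aws s3 cp <uri> -` command per task file.

    `aws s3 cp <uri> -` streams the object to stdout instead of saving it.
    """
    base = workdir.rstrip("/")
    return ["aws s3 cp {}/{} -".format(base, name) for name in TASK_FILES]


if __name__ == "__main__":
    # Hypothetical work dir: replace with the path reported for the failed task.
    for cmd in task_debug_commands("s3://my-bucket/work/ab/123456"):
        print(cmd)
```

The script only prints the commands; running them needs AWS credentials with read access to the work bucket.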

These are the parameters used in the configuration file (mostly copied from previous issues on GitHub):
client {
    protocol = 'https'
    uploadStorageClass = 'INTELLIGENT_TIERING'
    maxConnections = 20 // This seems to be closely tied to uploadMaxThreads - vast differences caused issues for me
    maxErrorRetry = 100
    uploadMaxThreads = 20 // Vary this according to your uploader computer
    uploadChunkSize = '100MB' // I have found larger chunk sizes to be more stable
    uploadMaxAttempts = 10
    uploadRetrySleep = '10 sec'
}
aws.region = 'us-east-1'
aws.batch.cliPath = '/home/ec2-user/miniconda/bin/aws'
aws.batch.maxTransferAttempts = 100
aws.batch.delayBetweenAttempts = 1000
aws.batch.maxParallelTransfers = 8


Any help on how I can debug this kind of failure?
